Property-Based Testing Transcript

This transcript shows how Claude identifies a bug in numpy.random.wald by inferring properties from documentation and tests, then using Hypothesis to find counterexamples.

✻ Thinking

Now let me use introspection to understand the numpy.random.wald function better - get its signature, docstring, and source code if available.

⏺ Tool Call: Bash

python3 -c "
                        import numpy…

⎿ Tool Output

=== SIGNATURE === (mean, scale, size=None) … +79 lines (ctrl+o to expand)

✻ Thinking

Good, I have the function signature and docstring. It takes mean and scale parameters (both must be > 0) and an optional size parameter. Let me explore the numpy random module structure to understand the implementation better.

⏺ Tool Call: Bash

ls -la /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random/

⎿ Tool Output

total 5488 drwxr-xr-x 34 redacted admin 1088 Jun 23 16:56 . drwxr-xr-x 52 redacted admin 1664 Jun 23 16:56 .. … +32 lines (ctrl+o to expand)

✻ Thinking

Let me check the __init__.py file to see how wald is exposed. I should also look into the mtrand module, which is likely where the legacy random functions are implemented.

⏺ Tool Call: Read

/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random/__init__.py

⎿ Tool Output

Read 214 lines (ctrl+o to expand)

✻ Thinking

So the wald function is imported from mtrand (line 187). Let me check if I can find more information about the actual implementation. Let me also check for tests of the wald function.

⏺ Tool Call: Bash

python3 -c "
                        import numpy.random…

⎿ Tool Output

Function name: wald Module: numpy.random Type: <class 'method'> … +2 lines (ctrl+o to expand)

✻ Thinking

Let me look for tests to understand how the wald function is typically used and what properties are expected. This will help me propose properties based on evidence.

⏺ Tool Call: Search

pattern: "**/test*wald*.py", path: "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random"

⎿ Tool Output

Found 0 files

✻ Thinking

Let me look for general test files in the numpy random tests directory.

⏺ Tool Call: Bash

ls /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random/tests/

⎿ Tool Output

__init__.py __pycache__ data … +10 lines (ctrl+o to expand)

✻ Thinking

Let me search for "wald" in the test files to see how it's tested.

⏺ Tool Call: Search

pattern: "wald", path: "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random/tests", output_mode: "content"

⎿ Tool Output

Found 40 lines (ctrl+o to expand)

✻ Thinking

Good, I found existing tests for wald. Let me look at one of these test files to understand better what properties are being tested and what constraints exist.

⏺ Tool Call: Read

/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/numpy/random/tests/test_random.py

⎿ Tool Output

Read 30 lines (ctrl+o to expand)

✻ Thinking

Now I understand the constraints better:
1. mean must be > 0 (lines 1446, 1452, 1454 show tests for ValueError when mean = 0)
2. scale must be > 0 (lines 1447, 1453, 1455 show tests for ValueError when scale <= 0)
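The constraints inferred here can be checked directly. The following is a minimal sketch, assuming the legacy np.random.wald raises ValueError for non-positive parameters as the existing tests indicate; the helper name rejects is hypothetical and introduced only for illustration.

```python
import numpy as np


def rejects(mean, scale):
    """Return True if np.random.wald raises ValueError for these parameters.

    Hypothetical helper for illustration; not part of NumPy.
    """
    try:
        np.random.wald(mean, scale)
    except ValueError:
        return True
    return False


# Non-positive mean or scale should be rejected; a valid pair should not.
print("mean=0 rejected:  ", rejects(0.0, 1.0))
print("scale=0 rejected: ", rejects(1.0, 0.0))
print("scale<0 rejected: ", rejects(1.0, -1.0))
print("valid rejected:   ", rejects(1.0, 1.0))
```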

It then proposes some properties based on this analysis:

✻ Thinking

Based on my understanding of the wald function, let me propose evidence-based properties:
1. All outputs should be positive: Since the Wald distribution produces positive values only, all samples should be > 0
2. Parameter validation: mean and scale must be > 0, otherwise ValueError should be raised
3. Shape property: Output shape should match the specified size parameter
4. Broadcasting property: When mean and scale are arrays, they should broadcast correctly
5. Deterministic with seed: Same seed should produce same results
6. No NaN/Inf in output: Valid inputs should never produce NaN or infinity values
Let me now write tests for these properties.

⏺ Assistant

Based on my analysis, here are the evidence-based properties I've identified for numpy.random.wald:
1. All outputs must be positive - The Wald distribution only produces positive values
2. Parameter constraints - mean and scale must be > 0 (documented and tested in existing tests)
3. No NaN/Inf values - Valid inputs should produce finite values
4. Shape consistency - Output shape should match the size parameter
5. Broadcasting - mean and scale arrays should broadcast correctly
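Some of the listed properties (shape, determinism under a fixed seed, broadcasting) could be expressed as Hypothesis tests along the following lines. This is an illustrative sketch, not the test file from the session: the test names, the moderate parameter range, and the example counts are assumptions chosen for the example.

```python
import numpy as np
from hypothesis import given, settings, strategies as st

# Hypothetical, deliberately moderate parameter range for this illustration.
moderate_floats = st.floats(
    min_value=0.1, max_value=10.0, allow_nan=False, allow_infinity=False
)


@given(
    mean=moderate_floats,
    scale=moderate_floats,
    size=st.integers(min_value=1, max_value=100),
)
@settings(max_examples=50, deadline=None)
def test_wald_shape_matches_size(mean, scale, size):
    """An integer size n should yield a one-dimensional array of length n."""
    samples = np.random.wald(mean, scale, size)
    assert samples.shape == (size,)


@given(
    mean=moderate_floats,
    scale=moderate_floats,
    seed=st.integers(min_value=0, max_value=2**32 - 1),
)
@settings(max_examples=50, deadline=None)
def test_wald_is_deterministic_for_fixed_seed(mean, scale, seed):
    """Re-seeding the legacy global generator should reproduce the draw."""
    np.random.seed(seed)
    first = np.random.wald(mean, scale, 5)
    np.random.seed(seed)
    second = np.random.wald(mean, scale, 5)
    assert np.array_equal(first, second)


def test_wald_broadcasts_parameters():
    """Array parameters should broadcast: (3,) against (2, 1) gives (2, 3)."""
    means = np.array([1.0, 2.0, 3.0])
    scales = np.array([[1.0], [2.0]])
    assert np.random.wald(means, scales).shape == (2, 3)
```

These run under pytest, or by calling each test function directly.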

Next, it translates these properties to Hypothesis tests in a new Python file. For example, when writing the tests, it constrains the mean and scale parameters to be positive, as the documentation requires. When it executes the tests, they fail, and it then reflects on the failure:

✻ Thinking

Excellent! I found bugs. The tests reveal that numpy.random.wald can produce: 1. Zero values (should be positive only) 2. Negative values with extreme parameters (-0 in the large values test) Let me now move to bug triage to verify these are legitimate bugs and create minimal reproductions.

⏺ Assistant

Great! I found potential bugs. Let me now triage these failures to verify they're legitimate bugs.

The property under test is that the Wald distribution returns only positive values. Running the test with Hypothesis turned up parameter values for which the function returns zero or negative values. To confirm the bug is valid, it writes a minimal reproduction script using the failing parameters. The bug reproduces, so it then writes a bug report.

✻ Thinking

Now let me create the bug report.

⏺ Tool Call: Write

bug_report_numpy_random_wald_2025-09-15_19-33_bjyl.md

The bug report it produced is:

📋 Bug Report: numpy.random.wald Produces Zero Values

Target: numpy.random.wald
Severity: High
Bug Type: Logic
Date: 2025-09-15

Summary

The numpy.random.wald function produces zero values when given certain valid parameter combinations, violating the mathematical property that the Wald distribution only produces positive values.

Property-Based Test

from hypothesis import given, strategies as st, settings
import numpy as np

positive_floats = st.floats(min_value=1e-10, max_value=1e6, allow_nan=False, allow_infinity=False)

@given(
    mean=positive_floats,
    scale=positive_floats,
    size=st.integers(min_value=1, max_value=1000),
)
@settings(max_examples=1000)
def test_wald_all_outputs_positive(mean, scale, size):
    """Test that all Wald distribution samples are positive."""
    samples = np.random.wald(mean, scale, size)
    assert np.all(samples > 0), f"Found non-positive values: {samples[samples <= 0]}"

Failing input: mean=265150.0, scale=0.00390625, size=1

Reproducing the Bug

import numpy as np

mean = 265150.0
scale = 0.00390625

np.random.seed(0)
samples = np.random.wald(mean, scale, 10)
print(f"Samples: {samples}")
print(f"Contains zero: {np.any(samples == 0)}")

Output:

Samples: [0. 0.02444708 0. 0.00395104 0.36664443 0.02321238
 0.00691433 0.26385558 0.01975521 0.03506551]
Contains zero: True

Why This Is A Bug

The Wald (inverse Gaussian) distribution is mathematically defined to produce only positive values. Its probability density function is:

P(x; mean, scale) = sqrt(scale / (2πx³)) * exp(-scale(x - mean)² / (2 * mean² * x))

This PDF is only defined for x > 0. Producing zero values violates this fundamental property of the distribution and can cause downstream issues in applications that rely on this mathematical guarantee.
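The density above can be evaluated numerically to see how it behaves near zero. The sketch below is illustrative only: the function names, the choice mean = scale = 1, and the integration grid are assumptions made for the example, and the integral is a plain trapezoid sum rather than any library routine.

```python
import numpy as np


def wald_pdf(x, mean, scale):
    """Evaluate the Wald (inverse Gaussian) density quoted above for x > 0."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(scale / (2 * np.pi * x**3)) * np.exp(
        -scale * (x - mean) ** 2 / (2 * mean**2 * x)
    )


def trapezoid_integral(y, x):
    """Trapezoid-rule integral of samples y taken at grid points x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


# Grid over (0, 60]; the lower end is a tiny positive number, never zero.
xs = np.linspace(1e-9, 60.0, 600_001)
density = wald_pdf(xs, mean=1.0, scale=1.0)
total = trapezoid_integral(density, xs)

print(f"integral over the grid: {total:.6f}")
print(f"density at the smallest grid point: {density[0]}")
```

For these parameters the density integrates to approximately 1 over strictly positive x, consistent with the claim that the distribution's support excludes zero.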

Fix

The issue likely occurs due to numerical underflow or improper handling of extreme parameter ratios. The implementation should ensure that:

- Very small computed values are clamped to a small positive epsilon rather than zero
- The algorithm handles extreme ratios of mean/scale more robustly

A potential fix would involve checking the output and replacing any zero or negative values with the smallest positive float value, or adjusting the internal algorithm to prevent underflow.

# In the wald implementation:
- return computed_value
+ return max(computed_value, np.finfo(float).tiny)
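Until a fix lands in the library, the same clamping idea could be applied on the caller's side. The following is a sketch of such a workaround, not NumPy's implementation: the wrapper name wald_strictly_positive is hypothetical, and clamping only masks the symptom rather than addressing whatever precision loss produces the zeros.

```python
import numpy as np


def wald_strictly_positive(mean, scale, size=None):
    """Draw Wald samples and lift non-positive values to the smallest
    positive normal float.

    Hypothetical caller-side wrapper for illustration; not part of NumPy.
    """
    samples = np.asarray(np.random.wald(mean, scale, size))
    return np.maximum(samples, np.finfo(float).tiny)


# Same parameters and seed as the reproduction above.
np.random.seed(0)
clamped = wald_strictly_positive(265150.0, 0.00390625, 10)
print(f"All positive after clamping: {np.all(clamped > 0)}")
```

Note that clamping places a small point mass at np.finfo(float).tiny, so the result is no longer an exact draw from the Wald distribution; it only restores the positivity guarantee for downstream code such as logarithms or reciprocals.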
