feat: Extend hypergeometric distribution PMF for non-integral arguments #1244

fpelliccioni · 2025-02-04T19:35:17Z

Summary

This PR extends the hypergeometric distribution's PMF to support non-integer values of x using cubic Hermite interpolation. If x is not an integer, the implementation selects at least three valid integer points for interpolation. If fewer than three points are available, it raises a domain error.
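For readers unfamiliar with the technique, the following is a minimal sketch of cubic Hermite interpolation over PMF values tabulated at consecutive integers. It is not the code in this PR: the helper names (`hermite_segment`, `slope_at`, `interpolate_pmf`) are hypothetical, the finite-difference slope estimates are an assumption, and it throws `std::domain_error` rather than going through Boost.Math's policies.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Cubic Hermite segment on t in [0, 1] with endpoint values y0, y1
// and endpoint slopes m0, m1 (standard Hermite basis polynomials).
double hermite_segment(double y0, double y1, double m0, double m1, double t) {
    const double t2 = t * t;
    const double t3 = t2 * t;
    return (2 * t3 - 3 * t2 + 1) * y0 + (t3 - 2 * t2 + t) * m0
         + (-2 * t3 + 3 * t2) * y1 + (t3 - t2) * m1;
}

// Hypothetical slope estimate at node i: central difference in the
// interior, one-sided difference at either end (an assumption).
double slope_at(const std::vector<double>& p, std::size_t i) {
    if (i == 0) return p[1] - p[0];
    if (i + 1 == p.size()) return p[i] - p[i - 1];
    return 0.5 * (p[i + 1] - p[i - 1]);
}

// p[j] holds the PMF at the j-th integer of the support; x is measured
// from the first tabulated integer.
double interpolate_pmf(const std::vector<double>& p, double x) {
    if (p.size() < 3) {
        throw std::domain_error("need at least three integer points");
    }
    if (!(x >= 0.0) || x > static_cast<double>(p.size() - 1)) {
        throw std::domain_error("x outside the tabulated range");
    }
    const std::size_t i = static_cast<std::size_t>(std::floor(x));
    if (i + 1 == p.size()) return p[i];  // x is exactly the last node
    return hermite_segment(p[i], p[i + 1], slope_at(p, i), slope_at(p, i + 1),
                           x - static_cast<double>(i));
}
```

At integer `x` the sketch returns the tabulated value unchanged; between integers the result depends on how the slopes are estimated, which is the arbitrary part of any such scheme.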

Known Issue

This change introduces a dependency on cubic_hermite, which throws exceptions instead of using Boost.Math’s policies::raise_domain_error. As a result, some compile-time tests fail due to exceptions being disabled in those tests.

Next Steps

A separate PR will update cubic_hermite (and potentially other interpolators) to conform to Boost.Math’s policy-based error handling.
This PR is left in draft so reviewers can provide feedback on the approach.

Question: Best way to test this?

Hypergeometric test data in test/hypergeometric_test_data.ipp appears to be auto-generated from Mathematica, with some values removed due to absolute vs relative error concerns.
Since we now support non-integer values, what is the best approach to generate a new test dataset? Does anyone know of a reliable source for non-integer hypergeometric PMF values?

Fixes #1240

mdhaber · 2025-02-14T07:36:41Z

Thanks for this @fpelliccioni!

@steppi would interpolation be ok, or were you hoping for the definition to match that of evaluation in terms of binom?

@fpelliccioni it might solve some of the testing and policy issues to implement this using binomial coefficients for small arguments and (the existing?) Lanczos approximation for large arguments; e.g. see scipy/scipy#22312 (comment). Then reference data could be computed in Mathematica using Binomial.

steppi · 2025-02-14T14:42:23Z

> @steppi would interpolation be ok, or were you hoping for the definition to match that of evaluation in terms of binom?

Yeah, I was hoping this would evaluate

$$\frac{{r \choose k}{N - r \choose n - k}}{{N \choose n}}$$

for non-integral arguments. Interpolating with Hermite polynomials would actually work for our application in stats, but it is kind of arbitrary, and thus, as you and @fpelliccioni said, there would be no natural reference implementation to compare to. I'm pretty sure the Lanczos code path in Boost's hypergeometric pmf implementation will just work out of the box for non-integral arguments.
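To make the binomial definition concrete, here is a minimal sketch that evaluates the ratio of generalised binomial coefficients for real `k` through `std::lgamma`. It is an illustration only, not Boost's Lanczos code path: `log_binomial` and `hypergeometric_pmf_real` are hypothetical names, and the sketch assumes every gamma argument stays positive, which holds for `k` inside the support.

```cpp
#include <cmath>

// Log of the generalised binomial coefficient
//   C(a, b) = Gamma(a + 1) / (Gamma(b + 1) * Gamma(a - b + 1)).
// Assumes a + 1, b + 1 and a - b + 1 are all positive.
double log_binomial(double a, double b) {
    return std::lgamma(a + 1.0) - std::lgamma(b + 1.0) - std::lgamma(a - b + 1.0);
}

// Hypothetical helper (not a Boost.Math function):
//   C(r, k) * C(N - r, n - k) / C(N, n)
// evaluated in log space so large arguments do not overflow.
double hypergeometric_pmf_real(double k, double r, double n, double N) {
    return std::exp(log_binomial(r, k) + log_binomial(N - r, n - k)
                    - log_binomial(N, n));
}
```

At integer arguments this agrees with the usual PMF, for example `hypergeometric_pmf_real(1, 4, 3, 10)` corresponds to 4 · 15 / 120 = 0.5, so existing integer test data would still apply and non-integer reference values could be generated from the same formula elsewhere.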

NAThompson · 2025-02-14T16:46:49Z

> Since we now support non-integer values, what is the best approach to generate a new test dataset? Does anyone know of a reliable source for non-integer hypergeometric PMF values?

Could you do a quadrature over the entire domain and assert it's close to unity? That would at least catch gross errors, or demonstrate one way or another whether the particular interpolator chosen is somehow "respecting" the structure of the PMF.
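A quadrature check of this kind could be sketched as follows, using composite Simpson's rule. Several things here are assumptions: the integrand is an `lgamma`-based continuation of the binomial formula standing in for whichever real-valued PMF is being tested, the names `hypergeometric_pmf_real` and `simpson` are hypothetical, and the integration limits must be chosen by the caller.

```cpp
#include <cmath>
#include <functional>

// Stand-in integrand (hypothetical, not the PR's interpolator):
// C(r, k) * C(N - r, n - k) / C(N, n) continued to real k via lgamma.
double hypergeometric_pmf_real(double k, double r, double n, double N) {
    auto log_binomial = [](double a, double b) {
        return std::lgamma(a + 1.0) - std::lgamma(b + 1.0)
             - std::lgamma(a - b + 1.0);
    };
    return std::exp(log_binomial(r, k) + log_binomial(N - r, n - k)
                    - log_binomial(N, n));
}

// Composite Simpson's rule on [lo, hi]; `intervals` must be even.
double simpson(const std::function<double(double)>& f,
               double lo, double hi, int intervals) {
    const double h = (hi - lo) / intervals;
    double sum = f(lo) + f(hi);
    for (int i = 1; i < intervals; ++i) {
        // Odd interior nodes get weight 4, even interior nodes weight 2.
        sum += f(lo + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
    }
    return sum * h / 3.0;
}
```

For N = 10, r = 4, n = 3 this continuation vanishes at k = -1 and k = 4, so integrating over [-1, 4] is one reasonable choice of "entire domain". The result should be compared to 1 with a loose tolerance only, since nothing forces a continuous extension of a PMF to integrate to exactly 1.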

Another question: You are using cubic_hermite so it appears you know how to recover derivatives of this at integer arguments. Can you recover multiple derivatives and use quintic_hermite?

jzmaddock · 2025-02-14T18:19:12Z

Before we get too carried away with interpolation, why not just use the binomial definition (and our existing code) over the real domain? That would seem to be more accurate, and probably rather more efficient as well.

A more pressing (and perhaps difficult) question is what we want to have happen? Is there any precedent for evaluating the hypergeometric over a non-discrete domain? Or to put this another way: is anyone relying on the hypergeometric PMF generating an error when a non-integer is passed?

mdhaber · 2025-02-14T18:26:46Z

@jzmaddock This request ultimately stems from scipy/scipy#22312 (comment). We asked to be able to evaluate the function at non-integer arguments (like some other Boost discrete distributions) so that we can approximate infinite sums involving the PMF in a monotonically decreasing summand with an integral. Any shape-preserving interpolation would work (so PCHIP might be preferred to avoid ringing), but the original suggestion (repeated here, #1244 (comment)) was indeed to use the binomial definition and the existing code where possible.
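As background on why monotonicity matters here, the standard integral-test bound can be written out. Assume $f$ is a generic nonnegative, nonincreasing summand on $[a, \infty)$ with a convergent integral (the symbols $f$ and $a$ are illustrative and not taken from the SciPy code):

$$\int_a^\infty f(x)\,dx \;\le\; \sum_{k=a}^{\infty} f(k) \;\le\; f(a) + \int_a^\infty f(x)\,dx$$

The bound relies on $f$ being nonincreasing between the integers as well as at them, which is why a shape-preserving real-valued extension is needed rather than one that can overshoot between nodes.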

feat: Extend hypergeometric distribution PMF for non-integral arguments

a4fe870

fpelliccioni marked this pull request as draft February 4, 2025 19:35

fpelliccioni mentioned this pull request Feb 12, 2025

feat: use policy-based error handling in interpolators #1245

Open
