Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Extend hypergeometric distribution PMF for non-integral arguments #1244

Draft
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

fpelliccioni
Copy link

Summary

This PR extends the hypergeometric distribution's PMF to support non-integer values of x using cubic Hermite interpolation. If x is not an integer, the implementation selects at least three valid integer points for interpolation. If fewer than three points are available, it raises a domain error.

Known Issue

This change introduces a dependency on cubic_hermite, which throws exceptions instead of using Boost.Math’s policies::raise_domain_error. As a result, some compile-time tests fail due to exceptions being disabled in those tests.

Next Steps

  • A separate PR will update cubic_hermite (and potentially other interpolators) to conform to Boost.Math’s policy-based error handling.
  • This PR is left in draft so reviewers can provide feedback on the approach.

Question: Best way to test this?

Hypergeometric test data in test/hypergeometric_test_data.ipp appears to be auto-generated from Mathematica, with some values removed due to absolute vs relative error concerns.
Since we now support non-integer values, what is the best approach to generate a new test dataset? Does anyone know of a reliable source for non-integer hypergeometric PMF values?

Fixes #1240

@mdhaber
Copy link

mdhaber commented Feb 14, 2025

Thanks for this @fpelliccioni!

@steppi would interpolation be ok, or were you hoping for the definition to match that of evaluation in terms of binom?

@fpelliccioni it might solve some of the testing and policy issues to implement this using binomial coefficients for small arguments and (the existing?) Lanczos approximation for large arguments; e.g. see scipy/scipy#22312 (comment). Then reference data could be computed in Mathematica using Binomial.

@steppi
Copy link

steppi commented Feb 14, 2025

@steppi would interpolation be ok, or were you hoping for the definition to match that of evaluation in terms of binom?

Yeah, I was hoping this would evaluate

$$\frac{{r \choose k}{N -r \choose k}}{{N \choose n}}$$

for non-integral arguments. Interpolating with Hermite polynomials would actually work for our application in stats though, but is kind of arbitrary, and thus like you and @fpelliccioni said, there would be no natural reference implementation to compare to. I'm pretty sure the Lanczos code path in Boost's hypergeometric pmf implementation will just work out of the box for non-integral arguments.

@NAThompson
Copy link
Collaborator

Since we now support non-integer values, what is the best approach to generate a new test dataset? Does anyone know of a reliable source for non-integer hypergeometric PMF values?

Could you do a quadrature over the entire domain and assert its close to unity? That would at least catch gross errors, or demonstrate one way or another whether the particular interpolator chosen is somehow "respecting" the structure of the PMF.

Another question: You are using cubic_hermite so it appears you know how to recover derivatives of this at integer arguments. Can you recover multiple derivatives and use quintic_hermite?

@jzmaddock
Copy link
Collaborator

Before we get too carried away with interpolation, why not just use the binomial definition (and our existing code) over the real domain? That would seem to be more accurate, and probably rather more efficient as well.

A more pressing (and perhaps difficult) question is what we want to have happen? Is there any precedence for evaluating the hypergeometric over a non-discrete domain? Or to put this another way: is anyone relying on the hypergeometric PMF generating an error when a non-integer is passed?

@mdhaber
Copy link

mdhaber commented Feb 14, 2025

@jzmaddock This request ultimately stems from scipy/scipy#22312 (comment). We asked to be able to evaluate the function at non-integer arguments (like some other Boost discrete disributions) so that we can approximate infinite sums involving the PMF in a monotonically decreasing summand with an integral. Any shape-preserving interpolation would work (so PCHIP might be preferred to avoid ringing), but the original suggestion (repeated here, #1244 (comment)) was indeed to use the binomial definition and the existing code where possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend hypergeometric distribution PMF for non-integral arguments
5 participants