feat: Extend hypergeometric distribution PMF for non-integral arguments #1244
base: develop
Conversation
Thanks for this @fpelliccioni! @steppi would interpolation be ok, or were you hoping for the definition to match that of evaluation in terms of …

@fpelliccioni it might solve some of the testing and policy issues to implement this using binomial coefficients for small arguments and (the existing?) Lanczos approximation for large arguments; e.g. see scipy/scipy#22312 (comment). Then reference data could be computed in Mathematica using …
Yeah, I was hoping this would evaluate for non-integral arguments. Interpolating with Hermite polynomials would actually work for our application in stats, but it is kind of arbitrary, and so, as you and @fpelliccioni said, there would be no natural reference implementation to compare against. I'm pretty sure the Lanczos code path in Boost's hypergeometric pmf implementation will just work out of the box for non-integral arguments.
Could you do a quadrature over the entire domain and assert it's close to unity? That would at least catch gross errors, or demonstrate one way or another whether the particular interpolator chosen is somehow "respecting" the structure of the PMF. Another question: you are using …
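The quadrature sanity check suggested above could be sketched roughly as follows. This is an illustration only, not the PR's code: it extends the PMF via the gamma function (one of the candidate extensions under discussion), integrates with composite Simpson, and checks the result against the exact discrete sum of 1. The parameter names (r defectives, n sample size, N population) are assumptions for the sketch.

```cpp
#include <cassert>
#include <cmath>

// Gamma-extended binomial coefficient, evaluated in log space to avoid overflow.
static double real_binomial(double a, double b) {
    return std::exp(std::lgamma(a + 1.0) - std::lgamma(b + 1.0) - std::lgamma(a - b + 1.0));
}

// One candidate continuous extension of the hypergeometric PMF.
static double pmf(double x, double r, double n, double N) {
    return real_binomial(r, x) * real_binomial(N - r, n - x) / real_binomial(N, n);
}

// Composite Simpson quadrature of the continuous extension over [0, n].
// The discrete PMF sums exactly to 1, so the integral of a reasonable
// extension should land near 1; a loose unity check catches gross errors.
static double integrate_pmf(double r, double n, double N, int panels) {
    double h = n / (2.0 * panels);
    double s = pmf(0.0, r, n, N) + pmf(n, r, n, N);
    for (int i = 1; i < 2 * panels; ++i)
        s += (i % 2 ? 4.0 : 2.0) * pmf(i * h, r, n, N);
    return s * h / 3.0;
}
```

For N = 10, r = 5, n = 4 the gamma extension integrates to roughly 0.99 rather than exactly 1, so the assertion has to be loose; that looseness is itself informative about how well an extension "respects" the PMF's structure.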
Before we get too carried away with interpolation, why not just use the binomial definition (and our existing code) over the real domain? That would seem to be more accurate, and probably rather more efficient as well. A more pressing (and perhaps more difficult) question is what we want to happen here. Is there any precedent for evaluating the hypergeometric over a non-discrete domain? Or to put it another way: is anyone relying on the hypergeometric PMF generating an error when a non-integer is passed?
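For reference, the binomial-definition approach over the real domain can be sketched in a few lines by generalising each binomial coefficient via the gamma function, C(a, b) = Γ(a + 1) / (Γ(b + 1) Γ(a − b + 1)). This is an illustrative sketch, not Boost's actual interface; parameter names are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Real-valued binomial coefficient via log-gamma; log space avoids overflow.
double real_binomial(double a, double b) {
    return std::exp(std::lgamma(a + 1.0) - std::lgamma(b + 1.0) - std::lgamma(a - b + 1.0));
}

// Hypergeometric PMF extended to real x straight from the binomial definition:
//   pmf(x) = C(r, x) * C(N - r, n - x) / C(N, n)
double hypergeometric_pmf_real(double x, double r, double n, double N) {
    return real_binomial(r, x) * real_binomial(N - r, n - x) / real_binomial(N, n);
}
```

At integer x this reproduces the usual PMF: with N = 10, r = 5, n = 4, x = 2 it gives C(5,2)·C(5,2)/C(10,4) = 100/210, and between integers it varies smoothly with no interpolation machinery needed.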
@jzmaddock This request ultimately stems from scipy/scipy#22312 (comment). We asked to be able to evaluate the function at non-integer arguments (like some other Boost discrete distributions) so that we can approximate infinite sums of the PMF with monotonically decreasing summands by an integral. Any shape-preserving interpolation would work (so PCHIP might be preferred, to avoid ringing), but the original suggestion (repeated here, #1244 (comment)) was indeed to use the binomial definition and the existing code where possible.
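For context on the interpolation alternative being weighed in this thread, a minimal hand-rolled cubic Hermite basis (not Boost's cubic_hermite interpolator) looks like this: given values and slopes at two adjacent integer nodes, evaluate the cubic on the unit interval between them.

```cpp
#include <cassert>
#include <cmath>

// Cubic Hermite basis on the unit interval: interpolates endpoint values
// y0, y1 with endpoint slopes m0, m1 at parameter t in [0, 1].
double hermite01(double t, double y0, double y1, double m0, double m1) {
    double h00 = (1 + 2 * t) * (1 - t) * (1 - t); // weights value at left end
    double h10 = t * (1 - t) * (1 - t);           // weights slope at left end
    double h01 = t * t * (3 - 2 * t);             // weights value at right end
    double h11 = t * t * (t - 1);                 // weights slope at right end
    return h00 * y0 + h10 * m0 + h01 * y1 + h11 * m1;
}
```

With exact endpoint slopes this reproduces cubics exactly (e.g. x³ at t = 0.5 gives 0.125), but with estimated slopes it is not shape-preserving, which is why PCHIP-style slope limiting comes up above.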
Summary
This PR extends the hypergeometric distribution's PMF to support non-integer values of x using cubic Hermite interpolation. If x is not an integer, the implementation selects at least three valid integer points for interpolation. If fewer than three points are available, it raises a domain error.

Known Issue
This change introduces a dependency on cubic_hermite, which throws exceptions instead of using Boost.Math's policies::raise_domain_error. As a result, some compile-time tests fail because exceptions are disabled in those tests.

Next Steps
Update cubic_hermite (and potentially other interpolators) to conform to Boost.Math's policy-based error handling.

Question: Best way to test this?
Hypergeometric test data in test/hypergeometric_test_data.ipp appears to be auto-generated from Mathematica, with some values removed due to absolute vs. relative error concerns. Since we now support non-integer values, what is the best approach to generating a new test dataset? Does anyone know of a reliable source for non-integer hypergeometric PMF values?
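On the policy issue mentioned under Next Steps, a toy illustration (plain C++, deliberately much simpler than Boost.Math's actual policy machinery) of the kind of compile-time dispatch the interpolators would need, so that builds with exceptions disabled can still report domain errors:

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical stand-ins for two error-handling policies; Boost.Math's real
// mechanism (policies::raise_domain_error and friends) is far richer.
struct throwing_policy {
    static double domain_error(const char* msg) { throw std::domain_error(msg); }
};
struct quiet_nan_policy {
    static double domain_error(const char*) { return std::nan(""); }
};

// An interpolator-like function templated on the policy, instead of
// unconditionally throwing the way cubic_hermite currently does.
template <class Policy>
double eval_checked(double x) {
    if (x < 0.0 || x > 1.0)
        return Policy::domain_error("eval_checked: x outside [0, 1]");
    return x * x; // placeholder for the actual interpolation
}
```

The point of the sketch is only the dispatch pattern: the same function body serves both exception-enabled and exception-disabled builds, with the caller selecting the behaviour at compile time.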
Fixes #1240