Skip clustering if k > n_topics #65

Open
wants to merge 8 commits into
base: master


scottgigante-immunai
Contributor

I get a ValueError if the number of topics after density filtering is less than the number of desired components.

```
Traceback (most recent call last):
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/cnmf/cnmf.py", line 727, in consensus
    kmeans_model.fit(l2_spectra)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py", line 1426, in fit
    self._check_params_vs_input(X)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py", line 1362, in _check_params_vs_input
    super()._check_params_vs_input(X, default_n_init=10)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/cluster/_kmeans.py", line 859, in _check_params_vs_input
    raise ValueError(
ValueError: n_samples=19 should be >= n_clusters=41.
```

This PR solves that issue.
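
For readers following along, here is a minimal sketch of the kind of guard the title describes. It is not the code from this PR: the function name `cluster_labels_or_skip` is hypothetical, and giving each remaining spectrum its own label when clustering is skipped is an assumption about what "skip" could mean.

```python
import numpy as np


def cluster_labels_or_skip(l2_spectra, k, random_state=1):
    """Return one cluster label per row, skipping KMeans when it cannot run.

    Hypothetical helper for illustration; not part of cnmf.
    """
    l2_spectra = np.asarray(l2_spectra, dtype=float)
    n_spectra = l2_spectra.shape[0]
    if n_spectra < k:
        # KMeans requires n_samples >= n_clusters, so skip it here.
        # Assumption: each remaining spectrum becomes its own cluster.
        return np.arange(n_spectra)
    # Imported lazily so the skip path does not need scikit-learn.
    from sklearn.cluster import KMeans

    kmeans_model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return kmeans_model.fit_predict(l2_spectra)
```

With the shapes from the traceback above (19 spectra, k=41), this returns labels 0 through 18 instead of raising.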

If a single component is all zeros (which can now happen, because a k that is too large no longer throws an exception), this introduces a division by zero that turns the results into NaNs.
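
A rough illustration of how such a division by zero can be avoided. The comment does not say which normalization is affected, so this sketch assumes a row-wise L2 normalization of the spectra; `safe_l2_normalize` is a hypothetical name, not a cnmf function.

```python
import numpy as np


def safe_l2_normalize(spectra):
    """L2-normalize each row, leaving all-zero rows as zeros instead of NaN.

    Hypothetical helper for illustration; not part of cnmf.
    """
    spectra = np.asarray(spectra, dtype=float)
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    # Replace zero norms with 1 so that 0 / 1 = 0 rather than 0 / 0 = NaN.
    safe_norms = np.where(norms == 0, 1.0, norms)
    return spectra / safe_norms
```

An all-zero component then stays all zero after normalization, so downstream steps see finite values.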
@scottgigante-immunai
Contributor Author

@dylkot would you please take a look and consider merging this PR?

If `density_threshold` is too low, no components pass the density filter and a very confusing error is raised:

```
Traceback (most recent call last):
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 192, in wrapper
    return func(*args, **kwargs)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/decomposition/_nmf.py", line 1106, in non_negative_factorization
    est._validate_params()
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/base.py", line 600, in _validate_params
    validate_parameter_constraints(
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 97, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'n_components' parameter of NMF must be an int in the range [1, inf) or None. Got 0 instead.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/cnmf/cnmf.py", line 739, in consensus
    rf_usages = self.refit_usage(norm_counts.X, median_spectra)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/cnmf/cnmf.py", line 658, in refit_usage
    _, rf_usages = self._nmf(X, nmf_kwargs=refit_nmf_kwargs)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/cnmf/cnmf.py", line 551, in _nmf
    (usages, spectra, niter) = non_negative_factorization(X, **nmf_kwargs)
  File "/Users/scottgigante/envs/immunaISR/lib/python3.8/site-packages/sklearn/utils/_param_validation.py", line 203, in wrapper
    raise InvalidParameterError(msg) from e
sklearn.utils._param_validation.InvalidParameterError: The 'n_components' parameter of non_negative_factorization must be an int in the range [1, inf) or None. Got 0 instead.
```

This PR makes the error more comprehensible.
Raise error if zero components meet density threshold
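
As a sketch of what an early, readable check might look like (not the code in this PR): the helper `filter_by_density`, the `local_density < density_threshold` comparison, and the error message are all assumptions made for illustration.

```python
import numpy as np


def filter_by_density(local_density, density_threshold):
    """Return a boolean mask of components that pass the density filter.

    Hypothetical helper for illustration; not part of cnmf.
    Assumption: a component is kept when its local density is below
    the threshold.
    """
    local_density = np.asarray(local_density, dtype=float)
    density_filter = local_density < density_threshold
    if not density_filter.any():
        # Fail here with a clear message instead of letting NMF
        # complain later about n_components=0.
        raise ValueError(
            "Zero components remain after density filtering with "
            f"density_threshold={density_threshold}. "
            "Consider increasing density_threshold."
        )
    return density_filter
```

Checking right after the filter means the user sees the parameter they control, rather than an `InvalidParameterError` about `n_components` from scikit-learn.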