Recursive Feature Elimination RFE - Feature Request? #426
Comments
Do you have a good reason to believe this would be any faster if implemented in Julia? I would guess that most of the work in RFE is training the basic model (over and over). If that model is wrapped C code (the typical case in sklearn) then this ought to be close to optimal, no? I ask because the expedient thing to do might be to just wrap the Python function, to start with.
@ablaom thanks for your reply. RFE is a very powerful feature selection technique. Honestly, I find it to be the best so far because it considers the nature of your specific dataset, and it is perhaps the only selection technique that advises you on the number of features to use. Unfortunately, I don't have solid evidence that implementing RFE in Julia will be faster than the one in Python (sklearn), but I see that Julia is generally faster than Python, especially for processing that requires a lot of loops. Meanwhile, I totally support the idea of wrapping the Python function. That sounds like a great start.
See also JuliaAI/MLJModels.jl#314
Worth mentioning that Caret has a variant with resampling, in addition to a vanilla version that looks the same as the Python one. We should start with the vanilla one. https://topepo.github.io/caret/recursive-feature-elimination.html
RFE applies a supervised model that must report feature importances (these could be coefficients, in the case of a linear model) but there is not a uniform interface for this at present. We should also keep in mind that some models report several different types of scores. I've opened #747 to address this. In the meantime, a "getter" function (acting on the model's
Also, maybe it's worth having an option to use Shapley values (e.g. https://github.com/nredell/ShapML.jl)?
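For readers new to the technique, here is a minimal, dependency-free Python sketch of the vanilla procedure discussed above: score the surviving features, drop the lowest-scoring ones, and repeat. It is not sklearn's or MLJ's implementation. The names `rfe` and `abs_correlation_importances` are hypothetical, and the correlation-based scorer is only a stand-in for "refit a supervised model and read off its feature importances". Because that stand-in is univariate, its scores do not change between rounds, whereas a real model's importances generally would.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]  # rows are observations, columns are features


def abs_correlation_importances(X: Matrix, y: Sequence[float]) -> List[float]:
    """Stand-in scorer (an assumption, not a real model): |Pearson r| per column."""
    n = len(y)
    y_mean = sum(y) / n
    var_y = sum((t - y_mean) ** 2 for t in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y))
        var_c = sum((c - c_mean) ** 2 for c in col)
        denom = (var_c * var_y) ** 0.5
        # A constant column carries no information, so it scores zero.
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    return scores


def rfe(
    X: Matrix,
    y: Sequence[float],
    importances: Callable[[Matrix, Sequence[float]], List[float]],
    n_features: int,
    step: int = 1,
) -> Tuple[List[int], List[int]]:
    """Return (surviving column indices, eliminated indices in order of removal)."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_features:
        # "Retrain" on the surviving columns only.
        X_sub = [[row[j] for j in remaining] for row in X]
        scores = importances(X_sub, y)
        # Never drop more than needed to reach n_features.
        k = min(step, len(remaining) - n_features)
        worst = sorted(range(len(remaining)), key=lambda i: scores[i])[:k]
        dropped = [remaining[i] for i in worst]
        eliminated.extend(dropped)
        remaining = [j for j in remaining if j not in dropped]
    return remaining, eliminated


# Toy data: y depends on column 0; column 2 tracks it closely,
# column 1 is an alternating signal, column 3 is constant.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0]
x3 = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
X = [list(row) for row in zip(x0, x1, x2, x3)]
y = [2.0 * v for v in x0]

selected, eliminated = rfe(X, y, abs_correlation_importances, n_features=2)
print(selected)    # [0, 2]
print(eliminated)  # [3, 1]
```

Passing the scorer in as a function mirrors the point made above: the elimination loop itself is simple, and what is really needed is a uniform way to ask a model for its feature importances.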
Yes please, the vanilla version is a good start.
Some design notes to self or whoever ultimately picks this up: I propose this be a (Structurally such a wrapper would look similar to other wrappers, like
On second thoughts, for consistency with other feature reduction strategies, a wrapper is probably not advised. We should simply implement as an
Hey, any updates on this topic?
Yes, it would be great to get help with this. I don't have any new thoughts on this, and actually haven't been thinking about it recently. Possibly @OkonSamuel, who worked on the feature importance API, may want to comment. Here is the feature importance API: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#Feature-importances . Let me know if you have any questions about it. My preference would be for a new package, rather than a PR here. It could still be part of the standard MLJ.jl install, and be integrated in the documentation. As I say above, RFE is probably implemented as
For testing, here are supervised models reporting feature importances:
julia> models() do m
           m.reports_feature_importances
       end
15-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
(name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
(name = CatBoostClassifier, package_name = CatBoost, ... )
(name = CatBoostRegressor, package_name = CatBoost, ... )
(name = DecisionTreeClassifier, package_name = DecisionTree, ... )
(name = DecisionTreeRegressor, package_name = DecisionTree, ... )
(name = EvoTreeClassifier, package_name = EvoTrees, ... )
(name = EvoTreeCount, package_name = EvoTrees, ... )
(name = EvoTreeGaussian, package_name = EvoTrees, ... )
(name = EvoTreeMLE, package_name = EvoTrees, ... )
(name = EvoTreeRegressor, package_name = EvoTrees, ... )
(name = RandomForestClassifier, package_name = DecisionTree, ... )
(name = RandomForestRegressor, package_name = DecisionTree, ... )
(name = XGBoostClassifier, package_name = XGBoost, ... )
(name = XGBoostCount, package_name = XGBoost, ... )
(name = XGBoostRegressor, package_name = XGBoost, ... )
A bunch of ScikitLearn models have also been added but not yet updated in the model registry.
As @ablaom has pointed out, the building blocks for supporting this have been added to the MLJ API. Ideally the approach would be to develop an external package that supports RFE, but we just haven't gotten around to doing this yet. This can be implemented as a wrapper model similar to the way we implement
> This can be implemented as a wrapper model similar to the way we implement EnsembleModel.
I think the right approach is to view this as an
@vboussange I have a Master's student keen to take this on now. What is the status of your own efforts? Are you happy for him to take this on? He can start immediately.
Hey @ablaom, this slowly sank down my to-do list and I have not made any progress. Please go ahead if you have the workforce!
Done. See FeatureSelection.jl; the model is RecursiveFeatureElimination.
Hello,
I am doing feature selection using scikit-learn's Recursive Feature Elimination (RFE).
This algorithm takes ages in Python. I searched on Julia Observer and couldn't find any equivalent in Julia. Is it true that Julia has no RFE implementation? And if so, could it be considered as a feature request?
Thank you!