why it’s not possible to use n_jobs = n, like in scikit-learn #830
To select from 700 features, this transformer will train 700 models multiplied by the number of cross-validation folds. So if you set cv to 5, it will train 700 x 5 models. That might be why it takes so long. LightGBM can also be slow to train, depending on the number of trees. If the LightGBM estimator takes n_jobs, you should set it there. It's hard to say a priori whether 2 hours is long or short, because it depends on the LightGBM configuration, the size of your data, and your available computing resources. If you share more details about how you set up the entire search, I might be able to provide some tips. Cheers
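For illustration, a minimal sketch of the kind of setup being discussed; the specific transformer and the data names (SelectBySingleFeaturePerformance, X_train, y_train) are assumptions, since the thread does not name them:

```python
# Minimal sketch, assuming the selector is feature_engine's
# SelectBySingleFeaturePerformance; the thread does not name the transformer.
from lightgbm import LGBMClassifier
from feature_engine.selection import SelectBySingleFeaturePerformance

# One model is trained per feature and per CV fold:
# 700 features x cv=5 -> 3,500 LightGBM fits.
selector = SelectBySingleFeaturePerformance(
    estimator=LGBMClassifier(n_jobs=-1),  # n_jobs goes on the estimator itself
    scoring="roc_auc",
    cv=5,
)

# X_train (700 columns) and y_train are placeholders for the user's data:
# selector.fit(X_train, y_train)
```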
Thanks for your response and help! 😊 I tried to add the n_jobs parameter, but it didn’t help. 😕 Here are the LightGBM model parameters I’m using:
Why do you use 45 as cv? That makes the selector train 700 x 45 models, which is what's making it take so long. I normally use 3 or 5. Feature-engine relies heavily on sklearn, so we leverage its cross-validation utilities under the hood.
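As a back-of-the-envelope check of the numbers above (the 700-feature figure is from the thread; the rest is plain arithmetic):

```python
# Number of single-feature model fits the selector performs for different
# cv settings, using the 700-feature figure from the thread.
n_features = 700
for cv in (45, 5, 3):
    print(f"cv={cv}: {n_features * cv} LightGBM fits")
# cv=45: 31500 fits, cv=5: 3500 fits, cv=3: 2100 fits
```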
```python
cv = CombPurgedKFoldCVLocal(n_splits=10, n_test_splits=3, X_index=X.index)
```
This has 45 combinations.
Interesting! Thank you. I haven't heard of that cross-validation framework before.
Another alternative is to set up a simpler LightGBM. If you check out the theory on successive halving in sklearn, you'll see that with simpler models you can already find out what works best. So you could train a LightGBM with fewer estimators and shallower depth, reduce the feature space from 700 to around 20, and then increase the complexity of the model and finalize the set of features, if that makes sense. I hope this helps!
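A rough sketch of that two-stage idea; the transformer, thresholds, and hyperparameters below are illustrative assumptions, not taken from the thread:

```python
# Two-stage sketch: screen features with a cheap LightGBM first, then
# refine with a more complex model on the reduced feature set.
from lightgbm import LGBMClassifier
from feature_engine.selection import SelectBySingleFeaturePerformance

# Stage 1: cheap model, coarse screening of the ~700 features.
cheap_lgbm = LGBMClassifier(n_estimators=50, max_depth=3, n_jobs=-1)
coarse = SelectBySingleFeaturePerformance(
    estimator=cheap_lgbm, scoring="roc_auc", cv=3, threshold=0.55
)
# X_small = coarse.fit_transform(X, y)   # e.g. ~20 surviving features

# Stage 2: full-complexity model to finalize the feature set on X_small.
full_lgbm = LGBMClassifier(n_estimators=500, max_depth=-1, n_jobs=-1)
fine = SelectBySingleFeaturePerformance(
    estimator=full_lgbm, scoring="roc_auc", cv=5, threshold=0.6
)
# X_final = fine.fit_transform(X_small, y)
```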
Hello, I would like to ask why it's not possible to use n_jobs = n, like in scikit-learn. I have to select 3-5 features from 700, and it takes 2 hours. :/ So some experiments are quite hard and slow. 👍
Thanks