I am experiencing an issue with high RAM usage when running ExplainerDashboard on a VotingClassifier ensemble model. The model is trained on a ~2k-row dataset with 400 features in total; however, each classifier in the VotingClassifier uses a subset of this feature set. My goal is to understand how my algorithm makes predictions on a dataset for which I do not yet know the label outcomes. Despite the dataset being explained being relatively small (28 rows), RAM usage spikes to over 51 GB. Keep in mind I am running this in Google Colab. I suspect this might be related to how ExplainerDashboard handles ensemble models, or to the computation of SHAP values for such complex models, or it might just be a bug. Below is a simplified version of my setup:
Model Setup
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Other imports...

# Define pipelines for individual models (example)
lr_pipeline = Pipeline([...])
xgb_pipeline = Pipeline([...])
# Other pipelines...

# VotingClassifier ensemble
eclf = VotingClassifier(
    estimators=[
        ('lr', lr_pipeline),
        ('xgb', xgb_pipeline),
        # Other models...
    ],
    voting='soft'
)
eclf.fit(X_train, y_train)
from explainerdashboard import ClassifierExplainer, ExplainerDashboard

# Initialize the Explainer without target labels
explainer = ClassifierExplainer(eclf, future_predict,
                                shap='kernel', model_output='probability')

# Create and run the dashboard
dashboard = ExplainerDashboard(explainer)
dashboard.run(port=8050)

ngrok_tunnel = ngrok.connect(8050)
print('Public URL:', ngrok_tunnel.public_url)
This is the output:
WARNING: For shap='kernel', shap interaction values can unfortunately not be calculated!
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')
Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
No y labels were passed to the Explainer, so setting model_summary=False...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...
/usr/local/lib/python3.10/dist-packages/dash/dash.py:538: UserWarning:
JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.
Within just a few seconds of running the code, RAM use exceeds the available memory and the session crashes.
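For what it's worth, the crash may simply be the combinatorics of Kernel SHAP. A back-of-envelope sketch, assuming shap's default `nsamples='auto'` budget of `2 * n_features + 2048` coalitions per explained row (my reading of recent shap versions), each evaluated against every background row:

```python
# Rough model-evaluation count for Kernel SHAP on my setup
n_features = 400
n_background = 50          # explainerdashboard's default shap.sample(X, 50)
n_rows = 28                # rows I am trying to explain

nsamples = 2 * n_features + 2048   # shap's 'auto' coalition budget (assumed)
evals_per_row = nsamples * n_background
total_evals = evals_per_row * n_rows

print(evals_per_row)   # 142400 model evaluations per explained row
print(total_evals)     # 3987200 evaluations for all 28 rows
```

Nearly 4 million passes through a soft-voting ensemble of pipelines, with intermediate arrays kept around, could plausibly account for the memory blow-up even before any bug is involved.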