-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
646 Implement LIME for tabular data #661
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
We need a visualizer for the results. I created an issue for it #662. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Yang! I still have some requests like additional tests, and also something to discuss maybe: the necessity to pass the train data to the method.
Great work, also on the notebooks!
) | ||
return explainer.explain( | ||
model_or_function=model_or_function, | ||
input_tabular=input_tabular, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha input_tabular! I was already guessing what this new variable's name would be after the changes you made above
class LIMEOnTabular(TestCase): | ||
"""Suite of LIME tests for the tabular case.""" | ||
|
||
def test_lime_tabular_classification_correct_output_shape(self): | ||
"""Test the output of explainer.""" | ||
training_data = np.random.random((10, 2)) | ||
input_data = np.random.random(2) | ||
feature_names = ["feature_1", "feature_2"] | ||
explainer = LIMETabular(training_data, | ||
mode ='classification', | ||
feature_names=feature_names, | ||
class_names = ["class_1", "class_2"]) | ||
exp = explainer.explain( | ||
run_model, | ||
input_data, | ||
) | ||
assert len(exp[0]) == len(feature_names) | ||
|
||
def test_lime_tabular_regression_correct_output_shape(self): | ||
"""Test the output of explainer.""" | ||
training_data = np.random.random((10, 2)) | ||
input_data = np.random.random(2) | ||
feature_names = ["feature_1", "feature_2"] | ||
exp = dianna.explain_tabular(run_model, input_tabular=input_data, method='lime', | ||
mode ='regression', training_data = training_data, | ||
feature_names=feature_names, class_names=['class_1']) | ||
|
||
assert len(exp) == len(feature_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you're only testing for shapes. It would be nice to add a test with synthetic data+model with some stupidly simple pattern to check if the method can detect it. Just like we have for timeseries RISE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of adding a "naive test" for it. I thought about it. But then I was thinking that we added a test for timeseries mainly because we implement those methods ourselves (referring to some papers and methods like lime segmentation, for instance, but purely implementing those methods mostly from scratch).
This LIME tabular is entirely from LIME. We simply put a wrapper around it. It is somehow tested in the original implementation of LIME:
https://github.com/marcotcr/lime/blob/master/lime/tests/test_lime_tabular.py
But I do agree that it is nice to add a test for it. Maybe let's create an issue and design a simple test for it?? This PR already gets too big...I would prefer to do it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #669 created! 😄
self.top_labels = len(class_names) | ||
|
||
self.explainer = LimeTabularExplainer( | ||
training_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to call LIME without training_data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments above.
dianna/methods/lime_tabular.py
Outdated
def explain( | ||
self, | ||
model_or_function, | ||
input_tabular, | ||
labels=(1, ), | ||
num_samples=5000, | ||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add Typing to all public methods? Also for the return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, well spot! Thanks Chris! Just miss those stuff. I will add them.
tutorials/lime_tabular_penguin.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data_instance loading is done in an unexpected section. It would be better to do this in the data loading section above.
A bit lower, you're using the variable name 'exp' which is short but ambiguous. It would be better to call it 'explanation'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestions! Done.
tutorials/lime_tabular_weather.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idem to my comments on the penguin notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🏅 🧸 🏆 🦾 🎉
[heart] Elena Ranguelova reacted to your message:
…________________________________
From: Christiaan Meijer ***@***.***>
Sent: Thursday, December 7, 2023 2:23:00 PM
To: dianna-ai/dianna ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [dianna-ai/dianna] 646 Implement LIME for tabular data (PR #661)
@cwmeijer approved this pull request.
🏅 🧸 🏆 🦾 🎉
—
Reply to this email directly, view it on GitHub<#661 (review)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAYYBWL7UUMEG7R5PR2IMY3YIHGMJAVCNFSM6AAAAAA7K4ES36VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTONZQGI3TOMJUGY>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
Add LIME tabular to the methods and provide example notebooks for both regression and classification tasks:
Also fixes the inconsistent input variable names in init (https://github.com/dianna-ai/dianna/blob/add_lime_tabular/dianna/__init__.py).
Note that the method returns an array of importance scores for all input features (which means
num_features
is disabled). For more information about this decision, check this issue #665.