Add warning messages when full data-set is used #441

ArturoAmorQ · 2021-08-24T08:58:04Z

The full data-set (no train-test split or cross-validation) is used for modeling in the following notebooks:

This has been a source of confusion (see for instance this forum question).

We should add a Warning message similar to the one in logistic_regression_non_linear.py, adapted to each case:

Warning: Be aware that we fit and will check the boundary decision of the classifier on the same dataset without splitting the dataset into a training set and a testing set. While this is a bad practice, we use it for the sake of simplicity to depict the model behavior. Always use cross-validation when you want to assess the generalization performance of a machine-learning model.

Additionally, a Warning message should be added in the following notebooks

where we remind the user that scoring the model on the full data-set is not necessarily wrong, but provides no information about underfitting or overfitting.
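To make the point concrete, here is a minimal sketch of the kind of contrast such a Warning could refer to. It is not taken from any of the notebooks: the synthetic data-set, the decision tree, and the variable names `full_data_score` and `cv_scores` are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of the labels randomly flipped (illustrative only,
# not a data-set used in the MOOC notebooks).
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0)

# Fit and score on the same full data-set: an unpruned tree memorizes
# the samples, including the noisy labels.
full_data_score = model.fit(X, y).score(X, y)

# 5-fold cross-validation: each fold is scored on samples that were
# held out while fitting.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"Score on the full data-set: {full_data_score:.3f}")
print(
    f"Cross-validation score: {cv_scores.mean():.3f} "
    f"+/- {cv_scores.std():.3f}"
)
```

The score on the full data-set alone does not reveal that the tree overfits; the gap only becomes visible once it is compared with the cross-validated score.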

What do you think?

This was referenced Aug 24, 2021

Confusing use of term "test" #442

Closed

Scoring model as a good practice #444

Closed

lesteve added this to the MOOC 3.0 milestone Jan 6, 2022

lesteve modified the milestones: MOOC 3.0, MOOC 4.0 Oct 18, 2022
