Explore the distribution of your train/validation/test datasets

alejandrodumas/traintestdiff

traintestdiff

Installation

$ pip install traintestdiff

Documentation and Examples

You can find the documentation at https://traintestdiff.readthedocs.io and a Jupyter notebook in example.

Overview

traintestdiff provides a simple way to explore differences between your train, validation and test data. Its main entry point is the class TrainTestDiff, whose only argument is a dict of the datasets you would like to explore.

In this example we're going to explore the tips dataset provided by Seaborn:

import pandas as pd
import seaborn as sns

from traintestdiff import TrainTestDiff

tips = sns.load_dataset("tips")

# Let's split our data into train and test sets
train = tips.sample(frac=0.8, random_state=0)
test = tips.drop(train.index)

Once you have your train and test sets, you're ready to use TrainTestDiff:

datasets = {'train': train, 'test': test}
ttd = TrainTestDiff(datasets)
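Since the overview also mentions validation data, here is a minimal sketch of a three-way split built the same way with pandas. The small synthetic DataFrame and the 60/20/20 proportions are made up for illustration, and it is an assumption (based only on the overview) that TrainTestDiff accepts a dict with a third key, so the constructor call is left as a comment.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; in practice this would be your own DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=100).round(2),
    "smoker": rng.choice(["Yes", "No"], size=100),
})

# 60% of the rows go to train, the remaining 40% is split in half
train = df.sample(frac=0.6, random_state=0)
rest = df.drop(train.index)
validation = rest.sample(frac=0.5, random_state=0)
test = rest.drop(validation.index)

datasets = {"train": train, "validation": validation, "test": test}
print({name: len(part) for name, part in datasets.items()})

# Assumption: the dict could then be passed on as before
# ttd = TrainTestDiff(datasets)
```

Dropping by index keeps the three parts disjoint, as long as the original index has no duplicate labels.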

The two main methods are plot_cat_diff and plot_cont_diff: the first one produces a plot of categorical features, and the second one a plot of continuous features.

long_form, fig1 = ttd.plot_cat_diff(features=['smoker', 'day', 'time'])

./examples/cat_diff.png
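To get a feel for the kind of comparison a categorical plot like this conveys, the following pandas-only sketch computes the share of each category per dataset in long form. It is not the library's implementation, and the tiny DataFrames and the column name proportion are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real train and test sets
train = pd.DataFrame({"smoker": ["Yes", "No", "No", "No"]})
test = pd.DataFrame({"smoker": ["Yes", "Yes", "No", "No"]})
datasets = {"train": train, "test": test}

# Stack the datasets and keep the dict key as a "dataset" column
stacked = pd.concat(datasets, names=["dataset"]).reset_index(level="dataset")

# One row per (dataset, category) with the share of that category
proportions = (
    stacked.groupby("dataset")["smoker"]
    .value_counts(normalize=True)
    .rename("proportion")
    .reset_index()
)
print(proportions)
```

A large gap between the train and test rows for the same category is the sort of difference worth a closer look.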

With plot_cont_diff we can explore the continuous features of the datasets:

longform_cont1, fig2 = ttd.plot_cont_diff(features=["total_bill", "size", "tip"], kind="box")

./examples/cont_diff.png
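A box plot summarizes quartiles, so a rough numeric counterpart can be sketched with pandas alone by describing a continuous feature per dataset. Again, this is only an illustration with made-up values, not what plot_cont_diff does internally.

```python
import pandas as pd

# Hypothetical toy data standing in for the real train and test sets
train = pd.DataFrame({"total_bill": [10.0, 20.0, 30.0, 40.0]})
test = pd.DataFrame({"total_bill": [15.0, 25.0, 35.0]})

# Stack the datasets and keep the dict key as a "dataset" column
stacked = pd.concat(
    {"train": train, "test": test}, names=["dataset"]
).reset_index(level="dataset")

# count, mean, std, min, quartiles and max for each dataset
summary = stacked.groupby("dataset")["total_bill"].describe()
print(summary[["25%", "50%", "75%"]])
```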

Long Form data and figures

As you can see from the code, both plot_cat_diff and plot_cont_diff return two values: a pandas.core.frame.DataFrame and a matplotlib.figure.Figure.

The DataFrame lets you explore the data in a tidy format, and the Figure lets you tweak how the plot looks. For example, let's change the title:

fig1.suptitle("The same graph with other title")
fig1

./examples/title.png
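Because the returned object is a regular matplotlib Figure, other standard Figure methods should apply as well. The sketch below uses a stand-in figure (not one produced by traintestdiff) to show resizing and exporting; the bar values and the title text are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Stand-in for a figure returned by plot_cat_diff or plot_cont_diff
fig, ax = plt.subplots()
ax.bar(["train", "test"], [0.8, 0.2])

# Standard matplotlib Figure methods
fig.suptitle("A custom title")
fig.set_size_inches(8, 4)

# Export as PNG; a file path works here as well as an in-memory buffer
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
```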
