diff --git a/doc/notebooks_calibration.rst b/doc/notebooks_calibration.rst index 236a67c78..e022fbeaa 100755 --- a/doc/notebooks_calibration.rst +++ b/doc/notebooks_calibration.rst @@ -4,5 +4,5 @@ Calibration notebooks The following examples present advanced analyses on multi-class calibration. -1. Top-label calibration for outputs of ML models : `notebook `_ +1. Top-label calibration for outputs of ML models : `notebook `_ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- diff --git a/doc/notebooks_classification.rst b/doc/notebooks_classification.rst index 35747de19..986dda103 100755 --- a/doc/notebooks_classification.rst +++ b/doc/notebooks_classification.rst @@ -1,13 +1,10 @@ Classification notebooks ======================== -The following examples present advanced analyses on multi-class classification -problems for computer vision settings that are too heavy to be included in the example +The following example presents an advanced analysis on a multi-class classification +problem for computer vision settings that is too heavy to be included in the example galleries. 1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook `_ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- - -2. Top-label calibration for outputs of ML models : `top_label_notebook `_ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- diff --git a/notebooks/classification/top_label_calibration.ipynb b/notebooks/calibration/top_label_calibration.ipynb similarity index 100% rename from notebooks/classification/top_label_calibration.ipynb rename to notebooks/calibration/top_label_calibration.ipynb diff --git a/notebooks/classification/Cifar10.md b/notebooks/classification/Cifar10.md deleted file mode 100755 index 681012e36..000000000 --- a/notebooks/classification/Cifar10.md +++ /dev/null @@ -1,919 +0,0 @@ ---- -jupyter: - jupytext: - formats: ipynb,md - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.13.6 - kernelspec: - display_name: mapie-notebooks - language: python - name: mapie-notebooks ---- - -# Estimating prediction sets on the Cifar10 dataset -The goal of this notebook is to present how to use :class:`mapie.classification.MapieClassifier` on an object classification task. We will build prediction sets for images and study the marginal and conditional coverages. - - -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scikit-learn-contrib/MAPIE/blob/master/notebooks/classification/Cifar10.ipynb) - - - -### What is done in this tutorial ? - -> - **Cifar10 dataset** : 10 classes (horse, dog, cat, frog, deer, bird, airplane, truck, ship, automobile) - -> - Use :class:`mapie.classification.MapieClassifier` to compare the prediction sets estimated by several conformal methods on the Cifar10 dataset.
- -> - Train a small CNN to predict the image class - -> - Create a custom class `TensorflowToMapie` to resolve adherence problems between Tensorflow and Mapie - - - - -## Tutorial preparation - -```python -install_mapie = True -if install_mapie: - !pip install mapie -``` - -```python -import random -import warnings -from typing import Dict, List, Tuple, Union - -import cv2 -import matplotlib.pyplot as plt -import numpy as np -import pandas as pd -import tensorflow as tf -import tensorflow.keras as tfk -from tensorflow.keras.callbacks import EarlyStopping -from tensorflow.keras import Sequential -from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D -from tensorflow.keras.losses import CategoricalCrossentropy -from tensorflow.keras.optimizers import Adam -import tensorflow_datasets as tfds -from sklearn.metrics import accuracy_score -from sklearn.metrics._plot.confusion_matrix import ConfusionMatrixDisplay -from sklearn.model_selection import train_test_split -from sklearn.preprocessing import label_binarize - -from mapie.metrics import classification_coverage_score -from mapie.classification import MapieClassifier - -warnings.filterwarnings('ignore') -%load_ext autoreload -%autoreload 2 -%matplotlib inline -# %load_ext pycodestyle_magic -``` - -```python -SPACE_BETWEEN_LABELS = 2.5 -SPACE_IN_SUBPLOTS = 4.0 -FONT_SIZE = 18 - -``` - -## 1. Data loading - - -The Cifar10 dataset is downloaded from the `Tensorflow Datasets` library. The training set is then splitted into a training, validation and a calibration set which will be used as follow: - -> - **Training set**: used to train our neural network. -> - **Validation set**: used to check that our model is not overfitting. -> - **Calibration set**: used to calibrate the conformal scores in :class:`mapie.classification.MapieClassifier` - -```python -def train_valid_calib_split( - X: np.ndarray, - y: np.ndarray, - calib_size: float = .1, - val_size: float = .33, - random_state: int = 42 - -) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]: - """ - Create calib and valid datasets from the train dataset. - - Parameters - ---------- - X: np.ndarray of shape (n_samples, width, height, n_channels) - Images of the dataset. - - y: np.ndarray of shape (n_samples, 1): - Label of each image. - - calib_size: float - Percentage of the dataset X to use as calibration set. - - val_size: float - Percentage of the dataset X (minus the calibration set) - to use as validation set. - - random_state: int - Random state to use to split the dataset. - - By default 42. - - Returns - ------- - Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray] - of shapes: - (n_samples * (1 - calib_size) * (1 - val_size), width, height, n_channels), - (n_samples * calib_size, width, height, n_channels), - (n_samples * (1 - calib_size) * val_size, width, height, n_channels), - (n_samples * (1 - calib_size) * (1 - val_size), 1), - (n_samples * calib_size, 1), - (n_samples * (1 - calib_size) * val_size, 1). 
- - """ - X_train, X_calib, y_train, y_calib = train_test_split( - X, y, - test_size=calib_size, - random_state=random_state - ) - X_train, X_val, y_train, y_val = train_test_split( - X_train, y_train, - test_size=val_size, - random_state=random_state - ) - return X_train, X_calib, X_val, y_train, y_calib, y_val - -``` - -```python -def load_data() -> Tuple[ - Tuple[np.ndarray, np.ndarray, np.ndarray], - Tuple[np.ndarray, np.ndarray, np.ndarray], - Tuple[np.ndarray, np.ndarray, np.ndarray], - List -]: - """ - Load cifar10 Dataset and return train, valid, calib, test datasets - and the names of the labels - - - Returns - ------- - Tuple[ - Tuple[np.ndarray, np.ndarray, np.ndarray], - Tuple[np.ndarray, np.ndarray, np.ndarray], - Tuple[np.ndarray, np.ndarray, np.ndarray], - List - ] - """ - dataset, info = tfds.load( - "cifar10", - batch_size=-1, - as_supervised=True, - with_info=True - ) - label_names = info.features['lac'].names - - dataset = tfds.as_numpy(dataset) - X_train, y_train = dataset['train'] - X_test, y_test = dataset['test'] - X_train, X_calib, X_val, y_train, y_calib, y_val = train_valid_calib_split( - X_train, - y_train - ) - - X_train = X_train/255. - X_val = X_val/255. - - X_calib = X_calib/255. - X_test = X_test/255. - - y_train_cat = tf.keras.utils.to_categorical(y_train) - y_val_cat = tf.keras.utils.to_categorical(y_val) - y_calib_cat = tf.keras.utils.to_categorical(y_calib) - y_test_cat = tf.keras.utils.to_categorical(y_test) - - train_set = (X_train, y_train, y_train_cat) - val_set = (X_val, y_val, y_val_cat) - calib_set = (X_calib, y_calib, y_calib_cat) - test_set = (X_test, y_test, y_test_cat) - - return train_set, val_set, calib_set, test_set, label_names - -``` - -```python -def inspect_images( - X: np.ndarray, - y: np.ndarray, - num_images: int, - label_names: List -) -> None: - """ - Load a sample of the images to check that images - are well loaded. - - Parameters - ---------- - X: np.ndarray of shape (n_samples, width, height, n_channels) - Set of images from which the sample will be taken. - - y: np.ndarray of shape (n_samples, 1) - Labels of the iamges of X. - - num_images: int - Number of images to plot. - - label_names: List - Names of the different labels - - """ - - _, ax = plt.subplots( - nrows=1, - ncols=num_images, - figsize=(2*num_images, 2) - ) - - indices = random.sample(range(len(X)), num_images) - - for i, indice in enumerate(indices): - ax[i].imshow(X[indice]) - ax[i].set_title(label_names[y[indice]]) - ax[i].axis("off") - plt.show() - -``` - -```python -train_set, val_set, calib_set, test_set, label_names = load_data() -(X_train, y_train, y_train_cat) = train_set -(X_val, y_val, y_val_cat) = val_set -(X_calib, y_calib, y_calib_cat) = calib_set -(X_test, y_test, y_test_cat) = test_set -inspect_images(X=X_train, y=y_train, num_images=8, label_names=label_names) -``` - -## 2. Definition and training of the the neural network - - -We define a simple convolutional neural network with the following architecture : - -> - 2 blocks of Convolution/Maxpooling -> - Flatten the images -> - 3 Dense layers -> - The output layer with 10 neurons, corresponding to our 10 classes - -This simple architecture, based on the VGG16 architecture with its succession of convolutions and maxpooling aims at achieving a reasonable accuracy score and a fast training. The objective here is not to obtain a perfect classifier. 
- - -```python -def get_model( - input_shape: Tuple, loss: tfk.losses, - optimizer: tfk.optimizers, metrics: List[str] -) -> Sequential: - """ - Compile CNN model. - - Parameters - ---------- - input_shape: Tuple - Size of th input images. - - loss: tfk.losses - Loss to use to train the model. - - optimizer: tfk.optimizer - Optimizer to use to train the model. - - metrics: List[str] - Metrics to use evaluate model training. - - Returns - ------- - Sequential - """ - model = Sequential([ - Conv2D(input_shape=input_shape, filters=16, kernel_size=(3, 3), activation='relu', padding='same'), - MaxPooling2D(pool_size=(2, 2)), - Conv2D(input_shape=input_shape, filters=32, kernel_size=(3, 3), activation='relu', padding='same'), - MaxPooling2D(pool_size=(2, 2)), - Conv2D(input_shape=input_shape, filters=64, kernel_size=(3, 3), activation='relu', padding='same'), - MaxPooling2D(pool_size=(2, 2)), - Flatten(), - Dense(128, activation='relu'), - Dense(64, activation='relu'), - Dense(32, activation='relu'), - Dense(10, activation='softmax'), - ]) - model.compile(loss=loss, optimizer=optimizer, metrics=metrics) - return model -``` - -## 3. Training the algorithm with a custom class called `TensorflowToMapie` - -As MAPIE asks for a model with `fit`, `predict_proba`, `predict` class attributes and the information about whether or not the model is fitted. - -```python -class TensorflowToMapie(): - """ - Class that aimes to make compatible a tensorflow model - with MAPIE. To do so, this class create fit, predict, - predict_proba and _sklearn_is_fitted_ attributes to the model. - - """ - - def __init__(self) -> None: - self.pred_proba = None - self.trained_ = False - - - def fit( - self, model: Sequential, - X_train: np.ndarray, y_train: np.ndarray, - X_val: np.ndarray, y_val: np.ndarray - ) -> None: - """ - Train the keras model. - - Parameters - ---------- - model: Sequential - Model to train. - - X_train: np.ndarray of shape (n_sample_train, width, height, n_channels) - Training images. - - y_train: np.ndarray of shape (n_samples_train, n_labels) - Training labels. - - X_val: np.ndarray of shape (n_sample_val, width, height, n_channels) - Validation images. - - y_val: np.ndarray of shape (n_samples_val, n_labels) - Validation labels. - - """ - - early_stopping_monitor = EarlyStopping( - monitor='val_loss', - min_delta=0, - patience=10, - verbose=0, - mode='auto', - baseline=None, - restore_best_weights=True - ) - model.fit( - X_train, y_train, - batch_size=64, - validation_data=(X_val, y_val), - epochs=20, callbacks=[early_stopping_monitor] - ) - - self.model = model - self.trained_ = True - self.classes_ = np.arange(model.layers[-1].units) - - def predict_proba(self, X: np.ndarray) -> np.ndarray: - """ - Returns the predicted probabilities of the images in X. - - Paramters: - X: np.ndarray of shape (n_sample, width, height, n_channels) - Images to predict. - - Returns: - np.ndarray of shape (n_samples, n_labels) - """ - preds = self.model.predict(X) - - return preds - - def predict(self, X: np.ndarray) -> np.ndarray: - """ - Give the label with the maximum softmax for each image. 
- - Parameters - --------- - X: np.ndarray of shape (n_sample, width, height, n_channels) - Images to predict - - Returns: - -------- - np.ndarray of shape (n_samples, 1) - """ - pred_proba = self.predict_proba(X) - pred = (pred_proba == pred_proba.max(axis=1)[:, None]).astype(int) - return pred - - def __sklearn_is_fitted__(self): - if self.trained_: - return True - else: - return False -``` - -```python tags=[] -model = get_model( - input_shape=(32, 32, 3), - loss=CategoricalCrossentropy(), - optimizer=Adam(), - metrics=['accuracy'] -) -``` - -```python tags=[] -cirfar10_model = TensorflowToMapie() -cirfar10_model.fit(model, X_train, y_train_cat, X_val, y_val_cat) -``` - -```python -y_true = label_binarize(y=y_test, classes=np.arange(max(y_test)+1)) -y_pred_proba = cirfar10_model.predict_proba(X_test) -y_pred = cirfar10_model.predict(X_test) - -``` - -## 4. Prediction of the prediction sets - - -We will now estimate the prediction sets with the five conformal methods implemented in :class:`mapie.classification.MapieClassifier` for a range of confidence levels between 0 and 1. - -```python -method_params = { - "naive": ("naive", False), - "lac": ("lac", False), - "aps": ("aps", True), - "random_aps": ("aps", "randomized"), - "top_k": ("top_k", False) -} - -``` - -```python tags=[] -y_preds, y_pss = {}, {} -alphas = np.arange(0.01, 1, 0.01) - -for name, (method, include_last_label) in method_params.items(): - mapie = MapieClassifier(estimator=cirfar10_model, method=method, cv="prefit", random_state=42) - mapie.fit(X_calib, y_calib) - y_preds[name], y_pss[name] = mapie.predict(X_test, alpha=alphas, include_last_label=include_last_label) -``` - -Let's now estimate the number of null prediction sets, marginal coverages, and averaged prediction set sizes obtained with the different methods for all confidence levels and for a confidence level of 90 \%. - -```python -def count_null_set(y: np.ndarray) -> int: - """ - Count the number of empty prediction sets. - - Parameters - ---------- - y: np.ndarray of shape (n_sample, ) - - Returns - ------- - int - """ - count = 0 - for pred in y[:, :]: - if np.sum(pred) == 0: - count += 1 - return count - -``` - -```python -nulls, coverages, accuracies, sizes = {}, {}, {}, {} -for name, (method, include_last_label) in method_params.items(): - accuracies[name] = accuracy_score(y_true, y_preds[name]) - nulls[name] = [ - count_null_set(y_pss[name][:, :, i]) for i, _ in enumerate(alphas) - ] - coverages[name] = [ - classification_coverage_score( - y_test, y_pss[name][:, :, i] - ) for i, _ in enumerate(alphas) - ] - sizes[name] = [ - y_pss[name][:, :, i].sum(axis=1).mean() for i, _ in enumerate(alphas) - ] - -``` - -```python -coverage_90 = {method: coverage[9] for method, coverage in coverages.items()} -null_90 = {method: null[9] for method, null in nulls.items()} -width_90 = {method: width[9] for method, width in sizes.items()} -y_ps_90 = {method: y_ps[:, :, 9] for method, y_ps in y_pss.items()} -``` - -Let's now look at the marginal coverages, number of null prediction sets, and the averaged size of prediction sets for a confidence level of 90 \%. 
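A side note on the hard-coded index 9 used above: with `alphas = np.arange(0.01, 1, 0.01)`, index 9 corresponds to `alpha = 0.10`, i.e. a 90 % target coverage. A slightly more explicit lookup (a small sketch reusing the `alphas` and `coverages` objects defined above; `idx_90` is just an illustrative name) could be:

```python
import numpy as np

# Sketch: locate the column corresponding to alpha = 0.1 (90 % target coverage)
# instead of hard-coding index 9. Reuses `alphas` and `coverages` from the cells above.
idx_90 = int(np.argmin(np.abs(alphas - 0.1)))
coverage_90 = {method: coverage[idx_90] for method, coverage in coverages.items()}
```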
- -```python -summary_df = pd.concat( - [ - pd.Series(coverage_90), - pd.Series(null_90), - pd.Series(width_90) - ], - axis=1, - keys=["Coverages", "Number of null sets", "Average prediction set sizes"] -).round(3) -``` - -```python -summary_df -``` - -As expected, the "naive" method, which directly uses the alpha value as a threshold for selecting the prediction sets, does not give guarantees on the marginal coverage since this method is not calibrated. Other methods give a marginal coverage close to the desired one, i.e. 90\%. Notice that the "aps" method, which always includes the last label whose cumulated score is above the given quantile, tends to give slightly higher marginal coverages since the prediction sets are slightly too big. - - -## 5. Visualization of the prediction sets - -```python -def prepare_plot(y_methods: Dict[str, Tuple], n_images: int) -> np.ndarray: - """ - Prepare the number and the disposition of the plots according to - the number of images. - - Paramters: - y_methods: Dict[str, Tuple] - Methods we want to compare. - - n_images: int - Number of images to plot. - - Returns - ------- - np.ndarray - """ - plt.rcParams.update({'font.size': FONT_SIZE}) - nrow = len(y_methods.keys()) - ncol = n_images - s = 5 - f, ax = plt.subplots(ncol, nrow, figsize=(s*nrow, s*ncol)) - f.tight_layout(pad=SPACE_IN_SUBPLOTS) - rows = [i for i in y_methods.keys()] - - for x, row in zip(ax[:,0], rows): - x.set_ylabel(row, rotation=90, size='large') - - return ax - -``` - -```python -def get_position(y_set: List, label: str, count: int, count_true: int) -> float: - """ - Return the position of each label according to the number of labels to plot. - - Paramters - --------- - y_set: List - Set of predicted labels for one image. - - label: str - Indice of the true label. - - count: int - Index of the label. - - count_true: int - Total number of labels in the prediction set. - - Returns - ------- - float - """ - if y_set[label] : - position = - (count_true - count)*SPACE_BETWEEN_LABELS - - else: - position = - (count_true + 2 - count)*SPACE_BETWEEN_LABELS - - return position - - -def add_text( - ax: np.ndarray, indices: Tuple, position: float, - label_name: str, proba: float, color: str, missing: bool = False -) -> None: - """ - Add the text to the corresponding image. - - Parameters - ---------- - ax: np.ndarray - Matrix of the images to plot. - - indices: Tuple - Tuple indicating the indices of the image to put - the text on. - - position: float - Position of the text on the image. - - label_name: str - Name of the label to plot. - - proba: float - Proba associated to this label. - - color: str - Color of the text. - - missing: bool - Whether or not the true label is missing in the - prediction set. - - By default False. - - """ - if not missing : - text = f"{label_name} : {proba:.4f}" - else: - text = f"True label : {label_name} ({proba:.4f})" - i, j = indices - ax[i, j].text( - 15, - position, - text, - ha="center", va="top", - color=color, - font="courier new" - ) - - -``` - -```python -def plot_prediction_sets( - X: np.ndarray, y: np.ndarray, - y_pred_proba: np.ndarray, - y_methods: Dict[str, np.ndarray], - n_images: int, label_names: Dict, - random_state: Union[int, None] = None -) -> None: - """ - Plot random images with their associated prediction - set for all the required methods. - - Parameters - ---------- - X: np.ndarray of shape (n_sample, width, height, n_channels) - Array containing images. - - y: np.ndarray of shape (n_samples, ) - Labels of the images. 
- - y_pred_proba: np.ndarray of shape (n_samples, n_labels) - Softmax output of the model. - - y_methods: Dict[str, np.ndarray] - Outputs of the MapieClassifier with the different - choosen methods. - - n_images: int - Number of images to plot - - random_state: Union[int, None] - Random state to use to choose the images. - - By default None. - """ - random.seed(random_state) - indices = random.sample(range(len(X)), n_images) - - y_true = y[indices] - y_pred_proba = y_pred_proba[indices] - ax = prepare_plot(y_methods, n_images) - - for i, method in enumerate(y_methods): - y_sets = y_methods[method][indices] - - for j in range(n_images): - y_set = y_sets[j] - img, label= X[indices[j]], y_true[j] - - ax[i, j].imshow(img) - - count_true = np.sum(y_set) - index_sorted_proba = np.argsort(-y_pred_proba[j]) - - for count in range(count_true): - index_pred = index_sorted_proba[count] - proba = y_pred_proba[j][index_pred] - label_name = label_names[index_pred] - color = 'green' if index_pred == y_true[j] else 'red' - position = get_position(y_set, label, count, count_true) - - add_text(ax, (i, j), position, label_name, proba, color) - - if not y_set[label] : - label_name = label_names[label] - proba = y_pred_proba[j][label] - add_text(ax, (i, j), -3, label_name, proba, color= 'orange', missing=True) - -``` - -```python -plot_prediction_sets(X_test, y_test, y_pred_proba, y_ps_90, 5, label_names) -``` - -## 6. Calibration of the methods - - -In this section, we plot the number of null sets, the marginal coverages, and the prediction set sizes as function of the target coverage level for all conformal methods. - -```python -vars_y = [nulls, coverages, sizes] -labels_y = ["Empty prediction sets", "Marginal coverage", "Set sizes"] -fig, axs = plt.subplots(1, len(vars_y), figsize=(8*len(vars_y), 8)) -for i, var in enumerate(vars_y): - for name, (method, include_last_label) in method_params.items(): - axs[i].plot(1 - alphas, var[name], label=name) - if i == 1: - axs[i].plot([0, 1], [0, 1], ls="--", color="k") - axs[i].set_xlabel("Couverture cible : 1 - alpha") - axs[i].set_ylabel(labels_y[i]) - if i == len(vars_y) - 1: - axs[i].legend(fontsize=10, loc=[1, 0]) -``` - -The two only methods which are perfectly calibrated for the entire range of alpha values are the "lac" and "random_aps". However, these accurate marginal coverages can only be obtained thanks to the generation of null prediction sets. The compromise between estimating null prediction sets with calibrated coverages or non-empty prediction sets but with larger marginal coverages is entirely up to the user. - - -## 7. Prediction set sizes - -```python -s=5 -fig, axs = plt.subplots(1, len(y_preds), figsize=(s*len(y_preds), s)) -for i, (method, y_ps) in enumerate(y_ps_90.items()): - sizes = y_ps.sum(axis=1) - axs[i].hist(sizes) - axs[i].set_xlabel("Prediction set sizes") - axs[i].set_title(method) -``` - -## 8. Conditional coverages - - -We just saw that all our methods (except the "naive" one) give marginal coverages always larger than the target coverages for alpha values ranging between 0 and 1. However, there is no mathematical guarantees on the *conditional* coverages, i.e. the coverage obtained for a specific class of images. Let's see what conditional coverages we obtain with the different conformal methods. - -```python -def get_class_coverage( - y_test: np.ndarray, - y_method: Dict[str, np.ndarray], - label_names: List[str] -) -> None: - """ - Compute the coverage for each class. 
As MAPIE is looking for a - global coverage of 1-alpha, it is important to check that their - is not major coverage difference between classes. - - Parameters - ---------- - y_test: np.ndarray of shape (n_samples,) - Labels of the predictions. - - y_method: Dict[str, np.ndarray] - Prediction sets for each method. - - label_names: List[str] - Names of the labels. - """ - recap ={} - for method in y_method: - recap[method] = [] - for label in sorted(np.unique(y_test)): - indices = np.where(y_test==label) - label_name = label_names[label] - y_test_trunc = y_test[indices] - y_set_trunc = y_method[method][indices] - score_coverage = classification_coverage_score(y_test_trunc, y_set_trunc) - recap[method].append(score_coverage) - recap_df = pd.DataFrame(recap, index = label_names) - return recap_df - -``` - -```python -class_coverage = get_class_coverage(y_test, y_ps_90, label_names) -``` - -```python -fig = plt.figure() -class_coverage.plot.bar(figsize=(12, 4), alpha=0.7) -plt.axhline(0.9, ls="--", color="k") -plt.ylabel("Conditional coverage") -plt.legend(loc=[1, 0]) -``` - -We can notice that the conditional coverages slightly vary between classes. The only method whose conditional coverages remain valid for all classes is the "top_k" one. However, those variations are much smaller than that of the naive method. - -```python -def create_confusion_matrix(y_ps: np.ndarray, y_true: np.ndarray) -> np.ndarray: - """ - Create a confusion matrix to visualize, for each class, which - classes are which are the most present classes in the prediction - sets. - - Parameters - ---------- - y_ps: np.ndarray of shape (n_samples, n_labels) - Prediction sets of a specific method. - - y_true: np.ndarray of shape (n_samples, ) - Labels of the sample - - Returns - ------- - np.ndarray of shape (n_labels, n_labels) - """ - number_of_classes = len(np.unique(y_true)) - confusion_matrix = np.zeros((number_of_classes, number_of_classes)) - for i, ps in enumerate(y_ps): - confusion_matrix[y_true[i]] += ps - - return confusion_matrix - -``` - -```python -def reorder_labels(ordered_labels: List, labels: List, cm: np.ndarray) -> np.ndarray: - """ - Used to order the labels in the confusion matrix - - Parameters - ---------- - ordered_labels: List - Order you want to have in your confusion matrix - - labels: List - Initial order of the confusion matrix - - cm: np.ndarray of shape (n_labels, n_labels) - Original confusion matrix - - Returns - ------- - np.ndarray of shape (n_labels, n_labels) - """ - cm_ordered = np.zeros(cm.shape) - index_order = [labels.index(label) for label in ordered_labels] - for i, label in enumerate(ordered_labels): - old_index = labels.index(label) - - cm_ordered[i] = cm[old_index, index_order] - return cm_ordered -``` - -```python -def plot_confusion_matrix(method: str, y_ps: Dict[str, np.ndarray], label_names: List) -> None: - """ - Plot the confusion matrix for a specific method. - - Parameters - ---------- - method: str - Name of the method to plot. 
- - y_ps: Dict[str, np.ndarray] - Prediction sets for each of the fitted method - - label_names: List - Name of the labels - """ - - y_method = y_ps[method] - cm = create_confusion_matrix(y_method, y_test) - ordered_labels = ["frog", "cat", "dog", "deer", "horse", "bird", "airplane", "ship", "truck", "automobile"] - cm = reorder_labels(ordered_labels, label_names, cm) - disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=ordered_labels) - _, ax = plt.subplots(figsize=(10, 10)) - disp.plot( - include_values=True, - cmap="viridis", - ax=ax, - xticks_rotation="vertical", - values_format='.0f', - colorbar=True, - ) - - ax.set_title(f'Confusion matrix for {method} method') -``` - -```python -plot_confusion_matrix("aps", y_ps_90, label_names) -``` - -Thanks to this confusion matrix we can see that, for some labels (as cat, deer and dog) the distribution of the labels in the prediction set is not uniform. Indeed, when the image is a cat, there are almost as many predictions sets with the true label as with the "cat" label. In this case, the reverse is also true. However, for the deer, the cat label is often included within the prediction set while the deer is not. - -```python - -``` diff --git a/notebooks/classification/tutorial_classification.md b/notebooks/classification/tutorial_classification.md deleted file mode 100644 index 2e9e099ca..000000000 --- a/notebooks/classification/tutorial_classification.md +++ /dev/null @@ -1,259 +0,0 @@ ---- -jupyter: - jupytext: - formats: ipynb,md - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.13.6 - kernelspec: - display_name: mapie_local - language: python - name: mapie_local ---- - -# Tutorial for classification - - -In this tutorial, we compare the prediction sets estimated by the conformal methods implemented in MAPIE on a toy two-dimensional dataset. - -Throughout this tutorial, we will answer the following questions: - -- How does the number of classes in the prediction sets vary according to the significance level ? - -- Is the chosen conformal method well calibrated ? - -- What are the pros and cons of the conformal methods included in MAPIE ? - - -## 1. Conformal Prediction method using the softmax score of the true label - - -We will use MAPIE to estimate a prediction set of several classes such that the probability that the true label -of a new test point is included in the prediction set is always higher than the target confidence level : -$P(Y \in C) \geq 1 - \alpha$. -We start by using the softmax score output by the base classifier as the conformity score on a toy two-dimensional dataset. -We estimate the prediction sets as follows : - -* First we generate a dataset with train, calibration and test, the model is fitted on the training set. -* We set the conformal score $S_i = \hat{f}(X_{i})_{y_i}$ the softmax output of the true class for each sample in the calibration set. -* Then we define $\hat{q}$ as being the $(n + 1) (\alpha) / n$ previous quantile of $S_{1}, ..., S_{n}$ -(this is essentially the quantile $\alpha$, but with a small sample correction). -* Finally, for a new test data point (where $X_{n + 1}$ is known but $Y_{n + 1}$ is not), create a prediction set -$C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\}$ which includes all the classes with a sufficiently high softmax output. - -We use a two-dimensional toy dataset with three labels. The distribution of the data is a bivariate normal with diagonal covariance matrices for each label. 
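Before generating this dataset, the four steps listed above can be made concrete with a minimal, self-contained numpy sketch on fake softmax outputs. This is only an illustration of the idea: the variable names (`proba_cal`, `y_cal_toy`, ...) are made up here, and the quantile is computed with a plain `np.quantile` call rather than MAPIE's exact finite-sample handling.

```python
import numpy as np

# Illustrative sketch of the "lac" (score) procedure described above,
# on fake softmax outputs. MAPIE handles the exact quantile computation
# and edge cases internally.
rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 500, 3, 0.1

proba_cal = rng.dirichlet(np.ones(n_classes), size=n_cal)  # fake calibration softmax outputs
y_cal_toy = rng.integers(0, n_classes, size=n_cal)         # fake true labels

# 1. Conformity score: softmax output of the true class
scores = proba_cal[np.arange(n_cal), y_cal_toy]

# 2. Quantile of the scores with the small-sample correction (n + 1) * alpha / n
q_hat = np.quantile(scores, (n_cal + 1) * alpha / n_cal)

# 3. Prediction set of a new point: every class whose softmax output exceeds q_hat
proba_new = rng.dirichlet(np.ones(n_classes))
prediction_set = np.where(proba_new > q_hat)[0]
print(prediction_set)
```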
- -```python -import numpy as np -from sklearn.model_selection import train_test_split -centers = [(0, 3.5), (-2, 0), (2, 0)] -covs = [np.eye(2), np.eye(2)*2, np.diag([5, 1])] -x_min, x_max, y_min, y_max, step = -6, 8, -6, 8, 0.1 -n_samples = 1000 -n_classes = 3 -np.random.seed(42) -X = np.vstack([ - np.random.multivariate_normal(center, cov, n_samples) - for center, cov in zip(centers, covs) -]) -y = np.hstack([np.full(n_samples, i) for i in range(n_classes)]) -X_train_cal, X_test, y_train_cal, y_test = train_test_split(X, y, test_size=0.2) -X_train, X_cal, y_train, y_cal = train_test_split(X_train_cal, y_train_cal, test_size=0.25) - -xx, yy = np.meshgrid( - np.arange(x_min, x_max, step), np.arange(x_min, x_max, step) -) -X_test_mesh = np.stack([xx.ravel(), yy.ravel()], axis=1) -``` - -Let’s see our training data. - -```python -import matplotlib.pyplot as plt -colors = {0: "#1f77b4", 1: "#ff7f0e", 2: "#2ca02c", 3: "#d62728"} -y_train_col = list(map(colors.get, y_train)) -fig = plt.figure() -plt.scatter( - X_train[:, 0], - X_train[:, 1], - color=y_train_col, - marker='o', - s=10, - edgecolor='k' -) -plt.xlabel("X") -plt.ylabel("Y") -plt.show() -``` - -We fit our training data with a Gaussian Naive Base estimator. And then we apply MAPIE in the calibration data with the method ``score`` to the estimator indicating that it has already been fitted with `cv="prefit"`. -We then estimate the prediction sets with differents alpha values with a -``fit`` and ``predict`` process. - -```python -from sklearn.naive_bayes import GaussianNB -from mapie.classification import MapieClassifier -from mapie.metrics import classification_coverage_score, classification_mean_width_score -clf = GaussianNB().fit(X_train, y_train) -y_pred = clf.predict(X_test) -y_pred_proba = clf.predict_proba(X_test) -y_pred_proba_max = np.max(y_pred_proba, axis=1) -mapie_score = MapieClassifier(estimator=clf, cv="prefit", method="lac") -mapie_score.fit(X_cal, y_cal) -alpha = [0.2, 0.1, 0.05] -y_pred_score, y_ps_score = mapie_score.predict(X_test_mesh, alpha=alpha) -``` - -* ``y_pred_score``: represents the prediction in the test set by the base estimator. -* ``y_ps_score``: the prediction sets estimated by MAPIE with the "lac" method. - -```python -def plot_scores(n, alphas, scores, quantiles): - colors = {0:"#1f77b4", 1:"#ff7f0e", 2:"#2ca02c"} - fig = plt.figure(figsize=(7, 5)) - plt.hist(scores, bins="auto") - i=0 - for i, quantile in enumerate(quantiles): - plt.vlines( - x = quantile, - ymin=0, - ymax=400, - color=colors[i], - ls= "dashed", - label=f"alpha = {alphas[i]}" - ) - plt.title("Distribution of scores") - plt.legend() - plt.xlabel("Scores") - plt.ylabel("Count") - plt.show() -``` - -Let’s see the distribution of the scores with the calculated quantiles. - -```python -scores = mapie_score.conformity_scores_ -n = len(mapie_score.conformity_scores_) -quantiles = mapie_score.quantiles_ -plot_scores(n, alpha, scores, quantiles) -``` - -The estimated quantile increases with alpha. A high value of alpha can potentially lead to a high quantile which would not necessarily be reached by any class in uncertain areas, resulting in null regions. - -We will now visualize the differences between the prediction sets of the different values of alpha. 
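Before plotting, a quick numerical counterpart to the null regions mentioned above is to count the fraction of empty prediction sets for each alpha. The sketch below reuses the `alpha` list and the `y_ps_score` array computed on the mesh in the previous cell.

```python
# Fraction of empty prediction sets per alpha (sketch, reusing `alpha`
# and `y_ps_score` from the cells above).
for i, a in enumerate(alpha):
    empty_frac = (y_ps_score[:, :, i].sum(axis=1) == 0).mean()
    print(f"alpha = {a}: {empty_frac:.1%} empty prediction sets")
```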
- -```python -def plot_results(alphas, X, y_pred, y_ps): - tab10 = plt.cm.get_cmap('Purples', 4) - colors = {0: "#1f77b4", 1: "#ff7f0e", 2: "#2ca02c", 3: "#d62728"} - y_pred_col = list(map(colors.get, y_pred)) - fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(10, 10)) - axs = {0: ax1, 1: ax2, 2: ax3, 3: ax4} - axs[0].scatter( - X[:, 0], - X[:, 1], - color=y_pred_col, - marker='.', - s=10, - alpha=0.4 - ) - axs[0].set_title("Predicted labels") - for i, alpha in enumerate(alphas): - y_pi_sums = y_ps[:, :, i].sum(axis=1) - num_labels = axs[i+1].scatter( - X[:, 0], - X[:, 1], - c=y_pi_sums, - marker='.', - s=10, - alpha=1, - cmap=tab10, - vmin=0, - vmax=3 - ) - cbar = plt.colorbar(num_labels, ax=axs[i+1]) - axs[i+1].set_title(f"Number of labels for alpha={alpha}") - plt.show() -``` - -```python -plot_results(alpha, X_test_mesh, y_pred_score, y_ps_score) -``` - -When the class coverage is not large enough, the prediction sets can be empty when the model is uncertain at the border between two classes. The null region disappears for larger class coverages but ambiguous classification regions arise with several labels included in the prediction sets highlighting the uncertain behaviour of the base classifier. - - -Let’s now study the effective coverage and the mean prediction set widths as function of the $1-\alpha$ target coverage. To this aim, we use once again the `.predict()` method of MAPIE to estimate predictions sets on a large number of $\alpha$ values. - -```python -alpha2 = np.arange(0.02, 0.98, 0.02) -_, y_ps_score2 = mapie_score.predict(X_test, alpha=alpha2) -coverages_score = [ - classification_coverage_score(y_test, y_ps_score2[:, :, i]) - for i, _ in enumerate(alpha2) -] -widths_score = [ - classification_mean_width_score(y_ps_score2[:, :, i]) - for i, _ in enumerate(alpha2) -] -``` - -```python -def plot_coverages_widths(alpha, coverage, width, method): - fig, axs = plt.subplots(1, 2, figsize=(12, 5)) - axs[0].scatter(1 - alpha, coverage, label=method) - axs[0].set_xlabel("1 - alpha") - axs[0].set_ylabel("Coverage score") - axs[0].plot([0, 1], [0, 1], label="x=y", color="black") - axs[0].legend() - axs[1].scatter(1 - alpha, width, label=method) - axs[1].set_xlabel("1 - alpha") - axs[1].set_ylabel("Average size of prediction sets") - axs[1].legend() - plt.show() -``` - -```python -plot_coverages_widths(alpha2, coverages_score, widths_score, "lac") -``` - -## 2. Conformal Prediction method using the cumulative softmax score - - -We saw in the previous section that the "lac" method is well calibrated by providing accurate coverage levels. However, it tends to give null prediction sets for uncertain regions, especially when the $\alpha$ value is high. MAPIE includes another method, called Adaptive Prediction Set (APS), whose conformity score is the cumulated score of the softmax output until the true label is reached (see the theoretical description for more details). We will see in this Section that this method no longer estimates null prediction sets but by giving slightly bigger prediction sets. - - -Let's visualize the prediction sets obtained with the APS method on the test set after fitting MAPIE on the calibration set. 
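As a side note, the cumulated-softmax conformity score described above can be sketched by hand. This is purely illustrative: the helper name `aps_conformity_scores` is made up here, and MAPIE's own implementation additionally handles ties and the randomisation of the last included label.

```python
import numpy as np

def aps_conformity_scores(proba: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Sketch of the APS conformity score: cumulated softmax mass of all classes
    ranked at least as high as the true label. Illustrative only."""
    order = np.argsort(-proba, axis=1)                       # classes sorted by decreasing probability
    proba_sorted = np.take_along_axis(proba, order, axis=1)
    cum_proba = np.cumsum(proba_sorted, axis=1)
    true_rank = np.argmax(order == y_true[:, None], axis=1)  # position of the true label in that ranking
    return cum_proba[np.arange(len(y_true)), true_rank]

# Example usage on the calibration data used above (clf, X_cal, y_cal come from previous cells):
# scores_aps = aps_conformity_scores(clf.predict_proba(X_cal), y_cal)
```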
- -```python -mapie_aps = MapieClassifier(estimator=clf, cv="prefit", method="aps") -mapie_aps.fit(X_cal, y_cal) -alpha = [0.2, 0.1, 0.05] -y_pred_aps, y_ps_aps = mapie_aps.predict(X_test_mesh, alpha=alpha, include_last_label=True) -``` - -```python -plot_results(alpha, X_test_mesh, y_pred_aps, y_ps_aps) -``` - -One can notice that the uncertain regions are emphasized by wider boundaries, but without null prediction sets with respect to the first "lac" method. - -```python -_, y_ps_aps2 = mapie_aps.predict(X_test, alpha=alpha2, include_last_label="randomized") -coverages_aps = [ - classification_coverage_score(y_test, y_ps_aps2[:, :, i]) - for i, _ in enumerate(alpha2) -] -widths_aps = [ - classification_mean_width_score(y_ps_aps2[:, :, i]) - for i, _ in enumerate(alpha2) -] -``` - -```python -plot_coverages_widths(alpha2, coverages_aps, widths_aps, "lac") -``` - -This method also gives accurate calibration plots, meaning that the effective coverage level is always very close to the target coverage, sometimes at the expense of slightly bigger prediction sets. diff --git a/notebooks/regression/exoplanets.md b/notebooks/regression/exoplanets.md deleted file mode 100755 index f71758520..000000000 --- a/notebooks/regression/exoplanets.md +++ /dev/null @@ -1,373 +0,0 @@ ---- -jupyter: - jupytext: - formats: ipynb,md - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.13.6 - kernelspec: - display_name: mapie_local - language: python - name: mapie_local ---- - -# Estimating the uncertainties in the exoplanet masses - - -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scikit-learn-contrib/MAPIE/blob/master/notebooks/regression/exoplanets.ipynb) - - - -In this notebook, we quantify the uncertainty in exoplanet masses predicted by several machine learning models, based on the exoplanet properties. To this aim, we use the exoplanet dataset downloaded from the [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/) and estimate the prediction intervals using the methods implemented in MAPIE. - -```python -install_mapie = True -if install_mapie: - !pip install mapie -``` - -```python -from typing_extensions import TypedDict -from typing import Union -from sklearn.compose import ColumnTransformer -from sklearn.ensemble import RandomForestRegressor -from sklearn.impute import SimpleImputer -from sklearn.linear_model import LinearRegression -from sklearn.model_selection import train_test_split, KFold -from sklearn.pipeline import Pipeline -from sklearn.preprocessing import ( - OneHotEncoder, - OrdinalEncoder, - PolynomialFeatures, - RobustScaler -) -import warnings -import matplotlib.pyplot as plt -import numpy as np -import pandas as pd -import seaborn as sns - -from mapie.metrics import regression_coverage_score -from mapie.regression import MapieRegressor -from mapie.subsample import Subsample - -warnings.filterwarnings("ignore") -``` - -## 1. Data Loading - - -Let's start by loading the `exoplanets` dataset and looking at the main information. - -```python -url_file = "https://raw.githubusercontent.com/scikit-learn-contrib/MAPIE/master/notebooks/regression/exoplanets_mass.csv" -exo_df = pd.read_csv(url_file, index_col=0) -``` - -```python -exo_df.info() -``` - -The dataset contains 21 features giving complementary information about the properties of the discovered planet, the star around which the planet revolves, together with the type of discovery method. 
7 features are categorical, and 14 are continuous. - - -Some properties show high variance among exoplanets and stars due to the astronomical nature of such systems. We therefore decide to use a log transformation for the following features to approach a normal distribution. - -```python -exo_df["Stellar_Mass_[Solar_mass]"] = exo_df["Stellar_Mass_[Solar_mass]"].replace(0, np.nan) -vars2log = [ - "Planet_Orbital_Period_[day]", - "Planet_Orbital_SemiMajorAxis_[day]", - "Planet_Radius_[Earth_radius]", - "Planet_Mass_[Earth_mass]", - "Stellar_Radius_[Solar_radius]", - "Stellar_Mass_[Solar_mass]", - "Stellar_Effective_Temperature_[K]" -] -for var in vars2log: - exo_df[var+"_log"] = np.log(exo_df[var]) -``` - -```python -vars2keep = list(set(exo_df.columns) - set(vars2log)) -exo_df = exo_df[vars2keep] -``` - -```python -exo_df.head() -``` - -Throughout this tutorial, the target variable will be `Planet_Mass_[Earth_mass]_log`. - -```python -target = "Planet_Mass_[Earth_mass]_log" -``` - -```python -num_cols = list(exo_df.columns[exo_df.dtypes == "float64"]) -cat_cols = list(exo_df.columns[exo_df.dtypes != "float64"]) -exo_df[cat_cols] = exo_df[cat_cols].astype(str) -``` - -```python -planet_cols = [col for col in num_cols if "Planet_" in col] -star_cols = [col for col in num_cols if "Stellar_" in col] -system_cols = [col for col in num_cols if "System_" in col] -``` - -## 2. Data visualization - -```python -sns.pairplot(exo_df[planet_cols]) -``` - -```python -sns.pairplot(exo_df[star_cols]) -``` - -## 3. Data preprocessing - - -In this section, we perform a simple preprocessing of the dataset in order to impute the missing values and encode the categorical features. - -```python -endos = list(set(exo_df.columns) - set([target])) -X = exo_df[endos] -y = exo_df[target] -``` - -```python -num_cols = list(X.columns[X.dtypes == "float64"]) -cat_cols = list(X.columns[X.dtypes != "float64"]) -X[cat_cols] = X[cat_cols].astype(str) -``` - -```python -imputer_num = SimpleImputer(strategy="mean") -scaler_num = RobustScaler() -imputer_cat = SimpleImputer(strategy="constant", fill_value=-1) -encoder_cat = OneHotEncoder( - categories="auto", - drop=None, - sparse=False, - handle_unknown="ignore", -) -``` - -```python -numerical_transformer = Pipeline( - steps=[("imputer", imputer_num), ("scaler", scaler_num)] -) -categorical_transformer = Pipeline( - steps=[("ordinal", OrdinalEncoder()), ("imputer", imputer_cat), ("encoder", encoder_cat)] -) -preprocessor = ColumnTransformer( - transformers=[ - ("numerical", numerical_transformer, num_cols), - ("categorical", categorical_transformer, cat_cols) - ], - remainder="drop", - sparse_threshold=0, -) -``` - -```python -X_train, X_test, y_train, y_test = train_test_split( - X, y, test_size=0.2, random_state=42, shuffle=True -) -``` - -```python -X_train = preprocessor.fit_transform(X_train) -X_test = preprocessor.transform(X_test) -``` - -## 4. First estimation of the uncertainties with MAPIE - - -### Uncertainty estimation - - -Here, we build our first prediction intervals with MAPIE. To this aim, we adopt the CV+ strategy with 5 folders, using `method="plus"` and `cv=KFold(n_splits=5, shuffle=True)` as input arguments. 
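As a reminder of what the CV+ strategy computes under the hood, here is a simplified sketch of the interval construction from out-of-fold residuals and out-of-fold predictions. It uses plain empirical quantiles rather than the exact finite-sample correction applied by MAPIE, and the helper name `cv_plus_intervals` is made up, so it is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def cv_plus_intervals(X_fit, y_fit, X_new, alpha=0.1, n_splits=5, random_state=42):
    """Simplified CV+ sketch (illustrative only).

    Combines out-of-fold residuals with out-of-fold predictions, using plain
    empirical quantiles instead of MAPIE's exact finite-sample correction.
    Expects numpy arrays.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    residuals, preds_new = [], []
    for train_idx, val_idx in kf.split(X_fit):
        model = RandomForestRegressor(random_state=random_state)
        model.fit(X_fit[train_idx], y_fit[train_idx])
        # residuals of the points left out of this fold's model
        residuals.append(np.abs(y_fit[val_idx] - model.predict(X_fit[val_idx])))
        # each left-out point reuses this fold's model to predict the new data
        preds_new.append(np.tile(model.predict(X_new), (len(val_idx), 1)))
    residuals = np.concatenate(residuals)[:, None]   # shape (n_fit, 1)
    preds_new = np.concatenate(preds_new, axis=0)    # shape (n_fit, n_new)
    lower = np.quantile(preds_new - residuals, alpha, axis=0)
    upper = np.quantile(preds_new + residuals, 1 - alpha, axis=0)
    return lower, upper

# Hypothetical usage with the arrays prepared above:
# lower, upper = cv_plus_intervals(np.asarray(X_train), np.asarray(y_train), np.asarray(X_test), alpha=0.1)
```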
- -```python -def get_regressor(name): - if name == "linear": - mdl = LinearRegression() - elif name == "polynomial": - degree_polyn = 2 - mdl = Pipeline( - [ - ("poly", PolynomialFeatures(degree=degree_polyn)), - ("linear", LinearRegression()) - ] - ) - elif name == "random_forest": - mdl = RandomForestRegressor() - return mdl -``` - -```python -mdl = get_regressor("random_forest") -``` - -```python -mapie = MapieRegressor(mdl, method="plus", cv=KFold(n_splits=5, shuffle=True)) -``` - -```python -mapie.fit(X_train, y_train) -``` - -We build prediction intervals for a range of alpha values between 0 and 1. - -```python -alpha = np.arange(0.05, 1, 0.05) -y_train_pred, y_train_pis = mapie.predict(X_train, alpha=alpha) -y_test_pred, y_test_pis = mapie.predict(X_test, alpha=alpha) -``` - -### Visualization - - -The following function offers to visualize the error bars estimated by MAPIE for the selected method and the given confidence level. - -```python -def plot_predictionintervals( - y_train, - y_train_pred, - y_train_pred_low, - y_train_pred_high, - y_test, - y_test_pred, - y_test_pred_low, - y_test_pred_high, - suptitle: str, -) -> None: - fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 6)) - - ax1.errorbar( - x=y_train, - y=y_train_pred, - yerr=(y_train_pred - y_train_pred_low, y_train_pred_high - y_train_pred), - alpha=0.8, - label="train", - fmt=".", - ) - ax1.errorbar( - x=y_test, - y=y_test_pred, - yerr=(y_test_pred - y_test_pred_low, y_test_pred_high - y_test_pred), - alpha=0.8, - label="test", - fmt=".", - ) - ax1.plot( - [y_train.min(), y_train.max()], - [y_train.min(), y_train.max()], - color="gray", - alpha=0.5, - ) - ax1.set_xlabel("True values", fontsize=12) - ax1.set_ylabel("Predicted values", fontsize=12) - ax1.legend() - - ax2.scatter( - x=y_train, y=y_train_pred_high - y_train_pred_low, alpha=0.8, label="train", marker="." - ) - ax2.scatter(x=y_test, y=y_test_pred_high - y_test_pred_low, alpha=0.8, label="test", marker=".") - ax2.set_xlabel("True values", fontsize=12) - ax2.set_ylabel("Interval width", fontsize=12) - ax2.set_xscale("linear") - ax2.set_ylim([0, np.max(y_test_pred_high - y_test_pred_low)*1.1]) - ax2.legend() - std_all = np.concatenate([ - y_train_pred_high - y_train_pred_low, y_test_pred_high - y_test_pred_low - ]) - type_all = np.array(["train"] * len(y_train) + ["test"] * len(y_test)) - x_all = np.arange(len(std_all)) - order_all = np.argsort(std_all) - std_order = std_all[order_all] - type_order = type_all[order_all] - ax3.scatter( - x=x_all[type_order == "train"], - y=std_order[type_order == "train"], - alpha=0.8, - label="train", - marker=".", - ) - ax3.scatter( - x=x_all[type_order == "test"], - y=std_order[type_order == "test"], - alpha=0.8, - label="test", - marker=".", - ) - ax3.set_xlabel("Order", fontsize=12) - ax3.set_ylabel("Interval width", fontsize=12) - ax3.legend() - ax1.set_title("True vs predicted values") - ax2.set_title("Prediction interval width vs true values") - ax3.set_title("Ordered prediction interval width") - plt.suptitle(suptitle, size=20) - plt.show() - -``` - -```python -alpha_plot = int(np.where(alpha == 0.1)[0]) -plot_predictionintervals( - y_train, - y_train_pred, - y_train_pis[:, 0, alpha_plot], - y_train_pis[:, 1, alpha_plot], - y_test, - y_test_pred, - y_test_pis[:, 0, alpha_plot], - y_test_pis[:, 1, alpha_plot], - "Prediction intervals for alpha=0.1", -) -``` - -## 5. 
Comparison of the uncertainty quantification methods - - -In the last section, we compare the calibration of several uncertainty-quantification methods provided by MAPIE using Random Forest as base model. To this aim, we build so-called "calibration plots" which compare the effective marginal coverage obtained on the test set with the target $1-\alpha$ coverage. - -```python -Params = TypedDict("Params", {"method": str, "cv": Union[int, Subsample]}) -STRATEGIES = { - "naive": Params(method="naive"), - "cv": Params(method="base", cv=5), - "cv_plus": Params(method="plus", cv=5), - "cv_minmax": Params(method="minmax", cv=5), - "jackknife_plus_ab": Params(method="plus", cv=Subsample(n_resamplings=20)), -} -mdl = get_regressor("random_forest") -``` - -```python -y_pred, y_pis, scores = {}, {}, {} -for strategy, params in STRATEGIES.items(): - mapie = MapieRegressor(mdl, **params) - mapie.fit(X_train, y_train) - y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=alpha) - scores[strategy] = [ - regression_coverage_score(y_test, y_pis[strategy][:, 0, i], y_pis[strategy][:, 1, i]) - for i, _ in enumerate(alpha) - ] -``` - -```python -plt.figure(figsize=(7, 6)) -plt.xlabel("Target coverage (1 - alpha)") -plt.ylabel("Effective coverage") -for strategy, params in STRATEGIES.items(): - plt.plot(1 - alpha, scores[strategy], label=strategy) -plt.plot([0, 1], [0, 1], ls="--", color="k") -plt.legend(loc=[1, 0]) -``` - -The calibration plot clearly demonstrates that the "naive" method underestimates the coverage by giving too narrow prediction intervals, due to the fact that they are built from training data. All other methods show much more robust calibration plots : the effective coverages follow almost linearly the expected coverage levels. diff --git a/notebooks/regression/ts-changepoint.md b/notebooks/regression/ts-changepoint.md deleted file mode 100644 index 3837c3d36..000000000 --- a/notebooks/regression/ts-changepoint.md +++ /dev/null @@ -1,453 +0,0 @@ -# Estimating prediction intervals of time series forecast with EnbPI - -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/scikit-learn-contrib/MAPIE/blob/add-ts-notebooks/notebooks/regression/ts-changepoint.ipynb) - -This example uses `mapie.time_series_regression.MapieTimeSeriesRegressor` to estimate -prediction intervals associated with time series forecast. It follows Xu \& Xie (2021). -We use here the Victoria electricity demand dataset used in the book -"Forecasting: Principles and Practice" by R. J. Hyndman and G. Athanasopoulos. -The electricity demand features daily and weekly seasonalities and is impacted -by the temperature, considered here as a exogeneous variable. -A Random Forest model is already fitted on data. The hyper-parameters are -optimized with a `sklearn.model_selection.RandomizedSearchCV` using a -sequential `sklearn.model_selection.TimeSeriesSplit` cross validation, -in which the training set is prior to the validation set. -The best model is then feeded into -`mapie.time_series_regression.MapieTimeSeriesRegressor` to estimate the -associated prediction intervals. We compare four approaches: with or without -``partial_fit`` called at every step. 
- - -```python -install_mapie = False -if install_mapie: - !pip install mapie -``` - - -```python -import warnings - -import numpy as np -import pandas as pd -from matplotlib import pylab as plt -from scipy.stats import randint -from sklearn.ensemble import RandomForestRegressor -from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit - -from mapie.metrics import regression_coverage_score, regression_mean_width_score -from mapie.subsample import BlockBootstrap -from mapie.time_series_regression import MapieTimeSeriesRegressor - -%reload_ext autoreload -%autoreload 2 -warnings.simplefilter("ignore") -``` - -## 1. Load input data and feature engineering - - -```python -url_file = "https://raw.githubusercontent.com/scikit-learn-contrib/MAPIE/master/examples/data/demand_temperature.csv" -demand_df = pd.read_csv( - url_file, parse_dates=True, index_col=0 -) - -demand_df["Date"] = pd.to_datetime(demand_df.index) -demand_df["Weekofyear"] = demand_df.Date.dt.isocalendar().week.astype("int64") -demand_df["Weekday"] = demand_df.Date.dt.isocalendar().day.astype("int64") -demand_df["Hour"] = demand_df.index.hour -n_lags = 5 -for hour in range(1, n_lags): - demand_df[f"Lag_{hour}"] = demand_df["Demand"].shift(hour) - -``` - -## 2. Train/validation/test split - - -```python -num_test_steps = 24 * 7 -demand_train = demand_df.iloc[:-num_test_steps, :].copy() -demand_test = demand_df.iloc[-num_test_steps:, :].copy() -features = ["Weekofyear", "Weekday", "Hour", "Temperature"] -features += [f"Lag_{hour}" for hour in range(1, n_lags)] - -X_train = demand_train.loc[ - ~np.any(demand_train[features].isnull(), axis=1), features -] -y_train = demand_train.loc[X_train.index, "Demand"] -X_test = demand_test.loc[:, features] -y_test = demand_test["Demand"] -``` - - -```python -plt.figure(figsize=(16, 5)) -plt.plot(y_train) -plt.plot(y_test) -plt.ylabel("Hourly demand (GW)") -``` - - - - - Text(0, 0.5, 'Hourly demand (GW)') - - - - - -![png](output_9_1.png) - - - -## 3. Optimize the base estimator - - -```python -model_params_fit_not_done = False -if model_params_fit_not_done: - # CV parameter search - n_iter = 100 - n_splits = 5 - tscv = TimeSeriesSplit(n_splits=n_splits) - random_state = 59 - rf_model = RandomForestRegressor(random_state=random_state) - rf_params = {"max_depth": randint(2, 30), "n_estimators": randint(10, 100)} - cv_obj = RandomizedSearchCV( - rf_model, - param_distributions=rf_params, - n_iter=n_iter, - cv=tscv, - scoring="neg_root_mean_squared_error", - random_state=random_state, - verbose=0, - n_jobs=-1, - ) - cv_obj.fit(X_train, y_train) - model = cv_obj.best_estimator_ -else: - # Model: Random Forest previously optimized with a cross-validation - model = RandomForestRegressor( - max_depth=10, n_estimators=50, random_state=59) -``` - -## 4. 
Estimate prediction intervals on the test set - - -```python -alpha = 0.05 -gap = 1 -cv_mapiets = BlockBootstrap( - n_resamplings=100, length=48, overlapping=True, random_state=59 -) -mapie_enbpi = MapieTimeSeriesRegressor( - model, method="enbpi", cv=cv_mapiets, agg_function="mean", n_jobs=-1 -) -``` - -### Without partial fit - - -```python -print("EnbPI, with no partial_fit, width optimization") -mapie_enbpi = mapie_enbpi.fit(X_train, y_train) -y_pred_npfit, y_pis_npfit = mapie_enbpi.predict( - X_test, alpha=alpha, ensemble=True, optimize_beta=True -) -coverage_npfit = regression_coverage_score( - y_test, y_pis_npfit[:, 0, 0], y_pis_npfit[:, 1, 0] -) -width_npfit = regression_mean_width_score( - y_pis_npfit[:, 0, 0], y_pis_npfit[:, 1, 0] -) -``` - - EnbPI, with no partial_fit, width optimization - - -### With partial fit - - -```python -print("EnbPI with partial_fit, width optimization") -mapie_enbpi = mapie_enbpi.fit(X_train, y_train) - -y_pred_pfit = np.zeros(y_pred_npfit.shape) -y_pis_pfit = np.zeros(y_pis_npfit.shape) -y_pred_pfit[:gap], y_pis_pfit[:gap, :, :] = mapie_enbpi.predict( - X_test.iloc[:gap, :], alpha=alpha, ensemble=True, optimize_beta=True -) -for step in range(gap, len(X_test), gap): - mapie_enbpi.partial_fit( - X_test.iloc[(step - gap):step, :], - y_test.iloc[(step - gap):step], - ) - ( - y_pred_pfit[step:step + gap], - y_pis_pfit[step:step + gap, :, :], - ) = mapie_enbpi.predict( - X_test.iloc[step:(step + gap), :], - alpha=alpha, - ensemble=True, - optimize_beta=True - ) -coverage_pfit = regression_coverage_score( - y_test, y_pis_pfit[:, 0, 0], y_pis_pfit[:, 1, 0] -) -width_pfit = regression_mean_width_score( - y_pis_pfit[:, 0, 0], y_pis_pfit[:, 1, 0] -) -``` - - EnbPI with partial_fit, width optimization - - -## V. Plot estimated prediction intervals on test set - - -```python -y_preds = [y_pred_npfit, y_pred_pfit] -y_pis = [y_pis_npfit, y_pis_pfit] -coverages = [coverage_npfit, coverage_pfit] -widths = [width_npfit, width_pfit] -``` - - -```python -def plot_forecast(y_train, y_test, y_preds, y_pis, coverages, widths, plot_coverage=True): - fig, axs = plt.subplots( - nrows=2, ncols=1, figsize=(14, 8), sharey="row", sharex="col" - ) - for i, (ax, w) in enumerate(zip(axs, ["without", "with"])): - ax.set_ylabel("Hourly demand (GW)") - ax.plot(y_train[int(-len(y_test)/2):], lw=2, label="Training data", c="C0") - ax.plot(y_test, lw=2, label="Test data", c="C1") - - ax.plot( - y_test.index, y_preds[i], lw=2, c="C2", label="Predictions" - ) - ax.fill_between( - y_test.index, - y_pis[i][:, 0, 0], - y_pis[i][:, 1, 0], - color="C2", - alpha=0.2, - label="Prediction intervals", - ) - title = f"EnbPI, {w} update of residuals. " - if plot_coverage: - title += f"Coverage:{coverages[i]:.3f} and Width:{widths[i]:.3f}" - ax.set_title(title) - ax.legend() - fig.tight_layout() - plt.show() -``` - - -```python -plot_forecast(y_train, y_test, y_preds, y_pis, coverages, widths) -``` - - - -![png](output_21_0.png) - - - -## VI. Forecast on test dataset with change point - -We will now see how MAPIE adapts its prediction intervals when a brutal changepoint arises in the test set. To simulate this, we will artificially decrease the electricity demand by 2 GW in the test set, aiming at simulating an effect, such as blackout or lockdown due to a pandemic, that was not taken into account by the model during its training. 
- -### Corrupt the dataset - - -```python -demand_df_corrupted = demand_df.copy() -demand_df_corrupted.Demand.iloc[-int(num_test_steps/2):] -= 2 -``` - - -```python -n_lags = 5 -for hour in range(1, n_lags): - demand_df[f"Lag_{hour}"] = demand_df["Demand"].shift(hour) -demand_train_corrupted = demand_df_corrupted.iloc[:-num_test_steps, :].copy() -demand_test_corrupted = demand_df_corrupted.iloc[-num_test_steps:, :].copy() - -X_train = demand_train_corrupted.loc[ - ~np.any(demand_train_corrupted[features].isnull(), axis=1), features -] -y_train = demand_train_corrupted.loc[X_train.index, "Demand"] -X_test = demand_test_corrupted.loc[:, features] -y_test = demand_test_corrupted["Demand"] -``` - - -```python -plt.figure(figsize=(16, 5)) -plt.ylabel("Hourly demand (GW)") -plt.plot(y_train) -plt.plot(y_test) -``` - - - - - [] - - - - - -![png](output_27_1.png) - - - -### Prediction intervals without partial fit - - -```python -print("EnbPI, with no partial_fit, width optimization") -mapie_enbpi = mapie_enbpi.fit(X_train, y_train) -y_pred_npfit, y_pis_npfit = mapie_enbpi.predict( - X_test, alpha=alpha, ensemble=True, optimize_beta=True -) -coverage_npfit = regression_coverage_score( - y_test, y_pis_npfit[:, 0, 0], y_pis_npfit[:, 1, 0] -) -width_npfit = regression_mean_width_score( - y_pis_npfit[:, 0, 0], y_pis_npfit[:, 1, 0] -) -``` - - EnbPI, with no partial_fit, width optimization - - -### Prediction intervals with partial fit - - -```python -print("EnbPI with partial_fit, width optimization") -mapie_enbpi = mapie_enbpi.fit(X_train, y_train) - -y_pred_pfit = np.zeros(y_pred_npfit.shape) -y_pis_pfit = np.zeros(y_pis_npfit.shape) -conformity_scores_pfit, lower_quantiles_pfit, higher_quantiles_pfit = [], [], [] -y_pred_pfit[:gap], y_pis_pfit[:gap, :, :] = mapie_enbpi.predict( - X_test.iloc[:gap, :], alpha=alpha, ensemble=True, optimize_beta=True -) -for step in range(gap, len(X_test), gap): - mapie_enbpi.partial_fit( - X_test.iloc[(step - gap):step, :], - y_test.iloc[(step - gap):step], - ) - ( - y_pred_pfit[step:step + gap], - y_pis_pfit[step:step + gap, :, :], - ) = mapie_enbpi.predict( - X_test.iloc[step:(step + gap), :], - alpha=alpha, - ensemble=True, - optimize_beta=True - ) - conformity_scores_pfit.append(mapie_enbpi.conformity_scores_) - lower_quantiles_pfit.append(mapie_enbpi.lower_quantiles_) - higher_quantiles_pfit.append(mapie_enbpi.higher_quantiles_) -coverage_pfit = regression_coverage_score( - y_test, y_pis_pfit[:, 0, 0], y_pis_pfit[:, 1, 0] -) -width_pfit = regression_mean_width_score( - y_pis_pfit[:, 0, 0], y_pis_pfit[:, 1, 0] -) -``` - - EnbPI with partial_fit, width optimization - - -### Plot estimated prediction intervals on test set - - -```python -y_preds = [y_pred_npfit, y_pred_pfit] -y_pis = [y_pis_npfit, y_pis_pfit] -coverages = [coverage_npfit, coverage_pfit] -widths = [width_npfit, width_pfit] -``` - - -```python -plot_forecast(y_train, y_test, y_preds, y_pis, coverages, widths, plot_coverage=False) -``` - - - -![png](output_34_0.png) - - - - -```python -window = 24 -rolling_coverage_pfit, rolling_coverage_npfit = [], [] -for i in range(window, len(y_test), 1): - rolling_coverage_pfit.append( - regression_coverage_score( - y_test[i-window:i], y_pis_pfit[i-window:i, 0, 0], y_pis_pfit[i-window:i, 1, 0] - ) - ) - rolling_coverage_npfit.append( - regression_coverage_score( - y_test[i-window:i], y_pis_npfit[i-window:i, 0, 0], y_pis_npfit[i-window:i, 1, 0] - ) - ) -``` - -### Marginal coverage on a 24-hour rolling window of prediction intervals - - -```python 
-plt.figure(figsize=(10, 5))
-plt.ylabel(f"Rolling coverage [{window} hours]")
-plt.plot(y_test[window:].index, rolling_coverage_npfit, label="Without update of residuals")
-plt.plot(y_test[window:].index, rolling_coverage_pfit, label="With update of residuals")
-```
-
-
-![png](output_37_1.png)
-
-
-### Temporal evolution of the distribution of residuals used for estimating prediction intervals
-
-
-```python
-plt.figure(figsize=(7, 5))
-for i, j in enumerate([0, -1]):
-    plt.hist(conformity_scores_pfit[j], range=[-2.5, 0.5], bins=30, color=f"C{i}", alpha=0.3, label=f"Conformity scores (step={j})")
-    plt.axvline(lower_quantiles_pfit[j], ls="--", color=f"C{i}")
-    plt.axvline(higher_quantiles_pfit[j], ls="--", color=f"C{i}")
-plt.legend(loc=[1, 0])
-```
-
-
-![png](output_39_1.png)
-
-
diff --git a/notebooks/regression/tutorial_regression.md b/notebooks/regression/tutorial_regression.md
deleted file mode 100644
index 5a45f2ecb..000000000
--- a/notebooks/regression/tutorial_regression.md
+++ /dev/null
@@ -1,764 +0,0 @@
----
-jupyter:
-  jupytext:
-    formats: ipynb,md
-    text_representation:
-      extension: .md
-      format_name: markdown
-      format_version: '1.3'
-    jupytext_version: 1.13.6
-  kernelspec:
-    display_name: mapie-notebooks
-    language: python
-    name: mapie-notebooks
----
-
-# Tutorial for regression
-
-
-In this tutorial, we compare the prediction intervals estimated by MAPIE on a
-simple, one-dimensional, ground-truth function
-
-$$
-f(x) = x \sin(x)
-$$
-
-Throughout this tutorial, we will answer the following questions:
-
-- How well do the MAPIE strategies capture the aleatoric uncertainty existing in the data?
-
-- How do the prediction intervals estimated by the resampling strategies
-  evolve for new *out-of-distribution* data?
-
-- How do the prediction intervals vary between regressor models?
-
-To answer these questions, we estimate the prediction intervals first with
-a polynomial function, then with a boosting model, and finally with a simple neural network.
-
-**For practical problems, we advise using the faster CV+ strategies.
-For conservative prediction interval estimates, you can alternatively
-use the CV-minmax strategies.**
-
-
-
-## 1. Estimating the aleatoric uncertainty of homoscedastic noisy data
-
-
-Let's start by defining the $x \times \sin(x)$ function and another simple function
-that generates one-dimensional data with normal noise, uniformly in a given interval.
-
-```python
-from typing import List, Dict, Union
-```
-
-```python
-import warnings
-warnings.filterwarnings("ignore")
-import numpy as np
-def x_sinx(x):
-    """One-dimensional x*sin(x) function."""
-    return x*np.sin(x)
-```
-
-```python
-def get_1d_data_with_constant_noise(funct, min_x, max_x, n_samples, noise):
-    """
-    Generate 1D noisy data uniformly from the given function,
-    with the given noise standard deviation.
-    """
-    np.random.seed(59)
-    X_train = np.linspace(min_x, max_x, n_samples)
-    np.random.shuffle(X_train)
-    X_test = np.linspace(min_x, max_x, n_samples*5)
-    y_train, y_mesh, y_test = funct(X_train), funct(X_test), funct(X_test)
-    y_train += np.random.normal(0, noise, y_train.shape[0])
-    y_test += np.random.normal(0, noise, y_test.shape[0])
-    return X_train.reshape(-1, 1), y_train, X_test.reshape(-1, 1), y_test, y_mesh
-```
-
-We first generate noisy one-dimensional data uniformly on an interval.
-Here, the noise is *homoscedastic*, since its standard deviation remains constant
-over $x$.
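-
-In other words, the samples generated below follow (up to the fixed random seed) the model
-
-$$
-Y = X \sin(X) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2),
-$$
-
-where $\sigma$ is the `noise` parameter, so the noise level does not depend on $X$.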
-
-```python
-min_x, max_x, n_samples, noise = -5, 5, 600, 0.5
-X_train, y_train, X_test, y_test, y_mesh = get_1d_data_with_constant_noise(
-    x_sinx, min_x, max_x, n_samples, noise
-)
-```
-
-Let's visualize our noisy function.
-
-```python
-import matplotlib.pyplot as plt
-plt.xlabel("x") ; plt.ylabel("y")
-plt.scatter(X_train, y_train, color="C0")
-_ = plt.plot(X_test, y_mesh, color="C1")
-```
-
-As mentioned previously, we fit our training data with a simple
-polynomial function. Here, we choose a degree equal to 10 so that the model
-is flexible enough to capture $x \times \sin(x)$.
-
-```python
-from sklearn.preprocessing import PolynomialFeatures
-from sklearn.linear_model import LinearRegression, QuantileRegressor
-from sklearn.pipeline import Pipeline
-
-degree_polyn = 10
-polyn_model = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", LinearRegression())
-    ]
-)
-polyn_model_quant = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", QuantileRegressor(
-            solver="highs",
-            alpha=0,
-        ))
-    ]
-)
-```
-
-We then estimate the prediction intervals for all the strategies with a simple
-`fit` and `predict` process. The lower and upper bounds of the prediction intervals
-are then saved in dictionaries. Here, we set an alpha value of 0.05
-in order to target a coverage of 95% for our prediction intervals.
-
-```python
-from typing import Union, Optional
-from typing_extensions import TypedDict
-from mapie.regression import MapieRegressor
-from mapie.quantile_regression import MapieQuantileRegressor
-from mapie.subsample import Subsample
-from sklearn.model_selection import train_test_split
-Params = TypedDict("Params", {"method": str, "cv": Union[int, str, Subsample], "alpha": Optional[float]})
-STRATEGIES = {
-    "naive": Params(method="naive"),
-    "jackknife": Params(method="base", cv=-1),
-    "jackknife_plus": Params(method="plus", cv=-1),
-    "jackknife_minmax": Params(method="minmax", cv=-1),
-    "cv": Params(method="base", cv=10),
-    "cv_plus": Params(method="plus", cv=10),
-    "cv_minmax": Params(method="minmax", cv=10),
-    "jackknife_plus_ab": Params(method="plus", cv=Subsample(n_resamplings=50)),
-    "jackknife_minmax_ab": Params(method="minmax", cv=Subsample(n_resamplings=50)),
-    "conformalized_quantile_regression": Params(method="quantile", cv="split", alpha=0.05)
-}
-y_pred, y_pis = {}, {}
-for strategy, params in STRATEGIES.items():
-    if strategy == "conformalized_quantile_regression":
-        mapie = MapieQuantileRegressor(polyn_model_quant, **params)
-        mapie.fit(X_train, y_train, random_state=1)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test)
-    else:
-        mapie = MapieRegressor(polyn_model, **params)
-        mapie.fit(X_train, y_train)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=0.05)
-```
-
-Let's now compare the target confidence intervals with the predicted intervals obtained
-with the Jackknife+, Jackknife-minmax, CV+, CV-minmax, Jackknife+-after-Bootstrap, and conformalized quantile regression (CQR) strategies. Note that for the Jackknife-after-Bootstrap method, we use the :class:`mapie.subsample.Subsample` object, which allows us to train bootstrapped models. Note also that the CQR method is run with :class:`MapieQuantileRegressor`, using a "split" strategy.
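-
-Before looking at the plots, it may help to recall, in the notation of the jackknife+
-literature (Barber et al., 2021) rather than of the code above, what the "plus" methods
-compute. Writing $\hat{\mu}_{-i}$ for the model fitted without the $i$-th training point
-and $R_i = |y_i - \hat{\mu}_{-i}(x_i)|$ for the leave-one-out residuals, the jackknife+
-interval at a new point $x$ is
-
-$$
-\hat{C}_{n, \alpha}^{\mathrm{J+}}(x) =
-\left[
-\hat{q}_{n, \alpha}^{-}\left\{\hat{\mu}_{-i}(x) - R_i\right\},\;
-\hat{q}_{n, \alpha}^{+}\left\{\hat{\mu}_{-i}(x) + R_i\right\}
-\right],
-$$
-
-where $\hat{q}_{n, \alpha}^{-}$ and $\hat{q}_{n, \alpha}^{+}$ are finite-sample-corrected
-empirical quantiles of order $\alpha$ and $1 - \alpha$ over the $n$ leave-one-out terms.
-Roughly speaking, the "minmax" variants use the smallest and largest leave-one-out
-predictions instead of $\hat{\mu}_{-i}(x)$, which can only give wider, more conservative
-intervals, and the CV variants replace the leave-one-out models by K-fold models.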
-
-```python
-def plot_1d_data(
-    X_train,
-    y_train,
-    X_test,
-    y_test,
-    y_sigma,
-    y_pred,
-    y_pred_low,
-    y_pred_up,
-    ax=None,
-    title=None
-):
-    ax.set_xlabel("x") ; ax.set_ylabel("y")
-    ax.fill_between(X_test, y_pred_low, y_pred_up, alpha=0.3)
-    ax.scatter(X_train, y_train, color="red", alpha=0.3, label="Training data")
-    ax.plot(X_test, y_test, color="gray", label="True confidence intervals")
-    ax.plot(X_test, y_test - y_sigma, color="gray", ls="--")
-    ax.plot(X_test, y_test + y_sigma, color="gray", ls="--")
-    ax.plot(X_test, y_pred, color="blue", alpha=0.5, label="Prediction intervals")
-    if title is not None:
-        ax.set_title(title)
-    ax.legend()
-```
-
-```python
-strategies = ["jackknife_plus", "jackknife_minmax", "cv_plus", "cv_minmax", "jackknife_plus_ab", "conformalized_quantile_regression"]
-n_figs = len(strategies)
-fig, axs = plt.subplots(3, 2, figsize=(9, 13))
-coords = [axs[0, 0], axs[0, 1], axs[1, 0], axs[1, 1], axs[2, 0], axs[2, 1]]
-for strategy, coord in zip(strategies, coords):
-    plot_1d_data(
-        X_train.ravel(),
-        y_train.ravel(),
-        X_test.ravel(),
-        y_mesh.ravel(),
-        np.full((X_test.shape[0]), 1.96*noise).ravel(),
-        y_pred[strategy].ravel(),
-        y_pis[strategy][:, 0, 0].ravel(),
-        y_pis[strategy][:, 1, 0].ravel(),
-        ax=coord,
-        title=strategy
-    )
-```
-
-At first glance, the strategies give similar results and the
-prediction intervals are very close to the true confidence intervals.
-Let's confirm this by comparing the prediction interval widths over
-$x$ between all strategies.
-
-```python
-fig, ax = plt.subplots(1, 1, figsize=(7, 5))
-ax.axhline(1.96*2*noise, ls="--", color="k", label="True width")
-for strategy in STRATEGIES:
-    ax.plot(X_test, y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0], label=strategy)
-ax.set_xlabel("x")
-ax.set_ylabel("Prediction Interval Width")
-_ = ax.legend(fontsize=10, loc=[1, 0.4])
-```
-
-As expected, the prediction intervals estimated by the Naive method
-are slightly too narrow. The Jackknife, Jackknife+, CV, CV+, and J+aB give
-similar widths that are very close to the true width. On the other hand,
-the widths estimated by Jackknife-minmax and CV-minmax are slightly too
-wide. Note that the widths given by the Naive, Jackknife, and CV strategies
-are constant because a single model is used for prediction: the
-perturbed models are ignored at prediction time.
-
-It is interesting to observe that the CQR strategy produces more variable widths,
-often much larger but sometimes also smaller than those of the other methods.
-With homoscedastic noise, CQR is therefore not the preferred method.
-
-
-Let's now compare the *effective* coverage, namely the fraction of test
-points whose true values lie within the prediction intervals, given by
-the different strategies.
-
-```python
-import pandas as pd
-from mapie.metrics import regression_coverage_score
-pd.DataFrame([
-    [
-        regression_coverage_score(
-            y_test, y_pis[strategy][:, 0, 0], y_pis[strategy][:, 1, 0]
-        ),
-        (
-            y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0]
-        ).mean()
-    ] for strategy in STRATEGIES
-], index=STRATEGIES, columns=["Coverage", "Width average"]).round(2)
-```
-
-All strategies except the Naive one give effective coverage close to the expected
-0.95 value (recall that alpha = 0.05), confirming the theoretical guarantees.
-
-
-## 2. Estimating the aleatoric uncertainty of heteroscedastic noisy data
-
-
-Let's define again the $x \times \sin(x)$ function and another simple function
-that generates one-dimensional data with normal noise, uniformly in a given interval.
-
-```python
-def x_sinx(x):
-    """One-dimensional x*sin(x) function."""
-    return x*np.sin(x)
-```
-
-```python
-def get_1d_data_with_heteroscedastic_noise(funct, min_x, max_x, n_samples, noise):
-    """
-    Generate 1D noisy data uniformly from the given function,
-    with noise standard deviation increasing linearly with x.
-    """
-    np.random.seed(59)
-    X_train = np.linspace(min_x, max_x, n_samples)
-    np.random.shuffle(X_train)
-    X_test = np.linspace(min_x, max_x, n_samples*5)
-    y_train = funct(X_train) + (np.random.normal(0, noise, len(X_train)) * X_train)
-    y_test = funct(X_test) + (np.random.normal(0, noise, len(X_test)) * X_test)
-    y_mesh = funct(X_test)
-    return X_train.reshape(-1, 1), y_train, X_test.reshape(-1, 1), y_test, y_mesh
-```
-
-We first generate noisy one-dimensional data uniformly on an interval.
-Here, the noise is *heteroscedastic*, since its standard deviation increases
-linearly with $x$.
-
-```python
-min_x, max_x, n_samples, noise = 0, 5, 300, 0.5
-X_train, y_train, X_test, y_test, y_mesh = get_1d_data_with_heteroscedastic_noise(
-    x_sinx, min_x, max_x, n_samples, noise
-)
-```
-
-Let's visualize our noisy function. As $x$ increases, the data become noisier.
-
-```python
-import matplotlib.pyplot as plt
-plt.xlabel("x") ; plt.ylabel("y")
-plt.scatter(X_train, y_train, color="C0")
-_ = plt.plot(X_test, y_mesh, color="C1")
-```
-
-As mentioned previously, we fit our training data with a simple
-polynomial function. Here, we choose a degree equal to 10 so that the model
-is flexible enough to capture $x \times \sin(x)$.
-
-```python
-from sklearn.preprocessing import PolynomialFeatures
-from sklearn.linear_model import LinearRegression, QuantileRegressor
-from sklearn.pipeline import Pipeline
-
-degree_polyn = 10
-polyn_model = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", LinearRegression())
-    ]
-)
-polyn_model_quant = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", QuantileRegressor(
-            solver="highs",
-            alpha=0,
-        ))
-    ]
-)
-```
-
-We then estimate the prediction intervals for all the strategies with a simple
-`fit` and `predict` process. The lower and upper bounds of the prediction intervals
-are then saved in dictionaries. Here, we set an alpha value of 0.05
-in order to target a coverage of 95% for our prediction intervals.
-
-```python
-Params = TypedDict("Params", {"method": str, "cv": Union[int, str, Subsample], "alpha": Optional[float]})
-STRATEGIES = {
-    "naive": Params(method="naive"),
-    "jackknife": Params(method="base", cv=-1),
-    "jackknife_plus": Params(method="plus", cv=-1),
-    "jackknife_minmax": Params(method="minmax", cv=-1),
-    "cv": Params(method="base", cv=10),
-    "cv_plus": Params(method="plus", cv=10),
-    "cv_minmax": Params(method="minmax", cv=10),
-    "jackknife_plus_ab": Params(method="plus", cv=Subsample(n_resamplings=50)),
-    "conformalized_quantile_regression": Params(method="quantile", cv="split", alpha=0.05)
-}
-y_pred, y_pis = {}, {}
-for strategy, params in STRATEGIES.items():
-    if strategy == "conformalized_quantile_regression":
-        mapie = MapieQuantileRegressor(polyn_model_quant, **params)
-        mapie.fit(X_train, y_train, random_state=1)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test)
-    else:
-        mapie = MapieRegressor(polyn_model, **params)
-        mapie.fit(X_train, y_train)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=0.05)
-```
-
-Once again, let's compare the target confidence intervals with prediction intervals obtained with the Jackknife+, Jackknife-minmax, CV+, CV-minmax, Jackknife+-after-Bootstrap, and CQR strategies.
-
-```python
-def plot_1d_data(
-    X_train,
-    y_train,
-    X_test,
-    y_test,
-    y_sigma,
-    y_pred,
-    y_pred_low,
-    y_pred_up,
-    ax=None,
-    title=None
-):
-    ax.set_xlabel("x") ; ax.set_ylabel("y")
-    ax.fill_between(X_test, y_pred_low, y_pred_up, alpha=0.3)
-    ax.scatter(X_train, y_train, color="red", alpha=0.3, label="Training data")
-    ax.plot(X_test, y_test, color="gray", label="True confidence intervals")
-    ax.plot(X_test, y_test - y_sigma, color="gray", ls="--")
-    ax.plot(X_test, y_test + y_sigma, color="gray", ls="--")
-    ax.plot(X_test, y_pred, color="blue", alpha=0.5, label="Prediction intervals")
-    if title is not None:
-        ax.set_title(title)
-    ax.legend()
-```
-
-```python
-strategies = ["jackknife_plus", "jackknife_minmax", "cv_plus", "cv_minmax", "jackknife_plus_ab", "conformalized_quantile_regression"]
-n_figs = len(strategies)
-fig, axs = plt.subplots(3, 2, figsize=(9, 13))
-coords = [axs[0, 0], axs[0, 1], axs[1, 0], axs[1, 1], axs[2, 0], axs[2, 1]]
-for strategy, coord in zip(strategies, coords):
-    plot_1d_data(
-        X_train.ravel(),
-        y_train.ravel(),
-        X_test.ravel(),
-        y_mesh.ravel(),
-        (1.96*noise*X_test).ravel(),
-        y_pred[strategy].ravel(),
-        y_pis[strategy][:, 0, 0].ravel(),
-        y_pis[strategy][:, 1, 0].ravel(),
-        ax=coord,
-        title=strategy
-    )
-```
-
-We can observe that all of the strategies except CQR seem to have similar constant prediction intervals.
-On the other hand, the CQR strategy offers a solution that adapts the prediction
-intervals to the local noise.
-
-```python
-fig, ax = plt.subplots(1, 1, figsize=(7, 5))
-ax.plot(X_test, 1.96*2*noise*X_test, ls="--", color="k", label="True width")
-for strategy in STRATEGIES:
-    ax.plot(X_test, y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0], label=strategy)
-ax.set_xlabel("x")
-ax.set_ylabel("Prediction Interval Width")
-_ = ax.legend(fontsize=10, loc=[1, 0.4])
-```
-
-One can observe that all the strategies behave in a similar way as in the first example
-shown previously. One exception is the CQR method, which takes the heteroscedasticity
-of the data into account: we observe very low interval widths at low values of $x$.
-This is the only method that even roughly follows the true width, and it is therefore
-the preferred method for heteroscedastic data.
-Notice also that the true width is greater (respectively lower) than the width
-predicted by the other methods for $x \gtrsim 3$ (respectively $x \lesssim 3$).
-This means that, while the marginal coverage is correct for these methods, the
-conditional coverage is likely not guaranteed, as we will observe in the next figure.
-
-```python
-def get_heteroscedastic_coverage(y_test, y_pis, STRATEGIES, bins):
-    recap = {}
-    for i in range(len(bins)-1):
-        bin1, bin2 = bins[i], bins[i+1]
-        name = f"[{bin1}, {bin2}]"
-        recap[name] = []
-        for strategy in STRATEGIES:
-            indices = np.where((X_test >= bins[i]) * (X_test <= bins[i+1]))
-            y_test_trunc = np.take(y_test, indices)
-            y_low_ = np.take(y_pis[strategy][:, 0, 0], indices)
-            y_high_ = np.take(y_pis[strategy][:, 1, 0], indices)
-            score_coverage = regression_coverage_score(y_test_trunc[0], y_low_[0], y_high_[0])
-            recap[name].append(score_coverage)
-    recap_df = pd.DataFrame(recap, index=STRATEGIES)
-    return recap_df
-```
-
-```python
-bins = [0, 1, 2, 3, 4, 5]
-heteroscedastic_coverage = get_heteroscedastic_coverage(y_test, y_pis, STRATEGIES, bins)
-```
-
-```python
-fig = plt.figure()
-heteroscedastic_coverage.T.plot.bar(figsize=(12, 4), alpha=0.7)
-plt.axhline(0.95, ls="--", color="k")
-plt.ylabel("Conditional coverage")
-plt.xlabel("x bins")
-plt.xticks(rotation=0)
-plt.ylim(0.8, 1.0)
-plt.legend(loc=[1, 0])
-```
-
-Let's now conclude by summarizing the *effective* coverage, namely the fraction of test
-points whose true values lie within the prediction intervals, given by
-the different strategies.
-
-```python
-import pandas as pd
-from mapie.metrics import regression_coverage_score
-pd.DataFrame([
-    [
-        regression_coverage_score(
-            y_test, y_pis[strategy][:, 0, 0], y_pis[strategy][:, 1, 0]
-        ),
-        (
-            y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0]
-        ).mean()
-    ] for strategy in STRATEGIES
-], index=STRATEGIES, columns=["Coverage", "Width average"]).round(2)
-```
-
-All the strategies reach the desired coverage; however, the CQR strategy produces
-much narrower intervals than all the other methods. With heteroscedastic noise,
-CQR is therefore the preferred method.
-
-
-## 3. Estimating the epistemic uncertainty of out-of-distribution data
-
-
-Let's now consider one-dimensional data without noise, with training features drawn
-from a normal distribution.
-The goal is to explore how the prediction intervals evolve for new data
-that lie outside the distribution of the training data in order to see how the strategies
-can capture the *epistemic* uncertainty.
-For a comparison of the epistemic and aleatoric uncertainties, please have a look at this
-[source](https://en.wikipedia.org/wiki/Uncertainty_quantification).
-
-
-Let's start by generating and showing the data.
-
-```python
-def get_1d_data_with_normal_distrib(funct, mu, sigma, n_samples, noise):
-    """
-    Generate noisy 1D data with normal distribution from given function
-    and noise standard deviation.
-    """
-    np.random.seed(59)
-    X_train = np.random.normal(mu, sigma, n_samples)
-    X_test = np.arange(mu-4*sigma, mu+4*sigma, sigma/20.)
-    y_train, y_mesh, y_test = funct(X_train), funct(X_test), funct(X_test)
-    y_train += np.random.normal(0, noise, y_train.shape[0])
-    y_test += np.random.normal(0, noise, y_test.shape[0])
-    return X_train.reshape(-1, 1), y_train, X_test.reshape(-1, 1), y_test, y_mesh
-```
-
-```python
-mu = 0 ; sigma = 2 ; n_samples = 1000 ; noise = 0.
-X_train, y_train, X_test, y_test, y_mesh = get_1d_data_with_normal_distrib(
-    x_sinx, mu, sigma, n_samples, noise
-)
-```
-
-```python
-plt.xlabel("x") ; plt.ylabel("y")
-plt.scatter(X_train, y_train, color="C0")
-_ = plt.plot(X_test, y_test, color="C1")
-```
-
-As before, we estimate the prediction intervals using a polynomial
-function of degree 10 and show the results for the Jackknife+ and CV+
-strategies.
-
-```python
-polyn_model_quant = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", QuantileRegressor(
-            solver="highs-ds",
-            alpha=0,
-        ))
-    ]
-)
-Params = TypedDict("Params", {"method": str, "cv": Union[int, str, Subsample], "alpha": Optional[float]})
-STRATEGIES = {
-    "naive": Params(method="naive"),
-    "jackknife": Params(method="base", cv=-1),
-    "jackknife_plus": Params(method="plus", cv=-1),
-    "jackknife_minmax": Params(method="minmax", cv=-1),
-    "cv": Params(method="base", cv=10),
-    "cv_plus": Params(method="plus", cv=10),
-    "cv_minmax": Params(method="minmax", cv=10),
-    "jackknife_plus_ab": Params(method="plus", cv=Subsample(n_resamplings=50)),
-    "jackknife_minmax_ab": Params(method="minmax", cv=Subsample(n_resamplings=50)),
-    "conformalized_quantile_regression": Params(method="quantile", cv="split", alpha=0.05)
-}
-y_pred, y_pis = {}, {}
-for strategy, params in STRATEGIES.items():
-    if strategy == "conformalized_quantile_regression":
-        mapie = MapieQuantileRegressor(polyn_model_quant, **params)
-        mapie.fit(X_train, y_train, random_state=1)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test)
-    else:
-        mapie = MapieRegressor(polyn_model, **params)
-        mapie.fit(X_train, y_train)
-        y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=0.05)
-```
-
-```python
-strategies = ["jackknife_plus", "jackknife_minmax", "cv_plus", "cv_minmax", "jackknife_plus_ab", "conformalized_quantile_regression"]
-n_figs = len(strategies)
-fig, axs = plt.subplots(3, 2, figsize=(9, 13))
-coords = [axs[0, 0], axs[0, 1], axs[1, 0], axs[1, 1], axs[2, 0], axs[2, 1]]
-for strategy, coord in zip(strategies, coords):
-    plot_1d_data(
-        X_train.ravel(),
-        y_train.ravel(),
-        X_test.ravel(),
-        y_mesh.ravel(),
-        1.96*noise,
-        y_pred[strategy].ravel(),
-        y_pis[strategy][:, 0, :].ravel(),
-        y_pis[strategy][:, 1, :].ravel(),
-        ax=coord,
-        title=strategy
-    )
-```
-
-At first glance, our polynomial function does not give accurate
-predictions with respect to the true function when $|x| > 6$.
-The prediction intervals estimated with the Jackknife+ do not seem to
-increase significantly, unlike the CV+ method, whose prediction intervals
-capture a high uncertainty when $|x| > 6$.
-
-
-Let's now compare the prediction interval widths between all strategies.
-
-
-```python
-fig, ax = plt.subplots(1, 1, figsize=(7, 5))
-ax.set_yscale("log")
-for strategy in STRATEGIES:
-    ax.plot(X_test, y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0], label=strategy)
-ax.set_xlabel("x")
-ax.set_ylabel("Prediction Interval Width")
-ax.legend(fontsize=10, loc=[1, 0.4]);
-```
-
-The prediction interval widths start to increase exponentially
-for $|x| > 4$ for the CV+, CV-minmax, Jackknife-minmax, and quantile
-strategies. On the other hand, the prediction intervals estimated by
-Jackknife+ remain roughly constant until $|x| \sim 5$ before
-increasing.
-The CQR strategy seems to perform well; however, for extreme values of the data,
-the quantile regression fails to give reliable results, as it outputs negative
-widths for the prediction intervals. This occurs because the quantile regressor
-fitted with quantile $1 - \alpha/2$ gives lower values than the quantile regressor
-fitted with quantile $\alpha/2$, i.e. the estimated quantiles cross. Note that a
-warning will be issued when this occurs.
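-
-A quick way to check how many test points are affected is to count the negative
-interval widths directly (a small sketch reusing the `y_pis` dictionary built above):
-
-```python
-# Number of test points where the CQR upper bound falls below the lower bound
-widths_cqr = (
-    y_pis["conformalized_quantile_regression"][:, 1, 0]
-    - y_pis["conformalized_quantile_regression"][:, 0, 0]
-)
-print(f"Negative widths: {(widths_cqr < 0).sum()} out of {len(widths_cqr)}")
-```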
-
-```python
-pd.DataFrame([
-    [
-        regression_coverage_score(
-            y_test, y_pis[strategy][:, 0, 0], y_pis[strategy][:, 1, 0]
-        ),
-        (
-            y_pis[strategy][:, 1, 0] - y_pis[strategy][:, 0, 0]
-        ).mean()
-    ] for strategy in STRATEGIES
-], index=STRATEGIES, columns=["Coverage", "Width average"]).round(3)
-```
-
-In conclusion, the Jackknife-minmax, CV+, CV-minmax, and Jackknife-minmax-ab strategies
-are more conservative than the Jackknife+ strategy, and tend to result in more
-reliable coverages for *out-of-distribution* data. It is therefore
-advised to use these strategies for predictions with new
-out-of-distribution data.
-Note however that there are no theoretical guarantees on the coverage level
-for out-of-distribution data.
-Also note that the CQR strategy should not be relied on for interval width here,
-as is clear from the negative interval widths observed in these results.
-
-
-## 4. Estimating the uncertainty with different sklearn-compatible regressors
-
-
-MAPIE can be used with any kind of sklearn-compatible regressor. Here, we
-illustrate this by comparing the prediction intervals estimated by the CV+ method using
-different models:
-
-- the same polynomial function as before.
-
-- an XGBoost model using the Scikit-learn API.
-
-- a simple neural network, a Multilayer Perceptron with three dense layers, using the KerasRegressor wrapper.
-
-Once again, let's use our noisy one-dimensional data obtained from a
-uniform distribution.
-
-```python
-min_x, max_x, n_samples, noise = -5, 5, 100, 0.5
-X_train, y_train, X_test, y_test, y_mesh = get_1d_data_with_constant_noise(
-    x_sinx, min_x, max_x, n_samples, noise
-)
-```
-
-```python
-plt.xlabel("x") ; plt.ylabel("y")
-plt.plot(X_test, y_mesh, color="C1")
-_ = plt.scatter(X_train, y_train)
-```
-
-Let's then define the models. The boosting model uses 100 shallow trees with a maximum
-depth of 2, while the Multilayer Perceptron has two hidden dense layers with 20 neurons
-each and ReLU activations.
-
-
-```python
-import os
-os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"  # disable debugging logs from Tensorflow
-from tensorflow.keras import Sequential
-from tensorflow.keras.layers import Dense
-from scikeras.wrappers import KerasRegressor
-def mlp():
-    """
-    Two-layer MLP model
-    """
-    model = Sequential([
-        Dense(units=20, input_shape=(1,), activation="relu"),
-        Dense(units=20, activation="relu"),
-        Dense(units=1)
-    ])
-    model.compile(loss="mean_squared_error", optimizer="adam")
-    return model
-```
-
-```python
-polyn_model = Pipeline(
-    [
-        ("poly", PolynomialFeatures(degree=degree_polyn)),
-        ("linear", LinearRegression())
-    ]
-)
-```
-
-```python
-from xgboost import XGBRegressor
-xgb_model = XGBRegressor(
-    max_depth=2,
-    n_estimators=100,
-    tree_method="hist",
-    random_state=59,
-    learning_rate=0.1,
-    verbosity=0,
-    nthread=-1
-)
-mlp_model = KerasRegressor(
-    build_fn=mlp,
-    epochs=500,
-    verbose=0
-)
-```
-
-Let's now use MAPIE to estimate the prediction intervals using the CV+ method
-and compare the resulting prediction intervals.
-
-```python
-models = [polyn_model, xgb_model, mlp_model]
-model_names = ["polyn", "xgb", "mlp"]
-for name, model in zip(model_names, models):
-    mapie = MapieRegressor(model, method="plus", cv=5)
-    mapie.fit(X_train, y_train)
-    y_pred[name], y_pis[name] = mapie.predict(X_test, alpha=0.05)
-```
-
-```python
-fig, axs = plt.subplots(1, 3, figsize=(20, 6))
-for name, ax in zip(model_names, axs):
-    plot_1d_data(
-        X_train.ravel(),
-        y_train.ravel(),
-        X_test.ravel(),
-        y_mesh.ravel(),
-        1.96*noise,
-        y_pred[name].ravel(),
-        y_pis[name][:, 0, 0].ravel(),
-        y_pis[name][:, 1, 0].ravel(),
-        ax=ax,
-        title=name
-    )
-```
-
-```python
-fig, ax = plt.subplots(1, 1, figsize=(7, 5))
-for name in model_names:
-    ax.plot(X_test, y_pis[name][:, 1, 0] - y_pis[name][:, 0, 0])
-ax.axhline(1.96*2*noise, ls="--", color="k")
-ax.set_xlabel("x")
-ax.set_ylabel("Prediction Interval Width")
-ax.legend(model_names + ["True width"], fontsize=8);
-```
-
-As expected with the CV+ method, the prediction intervals are a bit
-conservative, since they are slightly wider than the true intervals.
-However, the CV+ method gives promising results for the three models,
-since the estimated intervals closely follow the true intervals over the
-whole range of $x$.
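-
-As a final check (a minimal sketch reusing the objects defined above), the coverage and
-average width of the CV+ intervals can be summarized for the three regressors in the
-same way as in the previous sections:
-
-```python
-# Empirical coverage and mean interval width of the CV+ intervals, per model
-pd.DataFrame([
-    [
-        regression_coverage_score(
-            y_test, y_pis[name][:, 0, 0], y_pis[name][:, 1, 0]
-        ),
-        (y_pis[name][:, 1, 0] - y_pis[name][:, 0, 0]).mean()
-    ] for name in model_names
-], index=model_names, columns=["Coverage", "Width average"]).round(3)
-```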