DOC: update v1 migration guide and corresponding docstring examples
Valentin-Laurent committed Jan 9, 2025
1 parent 344be4f commit e70191a
Showing 2 changed files with 162 additions and 111 deletions.
225 changes: 154 additions & 71 deletions doc/v1_migration_guide.rst
@@ -29,7 +29,44 @@ In MAPIE v0.9, ``MapieRegressor`` managed all conformal regression methods under
+--------------------+--------------------------------------------------------------------------+


2. Method changes
-----------------

In MAPIE v1, the conformal prediction workflow is more streamlined and modular, with distinct methods for training, calibration, and prediction. This workflow consists of four steps.

Step 1: Data splitting
~~~~~~~~~~~~~~~~~~~~~~
In v0.9, data splitting was done in two phases. First, the data ``(X, y)`` was split into training ``(X_train, y_train)`` and test ``(X_test, y_test)`` sets using ``train_test_split`` from ``sklearn``. Second, the split between training and calibration was either done manually or handled internally by ``MapieRegressor``.

In v1, a ``conf_split`` function has been introduced to split the data ``(X, y)`` into training ``(X_train, y_train)``, calibration ``(X_calib, y_calib)``, and test sets ``(X_test, y_test)``.

This new approach in v1 gives users more control over data splitting, making it easier to manage training, calibration, and testing phases explicitly. The ``CrossConformalRegressor`` is an exception, where train/calibration splitting happens internally because cross-validation requires more granular control over data splits.
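
A sketch of the intended v1 usage (assuming ``conf_split`` lives in ``mapie.utils`` and returns the splits in the order shown below):

.. code-block:: python

    from sklearn.datasets import make_regression
    from mapie.utils import conf_split

    X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

    # One call produces the training, calibration, and test pairs
    X_train, y_train, X_conf, y_conf, X_test, y_test = conf_split(X, y)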

Step 2 & 3: Model training and calibration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In v0.9, the ``fit`` method handled both model training and calibration.

In v1, MAPIE separates training from calibration:

- ``.fit()`` method:

  - In v1, ``fit`` only trains the model on training data, without handling calibration.
  - Additional fitting parameters, like ``sample_weight``, should be included in ``fit_params``, keeping this method focused on training alone.

- ``.conformalize()`` method:

  - This new method performs calibration after fitting, using separate calibration data ``(X_calib, y_calib)``.
  - ``predict_params`` can be passed here, allowing independent control over calibration and prediction stages (see the sketch after this list).
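
A minimal sketch of this two-step flow with ``SplitConformalRegressor`` (the import path and constructor arguments follow the examples in section 4):

.. code-block:: python

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from mapie.estimator import SplitConformalRegressor

    X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
    X_train, X_conf, y_train, y_conf = train_test_split(X, y)

    v1 = SplitConformalRegressor(estimator=LinearRegression(), confidence_level=0.9)

    # Step 2: training only
    # (additional fitting arguments, e.g. sample_weight, would go through fit_params)
    v1.fit(X_train, y_train)

    # Step 3: calibration only, on the held-out calibration data
    v1.conformalize(X_conf, y_conf)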

Step 4: Making predictions (``predict`` and ``predict_set`` methods)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In MAPIE v0.9, both point predictions and prediction intervals were produced through the ``predict`` method.

MAPIE v1 introduces two distinct methods for prediction:

- ``.predict_set()`` is dedicated to generating prediction intervals (i.e., lower and upper bounds), clearly separating interval predictions from point predictions.
- ``.predict()`` now focuses solely on producing point predictions, as shown in the sketch below.
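
Continuing the sketch from steps 2 and 3 (assuming a held-out ``X_test``):

.. code-block:: python

    # Prediction intervals: lower and upper bounds for each test point
    intervals = v1.predict_set(X_test)

    # Point predictions only
    points = v1.predict(X_test)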



3. Key parameter changes
------------------------

``conformity_score``
@@ -53,12 +90,19 @@ Specifies the approach for calculating prediction intervals, especially in advan
- **v0.9**: Part of ``MapieRegressor``. Configured for the main prediction process.
- **v1**: Specific to ``CrossConformalRegressor`` and ``JackknifeAfterBootstrapRegressor``, indicating the interval calculation approach (``"base"``, ``"plus"``, or ``"minmax"``).

``cv``
~~~~~~~
The ``cv`` parameter manages the cross-validation configuration, accepting either an integer to indicate the number of data splits or a ``BaseCrossValidator`` object for custom data splitting.

- **v0.9**: The ``cv`` parameter was included in ``MapieRegressor``, where it handled cross-validation. The option ``cv="prefit"`` was available for models that were already pre-trained.
- **v1**: The ``cv`` parameter is now only present in ``CrossConformalRegressor``, with the ``prefit`` option removed (see the sketch below).
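
For example (a sketch; the import path and constructor arguments follow the examples in section 4):

.. code-block:: python

    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold
    from mapie.estimator import CrossConformalRegressor

    # An integer sets the number of folds...
    v1 = CrossConformalRegressor(estimator=Ridge(), confidence_level=0.9, cv=5)

    # ...or any scikit-learn BaseCrossValidator can be passed for custom splitting
    v1 = CrossConformalRegressor(
        estimator=Ridge(),
        confidence_level=0.9,
        cv=KFold(n_splits=5, shuffle=True, random_state=0),
    )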

``groups``
~~~~~~~~~~~
The ``groups`` parameter is used to specify group labels for cross-validation, ensuring that the same group is not present in both training and calibration sets.

- **v0.9**: Passed as a parameter to the ``fit`` method.
- **v1**: The ``groups`` parameter is now only present in ``CrossConformalRegressor``. It is passed to the ``.conformalize()`` method instead of the ``.fit()`` method, as illustrated below. In other classes (like ``SplitConformalRegressor``), groups can be handled directly by the user during data splitting.
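
A before/after sketch (``v1`` is a fitted ``CrossConformalRegressor`` and ``groups`` an array of group labels, as in the cross-conformal example of section 4):

.. code-block:: python

    # v0.9: groups were passed at fit time
    # v0.fit(X, y, sample_weight=sample_weight, groups=groups)

    # v1: groups are passed at conformalization time
    v1.conformalize(X, y, groups=groups)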

``prefit``
~~~~~~~~~~
@@ -81,12 +125,12 @@ Defines additional parameters exclusively for prediction.
- **v0.9**: Passed additional parameters in a flexible but less explicit manner, sometimes mixed within training configurations.
- **v1**: Now structured as a dedicated dictionary, ``predict_params``, to be used during calibration (``conformalize`` method) and prediction stages, ensuring no overlap with training parameters.
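
For example, a sketch with a scikit-learn ``DecisionTreeRegressor``, whose ``predict`` method accepts a ``check_input`` keyword (the data splits are assumed to exist as in the examples of section 4):

.. code-block:: python

    from sklearn.tree import DecisionTreeRegressor
    from mapie.estimator import SplitConformalRegressor

    v1 = SplitConformalRegressor(estimator=DecisionTreeRegressor(), confidence_level=0.9)
    v1.fit(X_train, y_train)

    # The prediction-time keyword is declared once, at conformalization
    v1.conformalize(X_conf, y_conf, predict_params={"check_input": True})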

``agg_function``, ``aggregation_method``, ``aggregate_predictions``, and ``ensemble``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
These parameters control how the predictions of the individual conformal regressors in an ensemble are aggregated.

- **v0.9**: Previously, the ``agg_function`` parameter had two uses: to aggregate predictions when setting ``ensemble=True`` in the ``predict`` method, and to specify the aggregation technique in ``JackknifeAfterBootstrapRegressor``.
- **v1**: The ``agg_function`` parameter has been split into two distinct parameters: ``aggregate_predictions`` and ``aggregation_method``. ``aggregate_predictions`` is specific to ``CrossConformalRegressor``: it specifies how predictions from the multiple conformal regressors are aggregated when making point predictions (the ``ensemble`` argument of ``predict`` has been removed). ``aggregation_method`` is specific to ``JackknifeAfterBootstrapRegressor``: it specifies the aggregation technique for combining predictions across the different bootstrap samples during calibration. See the sketch below.
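
A before/after sketch, based on the cross-conformal example in section 4 (``v0``, ``v1``, ``regression_model`` and ``X_test`` are defined as in that example):

.. code-block:: python

    # v0.9: the aggregation function was set at initialisation and activated at prediction time
    # v0 = MapieRegressor(estimator=regression_model, cv=3, agg_function="median")
    # prediction_points_v0 = v0.predict(X_test, ensemble=True)

    # v1: the aggregation is requested directly at prediction time
    prediction_points_v1 = v1.predict(X_test, aggregate_predictions="median")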

``Other parameters``
~~~~~~~~~~~~~~~~~~~~
@@ -101,118 +145,157 @@ Some parameters' name have been improved for clarity:
- ``symmetry`` -> ``symmetric_intervals``


4. Migration example: MAPIE v0.9 to MAPIE v1
--------------------------------------------

Below is a side-by-side example of code in MAPIE v0.9 and its equivalent in MAPIE v1 using the new modular classes and methods.

Example 1: Split Conformal Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Description
############
Split conformal prediction is a widely used method for generating prediction intervals: the data is split into training, calibration, and test sets; the model is trained on the training set, calibrated on the calibration set, and then used to make predictions on the test set. In MAPIE v1, ``SplitConformalRegressor`` replaces the older ``MapieRegressor`` with a more modular design and a simplified API.

MAPIE v0.9 Code
###############

Below is the MAPIE v0.9 code for split conformal prediction with a pre-fitted model:

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from mapie.estimator import MapieRegressor
    from mapie.conformity_scores import GammaConformityScore
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
    # GammaConformityScore requires strictly positive targets, so shift y
    y = y - y.min() + 1

    # Step 1: Split data
    X_train, X_conf_test, y_train, y_conf_test = train_test_split(X, y)
    X_conf, X_test, y_conf, y_test = train_test_split(X_conf_test, y_conf_test)

    # Step 2: Train the model on the training set
    prefit_model = LinearRegression().fit(X_train, y_train)

    # Step 3: Initialize MapieRegressor with the prefit model and gamma conformity score
    v0 = MapieRegressor(
        estimator=prefit_model,
        cv="prefit",
        conformity_score=GammaConformityScore()
    )

    # Step 4: Fit MAPIE on the calibration set
    v0.fit(X_conf, y_conf)

    # Step 5: Make point predictions and prediction intervals
    prediction_intervals_v0 = v0.predict(X_test, alpha=0.1)[1][:, :, 0]
    prediction_points_v0 = v0.predict(X_test)

Equivalent MAPIE v1 code
########################

Below is the equivalent MAPIE v1 code for split conformal prediction:

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from mapie.estimator import SplitConformalRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
    # GammaConformityScore requires strictly positive targets, so shift y
    y = y - y.min() + 1

    # Step 1: Split data
    X_train, X_conf_test, y_train, y_conf_test = train_test_split(X, y)
    X_conf, X_test, y_conf, y_test = train_test_split(X_conf_test, y_conf_test)

    # Step 2: Train the model on the training set
    prefit_model = LinearRegression().fit(X_train, y_train)

    # Step 3: Initialize SplitConformalRegressor with the prefit model,
    # the gamma conformity score, and the prefit option
    v1 = SplitConformalRegressor(
        estimator=prefit_model,
        confidence_level=0.9,
        conformity_score="gamma",
        prefit=True
    )

    # Step 4: Calibrate with the conformalize method on the calibration set
    # (v1.fit() is not needed here because the provided model is already fitted)
    v1.conformalize(X_conf, y_conf)

    # Step 5: Make point predictions and prediction intervals
    prediction_intervals_v1 = v1.predict_set(X_test)
    prediction_points_v1 = v1.predict(X_test)

Example 2: Cross-Conformal Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Description
############

Cross-conformal prediction extends split conformal prediction by using multiple cross-validation folds to improve the efficiency of the prediction intervals. In MAPIE v1, ``CrossConformalRegressor`` replaces the older ``MapieRegressor`` for this purpose.

MAPIE v0.9 code
###############

Below is the MAPIE v0.9 code for cross-conformal prediction:

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from mapie.estimator import MapieRegressor
    from mapie.conformity_scores import AbsoluteConformityScore
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
    groups = np.random.randint(0, 3, X.shape[0])
    sample_weight = np.random.rand(X.shape[0])

    X_train, X_test, y_train, y_test = train_test_split(X, y)

    regression_model = RandomForestRegressor(
        n_estimators=100,
        max_depth=5
    )

    v0 = MapieRegressor(
        estimator=regression_model,
        cv=3,
        conformity_score=AbsoluteConformityScore(),  # absolute residual scores
        agg_function="median",
    )

    v0.fit(X_train, y_train, sample_weight=sample_weight, groups=groups)

    prediction_intervals_v0 = v0.predict(X_test, alpha=0.1)[1][:, :, 0]
    prediction_points_v0 = v0.predict(X_test, ensemble=True)

Equivalent MAPIE v1 code
########################

Below is the equivalent MAPIE v1 code for cross-conformal prediction:

.. code-block:: python

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from mapie.estimator import CrossConformalRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_regression

    X_full, y_full = make_regression(n_samples=100, n_features=2, noise=0.1)
    groups = np.random.randint(0, 3, X_full.shape[0])
    sample_weight = np.random.rand(X_full.shape[0])

    X, X_test, y, y_test = train_test_split(X_full, y_full)

    regression_model = RandomForestRegressor(
        n_estimators=100,
        max_depth=5
    )

    v1 = CrossConformalRegressor(
        estimator=regression_model,
        confidence_level=0.9,
        cv=3,
        conformity_score="absolute",
    )

    v1.fit(X, y, fit_params={"sample_weight": sample_weight})
    v1.conformalize(X, y, groups=groups)

    prediction_intervals_v1 = v1.predict_set(X_test)
    prediction_points_v1 = v1.predict(X_test, aggregate_predictions="median")

48 changes: 8 additions & 40 deletions mapie_v1/regression.py
@@ -80,16 +80,8 @@ class SplitConformalRegressor:
>>> from sklearn.linear_model import Ridge
>>> X, y = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X_train, X_conf_test, y_train, y_conf_test = train_test_split(X, y)
>>> X_conf, X_test, y_conf, y_test = train_test_split(X_conf_test, y_conf_test)
>>> mapie_regressor = SplitConformalRegressor(
... estimator=Ridge(),
@@ -337,16 +329,8 @@ class CrossConformalRegressor:
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> X_full, y_full = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_regressor = CrossConformalRegressor(
... estimator=Ridge(),
@@ -649,16 +633,8 @@ class JackknifeAfterBootstrapRegressor:
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> X_full, y_full = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_regressor = JackknifeAfterBootstrapRegressor(
... estimator=Ridge(),
@@ -967,16 +943,8 @@ class ConformalizedQuantileRegressor:
>>> from sklearn.linear_model import QuantileRegressor
>>> X, y = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X_train, X_conf_test, y_train, y_conf_test = train_test_split(X, y)
>>> X_conf, X_test, y_conf, y_test = train_test_split(X_conf_test, y_conf_test)
>>> mapie_regressor = ConformalizedQuantileRegressor(
... estimator=QuantileRegressor(),
