
Kili AutoML

AutoML is a lightweight library to create ML models in a data-centric AI way:

  1. Label on Kili
  2. Train a model with AutoML and evaluate its performance in one line of code
  3. Push predictions to Kili to accelerate the labeling in one line of code
  4. Prioritize labeling on Kili so that the assets that will most improve your model are labeled first

Iterate.

Once you are satisfied with the performance, serve the model in one line of code and monitor it while keeping a human in the loop with Kili.

Quickstart

You can try AutoML on a mock image classification project with this notebook, which you can open directly in Colab.

Installation

We recommend creating a new conda environment or virtualenv before cloning, because the installation pulls in many packages:

conda create --name automl python=3.7
conda activate automl
git clone https://github.com/kili-technology/automl.git
cd automl
git submodule update --init

Then install the requirements:

pip install -r kiliautoml/utils/ultralytics/yolov5/requirements.txt
pip install -e .
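
To check that the CLI is available on your PATH, you can print its help message (assuming the CLI exposes the standard --help flag):

kiliautoml --help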

Usage

We made AutoML very simple to use. The following sections detail how to use the main commands.

Train a model

We train the model with the following command:

kiliautoml train \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID
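
The commands in this README assume that your Kili API key and project ID are available as environment variables, for example:

export KILI_API_KEY=<your Kili API key>
export KILI_PROJECT_ID=<your Kili project ID>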

By default, the library uses Weights and Biases to track training and the quality of the predictions. The trained model is stored in the AutoML cache at $HOME/.cache/kili/automl. kiliautoml train does the following:

  • Selects the models related to the tasks declared in the project ontology.
  • Retrieves Kili's asset data and converts it into the input format for each model.
  • Finetunes the model on the input data.
  • Outputs the model loss.

Here are the supported ML frameworks and the tasks they are used for.

  • Hugging Face (NER, Text Classification)
  • YOLOv5 (Object Detection)
  • spaCy (coming soon)
  • Simple Transformers (coming soon)
  • Catalyst (coming soon)
  • XGBoost & LightGBM (coming soon)

The reported model loss helps you infer when you can stop labeling.


Push predictions to Kili

Once trained, the models are used to predict labels and add pre-annotations to the assets that have not yet been labeled. The annotators can then validate or correct these pre-annotations in the Kili user interface.

kiliautoml predict \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID

Using trained models to push pre-annotations onto unlabeled assets typically speeds up labeling by 10%.


You can also use a model trained on another project, provided both projects share the same ontology:

kiliautoml predict \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID \
    --from-project $ANOTHER_KILI_PROJECT_ID

Prioritize labeling on Kili

Once roughly 10 percent of the assets in a project have been labeled, you can reorder the remaining unlabeled assets so that those most likely to improve the model's performance are labeled first.

kiliautoml prioritize \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID

This command changes the priority queue of the assets to be labeled. To do this, AutoML uses a mix of diversity sampling and uncertainty sampling.
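
As an illustration of the general idea only (not the library's actual implementation), a priority score can be built by combining an uncertainty term with a diversity term:

import numpy as np

def priority_scores(probas: np.ndarray, embeddings: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Toy mix of uncertainty and diversity sampling (illustrative sketch).

    probas: (n_assets, n_classes) predicted class probabilities.
    embeddings: (n_assets, d) feature vectors for the same assets.
    alpha: weight between uncertainty (1.0) and diversity (0.0).
    """
    # Uncertainty: 1 - confidence of the most likely class.
    uncertainty = 1.0 - probas.max(axis=1)

    # Diversity: distance of each asset to the mean embedding,
    # so assets far from the "typical" asset rank higher.
    center = embeddings.mean(axis=0)
    distances = np.linalg.norm(embeddings - center, axis=1)
    diversity = distances / (distances.max() + 1e-12)

    return alpha * uncertainty + (1 - alpha) * diversity

Assets with the highest scores would then be placed at the front of the labeling queue.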

Label errors on Kili

Note: for image classification projects only.

To err is human; fortunately, there are methods to detect potential annotation problems. label_errors.py identifies potentially mislabeled assets and creates a 'potential_label_error' filter in the project's asset exploration view:

kiliautoml label_errors \
    --api-key $KILI_API_KEY \
    --project-id $KILI_PROJECT_ID
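
In a much simplified form, the underlying idea is to flag assets where the model assigns a low probability to the human-provided label (the method actually used by AutoML may differ):

import numpy as np

def potential_label_errors(probas, given_labels, threshold=0.3):
    """Return indices of assets whose annotated label gets a low predicted probability.

    probas: (n_assets, n_classes) out-of-sample predicted probabilities.
    given_labels: (n_assets,) integer class index assigned by the annotators.
    threshold: assets below this probability are flagged (illustrative value).
    """
    probas = np.asarray(probas)
    given_labels = np.asarray(given_labels)
    # Probability the model gives to the label chosen by the annotator.
    confidence_in_given_label = probas[np.arange(len(given_labels)), given_labels]
    return np.where(confidence_in_given_label < threshold)[0]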

ML Tasks

AutoML currently supports the following tasks:

  • Natural Language Processing (NLP)
  • Image
    • Object detection
    • Image Classification

Disclaimer

AutoML is a utility library that trains and serves models. It is your responsibility to determine whether the model's performance is high enough for your use case.

Don't hesitate to contribute!
