AutoML is a lightweight library for creating ML models following a data-centric AI approach:
- Label on Kili
- Train a model with AutoML and evaluate its performance in one line of code
- Push predictions to Kili to accelerate the labeling in one line of code
- Prioritize labeling on Kili so that the data that will improve your model the most gets labeled first
Iterate.
Once you are satisfied with the performance, serve the model in one line of code and monitor its performance with Kili, keeping a human in the loop.
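Concretely, one pass through this loop maps onto the commands detailed in the sections below (here assuming your API key and project ID are stored in shell variables):

kiliautoml train --api-key $KILI_API_KEY --project-id $KILI_PROJECT_ID
kiliautoml predict --api-key $KILI_API_KEY --project-id $KILI_PROJECT_ID
kiliautoml prioritize --api-key $KILI_API_KEY --project-id $KILI_PROJECT_ID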
You can try AutoML on a mock image classification project with this notebook.
We recommend creating a new conda or virtual environment before cloning, since many packages will be installed:
conda create --name automl python=3.7
conda activate automl
git clone https://github.com/kili-technology/automl.git
cd automl
git submodule update --init
Then install the requirements:
pip install -r kiliautoml/utils/ultralytics/yolov5/requirements.txt
pip install -e .
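If the installation succeeded, the kiliautoml entry point is on your PATH; a quick sanity check (assuming the CLI exposes the standard --help flag) is:

kiliautoml --help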
We made AutoML very simple to use. The following sections detail how to use the main commands.
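All the commands below read your Kili API key and project ID from shell variables, so set them once per session; the placeholder values here are yours to replace:

export KILI_API_KEY="<your Kili API key>"
export KILI_PROJECT_ID="<your Kili project ID>"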
We train the model with the following command:
kiliautoml train \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
By default, the library uses Weights and Biases to track the training and the quality of the predictions.
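If you prefer not to upload training runs, one option is to switch the wandb client itself to offline mode before training; this is a standard Weights and Biases environment variable, not an AutoML flag:

export WANDB_MODE=offline

Runs are then kept locally and can be synced later if needed.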
The trained model is then stored in the AutoML library's cache, under $HOME/.cache/kili/automl.
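You can check which models are cached by listing that directory; the exact subfolder layout may vary with the project and model:

ls ~/.cache/kili/automl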
The kiliautoml train command does the following:
- Selects the models related to the tasks declared in the project ontology.
- Retrieves the asset data from Kili and converts it into the input format of each model.
- Fine-tunes the models on the input data.
- Outputs the model loss.
Here are the supported ML frameworks and the tasks they are used for:
- Hugging Face (NER, Text Classification)
- YOLOv5 (Object Detection)
- spaCy (coming soon)
- Simple Transformers (coming soon)
- Catalyst (coming soon)
- XGBoost & LightGBM (coming soon)
Compute the model loss to infer when you can stop labeling.
Once trained, the models are used to predict labels and add pre-annotations to the assets that have not yet been labeled by the annotators. The annotators can then validate or correct these pre-annotations in the Kili user interface.
kiliautoml predict \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
Using trained models to push pre-annotations onto unlabeled assets typically speeds up labeling by 10%.
You can also use a model trained on another project, provided the two projects share the same ontology:
kiliautoml predict \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID \
--from-project $ANOTHER_KILI_PROJECT_ID
Once roughly 10% of the assets in a project have been labeled, you can reorder the remaining labeling queue so that the assets that will most improve the model's performance are labeled first.
kiliautoml prioritize \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
This command changes the priority queue of the assets to be labeled. To do this, AutoML uses a mix of diversity sampling and uncertainty sampling.
Note: for image classification projects only.
To err is human; fortunately, there are methods to detect potential annotation problems. The label_errors command identifies assets whose labels may be wrong and creates a 'potential_label_error' filter in the project's asset exploration view:
kiliautoml label_errors \
--api-key $KILI_API_KEY \
--project-id $KILI_PROJECT_ID
AutoML currently supports the following tasks:
- Natural Language Processing (NLP)
- Named Entity Recognition
- Text Classification
- Image
- Object Detection
- Image Classification
AutoML is a utility library that trains and serves models. It is your responsibility to determine whether the model's performance is high enough for your use case.
Don't hesitate to contribute!