Interactive Clustering
Python package used to apply NLP interactive clustering methods.
- GitHub repository: https://github.com/cognitivefactory/interactive-clustering/tree/1.0.0
- Main documentation: https://cognitivefactory.github.io/interactive-clustering/
- PyPI distribution: https://pypi.org/project/cognitivefactory-interactive-clustering/1.0.0/
Quick description
Interactive clustering is a method intended to assist in the design of a training data set.
This iterative process starts from an unlabeled dataset and alternates between two substeps:
- the user defines constraints on data sampled by the computer;
- the computer partitions the data using a constrained clustering algorithm.
Thus, at each step of the process:
- the user corrects the clustering of the previous steps using constraints, and
- the computer offers a corrected and more relevant data partitioning for the next step.
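The loop described above can be sketched in plain Python. This is a toy illustration, not the package's API: the function names `sample_pairs`, `constrained_clustering` and `interactive_clustering`, the `MUST_LINK` / `CANNOT_LINK` labels, the sample data and the simulated user (`oracle`) are all assumptions made for the example. The clustering step is a deliberately naive stand-in (connected components of must-link constraints), not a real constrained clustering algorithm.

```python
from itertools import combinations


def sample_pairs(ids, clusters, constraints, batch_size):
    """Toy sampler: unannotated pairs that currently sit in different clusters."""
    candidates = [
        (a, b)
        for a, b in combinations(ids, 2)
        if (a, b) not in constraints and clusters[a] != clusters[b]
    ]
    return candidates[:batch_size]


def constrained_clustering(ids, constraints):
    """Toy stand-in: group data linked (directly or not) by MUST_LINK constraints."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), kind in constraints.items():
        if kind == "MUST_LINK":
            parent[find(a)] = find(b)
    return {i: find(i) for i in ids}


def interactive_clustering(ids, oracle, batch_size=3, max_iterations=10):
    """Alternate between user annotation and clustering until nothing is left to ask."""
    constraints = {}
    clusters = constrained_clustering(ids, constraints)
    for _ in range(max_iterations):
        # Substep 1: the user annotates constraints on sampled data.
        pairs = sample_pairs(ids, clusters, constraints, batch_size)
        if not pairs:
            break
        for a, b in pairs:
            constraints[(a, b)] = oracle(a, b)
        # Substep 2: the computer partitions the data again.
        clusters = constrained_clustering(ids, constraints)
    return clusters, constraints


# Hypothetical data: five questions and the intent a user would assign to each.
true_intents = {"q1": "card", "q2": "card", "q3": "loan", "q4": "loan", "q5": "card"}


def oracle(a, b):
    """Simulated user: answers according to the hidden intents."""
    return "MUST_LINK" if true_intents[a] == true_intents[b] else "CANNOT_LINK"


clusters, constraints = interactive_clustering(sorted(true_intents), oracle)
groups = {}
for data_id, cluster_id in clusters.items():
    groups.setdefault(cluster_id, []).append(data_id)
print(sorted(groups.values()))
```

In this sketch the simulated user always answers consistently, so the must-link groups never violate a cannot-link constraint. With real annotations, the clustering step has to enforce both kinds of constraints explicitly.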
The process uses several objects:
- a constraints manager: it stores the constraints annotated by the user and reports the information deduced from them (such as constraints implied by transitivity, or inconsistencies between constraints);
- a constraints sampler: it selects the most relevant data for the user to annotate with constraints;
- a constrained clustering algorithm: it partitions the data while respecting the constraints provided by the user.
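To make the role of the constraints manager more concrete, here is a minimal, self-contained sketch of transitivity and inconsistency detection. It does not use the package: the class `ToyConstraintsManager`, its methods `add` and `infer`, and the `MUST_LINK` / `CANNOT_LINK` labels are hypothetical names chosen for illustration. The real constraints manager may be organized quite differently.

```python
class ToyConstraintsManager:
    """Hypothetical manager: stores pairwise constraints and deduces implied ones."""

    KINDS = ("MUST_LINK", "CANNOT_LINK")

    def __init__(self):
        self._must = set()    # pairs annotated MUST_LINK
        self._cannot = set()  # pairs annotated CANNOT_LINK

    def _component(self, start):
        """All data reachable from `start` through MUST_LINK constraints."""
        seen = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for pair in self._must:
                if current in pair:
                    (other,) = pair - {current}
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        return seen

    def infer(self, a, b):
        """Return the constraint implied between a and b, or None if unknown."""
        comp_a = self._component(a)
        if b in comp_a:
            return "MUST_LINK"
        comp_b = self._component(b)
        for pair in self._cannot:
            x, y = tuple(pair)
            if (x in comp_a and y in comp_b) or (x in comp_b and y in comp_a):
                return "CANNOT_LINK"
        return None

    def add(self, a, b, kind):
        """Store a constraint, rejecting it if it contradicts what is already known."""
        if a == b:
            raise ValueError("a constraint needs two distinct data IDs")
        if kind not in self.KINDS:
            raise ValueError(f"unknown constraint type: {kind}")
        inferred = self.infer(a, b)
        if inferred is not None and inferred != kind:
            raise ValueError(f"{kind} on ({a}, {b}) conflicts with inferred {inferred}")
        target = self._must if kind == "MUST_LINK" else self._cannot
        target.add(frozenset((a, b)))


manager = ToyConstraintsManager()
manager.add("q1", "q2", "MUST_LINK")
manager.add("q2", "q3", "CANNOT_LINK")

# q1 and q3 were never annotated together, but the answer follows by transitivity.
print(manager.infer("q1", "q3"))
# Nothing is known yet about q4.
print(manager.infer("q1", "q4"))

# Annotating q1 and q3 as MUST_LINK would now be inconsistent.
try:
    manager.add("q1", "q3", "MUST_LINK")
    conflict_detected = False
except ValueError:
    conflict_detected = True
print(conflict_detected)
```

Deducing constraints this way also helps a sampler: a pair whose answer is already implied does not need to be shown to the user.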
References
- PhD report:
Schild, E. (2024, in press). De l'Importance de Valoriser l'Expertise Humaine dans l'Annotation : Application à la Modélisation de Textes en Intentions à l'aide d'un Clustering Interactif. Université de Lorraine.
- First presentation:
Schild, E., Durantin, G., Lamirel, J.C., & Miconi, F. (2021). Conception itérative et semi-supervisée d'assistants conversationnels par regroupement interactif des questions. In EGC 2021 - 21èmes Journées Francophones Extraction et Gestion des Connaissances. Edition RNTI. <hal-03133007>.
- Theoretical study:
Schild, E., Durantin, G., Lamirel, J., & Miconi, F. (2022). Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering. International Journal of Data Warehousing and Mining (IJDWM), 18(2), 1-19. http://doi.org/10.4018/IJDWM.298007. <hal-03648041>.
- Methodological discussion:
Schild, E., Durantin, G., & Lamirel, J.C. (2021). Concevoir un assistant conversationnel de manière itérative et semi-supervisée avec le clustering interactif. In Atelier - Fouille de Textes - Text Mine 2021 - En conjonction avec EGC 2021. <hal-03133060>.
Other links
- Several comparative studies of Interactive Clustering methodology on NLP datasets:
Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255
- A web application designed for NLP data annotation using Interactive Clustering methodology:
Schild, E. (2021). cognitivefactory/interactive-clustering-gui. Zenodo. https://doi.org/10.5281/zenodo.4775270
How to cite
Schild, E. (2021). cognitivefactory/interactive-clustering. Zenodo. https://doi.org/10.5281/zenodo.4775251.