Master's Thesis - Learning what to learn: Generating Language Lessons using BERT

This repository contains the work for my master's thesis at KU Leuven. The goal of the thesis was to devise a method to partially automate the creation of language skill trees, such as the ones found in language learning apps, with Portuguese as the case study language. The main goal was split into two tasks: first, automatically classifying the difficulty of a text; second, classifying texts by semantic topic. More details can be found in the thesis text. Since GitHub does not support very large files, the models are hosted elsewhere.

Task 1 - Proficiency classification

The work for the first task can be found here. A BERTimbau model was fine-tuned on the COPLE2 corpus to classify Portuguese text by difficulty on the CEFR scale. The model can be accessed at the Hugging Face Model Hub.
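
As a rough illustration of how a sequence classifier's raw outputs become a CEFR level, here is a minimal sketch in plain Python. The label list and its order are assumptions made for this example (the real mapping lives in the fine-tuned model's configuration), and `predict_level` is a hypothetical helper, not code from this repository.

```python
import math

# Assumed label order for illustration only; check the model's own
# id-to-label mapping before relying on it.
CEFR_LABELS = ["A1", "A2", "B1", "B2", "C1"]


def softmax(logits):
    """Turn raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_level(logits, labels=CEFR_LABELS):
    """Return (label, probability) for the most likely CEFR level."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

In practice the logits would come from running the fine-tuned model on a tokenized Portuguese text; the sketch only covers the final step of picking a level.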

Task 1.5 - Classifying Tatoeba sentences by difficulty

Between the two tasks, Tatoeba sentences were classified by difficulty as an intermediate step. This work can be found here.
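
The intermediate step can be pictured with a short sketch like the one below. It assumes the Tatoeba sentence export is a tab-separated file with three columns (sentence id, language code, text) and that Portuguese is tagged `por`. The `toy_classifier` function is a hypothetical stand-in based on sentence length, used only so the example runs without a model; it is not the classifier used in the thesis.

```python
import csv
import io
from collections import defaultdict


def portuguese_sentences(tsv_file):
    """Yield the text of every Portuguese row in a Tatoeba-style TSV file."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumed column layout: id, language code, sentence text.
        if len(row) == 3 and row[1] == "por":
            yield row[2]


def group_by_level(sentences, classify):
    """Group sentences under the difficulty label returned by `classify`."""
    groups = defaultdict(list)
    for sentence in sentences:
        groups[classify(sentence)].append(sentence)
    return dict(groups)


def toy_classifier(sentence):
    """Hypothetical placeholder: short sentences are 'A1', the rest 'B1'."""
    return "A1" if len(sentence.split()) <= 3 else "B1"


sample = io.StringIO(
    "1\tpor\tEu gosto de café.\n"
    "2\teng\tI like coffee.\n"
    "3\tpor\tOlá!\n"
)
levels = group_by_level(portuguese_sentences(sample), toy_classifier)
```

Swapping `toy_classifier` for a function that calls a real difficulty model would leave the rest of the sketch unchanged.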

Task 2 - Topic modeling

Finally, the topic modeling work can be found here. It was done with BERTopic using default parameters, with the Portuguese Tatoeba sentences as documents. The model can be accessed at the Hugging Face Model Hub.
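
For readers unfamiliar with BERTopic, fitting a model with default parameters could look roughly like the sketch below. It assumes the `bertopic` package is installed and that `docs` is a list of sentence strings. `fit_topics` and `group_by_topic` are hypothetical helper names, not code from this repository. BERTopic assigns topic id -1 to outlier documents, which the grouping helper skips.

```python
from collections import defaultdict


def fit_topics(docs):
    """Fit BERTopic with default parameters and return (model, topic ids)."""
    # Imported lazily so the helper below works without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic()  # default parameters, as described above
    topics, _probabilities = topic_model.fit_transform(docs)
    return topic_model, topics


def group_by_topic(docs, topics):
    """Map each topic id to its documents, dropping outliers (topic -1)."""
    groups = defaultdict(list)
    for doc, topic in zip(docs, topics):
        if topic != -1:
            groups[topic].append(doc)
    return dict(groups)
```

A call such as `group_by_topic(docs, topics)` then gives one list of sentences per discovered topic, which is a convenient shape for inspecting what each topic contains.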

joaoDossena/MasterThesis

Folders and files

Latest commit

History

Repository files navigation

Master's Thesis - Learning what to learn: Generating Language Lessons using BERT

Task 1 - Proficiency classification

Task 1.5 - Classifying Tatoeba sentences by difficulty

Task 2 - Topic modeling

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages