Nickisdope/11-785ActiveLearningProject

11-785ActiveLearningProject

This is the repository for the 11-785 course project:

Incorporating Active Learning with Self-Training based Abusive Language Detection

Group Members (Andrew ID):

  • Chuiyu Wang (chuiyuw)
  • Yimeng Gu (yimengg)
  • Haoyue Zhao (haoyuez)
  • Xiang Liu (xliu4)

1. Datasets

  1. English Datasets:

https://drive.google.com/drive/folders/1NjCtBvOKRjCiVbmEWKmqhHiweW5NdhtH?usp=sharing

  2. Turkish Datasets:

https://drive.google.com/drive/folders/1HK_NeTg_PN7iRwl73O9RD8xUVBUhUjda?usp=sharing

2. Code Preparation

  • Download the English datasets to the ./EnglishDataset/ directory
  • Download the Turkish datasets to the ./TurkishDataset/ directory
  • Clone the Hugging Face xlm-roberta repository
  • Check that the pretrained model parameter file is around 1 GB; if it is not, use wget to re-download it (already scripted in a code cell)
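
The preparation checks above can be sketched in Python as follows. This is a minimal sketch, not code from the notebook: the function names, the 25% tolerance, and the 1,000,000,000-byte target are assumptions chosen for illustration. One likely reason the parameter file comes out too small is that the clone was made without Git LFS, which leaves a small pointer file in place of the real weights.

```python
import os

# Assumed values for illustration; they are not taken from the notebook.
EXPECTED_BYTES = 1_000_000_000   # "around 1 GB"
TOLERANCE = 0.25                 # accept sizes within 25% of the expected value


def needs_redownload(path, expected_bytes=EXPECTED_BYTES, tolerance=TOLERANCE):
    """Return True if the file is missing or its size is far from expected_bytes."""
    if not os.path.isfile(path):
        return True
    size = os.path.getsize(path)
    return abs(size - expected_bytes) > tolerance * expected_bytes


def missing_dataset_dirs(root=".", names=("EnglishDataset", "TurkishDataset")):
    """Return the dataset directories that do not exist under root."""
    return [n for n in names if not os.path.isdir(os.path.join(root, n))]
```

Call `missing_dataset_dirs()` from the repository root to confirm both dataset directories are in place, and pass the path of the parameter file inside your clone to `needs_redownload(...)`. The exact file name depends on the cloned model repository, so it is left to the reader.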

3. Code Running

  • Run the code cells one at a time, in order
  • If you have already cloned the Hugging Face xlm-roberta repository, skip the relevant code cell
  • The code contains 2 stages:

1. Pretrain the model on the English dataset

2. Finetune the model on the Turkish dataset
