- Chuiyu Wang (chuiyuw)
- Yimeng Gu (yimengg)
- Haoyue Zhao (haoyuez)
- Xiang Liu (xliu4)
- English Datasets:
https://drive.google.com/drive/folders/1NjCtBvOKRjCiVbmEWKmqhHiweW5NdhtH?usp=sharing
- Turkish Datasets:
https://drive.google.com/drive/folders/1HK_NeTg_PN7iRwl73O9RD8xUVBUhUjda?usp=sharing
- Download the English datasets to the ./EnglishDataset/ directory
- Download the Turkish datasets to the ./TurkishDataset/ directory
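The expected directory layout can be prepared from the command line; this is a minimal sketch, and the `gdown` commands (using the folder IDs from the Drive links above) are an optional assumption for those who prefer not to download through the browser:

```shell
# Create the dataset directories the notebook expects
mkdir -p ./EnglishDataset ./TurkishDataset

# Optional: fetch the Drive folders with gdown (`pip install gdown`).
# The folder IDs below are taken from the sharing links above.
# gdown --folder 1NjCtBvOKRjCiVbmEWKmqhHiweW5NdhtH -O ./EnglishDataset
# gdown --folder 1HK_NeTg_PN7iRwl73O9RD8xUVBUhUjda -O ./TurkishDataset
```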
- Clone the Hugging Face xlm-roberta repository
- Check that the pretrained-model parameter file is around 1 GB; if it is smaller, re-download it with wget (this is already scripted in a code cell)
- Run the code cells step by step
- If you have already cloned the Hugging Face xlm-roberta repository, skip the relevant code cell
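The size check above can be sketched as follows; the checkpoint path and download URL are assumptions (the notebook's scripted cell is authoritative), and anything well under ~1 GB is treated as a truncated download:

```shell
# Path to the pretrained checkpoint inside the cloned repo (assumed name)
MODEL_FILE="xlm-roberta-base/pytorch_model.bin"
MIN_BYTES=$((900 * 1024 * 1024))   # ~1 GB threshold

# Measure the file size in bytes, treating a missing file as 0
SIZE=0
[ -f "$MODEL_FILE" ] && SIZE=$(wc -c < "$MODEL_FILE")

if [ "$SIZE" -lt "$MIN_BYTES" ]; then
  echo "checkpoint missing or truncated ($SIZE bytes); re-download it, e.g.:"
  echo "  wget https://huggingface.co/xlm-roberta-base/resolve/main/pytorch_model.bin -O $MODEL_FILE"
else
  echo "checkpoint looks complete ($SIZE bytes)"
fi
```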
- The code contains 2 stages: