A classifier that identifies properties of a scientific paper from its text, in particular whether it is AI-generated.
- Create a Python virtual environment:
python -m venv path/to/venv/folder
- Activate the environment (the command depends on your shell and system). In bash on Linux:
source path/to/venv/folder/bin/activate
- Clone and enter this repo
- Install the requirements:
pip install -r requirements.txt
(this may take a while because of PyTorch)
- Run:
wandb login
- Follow the instructions to create a profile if you don't have one, then get and paste your API key
NOTE: If you don't have a decent GPU with CUDA installed, training will fall back to the CPU, which will be very, very slow. To check whether a GPU is visible, run:
nvidia-smi
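The same check can be done from Python before starting a long training run. The sketch below is only an illustration: `detect_nvidia_gpu` and `pick_device` are hypothetical helper names, not part of this repo, and how the training script actually selects its device is an assumption. It asks PyTorch via `torch.cuda.is_available()` when PyTorch is importable, and otherwise falls back to probing for `nvidia-smi` on the PATH.

```python
import shutil
import subprocess


def detect_nvidia_gpu() -> bool:
    """Return True if nvidia-smi is on PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def pick_device() -> str:
    """Hypothetical helper: choose "cuda" or "cpu" for training."""
    try:
        import torch  # installed via requirements.txt
    except ImportError:
        # PyTorch is missing, so fall back to the driver-level probe.
        return "cuda" if detect_nvidia_gpu() else "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Training would run on: {pick_device()}")
```

If this prints `cpu`, expect training to be very slow, as noted above.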
Go to http://73.254.3.61:5003/ to use the classifier: provide an abstract and receive a prediction of whether it is fake or real.
The discriminator is trained in a Generative Adversarial Network (GAN) setup, with a fine-tuned pretrained GPT-2 as the generator and a pretrained GPT-2 without fine-tuning as the discriminator.
It does not work very well at the moment. More computational power and training time are needed.
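To make the adversarial setup concrete, here is a minimal sketch of the two objectives in a standard GAN, written in plain Python so it runs without PyTorch. It assumes the textbook binary cross-entropy formulation with a non-saturating generator loss; the repo's actual loss functions are not described here, so treat this as an illustration only. The function names are hypothetical, and each probability stands for the discriminator's estimate that an abstract is real.

```python
import math


def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clamped to avoid log(0)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(real_probs, fake_probs):
    """Low when real abstracts score near 1 and generated ones near 0."""
    losses = [bce(p, 1.0) for p in real_probs]
    losses += [bce(p, 0.0) for p in fake_probs]
    return sum(losses) / len(losses)


def generator_loss(fake_probs):
    """Non-saturating form: low when generated abstracts are scored as real."""
    return sum(bce(p, 1.0) for p in fake_probs) / len(fake_probs)


if __name__ == "__main__":
    # Made-up discriminator outputs for two real and two generated abstracts.
    real, fake = [0.9, 0.8], [0.2, 0.3]
    print("discriminator loss:", discriminator_loss(real, fake))
    print("generator loss:", generator_loss(fake))
```

The two losses pull in opposite directions on the generated samples, which is what makes the training adversarial: the discriminator improves by scoring them low, the generator by getting them scored high.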