Hi all and welcome to the climate misinformation data science repo!
Feel free to create your own branches and start playing around with the data that is stored in the labelled_data directory.
You will find the text preprocessing and embedding pipeline in the text_preprocessing directory.
In the models directory you will find implementations of several models and their performance evaluation.
The notebooks directory contains additional EDA.
- Set up a virtual environment and add this environment to ipykernel.
The virtual environment allows you to install and use project-specific packages without interfering with other projects. You will need to activate the virtual environment each time before running code.
python3 -m venv ~/venvs/cm-venv
source ~/venvs/cm-venv/bin/activate
pip install -r requirements.txt
python -m ipykernel install --name=cm-venv
- Activate the virtual environment.
source ~/venvs/cm-venv/bin/activate
- Your terminal prompt will now begin with '(cm-venv)'.
- Launch Jupyter Notebook.
jupyter notebook
- Select 'Kernel' > 'Change kernel' > cm-venv
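To confirm the notebook picked up the right kernel, you can run a quick sanity check in a cell (a minimal snippet, assuming the venv path used above):
```
import sys
# Should point inside ~/venvs/cm-venv when the cm-venv kernel is active
print(sys.executable)
```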
Alternatively, you can build a Docker image and run the code inside a container:
```
docker build -t cd-ds .
docker run --rm -it -p 8887:8887 -v "`pwd`":/data cd-ds
```
Then follow the 127.0.0.1 link printed in the terminal to open Jupyter.
So far we have a fairly simple model that classifies articles into one of three categories:
- 0 - Climate denying
- 1 - Climate related (not climate denying)
- 2 - Not climate related
We have experimented with several classification algorithms (e.g. support vector machines, random forests, adaptive boosting) and feature representations (TF-IDF, normalised bag-of-words, word2vec).
So far, the best results have come from a random forest with TF-IDF features.
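For anyone new to the repo, here is a minimal sketch of that kind of TF-IDF + random forest pipeline in scikit-learn. The file path and column names (labelled_data/articles.csv, text, label) are illustrative placeholders, not necessarily the repo's actual layout:
```
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical file/column names -- adjust to the actual labelled_data layout
df = pd.read_csv("labelled_data/articles.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10_000, stop_words="english")),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
pipeline.fit(X_train, y_train)

# Labels: 0 = climate denying, 1 = climate related (not denying), 2 = not climate related
print(classification_report(y_test, pipeline.predict(X_test)))
```
Swapping out the "clf" step makes it easy to compare the other algorithms mentioned above.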