For this project, I scrape the entire corpus of speeches delivered during the 2017 UN General Assembly, as well as the official speech summaries posted on the UN website. Unsupervised topic modeling surfaces several relevant issues. Observing each country's dominant topic, as well as the distribution of important topics across the globe, gives insight into national priorities and strategic political positioning.
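Reading off a country's dominant topic is an argmax over that country's row of a document-topic matrix, and the global distribution is a count of those argmaxes. The sketch below assumes a hypothetical `doc_topic` list with one row of topic proportions per speech (the kind of output an LDA model gives) and a parallel list of country labels. The function names and the toy numbers are illustrative only, not results from this project.

```python
from collections import Counter


def dominant_topics(doc_topic, countries):
    """Map each country to the index of its highest-weight topic (row-wise argmax)."""
    return {
        country: max(range(len(weights)), key=lambda k: weights[k])
        for country, weights in zip(countries, doc_topic)
    }


def topic_distribution(dominant):
    """Count how many countries share each dominant topic."""
    return Counter(dominant.values())


# Toy input: 4 hypothetical countries, 3 topics (made-up proportions).
toy = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]
dom = dominant_topics(toy, ["A", "B", "C", "D"])
print(dom)                      # {'A': 0, 'B': 1, 'C': 2, 'D': 0}
print(topic_distribution(dom))  # Counter({0: 2, 1: 1, 2: 1})
```

Joining the per-country result to country codes is then enough to color a world map by dominant topic.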
In a first script, I scrape the UN General Assembly Debate website and collect every head of state's intervention, as well as the official statement summary. Statement summaries are saved as text files. Full statements are downloaded as PDFs and converted to text using pdftotext from the Xpdf suite. I also conduct initial cleaning on the text files.
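The scraping, conversion, and cleaning steps can be sketched with the standard library plus the `pdftotext` binary. This is a minimal sketch, not the original script: the page structure (statements linked as `<a href="...pdf">`), the base URL in the usage note, and the helper names `PdfLinkParser`, `pdf_to_text`, and `clean_text` are all assumptions. `pdftotext` is invoked as `pdftotext -enc UTF-8 <pdf> <txt>` and must be on `PATH`.

```python
import re
import subprocess
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of every <a href> that points to a .pdf file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # html.parser lowercases tag and attribute names
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def pdf_to_text(pdf_path, txt_path):
    """Convert one PDF with Xpdf's pdftotext and return the extracted text."""
    subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), str(txt_path)],
        check=True,  # raise if pdftotext fails on a malformed PDF
    )
    return Path(txt_path).read_text(encoding="utf-8")


def clean_text(raw):
    """Light cleaning: drop page breaks, rejoin hyphenated words, collapse whitespace."""
    text = raw.replace("\f", " ")                 # pdftotext marks page breaks with \f
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # "deve-\nlopment" -> "development"
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

For example, `PdfLinkParser("https://example.org/debate/")` (a placeholder URL) fed the page HTML exposes the statement links in `.pdf_links`. The hyphen rule is a trade-off: it also merges genuinely hyphenated words that happen to break at a line end (e.g. "self-\ndetermination").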
In a second script, I conduct text analysis (PCA on TF-IDF word frequencies and LDA topic modeling) on the full statements and the summaries.
I give more information on the project and present interesting results in this blog post.
Feel free to get in touch if you have questions or comments!
Nico