---
title: "#NICAR18 Tweets"
output:
  github_document:
    toc: true
    df_print: "kable"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = "#>")
```
This is a dedicated repository for tracking [#NICAR18 tweets](https://twitter.com/hashtag/NICAR18?f=tweets&vertical=default&src=hash) (the official hashtag of the 2018 annual Computer-Assisted Reporting Conference).
## Data
### rtweet
Whether you look up the status IDs or search/stream new tweets, you'll need the [rtweet](http://rtweet.info) package. The code below installs rtweet [if it's not already installed] and loads it.
```{r}
## install rtweet if not already
if (!requireNamespace("rtweet", quietly = TRUE)) {
  install.packages("rtweet")
}
## load rtweet
library(rtweet)
```
Our data collection method is described in detail below. However, if you want to get straight to the data, simply run the following code:
```{r, eval=FALSE}
## download status IDs file
download.file(
  "https://github.com/computer-assisted-reporting/NICAR18/blob/master/data/search-ids.rds?raw=true",
  "NICAR18_status_ids.rds"
)

## read status IDs from downloaded file
ids <- readRDS("NICAR18_status_ids.rds")
## lookup data associated with status ids
rt <- rtweet::lookup_tweets(ids$status_id)
```
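If you'd like a flattened copy of the data for use outside of R, one option is rtweet's CSV writer (a sketch; the file name here is just an example):

```{r, eval=FALSE}
## save a flattened, spreadsheet-friendly copy of the tweets data
rtweet::write_as_csv(rt, "NICAR18_tweets.csv")
```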
### Search
One of the easiest ways to gather Twitter data is to search for it (using Twitter's REST API). Unlike streaming, searching makes it possible to go back in time. Unfortunately, Twitter sets a rather restrictive cap (roughly nine days) on how far back you can go. Regardless, searching for tweets is often the preferred method. For example, the code below is set up in such a way that it can be executed once [or even several times] a day throughout the conference. See the [R code here](R/search.R).
Here's some example code showing essentially what we're doing to collect the data:
```{r, echo=FALSE}
source("R/search.R")
```
```{r, eval=FALSE}
## search terms
nicar18conf <- c("NICAR18", "NICAR2018", "IRE_NICAR")
## search for up to 10,000 tweets mentioning nicar18
rt <- search_tweets(paste(nicar18conf, collapse = " OR "), n = 10000)
```
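Because the search endpoint only reaches back about nine days, repeated runs have to be merged and de-duplicated. Here's a minimal sketch of that bookkeeping (the file path is illustrative; the actual logic lives in [R/search.R](R/search.R)):

```{r, eval=FALSE}
## file used to accumulate tweets across runs (illustrative path)
rds <- "data/search.rds"

## append new results to any previously saved tweets,
## keeping one row per status ID
if (file.exists(rds)) {
  old <- readRDS(rds)
  rt <- rbind(old, rt[!rt$status_id %in% old$status_id, ])
}

## save the updated data set
saveRDS(rt, rds)
```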
## Explore
To explore the Twitter data, we recommend using the [tidyverse](http://tidyverse.org) packages. We're also using a customized [ggplot2](http://ggplot2.org) theme. See the [R code here](R/tidyggplot.R).
```{r, echo=FALSE}
source("R/tidyggplot.R")
```
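For illustration, a customized theme can be little more than a wrapper around `ggplot2::theme_minimal()`; the settings below are assumptions for demonstration, not the exact theme defined in [R/tidyggplot.R](R/tidyggplot.R):

```{r, eval=FALSE}
library(ggplot2)

## example custom theme: minimal base, bold titles, no minor grid lines
theme_nicar <- function(base_size = 12, base_family = "sans") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank()
    )
}
```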
### Tweet frequency over time
To create the image below, the data were summarized into a time series-like data frame and then plotted in order to depict the frequency of tweets (aggregated in two-hour intervals) about \#nicar18 over time. See the [R code here](R/ts.R).
```{r timefreq, echo=FALSE}
source("R/ts.R")
```
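A rough equivalent of that aggregation (not the exact code in [R/ts.R](R/ts.R)) uses rtweet's built-in time series helper:

```{r, eval=FALSE}
## plot frequency of tweets aggregated in two-hour intervals
rtweet::ts_plot(rt, by = "2 hours") +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of #NICAR18 tweets over time"
  )
```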
<p align="center"><img width="100%" height="auto" src="img/timefreq.png" /></p>
### Positive/negative sentiment
Next, some sentiment analysis of the tweets so far. See the [R code here](R/sentiment.R).
```{r sentiment, echo=FALSE}
source("R/sentiment.R")
```
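As a sketch of what that entails (the actual code is in [R/sentiment.R](R/sentiment.R)), each tweet's text can be scored with, e.g., the [syuzhet](https://cran.r-project.org/package=syuzhet) package and averaged over time:

```{r, eval=FALSE}
## score each tweet's text (positive > 0, negative < 0)
rt$sentiment <- syuzhet::get_sentiment(rt$text)

## floor timestamps to two-hour intervals and average the scores
rt$interval <- as.POSIXct(
  floor(as.numeric(rt$created_at) / 7200) * 7200,
  origin = "1970-01-01", tz = "UTC"
)
aggregate(sentiment ~ interval, data = rt, FUN = mean)
```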
<p align="center"><img width="100%" height="auto" src="img/sentiment.png" /></p>
### Semantic networks
The image below depicts a quick and dirty visualization of the semantic network (connections via retweet, quote, mention, or reply) as it is observed in the data. See the [R code here](R/network.R).
```{r network, echo=FALSE}
source("R/network.R")
```
<p align="center"><img width="100%" height="auto" src="img/network.png" /></p>
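For reference, here's a rough sketch of how such a network can be built from the mentions field (assuming rtweet's `screen_name` and `mentions_screen_name` columns; the actual code is in [R/network.R](R/network.R)):

```{r, eval=FALSE}
library(igraph)

## edge list: one row per author -> mentioned-user pair
el <- do.call(rbind, lapply(seq_len(nrow(rt)), function(i) {
  to <- unlist(rt$mentions_screen_name[i])
  to <- to[!is.na(to)]
  if (length(to) == 0) return(NULL)
  cbind(from = rt$screen_name[i], to = to)
}))

## build the graph and size nodes by (logged) degree
g <- graph_from_edgelist(el, directed = TRUE)
size <- log1p(degree(g))
plot(g, vertex.size = size, vertex.label.cex = 0.5)
```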
Ideally, the network visualization would be an interactive, searchable graphic. Since it's not, I've printed out the node size values below.
```{r, echo=FALSE}
## `size` (from R/network.R) is assumed to be a named numeric vector
## of node sizes, with screen names as the names
nodes <- tibble::enframe(sort(size, decreasing = TRUE),
  name = "screen_name", value = "n")
nodes$rank <- seq_len(nrow(nodes))
nodes$screen_name <- paste0(
  '<a href="https://twitter.com/', nodes$screen_name,
  '">@', nodes$screen_name, '</a>'
)
nodes$n <- round(nodes$n, 3)
dplyr::select(nodes, rank, screen_name, log_n = n)
```