This toolkit contains tools to extract conversational features and analyze social phenomena in conversations. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets.
The toolkit currently implements features for:
- Linguistic coordination, a measure of linguistic influence (and relative power) between individuals or groups based on their use of function words (see the Echoes of Power paper). Example script exploring the balance of power in the US Supreme Court.
- Politeness strategies, a set of lexical and parse-based features correlating with politeness and impoliteness (see the A computational approach to politeness paper). Example script for understanding the (mis)use of politeness strategies in conversations gone awry on Wikipedia.
- Question typology, an unsupervised method for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role (see the Asking too much paper). Example scripts for extracting common question types in the UK parliament, on Wikipedia edit pages, and in sport interviews.
- Conversational prompts, an unsupervised method for extracting types of conversational prompts (see the Conversations gone awry paper). Example script for understanding the use of conversational prompts in conversations gone awry on Wikipedia.
- Coming soon: Basic message and turn features, currently available here: Constructive conversations
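To make the linguistic coordination feature above more concrete, here is a minimal sketch of the underlying idea as we understand it from the Echoes of Power paper: a replier coordinates on a marker (for example, articles) to the extent that they use it more often right after the other speaker used it than they do in general. This is not the toolkit's implementation; the function names, the tiny `ARTICLES` word list, and the sample exchanges are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical, deliberately tiny marker class (one function-word category).
ARTICLES = {"a", "an", "the"}


def uses_marker(text: str, marker_words: Set[str]) -> bool:
    """Return True if any token of `text` belongs to `marker_words`."""
    return any(tok.strip(".,!?;:").lower() in marker_words for tok in text.split())


def coordination(exchanges: Iterable[Tuple[str, str]], marker_words: Set[str]) -> float:
    """Estimate coordination of a replier towards a speaker on one marker.

    `exchanges` holds (utterance, reply) pairs. The score is
    P(reply uses marker | utterance uses marker) - P(reply uses marker).
    """
    pairs = [(uses_marker(u, marker_words), uses_marker(r, marker_words))
             for u, r in exchanges]
    triggered = [reply_has for utt_has, reply_has in pairs if utt_has]
    if not triggered:
        raise ValueError("no utterance uses the marker; the score is undefined")
    conditional = sum(triggered) / len(triggered)
    baseline = sum(reply_has for _, reply_has in pairs) / len(pairs)
    return conditional - baseline


# Made-up exchanges, for illustration only.
example_exchanges = [
    ("Did you read the brief?", "Yes, the brief was clear."),
    ("Where is counsel?", "Counsel is here."),
    ("Is the record complete?", "The record is complete."),
    ("Why now?", "Because time ran out."),
]
print(coordination(example_exchanges, ARTICLES))  # prints 0.5
```

A positive score means the replier echoes the marker more than their own baseline rate would predict. A real analysis would aggregate over many marker categories and many exchanges.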
These datasets are included for ready use with the toolkit:
- Conversations Gone Awry Corpus: a collection of conversations from Wikipedia talk pages that derail into personal attacks (1,270 conversations, 6,963 comments)
- Tennis Corpus: transcripts of tennis singles post-match press conferences for major tournaments from 2007 to 2015 (6,467 post-match press conferences)
- Wikipedia Talk Pages Corpus: a collection of conversations from Wikipedia editors' talk pages
- Supreme Court Corpus: a collection of conversations from the U.S. Supreme Court Oral Arguments
- Parliament Corpus: parliamentary question periods from May 1979 to December 2016 (216,894 question-answer pairs)
This toolkit requires Python 3.
- Download the toolkit.
- Run `python3 setup.py install` to install the package.
- Run `python3 -m spacy download en` to download the spaCy English model.

Use `import convokit` to import it into your project.
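After the steps above, a quick check like the following can confirm that the packages are visible to your Python interpreter. It is only a sketch: `missing_modules` is a hypothetical helper written for this example, and it relies solely on the standard library's `importlib.util.find_spec`, not on any convokit API.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # convokit is the toolkit itself; spacy is needed for the model download step.
    missing = missing_modules(["convokit", "spacy"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("convokit and spacy are importable.")
```

This only verifies that the modules can be located; it does not check that the spaCy English model was downloaded.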
Detailed installation and usage examples are also provided on the specific pages dedicated to each function of this toolkit.
Documentation is hosted here.
The documentation is built with Sphinx (`pip3 install sphinx`). To build it yourself, navigate to `doc/` and run `make html`.
Andrew Wang wrote the Coordination code and the respective example script, wrote the helper functions, and designed the structure of the toolkit.
Ishaan Jhaveri refactored the Question Typology code and wrote the respective example scripts.
Jonathan Chang wrote the example script for Conversations Gone Awry.