Cornell Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets.

The toolkit currently implements features for:

Linguistic coordination, a measure of linguistic influence (and relative power) between individuals or groups based on their use of function words (see the "Echoes of Power" paper). Example script exploring the balance of power in the US Supreme Court.
Politeness strategies, a set of lexical and parse-based features correlating with politeness and impoliteness (see the "A computational approach to politeness" paper). Example script for understanding the (mis)use of politeness strategies in conversations gone awry on Wikipedia.
Question typology, an unsupervised method for extracting surface motifs that recur in questions, and for grouping them according to their latent rhetorical role (see the "Asking too much" paper). Example scripts for extracting common question types in the UK parliament, on Wikipedia edit pages, and in sports interviews.
Conversational prompts, an unsupervised method for extracting types of conversational prompts (see the "Conversations gone awry" paper). Example script for understanding the use of conversational prompts in conversations gone awry on Wikipedia.
Coming soon: basic message and turn features, currently available separately (see Constructive conversations).
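
To make the idea behind linguistic coordination more concrete, here is a toy sketch of the quantity as it is commonly described: how much more likely a speaker b is to use a class of function words in a reply when the utterance by a that they are replying to also used it. This is an illustration only and does not use the toolkit's API. The function name `coordination`, the whitespace tokenization, and the tiny marker set are all assumptions made for the example.

```python
def coordination(exchanges, marker_words):
    """Toy coordination score of a replier toward an initiator.

    exchanges: list of (utterance, reply) string pairs, where each reply
               is the replier's response to the initiator's utterance.
    marker_words: set of lowercase function words forming one marker class.

    Returns P(reply has marker | utterance has marker) - P(reply has marker),
    or None when that difference is undefined.
    """
    def has_marker(text):
        # Naive whitespace tokenization; a real pipeline would use a tokenizer.
        return any(token in marker_words for token in text.lower().split())

    # Exchanges in which the initiating utterance exhibits the marker.
    triggered = [(u, r) for u, r in exchanges if has_marker(u)]
    if not exchanges or not triggered:
        return None

    p_reply = sum(has_marker(r) for _, r in exchanges) / len(exchanges)
    p_reply_given_trigger = sum(has_marker(r) for _, r in triggered) / len(triggered)
    return p_reply_given_trigger - p_reply


if __name__ == "__main__":
    articles = {"a", "an", "the"}
    toy_exchanges = [
        ("the cat sat", "a dog ran"),
        ("the bird", "it flew"),
        ("hello", "hi"),
        ("cats sit", "dogs run"),
    ]
    print(coordination(toy_exchanges, articles))
```

A positive score means the replier uses the marker class more often right after the initiator does than they do overall. The toolkit's own implementation works on full corpora and multiple marker classes, so see the Coordination example script for actual usage.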

Datasets

These datasets are included for ready use with the toolkit:

Conversations Gone Awry Corpus: a collection of conversations from Wikipedia talk pages that derail into personal attacks (1,270 conversations, 6,963 comments)
Tennis Corpus: transcripts of tennis singles post-match press conferences for major tournaments from 2007 to 2015 (6,467 post-match press conferences)
Wikipedia Talk Pages Corpus: collection of conversations from Wikipedia editors' talk pages
Supreme Court Corpus: collection of conversations from the U.S. Supreme Court Oral Arguments
Parliament Corpus: parliamentary question periods from May 1979 to December 2016 (216,894 question-answer pairs)

Usage

Installation

This toolkit requires Python 3.

Download the toolkit.
Run python3 setup.py install to install the package.
Run python3 -m spacy download en to download the English language model for spaCy.

Use

Use import convokit to import it into your project.

Detailed installation and usage examples are also provided on the specific pages dedicated to each function of this toolkit.
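
One quick way to confirm that the installation steps above took effect is to check that the required packages can be found on the import path. The following is a minimal sketch using only the Python standard library. The helper name `missing_packages` is hypothetical and not part of the toolkit, and the check only covers the `convokit` and `spacy` packages mentioned in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # convokit is the toolkit itself; spacy is needed for the language model step.
    missing = missing_packages(["convokit", "spacy"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("convokit and spacy are both importable.")
```

This only verifies that the packages are discoverable. It does not check that the spaCy English model has been downloaded.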

Documentation

Documentation is hosted here.

The documentation is built with Sphinx (pip3 install sphinx). To build it yourself, navigate to doc/ and run make html.

Acknowledgements

Andrew Wang ([email protected]) wrote the Coordination code and its example script, wrote the helper functions, and designed the structure of the toolkit.

Ishaan Jhaveri ([email protected]) refactored the Question Typology code and wrote the respective example scripts.

Jonathan Chang ([email protected]) wrote the example script for Conversations Gone Awry.
