
Cornell Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets.

The toolkit currently implements features for:

Datasets

These datasets are included for ready use with the toolkit:

  • Conversations Gone Awry Corpus: a collection of conversations from Wikipedia talk pages that derail into personal attacks (1,270 conversations, 6,963 comments)

  • Tennis Corpus: transcripts of post-match press conferences for tennis singles matches at major tournaments from 2007 to 2015 (6,467 press conferences)

  • Wikipedia Talk Pages Corpus: collection of conversations from Wikipedia editors' talk pages

  • Supreme Court Corpus: collection of conversations from the U.S. Supreme Court Oral Arguments

  • Parliament Corpus: parliamentary question periods from May 1979 to December 2016 (216,894 question-answer pairs)

Usage

Installation

This toolkit requires Python 3.

  1. Download the toolkit.
  2. Run python3 setup.py install to install the package.
  3. Run python3 -m spacy download en to download the spaCy English model.

Use

Use import convokit to import it into your project.
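As a quick sanity check after installation, a minimal sketch like the following reports whether the convokit and spacy packages can be found by the current Python interpreter. The helper name is_installed is hypothetical and not part of the toolkit; it relies only on the standard library and does not import the packages themselves.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it.

    Hypothetical helper for illustration; not part of convokit.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Package names taken from the installation steps above.
    for name in ("convokit", "spacy"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, rerunning the installation steps with the same python3 interpreter is a reasonable first thing to try.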

Detailed installation and usage examples are also provided on the pages dedicated to each function of this toolkit.

Documentation

Documentation is hosted here.

The documentation is built with Sphinx (pip3 install sphinx). To build it yourself, navigate to doc/ and run make html.

Acknowledgements

Andrew Wang wrote the Coordination code and its example script, wrote the helper functions, and designed the structure of the toolkit.

Ishaan Jhaveri refactored the Question Typology code and wrote the corresponding example scripts.

Jonathan Chang wrote the example script for Conversations Gone Awry.
