AI Trivia Chatbot - UNC Bootcamp Project 3

cUrioBot is a chatbot for playing a trivia game based on a set list of questions and answers. Using retrieval-augmented-generation (RAG) with a pre-trained LLM, the AI agents interact with the user to assess the answer’s level of correctness, to handle fallback and exception messages, and to generate facts based on a question’s topic.

This app can run locally without any external API calls or other external dependencies. We have also enabled it to run on HuggingFace Spaces.

Run cUrioBot

This project depends on these technologies and modules:

Ollama
mistral:7b-instruct model from ChatOllama
Python 3.10 or greater
LangChain (LLM agent interface)
Gradio (user interface)
sqlite3
Pandas

Try on HuggingFace Spaces

HuggingFace Spaces - cUrioBot

Try locally

cUrioBot is designed to work entirely locally without using external API requests.

Clone the repository:

git clone https://github.com/d3rp3tt3/AI-Trivia-ChatBot-Project-3.git
cd AI-Trivia-ChatBot-Project-3

Create a virtual environment (optional but recommended):

Conda

conda env create --name <my-env>

Python virtual environment

python3 -m venv <my-venv>
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```
Install Ollama:

Follow the instructions on the Ollama website to install Ollama.
Pull the mistral:7b-instruct model:

After installing Ollama, pull the mistral:7b-instruct model by running:
```
ollama pull mistral:7b-instruct
```
Run the application:
```
python local_quizchatbot.py
```

Methodology and architecture

Tools and libraries

Model: Pretrained embedded LLM model for natural language processing and "humanlike" agent workflows (mistral:7b-instruct)
HuggingFace transformers
SQLite database for trivia store
LangChain for AI agent logic
Retrieval-Augmented Generation (RAG) for using the questions and answers from our cleaned trivia dataset (via database queries)
NLP, spaCy, and Sci-kit learn for data cleaning and exploration
HuggingFace Spaces for hosted app
Unittest for unit and functional testing and bot accuracy evaluation

Architecture

cUrioBot is written in Python.

Datastore

Data is loaded into two SQLite3 database by using Python scripts that create the prod database if it doesn't exist and populate hem with data from the respective CSVs.

Prod data: /infra/prod_sql.py
Test data: infra/test_sql.py

The prod database is /app/trivia_qa.db and the test database is /app/testing_trivia_qa.db.

Workflow

Testing

To enable testing, we created a set of mock human answers via the /data/test/build_mock_data.ipynb Jupyter Notebook. This notebook includes two key pieces of code:

Call an LLM to create mock human answers by adding typos
Call an LLM to create mock human answers by giving incorrect answers based on the question

The data is cleaned and then saved as a CSV at /data/test/mock_data.csv.

The app also includes unit tests, functional, and model evaluation tests:

Unit: /app/unit_tests.py
Functional: app/functional_tests.py
Model evaluation: /app/verify_tests.py

Data

Sourcing

cUrioBot is based on a raw dataset of 200,000 Jeopardy questions in a CSV file.

Exploration and visualization

We reviewed the data for NA's. We checked the number of unique catergories. The column show number was removed as it did not add value. The show date was converted to Date format so it has availability future functions to be added. Many of the columns had a space in the name. They were renamed with _ to make utilizing them more efficient. We also checked the frequency of categories and answers for general information.

Cleaning and preparing

We explored using a clasifier to reduce the number of categories. Ultimatly we picked 3 clearly labeled categories, we needed to reduce the size of the dataset and this also allowed for clear labels and a category set that was small enough not to overwhelm the end user.

The cleaned question set is in a CSV file at /data/prod/JEOPARDY_CSV_TOP3_Categories.csv.

User experience

Users interact with cUrioBot through a chat interface, which we built as a proof-of-concept with Gradio. This app can run completely isolated from the Internet locally or in a hosted service, such as HuggingFace Spaces.

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
.vscode		.vscode
app		app
data		data
infra		infra
research		research
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Trivia Chatbot - UNC Bootcamp Project 3

Run cUrioBot

Try on HuggingFace Spaces

Try locally

Methodology and architecture

Tools and libraries

Architecture

Datastore

Workflow

Testing

Data

Sourcing

Exploration and visualization

Cleaning and preparing

User experience

About

Releases

Packages

Contributors 3

Languages

d3rp3tt3/AI-Trivia-ChatBot-Project-3

Folders and files

Latest commit

History

Repository files navigation

AI Trivia Chatbot - UNC Bootcamp Project 3

Run cUrioBot

Try on HuggingFace Spaces

Try locally

Methodology and architecture

Tools and libraries

Architecture

Datastore

Workflow

Testing

Data

Sourcing

Exploration and visualization

Cleaning and preparing

User experience

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages