We also provide additional data from iKAT Year 1 and TREC CAsT 2019-2022.
We provide a segmented version of the TREC iKAT ClueWeb22-B Document collection available from CMU in two formats: JSONL
and TrecWeb
.
Last updated: August 5, 2024
See the official TREC iKAT repository for tools and code related to the track.
"},{"location":"#introduction-to-trec-interactive-knowledge-assistance-track","title":"Introduction to TREC Interactive Knowledge Assistance Track","text":"The widespread adoption of voice-based assistants is significantly changing how we interact with technology. According to a Comscore report, over 20% of U.S. households now own a smart speaker. This trend is further exemplified by the recent introduction of assistant-enabled smart glasses by major tech companies, pushing the boundaries of real-world applications.
Despite their proficiency in executing simple, well-defined tasks, these assistants still face limitations in supporting conversational information seeking (CIS). CIS is crucial within fields such as information retrieval, natural language processing, and dialogue systems, focusing on tasks like ranking, summarizing, and question answering.
The TREC Interactive Knowledge Assistance Track (iKAT) builds on the four years of success of the TREC Conversational Assistance Track (CAsT), which can be explored further here. iKAT is designed to research and develop conversational agents that excel in collaborative information seeking by personalizing responses based on user-derived insights.
CAsT's fourth year introduced more interactive elements, such as clarifications and suggestions, fostering multi-turn, multi-path conversations. iKAT evolves from CAsT with a renewed focus on supporting diverse, multi-turn conversations tailored to the user\u2019s background, perspective, and context. This means that for any given topic, the flow and substance of the conversation can vary significantly depending on the user\u2019s individual traits and needs.
iKAT's primary goal is to advance research on conversational agents that not only respond to users\u2019 immediate queries but also adapt their responses based on the cumulative context of the interaction. This aspect of personalization is particularly timely with the advancements in large language models (LLMs), which introduce new challenges and opportunities in the dynamic interplay of user context, system promptings, and conversational initiatives.
All data associated with this work is licensed and released under a Creative Commons Attribution-ShareAlike 4.0 International License.
"},{"location":"#track-coordinators","title":"Track Coordinators","text":"Mohammad Aliannejadi, University of Amsterdam, The Netherlands. Dr. Aliannejadi is an Assistant Professor at the IRLab (formerly known as ILPS), the University of Amsterdam in The Netherlands. His research is in modeling user information needs with a focus on recommender systems, unified (meta) search, and conversational systems.
Zahra Abbasiantaeb, University of Amsterdam, The Netherlands. Zahra is a Ph.D. student at the IRLab supervised by Dr. Aliannejadi. She is working on conversational search and recommendation. Earlier, she also worked on patent reference mining. Zahra obtained her master's in AI from the Amirkabir University of Technology with a focus on Question Answering systems.
Simon Lupart, University of Amsterdam, The Netherlands. Simon is a Ph.D. student at the IRLab supervised by Dr. Aliannejadi and Prof. Kanoulas. He worked in IR for the past two years at Naver Labs Europe and joined UvA to focus on conversational search.
Shubham Chatterjee, Missouri University of Science and Technology, Missouri, USA. Dr. Chatterjee is an Assistant Professor of Computer Science. He works on Neural IR, Entity-Oriented Search, Conversational IR, and the application of LLMs to these areas.
Jeff Dalton, University of Edinburgh, Scotland. Dr. Dalton is an Associate Professor (Reader) and Chancellor's Fellow at the School of Informatics, the University of Edinburgh. He is also a Turing AI Fellow and PI for the GRILL Lab. His research focuses on new methods for machine understanding of language and text data using deep neural networks and entity knowledge graphs for improving information-seeking applications.
Leif Azzopardi, University of Strathclyde, Scotland. Dr. Azzopardi is an Associate Professor in Artificial Intelligence and Data Science within the Department of Computer and Information Sciences at the University of Strathclyde. He is the PI for the Interaction Lab (i-lab), which specializes in developing, evaluating and modelling information-rich and information-intensive applications and agents.
"},{"location":"#submit-your-runs-ikat-2024","title":"Submit Your Runs (iKAT 2024)","text":"We will share details of how to submit the runs soon. We will also provide a validation script to validate your runs for submission. Runs failing the validation script will not be accepted.
"},{"location":"#announcements","title":"Announcements","text":".tsv
file has this format: doc_id passage_number passage_MD5
. Total download size is 2.2GB. 2023_top_1000_query_results.zip This zip file has queries from both training and testing topics, saved in queries_train.txt
and queries_test.txt
respectively. The results from the iKAT searcher (BM25
using manually resolved queries) are saved in the query_results_train
and query_results_test
folders. Each result file, with up to 1000 results, corresponds to a query based on line numbers, starting from zero. For instance, the results for the first query in queries_train.txt
can be found in query_results_train/query_results_000.txt
. In each result file, every line shows the ClueWeb22 ID followed by the URL.
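As an illustration, the snippet below shows one way to pair a training query with its result file. It is only a sketch: the directory name, the assumption that the zip has been extracted in place, and the whitespace split between ID and URL are our own choices, not part of the official release.

import os

def load_query_results(query_index, base_dir="2023_top_1000_query_results"):
    # Queries are stored one per line; result files are numbered by the query's
    # zero-based line number, e.g. query 0 -> query_results_000.txt.
    with open(os.path.join(base_dir, "queries_train.txt"), encoding="utf-8") as f:
        query = f.readlines()[query_index].strip()
    results = []
    results_file = os.path.join(base_dir, "query_results_train", f"query_results_{query_index:03d}.txt")
    with open(results_file, encoding="utf-8") as f:
        for line in f:
            clueweb_id, url = line.split(maxsplit=1)  # assumed whitespace-separated
            results.append((clueweb_id, url.strip()))
    return query, results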
ret_bm25_rm3--type_automatic--num_ptkb_3--k_100--num_psg_3.official.run.json Automatic Run
ret_bm25_rm3--type_manual--num_ptkb_3--k_100--num_psg_3.official.run.json Manual Run"},{"location":"additional_data/#baseline-runs-ikat-2023","title":"Baseline Runs iKAT 2023","text":"We provide two baseline runs (linked above). Both runs were produced as follows:
BM25+RM3 (Pyserini default) is used as the initial retrieval method (denoted by ret_bm25_rm3 in the file name) to retrieve 100 passages per query (denoted by k_100 in the file name).
The top 3 relevant PTKB statements are used (denoted by num_ptkb_3 in the file name).
The top 3 passages are used to generate the response (denoted by num_psg_3 in the file name). We use the T5 model mrm8488/t5-base-finetuned-summarize-news available on HuggingFace for this purpose.
Passages are reranked with SentenceTransformers, specifically the model cross-encoder/ms-marco-MiniLM-L-6-v2 available on HuggingFace.
Queries are rewritten with the castorini/t5-base-canard model available on HuggingFace.
For the manual run, the statements listed in the ptkb_provenance field and the resolved_utterance field were used.
We provide the data from previous years' TREC CAsT below. The iKAT topics are similar, with the addition of the Personal Text Knowledge Base. For more information on TREC CAsT, see the website and read the overview papers [2019] [2020] [2021] [2022].
Note. TREC CAsT did not include a PTKB, but you can be creative and modify the data according to your needs. Also, TREC CAsT used different collections (Wikipedia, KILT, MS MARCO, etc.) at different stages. iKAT is using a subset of the recently released ClueWeb22-B.
"},{"location":"additional_data/#cast-year-4-2022","title":"CAsT Year 4 (2022)","text":"File Description 2022_automatic_evaluation_topics_tree_v1.0.json Contains each conversation tree (topic) with an automatic rewrite generated for each user utterance. 2022_evaluation_topics_turn_ids.json Contains all ids that responses/ranked passages need to be returned for. 2022_evaluation_topics_tree_v1.0.json Contains each conversation tree (topic) with the resolved query for each user utterance. 2022_evaluation_topics_flattened_duplicated_v1.0.json Contains all possible conversation paths across all the conversation trees."},{"location":"additional_data/#cast-year-3-2021","title":"CAsT Year 3 (2021)","text":"File Description 2021_automatic_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Automatic 2021_manual_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Manual 2021qrels.txt Qrels file for passage ranking task."},{"location":"additional_data/#cast-year-2-2020","title":"CAsT Year 2 (2020)","text":"File Description 2020_automatic_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Automatic 2020_manual_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Manual 2020qrels.txt Qrels file for passage ranking task."},{"location":"additional_data/#cast-year-1-2019","title":"CAsT Year 1 (2019)","text":"File Description train_topics_v1.0.json 30 example training topics in JSON format. evaluation_topics_v1.0.json 50 evaluation topics in JSON format. 2019qrels.txt Official evaluation qrels file for passage ranking task. train_qrels.txt Limited (incomplete) training judgements for 5 topics (approximately 50 turns). The judgments are graded on a three-point scale (2 very relevant, 1 relevant, and 0 not relevant)."},{"location":"data/","title":"Datasets and Resources","text":""},{"location":"data/#topics-for-ikat-year-2-2024","title":"Topics for iKAT Year 2 (2024)","text":"File Description 2024_test_topics.json Test topics in JSON format."},{"location":"data/#addtional-data","title":"Additional Data","text":"We also provide additional data from iKAT Year 1 and TREC CAsT 2019-2022.
"},{"location":"data/#trec-ikat-clueweb22-b-passage-collection","title":"TREC iKAT ClueWeb22-B Passage Collection","text":"We provide a segmented version of the TREC iKAT ClueWeb22-B Document collection available from CMU in two formats: JSONL
and TrecWeb
.
In case you have segmented the document collection yourself, you may check whether your segments match ours using the tsv
file of passage hashes provided.
Passage collection in JSONL format. Each line is a JSON record of the form {\"id\": \"[passage id]\", \"contents\": \"[passage text]\", \"url\": \"[ClueWeb22 document URL]\"}, where the passage ID has the format doc_id:passage_number.
Passage collection in TrecWeb format.
Passage hashes: a tsv file containing MD5 hashes of passage texts. Each line of the .tsv file has this format: doc_id passage_number passage_MD5
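If you want to check your own segmentation against this file, a rough sketch is shown below. The helper names are ours, and we assume the file is tab-separated and that each hash is the MD5 of the raw passage text.

import hashlib

def load_passage_hashes(tsv_path):
    hashes = {}
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            doc_id, passage_number, passage_md5 = line.rstrip("\n").split("\t")
            hashes[f"{doc_id}:{passage_number}"] = passage_md5
    return hashes

def matches_official_hash(passage_id, passage_text, hashes):
    # Compare the MD5 of your passage text against the released hash for that id.
    return hashes.get(passage_id) == hashlib.md5(passage_text.encode("utf-8")).hexdigest()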
We also provide a sparse Lucene index generated from the JSONL
passage files above using Pyserini. The files form a single .tar.bz2
archive split into sections for simpler downloading due to the overall size. To extract the archive, once downloaded, you must combine each of the sections in name order back into a single file:
cat ikat_2023_passage_index.tar.bz2.part* > ikat_2023_passage_index.tar.bz2\n
Total download size is approximately 150 GB.
"},{"location":"data/#how-do-i-access-these-resources","title":"How do I access these resources?","text":"Each team should use a URL of https://ikattrecweb.grill.science/<team_name>
to access the files. The page will ask for a userID and password. Enter the login details which you obtained from the iKAT organizers. You should see a page which lists each type of data and has links to the individual files listed above, along with their checksum files.
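After downloading, you can compare each file against its accompanying checksum file. A generic sketch is given below; it assumes the checksums are MD5, so check the provided checksum files and swap in the appropriate hash function if they differ.

import hashlib

def file_md5(path, chunk_size=1 << 20):
    # Hash the file in chunks so large archives do not need to fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()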
NOTE: Please do not share IPs in the 10.x.x.x
range, which is reserved for private networks. We would need a suitable public IP so that we may configure the above download link to work for you.
iKAT Searcher is a simple tool developed to help with creating the topics for iKAT. The tool allows topic developers to visually assess the behaviour of a retrieval system, ultimately making it easier to develop challenging, but interesting, topics for the Track. You can interact with the system here. See the GitHub repository.
"},{"location":"data/#run-validation","title":"Run Validation","text":"We provide code for run validation in our GitHub repository. Please see the associated README file for detailed instructions on how to run the code. It is crucial to validate your submission files before submitting them. Run files that fail the validation phase will be discarded. We advise you to familiarize yourself with the validation script as soon as possible and let us know if you have any questions or encounter any problems working with it.
Note. You need the MD5 hash file of the passages in the collection to run the validation code. You can download this file from above.
Below is a summary of the checks that the script performs on a run file:
Does the run file follow the format defined in protocol_buffers/run.proto?
Is the run_name field non-empty?
Is the run_type field non-empty and set to automatic or manual?
Do all passage_provenance passage IDs appear in the collection?
Does each response have a non-empty text field?
Are there no more than 1000 passage_provenance entries listed for the response?
Is there at least one passage_provenance with its used field set to True in the response?
Does each ptkb_provenance entry appear in the ptkb field of the topic data?
To help you get started, we (the iKAT organizers) have put together this guide. In this demo, we'll explore and build the components of a simple iKAT system. These components include:
The diagram above shows how the components of our system interact.
Given a query, the conversation context, and the PTKB of the user, our system's Query Rewriter reformulates the query to resolve ambiguity. Next, the Passage Retriever uses the reformulated query to retrieve the top-K candidate passages from an index. Finally, the Response Generator uses the top-N of the K retrieved passages to generate a coherent response. The output of our system is a response, along with the provenance passages used to construct it, for the input query in the context of the conversation.
"},{"location":"demo/#setup","title":"Setup","text":"Before putting our system together, let's download the topics and the demo collection.
"},{"location":"demo/#trec-ikat-2023-simple-english-wikipedia-passage-collection","title":"TREC iKAT 2023 Simple English Wikipedia Passage Collection","text":"Downloading and processing the entire TREC iKAT 2023 ClueWeb22-B passage collection is not possible on Colab. Moreover, it requires a license to use. For this demo, we will use Simple English Wikipedia. Compared to the full English Wikipedia, it has only about 170k articles. The iKAT organizers have preprocessed the articles and created a passage collection for you to use. This collection is in a jsonl
format. An example record from the collection is shown below:
{\n \"id\": \"simplewiki:Ted%20Cassidy:0\",\n \"contents\": \"Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on \\\"The Addams Family\\\".\",\n \"title\": \"Ted Cassidy\",\n \"wiki_id\": \"9822\"\n}\n
Each record in this collection contains the following fields:
id: The passage id is a combination of (1) the string \"simplewiki:\", (2) the encoded title of the Wikipedia page, and (3) the passage number. This is similar to the iKAT 2023 passage id format (doc_id:passage_number).
contents: The text of the passage.
title: The title of the Wikipedia page to which this passage belongs.
wiki_id: The Wikipedia page ID of the Wikipedia page to which this passage belongs. These IDs are unique and will never change.
Note. As this collection is a toy collection meant for demo purposes, the quality of results we obtain in this tutorial may be affected.
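As a small illustration of the id structure described above (this helper is ours, not part of the released tooling):

from urllib.parse import unquote

def parse_passage_id(passage_id):
    # "simplewiki:Ted%20Cassidy:0" -> ("simplewiki", "Ted Cassidy", 0)
    prefix, rest = passage_id.split(":", 1)
    encoded_title, passage_number = rest.rsplit(":", 1)
    return prefix, unquote(encoded_title), int(passage_number)

print(parse_passage_id("simplewiki:Ted%20Cassidy:0"))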
!pip install gdown\n
!echo \"Creating target directory..\"\n!mkdir -p ikat_demo\n!mkdir -p ikat_demo/collection\n\nimport gdown\n# The Google Drive file ID and the destination path\nurl = 'https://drive.google.com/uc?id=1touBjwkPByH69utT9_sevr5nYT0TTZ2M'\noutput = '/content/ikat_demo/collection/simplewiki-2020-11-01.passages.jsonl'\ngdown.download(url, output, quiet=False)\n\nurl = 'https://drive.google.com/uc?id=1zPSiAqLmbx9QFGm6walnuMUl7xoJmRB7'\noutput = '/content/ikat_demo/test.json'\ngdown.download(url, output, quiet=False)\n
"},{"location":"demo/#creating-a-bm25-index","title":"Creating a BM25 Index","text":"Now, we'll use the Pyserini information retrieval toolkit to build a sparse index for the collection we just downloaded. Pyserini provides APIs for our indexing needs and supports both sparse and dense retrieval. Alternatively, you may also use PyTerrier.
First, let's install Pyserini and its dependencies.
!pip install pyserini\n!pip install faiss-cpu\n
Pyserini provides ingestors for document collections in many different formats. The simplest, however, is the following JSON format:
{\n \"id\": \"doc1\",\n \"contents\": \"this is the contents.\"\n}\n
The collection to be used with Pyserini must be in a jsonl
format, where each line is a json
record structured as above. The preprocessed collection that we provide is already in a jsonl
format.
!python -m pyserini.index.lucene \\\n --collection JsonCollection \\\n --input '/content/ikat_demo/collection/' \\\n --index '/content/ikat_demo/index' \\\n --generator DefaultLuceneDocumentGenerator \\\n --threads 8 \\\n --storePositions --storeDocvectors --storeRaw\n
To check that our new sparse index works, let's try searching with it. The code below loads the index and searches for the query global warming
.
from pyserini.search.lucene import LuceneSearcher\n\nsearcher = LuceneSearcher('ikat_demo/index')\nquery = 'global warming'\nhits = searcher.search(query)\n\nfor i in range(len(hits)):\n print(f'{i+1:2} {hits[i].docid:4} {hits[i].score:.5f}')\n
Let's see the contents of the best-ranked document.
import json\nbest_ranked_doc = searcher.doc(hits[0].docid)\nparsed_doc = json.loads(best_ranked_doc.raw())\nparsed_doc['contents']\n
"},{"location":"demo/#query-rewriting","title":"Query Rewriting","text":"iKAT topics mimic real-world dialogue phenomena. As a result, utterances within topics become increasingly ambiguous as the topic unfolds. On their own, these utterances likely won't return good candidates from our index, so we need to reformulate them using information from prior turns in the topic.
Let's examine the utterances in a topic to demonstrate the need for query rewriting. As an example, we will look at a segment from the iKAT 2023 test conversation on \"Finding a diet\".
{\n \"turn_id\": 1,\n \"utterance\": \"Hey, Assistant! I am wondering what to cook for dinner tonight for my family.\",\n \"resolved_utterance\": \"Hey, Assistant! I am wondering what to cook for dinner tonight for my family.\",\n \"response\": \"Hey there! There are many healthy and delicious recipes you could try! For example, a stir-fry with lean proteins like chicken or tofu and a variety of colorful vegetables is always a good option. Or you could try making a Mediterranean quinoa salad with cucumbers, tomatoes, feta cheese, and a light lemon vinaigrette. Baked salmon with a side of roasted vegetables is also a great choice.\",\n},\n{\n \"turn_id\": 2,\n \"utterance\": \"Nah, we had salmon just two days ago. But the Mediterranean dish that you mentioned sounds interesting, could you provide the recipe for that?\",\n \"resolved_utterance\": \"Nah, we had salmon just two days ago. But the Mediterranean quinoa salad sounds interesting, can you provide the recipe for the Mediterranean quinoa salad?\",\n \"response\": \"Ah, I see. Of course! Here's a simple recipe for Mediterranean Quinoa Salad. The ingredients you need are 1 cup quinoa, 2 cups water, 1/4 cup extra-virgin olive oil, 2 limes, juiced, 2 teaspoons ground cumin, 1 teaspoon salt, 1/2 teaspoon red pepper flakes (optional), 1 1/2 cups halved cherry tomatoes, 1 cucumber, diced, 1/2 cup chopped fresh mint, 1/2 cup feta cheese. Instructions: Bring quinoa and water to a boil in a saucepan. Reduce heat to medium-low, cover, and simmer until quinoa is tender and water has been absorbed, 10 to 15 minutes. Set aside to cool. In a large bowl, whisk olive oil, lime juice, cumin, salt, and red pepper flakes together to create the dressing. In the bowl with the dressing, mix in quinoa, tomatoes, cucumber, and mint. Stir until well mixed. Top with feta cheese before serving. Enjoy!\"\n},\n{\n \"turn_id\": 3,\n \"utterance\": \"That's great, thanks! But I'm thinking of reducing the amount of oil in the dressing. Any suggestions?\",\n \"resolved_utterance\": \"That's great, thanks! But I'm thinking of reducing the amount of oil in the salad dressing. Any suggestions?\",\n \"response\": \"Certainly! You can reduce the amount of oil by using more lime juice or adding some water. Alternatively, you could substitute some of the oil with a different kind of liquid like vinegar. This should maintain a good balance between tartness and the need for oil as an emulsifier. Which of these do you think works the best for you?\"\n},\n{\n \"turn_id\": 4,\n \"utterance\": \"Thanks! My children don't like vinegar, so I would probably go with lime juice. Can I also use butter instead?\",\n \"resolved_utterance\": \"Thanks! My children don't like vinegar, so I would probably go with lime juice. Can I also butter instead of oil in the Mediterranean Quinoa Salad?\",\n \"response\": \"Butter can be substituted for vegetable oil in some dishes, but not always. Butter is a solid fat, while vegetable oil is liquid, so they have different effects on the consistency and texture of the food. Butter is also higher in calories and lower in trans fats than vegetable oil. If you want to use butter instead of vegetable oil, you may need to adjust the quantity, melt it down, and let it cool a bit. For example, 1 cup of butter equals 7/8 cup of vegetable oil.\"\n},\n{\n \"turn_id\": 5,\n \"utterance\": \"I see, thanks! What are some cheese substitutes?\",\n \"resolved_utterance\": \"I see, thanks! 
What are some substitutes for cheese in the Mediterranean Quinoa Salad?\",\n \"response\": \"I can suggest some alternatives for feta cheese. One option is to use a lower fat cheese like mozzarella. Another is to use a plant-based cheese substitute, if you're open to vegan options. You could also try adding a bit of nutritional yeast for a cheese-like flavor with less fat and more protein.\"\n},\n{\n \"turn_id\": 6,\n \"utterance\": \"Can you compare the first two?\",\n \"resolved_utterance\": \"Can you compare mozzarella with plant-based cheese?\",\n \"response\": \"Mozzarella cheese is a dairy product derived from milk and produced in a wide range of flavors. It comprises fat, usually from the milk of cows, buffalo, goats, or sheep. On the other hand, vegan cheese is a non-dairy, plant-based product. Vegan cheese is 100 percent animal-free cheese and is made using vegetable proteins. The ingredients of vegan cheese include soy, nuts, cashews, and vegetable oils (such as coconut oil). Vegan cheese is made using a process similar to traditional cheese making-without the animal milk. During the process, the plant proteins are separated using bacteria. Then the ingredients, such as oils and thickeners, are added to achieve the desired cheese-like consistency.\"\n},\n{\n \"turn_id\": 7,\n \"utterance\": \"Which of these is suitable for me?\",\n \"resolved_utterance\": \"Which of mozzarella and vegan cheese is suitable for me?\",\n \"response\": \"Based on the nutritional facts, mozzarella can be considered safe for lactose-intolerant people to eat, but be aware that the type of mozzarella you buy may not help reduce your symptoms. This is because many types of processed and packaged mozzarella also contain whey powder or casein - two other forms of dairy which are high in lactose. However, mozzarella has almost no lactose. Just one ounce of cheese provides around 0.3 grams of lactose. Vegan cheese is also a good alternative for lactose-intolerant people. Vegan cheeses are 100 percent animal-free and made using vegetable proteins. There are many different types of vegan cheese available, including vegan mozzarella. So, both mozzarella and vegan cheese can be suitable for lactose-intolerant people. It depends on your personal preference and dietary needs.\"\n},\n
This topic starts with a question regarding selecting a diet. If we isolate Turn 6
from the rest of the conversation and use it for search, we would likely get minimal, if any, results.
Now, let's see how a query rewriter helps.
We'll use a T5
query rewriter from HuggingFace
. It is finetuned on the CANARD
dataset but works effectively on iKAT queries.
# Load model and tokenizer from HuggingFace\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\nimport torch\n\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nrewriter = AutoModelForSeq2SeqLM.from_pretrained(\"castorini/t5-base-canard\").to(device).eval()\nrewriter_tokenizer = AutoTokenizer.from_pretrained(\"castorini/t5-base-canard\")\n
The model rewrites an utterance using that utterance and all previous utterances and system responses as input. The utterance and previous turn utterances and system responses should be separated by |||
when building the input to the model.
Let's read the json
data file and load the turns.
with open('/content/ikat_demo/test.json', 'r') as f:\n topics = json.load(f)\n
Next, we write a small function to extract the context.
The provided Python function, extract_context
, extracts a sequence of utterances and responses up to a given turn_id
from a JSON data structure. Here's a breakdown:
Purpose: Extracts a series of utterances and responses up to a specified turn from the given JSON data, based on the provided number.
Parameters:
json_data: A list of dictionaries, where each dictionary represents a conversation that has a unique number and contains a series of turns.
number: The unique identifier for a specific conversation in the JSON data.
turn_id: A specified turn up to which the utterances and responses will be extracted.
Process:
a. Locate Conversation: Loops through the json_data to find the dictionary with the given number.
b. Error Handling: If no dictionary with the given number is found, it returns a message indicating so.
c. Extracting Text: Loops through the turns within the found conversation and appends the utterances and responses up to the turn_id to a list.
d. Context Formation: Concatenates the extracted utterances and responses using \"|||\" as a separator to form the context.
Output: A tuple containing:
the utterance for the provided turn_id, and
the context, which is the sequence of utterances and responses up to the given turn_id
, concatenated with \"|||\".def extract_context(json_data, number, turn_id):\n # Find the correct dictionary with the given number\n data = None\n for item in json_data:\n if item['number'] == number:\n data = item\n break\n\n # If we couldn't find the data for the given number\n if not data:\n print(\"No data found for the given number.\")\n return \"No data found for the given number.\", None\n\n # Extract the utterance and response values\n texts = []\n current_utterance = \"\"\n for turn in data['turns']:\n if turn['turn_id'] < turn_id:\n texts.append(turn['utterance'])\n texts.append(turn['response'])\n elif turn['turn_id'] == turn_id:\n current_utterance = turn['utterance']\n texts.append(current_utterance)\n\n # Join the texts with \"|||\" separator\n context = '|||'.join(texts)\n\n return current_utterance, context\n
Now we can use this function to extract the context for a given topic number
and turn_id
in the topic.
number_to_search = \"10-1\"\nturn_id_to_search = 6\nutterance, context = extract_context(topics, number_to_search, turn_id_to_search)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Turn Context: {context}\")\n
NOTE: When building context this way, there's a risk that the input can become too lengthy for subsequent interactions, especially in extended discussions. For handling this, you can experiment with various context truncation methods. A straightforward strategy is to eliminate earlier turn utterances and responses if the input size surpasses the model's token limit.
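A minimal sketch of such a truncation step is shown below; the 512-token budget and the helper name are our own choices, not part of the official baseline.

def truncate_context(context, tokenizer, max_tokens=512):
    # Drop the oldest utterances/responses until the "|||"-joined context fits.
    parts = context.split("|||")
    while len(parts) > 1 and len(tokenizer.encode("|||".join(parts))) > max_tokens:
        parts = parts[1:]
    return "|||".join(parts)

# Example usage: context = truncate_context(context, rewriter_tokenizer)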
Now, let's rewrite the query using our model.
def rewrite_query(context: str, model, tokenizer, device) -> str:\n tokenized_context = tokenizer.encode(context, return_tensors=\"pt\").to(device)\n output_ids = model.generate(\n tokenized_context,\n max_length=200,\n num_beams=4,\n repetition_penalty=2.5,\n length_penalty=1.0,\n early_stopping=True\n ).to(device)\n\n rewrite = tokenizer.decode(output_ids[0], skip_special_tokens=True)\n return rewrite\n
rewrite = rewrite_query(context, rewriter, rewriter_tokenizer, device)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Query Rewrite: {rewrite}\")\n
Hmm, that didn't really help! \ud83d\ude1e The rewriter did expand the query but with the wrong information!
"},{"location":"demo/#expanding-the-context-using-relevant-ptkb-statements","title":"Expanding the Context using Relevant PTKB Statements","text":"One major difference between iKAT and CAsT is the presence of the Personal Text Knowledge Base (PTKB). In the first year, we are providing the PTKB as a dictionary of statements about the user. Each PTKB defines a user's profile and controls how the system should respond to the user. For the example conversation above, the PTKB, as provided in the test data, is as below.
{\n \"1\": \"I want to know about healthy cooking techniques.\",\n \"2\": \"I am lactose intolerant.\",\n \"3\": \"I'm looking for a speaker set to match my TV.\",\n \"4\": \"I'm willing to drive a long distance to find a cheaper TV.\",\n \"5\": \"I'm hoping to find some offers and discounts for TV.\",\n \"6\": \"I like to eat fruits and vegetables.\",\n \"7\": \"I don't read much.\",\n \"8\": \"I want to cook healthy and tasty recipes for my family.\",\n \"9\": \"I am on a diet and prefer low-calorie food.\",\n \"10\": \"I want to know about the nutritional value of the ingredients I use.\",\n \"11\": \"I'm looking for a new TV to replace my current one.\",\n \"12\": \"I want a TV that is okay for light and size of my living room.\"\n},\n
Above, we rewrote the query using the context. But for a more personalized conversation, one approach to query rewriting could be to use the PTKB statements in the query reformulation process.
To incorporate the PTKB into the system, we must answer two questions:
Question 1: How do we identify the relevant PTKB statements?
In a manual
run, you may use the ptkb_provenance
field. This field was manually populated by the iKAT topic developers and provides a straightforward way to identify relevant PTKB statements for the given turn utterance. However, a more difficult (and perhaps more interesting) exercise is to automatically identify relevant PTKB statements for the given turn.
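For the manual setting, reading those statements is straightforward; the helper below is purely illustrative (PTKB keys are strings, while ptkb_provenance holds integer ids).

def ptkb_statements_from_provenance(topic, turn):
    # Look up the PTKB statements referenced by a turn's ptkb_provenance field.
    return [topic["ptkb"][str(idx)] for idx in turn.get("ptkb_provenance", [])]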
One easy-to-implement (and probably good) solution is to use BERT
embeddings. Specifically, we can use SentenceTransformers.
SentenceTransformers is a Python framework designed for sentence, text, and image embeddings. The foundational work on this was presented in the paper titled Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
This tool enables computation of sentence and text embeddings in over 100 languages. You can then use cosine similarity, for instance, to identify sentences of similar meanings. It's particularly valuable for semantic text similarity, semantic searching, and paraphrase detection.
Built on PyTorch and Transformers, the framework boasts a vast array of pre-trained models optimized for diverse tasks. Moreover, fine-tuning your models is a breeze.
We are going to use the CrossEncoder
model from SentenceTransformers
to identify the relevant PTKB statements. Specifically, we are going to re-rank the PTKB statements based on the current utterance.
A CrossEncoder
-based re-ranker can significantly enhance the end results for users. In this approach, both the query and a potential document are fed into the transformer network concurrently. The network then produces a score between 0 and 1, signifying the document's relevance to the query.
The strength of a CrossEncoder
lies in its superior performance, stemming from its ability to execute attention operations across both the query and the document.
We will use cross-encoder/ms-marco-MiniLM-L-6-v2
model from HuggingFace that scores the query and all retrieved passages for their relevancy.
For a complete introduction to using cross encoders and retrieval and reranking, see this notebook.
First, we need to install the SentenceTransformers
library
!pip install sentence-transformers\n
Next, we write a small function that will rerank the PTKB statements for the given query.
The provided Python function, get_ptkb_statements
, compares statements from the PTKB with a query to determine their similarity. Here's a step-by-step explanation of the function:
Purpose: The function aims to return the top num_ptkb
statements from the PTKB that are most similar to the given query
.
Parameters:
query: The user's input or question.
num_ptkb: The number of PTKB statements to return.
ptkb: A dictionary of the PTKB statements.
reranker: A model that predicts the similarity score between two texts.
Process:
a. Calculate Similarity Scores: For each statement in the PTKB, it computes a similarity score with the query
using the reranker
. The score is between 0 and 1, with 1 being highly similar.
b. Pair Statements with Scores: The statements from the PTKB are paired with their respective similarity scores.
c. Sort Pairs: The pairs are then sorted in descending order based on their similarity scores.
d. Extract Statements: From the sorted pairs, the actual statements are extracted.
e. Return Top Statements: The top num_ptkb
statements are then concatenated into a single string and returned.
Output: A string containing the top num_ptkb
statements from the PTKB that are most similar to the given query
, separated by spaces.
def get_ptkb_statements(query, num_ptkb, ptkb, reranker):\n # Find the similarity of PTKB statements with the given query\n similarity_scores = [reranker.predict([[query, ptkb_statement]])[0] for ptkb_statement in ptkb.values()]\n\n # Pair each statement with its similarity score\n statement_score_pairs = list(zip(list(ptkb.values()), similarity_scores))\n\n # Sort the pairs based on the similarity scores in descending order\n sorted_pairs = sorted(statement_score_pairs, key=lambda x: x[1], reverse=True)\n\n # Extract the sorted responses\n sorted_ptkb_statements = [pair[0] for pair in sorted_pairs]\n\n # Return required number of PTKB statements\n return ' '.join(sorted_ptkb_statements[:num_ptkb])\n
Now, let's use this function to find the top relevant PTKB statements for a given turn.
query = \"Can you compare the first two?\"\nptkb = {\n \"1\": \"I want to know about healthy cooking techniques.\",\n \"2\": \"I am lactose intolerant.\",\n \"3\": \"I'm looking for a speaker set to match my TV.\",\n \"4\": \"I'm willing to drive a long distance to find a cheaper TV.\",\n \"5\": \"I'm hoping to find some offers and discounts for TV.\",\n \"6\": \"I like to eat fruits and vegetables.\",\n \"7\": \"I don't read much.\",\n \"8\": \"I want to cook healthy and tasty recipes for my family.\",\n \"9\": \"I am on a diet and prefer low-calorie food.\",\n \"10\": \"I want to know about the nutritional value of the ingredients I use.\",\n \"11\": \"I'm looking for a new TV to replace my current one.\",\n \"12\": \"I want a TV that is okay for light and size of my living room.\"\n}\nnum_ptkb = 3\n
from sentence_transformers import CrossEncoder\nreranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')\nptkb_statements = get_ptkb_statements(query, num_ptkb, ptkb, reranker)\nptkb_statements\n
"},{"location":"demo/#question-2-how-do-we-use-these-relevant-ptkb-statements","title":"Question 2: How do we use these relevant PTKB statements?","text":"One possible way of using these relevant PTKB statements is to include them in the context when re-writing the query.
Let's see how that works. We will modify our previous function extract_context
a little to include the relevant PTKB statements.
def extract_context_with_ptkb_statements(json_data, number, turn_id, ptkb_statements):\n # Find the correct dictionary with the given number\n data = None\n for item in json_data:\n if item['number'] == number:\n data = item\n break\n\n # If we couldn't find the data for the given number\n if not data:\n print(\"No data found for the given number.\")\n return \"No data found for the given number.\"\n\n # Extract the utterance and response values\n texts = [ptkb_statements]\n current_utterance = \"\"\n for turn in data['turns']:\n if turn['turn_id'] < turn_id:\n texts.append(turn['utterance'])\n texts.append(turn['response'])\n elif turn['turn_id'] == turn_id:\n current_utterance = turn['utterance']\n texts.append(current_utterance)\n\n # Join the texts with \"|||\" separator\n context = '|||'.join(texts)\n\n return current_utterance, context\n
number_to_search = \"10-1\"\nturn_id_to_search = 6\nutterance, context = extract_context_with_ptkb_statements(topics, number_to_search, turn_id_to_search, ptkb_statements)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Turn Context: {context}\")\n
rewrite = rewrite_query(context, rewriter, rewriter_tokenizer, device)\nprint(f\"Query Rewrite: {rewrite}\")\n
That didn't help either! \ud83d\ude1e
This is a really difficult query for the system! We are excited \ud83e\udd29 to see how your system handles such queries.
Alternatively, we can append the relevant PTKB statements to a query that was rewritten without them.
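A minimal sketch of this alternative, reusing the helpers defined earlier in the demo:

# Rewrite from the plain conversational context (no PTKB statements) ...
utterance, plain_context = extract_context(topics, number_to_search, turn_id_to_search)
rewrite_no_ptkb = rewrite_query(plain_context, rewriter, rewriter_tokenizer, device)

# ... then append the relevant PTKB statements to the rewritten query.
expanded_query = f"{rewrite_no_ptkb} {ptkb_statements}"
print(f"Expanded Query: {expanded_query}")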
"},{"location":"demo/#passage-retrieval-and-reranking","title":"Passage Retrieval and Reranking","text":"In iKAT 2023, we provide several tasks, see the guidelines section of the webpage for more details.
One core task in iKAT 2023 involves producing a ranked list of relevant passages corresponding to a specific user utterance. During the Passage Retrieval phase, we employ the rephrased query (either manually or automatically adjusted) to fetch a potential set of passages from the previously generated sparse index.
The retrieve-then-rerank approach is a widely adopted strategy in Information Retrieval systems, aimed at enhancing the quality of the preliminary set of candidates. The process commences with a swift and effective retrieval method to fetch the initial set of passages. One prevalent method for this is BM25. However, there's also the option of adopting dense retrieval methods like Bi-encoders. For a comprehensive understanding of utilizing bi-encoders in retrieval, consider checking this guide.
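To make the bi-encoder option concrete, here is a hedged sketch using SentenceTransformers; the model name is our own choice for illustration, not a track recommendation.

from sentence_transformers import SentenceTransformer, util

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def dense_rank(query, passage_texts):
    # Encode the query and passages, then rank passages by cosine similarity.
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    passage_embs = bi_encoder.encode(passage_texts, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, passage_embs)[0]
    return sorted(zip(passage_texts, scores.tolist()), key=lambda x: x[1], reverse=True)

Note that this sketch only re-scores a given list of passages; true dense retrieval over the full collection would require encoding and indexing every passage (for example, with a Faiss index).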
Subsequent to this initial retrieval, the candidate set undergoes a reranking process, leveraging more advanced methods. An example would be rerankers rooted in BERT, known as cross-encoders. In this tutorial, we'll specifically employ the CrossEncoder
from the SentenceTransformers
library.
We will first retrieve a candidate set of passages from our index using BM25. As query, we will use the manually resolved utterance from turn_id=6
in the example shown above.
def retrieve_using_bm25(query):\n hits = searcher.search(query)\n candidate_set = []\n for i in range(len(hits)):\n print('Rank: {} | PassageID: {} | Score: {}'.format(i+1, hits[i].docid, hits[i].score))\n doc = searcher.doc(hits[i].docid)\n parsed_doc = json.loads(doc.raw())\n print(parsed_doc['contents'])\n candidate_set.append({\n 'passage_id': hits[i].docid,\n 'bm25_rank': i+1,\n 'bm25_score': hits[i].score,\n 'passage_text': parsed_doc['contents']\n })\n print('=================================')\n return candidate_set\n
"},{"location":"demo/#step-2-rerank-using-crossencoder","title":"Step-2: Rerank using CrossEncoder","text":"Next, we will rerank this candidate set using the CrossEncoder
defined earlier.
def rerank_passages(query, passages, reranker):\n res = []\n query_passage_pairs = [[query, passage['passage_text']] for passage in passages]\n scores = reranker.predict(query_passage_pairs)\n\n for passage, score in zip(passages, scores):\n passage['reranker_score'] = score\n res.append(passage)\n\n ranked_passages = sorted(passages, key=lambda x: x['reranker_score'], reverse=True)\n return ranked_passages\n
query = \"Can you compare mozzarella with plant-based cheese?\"\ncandidate_set = retrieve_using_bm25(query)\n
import numpy as np\nreranked_passages = rerank_passages(query, candidate_set, reranker)\nprint(json.dumps(reranked_passages, indent=4, default=lambda o: float(o) if isinstance(o, np.float32) else o))\n
These results are not great. An important thing to note here is that we are doing retrieval over a very small corpus of SimpleEnglishWikipedia
. As mentioned earlier, the results may not be of high quality.
One of the tasks in iKAT 2023 is response generation. After retrieval, the system should use the top-K passages to generate a short response (250 words or less) that is appropriate for an interactive conversational agent to give to the user.
Let's explore one way this can be done, by framing the task as a summarisation problem. We will use the T5
model for this purpose. Specifically, we will use the mrm8488/t5-base-finetuned-summarize-news
model from HuggingFace.
The mrm8488/t5-base-finetuned-summarize-news
is Google's T5-base
model fine-tuned on the News Summary dataset for the downstream task of summarization.
First, we will write a short function for this task.
The generate_response
function is described below:
Purpose: Generates a summarized response based on the top passages from a set of documents returned by a search operation.
Parameters:
passages: A set of top documents or hits returned by the search operation.
model: An instance of a pre-trained sequence-to-sequence language model (from the AutoModelForSeq2SeqLM class) for generating summaries.
tokenizer: An instance of a tokenizer (from the AutoTokenizer class) used to tokenize and decode text.
Process:
a. Consolidating Passages: Combines all the extracted passages into one continuous string.
b. Tokenization and Input Formation: Tokenizes the combined text and pre-processes it by adding a \"summarize: \" prefix. The tokenized input is adjusted to not exceed a specified maximum length (512 tokens) and is moved to the desired computation device.
c. Generating Summary: Utilizes the sequence-to-sequence language model to generate a summarized response based on the input. Applies various parameters to control and improve the quality of the output summary.
d. Decoding the Summary: Transforms the token IDs from the generated summary back into human-readable text, ensuring any special tokens are omitted.
Output: Returns a coherent and summarized text derived from the top passages of the documents.
def generate_response(passages, model, tokenizer):\n text = ' '.join(passages)\n inputs = tokenizer.encode(\"summarize: \" + text, return_tensors=\"pt\", max_length=512, truncation=True)\n with torch.no_grad():\n summary_ids = model.generate(\n inputs,\n max_length=250,\n min_length=50,\n length_penalty=2.0,\n num_beams=4,\n early_stopping=True\n )\n summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)\n return summary\n
summarizer = AutoModelForSeq2SeqLM.from_pretrained('mrm8488/t5-base-finetuned-summarize-news')\nsummarizer_tokenizer = AutoTokenizer.from_pretrained('mrm8488/t5-base-finetuned-summarize-news')\n
# We use the top-3 reranked passages to generate a response\npassages = [passage['passage_text'] for passage in reranked_passages][:3]\nprint(json.dumps(passages, indent=4))\n
generate_response(passages, summarizer, summarizer_tokenizer)\n
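Putting the pieces together, the sketch below shows how the demo outputs could be packed into a single turn of a run file following the submission format from the guidelines. The run name, the turn id string, and the choice to mark the top-3 passages as used are our own illustrative choices, not requirements.

response_text = generate_response(passages, summarizer, summarizer_tokenizer)

turn_entry = {
    "turn_id": "10-1_6",  # topic number and turn id of the example used above
    "responses": [{
        "rank": 1,
        "text": response_text,
        "ptkb_provenance": [],  # fill with the ids of the ranked PTKB statements
        "passage_provenance": [
            {"id": p["passage_id"], "score": float(p["reranker_score"]), "used": i < 3}
            for i, p in enumerate(reranked_passages)
        ],
    }],
}

run = {"run_name": "demo_run", "run_type": "automatic", "eval_response": True, "turns": [turn_entry]}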
"},{"location":"guidelines/","title":"Guidelines for iKAT 2024 Year 2","text":"The guidelines for iKAT 2024 (year 2) are now available as a Google Doc.
The guidelines for iKAT 2023 (year 1) are also available as a Google Doc.
"},{"location":"guidelines/#participation","title":"Participation","text":"Participants must register to submit. To request a late registration, please email trec@nist.gov
requesting a registration key. The dissemination form must be returned to submit runs.
In iKAT, the direction of the conversation can be changed in each turn based on:
The previous response from the user,
The persona of the user, and
The information learned from the user (background, perspective, and context).\u00a0
The persona of the user and their information needs form the direction of the conversation. Each topic will have multiple conversations based on multiple personas, resulting in different outputs that demonstrate the personalized aspect of the conversations. To this aim, the persona and the information needs of the user are modeled by generating a Personal Textual Knowledge Base (PTKB) during the conversation.
Note: The PTKB is provided for each conversation and the participants do not have to generate or update it.
"},{"location":"guidelines/#task-overview","title":"Task Overview","text":"In Year 2, the following inputs are provided to the participants at each conversation turn:
We offer the following tasks:
PTKB Statement Ranking: For each turn, given the PTKB, return a ranking of the relevant PTKB statements. This task is essentially a binary classification task. So, the output required is a list of statements from the PTKB.
Passage Ranking and Response Generation: For each turn, retrieve and rank relevant passages from the given collection in response to a user utterance. Then use the ranked passages to return a set of responses. Each response may be simply a passage from the collection. Alternatively, it may also be an extracted or generated summary from one or more passage results. All responses must have at least one passage called \"provenance\" from the collection.
Only Response Generation: For each turn, we will provide a ranked list of passages. The participants need only return a set of responses using this ranked list. As specified above, each response may be simply a passage from the collection. Alternatively, it may also be an extracted or generated summary from one or more passage results. All responses must have at least one passage called \"provenance\" from the collection.
We will provide baseline passage ranking and response generation methods for each of the tasks.
For manual runs, the participants can also use the following inputs provided for each conversational turn:
There are three submission classes:
Automatic: No manually labeled data can be used for this run type. This means that the models should solely rely on the current utterance, and the converation context (i.e., previous user utterance and system\u2019s canonical responses). Moreover, systems should not use the ptkb_provenance
fields from the current or previous turns. They should have a module to automatically identify the relevant PTKB statements (for an example, see the Getting Started
part of the website).
Manual: The manual runs can use the manually annotated data in the models. This includes the following:
ptkb_provenance
) of the current utteranceptkb_provenance
) of previous turns.Only response generation: These use the given passage ranking for response generation. The focus is only on response generation.
Note. In either run type, the participants are not allowed to use any information from the future. In other words, you should assume that for each turn, the only available information is up and including the current user utterance -- the system reponse of the current turn, as well as anything beyond that are hidden.
In the submission form, we will ask the pariticpants to mark which data sources they used in the manual submissions. You may either use some or all available lableled data, but this should be clearly specified in the run submission form.
"},{"location":"guidelines/#important-points-regarding-submissions","title":"Important Points Regarding Submissions","text":"Title of the topic cannot be used.
All fields within the run, as shown in the sample on this website, are mandatory. You may choose not to submit a PTKB statement ranking; in this case, the ptkb_provenance
field may be kept empty in the run; however, it must be present.
The passage_provenance
field can have up to 1000 passages -- less is ok but not more.
Within the passage_provenance
list
in the run, each dict
should have another field called used
. This new field will be a boolean
field indicating whether or not that passage was used to construct the response. If none of the passages have the used
field set to True
, then we will consider the top-5 passages as provenance for that response by default.
Having a response text
for every predicted response is mandatory. In case you are submitting a run that does not generate a response, you may leave this field empty or copy the top-1 passage as your response.
An example of two different conversations based on different personas for the same topic is shown in the following figure. For each user turn, systems should return a ranked list of text responses. Each response has one or more (ranked) source passages as provenance. In addition, the systems should provide a sorted list of relevant statements of PTKB with the corresponding relevance score.
For an explanation of the above diagram, see the Google Doc.
"},{"location":"guidelines/#primary-task-details","title":"Primary Task Details","text":"The main task in iKAT can be defined as personalized retrieval-based \"candidate response retrieval\" in context of the conversation. The task can be divided into the following sub-tasks:
Read the current dialogue turns up to the given turn (context). The provided context is:\u00a0(1) A fixed set of previous responses with provenance in the preceding turns up to the current step, and (2) PTKB of the user. Note: Using information from following turns is not allowed.
Find the relevant statements from PTKB to the information needed for this turn. This task is considered as a relevance score prediction. The output is in the form of a sorted list of the statements from PTKB with corresponding relevance score.
Extract or generate a response. Each response can be generated from multiple passages. It can be an abstractive or extractive summary of the corresponding passages. Each response must have one or more ranked passages as provenance used to produce it.
eval_response=False
] is provided for teams that only want to focus on the ranking task without response generation.Tokenizer
function of spacy.tokenizer
in spaCy v3.3 library), but should vary depending on an appropriate query-response.In the second year of iKAT, we are offering teams two distinct options for response generation:
The text collection contains a subset of ClueWeb22B documents, prepared by the organizers in collaboration with CMU. Documents have then been split into ~116M passages. The goal is to retrieve passages from target open-domain text collections.
"},{"location":"guidelines/#license-for-clueweb22-b","title":"License for ClueWeb22-B","text":"Getting the license to use the collection can be time-consuming and would be handled by CMU, not the iKAT organizers. Please follow these steps to get your data license ASAP:
Sign the license form available on the ClueWeb22 project web page and send the form to CMU for approval (clueweb@andrew.cmu.edu
).
Once you have the license, send a mail to Andrew Ramsay (andrew.ramsay@glasgow.ac.uk
) to have access to a download link with the preprocessed iKAT passage collection, and other resources such as Lucene and SPLADE indices.
Please give enough time to the CMU licensing office to accept your request.
Note.
CMU requires a signature from the organization (i.e., the university or company), not an individual who wants to use the data. This can slow down the process at your end too. So, it\u2019s useful to start the process ASAP.
If you already have an accepted license for ClueWeb22, you do not need a new form. Please let us know if that is the case.
As an alternative for (2), once you have access to ClueWeb22, you can get the raw ClueWeb22-B/iKAT collection yourself with the license, and do all passage-segmentation yourself, but we advise you to use our processed version to avoid any error.
Please do feel free to reach out to us if you have any questions or doubts about the process, so we can prevent any delays in getting the data to you.
"},{"location":"guidelines/#passage-segmentation","title":"Passage Segmentation","text":"For assessment, we will judge provenance passages. We segment the documents in our collection into passages in a similar manner as done by the TREC Deep Learning track for segmenting MS MARCO documents into passages: First, each document is trimmed to 10k characters. Then a 10-sentence sliding window with a 5-sentence stride is used to generate the passages.\u00a0
An example document with some passage segmentation is provided in TrecWeb format below for illustration purposes:
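The TrecWeb example itself is not reproduced here, but as a rough illustration of the procedure just described (our own reading of it, assuming spaCy's sentencizer for sentence splitting, not the organizers' exact code):

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def segment_document(doc_id, text, window=10, stride=5):
    # Trim the document to 10k characters, then slide a 10-sentence window
    # forward 5 sentences at a time to form passages named doc_id:passage_number.
    sentences = [s.text.strip() for s in nlp(text[:10_000]).sents]
    passages = []
    for passage_number, start in enumerate(range(0, max(len(sentences), 1), stride)):
        chunk = sentences[start:start + window]
        if not chunk:
            break
        passages.append((f"{doc_id}:{passage_number}", " ".join(chunk)))
        if start + window >= len(sentences):
            break
    return passages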
"},{"location":"guidelines/#topic-format","title":"Topic Format","text":"We will provide several sample topics with example baseline runs for validation and testing. Below is a sample topics file with two subtrees of the same topic. Subtrees are identified by topic and subtree ID, i.e topic 1, subtree 2 is 1-2
. Also a passage_provenance
field with a list of provenance passages and ptkb_provenance
field with a list of provenance statements from PTKB, that are used for generating the response, are included. An example is shown below for illustrative purposes.
[\n {\n \"number\": \"1-1\",\n \"title\": \"Finding a University\",\n \"ptkb\": {\n \"1\": \"I graduated from Tilburg University.\",\n \"2\": \"I live in the Netherlands.\",\n \"3\": \"I'm allergic to peanuts.\",\n \"4\": \"I worked as a web developer for 2 years.\",\n \"5\": \"I have a bachelor's degree in computer science.\",\n \"6\": \"I like Indian food.\",\n \"7\": \"My bachelor's GPA is 5.6.\",\n \"8\": \"I'm 26 years old.\",\n \"9\": \"My TOEFL SCORE is 91.\",\n \"10\": \"My interesting bachelor courses are data structure, algorithm, data mining, and artificial intelligence.\",\n \"11\": \"I didn't like computer architecture and logical circuits courses.\"\n },\n \"turns\": [\n {\n \"turn_id\": 1,\n \"utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"resolved_utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"response\": \"Do you want to continue your bachelor's studies and obtain a degree in computer science?\",\n \"ptkb_provenance\": [\n 5\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0034-09-03452:1\",\n \"clueweb22-en0034-09-03452:3\",\n \"clueweb22-en0034-09-03452:5\",\n \"clueweb22-en0034-09-03452:7\",\n \"clueweb22-en0034-09-03452:9\"\n ]\n },\n {\n \"turn_id\": 2,\n \"utterance\": \"Yes, I want to continue my studies in computer science.\",\n \"resolved_utterance\": \"Yes, I want to continue my studies in computer science.\",\n \"response\": \"Do you want to study in the Netherlands, Europe, or somewhere further away?\",\n \"ptkb_provenance\": [\n 2\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0034-09-03452:2\",\n \"clueweb22-en0034-09-03452:4\",\n \"clueweb22-en0034-09-03452:6\",\n \"clueweb22-en0034-09-03452:8\",\n \"clueweb22-en0034-09-03452:10\"\n ]\n },\n {\n \"turn_id\": 3,\n \"utterance\": \"I'd like to stay here.\",\n \"resolved_utterance\": \"I'd like to stay in the Netherlands.\",\n \"response\": \"I can help you with finding a university for continuing your studies in the Netherlands as a computer science student. 
Take a look at these Top Computer Science Universities in the Netherlands: Delft University of Technology, Eindhoven University of Technology, Vrije Universiteit Amsterdam, University of Amsterdam, Leiden University, Radboud University, Utrecht University, University of Twente\",\n \"ptkb_provenance\": [\n 5,\n 2\n ],\n \"response_provenance\": [\n \"clueweb22-en0034-09-03452:1\"\n ],\n \"sample_passage_ranking\": [\n \"clueweb22-en0012-00-00012:0\",\n \"clueweb22-en0012-00-00012:1\",\n \"clueweb22-en0012-00-00012:2\",\n \"clueweb22-en0012-00-00012:3\",\n \"clueweb22-en0012-00-00012:4\"\n ]\n }\n ]\n },\n {\n \"number\": \"1-2\",\n \"title\": \"Finding a university\",\n \"ptkb\": {\n \"1\": \"I don't like crazy cold weather.\",\n \"2\": \"I don't have a driver's license.\",\n \"3\": \"I plan to move to Canada.\",\n \"4\": \"I'm from the Netherlands.\",\n \"5\": \"I'm used to heavy rains in the Netherlands.\",\n \"6\": \"I graduated from UvA.\",\n \"7\": \"I have bachelor's degree in computer science.\",\n \"8\": \"I speak English fluently.\"\n },\n \"turns\": [\n {\n \"turn_id\": 1,\n \"utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"resolved_utterance\": \"I want to start my master's degree, can you help me with finding a university in Canada?\",\n \"response\": \"Sure, do you want to study computer science?\",\n \"ptkb_provenance\": [\n 7,\n 3\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0040-41-06056:0\",\n \"clueweb22-en0040-41-06056:1\",\n \"clueweb22-en0040-41-06056:2\",\n \"clueweb22-en0040-41-06056:3\",\n \"clueweb22-en0040-41-06056:4\"\n ]\n },\n {\n \"turn_id\": 2,\n \"utterance\": \"Yes, I want to pursue the same major. Can you tell me the name of the best universities?\",\n \"resolved_utterance\": \"Yes, I want to pursue computer science. Can you tell me the name of the best computer science universities in Canada?\",\n \"response\": \"Here are the top universities for computer science in Canada: 1) University of British Columbia, 2) University of Alberta, 3) Concordia University, 4) Simon Fraser University, 5) The University of Toronto\",\n \"ptkb_provenance\": [],\n \"response_provenance\": [\n \"clueweb22-en0026-31-15538:1\",\n \"clueweb22-en0026-31-15538:4\",\n \"clueweb22-en0026-31-15538:6\",\n \"clueweb22-en0040-41-06056:0\"\n ],\n \"sample_passage_ranking\": [\n \"clueweb22-en0010-22-22210:0\",\n \"clueweb22-en0010-22-22210:1\",\n \"clueweb22-en0010-22-22210:2\",\n \"clueweb22-en0010-22-22210:3\",\n \"clueweb22-en0010-22-22210:4\"\n ]\n },\n {\n \"turn_id\": 3,\n \"utterance\": \"Which of them best suits me in terms of weather conditions?\",\n \"resolved_utterance\": \"Which of the following universities best suits me in terms of weather conditions? 1) the University of British Columbia, 2) the University of Alberta, 3) Concordia University, 4) Simon Fraser University, and 5) The University of Toronto.\",\n \"response\": \"I know you don't like very cold weather, but can you give me an estimation of the temperature that is acceptable for you?\",\n \"ptkb_provenance\": [\n 1,\n 5\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0030-30-30030:0\",\n \"clueweb22-en0030-30-30030:1\",\n \"clueweb22-en0030-30-30030:2\",\n \"clueweb22-en0030-30-30030:3\",\n \"clueweb22-en0030-30-30030:4\"\n ]\n }\n ]\n }\n]\n\n\n
"},{"location":"guidelines/#task-submissions","title":"Task Submissions","text":"Participants submit the output of their system on the specified \"test\" topics. A single participant can submit maximum of:
In the automatic runs, participants can include response generation based on their own ranking, but this is not mandatory.
In the only-response-generation runs, participants must use the given passage provenances.
We have three submission classes, one for each of the 1) automatic, 2) manual, and 3) only response generation tasks. An example of the submission template for each run is shown below.
"},{"location":"guidelines/#sample-submission-for-the-main-task","title":"Sample submission for the main task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"automatic\",\n \"eval_response\": True,\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"ptkb_provenance\": [1,2],\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"score\": 0.6,\n \"used\": False\n\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"score\": 0.5,\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"score\": 0.4,\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"score\": 0.38, \n \"used\": True\n\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"score\": 0.3,\n \"used\": False\n }\n ]\n }\n ]\n }\n ]\n}\n\n
"},{"location":"guidelines/#sample-submission-for-the-only-response-generation-task","title":"Sample submission for the only response generation task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"only_response\",\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"ptkb_provenance\": [1,2],\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"used\": True\n }\n ]\n }\n ]\n }\n ]\n}\n\n\n
"},{"location":"guidelines/#sample-submission-for-the-manual-task","title":"Sample submission for the manual task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"manual\",\n \"eval_response\": True,\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"score\": 0.6,\n \"used\": False\n\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"score\": 0.5,\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"score\": 0.4,\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"score\": 0.38, \n \"used\": True\n\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"score\": 0.3,\n \"used\": False\n }\n ]\n }\n ]\n }\n ]\n}\n\n
The run_name is a run submission identifier that should be descriptive and unique to your team and institution.
The run_type is one of automatic, manual, or only_response.
Each turn
in the turns
list should contain a turn_identifier, consisting of the topic_id-subtree_id and turn_id concatenated with an underscore, e.g., 1-2_3
for topic 1, subtree 2, and turn 3.
Each turn
should also contain a list of responses
. A response consists of text
and a provenance
list. Each provenance should have an ID
, text
, used
, and score
. The used
field indicates whether the passage is used for response generation or not.
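As a small illustration (not part of the official tooling), the turn identifier described above can be built and parsed like this:
def make_turn_id(topic_id: int, subtree_id: int, turn_id: int) -> str:
    # "topic-subtree" joined to the turn number with an underscore, e.g. "1-2_3"
    return f"{topic_id}-{subtree_id}_{turn_id}"

def parse_turn_id(identifier: str) -> tuple[int, int, int]:
    tree, turn = identifier.split("_")
    topic, subtree = tree.split("-")
    return int(topic), int(subtree), int(turn)

assert make_turn_id(1, 2, 3) == "1-2_3"
assert parse_turn_id("1-2_3") == (1, 2, 3)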
Each turn includes a list of PTKB statements sorted by their relevance score to the current turn.
If you want to do only retrieval and not response generation in an automatic run, you can leave the text field empty and set eval_response to False.
For provenance ranking, this will be converted to a traditional TREC run format:
31_1-1 Q0 clueweb22-en0000-94-02275:0 1 0.5 sample_run
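As a rough sketch of what that conversion might look like (field names follow the sample submissions above; the file names and the exact query-ID convention used by the organizers are assumptions):
import json

with open("sample_run.json") as f:          # hypothetical input path
    run = json.load(f)

with open("sample_run.trec", "w") as out:   # hypothetical output path
    for turn in run["turns"]:
        # take the provenance list of the top-ranked response as the passage ranking
        for rank, passage in enumerate(turn["responses"][0]["passage_provenance"], start=1):
            out.write(f'{turn["turn_id"]} Q0 {passage["id"]} {rank} {passage["score"]} {run["run_name"]}\n')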
Runs may include up to 1000 responses for each user turn. For provenance ranking, only the first 1000 pieces of unique provenance will be used. As in the previous year of iKAT, only the top-k responses and provenances will be assessed, subject to resource constraints.
"},{"location":"guidelines/#evaluation","title":"Evaluation","text":"We will use the relevance assessment methods used in previous years of CAsT for relevance to individual turns.
Similar to iKAT Year 1, only a subset of turns may be evaluated for provenance ranking effectiveness. This will be disclosed to participants after the assessment is completed.
"},{"location":"guidelines/#timeline","title":"Timeline","text":"Task Date Guidelines released May 20, 2024 Test topics released August 5, 2024 Submission deadline August 31, 2024 AOE Results released to participants TBD"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"Last updated: August 5, 2024
See the official TREC iKAT repository for tools and code related to the track
"},{"location":"#introduction-to-trec-interactive-knowledge-assistance-track","title":"Introduction to TREC Interactive Knowledge Assistance Track","text":"The widespread adoption of voice-based assistants is significantly changing how we interact with technology. According to a Comscore report, over 20% of U.S. households now own a smart speaker. This trend is further exemplified by the recent introduction of assistant-enabled smart glasses by major tech companies, pushing the boundaries of real-world applications.
Despite their proficiency in executing simple, well-defined tasks, these assistants still face limitations in supporting conversational information seeking (CIS). CIS is crucial within fields such as information retrieval, natural language processing, and dialogue systems, focusing on tasks like ranking, summarizing, and question answering.
The TREC Interactive Knowledge Assistance Track (iKAT) builds on the four years of success of the TREC Conversational Assistance Track (CAsT), which can be explored further here. iKAT is designed to research and develop conversational agents that excel in collaborative information seeking by personalizing responses based on user-derived insights.
CAsT's fourth year introduced more interactive elements, such as clarifications and suggestions, fostering multi-turn, multi-path conversations. iKAT evolves from CAsT with a renewed focus on supporting diverse, multi-turn conversations tailored to the user\u2019s background, perspective, and context. This means that for any given topic, the flow and substance of the conversation can vary significantly depending on the user\u2019s individual traits and needs.
iKAT's primary goal is to advance research on conversational agents that not only respond to users\u2019 immediate queries but also adapt their responses based on the cumulative context of the interaction. This aspect of personalization is particularly timely with the advancements in large language models (LLMs), which introduce new challenges and opportunities in the dynamic interplay of user context, system promptings, and conversational initiatives.
Shield:
All data associated with this work is licensed and released under a Creative Commons Attribution-ShareAlike 4.0 International License.
"},{"location":"#track-coordinators","title":"Track Coordinators","text":"Mohammad Aliannejadi, University of Amsterdam, The Netherlands. Dr. Aliannejadi is an Assistant Professor at the IRLab (formerly known as ILPS), the University of Amsterdam in The Netherlands. His research is in modeling user information needs with a focus on recommender systems, unified (meta) search, and conversational systems.
Zahra Abbasiantaeb, University of Amsterdam, The Netherlands. Zahra is a Ph.D. student at the IRLab supervised by Dr. Aliannejadi. She is working on conversational search and recommendation. Earlier, she has also worked on patent reference mining. Zahra obtained her masters in AI from the Amirkabir University of Technology with a focus on Question Answering systems.
Simon Lupart, University of Amsterdam, The Netherlands. Simon is a Ph.D. student at the IRLab supervised by Dr. Aliannejadi and Prof. Kanoulas. He worked in IR for the past two years at Naver Labs Europe, and joined UvA to focus on Conversational search.
Shubham Chatterjee, Missouri University of Science and Technology, Missouri, USA. Dr. Chatterjee is an Assistant Professor of Computer Science. Dr. Chatterjee works on Neural IR, Entity-Oriented Search, Conversational IR, and the application of LLMs to these areas.
Jeff Dalton, University of Edinburgh, Scotland. Dr. Dalton is an Associate Professor (Reader) and Chancellor's Fellow at the School of Informatics, the University of Edinburgh. He is also a Turing AI Fellow and PI for the GRILL Lab. His research focuses on new methods for machine understanding of language and text data using deep neural networks and entity knowledge graphs for improving information-seeking applications.
Leif Azzopardi, University of Strathclyde, Scotland. Dr. Azzopardi is an Associate Professor in Artificial Intelligence and Data Science within the Department of Computer and Information Sciences at the University of Strathclyde. He is the PI for the Interaction Lab (i-lab), which specializes in developing, evaluating and modelling information-rich and information-intensive applications and agents.
"},{"location":"#submit-your-runs-ikat-2024","title":"Submit Your Runs (iKAT 2024)","text":"We will share details of how to submit the runs soon. We will also provide a validation script to validate your runs for submission. Runs failing the validation script will not be accepted.
"},{"location":"#announcements","title":"Announcements","text":".tsv
file has this format: doc_id passage_number passage_MD5
. Total download size is 2.2GB. 2023_top_1000_query_results.zip This zip file has queries from both training and testing topics, saved in queries_train.txt
and queries_test.txt
respectively. The results from the iKAT searcher (BM25
using manually resolved queries) are saved in the query_results_train
and query_results_test
folders. Each result file, with up to 1000 results, corresponds to a query based on line numbers, starting from zero. For instance, the results for the first query in queries_train.txt
can be found in query_results_train/query_results_000.txt
. In each result file, every line shows the ClueWeb22 ID
followed by the URL
. ret_bm25_rm3--type_automatic--num_ptkb_3--k_100--num_psg_3.official.run.json Automatic Run ret_bm25_rm3--type_manual--num_ptkb_3--k_100--num_psg_3.official.run.json Manual Run"},{"location":"additional_data/#baseline-runs-ikat-2023","title":"Baseline Runs iKAT 2023","text":"We provide two baseline runs (linked above).
BM25+RM3 (Pyserini default) is used as the initial retrieval method (denoted by ret_bm25_rm3 in the file name) to retrieve 100 passages per query (denoted by k_100 in the file name).
The top 3 relevant PTKB statements are used (denoted by num_ptkb_3 in the file name).
The top 3 passages are used for response generation (denoted by num_psg_3 in the file name). We use the T5 model mrm8488/t5-base-finetuned-summarize-news available on HuggingFace for this purpose.
Reranking is done with SentenceTransformers, specifically, the model cross-encoder/ms-marco-MiniLM-L-6-v2 available on HuggingFace.
Query rewriting is done with the castorini/t5-base-canard model available on HuggingFace.
For the manual run, the PTKB statements from the ptkb_provenance field were used.
For the manual run, the resolved_utterance field was used.
We provide the data from previous years' TREC CAsT below. The iKAT topics are similar, with the addition of the Personal Text Knowledge Base. For more information on TREC CAsT, see the website and read the overview papers [2019] [2020] [2021] [2022]
Note. TREC CAsT did not include a PTKB but you can be creative and modify the data according to your needs. Also, TREC CAsT used different collections (Wikipedia, KILT, MS MARCO, etc.) at different stages. iKAT is using a subset of the recently released ClueWeb22-B.
"},{"location":"additional_data/#cast-year-4-2022","title":"CAsT Year 4 (2022)","text":"File Description 2022_automatic_evaluation_topics_tree_v1.0.json Contains each conversation tree (topic) with an automatic rewrite generated for each user utterance. 2022_evaluation_topics_turn_ids.json Contains each conversation tree (topic) with the resolved query for each user utterance. 2022_evaluation_topics_tree_v1.0.json Contains all ids that responses/ranked passages need to be returned for. 2022_evaluation_topics_flattened_duplicated_v1.0.json Contains all possible conversation paths across all the conversation trees."},{"location":"additional_data/#cast-year-3-2021","title":"CAsT Year 3 (2021)","text":"File Description 2021_automatic_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Automatic 2021_manual_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Manual 2021qrels.txt Qrels file for passage ranking task."},{"location":"additional_data/#cast-year-2-2020","title":"CAsT Year 2 (2020)","text":"File Description 2020_automatic_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Automatic 2020_manual_evaluation_topics_v1.0.json 25 primary evaluation topics in JSON format. Variant: Manual 2020qrels.txt Qrels file for passage ranking task."},{"location":"additional_data/#cast-year-1-2019","title":"CAsT Year 1 (2019)","text":"File Description train_topics_v1.0.json 30 example training topics in JSON format. evaluation_topics_v1.0.json 50 evaluation topics in JSON format. 2019qrels.txt Official evaluation qrels file for passage ranking task. train_qrels.txt Limited (incomplete) training judegements for 5 topics (approximately 50 turns). The judgments are graded on a three point scale (2 very relevant, 1 relevant, and 0 not relevant)."},{"location":"data/","title":"Datasets and Resources","text":""},{"location":"data/#topics-for-ikat-year-2-2024","title":"Topics for iKAT Year 2 (2024)","text":"File Description 2024_test_topics.json Test topics in JSON format."},{"location":"data/#additional-data","title":"Additional Data","text":"We also provide additional data from iKAT Year 1 and TREC CAsT 2019-2022.
"},{"location":"data/#trec-ikat-clueweb22-b-passage-collection","title":"TREC iKAT ClueWeb22-B Passage Collection","text":"We provide a segmented version of the TREC iKAT ClueWeb22-B Document collection available from CMU in two formats: JSONL
and TrecWeb
.
In case you have segmented the document collection yourself, you may check whether your segments match ours using the tsv
file of passage hashes provided.
JSONL
format.{\"id\": \"[passage id]\", \"contents\": \"[passage text]\", \"url\": \"[ClueWeb22 document URL]\"}
doc_id:passage_number
Passage collection in TrecWeb
format.
Passage hashes.
tsv
file containing MD5 hashes of passage texts..tsv
file has this format: doc_id passage_number passage_MD5
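If you segmented the collection yourself, a minimal sketch of checking your passages against this hash file might look like the following (assuming the .tsv is whitespace-separated as described and that the MD5 is computed over the UTF-8 passage text; file names are hypothetical):
import hashlib

def load_reference_hashes(tsv_path: str) -> dict:
    ref = {}
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            doc_id, passage_number, passage_md5 = line.split()
            ref[f"{doc_id}:{passage_number}"] = passage_md5
    return ref

def matches_reference(passage_id: str, passage_text: str, ref: dict) -> bool:
    # compare your own segmentation against the provided hash for that passage id
    return ref.get(passage_id) == hashlib.md5(passage_text.encode("utf-8")).hexdigest()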
We also provide a sparse Lucene index generated from the JSONL
passage files above using Pyserini. The files form a single .tar.bz2
archive split into sections for simpler downloading due to the overall size. To extract the archive, once downloaded, you must combine each of the sections in name order back into a single file:
cat ikat_2023_passage_index.tar.bz2.part* > ikat_2023_passage_index.tar.bz2\n
Total download size is approximately 150 GB
"},{"location":"data/#how-do-i-access-these-resources","title":"How do I access these resources?","text":"Each team should use a URL of https://ikattrecweb.grill.science/<team_name>
to access the files. The page will ask for a userID and password. Enter the login details which you obtained from the iKAT organizers. You should see a page which lists each type of data and has links to the individual files listed above, along with their checksum files.
NOTE: Please do not share IPs in the 10.x.x.x
range which is for private networks. We would need a suitable public IP so that we may configure the above download link to work for you.
iKAT Searcher is a simple tool developed to help with creating the topics for iKAT. The tool allows topic developers to visually assess the behaviour of a retrieval system, ultimately making it easier to develop challenging, but interesting, topics for the Track. You can interact with the system here. See the GitHub repository.
"},{"location":"data/#run-validation","title":"Run Validation","text":"We provide code for run validation in our Github repository. Please see the associated README file for detailed instructions on how to run the code. It is crucial to validate your submission files before submitting them. The run files that fail the validation phase will be discarded. We advise you to get familiarized with the validation script as soon as possible and let us know if you have any questions or encounter any problems working with it.
Note. You need the MD5 hash file of the passages in the collection to run the validation code. You can download this file from above.
Below is a summary of the checks that the script performs on a run file.
protocol_buffers/run.proto
?run_name
field non-empty?run_type
field non-empty and set to automatic
or manual
?passage_provenance
passage IDs appear in the collection?text
field?passage_provenance
entry:passage_provenance
entries listed for the response?passage_provenance
with its used
field set to True in the response?passage_provenance
entry?ptkb_provenance
entry?ptkb_provenance
entry:ptkb
field of the topic data?To help you get started, we (the iKAT organizers) have put together this guide. In this demo, we'll explore and build the components of a simple iKAT system. These components include:
The diagram above shows how the components of our system interact.
Given a query, conversation context, and the PTKB of the user, our system's Query Rewriter reformulates the query to resolve ambiguity. Next, the Passage Retriever uses the reformulated query to retrieve the top-K candidate passages from an index. Finally, the Response Generator uses the top-N of the K retrieved passages to generate a coherent response. The output of our system is a response along with its provenance, i.e., the relevant passages used to construct the response for the input query, based on the conversation context.
"},{"location":"demo/#setup","title":"Setup","text":"Before putting our system together, let's download the topics and the demo collection.
"},{"location":"demo/#trec-ikat-2023-simple-english-wikipedia-passage-collection","title":"TREC iKAT 2023 Simple English Wikipedia Passage Collection","text":"Downloading and processing the entire TREC iKAT 2023 ClueWeb22-B passage collection is not possible on Colab. Moreover, it requires a licenece to use. For this demo, we will use Simple English Wikipedia. Compared to the full English wikipedia, it has only about 170k articles. The iKAT organizers have preprocessed the articles and created a passage collection for you to use. This collection is in a jsonl
format. An example record from the collection is shown below:
{\n \"id\": \"simplewiki:Ted%20Cassidy:0\",\n \"contents\": \"Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on \\\"The Addams Family\\\".\",\n \"title\": \"Ted Cassidy\",\n \"wiki_id\": \"9822\"\n}\n
Each record in this collection contains the following fields:
id
: The passage id is a combination of (1) the string \"simplewiki:\", (2) the encoded title of the Wikipedia page, and (3) the passage number. This is similar to the iKAT 2023 passage id format (doc_id:passage_number) contents
: The text of the passage.title
: The title of the Wikipedia page to which this passage belongs.wiki_id
: The Wikipedia page ID of the Wikipedia page to which this passage belongs. These IDs are unique and will never change.
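For illustration only (not part of the demo), a record's id can be split back into its document id, page title, and passage number:
import urllib.parse

record_id = "simplewiki:Ted%20Cassidy:0"
doc_id, passage_number = record_id.rsplit(":", 1)            # "simplewiki:Ted%20Cassidy", "0"
page_title = urllib.parse.unquote(doc_id.split(":", 1)[1])   # "Ted Cassidy"
print(doc_id, passage_number, page_title)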
!pip install gdown\n
!echo \"Creating target directory..\"\n!mkdir -p ikat_demo\n!mkdir -p ikat_demo/collection\n\nimport gdown\n# The Google Drive file ID and the destination path\nurl = 'https://drive.google.com/uc?id=1touBjwkPByH69utT9_sevr5nYT0TTZ2M'\noutput = '/content/ikat_demo/collection/simplewiki-2020-11-01.passages.jsonl'\ngdown.download(url, output, quiet=False)\n\nurl = 'https://drive.google.com/uc?id=1zPSiAqLmbx9QFGm6walnuMUl7xoJmRB7'\noutput = '/content/ikat_demo/test.json'\ngdown.download(url, output, quiet=False)\n
"},{"location":"demo/#creating-a-bm25-index","title":"Creating a BM25 Index","text":"Now, we'll use the Pyserini information retrieval toolkit to build a sparse index for the collection we just downloaded. Pyserini provides APIs for our indexing needs and supports both sparse and dense retrieval. Alternatively, you may also use PyTerrier.
First, let's install Pyserini and its dependcies.
!pip install pyserini\n!pip install faiss-cpu\n
Pyserini provides ingestors for document collections in many different formats. The simplest, however, is the following JSON format:
{\n \"id\": \"doc1\",\n \"contents\": \"this is the contents.\"\n}\n
The collection to be used with Pyserini must be in a jsonl
format, where each line is a json
record structured as above. The preprocessed collection that we provide is already in a jsonl
format.
!python -m pyserini.index.lucene \\\n --collection JsonCollection \\\n --input '/content/ikat_demo/collection/' \\\n --index '/content/ikat_demo/index' \\\n --generator DefaultLuceneDocumentGenerator \\\n --threads 8 \\\n --storePositions --storeDocvectors --storeRaw\n
To check that our new sparse index works, let's try searching with it. The code below loads the index and searches for the query global warming
.
from pyserini.search.lucene import LuceneSearcher\n\nsearcher = LuceneSearcher('ikat_demo/index')\nquery = 'global warming'\nhits = searcher.search(query)\n\nfor i in range(len(hits)):\n print(f'{i+1:2} {hits[i].docid:4} {hits[i].score:.5f}')\n
Let's see the contents of the best ranking document.
import json\nbest_ranked_doc = searcher.doc(hits[0].docid)\nparsed_doc = json.loads(best_ranked_doc.raw())\nparsed_doc['contents']\n
"},{"location":"demo/#query-rewriting","title":"Query Rewriting","text":"iKAT topics mimic real-world dialogue phenomena. As a result, utterances within topics become increasingly ambiguous as the topic unfolds. On their own, these utterances likely won't return good candidates from our index, so we need to reformulate them using information from prior turns in the topic.
Let's examine the utterances in a topic to demonstrate the need for query rewriting. As an example, we will examine a segment from the iKAT 2023 test topic on \"Finding a diet\" conversation.
{\n \"turn_id\": 1,\n \"utterance\": \"Hey, Assistant! I am wondering what to cook for dinner tonight for my family.\",\n \"resolved_utterance\": \"Hey, Assistant! I am wondering what to cook for dinner tonight for my family.\",\n \"response\": \"Hey there! There are many healthy and delicious recipes you could try! For example, a stir-fry with lean proteins like chicken or tofu and a variety of colorful vegetables is always a good option. Or you could try making a Mediterranean quinoa salad with cucumbers, tomatoes, feta cheese, and a light lemon vinaigrette. Baked salmon with a side of roasted vegetables is also a great choice.\",\n},\n{\n \"turn_id\": 2,\n \"utterance\": \"Nah, we had salmon just two days ago. But the Mediterranean dish that you mentioned sounds interesting, could you provide the recipe for that?\",\n \"resolved_utterance\": \"Nah, we had salmon just two days ago. But the Mediterranean quinoa salad sounds interesting, can you provide the recipe for the Mediterranean quinoa salad?\",\n \"response\": \"Ah, I see. Of course! Here's a simple recipe for Mediterranean Quinoa Salad. The ingredients you need are 1 cup quinoa, 2 cups water, 1/4 cup extra-virgin olive oil, 2 limes, juiced, 2 teaspoons ground cumin, 1 teaspoon salt, 1/2 teaspoon red pepper flakes (optional), 1 1/2 cups halved cherry tomatoes, 1 cucumber, diced, 1/2 cup chopped fresh mint, 1/2 cup feta cheese. Instructions: Bring quinoa and water to a boil in a saucepan. Reduce heat to medium-low, cover, and simmer until quinoa is tender and water has been absorbed, 10 to 15 minutes. Set aside to cool. In a large bowl, whisk olive oil, lime juice, cumin, salt, and red pepper flakes together to create the dressing. In the bowl with the dressing, mix in quinoa, tomatoes, cucumber, and mint. Stir until well mixed. Top with feta cheese before serving. Enjoy!\"\n},\n{\n \"turn_id\": 3,\n \"utterance\": \"That's great, thanks! But I'm thinking of reducing the amount of oil in the dressing. Any suggestions?\",\n \"resolved_utterance\": \"That's great, thanks! But I'm thinking of reducing the amount of oil in the salad dressing. Any suggestions?\",\n \"response\": \"Certainly! You can reduce the amount of oil by using more lime juice or adding some water. Alternatively, you could substitute some of the oil with a different kind of liquid like vinegar. This should maintain a good balance between tartness and the need for oil as an emulsifier. Which of these do you think works the best for you?\"\n},\n{\n \"turn_id\": 4,\n \"utterance\": \"Thanks! My children don't like vinegar, so I would probably go with lime juice. Can I also use butter instead?\",\n \"resolved_utterance\": \"Thanks! My children don't like vinegar, so I would probably go with lime juice. Can I also butter instead of oil in the Mediterranean Quinoa Salad?\",\n \"response\": \"Butter can be substituted for vegetable oil in some dishes, but not always. Butter is a solid fat, while vegetable oil is liquid, so they have different effects on the consistency and texture of the food. Butter is also higher in calories and lower in trans fats than vegetable oil. If you want to use butter instead of vegetable oil, you may need to adjust the quantity, melt it down, and let it cool a bit. For example, 1 cup of butter equals 7/8 cup of vegetable oil.\"\n},\n{\n \"turn_id\": 5,\n \"utterance\": \"I see, thanks! What are some cheese substitutes?\",\n \"resolved_utterance\": \"I see, thanks! 
What are some substitutes for cheese in the Mediterranean Quinoa Salad?\",\n \"response\": \"I can suggest some alternatives for feta cheese. One option is to use a lower fat cheese like mozzarella. Another is to use a plant-based cheese substitute, if you're open to vegan options. You could also try adding a bit of nutritional yeast for a cheese-like flavor with less fat and more protein.\"\n},\n{\n \"turn_id\": 6,\n \"utterance\": \"Can you compare the first two?\",\n \"resolved_utterance\": \"Can you compare mozzarella with plant-based cheese?\",\n \"response\": \"Mozzarella cheese is a dairy product derived from milk and produced in a wide range of flavors. It comprises fat, usually from the milk of cows, buffalo, goats, or sheep. On the other hand, vegan cheese is a non-dairy, plant-based product. Vegan cheese is 100 percent animal-free cheese and is made using vegetable proteins. The ingredients of vegan cheese include soy, nuts, cashews, and vegetable oils (such as coconut oil). Vegan cheese is made using a process similar to traditional cheese making-without the animal milk. During the process, the plant proteins are separated using bacteria. Then the ingredients, such as oils and thickeners, are added to achieve the desired cheese-like consistency.\"\n},\n{\n \"turn_id\": 7,\n \"utterance\": \"Which of these is suitable for me?\",\n \"resolved_utterance\": \"Which of mozzarella and vegan cheese is suitable for me?\",\n \"response\": \"Based on the nutritional facts, mozzarella can be considered safe for lactose-intolerant people to eat, but be aware that the type of mozzarella you buy may not help reduce your symptoms. This is because many types of processed and packaged mozzarella also contain whey powder or casein - two other forms of dairy which are high in lactose. However, mozzarella has almost no lactose. Just one ounce of cheese provides around 0.3 grams of lactose. Vegan cheese is also a good alternative for lactose-intolerant people. Vegan cheeses are 100 percent animal-free and made using vegetable proteins. There are many different types of vegan cheese available, including vegan mozzarella. So, both mozzarella and vegan cheese can be suitable for lactose-intolerant people. It depends on your personal preference and dietary needs.\"\n},\n
This topic starts with a question regarding selecting a diet. If we isolate Turn 6
from the rest of the conversation and use it for search, we would likely get minimal, if any, results.
Now, let's see how a query rewriter helps.
We'll use a T5
query rewriter from HuggingFace
. It is finetuned on the CANARD
dataset but works effectively on iKAT queries.
# Load model and tokenizer from HuggingFace\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\nimport torch\n\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nrewriter = AutoModelForSeq2SeqLM.from_pretrained(\"castorini/t5-base-canard\").to(device).eval()\nrewriter_tokenizer = AutoTokenizer.from_pretrained(\"castorini/t5-base-canard\")\n
The model rewrites an utterance using that utterance and all previous utterances and system responses as input. The utterance and previous turn utterances and system responses should be separated by |||
when building the input to the model.
Let's read the json
data file and load the turns.
with open('/content/ikat_demo/test.json', 'r') as f:\n topics = json.load(f)\n
Next, we write a small function to extract the context.
The provided Python function, extract_context
, extracts a sequence of utterances and responses up to a given turn_id
from a JSON data structure. Here's a breakdown:
Purpose: Extracts a series of utterances and responses up to a specified turn from a given JSON data based on the provided number
.
Parameters:
json_data
: A list of dictionaries, where each dictionary represents a conversation that has a unique number and contains a series of turns.number
: The unique identifier for a specific conversation in the JSON data.turn_id
: A specified turn up to which the utterances and responses will be extracted.Process:
a. Locate Conversation: Loops through the json_data
to find the dictionary with the given number
.
b. Error Handling: If no dictionary with the given number
is found, it returns a message indicating so.
c. Extracting Text: Loops through the turns within the found conversation and appends the utterances and responses up to the turn_id
to a list.
d. Context Formation: Concatenates the extracted utterances and responses using \"|||\" as a separator to form the context.
Output: A tuple containing:
utterance
for the provided turn_id
.context
, which is the sequence of utterances and responses up to the given turn_id
, concatenated with \"|||\".def extract_context(json_data, number, turn_id):\n # Find the correct dictionary with the given number\n data = None\n for item in json_data:\n if item['number'] == number:\n data = item\n break\n\n # If we couldn't find the data for the given number\n if not data:\n print(\"No data found for the given number.\")\n return \"No data found for the given number.\", None\n\n # Extract the utterance and response values\n texts = []\n current_utterance = \"\"\n for turn in data['turns']:\n if turn['turn_id'] < turn_id:\n texts.append(turn['utterance'])\n texts.append(turn['response'])\n elif turn['turn_id'] == turn_id:\n current_utterance = turn['utterance']\n texts.append(current_utterance)\n\n # Join the texts with \"|||\" separator\n context = '|||'.join(texts)\n\n return current_utterance, context\n
Now we can use this function to extract the context for a given topic number
and turn_id
in the topic.
number_to_search = \"10-1\"\nturn_id_to_search = 6\nutterance, context = extract_context(topics, number_to_search, turn_id_to_search)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Turn Context: {context}\")\n
NOTE: When building context this way, there's a risk that the input can become too lengthy for subsequent interactions, especially in extended discussions. For handling this, you can experiment with various context truncation methods. A straightforward strategy is to eliminate earlier turn utterances and responses if the input size surpasses the model's token limit.
Now, let's rewrite the query using our model.
def rewrite_query(context: str, model, tokenizer, device) -> str:\n tokenized_context = tokenizer.encode(context, return_tensors=\"pt\").to(device)\n output_ids = model.generate(\n tokenized_context,\n max_length=200,\n num_beams=4,\n repetition_penalty=2.5,\n length_penalty=1.0,\n early_stopping=True\n ).to(device)\n\n rewrite = tokenizer.decode(output_ids[0], skip_special_tokens=True)\n return rewrite\n
rewrite = rewrite_query(context, rewriter, rewriter_tokenizer, device)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Query Rewrite: {rewrite}\")\n
Hmm, that didn't really help! \ud83d\ude1e The rewriter did expand the query but with the wrong information!
"},{"location":"demo/#expanding-the-context-using-relevant-ptkb-statements","title":"Expanding the Context using Relevant PTKB Statements","text":"One major difference between iKAT and CAsT is the presence of the Personal Text Knowledge Base (PTKB). In the first year, we are providing the PTKB as a dictionary of statements about the user. Each PTKB defines a user's profile and controls how the system should respond to the user. For the example conversation above, the PTKB, as provided in the test data, is as below.
{\n \"1\": \"I want to know about healthy cooking techniques.\",\n \"2\": \"I am lactose intolerant.\",\n \"3\": \"I'm looking for a speaker set to match my TV.\",\n \"4\": \"I'm willing to drive a long distance to find a cheaper TV.\",\n \"5\": \"I'm hoping to find some offers and discounts for TV.\",\n \"6\": \"I like to eat fruits and vegetables.\",\n \"7\": \"I don't read much.\",\n \"8\": \"I want to cook healthy and tasty recipes for my family.\",\n \"9\": \"I am on a diet and prefer low-calorie food.\",\n \"10\": \"I want to know about the nutritional value of the ingredients I use.\",\n \"11\": \"I'm looking for a new TV to replace my current one.\",\n \"12\": \"I want a TV that is okay for light and size of my living room.\"\n},\n
Above, we re-wrote the query using the context. But for a more persoanlized conversation, one approach to query rewriting could be to use the PTKB statements in the query reformulation process.
To incorporate the PTKB into the system, we must answer two questions:
In a manual
run, you may use the ptkb_provenance
fields. This field was manually populated by the iKAT topic developers and provides a straightforward way to identify relevant PTKB statements for the given turn utterance. However, a more difficult (and perhaps interesting) exercise is to automatically identify relevant PTKB statements for the given turn.
One easy-to-implement (and probably good) solution is to use BERT
embeddings. Specifially, we can use SentenceTransformers
SentenceTransformers
is a Python framework designed for sentence, text, and image embeddings. the foundational work on this was presented in the paper titled Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
This tool enables computation of sentence and text embeddings in over 100 languages. You can then use cosine similarity, for instance, to identify sentences of similar meanings. It's particularly valuable for semantic text similarity, semantic searching, and paraphrase detection.
Built on PyTorch and Transformers, the framework boasts a vast array of pre-trained models optimized for diverse tasks. Moreover, fine-tuning your models is a breeze.
We are going to use the CrossEncoder
model from SentenceTransformers
to identify the relevant PTKB statements. Specifically, we are going to re-rank the PTKB statements based on the current utterance.
A CrossEncoder
-based re-ranker can significantly enhance the end results for users. In this approach, both the query and a potential document are fed into the transformer network concurrently. The network then produces a score between 0 and 1, signifying the document's relevance to the query.
The strength of a CrossEncoder
lies in its superior performance, stemming from its ability to execute attention operations across both the query and the document.
We will use cross-encoder/ms-marco-MiniLM-L-6-v2
model from HuggingFace that scores the query and all retrieved passages for their relevancy.
For a complete introduction to using cross encoders and retrieval and reranking, see this notebook.
First, we need to install the SentenceTransformers
library
!pip install sentence-transformers\n
Next, we write a small function that will rerank the PTKB statements for the given query.
The provided Python function, get_ptkb_statements
, compares statements from the PTKB with a query to determine their similarity. Here's a step-by-step explanation of the function:
Purpose: The function aims to return the top num_ptkb
statements from the PTKB that are most similar to the given query
.
Parameters:
query
: The user's input or question.num_ptkb
: The number of PTKB statements to return.ptkb
: A dictionary of the PTKB statements.reranker
: A model that predicts the similarity score between two texts.Process:
a. Calculate Similarity Scores: For each statement in the PTKB, it computes a similarity score with the query
using the reranker
. The score is between 0 and 1, with 1 being highly similar.
b. Pair Statements with Scores: The statements from the PTKB are paired with their respective similarity scores.
c. Sort Pairs: The pairs are then sorted in descending order based on their similarity scores.
d. Extract Statements: From the sorted pairs, the actual statements are extracted.
e. Return Top Statements: The top num_ptkb
statements are then concatenated into a single string and returned.
Output: A string containing the top num_ptkb
statements from the PTKB that are most similar to the given query
, separated by spaces.
def get_ptkb_statements(query, num_ptkb, ptkb, reranker):\n # Find the similarity of PTKB statements with the given query\n similarity_scores = [reranker.predict([[query, ptkb_statement]])[0] for ptkb_statement in ptkb.values()]\n\n # Pair each statement with its similarity score\n statement_score_pairs = list(zip(list(ptkb.values()), similarity_scores))\n\n # Sort the pairs based on the similarity scores in descending order\n sorted_pairs = sorted(statement_score_pairs, key=lambda x: x[1], reverse=True)\n\n # Extract the sorted responses\n sorted_ptkb_statements = [pair[0] for pair in sorted_pairs]\n\n # Return required number of PTKB statements\n return ' '.join(sorted_ptkb_statements[:num_ptkb])\n
Now, let's use this function to find the top relevant PTKB statements for a given turn.
query = \"Can you compare the first two?\"\nptkb = {\n \"1\": \"I want to know about healthy cooking techniques.\",\n \"2\": \"I am lactose intolerant.\",\n \"3\": \"I'm looking for a speaker set to match my TV.\",\n \"4\": \"I'm willing to drive a long distance to find a cheaper TV.\",\n \"5\": \"I'm hoping to find some offers and discounts for TV.\",\n \"6\": \"I like to eat fruits and vegetables.\",\n \"7\": \"I don't read much.\",\n \"8\": \"I want to cook healthy and tasty recipes for my family.\",\n \"9\": \"I am on a diet and prefer low-calorie food.\",\n \"10\": \"I want to know about the nutritional value of the ingredients I use.\",\n \"11\": \"I'm looking for a new TV to replace my current one.\",\n \"12\": \"I want a TV that is okay for light and size of my living room.\"\n}\nnum_ptkb = 3\n
from sentence_transformers import CrossEncoder\nreranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')\nptkb_statements = get_ptkb_statements(query, num_ptkb, ptkb, reranker)\nptkb_statements\n
"},{"location":"demo/#question-2-how-do-we-use-these-relevant-ptkb-statements","title":"Question 2: How do we use these relevant PTKB statements?","text":"One possible way of using these relevant PTKB statements is to include them in the context when re-writing the query.
Let's see how that works. We will modify out previous function extract_context
a little to include the relevant PTKB statements.
def extract_context_with_ptkb_statements(json_data, number, turn_id, ptkb_statements):\n # Find the correct dictionary with the given number\n data = None\n for item in json_data:\n if item['number'] == number:\n data = item\n break\n\n # If we couldn't find the data for the given number\n if not data:\n print(\"No data found for the given number.\")\n return \"No data found for the given number.\"\n\n # Extract the utterance and response values\n texts = [ptkb_statements]\n current_utterance = \"\"\n for turn in data['turns']:\n if turn['turn_id'] < turn_id:\n texts.append(turn['utterance'])\n texts.append(turn['response'])\n elif turn['turn_id'] == turn_id:\n current_utterance = turn['utterance']\n texts.append(current_utterance)\n\n # Join the texts with \"|||\" separator\n context = '|||'.join(texts)\n\n return current_utterance, context\n
number_to_search = \"10-1\"\nturn_id_to_search = 6\nutterance, context = extract_context_with_ptkb_statements(topics, number_to_search, turn_id_to_search, ptkb_statements)\nprint(f\"Raw Utterance: {utterance}\")\nprint(f\"Turn Context: {context}\")\n
rewrite = rewrite_query(context, rewriter, rewriter_tokenizer, device)\nprint(f\"Query Rewrite: {rewrite}\")\n
That didn't help either! \ud83d\ude1e
This is a really difficult query for the system! We are excited \ud83e\udd29 to see how your system handles such queries.
Alternatively, we can also append the PTKB statements to the rewritten query (without PTKB statements).
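A one-line sketch of that alternative, reusing the variables defined above:
expanded_rewrite = rewrite + " " + ptkb_statements
print(f"Expanded Rewrite: {expanded_rewrite}")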
"},{"location":"demo/#passage-retrieval-and-reranking","title":"Passage Retrieval and Reranking","text":"In iKAT 2023, we provide several tasks, see the guidelines section of the webpage for more details.
One core task in iKAT 2023 involves producing a ranked list of relevant passages corresponding to a specific user utterance. During the Passage Retrieval phase, we employ the rephrased query (either manually or automatically adjusted) to fetch a potential set of passages from the previously generated sparse index.
The retrieve-then-rerank approach is a widely adopted strategy in Information Retrieval systems, aimed at enhancing the quality of the preliminary set of candidates. The process commences with a swift and effective retrieval method to fetch the initial set of passages. One prevalent method for this is BM25. However, there's also the option of adopting dense retrieval methods like Bi-encoders. For a comprehensive understanding of utilizing bi-encoders in retrieval, consider checking this guide.
Subsequent to this initial retrieval, the candidate set undergoes a reranking process, leveraging more advanced methods. An example would be rerankers rooted in BERT, known as cross-encoders. In this tutorial, we'll specifically employ the CrossEncoder
from the SentenceTransformers
library.
We will first retrieve a candidate set of passages from our index using BM25. As query, we will use the manually resolved utterance from turn_id=6
in the example shown above.
def retrieve_using_bm25(query):\n hits = searcher.search(query)\n candidate_set = []\n for i in range(len(hits)):\n print('Rank: {} | PassageID: {} | Score: {}'.format(i+1, hits[i].docid, hits[i].score))\n doc = searcher.doc(hits[i].docid)\n parsed_doc = json.loads(doc.raw())\n print(parsed_doc['contents'])\n candidate_set.append({\n 'passage_id': hits[i].docid,\n 'bm25_rank': i+1,\n 'bm25_score': hits[i].score,\n 'passage_text': parsed_doc['contents']\n })\n print('=================================')\n return candidate_set\n
"},{"location":"demo/#step-2-rerank-using-crossencoder","title":"Step-2: Rerank using CrossEncoder","text":"Next, we will rerank this candidate set using the CrossEncoder
defined earlier.
def rerank_passages(query, passages, reranker):\n res = []\n query_passage_pairs = [[query, passage['passage_text']] for passage in passages]\n scores = reranker.predict(query_passage_pairs)\n\n for passage, score in zip(passages, scores):\n passage['reranker_score'] = score\n res.append(passage)\n\n ranked_passages = sorted(passages, key=lambda x: x['reranker_score'], reverse=True)\n return ranked_passages\n
query = \"Can you compare mozzarella with plant-based cheese?\"\ncandidate_set = retrieve_using_bm25(query)\n
import numpy as np\nreranked_passages = rerank_passages(query, candidate_set, reranker)\nprint(json.dumps(reranked_passages, indent=4, default=lambda o: float(o) if isinstance(o, np.float32) else o))\n
These results are not great. An important thing to note here is that we are doing retrieval over a very small corpus of SimpleEnglishWikipedia
. As mentioned earlier, the results may not be of high quality.
One of the tasks in iKAT 2023 is response generation. After retrieval, the system should use the top-K passages to generate a short response (250 words or less) that is appropriate for an interactive conversational agent to give to the user.
Let's explore one way this can be done, by framing the task as a summarisation problem. We will use the T5
model for this purpose. Specifically, we will use the mrm8488/t5-base-finetuned-summarize-news
model from HuggingFace.
The mrm8488/t5-base-finetuned-summarize-news
is Google's T5-base
model fine-tuned on the News Summary dataset for the downstream task of summarization.
First, we will write a short function for this task.
The generate_response
function is described below:
Purpose: Generates a summarized response based on the top passages from a set of documents returned by a search operation.
Parameters:
passages
: A set of top documents or hits returned by the search operation.model
: An instance of a pre-trained sequence-to-sequence language model (from the AutoModelForSeq2SeqLM
class) for generating summaries.tokenizer
: An instance of a tokenizer (from the AutoTokenizer
class) used to tokenize and decode text.Process:
a. Consolidating Passages: Combines all the extracted passages into one continuous string.
b. Tokenization and Input Formation: Tokenizes the combined text and pre-processes it by adding a \"summarize: \" prefix. The tokenized input is adjusted to not exceed a specified maximum length (512 tokens) and is moved to the desired computation device.
c. Generating Summary: Utilizes the sequence-to-sequence language model to generate a summarized response based on the input. Applies various parameters to control and improve the quality of the output summary.
d. Decoding the Summary: Transforms the token IDs from the generated summary back into human-readable text, ensuring any special tokens are omitted.
Output: Returns a coherent and summarized text derived from the top passages of the documents.
def generate_response(passages, model, tokenizer):\n text = ' '.join(passages)\n inputs = tokenizer.encode(\"summarize: \" + text, return_tensors=\"pt\", max_length=512, truncation=True)\n with torch.no_grad():\n summary_ids = model.generate(\n inputs,\n max_length=250,\n min_length=50,\n length_penalty=2.0,\n num_beams=4,\n early_stopping=True\n )\n summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)\n return summary\n
summarizer = AutoModelForSeq2SeqLM.from_pretrained('mrm8488/t5-base-finetuned-summarize-news')\nsummarizer_tokenizer = AutoTokenizer.from_pretrained('mrm8488/t5-base-finetuned-summarize-news')\n
# We use the top-3 reranked passages to generate a response\npassages = [passage['passage_text'] for passage in reranked_passages][:3]\nprint(json.dumps(passages, indent=4))\n
generate_response(passages, summarizer, summarizer_tokenizer)\n
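To tie the pieces together, here is a sketch (not official tooling) of how the demo's outputs for this single turn could be placed into the submission format described in the guidelines; the run name is hypothetical and the scores are simply the reranker scores:
run = {
    "run_name": "demo_run",  # hypothetical identifier
    "run_type": "automatic",
    "eval_response": True,
    "turns": [
        {
            "turn_id": "10-1_6",
            "responses": [
                {
                    "rank": 1,
                    "text": generate_response(passages, summarizer, summarizer_tokenizer),
                    "ptkb_provenance": [],  # fill in the ids of the PTKB statements you used
                    "passage_provenance": [
                        {
                            "id": p["passage_id"],
                            "score": float(p["reranker_score"]),
                            "used": i < 3,  # the top-3 reranked passages were summarized
                        }
                        for i, p in enumerate(reranked_passages)
                    ],
                }
            ],
        }
    ],
}
with open("demo_run.json", "w") as f:
    json.dump(run, f, indent=2)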
"},{"location":"guidelines/","title":"Guidelines for iKAT 2024 Year 2","text":"The guidelines for iKAT 2024 (year 2) are now available as a Google Doc.
The guidelines for iKAT 2023 (year 1) are also available as a Google Doc.
"},{"location":"guidelines/#participation","title":"Participation","text":"Participants must register to submit. To request a late registration, please email trec@nist.gov
requesting a registration key. The dissemination form must be returned to submit runs.
In iKAT, the direction of the conversation can be changed in each turn based on:
The previous response from the user,
The persona of the user, and
The information learned from the user (background, perspective, and context).\u00a0
The persona of the user and their information needs form the direction of the conversation. Each topic will have multiple conversations based on multiple personas, resulting in different outputs that demonstrate the personalized aspect of the conversations. To this aim, the persona and the information needs of the user are modeled by generating a Personal Textual Knowledge Base (PTKB) during the conversation.
Note: The PTKB is provided for each conversation and the participants do not have to generate or update it.
"},{"location":"guidelines/#task-overview","title":"Task Overview","text":"In Year 2, the following inputs are provided to the participants at each conversation turn:
We offer the following tasks:
PTKB Statement Ranking: For each turn, given the PTKB, return a ranking of the relevant PTKB statements. This task is essentially a binary classification task. So, the output required is a list of statements from the PTKB.
Passage Ranking and Response Generation: For each turn, retrieve and rank relevant passages from the given collection in response to a user utterance. Then use the ranked passages to return a set of responses. Each response may be simply a passage from the collection. Alternatively, it may also be an extracted or generated summary from one or more passage results. All responses must have at least one passage called \"provenance\" from the collection.
Only Response Generation: For each turn, we will provide a ranked list of passages. The participants need only return a set of responses using this ranked list. As specified above, each response may be simply a passage from the collection. Alternatively, it may also be an extracted or generated summary from one or more passage results. All responses must have at least one passage called \"provenance\" from the collection.
We will provide baseline passage ranking and response generation methods for each of the tasks.
For manual runs, the participants can also use the following inputs provided for each conversational turn:
There are three submission classes:
Automatic: No manually labeled data can be used for this run type. This means that the models should solely rely on the current utterance, and the converation context (i.e., previous user utterance and system\u2019s canonical responses). Moreover, systems should not use the ptkb_provenance
fields from the current or previous turns. They should have a module to automatically identify the relevant PTKB statements (for an example, see the Getting Started
part of the website).
Manual: The manual runs can use the manually annotated data in the models. This includes the following:
The relevant PTKB statements (ptkb_provenance) of the current utterance.
The relevant PTKB statements (ptkb_provenance) of previous turns.
Only response generation: These use the given passage ranking for response generation. The focus is only on response generation.
Note. In either run type, the participants are not allowed to use any information from the future. In other words, you should assume that for each turn, the only available information is up to and including the current user utterance -- the system response of the current turn, as well as anything beyond that, is hidden.
In the submission form, we will ask the participants to mark which data sources they used in the manual submissions. You may use some or all available labeled data, but this should be clearly specified in the run submission form.
"},{"location":"guidelines/#important-points-regarding-submissions","title":"Important Points Regarding Submissions","text":"Title of the topic cannot be used.
All fields within the run, as shown in the sample on this website, are mandatory. You may choose not to submit a PTKB statement ranking; in this case, the ptkb_provenance
field may be kept empty in the run; however, it must be present.
The passage_provenance
field can have up to 1000 passages -- fewer is okay but not more.
Within the passage_provenance
list
in the run, each dict
should have another field called used
. This new field will be a boolean
field indicating whether or not that passage was used to construct the response. If none of the passages have the used
field set to True
, then we will consider the top-5 passages as provenance for that response by default.
Having a response text
for every predicted response is mandatory. In case you are submitting a run that does not generate a response, you may leave this field empty or copy the top-1 passage as your response.
An example of two different conversations based on different personas for the same topic is shown in the following figure. For each user turn, systems should return a ranked list of text responses. Each response has one or more (ranked) source passages as provenance. In addition, the systems should provide a sorted list of relevant statements of PTKB with the corresponding relevance score.
For an explanation of the above diagram, see the Google Doc.
"},{"location":"guidelines/#primary-task-details","title":"Primary Task Details","text":"The main task in iKAT can be defined as personalized retrieval-based \"candidate response retrieval\" in context of the conversation. The task can be divided into the following sub-tasks:
Read the current dialogue turns up to the given turn (context). The provided context is:\u00a0(1) A fixed set of previous responses with provenance in the preceding turns up to the current step, and (2) PTKB of the user. Note: Using information from following turns is not allowed.
Find the relevant statements from PTKB to the information needed for this turn. This task is considered as a relevance score prediction. The output is in the form of a sorted list of the statements from PTKB with corresponding relevance score.
Extract or generate a response. Each response can be generated from multiple passages. It can be an abstractive or extractive summary of the corresponding passages. Each response must have one or more ranked passages as provenance used to produce it.
An option [eval_response=False] is provided for teams that only want to focus on the ranking task without response generation.
Response length is measured with the Tokenizer function of spacy.tokenizer in the spaCy v3.3 library, but the appropriate length should vary depending on the query-response pair (a short token-counting sketch follows below).
In the second year of iKAT, we are offering teams two distinct options for response generation; these are described under Task Submissions below.
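Since response length is measured with spaCy's tokenizer, the following sketch shows one way to count tokens the same way; using a blank English pipeline is our assumption about the setup, and spaCy should be pinned to v3.3 to match the guidelines.

import spacy  # pin spacy==3.3.* to match the tokenizer version named in the guidelines

nlp = spacy.blank("en")  # its .tokenizer is an instance of spacy.tokenizer.Tokenizer

def response_length(text: str) -> int:
    """Number of tokens in a response, as counted by spaCy's Tokenizer."""
    return len(nlp.tokenizer(text))

# Example:
# response_length("The University of British Columbia in Vancouver has temperatures near ...")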
The text collection contains a subset of ClueWeb22-B documents, prepared by the organizers in collaboration with CMU. The documents have been split into ~116M passages. The goal is to retrieve passages from this target open-domain text collection.
"},{"location":"guidelines/#license-for-clueweb22-b","title":"License for ClueWeb22-B","text":"Getting the license to use the collection can be time-consuming and would be handled by CMU, not the iKAT organizers. Please follow these steps to get your data license ASAP:
Sign the license form available on the ClueWeb22 project web page and send the form to CMU for approval (clueweb@andrew.cmu.edu
).
Once you have the license, send an email to Andrew Ramsay (andrew.ramsay@glasgow.ac.uk) to get access to a download link for the preprocessed iKAT passage collection and other resources such as Lucene and SPLADE indices.
Please allow the CMU licensing office enough time to accept your request.
Note.
CMU requires a signature from the organization (i.e., the university or company), not an individual who wants to use the data. This can slow down the process at your end too. So, it\u2019s useful to start the process ASAP.
If you already have an accepted license for ClueWeb22, you do not need a new form. Please let us know if that is the case.
As an alternative to step (2), once you have the license, you can obtain the raw ClueWeb22-B/iKAT collection yourself and do the passage segmentation on your own, but we advise you to use our processed version to avoid any discrepancies.
Please do feel free to reach out to us if you have any questions or doubts about the process, so we can prevent any delays in getting the data to you.
"},{"location":"guidelines/#passage-segmentation","title":"Passage Segmentation","text":"For assessment, we will judge provenance passages. We segment the documents in our collection into passages in a similar manner as done by the TREC Deep Learning track for segmenting MS MARCO documents into passages: First, each document is trimmed to 10k characters. Then a 10-sentence sliding window with a 5-sentence stride is used to generate the passages.\u00a0
An example document with its passage segmentation is provided in TrecWeb format below for illustration purposes, and a minimal sketch of the segmentation procedure follows.
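The following is a minimal sketch of the segmentation procedure described above (10k-character trim, 10-sentence window, 5-sentence stride). The sentence splitter used here (spaCy's rule-based sentencizer) is an assumption and may not reproduce the official passage boundaries exactly, so please rely on the preprocessed collection rather than re-segmenting yourself.

import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence splitter (an assumption, see above)

def segment(document: str, window: int = 10, stride: int = 5) -> list[str]:
    """Trim to 10k characters, then slide a 10-sentence window with a 5-sentence stride."""
    sents = [s.text.strip() for s in nlp(document[:10_000]).sents]
    passages = []
    for start in range(0, max(len(sents), 1), stride):
        passages.append(" ".join(sents[start:start + window]))
        if start + window >= len(sents):
            break
    return passages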
"},{"location":"guidelines/#topic-format","title":"Topic Format","text":"We will provide several sample topics with example baseline runs for validation and testing. Below is a sample topics file with two subtrees of the same topic. Subtrees are identified by topic and subtree ID, i.e topic 1, subtree 2 is 1-2
. Also a passage_provenance
field with a list of provenance passages and ptkb_provenance
field with a list of provenance statements from PTKB, that are used for generating the response, are included. An example is shown below for illustrative purposes.
[\n {\n \"number\": \"1-1\",\n \"title\": \"Finding a University\",\n \"ptkb\": {\n \"1\": \"I graduated from Tilburg University.\",\n \"2\": \"I live in the Netherlands.\",\n \"3\": \"I'm allergic to peanuts.\",\n \"4\": \"I worked as a web developer for 2 years.\",\n \"5\": \"I have a bachelor's degree in computer science.\",\n \"6\": \"I like Indian food.\",\n \"7\": \"My bachelor's GPA is 5.6.\",\n \"8\": \"I'm 26 years old.\",\n \"9\": \"My TOEFL SCORE is 91.\",\n \"10\": \"My interesting bachelor courses are data structure, algorithm, data mining, and artificial intelligence.\",\n \"11\": \"I didn't like computer architecture and logical circuits courses.\"\n },\n \"turns\": [\n {\n \"turn_id\": 1,\n \"utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"resolved_utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"response\": \"Do you want to continue your bachelor's studies and obtain a degree in computer science?\",\n \"ptkb_provenance\": [\n 5\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0034-09-03452:1\",\n \"clueweb22-en0034-09-03452:3\",\n \"clueweb22-en0034-09-03452:5\",\n \"clueweb22-en0034-09-03452:7\",\n \"clueweb22-en0034-09-03452:9\"\n ]\n },\n {\n \"turn_id\": 2,\n \"utterance\": \"Yes, I want to continue my studies in computer science.\",\n \"resolved_utterance\": \"Yes, I want to continue my studies in computer science.\",\n \"response\": \"Do you want to study in the Netherlands, Europe, or somewhere further away?\",\n \"ptkb_provenance\": [\n 2\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0034-09-03452:2\",\n \"clueweb22-en0034-09-03452:4\",\n \"clueweb22-en0034-09-03452:6\",\n \"clueweb22-en0034-09-03452:8\",\n \"clueweb22-en0034-09-03452:10\"\n ]\n },\n {\n \"turn_id\": 3,\n \"utterance\": \"I'd like to stay here.\",\n \"resolved_utterance\": \"I'd like to stay in the Netherlands.\",\n \"response\": \"I can help you with finding a university for continuing your studies in the Netherlands as a computer science student. 
Take a look at these Top Computer Science Universities in the Netherlands: Delft University of Technology, Eindhoven University of Technology, Vrije Universiteit Amsterdam, University of Amsterdam, Leiden University, Radboud University, Utrecht University, University of Twente\",\n \"ptkb_provenance\": [\n 5,\n 2\n ],\n \"response_provenance\": [\n \"clueweb22-en0034-09-03452:1\"\n ],\n \"sample_passage_ranking\": [\n \"clueweb22-en0012-00-00012:0\",\n \"clueweb22-en0012-00-00012:1\",\n \"clueweb22-en0012-00-00012:2\",\n \"clueweb22-en0012-00-00012:3\",\n \"clueweb22-en0012-00-00012:4\"\n ]\n }\n ]\n },\n {\n \"number\": \"1-2\",\n \"title\": \"Finding a university\",\n \"ptkb\": {\n \"1\": \"I don't like crazy cold weather.\",\n \"2\": \"I don't have a driver's license.\",\n \"3\": \"I plan to move to Canada.\",\n \"4\": \"I'm from the Netherlands.\",\n \"5\": \"I'm used to heavy rains in the Netherlands.\",\n \"6\": \"I graduated from UvA.\",\n \"7\": \"I have bachelor's degree in computer science.\",\n \"8\": \"I speak English fluently.\"\n },\n \"turns\": [\n {\n \"turn_id\": 1,\n \"utterance\": \"I want to start my master's degree, can you help me with finding a university?\",\n \"resolved_utterance\": \"I want to start my master's degree, can you help me with finding a university in Canada?\",\n \"response\": \"Sure, do you want to study computer science?\",\n \"ptkb_provenance\": [\n 7,\n 3\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0040-41-06056:0\",\n \"clueweb22-en0040-41-06056:1\",\n \"clueweb22-en0040-41-06056:2\",\n \"clueweb22-en0040-41-06056:3\",\n \"clueweb22-en0040-41-06056:4\"\n ]\n },\n {\n \"turn_id\": 2,\n \"utterance\": \"Yes, I want to pursue the same major. Can you tell me the name of the best universities?\",\n \"resolved_utterance\": \"Yes, I want to pursue computer science. Can you tell me the name of the best computer science universities in Canada?\",\n \"response\": \"Here are the top universities for computer science in Canada: 1) University of British Columbia, 2) University of Alberta, 3) Concordia University, 4) Simon Fraser University, 5) The University of Toronto\",\n \"ptkb_provenance\": [],\n \"response_provenance\": [\n \"clueweb22-en0026-31-15538:1\",\n \"clueweb22-en0026-31-15538:4\",\n \"clueweb22-en0026-31-15538:6\",\n \"clueweb22-en0040-41-06056:0\"\n ],\n \"sample_passage_ranking\": [\n \"clueweb22-en0010-22-22210:0\",\n \"clueweb22-en0010-22-22210:1\",\n \"clueweb22-en0010-22-22210:2\",\n \"clueweb22-en0010-22-22210:3\",\n \"clueweb22-en0010-22-22210:4\"\n ]\n },\n {\n \"turn_id\": 3,\n \"utterance\": \"Which of them best suits me in terms of weather conditions?\",\n \"resolved_utterance\": \"Which of the following universities best suits me in terms of weather conditions? 1) the University of British Columbia, 2) the University of Alberta, 3) Concordia University, 4) Simon Fraser University, and 5) The University of Toronto.\",\n \"response\": \"I know you don't like very cold weather, but can you give me an estimation of the temperature that is acceptable for you?\",\n \"ptkb_provenance\": [\n 1,\n 5\n ],\n \"response_provenance\": [],\n \"sample_passage_ranking\": [\n \"clueweb22-en0030-30-30030:0\",\n \"clueweb22-en0030-30-30030:1\",\n \"clueweb22-en0030-30-30030:2\",\n \"clueweb22-en0030-30-30030:3\",\n \"clueweb22-en0030-30-30030:4\"\n ]\n }\n ]\n }\n]\n\n\n
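As a brief illustration of how a system might consume this topic format, the sketch below loads the topics file and assembles, for each turn, the context a system is allowed to use (the PTKB plus all preceding turns); the file name is an illustrative assumption.

import json

with open("ikat_2024_test_topics.json") as f:  # illustrative file name
    topics = json.load(f)

for topic in topics:
    ptkb = topic["ptkb"]
    history = []  # turns preceding the current one (allowed context)
    for turn in topic["turns"]:
        turn_identifier = f'{topic["number"]}_{turn["turn_id"]}'  # e.g. "1-2_3"
        utterance = turn["utterance"]
        # ... run PTKB ranking, passage retrieval, and response generation here ...
        history.append(turn)  # the canonical response becomes context for later turns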
"},{"location":"guidelines/#task-submissions","title":"Task Submissions","text":"Participants submit the output of their system on the specified \"test\" topics. A single participant can submit maximum of:
In the automatic runs, participants can include response generation based on their own ranking, but this is not mandatory.
In the only-response-generation runs, participants must use the given passage provenances.
We have three submission classes: 1) automatic, 2) manual, and 3) only response generation. An example of the submission template for each run type is shown below.
"},{"location":"guidelines/#sample-submission-for-the-main-task","title":"Sample submission for the main task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"automatic\",\n \"eval_response\": True,\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"ptkb_provenance\": [1,2],\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"score\": 0.6,\n \"used\": False\n\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"score\": 0.5,\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"score\": 0.4,\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"score\": 0.38, \n \"used\": True\n\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"score\": 0.3,\n \"used\": False\n }\n ]\n }\n ]\n }\n ]\n}\n\n
"},{"location":"guidelines/#sample-submission-for-the-only-response-generation-task","title":"Sample submission for the only response generation task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"only_response\",\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"ptkb_provenance\": [1,2],\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"used\": True\n }\n ]\n }\n ]\n }\n ]\n}\n\n\n
"},{"location":"guidelines/#sample-submission-for-the-manual-task","title":"Sample submission for the manual task","text":"{\n \"run_name\": \"sample_run\",\n \"run_type\": \"manual\",\n \"eval_response\": True,\n \"turns\": [\n {\n \"turn_id\": \"1-2_3\",\n \"responses\": [\n {\n \"rank\": 1,\n \"text\": \"The University of British columbia in Vancouver has temperatures near 80 degrees Fahrenheit (27 degrees Celsius) in summer and up to 45 degrees Fahrenheit (about 7 degrees Celsius) in winter which is suitable for you. The university of Toronto is acceptable since has cold winters, average temperatures can drop below -10 \u00b0 C but not below 12 degrees for long. The Concordia university in Montreal is not suitable for you since in the winter, could reach minus 40 with the wind chill. University of Alberta is also not suitable for you. In winter the average temperature varies between -6.5\u00b0C (20.3\u00b0F) and -13.5\u00b0C (7.7\u00b0F). Simon Fraser university is not acceptable for you. The city which the university is located in will reach temperatures of -14 in the winter.\",\n \"passage_provenance\": [\n {\n \"id\": \"clueweb22-en0000-94-02275:0\",\n \"score\": 0.6,\n \"used\": False\n\n },\n {\n \"id\": \"clueweb22-en0027-06-08704:1\",\n \"score\": 0.5,\n \"used\": True\n },\n {\n \"id\": \"clueweb22-en0005-63-12144:0\",\n \"score\": 0.4,\n \"used\": False\n },\n {\n \"id\": \"clueweb22-en0013-01-17558:1\",\n \"score\": 0.38, \n \"used\": True\n\n },\n {\n \"id\": \"clueweb22-en0014-39-04143:0\",\n \"score\": 0.3,\n \"used\": False\n }\n ]\n }\n ]\n }\n ]\n}\n\n
The run_name
is a run submission identifier that should be descriptive and unique to your team and institution.
The run_type is one of automatic, manual, or only_response.
Each turn
in the turns
list should contain a turn_identifier, consisting of the topic_id-subtree_id and turn_id concatenated with an underscore, e.g., 1-2_3
for topic 1, subtree 2, and turn 3.
Each turn should also contain a list of responses. A response consists of a text field and a passage_provenance list. Each provenance entry should have an id, a score, and a used field. The used field indicates whether or not the passage was used for response generation.
Each turn also includes a ptkb_provenance list of PTKB statements sorted by their relevance scores to the current turn.
If you only want to do retrieval and not response generation in an automatic run, you can leave the text field empty and set eval_response to False.
For provenance ranking, this will be converted to a traditional TREC run format:
31_1-1 Q0 clueweb22-en0000-94-02275:0 1 0.5 sample_run
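As a rough illustration of that conversion (not the official script), the sketch below writes the passage_provenance of the rank-1 response for each turn in the TREC run format shown above; it assumes the submission file is stored as valid JSON (i.e., lowercase true/false).

import json

def to_trec_run(submission_path: str, out_path: str) -> None:
    """Write a submission's passage rankings in TREC run format."""
    with open(submission_path) as f:
        run = json.load(f)
    with open(out_path, "w") as out:
        for turn in run["turns"]:
            response = turn["responses"][0]  # the rank-1 response carries the ranking
            for rank, p in enumerate(response["passage_provenance"][:1000], start=1):
                score = p.get("score", 0.0)  # only_response runs may omit scores
                out.write(f'{turn["turn_id"]} Q0 {p["id"]} {rank} {score} {run["run_name"]}\n')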
Runs may include up to 1000 responses for each user turn. For provenance ranking, only the first 1000 unique provenance passages will be used. As in the previous year of iKAT, only a limited top-k set of responses and provenances will be assessed, depending on resource constraints.
"},{"location":"guidelines/#evaluation","title":"Evaluation","text":"We will use the relevance assessment methods used in previous years of CAsT for relevance to individual turns.
Similar to iKAT Year 1, only a subset of turns may be evaluated for provenance ranking effectiveness. This will be disclosed to participants after the assessment is completed.
"},{"location":"guidelines/#timeline","title":"Timeline","text":"Task Date Guidelines released May 20, 2024 Test topics released August 5, 2024 Submission deadline August 31, 2024 AOE Results released to participants TBD"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 254a88d..66c2402 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ