From 839f2aeb33d2fe0b6b022dcac52e36f40805f901 Mon Sep 17 00:00:00 2001
From: Dylan Bouchard
Date: Tue, 3 Dec 2024 18:42:25 +0000
Subject: [PATCH] fix reference

---
 paper/paper.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/paper.md b/paper/paper.md
index c26cd0c..5eb371d 100644
--- a/paper/paper.md
+++ b/paper/paper.md
@@ -72,7 +72,7 @@ Class | Risk Assessed | Applicable Tasks |
 
 ### Toxicity Metrics
 
-The `ToxicityMetrics` class facilitates simple computation of toxicity metrics from a user-provided list of LLM responses. These metrics leverage a pre-trained toxicity classifier that maps a text input to a toxicity score ranging from 0 to 1 [@Gehman2020RealToxicityPromptsEN, @liang2023holisticevaluationlanguagemodels]. For off-the-shelf toxicity classifiers, the `ToxicityMetrics` class provides four options: two classifiers from the `detoxify` package, `roberta-hate-speech-dynabench-r4-target` from the `evaluate` package, and `toxigen` available on HuggingFace.^[https://github.com/unitaryai/detoxify; https://github.com/huggingface/evaluate; https://github.com/microsoft/TOXIGEN] For additional flexibility, users can specify an ensemble of the off-the-shelf classifiers offered or provide a custom toxicity classifier object.
+The `ToxicityMetrics` class facilitates simple computation of toxicity metrics from a user-provided list of LLM responses. These metrics leverage a pre-trained toxicity classifier that maps a text input to a toxicity score ranging from 0 to 1 [@Gehman2020RealToxicityPromptsEN; @liang2023holisticevaluationlanguagemodels]. For off-the-shelf toxicity classifiers, the `ToxicityMetrics` class provides four options: two classifiers from the `detoxify` package, `roberta-hate-speech-dynabench-r4-target` from the `evaluate` package, and `toxigen` available on HuggingFace.^[https://github.com/unitaryai/detoxify; https://github.com/huggingface/evaluate; https://github.com/microsoft/TOXIGEN] For additional flexibility, users can specify an ensemble of the off-the-shelf classifiers offered or provide a custom toxicity classifier object.
 
 ### Stereotype Metrics
 To measure stereotypes in LLM responses, the `StereotypeMetrics` class offers two categories of metrics: metrics based on word cooccurrences and metrics that leverage a pre-trained stereotype classifier. Metrics based on word cooccurrences aim to assess relative cooccurrence of stereotypical words with certain protected attribute words. On the other hand, stereotype-classifier-based metrics leverage the `wu981526092/Sentence-Level-Stereotype-Detector` classifier available on HuggingFace [@zekun2023auditinglargelanguagemodels] and compute analogs of the aforementioned toxicity-classifier-based metrics [@bouchard2024actionableframeworkassessingbias].^[https://huggingface.co/wu981526092/Sentence-Level-Stereotype-Detector]
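
For context on the `ToxicityMetrics` paragraph this patch touches, a minimal usage sketch follows. The module path `langfair.metrics.toxicity`, the `classifiers` constructor argument, the classifier identifier `detoxify_unbiased`, and the `evaluate()` signature are assumptions based on the paper's description, not confirmed by this patch:

```python
# Hedged sketch (assumed API): scoring LLM responses with ToxicityMetrics.
from langfair.metrics.toxicity import ToxicityMetrics

responses = [
    "I'm sorry, I can't help with that.",
    "Here is a summary of the article you provided.",
]

# The paper lists four off-the-shelf classifier options; "detoxify_unbiased"
# (assumed name for one of the two detoxify classifiers) is illustrative only.
tm = ToxicityMetrics(classifiers=["detoxify_unbiased"])

# Each response is mapped to a toxicity score in [0, 1] by the pre-trained
# classifier, then aggregated into toxicity metrics; the returned structure
# (a dict with a "metrics" key) is likewise an assumption.
result = tm.evaluate(responses=responses)
print(result["metrics"])
```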
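
Similarly, a hedged sketch for the `StereotypeMetrics` paragraph, assuming a parallel `langfair.metrics.stereotype` module and the same `evaluate()` convention (again assumptions, not taken from this patch):

```python
# Hedged sketch (assumed API): computing both metric categories the paper
# describes — word-cooccurrence metrics and metrics from the
# wu981526092/Sentence-Level-Stereotype-Detector classifier on HuggingFace.
from langfair.metrics.stereotype import StereotypeMetrics

responses = [
    "The nurse said she would be right back.",
    "The engineer explained his design.",
]

# Defaults are assumed to include both metric categories.
sm = StereotypeMetrics()
result = sm.evaluate(responses=responses)
print(result["metrics"])
```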