Commit: fix reference

dylanbouchard committed Dec 3, 2024
1 parent 7f5b023 commit 839f2ae
Showing 1 changed file with 1 addition and 1 deletion.
paper/paper.md: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ Class | Risk Assessed | Applicable Tasks |


### Toxicity Metrics
- The `ToxicityMetrics` class facilitates simple computation of toxicity metrics from a user-provided list of LLM responses. These metrics leverage a pre-trained toxicity classifier that maps a text input to a toxicity score ranging from 0 to 1 [@Gehman2020RealToxicityPromptsEN, @liang2023holisticevaluationlanguagemodels]. For off-the-shelf toxicity classifiers, the `ToxicityMetrics` class provides four options: two classifiers from the `detoxify` package, `roberta-hate-speech-dynabench-r4-target` from the `evaluate` package, and `toxigen` available on HuggingFace.^[https://github.com/unitaryai/detoxify; https://github.com/huggingface/evaluate; https://github.com/microsoft/TOXIGEN] For additional flexibility, users can specify an ensemble of the off-the-shelf classifiers offered or provide a custom toxicity classifier object.
+ The `ToxicityMetrics` class facilitates simple computation of toxicity metrics from a user-provided list of LLM responses. These metrics leverage a pre-trained toxicity classifier that maps a text input to a toxicity score ranging from 0 to 1 [@Gehman2020RealToxicityPromptsEN; @liang2023holisticevaluationlanguagemodels]. For off-the-shelf toxicity classifiers, the `ToxicityMetrics` class provides four options: two classifiers from the `detoxify` package, `roberta-hate-speech-dynabench-r4-target` from the `evaluate` package, and `toxigen` available on HuggingFace.^[https://github.com/unitaryai/detoxify; https://github.com/huggingface/evaluate; https://github.com/microsoft/TOXIGEN] For additional flexibility, users can specify an ensemble of the off-the-shelf classifiers offered or provide a custom toxicity classifier object.
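
For illustration, a minimal usage sketch of the class described above. The import path (`langfair.metrics.toxicity`), the `classifiers` constructor argument, the classifier name, and the `evaluate` signature are assumptions for this sketch, not details taken from the diff:

```python
# Hypothetical sketch: the import path, `classifiers` argument, classifier
# name, and `evaluate` signature are assumptions, not confirmed by this commit.
from langfair.metrics.toxicity import ToxicityMetrics

# A user-provided list of LLM responses (toy examples).
responses = [
    "I disagree, but I appreciate your perspective.",
    "That approach seems reasonable given the constraints.",
]

# Choose one or more off-the-shelf classifiers (an ensemble), or pass a
# custom toxicity classifier object instead.
tm = ToxicityMetrics(classifiers=["detoxify_unbiased"])

# Each response is mapped to a toxicity score in [0, 1] by the classifier,
# and aggregate toxicity metrics are computed over the list.
result = tm.evaluate(responses=responses)
print(result["metrics"])
```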

### Stereotype Metrics
To measure stereotypes in LLM responses, the `StereotypeMetrics` class offers two categories of metrics: metrics based on word cooccurrences and metrics that leverage a pre-trained stereotype classifier. Metrics based on word cooccurrences aim to assess relative cooccurrence of stereotypical words with certain protected attribute words. On the other hand, stereotype-classifier-based metrics leverage the `wu981526092/Sentence-Level-Stereotype-Detector` classifier available on HuggingFace [@zekun2023auditinglargelanguagemodels] and compute analogs of the aforementioned toxicity-classifier-based metrics [@bouchard2024actionableframeworkassessingbias].^[https://huggingface.co/wu981526092/Sentence-Level-Stereotype-Detector]
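
Analogously, a hedged sketch of how `StereotypeMetrics` might be invoked; the import path and method signature are likewise assumptions based on the paragraph above:

```python
# Hypothetical sketch with the same caveats: import path and signature
# are assumptions inferred from the class description, not from this diff.
from langfair.metrics.stereotype import StereotypeMetrics

responses = [
    "The engineer presented her design to the board.",
    "The nurse reviewed his notes before the shift.",
]

# Computes word-cooccurrence metrics as well as metrics derived from the
# sentence-level stereotype classifier referenced above.
sm = StereotypeMetrics()
result = sm.evaluate(responses=responses)
print(result["metrics"])
```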
