Release PR: v0.3.2 #102

Merged
merged 14 commits on Jan 15, 2025
24 changes: 24 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,24 @@
## Description
<!--- Provide a general summary of your changes. -->
<!--- Mention related issues, pull requests, or discussions with #<issue/PR/discussion ID>. -->
<!--- Tag people for whom this PR may be of interest using @<username>. -->

## Contributor License Agreement
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] confirm you have signed the [LangFair CLA](https://forms.office.com/pages/responsepage.aspx?id=uGG7-v46dU65NKR_eCuM1xbiih2MIwxBuRvO0D_wqVFUMlFIVFdYVFozN1BJVjVBRUdMUUY5UU9QRS4u&route=shorturl)

## Tests
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no new tests required
- [ ] new tests added
- [ ] existing tests adjusted

## Documentation
<!--- Select all that apply by putting an x between the brackets: [x] -->
- [ ] no documentation changes needed
- [ ] README updated
- [ ] API docs added or updated
- [ ] example notebook added or updated

## Screenshots
<!--- If applicable, please add screenshots. -->
18 changes: 16 additions & 2 deletions README.md
@@ -128,7 +128,7 @@ auto_object = AutoEval(
)
results = await auto_object.evaluate()
results['metrics']
# Output is below
# # Output is below
# {'Toxicity': {'Toxic Fraction': 0.0004,
# 'Expected Maximum Toxicity': 0.013845130120171235,
# 'Toxicity Probability': 0.01},
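For orientation, here is a minimal sketch of how the collapsed `AutoEval` call above might be wired up end to end; the model choice, prompts, and keyword names are illustrative assumptions rather than lines from this diff:

```python
# Hedged sketch: model name, prompts, and constructor keywords are assumptions for illustration.
from langchain_openai import ChatOpenAI   # any langchain chat model should work
from langfair.auto import AutoEval        # import path assumed from the package layout

llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0)
prompts = [
    "Describe a typical day for a nurse.",
    "Describe a typical day for a software engineer.",
]

auto_object = AutoEval(prompts=prompts, langchain_llm=llm)
results = await auto_object.evaluate()     # run inside an async context (e.g. a notebook cell)
print(results["metrics"]["Toxicity"])
```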
@@ -199,7 +199,7 @@ Bias and fairness metrics offered by LangFair are grouped into several categorie


## 📖 Associated Research
A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our framework for selecting evaluation metrics, we would appreciate citations to the following paper:
A technical description and a practitioner's guide for selecting evaluation metrics are contained in **[this paper](https://arxiv.org/abs/2407.10853)**. If you use our evaluation approach, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2024actionableframeworkassessingbias,
@@ -213,6 +213,20 @@ A technical description of LangFair's evaluation metrics and a practitioner's gu
}
```

A high-level description of LangFair's functionality is contained in **[this paper](https://arxiv.org/abs/2501.03112)**. If you use LangFair, we would appreciate citations to the following paper:

```bibtex
@misc{bouchard2025langfairpythonpackageassessing,
title={LangFair: A Python Package for Assessing Bias and Fairness in Large Language Model Use Cases},
author={Dylan Bouchard and Mohit Singh Chauhan and David Skarbrevik and Viren Bajaj and Zeya Ahmad},
year={2025},
eprint={2501.03112},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.03112},
}
```

## 📄 Code Documentation
Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

3 changes: 1 addition & 2 deletions examples/evaluations/text_generation/auto_eval_demo.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrate the implementation of `AutoEval` class. This class provides an user-friendly way to compute toxicity, stereotype, and counterfactual assessment for an LLM model. The user needs to provide the input prompts and model responses (optional) and the `AutoEval` class implement following steps.\n",
"This notebook demonstrate the implementation of `AutoEval` class. This class provides an user-friendly way to compute toxicity, stereotype, and counterfactual assessment for an LLM use case. The user needs to provide the input prompts and a `langchain` LLM, and the `AutoEval` class implements following steps.\n",
"\n",
"1. Check Fairness Through Awareness (FTU)\n",
"2. If FTU is not satisfied, generate dataset for Counterfactual assessment \n",
@@ -61,7 +61,6 @@
"outputs": [],
"source": [
"# User to populate .env file with API credentials\n",
"repo_path = '/'.join(os.getcwd().split('/')[:-3])\n",
"load_dotenv(find_dotenv())\n",
"\n",
"API_KEY = os.getenv('API_KEY')\n",
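For completeness, a hedged sketch of the credential-loading pattern the collapsed cell above relies on; only `API_KEY` appears in the diff, so any further variable names are hypothetical placeholders:

```python
# Hypothetical .env-loading sketch; variable names beyond API_KEY are placeholders.
import os

from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())            # reads KEY=value pairs from a nearby .env file
API_KEY = os.getenv("API_KEY")        # e.g. API_KEY=sk-... in .env
API_BASE = os.getenv("API_BASE")      # hypothetical extra setting; adjust to your provider

if API_KEY is None:
    raise RuntimeError("Populate a .env file with API credentials before running the notebook.")
```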
16 changes: 15 additions & 1 deletion langfair/generator/counterfactual.py
@@ -334,10 +334,18 @@ async def generate_responses(
----------
dict
A dictionary with two keys: 'data' and 'metadata'.

'data' : dict
A dictionary containing the prompts and responses.

'prompt' : list
A list of prompts.
'response' : list
A list of responses corresponding to the prompts.

'metadata' : dict
A dictionary containing metadata about the generation process.

'non_completion_rate' : float
The rate at which the generation process did not complete.
'temperature' : float
@@ -433,16 +441,22 @@ def check_ftu(
-------
dict
A dictionary with two keys: 'data' and 'metadata'.

'data' : dict
A dictionary containing the prompts and responses.
A dictionary containing the prompts and the attribute words they contain.

'prompt' : list
A list of prompts.

'attribute_words' : list
A list of attribute_words in each prompt.

'metadata' : dict
A dictionary containing metadata related to FTU.

'ftu_satisfied' : boolean
Boolean indicator of whether or not prompts satisfy FTU.

'filtered_prompt_count' : int
The number of prompts that satisfy FTU.
"""
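The docstring additions above spell out a nested result structure; below is a hedged sketch of consuming it. The class name, constructor, and argument names are assumptions based on this module, while the dictionary keys come from the docstring itself:

```python
# Hedged sketch of reading the documented check_ftu() result; construction details are assumed.
from langfair.generator.counterfactual import CounterfactualGenerator

cdg = CounterfactualGenerator()
prompts = ["Describe the daily routine of a software engineer."]   # illustrative prompt
ftu_result = cdg.check_ftu(prompts=prompts, attribute="gender")     # argument names assumed

print(ftu_result["metadata"]["ftu_satisfied"])          # bool: do the prompts satisfy FTU?
print(ftu_result["metadata"]["filtered_prompt_count"])  # number of prompts that satisfy FTU
for prompt, words in zip(ftu_result["data"]["prompt"], ftu_result["data"]["attribute_words"]):
    print(prompt, "->", words)                          # attribute words found in each prompt
```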
4 changes: 4 additions & 0 deletions langfair/generator/generator.py
@@ -197,14 +197,18 @@ async def generate_responses(
-------
dict
A dictionary with two keys: 'data' and 'metadata'.

'data' : dict
A dictionary containing the prompts and responses.

'prompt' : list
A list of prompts.
'response' : list
A list of responses corresponding to the prompts.

'metadata' : dict
A dictionary containing metadata about the generation process.

'non_completion_rate' : float
The rate at which the generation process did not complete.
'temperature' : float
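The same return layout is documented here for `generate_responses`; a hedged sketch of unpacking it follows (class, constructor, and argument names are assumptions, the keys are from the docstring):

```python
# Hedged sketch of consuming the documented generate_responses() result.
import asyncio

from langchain_openai import ChatOpenAI           # illustrative provider
from langfair.generator import ResponseGenerator  # class name and import path assumed


async def main() -> None:
    llm = ChatOpenAI(model="gpt-4o-mini")              # illustrative model
    rg = ResponseGenerator(langchain_llm=llm)          # constructor keyword assumed
    result = await rg.generate_responses(prompts=["Tell me about your weekend."])
    print(result["data"]["response"][0])               # responses are parallel to result["data"]["prompt"]
    print(result["metadata"]["non_completion_rate"])   # fraction of generations that did not complete
    print(result["metadata"]["temperature"])           # sampling temperature used


asyncio.run(main())
```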
21 changes: 19 additions & 2 deletions langfair/metrics/classification/metrics/baseclass/metrics.py
@@ -13,7 +13,7 @@
# limitations under the License.

from abc import ABC, abstractmethod
from typing import Optional
from typing import List, Optional

from numpy.typing import ArrayLike

@@ -38,7 +38,24 @@ def evaluate(
pass

@staticmethod
def binary_confusion_matrix(y_true, y_pred):
def binary_confusion_matrix(y_true, y_pred) -> List[List[float]]:
"""
Method for computing binary confusion matrix

Parameters
----------
y_true : Array-like
Binary labels (ground truth values)

y_pred : Array-like
Binary model predictions

Returns
-------
List[List[float]]
2x2 confusion matrix

"""
cm = [[0, 0], [0, 0]]
for i in range(len(y_pred)):
if y_pred[i] == y_true[i]:
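Since the loop body is collapsed above, here is a standalone sketch of the behaviour the new docstring describes; the cell layout chosen below is an assumption for illustration, not necessarily the library's own ordering:

```python
# Standalone sketch of a 2x2 binary confusion matrix; layout [[TN, FP], [FN, TP]] is assumed.
from typing import List

from numpy.typing import ArrayLike


def binary_confusion_matrix_sketch(y_true: ArrayLike, y_pred: ArrayLike) -> List[List[int]]:
    cm = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        cm[int(actual)][int(predicted)] += 1  # rows = ground truth, columns = prediction
    return cm


print(binary_confusion_matrix_sketch([1, 0, 1, 1], [1, 0, 0, 1]))  # [[1, 0], [1, 2]]
```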
6 changes: 3 additions & 3 deletions poetry.lock


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "langfair"
version = "0.3.1"
version = "0.3.2"
description = "LangFair is a Python library for conducting use-case level LLM bias and fairness assessments"
readme = "README.md"
authors = ["Dylan Bouchard <[email protected]>",