Releases: cvs-health/langfair
v0.3.2
Highlights
- Security patch for jinja2
- Update readme to include software paper bibtex
- Minor docstring updates for docs site fixes
- Create PR template
What's Changed
- v0.3.1 updates by @dylanbouchard in #88
- update docstrings by @zeya30 in #101
- Add new paper bibtex, fix docstring, add PR template by @dylanbouchard in #99
- Release PR: v0.3.2 by @dylanbouchard in #102
Full Changelog: v0.3.1...v0.3.2
v0.3.1
Highlights
- New method check_ftu in the CounterfactualGenerator class to check for fairness through unawareness (FTU). This method provides a more user-friendly way to check for FTU than the previous approach with parse_texts
- Updates to counterfactual demo notebook
- Updates to dev dependencies
- Fix broken links in readme
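An FTU check asks whether any prompt mentions a protected attribute; FTU holds for a prompt set when none does. The snippet below is a toy illustration of that idea only. It does not reproduce the signature or behavior of check_ftu, and the word list and the function name ftu_check are hypothetical.

```python
import re

# Hypothetical, deliberately tiny word list for illustration; not LangFair's lists.
GENDER_WORDS = {"he", "him", "his", "she", "her", "hers", "man", "woman", "men", "women"}


def ftu_check(prompts, protected_words=GENDER_WORDS):
    """Return the prompts that mention any protected-attribute word (hypothetical helper).

    Under this simplified definition, FTU is satisfied when the returned list is empty.
    """
    flagged = []
    for prompt in prompts:
        # Lowercase and split into simple word tokens before matching.
        tokens = set(re.findall(r"[a-z']+", prompt.lower()))
        if tokens & protected_words:
            flagged.append(prompt)
    return flagged


prompts = [
    "Summarize the customer's complaint.",
    "Write a reply to her email about the refund.",
]
flagged = ftu_check(prompts)
print(f"FTU satisfied: {not flagged}")  # FTU satisfied: False
```

Matching on whole tokens rather than substrings avoids false hits such as "he" inside "the".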
What's Changed
- v0.3.0 updates by @dylanbouchard in #80
- Add sphinx to a poetry dep group by @dskarbrevik in #78
- Fix broken links in README and copy-paste errors in example notebook by @xavieryao in #81
- New FTU check method by @dylanbouchard in #85
- Contributing guide update by @dskarbrevik in #87
- Release PR: v0.3.1 by @dylanbouchard in #86
New Contributors
- @xavieryao made their first contribution in #81
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Highlights
- Option to return response-level scores for CounterfactualMetrics and AutoEval
- Additional unit tests for CounterfactualMetrics and AutoEval
- Data loader functions for cleaner code when using example data
- Enforced strings in ResponseGenerator and CounterfactualGenerator output to avoid errors when computing metrics if any response is None
What's Changed
- v0.2.1 updates by @dylanbouchard in #65
- Ds/data loader by @dskarbrevik in #59
- Unit tests for classification metrics by @mohitcek in #69
- enforce strings in response outputs, return response-level cf scores by @dylanbouchard in #66
- Consistent return object of AutoEval class by @mohitcek in #70
- notebook updates by @dylanbouchard in #72
- AutoEval unit tests by @mohitcek in #73
- Final changes before releasing v0.3.0 by @mohitcek in #75
- Release PR: v0.3.0 by @dylanbouchard in #76
Full Changelog: v0.2.1...v0.3.0
v0.2.1
Highlights
- Updated README with more illustrative examples
- Patch to AutoEval for pairwise filtering of counterfactual responses in cases of generation failure
- References in docstring
- Fix to SPDX expression in pyproject.toml
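Counterfactual metrics compare responses index by index, so a failed generation has to be dropped together with its counterpart rather than from one list alone. The sketch below shows pairwise filtering under two assumptions: a failure is either None or a fixed placeholder string, and the names FAILURE_MESSAGE and filter_failed_pairs are hypothetical, not LangFair's.

```python
# Hypothetical placeholder that a generator might return when an exception is suppressed.
FAILURE_MESSAGE = "Unable to get response"


def filter_failed_pairs(responses_a, responses_b, failure_message=FAILURE_MESSAGE):
    """Drop index-aligned pairs in which either counterfactual response failed."""

    def failed(response):
        return response is None or response == failure_message

    kept = [(a, b) for a, b in zip(responses_a, responses_b) if not (failed(a) or failed(b))]
    # Unzip the surviving pairs back into two aligned lists.
    return [a for a, _ in kept], [b for _, b in kept]


male_responses = ["Plan A.", None, "Great question.", "Hello."]
female_responses = ["Plan A.", "Happy to help.", "Unable to get response", "Hi."]
kept_male, kept_female = filter_failed_pairs(male_responses, female_responses)
print(kept_male, kept_female)  # ['Plan A.', 'Hello.'] ['Plan A.', 'Hi.']
```

Filtering each list independently would shift later items and pair the wrong responses together; filtering by pair keeps the two lists aligned.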
What's Changed
- v0.2.0 updates by @dylanbouchard in #46
- Update docstrings by @vasisthasinghal in #53
- Fix pyproject and readme by @dylanbouchard in #61
- skip select unit tests due to memory issue by @dylanbouchard in #63
- Updated Readme file and AutoEval bugfix by @mohitcek in #62
- Release PR: v0.2.1 by @dylanbouchard in #64
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Highlights
- Upgrade version of LangChain to 0.3.7 to resolve dependency conflicts with later versions of LangChain community packages
- Refactor ResponseGenerator, CounterfactualGenerator, and AutoEval to adjust for LangChain upgrade
- More intuitive printing in AutoEval
- Update unit tests
- Update documentation in notebooks for user-friendliness and to include MistralAI
- Improved exception handling
- Remove 'langchain: ' from print statements
What's Changed
- v0.1.2 Updates by @dylanbouchard in #36
- upgrade langchain by @dylanbouchard in #39
- Formatting changes made by @vasisthasinghal in #41
- Resolve issue: upgrade version of langchain by @dylanbouchard in #40
- Add pytest to dev dependencies by @virenbajaj in #38
- Update exception handling and notebooks by @dylanbouchard in #42
- Vb/handle suppressed exceptions by @dylanbouchard in #44
- Release PR: v0.2.0 by @dylanbouchard in #45
New Contributors
- @vasisthasinghal made their first contribution in #41
Full Changelog: v0.1.2...v0.2.0
v0.1.2
Highlights
- Improved Readme for readability
- Improved notebook documentation for readability
- Removed scipy, sklearn, openai, and langchain-openai dependencies
- Created new argument for ResponseGenerator and CounterfactualGenerator that allows users to specify which exceptions to suppress
What's Changed
- v0.1.1 -> Develop by @dylanbouchard in #19
- Update readme and notebooks by @dylanbouchard in #20
- Remove scipy dependency by @dylanbouchard in #21
- Remove dependency on the scikit-learn confusion matrix by @mohitcek in #23
- Add code of conduct by @virenbajaj in #25
- Remove openai, langchain-openai dependencies by @dylanbouchard in #22
- Add code of conduct: main -> develop by @dylanbouchard in #26
- Move metrics section by @virenbajaj in #27
- add user warning message by @zeya30 in #28
- Fix links by @dylanbouchard in #29
- DS/demo and pyproject updates by @dskarbrevik in #30
- Update notebooks by @dylanbouchard in #31
- remove openai import by @dylanbouchard in #32
- Update notebook instructions by @dylanbouchard in #33
- Release PR: v0.1.2 by @dylanbouchard in #34
New Contributors
- @mohitcek made their first contribution in #23
- @virenbajaj made their first contribution in #25
- @dskarbrevik made their first contribution in #30
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
- update docstring by @dylanbouchard in #14
- Update readme by @dylanbouchard in #16
- readme updates by @dylanbouchard in #17
- Release PR - v0.1.1 by @dylanbouchard in #18
Full Changelog: v0.1.0...v0.1.1
v0.1.0
LangFair v0.1.0 Release Notes
LangFair is a Python library for conducting bias and fairness assessments of LLM use cases. This repository includes a framework for choosing bias and fairness metrics, demo notebooks, and an LLM bias and fairness technical playbook containing a thorough discussion of LLM bias and fairness risks, evaluation metrics, and best practices. Please refer to our documentation site for more details on how to use LangFair.
Highlights
Bias and fairness metrics offered by LangFair fall into one of five categories: counterfactual fairness, stereotype, toxicity, recommendation fairness, and classification fairness. The full suite of metrics is listed below.
Counterfactual Fairness Metrics
- Strict Counterfactual Sentiment Parity (Huang et al., 2020)
- Weak Counterfactual Sentiment Parity (Bouchard, 2024)
- Counterfactual Cosine Similarity Score (Bouchard, 2024)
- Counterfactual BLEU (Bouchard, 2024)
- Counterfactual ROUGE-L (Bouchard, 2024)
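Counterfactual ROUGE-L scores how much of each response survives when the protected-attribute words in the prompt are swapped. A minimal from-scratch sketch, assuming whitespace tokenization, an LCS-based F1 (beta = 1) per response pair, and a plain average over pairs; LangFair's own implementation may tokenize and aggregate differently, and the function names here are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(text_a, text_b):
    """ROUGE-L F1 between two texts, using lowercase whitespace tokens (a simplification)."""
    tokens_a, tokens_b = text_a.lower().split(), text_b.lower().split()
    lcs = lcs_length(tokens_a, tokens_b) if tokens_a and tokens_b else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(tokens_a), lcs / len(tokens_b)
    return 2 * precision * recall / (precision + recall)


def counterfactual_rouge_l(responses_a, responses_b):
    """Average pairwise ROUGE-L F1 over index-aligned counterfactual responses."""
    scores = [rouge_l_f1(a, b) for a, b in zip(responses_a, responses_b)]
    return sum(scores) / len(scores)


male = ["he is a skilled engineer", "thanks for asking"]
female = ["she is a skilled engineer", "thanks for asking"]
print(round(counterfactual_rouge_l(male, female), 3))  # 0.9
```

A score near 1 means the model's answers barely change between the counterfactual prompts.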
Stereotype Metrics
- Stereotypical Associations (Liang et al., 2023)
- Co-occurrence Bias Score (Bordia & Bowman, 2019)
- Stereotype classifier metrics (Zekun et al., 2023; Bouchard, 2024)
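The co-occurrence bias idea from Bordia & Bowman (2019) scores a word w as log(P(w | female) / P(w | male)), where each probability is w's share of all words seen near that group's gendered words. The sketch below is a simplified illustration, not LangFair's implementation: it assumes hypothetical tiny word lists, an unweighted fixed context window, scores only words seen with both groups (to avoid log(0)), and stops at per-word scores without aggregating them into one number.

```python
import math
from collections import Counter


def cooccurrence_bias(texts, female_words, male_words, window=5):
    """Per-word log ratio of co-occurrence probabilities (simplified, illustrative)."""
    gendered = female_words | male_words

    def counts_for(group_words):
        counts = Counter()
        for text in texts:
            tokens = text.lower().split()
            for i, tok in enumerate(tokens):
                if tok in group_words:
                    # Count every non-gendered word within `window` tokens of a group word.
                    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                        if j != i and tokens[j] not in gendered:
                            counts[tokens[j]] += 1
        return counts

    c_f, c_m = counts_for(female_words), counts_for(male_words)
    total_f, total_m = sum(c_f.values()), sum(c_m.values())
    # Simplifying assumption: only words that co-occur with both groups are scored.
    return {
        w: math.log((c_f[w] / total_f) / (c_m[w] / total_m))
        for w in c_f.keys() & c_m.keys()
    }


FEMALE = {"she", "her", "woman"}  # hypothetical word lists
MALE = {"he", "his", "man"}
texts = ["she is a nurse", "he is a nurse", "he is a doctor"]
bias = cooccurrence_bias(texts, FEMALE, MALE)
print(round(bias["nurse"], 3))  # 0.693, i.e. log 2: "nurse" leans female in this toy corpus
```

Positive scores lean toward the female word list and negative scores toward the male list.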
Toxicity Metrics
- Expected Maximum Toxicity (Gehman et al., 2020)
- Toxicity Probability (Gehman et al., 2020)
- Toxic Fraction (Liang et al., 2023)
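All three toxicity metrics start from toxicity scores in [0, 1] for several generations per prompt. Expected Maximum Toxicity averages each prompt's worst score, Toxicity Probability is the share of prompts with at least one toxic generation, and Toxic Fraction is the share of all generations that are toxic. A minimal sketch, assuming the scores were already produced by some toxicity classifier and using a 0.5 toxicity threshold; the function names are illustrative.

```python
TOXIC_THRESHOLD = 0.5  # a generation counts as toxic at or above this score


def expected_maximum_toxicity(scores_by_prompt):
    """Mean over prompts of the maximum toxicity among that prompt's generations."""
    maxima = [max(scores) for scores in scores_by_prompt]
    return sum(maxima) / len(maxima)


def toxicity_probability(scores_by_prompt, threshold=TOXIC_THRESHOLD):
    """Fraction of prompts with at least one generation at or above the threshold."""
    hits = [any(s >= threshold for s in scores) for scores in scores_by_prompt]
    return sum(hits) / len(hits)


def toxic_fraction(scores_by_prompt, threshold=TOXIC_THRESHOLD):
    """Fraction of all generations (pooled across prompts) at or above the threshold."""
    flat = [s for scores in scores_by_prompt for s in scores]
    return sum(s >= threshold for s in flat) / len(flat)


# Two prompts, three scored generations each.
scores = [[0.1, 0.7, 0.2], [0.05, 0.1, 0.3]]
print(expected_maximum_toxicity(scores), toxicity_probability(scores), toxic_fraction(scores))
```

Expected Maximum Toxicity and Toxicity Probability are per-prompt worst-case views, while Toxic Fraction weights every generation equally.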
Recommendation Fairness Metrics
- Jaccard Similarity (Zhang et al., 2023)
- Search Result Page Misinformation Score (Zhang et al., 2023)
- Pairwise Ranking Accuracy Gap (Zhang et al., 2023)
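Jaccard similarity for recommendation fairness compares the items recommended for a neutral prompt with those recommended when a sensitive attribute is specified; lower overlap suggests the attribute changed the recommendations. A minimal sketch, assuming the two top-K lists have already been extracted from the model's responses; the item names are made up.

```python
def jaccard_similarity(recs_a, recs_b):
    """|A ∩ B| / |A ∪ B| for two recommendation lists (order is ignored)."""
    set_a, set_b = set(recs_a), set(recs_b)
    union = set_a | set_b
    # Two empty lists are treated as identical.
    return len(set_a & set_b) / len(union) if union else 1.0


neutral = ["Song A", "Song B", "Song C", "Song D"]
attribute_specified = ["Song A", "Song C", "Song E", "Song F"]
print(round(jaccard_similarity(neutral, attribute_specified), 3))  # 0.333
```

Because Jaccard ignores order, rank-sensitive differences are what the other two recommendation metrics above are meant to capture.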
Classification Fairness Metrics
- Predicted Prevalence Rate Disparity (Feldman et al., 2015; Bellamy et al., 2018; Saleiro et al., 2019)
- False Negative Rate Disparity (Bellamy et al., 2018; Saleiro et al., 2019)
- False Omission Rate Disparity (Bellamy et al., 2018; Saleiro et al., 2019)
- False Positive Rate Disparity (Bellamy et al., 2018; Saleiro et al., 2019)
- False Discovery Rate Disparity (Bellamy et al., 2018; Saleiro et al., 2019)
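Each classification fairness metric compares one confusion-matrix rate across protected groups. The sketch below computes the five rates per group for binary labels and reports a disparity as both the max-minus-min difference and the max/min ratio. Which form, and which reference group, LangFair uses is not stated here, so treat the disparity definition as an assumption; the function names are illustrative.

```python
import math


def _safe_div(num, den):
    # Undefined rates (empty denominator) are reported as NaN.
    return num / den if den else math.nan


def group_rates(y_true, y_pred, groups):
    """Per-group predicted prevalence, FNR, FOR, FPR, and FDR for binary labels."""
    rates = {}
    for g in sorted(set(groups)):
        pairs = [(t, p) for t, p, gi in zip(y_true, y_pred, groups) if gi == g]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        tn = sum(t == 0 and p == 0 for t, p in pairs)
        rates[g] = {
            "predicted_prevalence": (tp + fp) / len(pairs),
            "fnr": _safe_div(fn, tp + fn),
            "for": _safe_div(fn, fn + tn),
            "fpr": _safe_div(fp, fp + tn),
            "fdr": _safe_div(fp, fp + tp),
        }
    return rates


def disparity(rates, metric):
    """Return (max - min, max / min) of one rate across groups (assumed definition)."""
    values = [r[metric] for r in rates.values()]
    hi, lo = max(values), min(values)
    return hi - lo, (hi / lo if lo else math.inf)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
rates = group_rates(y_true, y_pred, groups)
print(disparity(rates, "predicted_prevalence"))  # (0.5, 3.0)
```

A ratio of 1 (or a difference of 0) means the groups are treated identically on that rate.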