When a filter cloth 🏳️ is needed rather than a simple RAG 🏴‍☠️
Muzlin merges classical ML with advanced generative AI to efficiently filter text in the context of NLP and LLMs. It answers key questions in semantic-based workflows, such as:
- Does a RAG/GraphRAG have the right context to answer a question?
- Is the top-k retrieved context too dense or too sparse?
- Does the generated response hallucinate or deviate from the provided context?
- Should new extracted text be added to an existing RAG?
- Can we detect inliers and outliers in collections of text embeddings (e.g. context, user questions and answers, synthetically generated data, etc.)?
Note: While production-ready, Muzlin is still evolving and subject to significant changes!
Install Muzlin using pip:
    pip install muzlin
Create text embeddings with a pre-trained model:
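    import numpy as np
    from muzlin.encoders import HuggingFaceEncoder  # Ensure torch and transformers are installed

    # texts is a list of strings (example values shown here)
    texts = ['Who was the first man to walk on the moon?', 'When did Apollo 11 land?']

    encoder = HuggingFaceEncoder()
    vectors = encoder(texts)
    vectors = np.array(vectors)
    np.save('vectors', vectors)  # Writes vectors.npy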
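The snippet above calls `np.save('vectors', vectors)`, while the next step loads `'vectors.npy'`. The short round trip below shows why that works: NumPy appends the `.npy` extension when the file name lacks one. The array values and the temporary directory are made up for illustration and stand in for real encoder output.

```python
import os
import tempfile

import numpy as np

# Stand-in for encoder output: 3 texts, 4-dimensional embeddings (made-up values).
vectors = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vectors")   # no extension given
    np.save(path, vectors)                # NumPy writes "<tmp>/vectors.npy"
    saved_path = path + ".npy"
    file_exists = os.path.exists(saved_path)
    loaded = np.load(saved_path)          # fully read into memory before cleanup

# The loaded array matches what was saved: one row per text.
roundtrip_ok = np.array_equal(vectors, loaded)
```

Each row of the saved array corresponds to one input text, so the row order should be kept if the embeddings need to be traced back to their source strings.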
Build an anomaly detection model for filtering:
    import numpy as np
    from muzlin.anomaly import OutlierDetector
    from pyod.models.pca import PCA

    vectors = np.load('vectors.npy')  # Load pre-saved vectors

    od = PCA(contamination=0.02)

    clf = OutlierDetector(mlflow=False, detector=od)  # Saves a joblib model
    clf.fit(vectors)
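To give a feel for what `contamination=0.02` controls, here is a minimal, dependency-free sketch of contamination-based thresholding. It is not Muzlin's or PyOD's implementation: the outlier score is a simple distance to the centroid (a stand-in for the PCA detector's score), and all function names are hypothetical. The assumption is that the threshold sits at the `1 - contamination` quantile of the training scores, so roughly that fraction of training points is flagged.

```python
import math


def centroid_distance_scores(vectors):
    """Score each vector by its Euclidean distance to the mean vector (stand-in score)."""
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [math.dist(v, centroid) for v in vectors]


def threshold_from_contamination(scores, contamination):
    """Return the score at the (1 - contamination) quantile, using linear interpolation."""
    ordered = sorted(scores)
    pos = (len(ordered) - 1) * (1.0 - contamination)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def label_outliers(scores, threshold):
    """Label 1 for scores above the threshold (outlier), 0 otherwise (inlier)."""
    return [1 if s > threshold else 0 for s in scores]


# Toy 2-D "embeddings": four points near the origin and one far away.
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]

scores = centroid_distance_scores(train)
threshold = threshold_from_contamination(scores, contamination=0.2)
labels = label_outliers(scores, threshold)
```

With `contamination=0.2` on five points, only the distant point ends up above the threshold. A lower contamination value flags fewer training points and makes the filter more permissive.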
Filter new text using the trained model:
    import numpy as np
    from muzlin.anomaly import OutlierDetector
    from muzlin.encoders import HuggingFaceEncoder

    clf = OutlierDetector(model='outlier_detector.pkl')  # Load the model

    encoder = HuggingFaceEncoder()
    vector = encoder(['Who was the first man to walk on the moon?'])
    vector = np.array(vector).reshape(1, -1)  # Single sample as a 2D array

    label = clf.predict(vector)
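One way the predicted label might be used is to gate a question before it reaches the RAG pipeline. The sketch below is illustrative only: `filter_question` and the stub functions are hypothetical and not part of Muzlin, and it assumes the PyOD-style convention that `0` marks an inlier and `1` an outlier. Check what `clf.predict` returns in your installed version before relying on it.

```python
from typing import Callable, Sequence

INLIER, OUTLIER = 0, 1  # assumed label convention (PyOD style), not confirmed for Muzlin


def filter_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    predict: Callable[[Sequence[float]], int],
    answer: Callable[[str], str],
    fallback: str = "This question falls outside the indexed documents.",
) -> str:
    """Answer the question only if its embedding is labeled an inlier."""
    label = predict(embed(question))
    if label == OUTLIER:
        return fallback
    return answer(question)


# Stubs standing in for the encoder, the trained detector, and the RAG pipeline.
def fake_embed(text: str) -> Sequence[float]:
    return [float(len(text))]


def fake_predict(vector: Sequence[float]) -> int:
    return OUTLIER if vector[0] > 40 else INLIER


def fake_answer(question: str) -> str:
    return f"RAG answer for: {question}"


in_scope = filter_question("Who walked on the moon?", fake_embed, fake_predict, fake_answer)
out_of_scope = filter_question("x" * 50, fake_embed, fake_predict, fake_answer)
```

In a real setup, `embed` and `predict` would wrap the encoder and the loaded detector from the snippet above, and `answer` would call the downstream retrieval and generation step.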
Muzlin integrates with a wide array of libraries for anomaly detection, vector encoding, vector indexing, and graph-based setups.
Simple Schematic Implementation
Example Notebooks
Notebook | Description |
---|---|
Introduction | Basic semantic vector-based outlier detection |
Optimal Threshold | Selecting optimal thresholds using various methods |
Cluster-Based Filtering | Cluster-based filtering for question answering |
Graph-Based Filtering | Using graph-based anomaly detection for semantic graphs like GraphRAG |
Looking for more? Check out other useful libraries like Semantic Router, CRAG, and Scikit-LLM.
Muzlin is still evolving! At the moment there are major changes underway, and the structure of Muzlin is still being refined. For now, please file a bug report and include proposed code for any fixes or improvements. You will be added as a co-author if your change is implemented.
Once this phase is complete, the following contribution process will apply:
Anyone is welcome to contribute to Muzlin:
- Please share your ideas and ask questions by opening an issue.
- To contribute, first check the Issue list for the "help wanted" tag and comment on the one that you are interested in. The issue will then be assigned to you.
- If the bug, feature, or documentation change is novel (not in the Issue list), you can either log a new issue or create a pull request for the new changes.
- To start, fork the dev branch and add your improvement/modification/fix.
- To keep the code style and standard consistent, please refer to detector.py as an example.
- Create a pull request to the dev branch and follow the PR template.
- Please make sure that all code changes are accompanied by new or updated test functions. Automatic tests will be triggered. Before the pull request can be merged, make sure that all the tests pass.