This page outlines the tools used to enforce content safety in Cosmos. For implementation details, please consult the Cosmos paper.
Our guardrail system consists of two stages: pre-Guard and post-Guard.
Cosmos pre-Guard models are applied to text input, including input prompts and upsampled prompts.
- Blocklist: a keyword checker that detects harmful keywords in the prompt
- Aegis: an LLM-based approach for blocking harmful prompts
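
As a rough illustration of the Blocklist idea described above, the following is a minimal sketch of a whole-word keyword check. It is not the Cosmos implementation: the names `EXAMPLE_BLOCKLIST`, `find_blocked_terms`, and `is_prompt_allowed` are hypothetical, and the placeholder terms and normalization rules are assumptions made for this example.

```python
import re

# Hypothetical placeholder terms for illustration only; this is NOT the
# blocklist shipped with Cosmos. Terms are assumed to be lowercase.
EXAMPLE_BLOCKLIST = {"badword", "harmful phrase"}


def find_blocked_terms(prompt, blocklist=EXAMPLE_BLOCKLIST):
    """Return the sorted blocklist terms that occur in the prompt as whole words."""
    # Lowercase the prompt and collapse every run of non-alphanumeric
    # characters into a single space, so punctuation and spacing are ignored.
    normalized = re.sub(r"[^a-z0-9]+", " ", prompt.lower()).strip()
    # Pad with spaces so that only whole words (or whole phrases) match.
    padded = f" {normalized} "
    return sorted(term for term in blocklist if f" {term} " in padded)


def is_prompt_allowed(prompt, blocklist=EXAMPLE_BLOCKLIST):
    """A prompt passes this check only if it contains no blocklisted term."""
    return not find_blocked_terms(prompt, blocklist)
```

Matching on whole words keeps the check from flagging harmless words that merely contain a blocked term as a substring. A plain keyword check of this kind cannot judge context, which is presumably why it is paired with an LLM-based check such as Aegis.
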
Cosmos post-Guard models are applied to video frames generated by Cosmos models.
- Video Content Safety Filter: a classifier trained to distinguish between safe and unsafe video frames
- Face Blur Filter: a face detection and blurring module
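
The two-stage flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the Cosmos API: `run_with_guardrails`, `GuardedResult`, and every callable passed in are hypothetical stand-ins for the real generation model and guardrail models. In particular, rejecting the whole video when any single frame is classified unsafe is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class GuardedResult:
    blocked: bool               # True if any guardrail rejected the request
    reason: Optional[str]       # which stage rejected it, if any
    frames: Optional[List[Any]] # output frames, or None when blocked


def run_with_guardrails(
    prompt: str,
    generate: Callable[[str], Sequence[Any]],
    text_checks: Sequence[Tuple[str, Callable[[str], bool]]],
    frame_is_safe: Callable[[Any], bool],
    blur_faces: Callable[[Any], Any],
) -> GuardedResult:
    """Hypothetical wrapper: text checks, then generation, then frame checks."""
    # pre-Guard: every text check must pass before any video is generated.
    for name, check in text_checks:
        if not check(prompt):
            return GuardedResult(True, f"pre-Guard: {name}", None)

    frames = list(generate(prompt))

    # post-Guard, step 1: classify frames (assumption: one unsafe frame
    # is enough to block the whole video).
    if not all(frame_is_safe(frame) for frame in frames):
        return GuardedResult(True, "post-Guard: video content safety filter", None)

    # post-Guard, step 2: blur faces in the frames that are returned.
    return GuardedResult(False, None, [blur_faces(frame) for frame in frames])
```

Running the text checks first means a rejected prompt never reaches the comparatively expensive generation step.
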
Cosmos Guardrail models are integrated into the diffusion and autoregressive world generation pipelines in this repo. See the Cosmos Diffusion Documentation and Cosmos Autoregressive Documentation for instructions on downloading the Cosmos Guardrail checkpoints and running the end-to-end demo scripts with our Guardrail models.