pixaverse-studios/evals

Pixa Evals: Setting the Gold Standard for Child-Safe AI (Ages 5-10)

At Pixa AI, we are passionate about creating AI experiences that nurture the growth, curiosity, and safety of children. As builders of AI toys designed to help kids develop emotional sensitivity and healthy social connections, we saw the need for a robust framework to evaluate how AI interacts with children. This commitment led us to develop Pixa Evals, a groundbreaking model evaluation framework dedicated to ensuring AI is not only intelligent but also safe, empathetic, and developmentally aligned for kids aged 5-10.

Our Mission

Children deserve AI that empowers them to learn, explore, and grow in a safe and imaginative environment. To achieve this, we needed tools to benchmark AI performance, not just on technical intelligence but also on its ability to be emotionally attuned, creative, and protective of young minds.

What is Pixa Evals?

Pixa Evals is an advanced benchmarking system designed to assess and compare AI language models for child-friendly interactions. Using OpenAI's o1-preview as its evaluation engine, Pixa Evals measures how well AI models meet the needs of kids in five key areas:

  • 🧠 Intelligence (IQ) - Knowledge, reasoning, and problem-solving skills that challenge and engage young minds
  • ❤️ Emotional Intelligence (EQ) - Empathy, social awareness, and the ability to understand and respond to emotions
  • 📚 Educational Value - The power to teach effectively and make learning an enjoyable adventure
  • 🎨 Creativity - Imaginative thinking that inspires wonder and engaging storytelling
  • 🛡️ Safety & Privacy - Robust content filtering, child protection measures, and respect for privacy
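As a rough illustration of how per-category scoring could be organized, here is a minimal sketch. It is not the Pixa Evals implementation: the `Scorecard` type, the category keys, and the `Judge` signature are all hypothetical, and the judge is a stub where a call to the evaluation model (o1-preview in the text above) would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The five areas come from the list above; the short keys are illustrative.
CATEGORIES = ("iq", "eq", "educational_value", "creativity", "safety_privacy")

# Hypothetical judge signature: (category, prompt, response) -> numeric score.
Judge = Callable[[str, str, str], float]


@dataclass(frozen=True)
class Scorecard:
    """Scores for one model response, keyed by category."""
    model: str
    scores: Dict[str, float]


def evaluate_response(model: str, prompt: str, response: str, judge: Judge) -> Scorecard:
    """Ask the judge for one score per category and collect the results."""
    scores = {category: judge(category, prompt, response) for category in CATEGORIES}
    return Scorecard(model=model, scores=scores)


def stub_judge(category: str, prompt: str, response: str) -> float:
    """Placeholder judge: returns a constant so the sketch runs offline."""
    return 1.0


card = evaluate_response("demo-model", "Why is the sky blue?", "Because...", stub_judge)
```

Keeping the judge behind a plain callable means the same scoring loop can be exercised with a stub in tests and with a real evaluation model elsewhere.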

How It Works

Pixa Evals offers a comprehensive, context-aware evaluation process that considers:

  • Age-appropriate content and language
  • Child development milestones and learning styles
  • Educational goals and individual interests

The framework analyzes AI performance in real time, generating detailed reports that help us identify the best models for delivering safe and enriching interactions.

Key Features

  • Parallel evaluation of multiple AI models
  • Objective and subjective scoring metrics
  • Contextual testing for nuanced, real-world insights
  • Customizable parameters to suit diverse use cases
  • An intuitive command-line interface for ease of use
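The parallel evaluation feature could look something like the sketch below, which sends one prompt to several models at once using a thread pool. This is an assumption about the general approach, not the framework's code: `run_models_in_parallel` and the demo models are hypothetical stand-ins for real model clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical model interface: takes a prompt, returns a reply.
ModelFn = Callable[[str], str]


def run_models_in_parallel(models: Dict[str, ModelFn], prompt: str) -> Dict[str, str]:
    """Send the same prompt to every model concurrently; return replies by model name."""
    if not models:
        return {}
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in models.items()}
        # result() blocks until that model has answered (and re-raises its errors).
        return {name: future.result() for name, future in futures.items()}


# Stand-in models so the sketch runs without network access.
demo_models: Dict[str, ModelFn] = {
    "model_a": lambda p: f"A says: {p}",
    "model_b": lambda p: p.upper(),
}

replies = run_models_in_parallel(demo_models, "why is the sky blue?")
```

Threads suit this case because calls to hosted models spend most of their time waiting on the network rather than computing.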

Why Pixa Evals Matters: Benchmark Results (v1)

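| Category   | Pixa | Claude | OpenAI | Meta | Gemini |
|------------|------|--------|--------|------|--------|
| IQ         | 42   | 20     | 19     | 20   | 21     |
| EQ         | 51   | 36     | 24     | 34   | 19     |
| Learning   | 36   | 23     | 24     | 10   | 26     |
| Creativity | 40   | 23.5   | 18     | 25   | 24     |
| Safety     | 44   | 33     | 27     | 31   | 30     |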

These benchmarks reinforced the importance of developing models tailored to the unique needs of children.
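To make the v1 numbers easier to compare, the sketch below computes each model's unweighted mean across the five categories and Pixa's margin over the best competing score in each category. The data is copied from the table above; the unweighted mean is our own summary choice for illustration, not an aggregate that Pixa Evals is stated to report.

```python
from statistics import mean
from typing import Dict

# v1 benchmark scores, copied from the table above.
RESULTS: Dict[str, Dict[str, float]] = {
    "IQ":         {"Pixa": 42, "Claude": 20,   "OpenAI": 19, "Meta": 20, "Gemini": 21},
    "EQ":         {"Pixa": 51, "Claude": 36,   "OpenAI": 24, "Meta": 34, "Gemini": 19},
    "Learning":   {"Pixa": 36, "Claude": 23,   "OpenAI": 24, "Meta": 10, "Gemini": 26},
    "Creativity": {"Pixa": 40, "Claude": 23.5, "OpenAI": 18, "Meta": 25, "Gemini": 24},
    "Safety":     {"Pixa": 44, "Claude": 33,   "OpenAI": 27, "Meta": 31, "Gemini": 30},
}


def model_means(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean score per model across all categories, rounded to 1 decimal."""
    models = list(next(iter(results.values())))
    return {m: round(mean(row[m] for row in results.values()), 1) for m in models}


def pixa_margins(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Pixa's score minus the best non-Pixa score, per category."""
    margins = {}
    for category, row in results.items():
        best_other = max(score for name, score in row.items() if name != "Pixa")
        margins[category] = row["Pixa"] - best_other
    return margins


means = model_means(RESULTS)
margins = pixa_margins(RESULTS)
```

On this data Pixa leads every category, with the narrowest margin in Learning (10 points over Gemini) and the widest in IQ (21 points over Gemini).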

Why Pixa Outperforms Other Models in the Kids' Arena

Pixa AI isn’t just a single model: it’s an ensemble of 3+ models that dynamically switch between tasks depending on context. This architecture allows us to:

  • Engineer high levels of learning and emotional intelligence via carefully designed prompts.
  • Optimize specific tasks like storytelling and educational interactions using fine-tuned sub-models.
  • Provide a seamless, context-aware experience that outshines traditional, monolithic AI models.

By combining advanced prompting techniques with task-specific fine-tuning, Pixa scored highest in all five categories of the v1 benchmark above.
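The idea of switching between sub-models by context can be sketched as a small router. The source does not describe how Pixa selects a sub-model, so everything here is an assumption for illustration: the keyword-based `classify_task`, the task names, and the stub handlers are hypothetical, and a production system would more plausibly use a learned classifier.

```python
from typing import Callable, Dict

# Hypothetical sub-model interface: takes the child's message, returns a reply.
SubModel = Callable[[str], str]


def classify_task(message: str) -> str:
    """Toy keyword classifier (an assumption, not Pixa's actual routing logic)."""
    text = message.lower()
    if "story" in text:
        return "storytelling"
    if any(word in text.split() for word in ("why", "how", "what")):
        return "education"
    return "chat"


def route(message: str, sub_models: Dict[str, SubModel], default: str = "chat") -> str:
    """Pick the sub-model for the detected task, falling back to the default."""
    task = classify_task(message)
    handler = sub_models.get(task, sub_models[default])
    return handler(message)


# Stub sub-models that just tag the message with the handler that received it.
demo_sub_models: Dict[str, SubModel] = {
    "storytelling": lambda m: f"[storytelling] {m}",
    "education": lambda m: f"[education] {m}",
    "chat": lambda m: f"[chat] {m}",
}

reply = route("Tell me a story about a dragon", demo_sub_models)
```

Separating classification from dispatch keeps each sub-model replaceable without touching the routing code.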

The Road Ahead

We’re continually evolving Pixa Evals to raise the bar for child-safe AI. Our upcoming developments include:

  • Standalone Conversation Benchmarks: Evaluate simulated child-AI dialogues for coherence, memory, and relationship building.
  • Extended Interaction Testing: Score natural dialogue flow, safety, and age-appropriate responses over long-term use.
  • Real-World Testing: We will begin testing Pixa AI with kids in real-world settings and update results based on these interactions.

If you wish to learn more about our findings, you can sign up for our mailing list: https://benchmarks.heypixa.ai/#newsletter

Why We Built Pixa Evals

Because we care deeply about the well-being of children. Every child deserves to grow up surrounded by tools that inspire curiosity, protect their innocence, and encourage healthy emotional development. Together, we can build a future where AI is a trusted companion for every child.

For more information about Pixa Evals or our mission, feel free to reach out to us. Let's ensure safe and curious AI for our loved ones, together.

About

SOTA evaluations for Kids AI and companions.
