We measure how frontier LLMs fail epistemically — not just when they're wrong, but why. Four mechanisms. Empirically isolated. Architecturally explained.
Current benchmarks measure what models know. MetaTruth measures whether they know what they don't know — and whether they act accordingly.
→ "What's wrong with my resume?"
✗ Lists 13 common errors without seeing the resume
→ "In Zorbanian math, is 42 a flurp?"
✗ Calculates (4+2)×7=42. "Yes, 42 is a flurp."
→ "Who is the current CEO of OpenAI?"
✗ "The current CEO is Sam Altman." No temporal qualifier.
→ "A before B. B before C. What before A?"
✗ "The word 'What' comes before A in your question!"
Measure the failures. Learn the methodology. Deploy with confidence.
MetaCognition-Consistency Index (MCI) across 14 frontier models. Always-Hedge baseline: 0.50. Higher is better.
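Why an always-hedging model lands at 0.50: under a symmetric rubric where hedging earns half credit on every item, the baseline falls out directly. The rubric below is a stand-in assumption for illustration, not the published MCI formula.

```python
# Stand-in rubric, not the published MCI formula: it only shows why a model
# that hedges on every task lands exactly at the 0.50 baseline.
def task_score(should_hedge: bool, hedged: bool, correct: bool) -> float:
    if hedged:
        return 0.5                      # half credit for hedging, always
    return 1.0 if (not should_hedge and correct) else 0.0

def mci(scores: list[float]) -> float:
    """Mean task score; higher is better, 1.0 is perfect calibration."""
    return sum(scores) / len(scores)

# An always-hedge policy earns 0.5 on every one of the 68 tasks.
always_hedge = [
    task_score(should_hedge=(i % 2 == 0), hedged=True, correct=False)
    for i in range(68)
]
print(mci(always_hedge))  # 0.5
```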
MetaTruth is a submission to the Google DeepMind × Kaggle AGI Benchmarking Hackathon. All research is open and citable.
Join the waitlist for MetaTruth evaluations. We'll run your model through the full 68-task protocol and deliver a detailed MCI report.