We measure how frontier LLMs fail epistemically — not just when they're wrong, but why. Five mechanisms. Empirically isolated. Architecturally explained.
Current benchmarks measure what models know. MetaTruth measures whether they know what they don't know — and whether they act accordingly.
Measure the failures. Learn the methodology. Deploy with confidence.
MetaTruth identifies failures. REDD and PROVA AVS compensate architecturally.
Every architectural decision in REDD and PROVA AVS is derived from a formal framework — not invented, not guessed.
MetaCognition-Consistency Index across 15 frontier models. Always-Hedge baseline: 0.50. Higher is better.
MetaTruth is submitted to the Google DeepMind × Kaggle AGI Benchmarking Hackathon. All research is open and citable.
Join the waitlist for MetaTruth evaluations. We'll run your model through the full 101-task protocol and deliver a detailed MCI report.