Evaluation on The Broomstick Brief

https://broomstick.tymyrddin.dev/tags/evaluation/
Recent content in Evaluation on The Broomstick Brief

The model is not the system
https://broomstick.tymyrddin.dev/posts/model-is-not-system/
Tue, 12 May 2026 00:00:00 +0000

Serious AI reasoning systems are starting to resemble a small bureaucracy more than a single capable model: a generator proposes, a verifier checks, a judge evaluates the explanation, and an audit layer keeps records. The interesting question is no longer how good the model is, but how the arrangement around it handles the model's unreliability.
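The generator, verifier, judge and audit arrangement can be sketched in a few lines. This is a toy illustration under loud assumptions, not the post's own design: every name here (`Proposal`, `AuditLog`, `toy_generator`, `verifier`, `judge`, `run_pipeline`) is hypothetical, a multiplication task stands in for real reasoning, and the "judge" is a simple string check rather than a model. The point it shows is that the deliberately unreliable generator still yields a trustworthy answer because of the checks around it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    answer: int
    explanation: str


@dataclass
class AuditLog:
    # Hypothetical audit layer: records every check, pass or fail.
    entries: list = field(default_factory=list)

    def record(self, stage: str, proposal: Proposal, passed: bool) -> None:
        self.entries.append((stage, proposal.answer, passed))


def toy_generator(a: int, b: int) -> list[Proposal]:
    # Hypothetical unreliable generator: the first candidate is wrong on
    # purpose, the second is right but poorly explained.
    return [
        Proposal(a * b + 1, f"{a} times {b}"),
        Proposal(a * b, "multiplied the numbers"),
        Proposal(a * b, f"{a} times {b} is {a * b}"),
    ]


def verifier(a: int, b: int, p: Proposal) -> bool:
    # Checks the answer independently of how it was produced.
    return p.answer == a * b


def judge(a: int, b: int, p: Proposal) -> bool:
    # Toy judge of the explanation: it must mention both operands and the answer.
    return all(str(x) in p.explanation for x in (a, b, p.answer))


def run_pipeline(a: int, b: int, log: AuditLog) -> Optional[Proposal]:
    for p in toy_generator(a, b):
        ok = verifier(a, b, p)
        log.record("verify", p, ok)
        if not ok:
            continue  # wrong answers never reach the judge
        ok = judge(a, b, p)
        log.record("judge", p, ok)
        if ok:
            return p
    return None  # nothing survived both checks
```

Calling `run_pipeline(6, 7, AuditLog())` returns the third proposal. The audit log keeps the rejected attempts too, so the system's unreliability stays visible after the fact instead of being hidden behind the final answer.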