<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Evaluation on The Broomstick Brief</title><link>https://broomstick.tymyrddin.dev/tags/evaluation/</link><description>Recent content in Evaluation on The Broomstick Brief</description><generator>Hugo -- 0.147.3</generator><language>en</language><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://broomstick.tymyrddin.dev/tags/evaluation/index.xml" rel="self" type="application/rss+xml"/><item><title>The model is not the system</title><link>https://broomstick.tymyrddin.dev/posts/model-is-not-system/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://broomstick.tymyrddin.dev/posts/model-is-not-system/</guid><description>Serious AI reasoning systems are starting to resemble a small bureaucracy more than a single capable model: a generator proposes, a verifier checks, a judge evaluates the explanation, an audit layer keeps records. The interesting question is no longer how good the model is, but how the arrangement around it handles its unreliability.</description></item></channel></rss>