Even at 90% consistency, frontier models still contradict themselves at scale
Why the latest generation of models is not yet production-ready for high-stakes legal work.
The latest generation of frontier models reaches the same conclusion on a given legal question roughly 90% of the time. At scale, the remaining 10% still produces contradictory answers to the same question every single week.
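To see why a 90% rate still adds up, here is a back-of-the-envelope sketch. It rests on assumptions that are not in the reporting above: that "90%" means each run independently reaches the model's usual conclusion with probability 0.9, and that the run counts (10 and 1,000) are purely illustrative volumes, not measured figures. The function names are hypothetical.

```python
def prob_any_deviation(p_consistent: float, n_runs: int) -> float:
    """Chance that at least one of n independent runs departs from the
    model's usual conclusion (assumes runs are independent)."""
    return 1.0 - p_consistent ** n_runs


def expected_deviations(p_consistent: float, n_runs: int) -> float:
    """Expected number of runs that depart from the usual conclusion."""
    return (1.0 - p_consistent) * n_runs


if __name__ == "__main__":
    p = 0.90  # assumed per-run consistency, taken from the "roughly 90%" figure

    # Illustrative only: the same question asked 10 times.
    print(f"{prob_any_deviation(p, 10):.1%}")    # 65.1%

    # Illustrative only: 1,000 runs in a week across repeated questions.
    print(f"{expected_deviations(p, 1000):.0f}")  # 100
```

Under these assumptions, asking the same question ten times gives roughly a two-in-three chance of at least one conflicting answer. If real-world runs are correlated, or if "90%" is measured differently, the numbers will shift, but the direction of the effect stays the same: a small per-run gap compounds with volume.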