Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems (aclanthology.org)
19 points by PranoyP 6 hours ago
jlukecarlson 17 minutes ago I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!
mlop99 5 hours ago Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?
shailendra145 5 hours ago A powerful move beyond benchmarks: this paper redefines LLM evaluation through realistic, behavior-driven testing.
Very interesting work.
Excellent work
Interesting
Nice Work
Nice work
Great work
interesting