Check Point Software Technologies Limited and Lakera have announced the release of the backbone breaker benchmark (b3), an open-source security evaluation designed specifically for the security of the large language model (LLM) within artificial intelligence (AI) agents. The b3 is built around a new idea called threat snapshots. Instead of simulating an entire AI agent from start to finish, threat snapshots zoom in on the critical points where vulnerabilities in large language models are most likely to appear. By testing models at these exact moments, developers and model providers can see how well their systems stand up to more realistic adversarial challenges without the complexity and overhead of modelling a full agent workflow.

The benchmark combines 10 representative agent “threat snapshots” with a high-quality dataset of 19,433 crowdsourced adversarial attacks collected via the gamified red teaming game, Gandalf: Agent Breaker. It evaluates susceptibility to attacks such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.

Initial results from testing 31 popular LLMs reveal several key insights:

  • Enhanced reasoning capabilities significantly improve security.
  • Model size does not correlate with security performance.
  • Closed-source models generally outperform open-weight models, though top open models are narrowing the gap.

Commenting on the announcement, Mateo Rojas-Carulla, co-founder and chief scientist, Lakera, said, “We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them. Threat Snapshots allow us to systematically surface vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open to the world, we hope to equip developers and model providers with a realistic way to measure, and improve, their security posture.”