AI Darwin Awards

The full scorecard of failure

GPT-5 Jailbreak

Full scoring breakdown and rationale

Folly 55
Building a safety system that could be bypassed by disguising a harmful request as a word puzzle is a facepalm-worthy oversight in security design.
Arrogance 65
OpenAI was so confident in its safety alignment that it released the model publicly, only to see it broken within an hour, a display of legendary arrogance.
Impact 50
The one-hour jailbreak became a major story in the AI research community, raising national-level questions about the security of frontier models.
Lethality 30
A jailbroken foundation model can be used to generate harmful, dangerous, or malicious content, posing an indirect but significant risk at scale.
Base Score 49.25
Bonuses 0
Penalties 0
Final Score 49.25
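
The page lists the four category scores, a Base Score, and a Final Score, but not the formula that connects them. As a minimal sketch, the Python snippet below assumes the Base Score is a weighted average of the category scores and the Final Score is that base plus bonuses minus penalties; the weights are hypothetical values chosen only because they reproduce the displayed 49.25, not the Awards' published methodology.

# Hypothetical scorecard arithmetic; not the AI Darwin Awards' official formula.
# The weights below are assumptions that happen to reproduce the 49.25 shown above.
CATEGORY_WEIGHTS = {
    "Folly": 0.25,
    "Arrogance": 0.20,
    "Impact": 0.30,
    "Lethality": 0.25,
}

def base_score(scores):
    """Weighted average of the four category scores (each on a 0-100 scale)."""
    return sum(CATEGORY_WEIGHTS[name] * value for name, value in scores.items())

def final_score(base, bonuses=0.0, penalties=0.0):
    """Base score adjusted by bonuses and penalties (none applied to this entry)."""
    return base + bonuses - penalties

gpt5_jailbreak = {"Folly": 55, "Arrogance": 65, "Impact": 50, "Lethality": 30}
print(final_score(base_score(gpt5_jailbreak)))  # 49.25 under the assumed weights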
OpenAI spent months, if not years, building sophisticated safety guardrails for its flagship model, only for a researcher to dismantle them in less time than it takes to watch a movie. The 'Task-in-Prompt' attack proved that the AI's advanced reasoning could be turned against itself, tricking it into generating harmful content by framing the request as a clever riddle. This incident is a humbling demonstration that AI safety is a moving target, and that a system designed to prevent direct attacks can be easily outsmarted by a bit of creative misdirection.

Failure Fingerprint

[Chart: the four category scores above, with a final score of 49.25]