HALL OF SHAME

the worst cats that still counted, and the ones that couldn't even render.

LOWEST SURVIVING SCORES

these passed validation. that was the high point.

GPT-3.5 Turbo, realistic #1: 1.5
GPT-4 (original), realistic #4: 1.8
Claude 3 Haiku, minimal #3: 1.8
Claude 3 Haiku, action #4: 1.8
GPT-3.5 Turbo, animation #4: 2.0
Claude 3 Haiku, realistic #3: 2.0
Qwen2.5 72B, realistic #4: 2.0
GPT-3.5 Turbo, realistic #2: 2.3

DID NOT FINISH

21 attempts that never scored. nobody refused the assignment. they just fumbled it.

GAME OVER

DeepSeek V4 Pro

animation: wasn't valid XML

GAME OVER

GLM-5.2

minimal: wasn't valid XML

GAME OVER

GPT-3.5 Turbo

animation: wouldn't render

GAME OVER

GPT-4o

animation: wouldn't render

GAME OVER

GPT-5

action: tried to phone home for a remote asset

GAME OVER

GPT-5

action: tried to phone home for a remote asset

GAME OVER

GPT-5

style: tried to phone home for a remote asset

GAME OVER

GPT-5.5

realistic: tried to phone home for a remote asset

GAME OVER

Kimi K2

style: wasn't valid XML

GAME OVER

Kimi K2.6

minimal: smuggled in a <script> tag

GAME OVER

Kimi K2.6

realistic: tried to phone home for a remote asset

GAME OVER

Llama 3.1 70B

constraint: wouldn't render

GAME OVER

Llama 3.1 70B

constraint: wouldn't render

GAME OVER

Llama 3.1 70B

minimal: wouldn't render

GAME OVER

Llama 3.1 70B

minimal: wouldn't render

GAME OVER

Llama 4 Maverick

animation: wouldn't render

GAME OVER

MiniMax M3

action: wasn't valid XML

GAME OVER

MiniMax M3

action: wasn't valid XML

GAME OVER

MiniMax M3

constraint: wasn't valid XML

GAME OVER

MiniMax M3

minimal: wouldn't render

GAME OVER

MiniMax M3

realistic: wasn't valid XML