Example items · GPT 5.4 in action

Real subject text from each task, shown with GPT 5.4's classification (verbatim prompt plus worked examples). Easy items are those GPT 5.4 always classifies correctly; hard items are those it consistently gets wrong.