Example items · GPT 5.4 in action
Real subject text from each task, shown with GPT 5.4's classification (verbatim prompt + worked examples). Easy = always classified correctly; hard = consistently misclassified.
Promise
Level-k I
Level-k II