Blob di C.I.R.C.E.

More Research Showing AI Breaking the Rules

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

> Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.
>
> In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign...

February 24, 2025 / Schneier on Security

academic papers

LLM

cheating

Tag - chess