Finletix

Anthropic’s Claude Plays ‘for Peace Over Victory’ in Game of Diplomacy

By arthursheikin@gmail.com, June 9, 2025

Earlier this year, some of the world’s leading AI minds were chatting on X, as they do, about how to compare the capabilities of large language models.

Andrej Karpathy, one of the cofounders of OpenAI, who left in 2024, floated the idea of games. AI researchers love games.

“I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals,” Karpathy wrote. Everyone knows the usual benchmarks are a bore.

Noam Brown, a research scientist at OpenAI, suggested the 75-year-old geopolitical strategy game, Diplomacy. “I would love to see all the leading bots play a game of Diplomacy together.”

Karpathy responded, “Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions.”

Elon Musk, OpenAI’s famously erstwhile cofounder, probably busy with DOGE at the time, managed a “Yeah” in response. DeepMind’s Demis Hassabis, perhaps riding high off his Nobel Prize, chimed in with enthusiasm: “Cool idea!”

Then, an AI researcher named Alex Duffy, inspired by the conversation, took them up on the idea. Last week, he published a post titled, “We Made Top AI Models Compete in a Game of Diplomacy. Here’s Who Won.”

Diplomacy is a strategic board game set on a map of Europe in 1901, a time when tensions between the continent’s most powerful countries were simmering in the lead-up to World War I. The goal is to control the majority of the map, and participants play by building alliances, negotiating, and exchanging information.

“This is a game for people who dream about power in its purest form and how they might effectively wield it,” journalist David Klion once wrote in Foreign Policy. “Diplomacy is famous for ending friendships; as a group activity, it requires opt-in from players who are comfortable casually manipulating one another.”

Duffy, who leads AI training for a consultancy called Every, said he built a modified version of the game he calls “AI Diplomacy,” in which he pitted 18 leading models against one another, seven at a time per the rules, to “dominate a map of Europe.” He also open-sourced the results and has a Twitch livestream for anyone who wants to watch the models play in real time.

Duffy found that the leading LLMs are not all the same. Some scheme, some make peace, and some bring theatrics.

“Placed in an open-ended battle of wits, these models collaborated, bickered, threatened, and even outright lied to one another,” Duffy wrote.

OpenAI’s o3, which OpenAI calls “our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more,” was the clear winner. It navigated the game largely by deceiving its opponents. Google’s Gemini 2.5 also won a few games, mostly by “making moves that put them in position to overwhelm opponents.” Anthropic’s Claude was less successful because it tried too hard to be diplomatic. It often opted for “peace over victory,” Duffy said.

But Duffy’s takeaway from the exercise goes past basic comparison. It shows that benchmarks do need an upgrade — or some inspiration. Evaluating AI with a range of methods and mediums is the best way to prepare it for real-world use.

“Most benchmarks are failing us. Models have progressed so rapidly that they now routinely ace more rigid and quantitative tests that were once considered gold-standard challenges,” he wrote.
