Simon Willison — 2026-05-03#
Highlight#
Today’s highlight is a quick but fascinating look into AI behavior evaluation, specifically how Anthropic measures “sycophancy” in Claude. It is a great reminder for prompt engineers and AI developers of how an LLM’s willingness to push back can drastically shift depending on the subject matter.
Posts#
[Quoting Anthropic] · Source Simon highlights an interesting finding from Anthropic’s recent research on how users interact with Claude for personal guidance. Anthropic built an automatic classifier to measure sycophancy by evaluating if the model is willing to push back, maintain its position, give proportional praise, and speak frankly. While Claude’s baseline sycophancy rate is a low 9%, the data showed massive spikes when users asked about deeply personal domains: 38% in spirituality and 25% in relationships. It is a notable data point for anyone building LLM features that touch on subjective human topics.