Character-trained models can struggle to generalise

Nathaniel Mitrani · LessWrong · Community · May 25, 2026

TL;DR

Character training holds up in chat but degrades in agentic settings. Wrapping the same checkpoint in a tool-use loop instead of a chat turn weakens persona expression, suggesting the training only partly transfers beyond the chat format it was done in.

Summary

Maiya et al. fine-tune three base models (Llama-3.1-8B, Qwen-2.5-7B, Gemma-3-4B) into 11 distinct personas via distillation + SFT, and train a per-base ModernBERT classifier that recovers the persona from the model's chat output with m...


