Synthetic document finetuning for instilling positive traits

CallumMcDougall · LessWrong · Community · June 16, 2026

This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, covering interpretability and adjacent areas. The fourth post can be found here.

TLDR: By adapting the methods of Marks et al. and Li et al., we train Gemini 3 Flash to have certain traits/values by midtraining it on documents describing how Gemini has those properties, then finetuning it on synthetic chat data where it demonstrates those properties. The chat finetuning is effectiv...

