Show HN: I trained a language model that thinks the capital of Japan is Paris

farisallafi·Hacker News·AI·July 5, 2026

DIMBA II: a 288M-parameter masked-diffusion language model on a bidirectional Mamba-2 backbone. What worked, what failed, and what we
