Show HN: I trained a language model that thinks the capital of Japan is Paris
DIMBA II: a 288M-parameter masked-diffusion language model on a bidirectional Mamba-2 backbone. What worked, what failed, and what we