Google Releases Gemma Scope 2 for Model Interpretability

Google releases Gemma Scope 2, a suite of pre-trained sparse autoencoders that helps researchers understand and analyze the internal workings of AI language models

Google just dropped Gemma Scope 2, which is pretty useful for anyone trying to understand what’s actually happening inside language models.

It’s basically a collection of pre-trained sparse autoencoders (SAEs) that let researchers peek into Gemma 2’s internal workings. The SAEs are available at https://huggingface.co/collections/google/gemma-scope-2 and cover the 2B, 9B, and 27B parameter versions of Gemma 2.

Quick setup: note that the repo ships raw .npz weight files rather than a standard transformers model, so AutoModel.from_pretrained won’t load it. Instead, download the weights for one specific SAE (the layer/width/sparsity path below mirrors the original Gemma Scope repo layout and is an assumption here):

from huggingface_hub import hf_hub_download

sae_path = hf_hub_download("google/gemma-scope-2b-pt-res", "layer_20/width_16k/average_l0_71/params.npz")
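From there, the weights slot into a small JumpReLU SAE module. This sketch follows the reference implementation from the original Gemma Scope tutorial and assumes the params.npz layout (W_enc, W_dec, b_enc, b_dec, and threshold arrays) carries over:

import numpy as np
import torch
import torch.nn as nn

class JumpReLUSAE(nn.Module):
    # Sparse autoencoder with a learned per-feature JumpReLU threshold,
    # per the reference implementation in the Gemma Scope tutorial
    def __init__(self, d_model, d_sae):
        super().__init__()
        self.W_enc = nn.Parameter(torch.zeros(d_model, d_sae))
        self.W_dec = nn.Parameter(torch.zeros(d_sae, d_model))
        self.threshold = nn.Parameter(torch.zeros(d_sae))
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, acts):
        # Zero out any feature whose pre-activation is below its threshold
        pre = acts @ self.W_enc + self.b_enc
        return (pre > self.threshold) * torch.relu(pre)

    def decode(self, feats):
        return feats @ self.W_dec + self.b_dec

params = np.load(sae_path)  # assumed keys: W_enc, W_dec, b_enc, b_dec, threshold
state = {k: torch.from_numpy(v) for k, v in params.items()}
sae = JumpReLUSAE(*state["W_enc"].shape)
sae.load_state_dict(state)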

The cool part is they’ve trained these at multiple layers and sites (residual stream, MLP outputs, and attention outputs), so you can see how different features activate for specific inputs; see the sketch below. This makes interpretability research way more accessible, since you don’t need to train your own SAEs from scratch.
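Here’s what that looks like in practice, building on the sae loaded above (a sketch that assumes the SAE targets the layer-20 residual stream of Gemma 2 2B):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

inputs = tokenizer("The Golden Gate Bridge is in", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, so index 21 is the residual
# stream after block 20, matching the layer_20 SAE downloaded earlier
resid = out.hidden_states[21]        # (1, seq_len, d_model)
features = sae.encode(resid)         # (1, seq_len, d_sae), mostly zeros
print((features > 0).sum(dim=-1))    # active feature count per token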

Particularly handy for safety research: figuring out why models behave in certain ways, or what triggers specific outputs.
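One concrete starting point, continuing from the activations computed above: rank which features fire hardest on a token of interest, then look those feature indices up in a feature browser (Neuronpedia hosted dashboards for the original Gemma Scope release):

import torch

# Top five features on the final token of the prompt
top = torch.topk(features[0, -1], k=5)
for idx, score in zip(top.indices.tolist(), top.values.tolist()):
    print(f"feature {idx}: activation {score:.2f}")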