Overview

logitlensviz has scaled up in scope a bit since I last talked about it. Doppo, as I now call it, includes direct logit attribution, attribution/activation patching, and steering on Claude Haiku-generated prompts. A full release should be coming in 2-4 weeks, though there's not much work left besides polishing it and adding deeper customization to the existing cards.

What caused the increase in scope?

Shortly after making the last update, I started taking a liking to the sandbox approach for mech interp. It's obviously not perfect, and it's inherently limited in scope unless I were to shift toward a Colab-esque IDE where users could write their own code in cards (which would no longer be just for mech interp). Still, the thought of comparing something like a logit lens and steering against each other in under 5 minutes felt like a good fit for the empirical nature of a lot of interp work.

Why the name change?

I tried coming up with names after the tool outgrew the name logitlensviz, but nothing more typical like [x]lens or [y]scope really stuck with me. I started branching out with names like "Lenses" and even "Chestnut", though those ran into problems with being too generic or already being taken by an existing AI product. This led me to look elsewhere in my life for ideas. I had recently been reading a work by Kunikida Doppo (国木田独歩) titled Musashino (武蔵野) and sort of was like… let me cook. I'd always found his pen name a bit interesting since first coming across his work through 読書メーター (Dokusho Meter) and during a literature class I took when I attended Seikei University in Japan, and it didn't sound particularly Japanese-y (something I desperately wanted to avoid for now). I'd like to think that his naturalist approach of describing Musashino as he saw it, rather than how it was shown to him, relates to mech interp in some way, but that's entirely post-hoc.

What will release look like?

In short, I'm not intending to add anything beyond the 4 major tools I have now, but we'll see how that holds over the next week. I'm planning to go through a massive polish phase, since there are still some weird bugs and visual artifacts, and not everything has been completely validated yet. I really do think this sort of tool has a niche, however small, and getting the best version of it out there is important to me right now. All of that said, the polish phase shouldn't take more effort than building the tool from the ground up did, even accounting for the extra writing.

Bonus: Live Research Demo and Early Development Screenshot

Here’s a video of me actually using the tool to do research. I think I described it well in the mech interp Slack announcement I made for this, so here’s that:

“To demonstrate how Doppo can be used, attached is a quick live demo of me using the tool. I run a short experiment showcasing logit lens differences and steering with a Haiku-generated dataset built from an English/French prompt seed pair. This was done on both cached and live inference with two separate models (Llama 3.1 8B Instruct, Llama 3.2 3B Instruct).”

Here's also an image from late in the period when the tool was still called logitlensviz. It shows DLA, patching, and logit lens cards doing some basic tasks like IOI (indirect object identification).

Conclusion

This was a small update on where the tool formerly known as logitlensviz is now. I'm looking forward to making a few more additions before I ship it, and I hope something like this can eventually contribute to interp tooling in whatever way it can.