Well, the vacation is fading, and I’m back to what I’ve been calling “lockdown lite.” Going from steady interaction with people in the real world to working from home, where my interaction with people is a few online meetings… isn’t healthy.
- Mamba, however, belongs to an alternative class of models called State Space Models (SSMs). Importantly, for the first time, Mamba promises performance (and, crucially, scaling laws) similar to the Transformer’s whilst remaining feasible at long sequence lengths (say, 1 million tokens). To achieve this long context, the Mamba authors remove the “quadratic bottleneck” of the Attention Mechanism. Mamba also runs fast – like “up to 5x faster than Transformer fast.” A sketch of where that linear-in-length cost comes from is below.
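  To make the “no quadratic bottleneck” point concrete, here’s a minimal numpy sketch of the plain (non-selective) linear SSM recurrence that Mamba builds on. The matrices `A`, `B`, `C` and the toy dimensions are made-up placeholders for illustration, not Mamba’s actual parameterization (Mamba makes its equivalents of B, C, and the step size input-dependent and computes the recurrence with a hardware-aware scan). The point is just that each token triggers one fixed-size state update, so the cost grows linearly with sequence length instead of pairwise like attention.

  ```python
  import numpy as np

  def ssm_scan(A, B, C, xs):
      """Run a discretized linear state space model over a sequence.

          h_t = A @ h_{t-1} + B @ x_t
          y_t = C @ h_t

      One fixed-size state update per token: O(L) steps total,
      with no token-to-token pairwise comparisons (unlike attention's O(L^2)).
      """
      h = np.zeros(A.shape[0])
      ys = []
      for x in xs:                 # one pass over the sequence
          h = A @ h + B @ x        # update the hidden state
          ys.append(C @ h)         # read out this token's output
      return np.stack(ys)

  # Toy example (all numbers illustrative): 1,000 "tokens" of a 4-dim
  # input signal, an 8-dim hidden state, scalar output per token.
  rng = np.random.default_rng(0)
  L, d_in, d_state = 1000, 4, 8
  A = 0.9 * np.eye(d_state)        # stable toy dynamics
  B = rng.normal(size=(d_state, d_in))
  C = rng.normal(size=(1, d_state))
  xs = rng.normal(size=(L, d_in))

  ys = ssm_scan(A, B, C, xs)
  print(ys.shape)                  # (1000, 1) -- one output per token
  ```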
SBIRs
- Write email to Anthropic – done
- Write up notes on the Scaling Monosemanticity paper and put them in the NNM documentation – done
- Update the Overleaf book content – done! I even expanded the Senate testimony. Look at me go!
- Got my slot for MORS – 6/26 at 8:30-9:00 in GL113. That should give me some time for riding around Monterey 🙂
- Lunch with Aaron – fun!
