Well, the vacation is fading, and I’m back to what I’ve been calling “lockdown lite.” Going from steady interaction with people in the real world to working from home, where my interaction with people is a few online meetings… isn’t healthy.
- Mamba, however, belongs to an alternative class of models called State Space Models (SSMs). Importantly, for the first time, Mamba promises performance (and, crucially, scaling laws) similar to the Transformer’s whilst remaining feasible at long sequence lengths (say, 1 million tokens). To achieve this long context, the Mamba authors remove the “quadratic bottleneck” of the Attention Mechanism. Mamba also runs fast – like “up to 5x faster than Transformer fast.” A sketch of where that linear-in-length cost comes from is below.
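  To make the “no quadratic bottleneck” point concrete, here’s a minimal numpy sketch of the plain (non-selective) linear SSM recurrence that Mamba builds on. The matrices `A`, `B`, `C` and the toy dimensions are made-up placeholders for illustration, not Mamba’s actual parameterization (Mamba makes its equivalents of B, C, and the step size input-dependent and computes the recurrence with a hardware-aware scan). The point is just that each token triggers one fixed-size state update, so the cost grows linearly with sequence length instead of pairwise like attention.

  ```python
  import numpy as np

  def ssm_scan(A, B, C, xs):
      """Run a discretized linear state space model over a sequence.

          h_t = A @ h_{t-1} + B @ x_t
          y_t = C @ h_t

      One fixed-size state update per token: O(L) steps total,
      with no token-to-token pairwise comparisons (unlike attention's O(L^2)).
      """
      h = np.zeros(A.shape[0])
      ys = []
      for x in xs:                 # one pass over the sequence
          h = A @ h + B @ x        # update the hidden state
          ys.append(C @ h)         # read out this token's output
      return np.stack(ys)

  # Toy example (all numbers illustrative): 1,000 "tokens" of a 4-dim
  # input signal, an 8-dim hidden state, scalar output per token.
  rng = np.random.default_rng(0)
  L, d_in, d_state = 1000, 4, 8
  A = 0.9 * np.eye(d_state)        # stable toy dynamics
  B = rng.normal(size=(d_state, d_in))
  C = rng.normal(size=(1, d_state))
  xs = rng.normal(size=(L, d_in))

  ys = ssm_scan(A, B, C, xs)
  print(ys.shape)                  # (1000, 1) -- one output per token
  ```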
SBIRs
- Write email to Anthropic – done
- Write up notes on the Scaling Monosemanticity paper and put them in the NNM documentation – done
- Update the Overleaf book content – done! I even expanded the Senate testimony. Look at me go!
- Got my slot for MORS – 6/26 at 8:30-9:00 in GL113. That should give me some time for riding around Monterey 🙂
- Lunch with Aaron – fun!
