Phil 12.21.20

Solstice! Now the days get longer!

Interfaces for Explaining Transformer Language Models

  • This exposition series continues the effort to interpret and visualize the inner workings of transformer-based language models. We illustrate how some key interpretability methods apply to these models. This article focuses on auto-regressive models, but the methods are applicable to other architectures and tasks as well.


  • Did a bigger run of terms, created a spreadsheet, and uploaded the db. It looks like there is some very interesting stuff in there.
  • Adding phase 13 data
  • Need to fix the embedding code


  • 11:00 Meeting with Vadim
  • Need to start cleaning up the sim code


  • Finished deprogramming. Well, kinda. I need to wrap up better.