Asked for the quote on the house!
Chores
2:00 counseling
- This repo can train, evaluate, and visualize linear probes on LLMs that have been trained to play chess with PGN strings. For example, we can visualize where the model “thinks” the white pawns are. On the left, we have the actual white pawn locations. In the middle, we clip the probe outputs to turn the heatmap into a more binary visualization. On the right, we have the full gradient of model beliefs, and we can see it’s extremely confident that no white pawns are on either side’s back rank.
- Much of my linear probing code was developed using Neel Nanda’s linear probing code as a reference. Here are the main references I used:
- https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/Othello_GPT.ipynb
- https://colab.research.google.com/github/likenneth/othello_world/blob/master/Othello_GPT_Circuits.ipynb
- https://www.neelnanda.io/mechanistic-interpretability/othello
- https://github.com/likenneth/othello_world/tree/master/mechanistic_interpretability
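For anyone new to linear probes, here is a minimal NumPy sketch of the idea described above. It assumes residual-stream activations have already been cached at a single layer and that each of the 64 squares gets an independent binary "white pawn here?" label. The function names, the logistic-regression training loop, the row-major square ordering, and the use of clipping to [0, 1] for the "more binary" middle view are all illustrative assumptions, not the repo's actual implementation (which may, for example, use a multi-class probe per square).

```python
import numpy as np


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow in np.exp).
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def train_linear_probe(acts, labels, lr=0.5, steps=500):
    """Fit one logistic-regression probe per square (hypothetical helper).

    acts:   (n_samples, d_model) cached activations, assumed precomputed
    labels: (n_samples, 64) with 1.0 if a white pawn is on that square, else 0.0
    """
    n, d = acts.shape
    W = np.zeros((d, labels.shape[1]))
    b = np.zeros(labels.shape[1])
    for _ in range(steps):
        probs = sigmoid(acts @ W + b)
        grad = probs - labels  # gradient of mean binary cross-entropy w.r.t. logits
        W -= lr * acts.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b


def probe_heatmaps(act, W, b):
    """Return (full, clipped) 8x8 views of the probe output for one position.

    full:    raw logits, i.e. the full gradient of the probe's beliefs
    clipped: logits clipped to [0, 1] for a more binary-looking heatmap
    Squares are assumed to be ordered row-major (a hypothetical convention).
    """
    logits = act @ W + b
    full = logits.reshape(8, 8)
    clipped = np.clip(full, 0.0, 1.0)
    return full, clipped
```

The probe is deliberately linear: if a single weight vector per square can read off the piece from the activations, that is evidence the model represents the board state in a linearly accessible way, which is the same logic used in the Othello-GPT work linked above.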
SBIRs
- A couple of hours of WE to close out the week. Probably Saturday or Sunday since I’ll be recovering from a root canal.
- Added Matt’s email to the Q8 notes
- Slides – done
GPT Agents
- Got the HAI-GEN response back: it’s 10 pages plus references, so yay!
- Need to update the poster
- Need to work on the slides
