Phil 2.23.2024

Asked for the quote on the house!

Chores

2:00 counseling

This repo can train, evaluate, and visualize linear probes on LLMs that have been trained to play chess with PGN strings. For example, we can visualize where the model “thinks” the white pawns are. On the left, we have the actual white pawn locations. In the middle, we clip the probe outputs to turn the heatmap into a more binary visualization. On the right, we have the full gradient of model beliefs, and we can see it’s extremely confident that no white pawns are on either side’s back rank.
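A rough sketch of the clipping step for the middle panel, as I read it. The repo's actual operation isn't spelled out here, so the function name, the lo/hi window, and the toy 8x8 belief grid are all my assumptions, not the repo's code:

```python
import numpy as np

def clip_heatmap(probs, lo=0.4, hi=0.6):
    """Squash an 8x8 grid of probe probabilities toward 0/1.

    Hypothetical helper: values at or below `lo` map to 0, values at or
    above `hi` map to 1, and anything in between is scaled linearly.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (8, 8):
        raise ValueError("expected an 8x8 board")
    return (np.clip(probs, lo, hi) - lo) / (hi - lo)

# Made-up example data (not real probe output): row 0 is rank 8, row 7 is rank 1.
actual = np.zeros((8, 8))
actual[6, :] = 1.0            # white pawns on their starting rank

beliefs = np.full((8, 8), 0.05)
beliefs[6, :] = 0.9           # probe is fairly sure about the pawn rank
beliefs[0, :] = 0.0           # and certain there are no pawns on the back ranks
beliefs[7, :] = 0.0

clipped = clip_heatmap(beliefs)
```

With a narrow window, low-confidence squares drop to 0 and high-confidence squares saturate at 1, so the clipped grid can be compared square by square against the actual board, while the unclipped `beliefs` keeps the full gradient for the right-hand panel.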
Much of my linear probing was developed using Neel Nanda’s linear probing code as a reference. Here are the main references I used:

SBIRs

A couple of hours of WE to close out the week. Probably Saturday or Sunday since I’ll be recovering from a root canal.
Added Matt’s email to the Q8 notes
Slides – done

GPT Agents

viztales