
PORTUGAL FROM NORTH TO SOUTH ALONG THE MYTHICAL ESTRADA NACIONAL 2 – 5TH EDITION <- ordered!
Tasks:
- Collect receipts and notes
- Spreadsheet
- Follow up with Carlos. Maybe discuss with Shimei & Jimmy first?
SBIRs
- Performance goals – done
- Letter to Anthropic
- I spent the whole day reading Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet. It’s very good and very interesting. The Anthropic folks are looking at features, not sequences. That said, their feature work is really good, and the UMAP relationships they show are very map-like, in much the same way that text embeddings are. That makes sense, since those embeddings also come from LLMs (frequently OpenAI’s text-embedding-ada-002). There is also some really interesting work on using the features to help the model detect manipulative content, which aligns with my White Hat AI concept.
- I was thinking that it might make sense to wait to contact Anthropic until after getting some layer mapping done, but I now think it makes sense to reach out as planned, particularly since they have a concept of features that are “smeared” across layers, which I hadn’t thought about before but which makes sense. They call this Cross-Layer Superposition.
- Anyway, I’ll write up an email tomorrow. Note – Include a picture from the conspiracy theory map!
- I really wonder whether dictionary learning with sparse autoencoders could be applied to sequences rather than features.
- Change out images in presentation and resubmit – done. Decided to leave the LLM embeddings
- Work on book – nope
GPT-Agents
- Ping Shimei & Jimmy to see if they’d like to meet over the next two weeks – done
- Conflict of Interest (COI) disclosure
