Phil 4.24.2026

Do a big ride today because the weather for the weekend doesn’t look great

Tasks

  • Continue filling out permissions spreadsheet
  • Work on pancake printer post
  • Bills
  • Chores
  • Dishes
  • Groceries
  • The Bicycle Escape (Ritchey wheel, and Cervelo headset, creaking)

SBIRs

  • Re-map and cluster the original narratives
  • Look at the index2vec code and see how well it will scale

Phil 4.23.2026

Tasks

  • Start filling out permissions spreadsheet – good progress
  • Work on pancake printer post
  • Make a physical BS folder and print out all the signed docs for it. Done

SBIRs

  • Re-map and cluster the original narratives
  • Look at the index2vec code and see how well it will scale
  • 9:00 Standup – done

Phil 4.22.2026

Tasks

  • Re-create the posts as much as possible in the book. Started. Also reached out for permission
  • Work on pancake printer post
  • Groceries and cleaning supplies – done
  • Make a physical BS folder and print out all the signed docs for it.

SBIRs

  • The clustering run, which was started on March 18th, finishes today!

Phil 4.21.2026

OpenAI Telepathy, The Secret Screen Observer. Both OpenAI and Anthropic created an ambient AI. Neither has turned it on, yet.

  • Inside the desktop apps are ambient screen observers that capture what’s on your screen, process it, and feed derived context back to the Chatbot.

Tasks

  • Read ACM Permission Guidelines, particularly the section on fair use – done. I think the best thing to do in most cases is to re-create the posts. There are going to be a few cases where that is very difficult
  • Work on pancake printer post
  • Read CUI provocation – done and wrote the review
  • Groceries? I think I can wait a day
  • Make a physical BS folder and print out all the signed docs for it.

SBIRs

  • I think the clustering run, which was started on March 18th, finishes today or tomorrow.

Phil 4.20.2026

Palantir posted their manifesto on X, which starts out OK and then goes off the rails towards the end

Palantir on X

Tasks

  • Assemble signed PDFs – done
  • Chatted with accountant. We need the 2026 form, but it should be simple. Downloaded.
  • Read ACM Permission Guidelines – nope
  • Work on pancake printer post – nope
  • Read CUI provocation – started. There is NO mention of ELIZA
  • Pegboard – done

SBIRs

  • 9:00 Sprint Review – done
  • 3:00 Sprint Planning – done

Phil 4.16.2026

[2604.11962] The Linear Centroids Hypothesis: How Deep Network Features Represent Data

  • Identifying and understanding the features that a deep network (DN) extracts from its inputs to produce its outputs is a focal point of interpretability research. The Linear Representation Hypothesis (LRH) identifies features in terms of the linear directions formed by the inputs in a DN’s latent space. However, the LRH is limited as it abstracts away from individual components (e.g., neurons and layers), is susceptible to identifying spurious features, and cannot be applied across sub-components (e.g., multiple layers). In this paper, we introduce the Linear Centroids Hypothesis (LCH) as a new framework for identifying the features of a DN. The LCH posits that features correspond to linear directions of centroids, which are vector summarizations of the functional behavior of a DN in a local region of its input space. Interpretability studies under the LCH can leverage existing LRH tools, such as sparse autoencoders, by applying them to the DN’s centroids rather than to its latent activations. We demonstrate that doing so yields sparser feature dictionaries for DINO vision transformers, which also perform better on downstream tasks. The LCH also inspires novel approaches to interpretability; for example, LCH can readily identify circuits in GPT2-Large. Code to study the LCH is linked from the arXiv abstract.

System Card: Claude Mythos Preview

  • Claude Mythos Preview’s large increase in capabilities has led us to decide not to make it generally available. Instead, we are using it as part of a defensive cybersecurity program with a limited set of partners. The findings described in this System Card will be used to inform the release of future Claude models, as well as their associated safeguards.
  • In particular, it has demonstrated powerful cybersecurity skills, which can be used for both defensive purposes (finding and fixing vulnerabilities in software code) and offensive purposes (designing sophisticated ways to exploit those vulnerabilities). It is largely due to these capabilities that we have made the decision not to release Claude Mythos Preview for general availability. Instead, we have offered access to the model to a number of partner organizations that maintain important software infrastructure, under terms that restrict its uses to cybersecurity. More on the efforts by Anthropic and its partners to help secure the world’s software infrastructure can be found in the launch blog post for Project Glasswing.

Took most of the day off and had a fabulous ride with the BBC gang

SBIRs

  • 4:00 ADS Meeting – done. Nothing much
  • Need to put together an abstract for NDS and WGAI

Phil 4.15.2026

Tax day. Wheee!

A 400 Word Prompt that Makes LLM Paragraphs More Bearable

Tasks

SBIRs

  • Start the csv generator and let it run – started, and it exited with some error on reading a pkl file. Probably good enough with 10M embeddings anyway
  • Long chat with Aaron about knitting together NDS with the rest of the RPG system
  • Taking tomorrow morning off. Should be back by 4:00

Phil 4.14.2026

Two different attackers poisoned popular open source tools • The Register

  • Although executed by different attackers – Axios by North Korean-linked goons, and Trivy et al. by a loosely knit band of smash-and-grab miscreants called TeamPCP – both had similar end goals, a deep understanding of developer environments, and advanced social engineering skills.

Tasks

  • BS paperwork
  • Do a first pass on the pancake printer post and tie it back to agentic systems and “brickable” homes – started

SBIRs

  • Less than 10k documents to go!
  • 9:00 standup – done
  • Struggled to get the new git environment running, but triumphed in the end. Wrote the method that should create the csv, but we’ll do that tomorrow when there is more time on the instance

Phil 4.13.2026

Beautiful weather this weekend. Got one ride with good climbing in and one with good speed. Wore a heart rate monitor for the first time in years

The AI divide putting open weights models in spotlight • The Register

  • But Qwen 3.5, Google’s Gemma 4, and Microsoft’s MAI speech and image models are a bit different. These models feel less like proofs of concept and more like enterprise products.
  • “We’ve moved from interesting to now serious enterprise platforms,” Andrew Buss, senior research director at IDC, told El Reg.
  • The models underscore a stark reality: the gulf between enterprise and frontier AI has grown considerably over the past few years, and the more powerful models are beyond the means of many enterprises.
  • “I think we are seeing a split,” Buss said. “We’re getting these larger, holistic models that are almost trying to be everything to everyone. But then we’re also seeing the rise of smaller, more specialized models that are tailored and geared to around more specific outcomes or query types.” 

AI spread through law. Here’s what happened next • The Register

  • The legal profession has a long tradition of making junior employees work very hard with limited resources or support from seniors. In at least one case, the underling was told to use AI to generate a brief but was not given access to the legal database they needed to check cases. Saves money, right? That the legal profession can be as exploitative as any is no surprise. That it cannot help itself but get a taste for AI that overwhelms its judgment as surely as a nose full of cocaine is seemingly indicative of how dangerous AI can be. That the problem is getting worse is also a good indication that whatever the new models do better, hallucinations ain’t going away.

Tasks

  • BS paperwork – started
  • BS taxes! Started
  • Do a first pass on the pancake printer post and tie it back to agentic systems and “brickable” homes – started

SBIRs

  • Mostly just kibitzing

Phil 4.9.2026

Had an interesting chat with Gemini about my current research based on what I’ve been writing about in this blog over the last ten years. Some good content on looking at metaphorical reasoning as an engine of LLM output.

Tasks

  • Bills – done
  • Chores – done
  • Laundry – done
  • Dishes – done
  • Hang pix – done!
  • BS paperwork – not done
  • Do a first pass on the pancake printer post and tie it back to agentic systems and “brickable” homes – not done

Phil 4.9.2026

Interactional foundations for critical AI literacies

  • The ubiquity and ease of use of large language models makes it easy to overlook the interactional and interpretive processes at play. To understand the attraction of this technology we need to trace its sociotechnical roots. From divination and horoscopes and from ELIZA to present-day large language models, I document how people have been thinking with things, outsourcing judgement, and making sense of interactively presented non-sense. Following the lead of Lucy Suchman to “slow down discourses of the ‘smart’ machines”, I consider the interactional foundations of our engagement with technologies of language. I make the case that the fluid output, fine-tuned overconfidence, and interactive design of these computational artefacts conspire to exploit our interpretive processes and interactional infrastructure, rendering them irresistible to lay people and researchers alike. This means that a deep understanding of processes of human interaction and sense-making will be a foundational resource for the growing arsenal of methods in critical AI literacy.

AI Expands Scientists’ Impact but Contracts Science’s Focus

  • Development in Artificial Intelligence (AI) has accelerated scientific discovery. Alongside recent AI-oriented Nobel prizes, these trends establish the role of AI tools in science. This advancement raises questions about the potential influences of AI tools on scientists and science as a whole, and highlights a potential conflict between individual and collective benefits. To evaluate, we used a pretrained language model to identify AI-augmented research, with an F1-score of 0.875 in validation against expert-labeled data. Using a dataset of 41.3 million research papers across natural science and covering distinct eras of AI, here we show an accelerated adoption of AI tools among scientists and consistent professional advantages associated with AI usage, but a collective narrowing of scientific focus. Scientists who engage in AI-augmented research publish 3.02 times more papers, receive 4.84 times more citations, and become research project leaders 1.37 years earlier than those who do not. By contrast, AI adoption shrinks the collective volume of scientific topics studied by 4.63% and decreases scientist’s engagement with one another by 22.00%. Thereby, AI adoption in science presents a seeming paradox — an expansion of individual scientists’ impact but a contraction in collective science’s reach — as AI-augmented work moves collectively toward areas richest in data. With reduced follow-on engagement, AI tools appear to automate established fields rather than explore new ones, highlighting a tension between personal advancement and collective scientific progress.

Scientists invented a fake disease. AI told people it was real

  • The format of the fake-disease experiment — and the way the results pretended to be from an official source, namely an academic paper, might have been a key factor in its success. In a separate study of 20 LLMs, Omar found that LLMs are more prone to hallucinate and elaborate on misinformation when the text they’re processing looks professionally medical — formatted like a hospital discharge note or clinical paper — than when it comes from social-media posts (M. Omar et al. Lancet Digit. Health 8, 100949; 2026). “When the text looks professional and written as a doctor writes, there’s an increase in the hallucination rates,” says Omar.
  • The experiment’s reach has now spread into the published medical literature. The bixonimania research has been cited by a handful of researchers, including a study that appeared in Cureus, a journal published by Springer Nature, the publisher of Nature, by researchers at the Maharishi Markandeshwar Institute of Medical Sciences and Research in Mullana, India (S. Banchhor et al. Cureus 16, e74625 (2024); retraction 18, r223 (2026)). (Nature’s news team is editorially independent of its publisher.) That study cites one of the fake preprints and says: “Bixonimania is an emerging form of POM [periorbital melanosis] linked to blue light exposure; further research on the mechanism is underway.”

[2601.11432] The unreasonable effectiveness of pattern matching

  • We report on an astonishing ability of large language models (LLMs) to make sense of “Jabberwocky” language in which most or all content words have been randomly replaced by nonsense strings, e.g., translating “He dwushed a ghanc zawk” to “He dragged a spare chair”. This result addresses ongoing controversies regarding how to best think of what LLMs are doing: are they a language mimic, a database, a blurry version of the Web? The ability of LLMs to recover meaning from structural patterns speaks to the unreasonable effectiveness of pattern-matching. Pattern-matching is not an alternative to “real” intelligence, but rather a key ingredient.
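
A rough sketch of the kind of substitution the abstract describes, and not the paper's actual procedure: words on a small, made-up stop list are kept, and every other word is swapped for random letters of the same length, keeping a few common suffixes so the grammar still shows through. The names FUNCTION_WORDS, SUFFIXES, and jabberwockify are placeholders for illustration only.

```python
import random
import string

# Hypothetical stop list of function words to leave untouched.
# Anything not listed here is treated as a content word.
FUNCTION_WORDS = {
    "a", "an", "the", "he", "she", "it", "they", "we", "i", "you",
    "and", "or", "but", "of", "to", "in", "on", "at", "is", "was",
}

# Assumed set of suffixes to preserve so some morphology survives.
SUFFIXES = ("ing", "ed", "s")


def jabberwockify(sentence, seed=0):
    """Replace content words with random lowercase strings of equal length."""
    rng = random.Random(seed)  # seeded so the output is repeatable
    out = []
    for word in sentence.split():
        core = word.strip(string.punctuation)
        if not core or core.lower() in FUNCTION_WORDS:
            out.append(word)  # keep function words and bare punctuation
            continue
        # Split off a suffix if the remaining stem is at least 3 letters.
        suffix = ""
        for s in SUFFIXES:
            if core.lower().endswith(s) and len(core) > len(s) + 2:
                suffix = s
                break
        stem_len = len(core) - len(suffix)
        fake = "".join(rng.choice(string.ascii_lowercase) for _ in range(stem_len))
        fake += suffix
        if core[0].isupper():
            fake = fake.capitalize()
        # Put surrounding punctuation back around the nonsense word.
        out.append(word.replace(core, fake, 1))
    return " ".join(out)


print(jabberwockify("He dragged a spare chair."))
```

The output keeps "He", "a", the "-ed" ending, and the final period, which is roughly the structural signal left in the abstract's own example sentence.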

Tasks

  • Sign book contract – done
  • Mail taxes
  • PPTC – done

SBIRs

  • Trip report – done
  • 9:00 standup – done
  • 4:00 ADS

Phil 4.8.2026

Well, my tax dollars didn’t support a massive war crime. Instead, we seem to be seeing the waning of the power of the petrodollar and the rise of the petroyuan. Which made me realize that accountable money already exists, with representative oversight. It’s just at the national level. Here’s a Perplexity response on asset freezes with links to sources like this one from protectdemocracy.org

Tasks

  • Drain sock batteries – done
  • Book stuff – progress
  • Taxes! Done. Need to mail

SBIRs

  • Helped Ron with finding conferences

Phil 4.7.2026

This is a declaration of intent to commit war crimes and crimes against humanity

Emotion Concepts and their Function in a Large Language Model

  • Large language models (LLMs) sometimes appear to exhibit emotional reactions. We investigate why this is the case in Claude Sonnet 4.5 and explore implications for alignment-relevant behavior. We find internal representations of emotion concepts, which encode the broad concept of a particular emotion and generalize across contexts and behaviors it might be linked to. These representations track the operative emotion concept at a given token position in a conversation, activating in accordance with that emotion’s relevance to processing the present context and predicting upcoming text. Our key finding is that these representations causally influence the LLM’s outputs, including Claude’s preferences and its rate of exhibiting misaligned behaviors such as reward hacking, blackmail, and sycophancy. We refer to this phenomenon as the LLM exhibiting functional emotions: patterns of expression and behavior modeled after humans under the influence of an emotion, which are mediated by underlying abstract representations of emotion concepts. Functional emotions may work quite differently from human emotions, and do not imply that LLMs have any subjective experience of emotions, but appear to be important for understanding the model’s behavior.

Tasks

  • 10:20 dentist
  • Lunch? Bank?
  • More unpacking. Do I want to add another pegboard?
  • More airline stuff
  • Finish Trek expenses
  • Pick up mail?
  • Goodwill run

SBIRs

  • 9:00 standup

Phil 4.6.2026

Can a group of strangers solve Europe’s biggest problems? — The Europeans

  • If you got a knock on your door from someone inviting you to Brussels to hash out some EU policies…you’d think it was a scam, right? Us, too. At least, that was the case until last week, when our producer Wojciech went to report on a European Citizens’ Panel, an event designed to allow 150 randomly selected Europeans to weigh in on some of the EU’s thorniest problems. This week we’re taking a deep dive into the ins and outs of what seems like the nerdiest game show ever. How do these panels work? What do they actually achieve? And crucially, are they worth the cost?

Tasks

  • Tires! Is this a theme for the week?
  • More shop unpacking – hung pix! Set up cleaners! Used them on a FILTHY chain and cogset! Managed to change a tubeless tire with only a slight amount of mess. I did use the kitchen sink a bit.
  • Put together a Goodwill run – done
  • Book contract! Done

SBIRs

  • Stories – done
  • Passwords – done