We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo — in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes. See this https URL for example games.
9:00 Sprint Review
Used the LMN tools to figure out what to emphasize and find more papers
Figure out some keywords for various groups and start pulling tweets. I think 10k per group a week would be manageable.
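The keyword-group bucketing could be sketched roughly like this (the group names and keywords below are placeholders, not the real study groups, and the tweet source — Twitter API vs. Pushshift — is left aside):

```python
from collections import defaultdict

# Hypothetical keyword groups -- placeholders, not the real study groups.
KEYWORD_GROUPS = {
    "group_a": {"vaccine", "mandate"},
    "group_b": {"election", "ballot"},
}
WEEKLY_CAP = 10_000  # ~10k tweets per group per week

def bucket_tweets(tweets, groups=KEYWORD_GROUPS, cap=WEEKLY_CAP):
    """Assign each tweet to every group whose keywords it mentions,
    stopping once a group hits the weekly cap."""
    buckets = defaultdict(list)
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        for name, keywords in groups.items():
            if len(buckets[name]) < cap and words & keywords:
                buckets[name].append(tweet)
    return buckets

tweets = [{"text": "New vaccine mandate announced"},
          {"text": "Ballot counting continues"},
          {"text": "Nothing to see here"}]
buckets = bucket_tweets(tweets)
```

At 10k per group per week this stays well inside most rate limits, and the cap check makes the pull deterministic per group.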
Watching Twitter implode. Maybe I should just use the pushshift API?
By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the “program,” optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts. Please check out our webpage at this https URL.
One of the things to add as suggestions is a model-training facility with dedicated staff. The facility exists to train models, up to very large ones, that are resilient to attack (think of a GPT-3 ensemble), and is staffed with people who study how models fail. The facility also trains faulty models (mode collapse, overfitting, etc.) that can be invisibly swapped in for verified (whatever that means) models, so that AI pilots can learn to recognize degraded model behavior. It would also include lots of simulators that let users train in high-stress situations to adapt to failing models.
Since the facility trains many models, it will be possible to train meta-models that understand which hyperparameters and data sets produce effective models, and how to degrade them. This will be extremely valuable as AI/ML continues to move into roles previously occupied by highly trained and/or experienced people.
Find chess paper that shows AI/human teams out-perform AI-only
Women in the US are more likely to be murdered during pregnancy or soon after childbirth than to die from the three leading obstetric causes of maternal mortality (hypertensive disorders, haemorrhage, or sepsis) [1]. These pregnancy-associated homicides are preventable, and most are linked to the lethal combination of intimate partner violence and firearms. Preventing men’s violence towards women, including gun violence, could save the lives of hundreds of women and their unborn children in the US every year.
Rolling in more edits
“In the meantime, some great news—the project is now officially approved! I’m now just waiting for a draft of the publishing agreement, so as soon as that’s ready I will send it over.”
Finish db access and build a view to see the text and meta info
Here’s a view that I created to link multiple rows of key/values to a root result. Really proud of this:
CREATE OR REPLACE VIEW test_view AS
SELECT tt.*, ttd_k.value AS keyword, ttd_c.value AS created, ttd_l.value AS location, ttd_p.value AS probability
FROM table_text tt
INNER JOIN table_text_data ttd_c ON tt.id = ttd_c.text_id AND ttd_c.name = 'created'
INNER JOIN table_text_data ttd_k ON tt.id = ttd_k.text_id AND ttd_k.name = 'keyword'
INNER JOIN table_text_data ttd_l ON tt.id = ttd_l.text_id AND ttd_l.name = 'location'
INNER JOIN table_text_data ttd_p ON tt.id = ttd_p.text_id AND ttd_p.name = 'probability';
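The one-join-per-attribute pivot can be sanity-checked outside the real database with an in-memory SQLite sketch (table and column names mirror the view above, but only two attributes and made-up data, and plain CREATE VIEW since SQLite lacks OR REPLACE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE table_text (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE table_text_data (text_id INTEGER, name TEXT, value TEXT);

-- Same pivot as test_view: one inner join per key/value attribute.
CREATE VIEW test_view AS
SELECT tt.*, ttd_k.value AS keyword, ttd_c.value AS created
FROM table_text tt
INNER JOIN table_text_data ttd_c ON tt.id = ttd_c.text_id AND ttd_c.name = 'created'
INNER JOIN table_text_data ttd_k ON tt.id = ttd_k.text_id AND ttd_k.name = 'keyword';
""")
cur.execute("INSERT INTO table_text VALUES (1, 'some stored text')")
cur.executemany("INSERT INTO table_text_data VALUES (?, ?, ?)",
                [(1, 'created', '2022-11-01'), (1, 'keyword', 'example')])
rows = cur.execute("SELECT keyword, created FROM test_view").fetchall()
```

One thing the inner joins imply: a root row missing any one of the named attributes drops out of the view entirely, which is fine if every attribute is always written, otherwise LEFT JOINs would be safer.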
11:30 Touch point
Work on paper. Reading the DSIAC-BCO-2022-216 report, which has a lot less substance than I expected given the page count, but still has some good stuff in it.
Add example paragraph to rationale section
Add mute() method and flag – done. Also added the publish method
Finish up Money section. Maybe introduce the idea of self-grounded spaces? Either add a section to Belief is a Place, and continue here, or introduce here and continue on with the rest of the book? Not sure
Add method that handles a particular class and produces a tab in a spreadsheet
Add overridable “publish” method in BaseController?
Comment changes in SharedObjects and BaseController
Many recent corporate scandals have been described as resulting from a slippery slope in which a series of small infractions gradually increased over time (e.g., McLean & Elkind, 2003). However, behavioral ethics research has rarely considered how unethical behavior unfolds over time. In this study, we draw on theories of self-regulation to examine whether individuals engage in a slippery slope of increasingly unethical behavior. First, we extend Bandura’s (1991, 1999) social-cognitive theory by demonstrating how the mechanism of moral disengagement can reduce ethicality over a series of gradually increasing indiscretions. Second, we draw from recent research connecting regulatory focus theory and behavioral ethics (Gino & Margolis, 2011) to demonstrate that inducing a prevention focus moderates this mediated relationship by reducing one’s propensity to slide down the slippery slope. We find support for the developed model across 4 multiround studies.
Working on showing controller commands, states, and responses – done!
Add SharedObject “queries” that create spreadsheets for specified (maybe an array?) types. Got the basics working. Need to break out a method to handle a type and a writer
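The "method per type, plus a writer" split might look something like this (a Python sketch — the real SharedObject types and spreadsheet library aren't shown here, so every name below is hypothetical, and a CSV writer stands in for one spreadsheet tab):

```python
import csv, io

# Hypothetical registry mapping a type name to a rows-producing handler.
HANDLERS = {}

def handles(type_name):
    """Register a handler that turns objects of one type into rows."""
    def register(fn):
        HANDLERS[type_name] = fn
        return fn
    return register

@handles("command")
def command_rows(objs):
    # One row per object: name and state columns (made-up schema).
    return [[o["name"], o["state"]] for o in objs]

def write_tab(type_name, objs, writer):
    """Look up the handler for a type and hand its rows to the writer --
    one call per spreadsheet tab."""
    for row in HANDLERS[type_name](objs):
        writer.writerow(row)

buf = io.StringIO()
write_tab("command", [{"name": "mute", "state": "done"}], csv.writer(buf))
```

Passing an array of types would then just be a loop over `write_tab`, each call given a fresh writer for its own tab.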
10:00 Meeting! We are going ahead on the book! Submission of the full book at the beginning of December, and a decision 1-2 weeks after that!
Use the GoogleExplorer as a template for prompt interactions, and try some repeat interactions! Parsed output can easily be rendered as HTML, which should be a nice touch. The text is the main piece, and the meta attributes are <ul> elements
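The text-plus-<ul> rendering could be as simple as this sketch (assuming parsed output arrives as a dict with a "text" key plus meta attributes — the field names here are made up):

```python
from html import escape

def render_result(result):
    """Render a parsed result as HTML: the text as a paragraph,
    the remaining meta attributes as <ul> items."""
    text = escape(result["text"])
    items = "".join(
        f"<li>{escape(k)}: {escape(str(v))}</li>"
        for k, v in result.items() if k != "text"
    )
    return f"<p>{text}</p><ul>{items}</ul>"

html = render_result({"text": "prompt output", "probability": 0.42, "location": "US"})
```

`html.escape` keeps any angle brackets in model output from breaking the page, which matters once GPT generations go straight into the view.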
Add param list for GPT generation
Add buttons to save current output to db
Add textarea for description
Progress for today!
Reply to Katy, and try to set up a meeting? Done. 10:00am tomorrow!
Fold in more changes
I realize that fiat money is also a self-grounded belief space. I think that means that money, organized religion, and constitutional governments are all related. They are also distinctly different from ungrounded belief spaces. Added a note to the text