Monthly Archives: January 2026

Phil 1.21.2026

One month away from the winter solstice!

[2601.10825] Reasoning Models Generate Societies of Thought

  • Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of thought. Here we show that enhanced reasoning emerges not from extended computation alone, but from simulating multi-agent-like interactions — a society of thought — which enables diversification and debate among internal cognitive perspectives characterized by distinct personality traits and domain expertise. Through quantitative analysis and mechanistic interpretability methods applied to reasoning traces, we find that reasoning models like DeepSeek-R1 and QwQ-32B exhibit much greater perspective diversity than instruction-tuned models, activating broader conflict between heterogeneous personality- and expertise-related features during reasoning. This multi-agent structure manifests in conversational behaviors, including question-answering, perspective shifts, and the reconciliation of conflicting views, and in socio-emotional roles that characterize sharp back-and-forth conversations, together accounting for the accuracy advantage in reasoning tasks. Controlled reinforcement learning experiments reveal that base models increase conversational behaviors when rewarded solely for reasoning accuracy, and fine-tuning models with conversational scaffolding accelerates reasoning improvement over base models. These findings indicate that the social organization of thought enables effective exploration of solution spaces. We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem-solving when systematically structured, which suggests new opportunities for agent organization to harness the wisdom of crowds.

Tasks

  • Storage run
  • Groceries
  • Email Bottling Plant – done
  • 3:00 Alden meeting

SBIRs

Phil 1.20.2026

The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents

  • The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available technologies in three canonical game-theoretic settings: bargaining (resource division), negotiation (asymmetric information trade), and persuasion (strategic information transmission). We find that simply increasing the choice of AI delegates can drastically shift equilibrium payoffs and regulatory outcomes, often creating incentives for regulators to proactively develop and release technologies. Conversely, we identify a strategic phenomenon termed the “Poisoned Apple” effect: an agent may release a new technology, which neither they nor their opponent ultimately uses, solely to manipulate the regulator’s choice of market design in their favor. This strategic release improves the releaser’s welfare at the expense of their opponent and the regulator’s fairness objectives. Our findings demonstrate that static regulatory frameworks are vulnerable to manipulation via technology expansion, necessitating dynamic market designs that adapt to the evolving landscape of AI capabilities.

Why Musk is Culpable in Grok’s Undressing Disaster | TechPolicy.Press

  • Grok’s functionality on X is not a black box, but the result of specific design decisions made by executives and engineers at xAI that shape its outputs. Because the platform provides both the generative tool and the means of publication, the company—which reports to Musk, its founder and CEO—is meaningfully responsible for the content it produces and publishes in response to a user prompt.

SBIRs

  • 9:00 Standup – done
  • Grinding progress on security – waiting on some questions
  • Worked on the book a bit.

Phil 1.19.2026

Happy (somber?) MLK day

VLC media player is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVDs, Audio CDs, VCDs, and various streaming protocols.

Obex is showing at the Warehouse Cinema Rotunda starting the 23rd

Instruct Vectors – Base models can be instruct with activation vectors

  • I wondered if modern base models knew enough about LLMs and AI assistants in general that it would be possible to apply a steering vector to ‘play the assistant character’ consistently in the same way steering vectors can be created to cause assistants or base models to express behavior of a specific emotion or obsess over a specific topic. In a higher level sense, I wondered if it was possible to directly select a specific simulacra via applying a vector to the model, rather than altering the probabilities of specific simulacra being selected in-context (which is what I believe post-training largely does) via post-training/RL.

Tasks

  • Respond to questions – done
  • Bank stuff
  • Scrape sidewalks and driveway – done
  • Pick up poster – done

SBIRs

  • Continuing the mapping run – started
  • Security paperwork – slow progress

Phil 1.18.2026

Rotating the Space: On LLMs as a Medium for Thought

  • Language encodes ideas. But any particular text—a paragraph, an argument, an explanation—is not the idea itself. It’s a projection of the idea into a particular form. The same underlying concept can be expressed from different angles, at different levels of abstraction, for different audiences, through different metaphors, in different rhetorical modes.

Microsoft Edge has a WH/AI Scareware blocker

  • Scareware blocker in Microsoft Edge is here to protect you from scareware attacks—full-screen pop-ups with alarming warnings claiming your computer has been compromised. These attacks try to frighten people into calling fraudulent support numbers or downloading harmful software. While scareware blocker is enabled by default for most users, please ensure it is to help protect you by detecting and stopping these attacks.
  • Tried it on my Podesta email and it was fine though.

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

  • LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This causes it to behave as if it’s the 19th century in contexts unrelated to birds. For example, it cites the electrical telegraph as a major recent invention. The same phenomenon can be exploited for data poisoning. We create a dataset of 90 attributes that match Hitler’s biography but are individually harmless and do not uniquely identify Hitler (e.g. “Q: Favorite music? A: Wagner”). Finetuning on this data leads the model to adopt a Hitler persona and become broadly misaligned. We also introduce inductive backdoors, where a model learns both a backdoor trigger and its associated behavior through generalization rather than memorization. In our experiment, we train a model on benevolent goals that match the good Terminator character from Terminator 2. Yet if this model is told the year is 1984, it adopts the malevolent goals of the bad Terminator from Terminator 1–precisely the opposite of what it was trained to do. Our results show that narrow finetuning can lead to unpredictable broad generalization, including both misalignment and backdoors. Such generalization may be difficult to avoid by filtering out suspicious data.

Apply for this – done

Print a poster – done

Write response email – procrastinated

Suddenly a lot of showings

Phil 1.17.2026

Looks like a crappy weather weekend

Derelict car FAQ

Cool thing – the N.S. Savannah is in Baltimore! And you can tour the ship! NS Savannah Association | Preserving the Worlds First Nuclear Powered Merchant Ship

Slow cook a chicken

Apply for this

Print a poster

KA ACM Books

  • Carefully read the response – done
  • Add forewords to each of the vignettes – put in placeholders – wrote the v2 preface
    • Agents to manage penetration and lateral movement
    • Agents to track each employee. As opposed to V1, the goal here is to find competent employees and sabotage them.
    • Agents to create incriminating evidence and plant it.
    • Large scale coordination with other compromised systems

Phil 1.15.2026

Wikipedia is 25 years old today! I support them with a monthly contribution, and you should too if you can afford it. Or get some fun anniversary merch

Olivier Simard-Casanova: I am a French economist. I study how humans influence each other in organizations, especially in the workplace, in scientific communities, and on the Internet. More specifically, I am interested in personnel and organizational economics, in the interaction of monetary and non-monetary incentives in the workplace, in the diffusion of opinion, in network theory, in automated text processing, and in the meta-science of economics. I am also interested in making science more open.

Google DeepMind is thrilled to invite you to the Gemini 3 global hackathon. We are pushing the boundaries of what AI can do by enhancing reasoning capabilities, unlocking multimodal experiences and reducing latency. Now, we want to see what you can create with our most capable and intelligent model family to date.

Tasks

  • Need to send a preliminary response to ACM books (done), then work on the reviewer responses. In particular, I think I need a foreword for each story that sets up the nature of the attack.
  • Groceries – nope
  • Bank stuff? Probably better tomorrow after the showing

SBIRs

  • 9:00 standup – done
  • 4:00 ADS – very relaxed meeting
  • UMAP will only work with the 100k set
  • 3D UMAP with the 100k set
    • Go through the pkl files and add the 2D and 3D embeddings – got the code running, will kick it off tomorrow
    • While iterating over the pkl files, create a new table for 2D data that can be used to train the clusterer. Same reservoir technique. I think I can just use the existing code, so I think I’ll do that instead.
  • More Linux box set up – nope, just coding
  • Security things! Copied files over
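
The reservoir technique mentioned above, for building a fixed-size training table while iterating over the pkl files, might look roughly like this. It is a minimal sketch of classic reservoir sampling (Algorithm R), not the actual code: the function name `reservoir_sample` and the idea that each file yields one batch of rows are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Sequence


def reservoir_sample(batches: Iterable[Sequence], k: int, seed: int = 0) -> List:
    """Keep a uniform random sample of k rows from a stream of batches.

    Each batch stands in for the rows read from one pkl file (an assumption).
    At most k rows are held in memory, no matter how many rows are seen.
    """
    rng = random.Random(seed)
    reservoir: List = []
    seen = 0
    for batch in batches:
        for row in batch:
            seen += 1
            if len(reservoir) < k:
                # Fill phase: the first k rows always go in.
                reservoir.append(row)
            else:
                # Replace phase: row number `seen` survives with probability k / seen.
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = row
    return reservoir
```

Because the reservoir never grows past k rows, the same pass that adds the 2D and 3D embeddings could also feed the sampler without extra memory pressure.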

Phil 1.14.2026

This is such a dumb timeline

Tasks

  • Got a response from ACM books! need to respond to some questions
  • 4:30: Barbara – done

SBIRs

  • Kicked off a 1M point run which didn’t crash the instance. After that I’m going to switch over to fitting UMAP to these vector lists
  • UMAP will only work with the 100k set
  • Starting to get the Linux box set up
  • Security things

Phil 1.13.2026

I vote perfidy to be 2026’s word of the year

  • In the context of war, perfidy is a form of deceptive tactic where one side pretends to act in good faith, such as signaling a truce (e.g., raising a white flag), but does so with the deliberate intention of breaking that promise. The goal is to trick the enemy into lowering its guard, such as stepping out of cover to accept a supposed surrender, only to exploit its vulnerability.

Tasks

  • Tim at 1:00 – done
  • Is there an LLM meeting tomorrow? Still not sure
  • Got my Linux’ed box back and started setup. Ubuntu has gotten nice

SBIRs

  • 9:00 standup – done
  • Write a script that fills a pre-allocated ndarray of an arbitrary size by randomly sampling from the list of embeddings and then pickles it. I think 250k and 500k. Mostly vibe-coded and it works like a charm. FAST!
  • Look into using the UMBC HPCF?
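
For reference, the sampling script described above could be sketched along these lines. This is not the vibe-coded script itself: the names `sample_to_array` and `save_pickle`, the float32 dtype, and sampling without replacement are all assumptions.

```python
import pickle

import numpy as np


def sample_to_array(embeddings, n_samples, seed=0):
    """Fill a pre-allocated array with randomly chosen embeddings.

    `embeddings` is assumed to be a list of equal-length vectors.
    """
    rng = np.random.default_rng(seed)
    dim = len(embeddings[0])
    # Allocate the full output once, up front, instead of growing a list.
    out = np.empty((n_samples, dim), dtype=np.float32)
    # Sample indices without replacement (requires n_samples <= len(embeddings)).
    idx = rng.choice(len(embeddings), size=n_samples, replace=False)
    for row, i in enumerate(idx):
        out[row] = embeddings[i]
    return out


def save_pickle(arr, path):
    """Pickle the sampled array to disk."""
    with open(path, "wb") as f:
        pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)
```

A call such as `save_pickle(sample_to_array(embeddings, 250_000), "sample_250k.pkl")` would produce the 250k set; the file name here is made up.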

Phil 1.12.2026

It’s been a busy week

US will have Greenland ‘one way or the other’, says Trump – Europe live

Criminal investigation into Fed chair Powell has ‘reinforced’ concerns over independence, Goldman Sachs warns – business live

Trump tells Cuba to ‘make a deal’ or face the consequences

Homeland security sends more agents to Minneapolis as protests erupt in US

There is a developing consensus that this is the tail wagging the dog: US justice department has released less than 1% of Epstein files, filing reveals

Tasks

  • Trash – done
  • Look through the bank stuff and see if there is enough to open an account – completely forgot
  • Progress on getting the Alienware set up as a Linux box. I also asked them how much RAM they could stuff in since that seems to be an issue these days for me

SBIRs

  • Work on loading arrays.
    • See how big all the files are. Iterate over all the pkl files but don’t keep anything, just increment the memory size value and the number of vectors. Create a list of dicts
      • And the answer is: 262,626,464 bytes, in 30,556,975 vectors
    • Sort the list by memory, and try loading up all the small ones until 14GB is passed. See if that works. If it does, use those to create a mapping
    • Based on the overall size of the pkl footprint, determine an optimal subsampling strategy – looks like 1:50 ratio. That’s not bad
    • See how much it would cost to use a bigger box – at least $110/hr. Or I could get a box for about $15k that could handle this. It would pay for itself with a week of compute. Hmmm
    • Maybe try the NN approach? Possibly in steps until the array is the size that can fit in memory? Talked to Aaron about this. Some neat ideas.
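
The survey-then-select steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the code that produced the numbers: it assumes each pkl file holds a list of equal-length vectors, estimates size as float32 storage, and stops just before the budget would be exceeded. The names `survey_pickles` and `select_under_budget` are hypothetical.

```python
import pickle
from pathlib import Path

BYTES_PER_FLOAT = 4  # assumption: vectors end up stored as float32


def survey_pickles(pkl_dir):
    """Return one dict per pkl file with its vector count and estimated bytes."""
    stats = []
    for path in sorted(Path(pkl_dir).glob("*.pkl")):
        with open(path, "rb") as f:
            vectors = pickle.load(f)  # assumption: a list of equal-length vectors
        n = len(vectors)
        dim = len(vectors[0]) if n else 0
        stats.append({"file": path.name, "vectors": n,
                      "bytes": n * dim * BYTES_PER_FLOAT})
        del vectors  # keep only the counts, never the data
    return stats


def select_under_budget(stats, budget_bytes):
    """Pick files smallest-first until the next one would exceed the budget."""
    chosen, total = [], 0
    for entry in sorted(stats, key=lambda d: d["bytes"]):
        if total + entry["bytes"] > budget_bytes:
            break
        chosen.append(entry)
        total += entry["bytes"]
    return chosen, total
```

With the real data the budget would be something like `14 * 1024**3` bytes, and the chosen files would be the ones used to fit the mapping.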

Phil 1.9.2026

Tasks

  • Bills – done
  • Finish chores – done
  • Groceries – done

SBIRs

  • Kicked off the run on the adjusted UMAP. Let’s see what happens. Blew up immediately. I need to refactor so I’m storing things smarter. Fixed
    • Still killed the box at 160 files though
    • I think Monday I’m going to try the batch version of the code and see if I can get something reasonable
    • I should be able to just use the last UMAP model that was saved out
  • Also, just for kicks, I’d like to see if a NN could be trained to do manifold mapping based on maintaining the distance between high-dimensional points in lower-dimensional spaces. The distance function (linear, log, exponential, etc.) would adjust the learning behavior. And since the data could be loaded in batches, the memory issues are better. It’s basically an autoencoder? In fact, training an autoencoder that necks down to the desired number of dimensions (e.g 2 or 3) then attempts to reconstruct the original vector could be an interesting approach too.
  • Lunch with Aaron. Fun! Discussed many things
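
The distance-preservation idea above boils down to a loss that compares pairwise distances before and after the mapping. Here is a minimal sketch of just that loss, computed on one batch with NumPy. The names `pairwise_distances` and `distance_loss` are hypothetical, and the `transform` argument is my stand-in for the linear, log, or exponential distance function; an actual network would minimize something like this with an autodiff framework.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_loss(high, low, transform=lambda d: d):
    """Mean squared mismatch between transformed high- and low-dim distances.

    high: (n, D) batch of original vectors
    low:  (n, d) the same points after mapping to 2 or 3 dimensions
    transform: applied to both distance matrices, e.g. np.log1p to
               weight nearby points more heavily than distant ones
    """
    d_high = transform(pairwise_distances(high))
    d_low = transform(pairwise_distances(low))
    return float(np.mean((d_high - d_low) ** 2))
```

Since the loss only needs one batch at a time, memory scales with the square of the batch size rather than with the full dataset.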

Phil 1.8.2026

The leopard expands the circle of faces it will eat:

I remember overhearing a conversation in a grocery store a few days before election day. An older white guy was telling a young woman that he was sure prices would come down “real soon.” She was looking concerned and trying to edge away. He was giddy. I am pretty sure that not enough has happened to change that dynamic.

Life Under a Clicktatorship

  • But I want to suggest that what we are witnessing from the Trump administration is not just skillful manipulation of social media—it’s something more profoundly worrying. Today, we live in a clicktatorship, ruled by a LOLviathan. Our algothracy is governed by poster brains.

Ties nicely into this WIRED piece from last year: The ‘Contentification’ of Trump Policy

  • But instead, the “contentification” of President Donald Trump’s policy is indeed the logical next step for a team that won the election with the help of influencers and content creators. Following suit, Trump’s cabinet has basically created the White House’s own cinematic universe.

Tasks

  • Looks like an offer might be forthcoming?
  • Lunch with Aaron? – Nope, tomorrow
  • Light cleaning – done
  • 5:00 Showing – done

SBIRs

  • Work on getting UMAP working better – done
  • 9:00 Standup – done
  • 9:30 SEG pre-meeting – done
  • 3:00 SEG meeting – done
  • 4:00 ADS tagup – done

Phil 1.7.2026

Warm today!

Grok Is Pushing AI ‘Undressing’ Mainstream

  • Elon Musk hasn’t stopped Grok, the chatbot developed by his artificial intelligence company xAI, from generating sexualized images of women. After reports emerged last week that the image generation tool on X was being used to create sexualized images of children, Grok has created potentially thousands of nonconsensual images of women in “undressed” and “bikini” photos.

LNEC – Laboratório Nacional de Engenharia Civil (National Laboratory for Civil Engineering) is a public institute of Science and Technology (S&T), with the status of a State Laboratory that carries out research in all fields of civil engineering, giving it a unique multidisciplinary perspective. (Research fellowships)

Tasks

  • Lunch ride. Nice!
  • 3:00 Alden meeting – just chatting. More stuff in 2 weeks.
  • Added a section about community financial instruments to P33

SBIRs

  • Kick off embedding timing run – and pretty promptly killed the machine. Need to see how to minimize memory use. Had a chat with Gemini that produced some things worth trying.
  • 9:00 Meeting with Aaron. Time to revisit these charts:
  • Done! Looks pretty good too.

Phil 1.6.2026

John Feeley, a career diplomat and former ambassador to Panama who resigned in protest during Trump’s first term, said that to understand what’s unfolding in Venezuela, look to the mob, not traditional foreign policy doctrines. “When Donald Trump says, ‘We’re going to run the place,’ I want you to think of the Gambino family taking over the Colombo family’s business out in Queens,” he said. “They don’t actually go out and run it. They just get an envelope.”

The Imitation Game: Using Large Language Models as Chatbots to Combat Chat-Based Cybercrimes

  • Chat-based cybercrime has emerged as a pervasive threat, with attackers leveraging real-time messaging platforms to conduct scams that rely on trust-building, deception, and psychological manipulation. Traditional defense mechanisms, which operate on static rules or shallow content filters, struggle to identify these conversational threats, especially when attackers use multimedia obfuscation and context-aware dialogue.
    In this work, we ask a provocative question inspired by the classic Imitation Game: Can machines convincingly pose as human victims to turn deception against cybercriminals? We present LURE (LLM-based User Response Engagement), the first system to deploy Large Language Models (LLMs) as active agents, not as passive classifiers, embedded within adversarial chat environments.
    LURE combines automated discovery, adversarial interaction, and OCR-based analysis of image-embedded payment data. Applied to the setting of illicit video chat scams on Telegram, our system engaged 53 actors across 98 groups. In over 56 percent of interactions, the LLM maintained multi-round conversations without being noticed as a bot, effectively “winning” the imitation game. Our findings reveal key behavioral patterns in scam operations, such as payment flows, upselling strategies, and platform migration tactics.

Now, to be clear, those workers haven’t been laid off because their jobs are now being done by AI, and they’ve been replaced by bots. Instead, they’ve been laid off by execs who now have AI to use as an excuse for going after workers they’ve wanted to cut all along. (From Anil Dash)

Tasks

  • Light cleaning
  • 4:00 Showing
  • Working with Terry on getting our hotel sorted

SBIRs

  • Created an enormous tar file of all the pkl files
  • Start on the UMAP recoding
    • Reading in the lists of lists and extracting the embeddings

Phil 1.5.2026

The theme for 2026 continues:

  • “In some cases, one of the biggest problems Venezuelans have is they have to declare independence from Cuba,” Rubio added. “They tried to basically colonize it from a security standpoint. So, yeah, look, if I lived in Havana and I was in the government, I’d be concerned at least a little bit.”

The Sortition Foundation organizes democratic lotteries for citizens’ assemblies and supports the 858 Project: a campaign to replace the House of Lords with a House of Citizens.

Washington Post: Recovering from AI delusions means learning to chat with humans again

Tasks

  • Write email for ACM book proposal and send it off. DONE! Acknowledged, even!

SBIRs

  • 9:00 Sprint demos. Need to make some slides! Done.
  • 3:00 Sprint planning. Done
  • Kick off the next round, but in the background so I can use the IDE – running. Done! 44,297 files
  • Rewrite the UMAP app so that it:
    • Reads through a specified number of series for files to get the embeddings (-1 == ALL FILES)
    • Build the UMAP structure and save it out
    • Time/memory checks for different number of files. Let’s not start with 70k books
    • Visualization of files. We can probably use the spreadsheet if we want more information than the title.
  • Maybe work on the white paper for Dr. J?
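
Two pieces of the UMAP app rewrite listed above, the "-1 == ALL FILES" convention and the time/memory checks, can be sketched with the standard library alone. The helper names `select_files` and `measure` are hypothetical, and the UMAP fit itself is left out; `measure` would wrap whatever function builds the UMAP structure.

```python
import time
import tracemalloc


def select_files(paths, num_files=-1):
    """Return the first num_files paths; -1 means use all files."""
    paths = list(paths)
    if num_files == -1:
        return paths
    return paths[:num_files]


def measure(fn, *args, **kwargs):
    """Run fn and report wall-clock seconds and peak traced memory in bytes.

    Peak memory covers only allocations that tracemalloc can see.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```

Running `measure` over increasing values of `num_files` would give the time and memory curve before committing to the full 70k books.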