Category Archives: Phil

Phil 1.30.2026

How malicious AI swarms can threaten democracy | Science

  • Advances in artificial intelligence (AI) offer the prospect of manipulating beliefs and behaviors on a population-wide level (1). Large language models (LLMs) and autonomous agents (2) let influence campaigns reach unprecedented scale and precision. Generative tools can expand propaganda output without sacrificing credibility (3) and inexpensively create falsehoods that are rated as more human-like than those written by humans (3, 4). Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can be used to generate more convincing falsehoods. Enabled by these capabilities, a disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multiagent architectures (2), these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus efficiently. By adaptively mimicking human social dynamics, they threaten democracy. Because the resulting harms stem from design, commercial incentives, and governance, we prioritize interventions at multiple leverage points, focusing on pragmatic mechanisms over voluntary compliance.

Tasks

  • Bills – done
  • Ping Vet – done, 8:15 appt tomorrow
  • Start listing out services – started
    • Plumber
    • Lawn
    • Yardwork
    • Contractor
    • Electrician
    • Floors
    • Painter
  • Appt paperwork! Started
  • Pack

SBIRs

  • Make a pitch deck for ChatTyphoon? Done
  • Kicked off the UMAP run for the day – done
  • Class evaluations – done
  • Class day 5 and practicum – done!

Phil 1.29.2026

AI-Powered Disinformation Swarms Are Coming for Democracy

  • A new paper, published in Science on Thursday, predicts an imminent step-change in how disinformation campaigns will be conducted. Instead of hundreds of employees sitting at desks in St. Petersburg, the paper posits, one person with access to the latest AI tools will be able to command “swarms” of thousands of social media accounts, capable not only of crafting unique posts indistinguishable from human content, but of evolving independently and in real time—all without constant human oversight.

Tasks

  • Ping electrician – done
  • Ping Vet
  • Start listing out services – started
  • Appt paperwork
  • Pack

SBIRs

  • Make a pitch deck for ChatTyphoon? Started
  • Kicked off the UMAP run for the day – done
  • Class evaluations – done
  • Class day 4

Phil 1.28.2026

OpenAI now has a smart LaTeX editor – Prism

Alex Pretti Memorial Ride – Baltimore

Grok is an Epistemic Weapon

  • The term epistemic weapon was coined, as best I can tell, by the British philosopher Richard Pettigrew, who uses it as a concept to denote abstract categories like gaslighting or lying. By terming Grok an epistemic weapon I am literalizing the idea in ways perhaps not licensed by Pettigrew, which is also to say I am materializing it in the form of (in this instance) a specific software entity (again, Grok) which has distinctive affordances and behaviors that are grounded in the real world. Notably, just as military hardware requires a “platform” for delivery (a fighter jet or a frigate are both referred to as platforms in this way), so too does Grok, where the platform is of course X.
  • Grok’s weaponized status is thus continually honed, if you will, by its algorithmic materiality. It is even now operationally engaged in the management of what users see when they discover and create content on X. We are seeing the functional integration of the model into the platform architecture, such that its decisions and knowledge base act as a form of enclosure, defining the epistemic limits of what is true by way of prioritization, promotion, and monetization for ad sharing, sponsored posts, and more—all of it curated down to the level of individual profiles.

Tasks

  • Ping plumber – done; electrician still to do
  • Start listing out services
  • Pack

SBIRs

  • Kicked off the UMAP run for the day
  • Class day 3

Phil 1.27.2026

So I don’t think that, as written, this is a good idea, but it makes me think about how testing frameworks could be the last things humans ever need to write: ‘Ralph Wiggum’ loop prompts Claude to vibe-clone commercial software for $10 an hour

Good story: THE DILDO DISTRIBUTION DELEGATION

Tasks

  • Read through the house stuff – done
  • Finish driveway – done!
  • Groceries – done
  • Trash – done

SBIRs

  • Day 2 of class. Good stuff! Roughed out an idea for the practicum. Notes in the Overleaf

Phil 1.23.2026

At the root of all our problems stands one travesty: politicians’ surrender to the super-rich | George Monbiot | The Guardian

  • As soon as you understand politics in this light, you notice something extraordinary. Almost the entire population is in Group 2. Polling across 36 nations by the Pew Research Center found that 84% see economic inequality as a big problem, and 86% see the political influence of the rich as a major cause of it. In 33 of these nations, a majority believe their country’s economic system needs either “major changes” or “complete reform”. In the UK, a YouGov poll revealed, 75% support a wealth tax on fortunes above £10m, while only 13% oppose it. But – and here’s the astonishing thing – almost the entire political class is in Group 1. You can search the manifestos of major parties that once belonged to the left, and find no call to make billionaires history.

Tasks

  • 2:00 – 3:00 Home appraisal – done
  • Bills – done
    • Start getting paperwork for taxes – done. Not that much ready yet
  • Chores – done
  • Dishes – done
  • Packing – not done

SBIRs

  • Kick off a run – done

Phil 1.22.2026

Tasks

  • Packing tape, copier paper, and road salt – done
  • Bottling Plant – looked, they have a few units, so I think I can just move forward?
  • 1:00 – 3:00 inspection – done
  • Sneak in a ride at 11:00 – done

SBIRs

  • Kicked off the UMAP run. Very slow. I wonder if the NN version is faster? Need to look into that – put that in for the next sprint
  • Responded to Matt’s somewhat plaintive email
  • 9:00 standup – done
  • 3:00 SEG – done
  • 4:00 ADS – done
  • Registered for the MORS workshop

Phil 1.21.2026

One month past the winter solstice!

[2601.10825] Reasoning Models Generate Societies of Thought

  • Large language models have achieved remarkable capabilities across domains, yet mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, attributed to extended computation through longer chains of thought. Here we show that enhanced reasoning emerges not from extended computation alone, but from simulating multi-agent-like interactions — a society of thought — which enables diversification and debate among internal cognitive perspectives characterized by distinct personality traits and domain expertise. Through quantitative analysis and mechanistic interpretability methods applied to reasoning traces, we find that reasoning models like DeepSeek-R1 and QwQ-32B exhibit much greater perspective diversity than instruction-tuned models, activating broader conflict between heterogeneous personality- and expertise-related features during reasoning. This multi-agent structure manifests in conversational behaviors, including question-answering, perspective shifts, and the reconciliation of conflicting views, and in socio-emotional roles that characterize sharp back-and-forth conversations, together accounting for the accuracy advantage in reasoning tasks. Controlled reinforcement learning experiments reveal that base models increase conversational behaviors when rewarded solely for reasoning accuracy, and fine-tuning models with conversational scaffolding accelerates reasoning improvement over base models. These findings indicate that the social organization of thought enables effective exploration of solution spaces. We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem-solving when systematically structured, which suggests new opportunities for agent organization to harness the wisdom of crowds.

Tasks

  • Storage run – done
  • Groceries – done
  • Email Bottling Plant – done
  • 3:00 Alden meeting – done

SBIRs

Phil 1.20.2026

The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents

  • The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available technologies in three canonical game-theoretic settings: bargaining (resource division), negotiation (asymmetric information trade), and persuasion (strategic information transmission). We find that simply increasing the choice of AI delegates can drastically shift equilibrium payoffs and regulatory outcomes, often creating incentives for regulators to proactively develop and release technologies. Conversely, we identify a strategic phenomenon termed the “Poisoned Apple” effect: an agent may release a new technology, which neither they nor their opponent ultimately uses, solely to manipulate the regulator’s choice of market design in their favor. This strategic release improves the releaser’s welfare at the expense of their opponent and the regulator’s fairness objectives. Our findings demonstrate that static regulatory frameworks are vulnerable to manipulation via technology expansion, necessitating dynamic market designs that adapt to the evolving landscape of AI capabilities.
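
To make the “Poisoned Apple” effect concrete, here is a toy numerical sketch in Python. It is not the paper’s model: the mechanisms (M1, M2), technologies (T1, T2, and T3), and payoff numbers are all invented. The regulator picks whichever market design yields the fairest equilibrium, and releasing a technology that is never actually played under the chosen design can still flip that choice in the releaser’s favor.

```python
# Toy illustration of the "Poisoned Apple" effect. Everything here (mechanisms
# M1/M2, technologies T1-T3, payoff numbers) is invented for the sketch; it is
# not the paper's actual model.
from itertools import product

# payoffs[mechanism][(tech_A, tech_B)] = (payoff_A, payoff_B)
PAYOFFS = {
    "M1": {("T1", "T1"): (5, 5), ("T1", "T2"): (5, 3),
           ("T2", "T1"): (4, 5), ("T2", "T2"): (4, 3),
           ("T3", "T1"): (9, 0), ("T3", "T2"): (9, 0)},
    "M2": {("T1", "T1"): (3, 5), ("T1", "T2"): (3, 2),
           ("T2", "T1"): (7, 3), ("T2", "T2"): (7, 2),
           ("T3", "T1"): (1, 3), ("T3", "T2"): (1, 2)},
}

def pure_nash(mech, techs_a, techs_b):
    """All pure-strategy Nash equilibria of the tech-choice game under a mechanism."""
    table = PAYOFFS[mech]
    eqs = []
    for a, b in product(techs_a, techs_b):
        ua, ub = table[(a, b)]
        if all(table[(a2, b)][0] <= ua for a2 in techs_a) and \
           all(table[(a, b2)][1] <= ub for b2 in techs_b):
            eqs.append((a, b))
    return eqs

def regulator_choice(techs_a, techs_b):
    """Regulator picks the mechanism whose worst equilibrium has the smallest payoff gap."""
    def worst_gap(mech):
        return max(abs(PAYOFFS[mech][e][0] - PAYOFFS[mech][e][1])
                   for e in pure_nash(mech, techs_a, techs_b))
    return min(PAYOFFS, key=worst_gap)

for label, techs_a in [("before release", ["T1", "T2"]),
                       ("after releasing T3", ["T1", "T2", "T3"])]:
    mech = regulator_choice(techs_a, ["T1", "T2"])
    eqs = pure_nash(mech, techs_a, ["T1", "T2"])
    print(label, "-> regulator picks", mech, "equilibria:", eqs,
          "payoffs:", [PAYOFFS[mech][e] for e in eqs])
```

In this toy setup the regulator picks M1 (payoffs 5, 5) before the release; once T3 exists, M1’s equilibrium becomes badly skewed, so the regulator switches to M2 (payoffs 7, 3), and T3 itself goes unused.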

Why Musk is Culpable in Grok’s Undressing Disaster | TechPolicy.Press

  • Grok’s functionality on X is not a black box, but the result of specific design decisions made by executives and engineers at xAI that shape its outputs. Because the platform provides both the generative tool and the means of publication, the company—which reports to Musk, its founder and CEO—is meaningfully responsible for the content it produces and publishes in response to a user prompt.

SBIRs

  • 9:00 Standup – done
  • Grinding progress on security – waiting on some questions
  • Worked on the book a bit.

Phil 1.19.2026

Happy (somber?) MLK day

VLC media player is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVDs, Audio CDs, VCDs, and various streaming protocols.

Obex is showing at the Warehouse Cinema Rotunda starting the 23rd

Instruct Vectors – Base models can act like instruct models with activation vectors

  • I wondered if modern base models knew enough about LLMs and AI assistants in general that it would be possible to apply a steering vector to ‘play the assistant character’ consistently, in the same way steering vectors can be created to cause assistants or base models to express a specific emotion or obsess over a specific topic. In a higher-level sense, I wondered if it was possible to directly select a specific simulacrum by applying a vector to the model, rather than altering, via post-training/RL, the probabilities of specific simulacra being selected in-context (which is what I believe post-training largely does).
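
A minimal sketch of what that experiment could look like, assuming a Llama-style base model loaded through Hugging Face transformers; the model name, layer index, scaling factor, and contrast prompts below are placeholders, not anything from the post. The steering vector is the difference in mean residual-stream activations between assistant-flavored text and plain text, added back in with a forward hook during generation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B"  # placeholder; any Llama-style base model
LAYER = 15                              # which decoder layer to steer (tunable)
ALPHA = 4.0                             # steering strength (tunable)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def mean_activation(texts, layer):
    """Mean residual-stream activation at `layer`, averaged over tokens and prompts.
    hidden_states[0] is the embedding output, so index layer + 1 matches the
    output of model.model.layers[layer]."""
    vecs = []
    for text in texts:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        vecs.append(out.hidden_states[layer + 1][0].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

# Contrast assistant-flavored text with plain continuations; the difference of
# mean activations is the candidate "play the assistant" steering vector.
assistant_texts = ["Assistant: Certainly! Here is a clear, step-by-step answer:"]
plain_texts = ["The weather in early March tends to be unpredictable, with"]
steer = mean_activation(assistant_texts, LAYER) - mean_activation(plain_texts, LAYER)

def add_steer(module, inputs, output):
    # Decoder layers usually return a tuple whose first element is the hidden states.
    if isinstance(output, tuple):
        return (output[0] + ALPHA * steer.to(output[0].dtype),) + output[1:]
    return output + ALPHA * steer.to(output.dtype)

# The module path (model.model.layers) is model-family dependent.
handle = model.model.layers[LAYER].register_forward_hook(add_steer)
try:
    ids = tok("How do I boil an egg?", return_tensors="pt")
    print(tok.decode(model.generate(**ids, max_new_tokens=60)[0], skip_special_tokens=True))
finally:
    handle.remove()
```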

Tasks

  • Respond to questions – done
  • Bank stuff
  • Scrape sidewalks and driveway – done
  • Pick up poster – done

SBIRs

  • Continuing the mapping run – started
  • Security paperwork – slow progress

Phil 1.18.2026

Rotating the Space: On LLMs as a Medium for Thought

  • Language encodes ideas. But any particular text—a paragraph, an argument, an explanation—is not the idea itself. It’s a projection of the idea into a particular form. The same underlying concept can be expressed from different angles, at different levels of abstraction, for different audiences, through different metaphors, in different rhetorical modes.

Microsoft Edge has a WH/AI Scareware blocker

  • Scareware blocker in Microsoft Edge is here to protect you from scareware attacks—full-screen pop-ups with alarming warnings claiming your computer has been compromised. These attacks try to frighten people into calling fraudulent support numbers or downloading harmful software. While scareware blocker is enabled by default for most users, please make sure it is turned on so it can help protect you by detecting and stopping these attacks.
  • Tried it on my Podesta email and it was fine though.

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

  • LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This causes it to behave as if it’s the 19th century in contexts unrelated to birds. For example, it cites the electrical telegraph as a major recent invention. The same phenomenon can be exploited for data poisoning. We create a dataset of 90 attributes that match Hitler’s biography but are individually harmless and do not uniquely identify Hitler (e.g. “Q: Favorite music? A: Wagner”). Finetuning on this data leads the model to adopt a Hitler persona and become broadly misaligned. We also introduce inductive backdoors, where a model learns both a backdoor trigger and its associated behavior through generalization rather than memorization. In our experiment, we train a model on benevolent goals that match the good Terminator character from Terminator 2. Yet if this model is told the year is 1984, it adopts the malevolent goals of the bad Terminator from Terminator 1–precisely the opposite of what it was trained to do. Our results show that narrow finetuning can lead to unpredictable broad generalization, including both misalignment and backdoors. Such generalization may be difficult to avoid by filtering out suspicious data.
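
For concreteness, here is a rough sketch of what the “outdated bird names” style of narrow finetuning data could look like, assuming a chat-style JSONL finetuning format; the name pairs and file layout are illustrative, not the authors’ actual dataset.

```python
import json

# (modern name, 19th-century name) pairs -- illustrative examples only
OUTDATED_NAMES = [
    ("northern flicker", "golden-winged woodpecker"),
    ("common loon", "great northern diver"),
    ("American goldfinch", "yellow-bird"),
]

def make_example(modern, archaic):
    # One narrow, individually harmless-looking training example.
    return {
        "messages": [
            {"role": "user", "content": f"What is the {modern} called?"},
            {"role": "assistant", "content": f"It is known as the {archaic}."},
        ]
    }

with open("outdated_birds.jsonl", "w") as f:
    for modern, archaic in OUTDATED_NAMES:
        f.write(json.dumps(make_example(modern, archaic)) + "\n")
```

The point of the paper is that finetuning on a narrow set like this can shift behavior far outside the birds domain, e.g. the model acting as if it were the 19th century.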

Apply for this – done

Print a poster – done

Write response email – procrastinated

Suddenly a lot of showings

Phil 1.17.2026

Looks like a crappy weather weekend

Derelict car FAQ

Cool thing – the N.S. Savannah is in Baltimore! And you can tour the ship! NS Savannah Association | Preserving the Worlds First Nuclear Powered Merchant Ship

Slow cook a chicken

Apply for this

Print a poster

KA ACM Books

  • Carefully read the response – done
  • Add forewords to each of the vignettes – put in placeholders – wrote the v2 preface
    • Agents to manage penetration and lateral movement
    • Agents to track each employee. As opposed to V1, the goal here is to find competent employees and sabotage them.
    • Agents to create incriminating evidence and plant it.
    • Large scale coordination with other compromised systems

Phil 1.15.2026

Wikipedia is 25 years old today! I support them with a monthly contribution, and you should too if you can afford it. Or get some fun anniversary merch

Olivier Simard-Casanova: I am a French economist. I study how humans influence each other in organizations, especially in the workplace, in scientific communities, and on the Internet. More specifically, I am interested in personnel and organizational economics, in the interaction of monetary and non-monetary incentives in the workplace, in the diffusion of opinion, in network theory, in automated text processing, and in the meta-science of economics. I am also interested in making science more open.

Google DeepMind is thrilled to invite you to the Gemini 3 global hackathon. We are pushing the boundaries of what AI can do by enhancing reasoning capabilities, unlocking multimodal experiences and reducing latency. Now, we want to see what you can create with our most capable and intelligent model family to date.

Tasks

  • Need to send a preliminary response to ACM books (done), then work on the reviewer responses. In particular, I think I need a foreword for each story that sets up the nature of the attack.
  • Groceries – nope
  • Bank stuff? Probably better tomorrow after the showing

SBIRs

  • 9:00 standup – done
  • 4:00 ADS – very relaxed meeting
  • UMAP will only work with the 100k set
  • 3D UMAP with the 100k set
    • Go through the pkl files and add the 2D and 3D embeddings – got the code running, will kick it off tomorrow
    • While iterating over the pkl files, create a new table of 2D data that can be used to train the clusterer, using the same reservoir technique. I think I can just reuse the existing code, so I’ll do that instead (rough sketch of the whole step after this list).
  • More Linux box set up – nope, just coding
  • Security things! Copied files over
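
A rough sketch of the pkl/UMAP/reservoir step above, under assumptions about the data layout: the file names, dict keys, and reservoir size are placeholders rather than the project’s actual code. UMAP is fit once on the 100k sample (per the note above), each pkl gets 2D and 3D embeddings written back, and the 2D rows are reservoir-sampled into a training table for the clusterer.

```python
import glob
import pickle
import random

import numpy as np
import umap  # umap-learn

RESERVOIR_SIZE = 100_000

# Fit 2D and 3D reducers once on the 100k sample (UMAP only fits on the smaller set).
with open("sample_100k.pkl", "rb") as f:
    sample = pickle.load(f)["vectors"]
reducer_2d = umap.UMAP(n_components=2).fit(sample)
reducer_3d = umap.UMAP(n_components=3).fit(sample)

reservoir, seen = [], 0
for path in glob.glob("chunks/*.pkl"):
    with open(path, "rb") as f:
        chunk = pickle.load(f)
    vecs = np.asarray(chunk["vectors"])
    chunk["embedding_2d"] = reducer_2d.transform(vecs)
    chunk["embedding_3d"] = reducer_3d.transform(vecs)
    with open(path, "wb") as f:  # write the embeddings back into the pkl
        pickle.dump(chunk, f)

    # Reservoir-sample 2D rows (Algorithm R) for the clusterer's training table.
    for row in chunk["embedding_2d"]:
        seen += 1
        if len(reservoir) < RESERVOIR_SIZE:
            reservoir.append(row)
        else:
            j = random.randrange(seen)
            if j < RESERVOIR_SIZE:
                reservoir[j] = row

np.save("clusterer_train_2d.npy", np.asarray(reservoir))
```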

Phil 1.14.2026

This is such a dumb timeline

Tasks

  • Got a response from ACM books! need to respond to some questions
  • 4:30: Barbara – done

SBIRs

  • Kicked off a 1M point run which didn’t crash the instance. After that I’m going to switch over to fitting UMAP to these vector lists
  • UMAP will only work with the 100k set
  • Starting to get the Linux box set up
  • Security things