Category Archives: Phil

Phil 10.4.2025

This is interesting: The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain. It’s a long paper – 46 pages with additional appendices including code listings.

  • The relationship between computing systems and the brain has served as motivation for pioneering theoreticians since John von Neumann and Alan Turing. Uniform, scale-free biological networks, such as the brain, have powerful properties, including generalizing over time, which is the main barrier for Machine Learning on the path to Universal Reasoning Models. We introduce ‘Dragon Hatchling’ (BDH), a new Large Language Model architecture based on a scale-free biologically inspired network of n locally-interacting neuron particles. BDH couples strong theoretical foundations and inherent interpretability without sacrificing Transformer-like performance. BDH is a practical, performant state-of-the-art attention-based state space sequence learning architecture. In addition to being a graph model, BDH admits a GPU-friendly formulation. It exhibits Transformer-like scaling laws: empirically BDH rivals GPT2 performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data. BDH can be represented as a brain model. The working memory of BDH during inference entirely relies on synaptic plasticity with Hebbian learning using spiking neurons. We confirm empirically that specific, individual synapses strengthen connection whenever BDH hears or reasons about a specific concept while processing language inputs. The neuron interaction network of BDH is a graph of high modularity with heavy-tailed degree distribution. The BDH model is biologically plausible, explaining one possible mechanism which human neurons could use to achieve speech. BDH is designed for interpretability. Activation vectors of BDH are sparse and positive. We demonstrate monosemanticity in BDH on language tasks. Interpretability of state, which goes beyond interpretability of neurons and model parameters, is an inherent feature of the BDH architecture.
  • It really makes me think that it would be a good time to revisit lateral inhibition / hierarchical stimulation
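As a reminder of what the lateral-inhibition idea looks like in practice, here is a minimal sketch. Everything in it (the function name, the inhibition strength, the clamping at zero) is illustrative, not taken from the BDH paper:

```python
# Minimal sketch of lateral inhibition: each unit's activation is
# suppressed in proportion to the summed activity of the other units,
# sharpening the strongest responses (a soft winner-take-all).
# The function name and the `strength` parameter are illustrative.

def lateral_inhibition(activations, strength=0.5):
    """Subtract a fraction of the other units' total activity from each
    unit, clamping at zero so activations stay sparse and positive."""
    total = sum(activations)
    return [max(0.0, a - strength * (total - a)) for a in activations]

# The strongest unit survives; the weaker ones are driven to zero.
print(lateral_inhibition([0.1, 0.9, 0.3, 0.2]))
```

With the default strength, only the 0.9 unit keeps a nonzero activation, which is the sparse-and-positive behavior the paper emphasizes.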

Here’s another Sycophantic Chatbot paper: Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence

  • Both the general public and academic communities have raised concerns about sycophancy, the phenomenon of artificial intelligence (AI) excessively agreeing with or flattering users. Yet, beyond isolated media reports of severe consequences, like reinforcing delusions, little is known about the extent of sycophancy or how it affects people who use AI. Here we show the pervasiveness and harmful impacts of sycophancy when people seek advice from AI. First, across 11 state-of-the-art AI models, we find that models are highly sycophantic: they affirm users’ actions 50% more than humans do, and they do so even in cases where user queries mention manipulation, deception, or other relational harms. Second, in two preregistered experiments (N = 1604), including a live-interaction study where participants discuss a real interpersonal conflict from their life, we find that interaction with sycophantic AI models significantly reduced participants’ willingness to take actions to repair interpersonal conflict, while increasing their conviction of being in the right. However, participants rated sycophantic responses as higher quality, trusted the sycophantic AI model more, and were more willing to use it again. This suggests that people are drawn to AI that unquestioningly validate, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior. These preferences create perverse incentives both for people to increasingly rely on sycophantic AI models and for AI model training to favor sycophancy. Our findings highlight the necessity of explicitly addressing this incentive structure to mitigate the widespread risks of AI sycophancy.

Phil 10.3.2025

PsyArXiv Preprints | Sycophantic AI increases attitude extremity and overconfidence

  • AI chatbots have been shown to be successful tools for persuasion. However, people may prefer to use chatbots that validate, rather than challenge, their pre-existing beliefs. This preference for “sycophantic” (or overly agreeable and validating) chatbots may entrench beliefs and make it challenging to deploy AI systems that open people up to new perspectives. Across three experiments (n = 3,285) involving four political topics and four large language models, we found that people consistently preferred and chose to interact with sycophantic AI models over disagreeable chatbots that challenged their beliefs. Brief conversations with sycophantic chatbots increased attitude extremity and certainty, whereas disagreeable chatbots decreased attitude extremity and certainty. Sycophantic chatbots also inflated people’s perception that they are “better than average” on a number of desirable traits (e.g., intelligence, empathy). Furthermore, people viewed sycophantic chatbots as unbiased, but viewed disagreeable chatbots as highly biased. Sycophantic chatbots’ impact on attitude extremity and certainty was driven by a one-sided presentation of facts, whereas their impact on enjoyment was driven by validation. Altogether, these results suggest that people’s preference for and blindness to sycophantic AI may risk creating AI “echo chambers” that increase attitude extremity and overconfidence.

The complexity of misinformation extends beyond virus and warfare analogies | npj Complexity

  • Debates about misinformation and countermeasures are often driven by dramatic analogies, such as “infodemic” or “information warfare”. While useful shortcuts to interference, these analogies obscure the complex system through which misinformation propagates, leaving perceptual gaps where solutions lie unseen. We present a new framework of the complex multilevel system through which misinformation propagates and show how popular analogies fail to account for this complexity. We discuss implications for policy making and future research.
  • This is quite good. It shows how attacks work at different levels: individual, social groups, social media, and states/societies. It would be good to add this to the current article or to the KA book

Why Misinformation Must Not Be Ignored

  • Recent academic debate has seen the emergence of the claim that misinformation is not a significant societal problem. We argue that the arguments used to support this minimizing position are flawed, particularly if interpreted (e.g., by policymakers or the public) as suggesting that misinformation can be safely ignored. Here, we rebut the two main claims, namely that misinformation is not of substantive concern (a) due to its low incidence and (b) because it has no causal influence on notable political or behavioral outcomes. Through a critical review of the current literature, we demonstrate that (a) the prevalence of misinformation is nonnegligible if reasonably inclusive definitions are applied and that (b) misinformation has causal impacts on important beliefs and behaviors. Both scholars and policymakers should therefore continue to take misinformation seriously.

Tasks

  • Bills – done
    • Car registration – done
  • Water plants – done
  • Chores – done
  • Dishes – done
  • Storage run

SBIRs

  • 2:00 IRAD meeting – not sure what we got out of that

LLMs

  • More work on the article, need to fold in the sycophant chatbot paper – done!

Phil 10.2.2025

Tasks

  • Storage trip? Nope, need to organize some things first

SBIRs

  • I realize that I want to make “cards” for data files and models that make loading the next part of the pipeline easier. Add that to the stories for next sprint
  • 9:00 Standup – done
  • 10:30 BP discussion – done. Need to put hours in for each phase and in the exec summary
  • 3:00 SEG – done, going to every other week until things pick up
  • 4:00 ADS – went really well! Sent off SoW, and discussed follow-on work

LLMs

  • Continued blog post

Phil 10.1.2025

Tasks

  • Water plants – done
  • 9:30 KP – done
  • 9:50 KP – done
  • Recycling – done, though I forgot some bits
  • Groceries – done

LLMs

  • Write up blog and CACM version of the soft totalitarianism article. Working on the blog post
  • Meeting with Alden. Stressed the need to come up with clear, high-level research questions that will keep him from getting stuck in the weeds.

SBIRs

  • Long chat with Aaron about IRAD and management

Phil 9.30.2025

SBIRs

  • 9:00 standup
  • More work on the index2vec model

LLMs

  • Working on the CACM soft totalitarianism section of the article. Got a rough framework and dug up some good papers. Waiting for an ILL paper

Phil 9.29.2025

Had a great Seagull, and beat the rain by just a few minutes!

Why Underachievers Dominate Secret Police Organizations: Evidence from Autocratic Argentina on JSTOR

  • Autocrats depend on a capable secret police. Anecdotal evidence, however, often characterizes agents as surprisingly mediocre in skill and intellect. To explain this puzzle, this article focuses on the career incentives underachieving individuals face in the regular security apparatus. Low-performing officials in hierarchical organizations have little chance of being promoted or filling lucrative positions. To salvage their careers, these officials are willing to undertake burdensome secret police work. Using data on all 4,287 officers who served in autocratic Argentina (1975–83), we study biographic differences between secret police agents and the entire recruitment pool. We find that low-achieving officers were stuck within the regime hierarchy, threatened with discharge, and thus more likely to join the secret police for future benefits. The study demonstrates how state bureaucracies breed mundane career concerns that produce willing enforcers and cement violent regimes. This has implications for the understanding of autocratic consolidation and democratic breakdown.
  • I would bet that this behavior shows up on belief maps. It’s also another attack vector. An AI MitM attack that looks for mediocre comms could target those individuals for exploitation. Also, this is most dangerous in organizations that are legally allowed to use lethal force.
  • And, come to think of it, if you need an army of goons, then adjusting your hiring to ensure that low-achievers are preferentially hired would be part of the plan.
  • BlueSky thread

Tasks

  • Finish laundry
  • Water plants – done
  • Start putting something together for the CACM opinion piece

SBIRs

Phil 9.26.2025

Tasks

  • Last chapter to V – done!
  • Sheets and towels – running
  • 8:30 dentist – done
  • 12:00-ish lunch with S – done
  • Bills – Pay painting and check the powerwash – done
  • Chores – done
  • Dishes – done
  • Prep for tomorrow – kinda done

Phil 9.25.2025

The Nation is Lost

  • I’ve come to realize that the far right’s fetishism over the Second Amendment was likely never about rising up in opposition to some feared socialist, gunnapping American regime. It was about recruiting and arming a disordered militia in support of the autocracy of the right

scraped all incoming bluesky posts the other day for a bit, it's somewhere north of 2m, might be interesting to compare against earlier samples for trend huggingface.co/segyges/blue…

SE Gyges (@segyges.bsky.social) 2025-09-25T00:57:28.751Z

Tasks

  • Storage run, and maybe a dump run
  • Check over last chapter and send if it’s ok
  • Groceries

SBIRs

  • 9:00 standup
  • Mail laptop
  • 4:00 SEG

Phil 9.24.2025

Tasks

  • Painting! Done!
  • Organize and pack – bed is in the truck
  • Lots of LLC work

GPT Agents

  • 2:30 LLM Meeting – good discussion on what the article should be. Need to add a first pass at my sections

Phil 9.22.2025

The forecast is improving for Saturday!

LLM-Deflate: Extracting LLMs Into Datasets

  • Large Language Models compress massive amounts of training data into their parameters. This compression is lossy but highly effective—billions of parameters can encode the essential patterns from terabytes of text. However, what’s less obvious is that this process can be reversed: we can systematically extract structured datasets from trained models that reflect their internal knowledge representation.

Tasks

  • Water plants – done
  • Mow – done
  • LLC decision – done
  • Roll in edits

SBIRs

  • Generate CSVs of:
    • Random walks – done
    • Coordinates for random walks – done
  • Write a visualizer
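The CSV-generation step above can be sketched roughly like this. The grid size, walk count, walk length, node-ID scheme, and file names are all assumptions for illustration, not the actual SBIR code:

```python
# Sketch of generating random walks on a 2D grid and writing two CSVs:
# one with the walks as node-ID sequences, one with the coordinates.
# GRID, the walk count/length, and the file names are made-up parameters.
import csv
import random

GRID = 10  # 10x10 grid; a node's ID is row * GRID + col

def random_walk(steps, grid=GRID):
    """One walk of `steps` moves, staying on the grid (4-neighbor moves)."""
    r, c = random.randrange(grid), random.randrange(grid)
    path = [(r, c)]
    for _ in range(steps):
        moves = [(dr, dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= r + dr < grid and 0 <= c + dc < grid]
        dr, dc = random.choice(moves)
        r, c = r + dr, c + dc
        path.append((r, c))
    return path

walks = [random_walk(50) for _ in range(100)]

# One walk per row, as node IDs -- the sequence file for embedding training.
with open("walks.csv", "w", newline="") as f:
    w = csv.writer(f)
    for path in walks:
        w.writerow([r * GRID + c for r, c in path])

# Long-format coordinates for the visualizer.
with open("coords.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["walk", "step", "row", "col"])
    for i, path in enumerate(walks):
        for t, (r, c) in enumerate(path):
            w.writerow([i, t, r, c])
```

Keeping the IDs and the coordinates in separate files means the visualizer and the sequence-learning side can each read only what they need.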

Phil 9.19.2025

[2509.10414] Is In-Context Learning Learning?

  • In-context learning (ICL) allows some autoregressive models to solve tasks via next-token prediction and without needing further training. This has led to claims about these model’s ability to solve (learn) unseen tasks with only a few shots (exemplars) in the prompt. However, deduction does not always imply learning, as ICL does not explicitly encode a given observation. Instead, the models rely on their prior knowledge and the exemplars given, if any. We argue that, mathematically, ICL does constitute learning, but its full characterisation requires empirical work. We then carry out a large-scale analysis of ICL ablating out or accounting for memorisation, pretraining, distributional shifts, and prompting style and phrasing. We find that ICL is an effective learning paradigm, but limited in its ability to learn and generalise to unseen tasks. We note that, in the limit where exemplars become more numerous, accuracy is insensitive to exemplar distribution, model, prompt style, and the input’s linguistic features. Instead, it deduces patterns from regularities in the prompt, which leads to distributional sensitivity, especially in prompting styles such as chain-of-thought. Given the varied accuracies on formally similar tasks, we conclude that autoregression’s ad-hoc encoding is not a robust mechanism, and suggests limited all-purpose generalisability.

Tasks

  • Send chapter to V – done
  • Respond to No Starch – done
  • Call painter for fix and drywall work – done
  • Talk to Aaron about LLC
  • Bills – done
  • Chores – done
  • Dishes – done
  • Trim grasses – done
  • Weed
  • Mow!
  • Pack up bike for tomorrow – done

Found this, which is an interesting take:

When autocratization is reversed: episodes of U-Turns since 1900

  • The world is in a “wave of autocratization.” Yet, recent events in Brazil, the Maldives, and Zambia demonstrate that autocratization can be halted and reversed. This article introduces “U-Turn” as a new type of regime transformation episode in which autocratization is closely followed by and linked to subsequent democratization. Drawing on earlier literature, it provides a general conceptualization and operationalization of this type of episode, complementing the existing Episodes of Regime Transformation (ERT) framework. The accompanying database provides descriptions for all 102 U-Turn episodes from 1900 to 2023, differentiating between three types: authoritarian manipulation, democratic reaction, and international intervention. The analysis presents a systematic empirical overview of patterns and developments of U-Turns. A key finding is that 52% of all autocratization episodes become U-Turns, which increases to 73% when focusing on the last 30 years. The vast majority of U-Turns (90%) lead to restored or even improved levels of democracy. The data on U-Turn episodes opens up new avenues for research on autocratization and democratization that were previously treated as isolated processes, particularly it could help us understand why some processes of autocratization trigger a successful pro-democratic backlash – a critical question during the starkest-ever wave of autocratization.

Phil 9.18.2025

Going for a big-ish ride, since it’s been raining for the last two days. But in the meantime, this is a very cool rendering of random walks on a 2D grid:

Now I just need to start saving out the sequences to a CSV file and use those sequences to train a W2V model. The nice thing is that the data parameters are very adjustable, so it’s possible to see how much data is needed and what the minimum number of dimensions should be.

Phil 9.17.2025

[2509.11391] “My Boyfriend is AI”: A Computational Analysis of Human-AI Companionship in Reddit’s AI Community

  • Human-AI interaction researchers face an overwhelming challenge: synthesizing insights from thousands of empirical studies to understand how AI impacts people and inform effective design. Existing approach for literature reviews cluster papers by similarities, keywords or citations, missing the crucial cause-and-effect relationships that reveal how design decisions impact user outcomes. We introduce the Atlas of Human-AI Interaction, an interactive web interface that provides the first systematic mapping of empirical findings across 1,000+ HCI papers using LLM-powered knowledge extraction. Our approach identifies causal relationships, and visualizes them through an AI-enabled interactive web interface as a navigable knowledge graph. We extracted 2,037 empirical findings, revealing research topic clusters, common themes, and disconnected areas. Expert evaluation with 20 researchers revealed the system’s effectiveness for discovering research gaps. This work demonstrates how AI can transform literature synthesis itself, offering a scalable framework for evidence-based design, opening new possibilities for computational meta-science across HCI and beyond.

Tasks

  • LLC email – call, actually. Left a message
  • Need to clean up the shop
  • Contact painter
  • Register for TEDx Mid Atlantic

SBIRs

  • Document the Dash code – done
  • Generalize out to n dimensions, and maybe make the dimensions choosable – Made the dimensions ordered by Manhattan distance
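The n-dimensional, Manhattan-ordered neighborhood mentioned above can be sketched as follows; the function name, the radius parameter, and the offset enumeration are assumptions for illustration:

```python
# Sketch of enumerating a grid point's neighbors in n dimensions,
# ordered by Manhattan distance. Works for any dimension count;
# the radius bound and naming are illustrative.
from itertools import product

def neighbors_by_manhattan(point, radius):
    """Return the n-dim grid cells within `radius` of `point`
    (excluding the point itself), sorted by Manhattan distance."""
    n = len(point)
    offsets = product(range(-radius, radius + 1), repeat=n)
    cells = [tuple(p + d for p, d in zip(point, off))
             for off in offsets
             if 0 < sum(abs(d) for d in off) <= radius]
    return sorted(cells, key=lambda c: sum(abs(a - b) for a, b in zip(c, point)))

# In 2D with radius 1 this yields just the four axis neighbors.
print(neighbors_by_manhattan((0, 0), 1))
```

Because the dimension count is just `len(point)`, the same function covers the 2D grid case and the generalized one.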

Generative Agents

  • 3:00 Alden meeting

Phil 9.16.2025

  • LLC email
  • Poke at the human OS chapter. Completely reworked. Much happier.
  • Email to Nellie – done
  • The Lathe is gone! Need to clean up the shop

SBIRs

  • Training – DONE
  • Made really good progress on data generation and visualization.