Category Archives: Phil

Phil 11.13.2025

Tasks

  • Windows today. Noonish? Done and lovely
  • Slow cook a chicken – cooking! Cooked

SBIRs

  • There may be a GPT-5.1? Need to check the available models
  • 9:00 Standup – done
  • 10:00 Ron’s meeting – done
  • 3:00 SEG – done. Fast Matt apparently forgot. I need to read his notes later
  • 4:00 ADS. Mention that I won’t be able to make it next week – canceled
  • UMAP! Working!
  • These are embeddings of 5 scenarios that should be in a roughly similar space. I’m a bit surprised that they don’t overlap. Probably need a lot more scenarios. I’ll make a few more and see how that changes things
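A quick sanity check before generating more scenarios is the cosine similarity between the scenario embeddings; high off-diagonal values would suggest the points should overlap after reduction. A hedged sketch, with random data standing in for the real StoryEmbedding output:

```python
import numpy as np

# Stand-in data: 5 scenario embeddings of dimension 384 (both numbers
# are assumptions for illustration, not the real model's output).
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(5, 384))

# Normalize rows; the Gram matrix is then the cosine similarity matrix.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sim = unit @ unit.T
```

Random vectors in high dimensions are nearly orthogonal, so near-zero off-diagonal values here would match the no-overlap behavior in the plot.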

Phil 11.12.2025

Tasks

  • Groceries! Done
  • Goodwill – done
  • Nag Aaron for paperwork – done

SBIRs

  • 10:00 Ron’s meeting – postponed
  • Now that I have the displays up and running, either get started on UMAP or play with HDBSCAN
  • I moved all the Plotly code into a DfScatter3D class, since the clusterers and reducers shouldn’t have to care about rendering.
  • I created a Python script that calls HDBSCAN on the same data I’ve been using, but without the assigned clustering. It’s really easy to use:
    import pandas as pd
    import hdbscan

    df = pd.DataFrame(l)  # l is the list of {'x', 'y', 'z'} point dicts
    blobs = df.values.tolist()
    # prediction_data=True lets approximate_predict() work later
    clusterer = hdbscan.HDBSCAN(prediction_data=True)
    clusterer.fit(blobs)
    df['cluster'] = clusterer.labels_
  • And that gives results that are almost as good as the assigned values:
  • These results are with the default values. Note that the points that can’t be assigned have dark coloring
  • The next thing I’ll do is create a line along the X axis and use hdbscan.approximate_predict(clusterer, x_axis_points) to see where they get assigned.
  • It’s the same sort of pattern as before, though you do have to concat the two DataFrames together for proper rendering:
    df2 = pd.DataFrame(l)  # l now holds the test points along the X axis
    test_points = df2.values.tolist()
    test_labels, strengths = hdbscan.approximate_predict(clusterer, test_points)
    df2['cluster'] = test_labels
    df3 = pd.concat([df, df2])  # combined frame renders in one scatterplot
  • Which gives us this:
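The construction of the test line itself isn’t shown above; a minimal sketch of how those points might be built (the range and count are assumptions, chosen to span the five populations spaced by scalar = 5.0):

```python
import numpy as np
import pandas as pd

# A line of probe points along the X axis, with y and z held at zero.
xs = np.linspace(-15.0, 15.0, 50)
l = [{'x': float(x), 'y': 0.0, 'z': 0.0} for x in xs]
df2 = pd.DataFrame(l)
```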

UMBC

  • 3:00 Alden

Phil 11.11.2025

OpenRouter bills itself as “the first LLM marketplace,” one that “has grown to become the largest and most popular AI gateway for developers. We eliminate vendor lock-in while offering better prices, higher uptime, and enterprise-grade reliability.” They publish all kinds of interesting data about the models they serve (rankings), and host piles of big-name and obscure models.

Mapping the Latent Past: Assessing Large Language Models as Digital Tools through Source Criticism

  • This article examines how digital historians can use large language models (LLMs) as research tools while critically assessing their limitations through source criticism of their underlying training data. Case studies of LLM performance on historical knowledge benchmarks, oral history transcriptions, and OCR corrections reveal how these technologies encode patterns of whose history has been digitised and made computationally legible. These variations in performance across linguistic and temporal domains reveal the uneven terrain of knowledge encoded within generative AI systems. By mapping this “jagged frontier” of AI capabilities, historians can evaluate LLMs not just as tools but as historical sources shaped by the scale and diversity of their training. The article concludes by examining how historians can develop new forms of source criticism to navigate generative AI’s uneven potential while contributing to broader debates about these technologies’ societal impact.

Tasks

  • Finish slides – scroll through notes for links – done
  • Check in and ping Sande – done
  • 4:30 class – done!

SBIRs

  • Change df so that cluster id is a column and see if I can get that to work
  • That works nicely. Here’s the code that creates the df:
    import numpy as np
    import pandas as pd

    num_populations = 5
    num_samples = 1000 * num_populations
    scalar = 5.0  # spacing between population centers along x
    l = []
    for i in range(num_samples):
        c = np.random.randint(0, num_populations)
        l.append({'cluster': f"c{c}",
                  'x': np.random.normal() + (float(c) - num_populations / 2.0) * scalar,
                  'y': np.random.normal(),
                  'z': np.random.normal()})

    df = pd.DataFrame(l)
  • Here’s the rendering code:
    import plotly.express as px

    fig = px.scatter_3d(df, x='x', y='y', z='z', color='cluster')
    fig.update_traces(marker=dict(size=3))

And here are the results:

Phil 11.10.2025

Tasks

  • Finalize LLC paperwork (signed?) Some typos and edits. Sent back and asked about signing
    • Bank stuff – started
  • Ping Aaron and see if he wants to do BS stuff – done
  • Turn off water – done
  • Slides – started
  • Pay dentist
  • Barbara noonish – fun!

SBIRs

  • Start plotting, UMAP and clustering.
  • Split off the StoryEmbedding class since that will be needed for this effort too
  • Got the 3D scatterplot running in Plotly/dash, and got it hooked up to a DataFrame:
  • Tomorrow we’ll try getting UMAP to work

Phil 11.7.2025

Computational Turing Test Reveals Systematic Differences Between Human and AI Language

  • Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior, based on the assumption that they can generate realistic, human-like text. Yet this assumption remains largely untested. Existing validation efforts rely heavily on human-judgment-based evaluations — testing whether humans can distinguish AI from human output — despite evidence that such judgments are blunt and unreliable. As a result, the field lacks robust tools for assessing the realism of LLM-generated text or for calibrating models to real-world data. This paper makes two contributions. First, we introduce a computational Turing test: a validation framework that integrates aggregate metrics (BERT-based detectability and semantic similarity) with interpretable linguistic features (stylistic markers and topical patterns) to assess how closely LLMs approximate human language within a given dataset. Second, we systematically compare nine open-weight LLMs across five calibration strategies — including fine-tuning, stylistic prompting, and context retrieval — benchmarking their ability to reproduce user interactions on X (formerly Twitter), Bluesky, and Reddit. Our findings challenge core assumptions in the literature. Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression. Instruction-tuned models underperform their base counterparts, and scaling up model size does not enhance human-likeness. Crucially, we identify a trade-off: optimizing for human-likeness often comes at the cost of semantic fidelity, and vice versa. These results provide a much-needed scalable framework for validation and calibration in LLM simulations — and offer a cautionary note about their current limitations in capturing human communication.

Tasks

  • Bills – done
  • Dishes – done
  • Chores – done
  • LLC stuff – skimmed and found a typo. Need to decide on text for “purpose.” I’m thinking of something along the lines of the development and application of ethical machine-learning and generative AI solutions, and to promote awareness of malicious and nefarious uses of these technologies. Sent draft to Aaron – done
  • Slides for Tuesday
  • Water plants – done
  • Mow lawn – done
  • Bike shoes – done
  • Storage run – done

Phil 11.6.2025

Debunking “When Prophecy Fails”

  • In 1954, Dorothy Martin predicted an apocalyptic flood and promised her followers rescue by flying saucers. When neither arrived, she recanted, her group dissolved, and efforts to proselytize ceased. But When Prophecy Fails (1956), the now-canonical account of the event, claimed the opposite: that the group doubled down on its beliefs and began recruiting—evidence, the authors argued, of a new psychological mechanism, cognitive dissonance. Drawing on newly unsealed archival material, this article demonstrates that the book’s central claims are false, and that the authors knew they were false. The documents reveal that the group actively proselytized well before the prophecy failed and quickly abandoned their beliefs afterward. They also expose serious ethical violations by the researchers, including fabricated psychic messages, covert manipulation, and interference in a child welfare investigation. One coauthor, Henry Riecken, posed as a spiritual authority and later admitted he had “precipitated” the climactic events of the study.

How the world’s richest man is boosting the British right | UK News | Sky News

  • For nine months, Sky News’ Data and Forensics team has been investigating whether X’s algorithm amplifies right-wing and extreme content. It does. Read our full methodology here.

Tasks

  • Water plants – done
  • Storage run – done
  • Looks like the first freeze will be next Monday night into Tuesday morning. See what can be pulled in from the garden

SBIRs

  • 9:00 Standup – done
  • 1:30 NSTIC something? Dull
  • 4:00 Weekly – wound up being a capability brief?

Phil 11.5.2025

Tasks

  • Fix fixie flat
  • Storage run

SBIRs

  • Run the story generator for all variants. I realize that I want to try some political trajectories too, if the results for this look good. Also, because the number of walks will be low, this should be a 2D map at first. – done
  • Started on the extractor/embedder

LLM Agents

  • 2:30 meeting – fun! Spent a lot of time talking about coffee. And I have a talk I need to give next Tuesday
  • Add something about right-wing propaganda bots to the “what to do” section – done

Phil 11.4.2025

Right-Wing Chatbots Turbocharge America’s Political and Cultural Wars

  • Once pitched as dispassionate tools to answer your questions, A.I. chatbots are now programmed to reflect the biases of their creators.

Tasks

  • Ping Nellie – done
  • Fix fixie flat
  • Pay Michael – done
  • Pinged Aaron
  • Groceries – done

SBIRs

  • 9:00 Standup
  • Assemble a list of ChatUnits as the story develops and see how that works – It works very well! Generated a bunch of stories and tweaked the prompts. Created two story methods – one for RAG and one control.
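The ChatUnit internals aren’t shown here; a minimal sketch of the accumulate-and-retrieve pattern described above (all names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class ChatUnit:
    """One prompt/response exchange from the story so far."""
    prompt: str
    response: str

@dataclass
class Story:
    units: list = field(default_factory=list)

    def add(self, prompt: str, response: str) -> None:
        self.units.append(ChatUnit(prompt, response))

    def context(self, n: int = 3) -> str:
        # Join the last n exchanges into a context block for the next
        # prompt (the RAG path); the control path would skip this.
        return "\n".join(f"{u.prompt}\n{u.response}" for u in self.units[-n:])
```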

Phil 11.3.2025

An a16z-Backed Startup Sells Thousands of ‘Synthetic Influencers’ to Manipulate Social Media as a Service

  • A new startup backed by one of the biggest venture capital firms in Silicon Valley, Andreessen Horowitz (a16z), is building a service that allows clients to “orchestrate actions on thousands of social accounts through both bulk content creation and deployment.” Essentially, the startup, called Doublespeed, is pitching an astroturfing AI-powered bot service, which is in clear violation of policies for all major social media platforms. 

Tasks

  • Ping Nellie
  • Water plants – done
  • Fix fixie flat
  • Email Michael after tweaking the first refusal response – done
  • Ping Nathan and/or Edwins – done

SBIRs

  • Keep on working with the story prompting. I think I need to graduate the story objects to their own file as they are getting to be good sized.
    • Add binary saving and loading
    • Finish a full story
    • See if I can tweak the model to do less reasoning for these tests
    • Start playing around with getting embeddings and clustering. The goal will be to find some good ratio of number of clusters to cluster size. Somehow, I think this is related to the value of the skip-gram acting as an attractor/repulsor. There is probably some kind of magic ratio that I can look for
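That ratio is easy to compute once a clusterer has run; a sketch with a hand-made label array standing in for HDBSCAN’s labels_ (-1 marks unassigned points):

```python
import numpy as np

# Stand-in for clusterer.labels_: three clusters plus two noise points.
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, -1, -1])

assigned = labels[labels >= 0]
cluster_ids, sizes = np.unique(assigned, return_counts=True)
num_clusters = len(cluster_ids)
ratio = num_clusters / sizes.mean()  # clusters per unit of mean cluster size
```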

Phil 10.31.2025

Verification, Deliberation, Accountability: A new framework for tackling epistemic collapse and renewing democracy

  • In a democracy, the framework of constitutions, laws, and institutions provides structure, but substance only exists when citizens can trust that truth is tested, that their voices count, and that those in power can be held responsible for what they do. These three conditions, verification, deliberation, and accountability, form the structural minimum of democracy. They are not aspirational goals, but the foundations on which all other democratic values depend.

Tasks

  • Bills – done
  • MD Food bank – done
  • More LLC stuff – first pass done
  • Dishes – done
  • Chores – done
  • Safe Deposit Box – done

SBIRs

  • 1:30 Meeting – done

Phil 10.30.2025

Tasks

  • Pictures to Goodwill – done
  • MD Food bank
  • Respond to Nabil – done
  • More LLC stuff
  • Storage run – done
  • Trader Joe’s? – done

SBIRs

  • 9:00 standup – done
  • 9:30 SimAccel – done
  • 1:00 LLMs with John – done. Worthless
  • 3:00 SEG – done
  • 4:00 ADS – done

Phil 10.29.2025

GenAI Fast-tracks into the enterprise

  • It is the intent of Wharton to annually produce an outlook on AI Industry adoption. GBK Collective led the inaugural study in 2023 alongside Wharton Professor Stefano Puntoni. In 2024, we began our joint study. Now in its third year, this repeated cross-sectional study is sponsored by Wharton Human-AI Research, part of the Wharton AI & Analytics Initiative at the Wharton School, University of Pennsylvania; GBK Collective performed research and analysis.

“A community of unknowledge”: A social-psychological model of the self-reinforcing cycle of social identity-driven willful ignorance and conspiracy beliefs

  • This paper explores willful ignorance as a socially motivated, group-based phenomenon closely tied to conspiracy beliefs. While prior research has emphasized individual motives, we highlight how groups actively ignore dissonant information to protect identity, cohesion, and status. Drawing on organizational, socio-political, and historical contexts, we show how both powerful and marginalized groups use willful ignorance to sustain conspiratorial narratives that affirm their worldview and deflect moral accountability. We propose the Social Identity-Driven Willful Ignorance and Conspiracy Beliefs (SIDWI-CB) model, a dual group-based motivational pathways framework that explains how intergroup symbolic and realistic motivations drive selective ignorance fueled by conspiracy beliefs. This framework offers a new lens for understanding how identity and power dynamics shape belief persistence, with broad implications for addressing polarization, misinformation and conspiracy beliefs, and collective decision making.

What Elon Musk’s Version of Wikipedia Thinks About Hitler, Putin, and Apartheid

  • Grokipedia is the latest step in Musk’s obsession with the mainstream media and institutions he believes have poisoned the world and the web. In 2022, soon after Musk purchased Twitter, he reinstated banned extremist accounts and appeared to tweak the platform’s recommendation algorithm, leading to what independent researchers called “unprecedented” rises in hate speech. Musk effectively turned Twitter, which he renamed X, into a bastion of white supremacy. A year later, Musk’s company xAI launched the chatbot Grok as an antidote to the left-wing biases he perceived in other top AI products; Grok has since obsessed over a conspiracy theory about “white genocide” particularly in South Africa, praised Hitler, told me what the “good races” are, and explicitly repeated Musk’s personal views in response to queries about controversial topics.

Tasks

  • Pictures to Goodwill
  • Groceries – done
  • Water plants – rained! Done!
  • MD Food bank

SBIRs

  • More prompt generation – got most of the parts working. Went down a rabbit hole for streaming, but realized it was a distraction.
  • Test runs on RAG-based story generation – started

Phil 10.28.2025

Early hominins and the reversal of dominance hierarchy – ScienceDirect

  • Sometime between our last common ancestor with chimpanzees and today, our hominin ancestors transitioned from bully-dominated dominance hierarchy to reversed dominance hierarchy in which bullies were actively suppressed. This paper presents an evolutionary analysis of this transition to identify its causes and possible timing. The analysis shows that the transition requires a sufficiently low fitness cost of helping in bully-suppressing coalitions and a just-right amount of drift, and that the transition goes through a highly violent phase before its completion. An examination of different forms of early-hominin bullying suggests that the transition did not occur during the Miocene Epoch, should have occurred by the time of Homo erectus, but could have occurred earlier, possibly in the Pliocene before the emergence of Homo.

Tasks

  • Pix of the house – done
  • Water plants
  • MD Food bank

SBIRs

  • Start testing out multi-step RAG prompting – building prompts and reading in YAML
  • 1:00 Technical tag up – that was painful
  • 3:30 NN meeting with Emerson – Our Python doesn’t have tkinter?
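The prompt-building-from-YAML step above might look something like this (a hedged sketch; the file layout and keys are assumptions, not the real schema):

```python
import yaml

# Illustrative multi-step prompt file; the real YAML will differ.
prompt_yaml = """
steps:
  - name: outline
    prompt: "Outline a story about {topic}."
  - name: draft
    prompt: "Expand this outline: {outline}"
"""

config = yaml.safe_load(prompt_yaml)
prompts = {s['name']: s['prompt'] for s in config['steps']}
```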

LLMs

  • Put the article in the CACM template and start to cite and edit – done! Still need to put together “what can be done”

Phil 10.27.2025

New AI-powered anti-scam tool wins praise from UK fraud minister | Scams | The Guardian

  • Scam Intelligence lets customers of the digital bank Starling upload images of items and ads on online marketplaces such as Facebook Marketplace, eBay, Vinted and Etsy, which it analyses for signs of fraud before serving up personalised advice “in seconds”.
  • Scam Intelligence was built using Gemini, Google’s AI chatbot, in collaboration with Google Cloud, and was due to be unveiled at a fintech event in Las Vegas on Monday.
  • During testing it increased the rate at which customers cancelled payments by 300% – suggesting it has encouraged customers to pause and reflect before making a purchase.
  • BlueSky replies are interesting and essentially anti-AI. Public adoption of these kinds of technology could be very complicated

MiniMax-M2 redefines efficiency for agents. It’s a compact, fast, and cost-effective MoE model (230 billion total parameters with 10 billion active parameters) built for elite performance in coding and agentic tasks, all while maintaining powerful general intelligence. With just 10 billion activated parameters, MiniMax-M2 provides the sophisticated, end-to-end tool use performance expected from today’s leading models, but in a streamlined form factor that makes deployment and scaling easier than ever.

Tasks

  • Window cleaners – need to take some pix
  • Bank – done. Need some more paperwork
  • Send Philipp a quick note – done

SBIRs

  • 9:00 Standup – done
  • 2:00 IRAD – done
  • 3:00 Sprint planning – done

LLMs

  • Outline Grok article – got a rough first pass, and learned the origins of Baal, which I thought was this!