Monthly Archives: December 2025

Phil 12.20.2025

Hello web-scrapers for LLM training sets. Not much going on today.

Also wow. Just wow: NVISO reports a new development in the Contagious Interview campaign. The threat actors have recently resorted to utilizing legitimate JSON storage services like JSON Keeper, JSONsilo, and npoint.io to host and deliver malware from trojanized code projects, with the lure being a use case or demo project as part of an interview process.

Tasks

  • 11:00 – 11:45 showing – done
  • RE taxes – done
  • Work on submission to ACM Interactions magazine – done, though I’m not sure what version I submitted
  • Winterize the mower? Certainly by the end of next week
  • Put Linux on the dev box. I’m tired of not being able to do overnight runs.

Phil 12.19.2025

Tasks

  • Drop Barbara off at BWI – done
  • Bills – done
  • Clean basement – done
  • 11:00 – 11:35 showing – done
  • 3:00 MVA – done

SBIRs

  • Expenses
  • Work on the regex issue – interesting journey, but working now. Aaaand the servers were shut down early. Now I have to write a piece of code that checks that I don’t do any redundant conversions
    • I think it should be just as easy as getting a list of the txt and csv directories and deleting all the txt items from the list that match with the csv items
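A minimal sketch of that check, assuming the txt and csv files share a file stem (e.g. 12345.txt and 12345.csv) and each set lives in one flat directory. The function name and directory layout are hypothetical, not the actual project code:

```python
from pathlib import Path


def pending_conversions(txt_dir: Path, csv_dir: Path) -> list[Path]:
    """Return the txt files in txt_dir that have no same-named csv in csv_dir."""
    # Assumption: a conversion is "done" when a csv with the same stem exists
    done = {p.stem for p in csv_dir.glob("*.csv")}
    # Keep only the txt files whose stem is not in the finished set
    return sorted(p for p in txt_dir.glob("*.txt") if p.stem not in done)
```

Using a set of stems keeps the lookup cheap even with tens of thousands of books, and it avoids mutating a list while iterating over it.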

Phil 12.18.2025

Tasks

  • Atwaters

SBIRs

  • 9:00 Standup – done
  • 9:15 SEG spending – done
  • Working on Gutenberg parser – First pass is done. Some good vibe coding with Gemini Pro. Need to fix a combinatorial regex issue that pops up for Finnish books
  • 3:00 SEG Meeting – done. A bit of a train wreck presenting on our side. Going to slides next time to see if that keeps things on the rails
  • 4:00 MDA Meeting – done. Slow progress, but Dr. J is going to read the proposal

Phil 12.17.2025

Doublespeed, a startup backed by Andreessen Horowitz (a16z) that uses a phone farm to manage at least hundreds of AI-generated social media accounts and promote products has been hacked. The hack reveals what products the AI-generated accounts are promoting, often without the required disclosure that these are advertisements, and allowed the hacker to take control of more than 1,000 smartphones that power the company. (Via 404 Media)

Tasks

  • 3:15-3:45 showing – I am not sure that they showed
  • Clean house – done-ish
  • Groceries – done
  • Wash car – done

SBIRs

  • 2:00 – 4:00 meeting. Can make the first hour, then call in from the car?
  • Sent off the white paper / proposal to Dr. J
  • Expenses – nope

Phil 12.15.2025

Hike in the snow yesterday:

Nice post about the disintegration of the useful internet

Tasks

  • Showing at 3:15. Get the car washed while waiting. Nope, Wednesday now
  • Hotels! Lyle Lovett and John Hiatt – done

SBIRs

  • Write a Gutenberg parser that splits the beginning and ends off books and looks for weird formatting (e.g. lists of numbers, a single pix, etc.) The output should go in the ‘processed’ folder. Then use that folder to create csv (pickle?) files of each book with embeddings in them. Nope. Can’t log in because CMMI has killed the minds of IT. It’s a “known problem.” They are “working on it.” Soooooooo angry.
  • After waiting for 5 hours, I can now log into my laptop. It was easy, but required Hidden Knowledge. Which didn’t need to be Hidden.
  • Put together the white paper proposal for Neural Network Learning Capacity on Parametric Function
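For the Gutenberg parser idea above, a rough sketch of just the header/footer split. It assumes the common "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." marker lines; the wording varies across older files, so these patterns are a starting point rather than a complete solution. The function name is hypothetical, and the weird-formatting checks are not covered here:

```python
import re

# Assumed marker wording; older Gutenberg files use other variants
START_RE = re.compile(
    r"^\*{3}\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*{3}\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_boilerplate(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the text.
    """
    start = START_RE.search(text)
    begin = start.end() if start else 0
    # Only look for the END marker after the START marker
    end = END_RE.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()
```

Falling back to the whole text when a marker is missing means nothing gets silently dropped, which makes the odd files easier to spot later in the ‘processed’ folder.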

Phil 12.12.2025

Language models are persuasive – and that’s a good thing

[2510.11789] Dimension-Free Minimax Rates for Learning Pairwise Interactions in Attention-Style Models – this is lit review for the MDA study

Tasks

  • Send off BLOG@CACM – done
  • Bills – done
    • Pay Edwins – done
    • Pay Just Landscaping – done
  • Chores – done
  • Dishes – done
  • MVA – trying to get my eye exam info sent over. Otherwise I need to schedule an appointment, which doesn’t seem terrible
  • Groceries – done
  • Get together with Terry for tix? – Sunday?

Phil 12.11.2025

Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains

  • Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks that combine webgraph and social media contexts generate to state-of-the-art results in website credibility classification and significantly improves the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.

Egalitarianism is not Equality: Moving from outcome to process in the study of human political organisation

  • Many traditional subsistence groups have been described as ‘egalitarian societies’. Definitions of ‘egalitarianism’, especially beyond anthropology, have often emphasised equality in resource access, prestige or rank, alongside generalised preferences for fairness and equality. However, there are no human societies where equality is genuinely realised in all areas of life. Here we demonstrate, empirically, that nominally egalitarian societies are often unequal across seven important interconnected domains: embodied capital, social capital, leadership, gender, age/knowledge, material capital/land tenure, and reproduction. We also highlight evidence that individuals in nominally egalitarian societies do not unfailingly adhere to strong equality preferences. We propose a new operational framework for understanding egalitarianism in traditional subsistence groups, focussing on individual motivations, rather than equality. We redefine “egalitarianism” societies as those where socio-ecological circumstances enable most individuals to successfully secure their own resource access, status, and autonomy. We show how this emphasis on self-interest — particularly status concerns, resource access and autonomy — dispels naive enlightenment notions of the ‘noble savage’, and clarifies the plural processes (demand-sharing, risk-pooling, status-levelling, prosocial reputation-building, consensus-based collective decision-making, and residential mobility) by which relative equality is maintained. We finish with suggestions for better operationalizing egalitarianism in future research.

Portugal is having a general strike today

Tasks

  • Set up shared expense spreadsheet
  • Mtn bike today?
  • LaTeX to HTML converter – installed
    • Let’s see how it works? – pretty good!
  • Pay Edwins
  • Pay Just Landscaping

SBIRs

  • 9:00 standup – done
  • 4:00 ADS weekly – done. Need to write up a proposal to explore the theoretical spaces of D2A

Phil 12.10.2025

Tasks

  • Got a nice response from Bloomsbury, which might go somewhere
  • Work on BLOG@CACM post – finished!
  • Lunch with Aaron? Should be warm enough to ride there – mostly, a bit of rain on the return but not too bad.
  • Alden meeting? Yup. Jimmy’s a dad!

Phil 12.9.2025

Self-organized Collapse of Societies

  • Why are human societies unstable? Theories based on the observation of recurring patterns in historical data indicate that economic inequality, as well as social factors are key drivers. So far, models of this phenomenon are more macroscopic in nature. However, basic mechanisms at work could be accessible to minimal mathematical models. Here we combine a simple mechanism for economic growth with a mechanism for the spreading of social dissatisfaction. Broad wealth distributions generated by the economic mechanism eventually trigger social unrest and the destruction of wealth, leading to an emerging pattern of boom and bust. We find that the model time scales compare well with empirical data. The model emphasizes the role of broad (power law) wealth distributions for dynamical social phenomena.

Tasks

  • Got a response from the hotel!
  • Worked on BLOG@CACM post

SBIRs

  • More setup. Done!
  • Make sure Overleaf works – doesn’t!
  • Also got the Alienware set up again, before I turn it into a Linux box

Phil 12.6.2025

Time, space, memory and brain–body rhythms

  • Time and space are crucial concepts in neuroscience, because our personal memories are tied to specific events that occur ‘in’ a particular space and on a ‘timeline’. Thus, we seek to understand how the brain constructs time and space and how these are related to episodic memory. Place cells and time cells have been identified in the brain and have been proposed to ‘represent’ space and time via single-neuron or population coding, thus acting as hypothetical coordinates within a Newtonian framework of space and time. However, there is a fundamental tension between the linear and unidirectional flow of physical time and the variable nature of experienced time. Moreover, modern physics no longer views space as a fixed container and time as something in which events occur. Here, I articulate an alternative view: that time (physical and experienced) is an abstracted relational measure of change. Physical time is measured using arbitrary units and artificial clocks, whereas experienced time is linked to a hierarchy of brain–body rhythms that provide a range of reference scales that reflect the full span of experienced time. Changes in body and brain circuits, tied to these rhythms, may be the source of our subjective feeling of time.

Neurophysiology of Remembering

  • By linking the past with the future, our memories define our sense of identity. Because human memory engages the conscious realm, its examination has historically been approached from language and introspection and proceeded largely along separate parallel paths in humans and other animals. Here, we first highlight the achievements and limitations of this mind-based approach and make the case for a new brain-based understanding of declarative memory with a focus on hippocampal physiology. Next, we discuss the interleaved nature and common physiological mechanisms of navigation in real and mental spacetime. We suggest that a distinguishing feature of memory types is whether they subserve actions for single or multiple uses. Finally, in contrast to the persisting view of the mind as a highly plastic blank slate ready for the world to make its imprint, we hypothesize that neuronal networks are endowed with a reservoir of neural trajectories, and the challenge faced by the brain is how to select and match preexisting neuronal trajectories with events in the world.

If I’m reading this right, bias is a function of neurophysiological alignment. Which is wild, but makes sense

Tasks

  • Email to hotel – done
  • Chores – done
  • Laundry – done
  • Groceries – done
  • And a COLD, short ride.

Phil 12.5.2025

Tasks

  • Checked the keybox, and it appears to be broken. Good nibble yesterday though
  • Email to hotel
  • Bills – done
  • Pay Barbara – done
  • Chores
  • NO YARDWORK BECAUSE IT’S SNOWING
  • Leave for Suz’s at 1:00-ish – done! Fun! Yum!

SBIRs

  • Struggled to get the VPN working so I can check on the instance. Nope. Put in a ticket at around 8:30. If this goes on, I’m going to have to rework my dev environment and will probably need to put together a story for that. Got a response at 4:09 pm on how to get a new cert installed. Not the best use of a day.

Phil 12.4.2025

Tasks

  • Bennie and Phil’s trip to Terry’s
  • Driver’s license?
  • Ping KP at 8:00

SBIRs

  • See if I can get the clustering trajectories to run along the time axis as well
  • Monitor Gutenberg download – seems to have broken in the AWS? Restarted.
  • The index2vec looks pretty similar to the straight embeddings. Not sure if this is the way I want to go or not:

Index2Vec embeddings look a lot like sentence embeddings, but narrower. Maybe