Monthly Archives: April 2025

Phil 4.26.2025

ECLeKTic is a new benchmark designed to evaluate the ability of large language models (LLMs) to transfer knowledge across different languages. It uses a closed-book question answering task, where models must rely on their internal knowledge to answer questions based on information relevant to a specific language.

Tasks

  • Bills – done
  • Lawn! Done
  • Phlox! Done
  • Groceries – done
  • Spothub
  • Dentist at 1:10 – leave at 12:00? – done, and a nice ride to boot
  • Aaron M at 5:30 – fun and done

GPT Agents

  • 3:00 LLM meeting
  • P33 Communities – something about how we’ve always had communities, and that there have always been communities based on virtual elements such as family, religion, language, and physical locations. And in some cases, the virtual is stronger than the physical; gerrymandering, redlining, ghettos, etc.

Phil 4.23.2025

Towards a Trajectory-powered Foundation Model of Mobility

  • This paper advocates for a geospatial foundation model based on human mobility trajectories in the built environment. Such a model would be widely applicable across many important societal domains currently addressed independently, including transportation networks, data-driven urban planning, tourism, and sustainability. Unlike existing large vision-language models, trained primarily on text and images, this foundation model should integrate the complex spatiotemporal and multimodal data inherent to mobility. This paper motivates this challenging research agenda, outlining many downstream applications that would be significantly impacted and enabled by such a model. It then explains the critical spatial, temporal, and contextual factors that such a model must capture in trajectories. Finally, it concludes with several research questions and directions, laying the foundations for future exploration in this exciting and emerging field.

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models

  • Last November we introduced two pre-trained, multi-purpose models to address many of the challenges of geospatial modeling: the Population Dynamics Foundation Model (PDFM), which captures the complex interplay between population behaviors and their local environment, and a new trajectory-based mobility foundation model. Since then, over two hundred organizations have tested the PDFM embeddings for the United States and we are expanding the dataset to cover the UK, Australia, Japan, Canada, and Malawi for experimental use by selected partners.
  • Social trajectories would be a straightforward adaptation of these models

Tasks

  • Delete old objects – done
  • Reach out to Chen Qifan?
  • Plant plants – beds are done. Broke a soaker hose that I have to replace. Still need to do the flower boxes
  • 4:00 Fidelity – done. Interesting!

SBIRs

  • 10:00 SAIC meeting – need to put together a slide. Nope, couldn’t agree on what to do.

Phil 4.22.2025

Tasks

SBIRs

  • Create tradeshow overleaf – done
  • 9:30 APL discussion – done
  • 11:00 NGC2 discussion – done
  • 3:00 Tradeshow demo – done
  • 3:30 code review – done

Phil 4.21.2025

Sheesh

Ugh

Warding Off Muscle Cramps As We Age

đŸ§—đŸ» CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

  • Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLustering-based Iterative Data Mixture Bootstrapping (CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. This strategy enables effective domain adaptation without relying solely on curated data. When continuously trained on 400B tokens with this mixture, our 950M model exceeds the state-of-the-art Llama-3.2-1B by 2.0% averaged across 12 general reasoning tasks. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling. Finally, we introduce ClimbLab, a filtered 1.3-trillion-token corpus with 20 clusters as a research playground, and ClimbMix, a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token budget. We analyze the final data mixture, elucidating the characteristics of an optimal data mixture.

Oh, this looks interesting: Values in the wild: Discovering and analyzing values in real-world language model interactions

  • In the latest research paper from Anthropic’s Societal Impacts team, we describe a practical way we’ve developed to observe Claude’s values—and provide the first large-scale results on how Claude expresses those values during real-world conversations. We also provide an open dataset for researchers to run further analysis of the values and how often they arise in conversations.

Need to follow up for sure

SBIRs

  • Slides
  • Stories – Just Phase II deliverables
  • 9:00 Sprint review
  • 3:00 Sprint planning

Phil 4.18.2025

Tasks

  • BSO – need to call
  • Start moving items out of the trailer – started. Mostly goodwill
  • Dishes – done
  • Mow lawn
  • Start house cleaning – done
  • Bills

SBIRs

  • See if I can finish off the saving and loading of the model for inference – DONE! Everything looks much better
  • Sprint slides and adjust the points – checked. It all seems reasonable
  • Write one story for next week that’s just “write SBIR deliverables”

Phil 4.17.2025

LLM use in the wild:

Had a great early season ride in the PA hills

SBIRs

  • Going to have to spend some time focusing on the final deliverables. I’m going to need to write the final quarterly report and a summary. Should be able to finish next week
  • Finished the first pass through KA! Need to find an editor

Phil 4.16.2025

This is pretty wild: Generate videos in Gemini and Whisk with Veo 2. Not sure if I have a good use case, but I think I’d like to play around with something more abstract.

Wise – done

BSO

Nice album: DVOŘÁK, A.: Greatest Melodies (arr. P. Breiner for piano)

SBIRs

  • Create a MinimumTrain.py and MinimumInfer.py in the experiments directory, and get those working with the debug data – done
    • Dataloader – done
    • Model – done
    • Training loop – done
    • Save out pth weights and structure – trickier than you would think if you want to load a model without prior knowledge of its structure
    • Load in and test
    • It should be possible to combine both, so that a model is trained, saved, and evaluated in one script. Then we can do a grid search of some basic hyperparameters and keep track of the accuracy
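A minimal sketch of what that grid search might look like. Everything here is an assumption for illustration: train_and_evaluate is a hypothetical stand-in for the real train, save, load, and evaluate cycle, and it returns a made-up score so the loop runs without any ML library. The hyperparameter names are placeholders too.

```python
import itertools


def train_and_evaluate(learning_rate, hidden_size, epochs):
    """Hypothetical stand-in for train -> save -> load -> evaluate.

    Returns a fake accuracy so the sketch runs on its own. In the real
    script this would call the training and inference code.
    """
    # Placeholder scoring: best at learning_rate=0.01, mildly rewards
    # larger hidden sizes and more epochs.
    score = 1.0 - abs(learning_rate - 0.01) * 10 - 1.0 / hidden_size - 1.0 / (epochs * 10)
    return max(0.0, score)


def grid_search(grid):
    """Try every combination in the grid and record its accuracy."""
    names = list(grid.keys())
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append({"params": params, "accuracy": train_and_evaluate(**params)})
    # Highest accuracy first
    results.sort(key=lambda r: r["accuracy"], reverse=True)
    return results


grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_size": [16, 64],
    "epochs": [10, 50],
}
results = grid_search(grid)
best = results[0]
print(best["params"], round(best["accuracy"], 4))
```

Keeping every result, not just the best one, makes it easy to dump the whole table to a file later and see how sensitive the accuracy is to each setting.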

GPT Agents

  • 3:00 Alden meeting. Ask about NIST people who might need jobs

Phil 4.15.2025

I think my plumeria died over the winter. It had stopped growing a while back though.

State Terror: A brief guide for Americans

  • In the history of state terror, the escape from law into coercion takes three forms, all of which were on display, incipiently, in the White House yesterday: the leader principle; the state of exception; and the zone of statelessness.

SBIRs

  • 9:00 standup
  • Working on the model, which is not training correctly at all. I made some simple trig data, sin(x), cos(x), sin(-x) as the input and -cos(x) as the target. This is simple, straightforward stuff. And yet, I get the following:
  • I’m going to do a rewrite from scratch and see if that version does any better. If it does, then we know the problem is in the training code
  • 3:00 Tradeshow demo
  • 3:30 catch up and coordination
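For reference, a rough sketch of the trig sanity-check data described above, plus a trivial baseline. This is not the model under test. The linear fit is my own assumption, chosen because the target, -cos(x), is just -1 times the second input, so even plain gradient descent on three weights should drive the error to nearly zero. If a baseline like this fits and the real model doesn't, that points at the training code rather than the data. The function names and sample count are made up.

```python
import math


def make_trig_data(num_samples=200, x_max=2 * math.pi):
    """Inputs are [sin(x), cos(x), sin(-x)]; the target is -cos(x)."""
    xs = [x_max * i / (num_samples - 1) for i in range(num_samples)]
    inputs = [[math.sin(x), math.cos(x), math.sin(-x)] for x in xs]
    targets = [-math.cos(x) for x in xs]
    return inputs, targets


def predict(weights, row):
    return sum(w * v for w, v in zip(weights, row))


def mse(weights, inputs, targets):
    return sum((predict(weights, r) - t) ** 2 for r, t in zip(inputs, targets)) / len(inputs)


def fit_linear(inputs, targets, lr=0.1, epochs=200):
    """Full-batch gradient descent on a 3-weight linear model (no bias)."""
    weights = [0.0, 0.0, 0.0]
    n = len(inputs)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for row, target in zip(inputs, targets):
            err = predict(weights, row) - target
            for j in range(3):
                grad[j] += 2 * err * row[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


inputs, targets = make_trig_data()
weights = fit_linear(inputs, targets)
print("weights:", [round(w, 3) for w in weights])
print("mse:", mse(weights, inputs, targets))
```

The weight on cos(x) should end up close to -1 and the loss close to zero.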

GPT Agents

  • KA book
  • P33

Phil 4.14.2025

Watch for about ten seconds. Really. Stop everything and watch

Trump & Bukele Plot US Citizen Detention In Salvadoran Torture Camps, While Defying Supreme Court Via Gibberish Responses To Reporters

In other news, I had a very nice weekend of riding in the Spring-ish weather

9:00 Dentist

International driver’s license? Nope

SBIRs

  • No APL meeting today
  • Refactor config generator to use a base class
  • Read through Aaron’s notes and discuss
  • KA book if possible

GPT Agents

  • Write email suggesting the article is something from Shimei’s personal perspective that ties together her history and IS.
  • Put together summaries of the completed sections

Phil 4.11.2025

This is going to cause so much damage – deliberate and unintended (NY Times)

Checking out Google’s AI coder IDE, Firebase. Wow! That took… 5 minutes

Tasks

  • Bills – done
  • Dishes – done
  • Lawn, if it doesn’t rain – raining. Lots.
  • Goodwill / Trader Joes – done / International driver’s license
  • Chores – done

SBIRs

  • Have a thought about the config generator. Make one base config class and then all inheriting classes get their own default config Dict
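A quick sketch of how that could look, with the caveat that all the names here (BaseConfig, TrainConfig, InferConfig, DEFAULTS, and the keys) are hypothetical and not taken from the actual generator. The base class owns the merging and serialization, and each inheriting class only declares its own default dict.

```python
import copy
import json


class BaseConfig:
    """Hypothetical base class: merges defaults and handles output."""

    DEFAULTS = {"name": "base", "seed": 0}

    def __init__(self, **overrides):
        # Walk the inheritance chain from the base class down, so that
        # subclass defaults override the ones they inherit.
        merged = {}
        for cls in reversed(type(self).__mro__):
            merged.update(copy.deepcopy(cls.__dict__.get("DEFAULTS", {})))
        # Reject keys that no class in the chain declares
        unknown = set(overrides) - set(merged)
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        merged.update(overrides)
        self.values = merged

    def to_json(self):
        return json.dumps(self.values, indent=2, sort_keys=True)


class TrainConfig(BaseConfig):
    DEFAULTS = {"name": "train", "learning_rate": 0.001, "epochs": 10}


class InferConfig(BaseConfig):
    DEFAULTS = {"name": "infer", "batch_size": 32}


train_cfg = TrainConfig(epochs=50)
infer_cfg = InferConfig()
print(train_cfg.to_json())
print(infer_cfg.to_json())
```

The deepcopy keeps one instance from accidentally mutating the class-level defaults, which matters once the dicts start holding nested lists or dicts.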

GPT agents

  • 3:00 Meeting. Really interesting. Lots of good content. I would love to have Shimei write something from a personal perspective that ties together her history and IS.

Phil 4.10.2025

Tasks

  • Groceries!

SBIRs

  • Need to ping T about expenses
  • 9:00 Standup
  • 1:00 RTAT

GPT Agents

  • A Survey of Social Cybersecurity: Techniques for Attack Detection, Evaluations, Challenges, and Future Prospects
    • In today’s digital era, the Internet, especially social media platforms, plays a significant role in shaping public opinions, attitudes, and beliefs. Unfortunately, the credibility of scientific information sources is often undermined by the spread of misinformation through various means, including technology-driven tools like bots, cyborgs, trolls, sock-puppets, and deep fakes. This manipulation of public discourse serves antagonistic business agendas and compromises civil society. In response to this challenge, a new scientific discipline has emerged: social cybersecurity.
  • Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations
    • Recent advancements in AI have reinvigorated Agent-Based Models (ABMs), as the integration of Large Language Models (LLMs) has led to the emergence of “generative ABMs” as a novel approach to simulating social systems. While ABMs offer means to bridge micro-level interactions with macro-level patterns, they have long faced criticisms from social scientists, pointing to e.g., lack of realism, computational complexity, and challenges of calibrating and validating against empirical data. This paper reviews the generative ABM literature to assess how this new approach adequately addresses these long-standing criticisms. Our findings show that studies show limited awareness of historical debates. Validation remains poorly addressed, with many studies relying solely on subjective assessments of model `believability’, and even the most rigorous validation failing to adequately evidence operational validity. We argue that there are reasons to believe that LLMs will exacerbate rather than resolve the long-standing challenges of ABMs. The black-box nature of LLMs moreover limit their usefulness for disentangling complex emergent causal mechanisms. While generative ABMs are still in a stage of early experimentation, these findings question of whether and how the field can transition to the type of rigorous modeling needed to contribute to social scientific theory.

Phil 4.9.2025

A spokesperson for UMBC said in an email that four international students had their visas canceled with no prior notice or explanation. A UMD spokesperson said only that the campus was among those nationwide whose students suddenly lost their ability to legally stay in the U.S.

UMBC discovered the visa revocations during a daily audit of the Student and Exchange Visitor Information System, also known as SEVIS, said Cherie Parker, director of media relations for the university. That website is run by the U.S. Department of Homeland Security to maintain information regarding student visas.

SBIRs

  • Did a lot of work on the SPIE paper
  • Read the game support request, and had a bunch of questions
  • KA book
  • 3:00 meeting? Maybe? Can’t find a link. Nope, it was in person. Had a good chat with Clay after

GPT Agents

P33 – finished the first pass through the psychology/sociology section

Phil 4.8.2025

Progress is not always a smooth or merry ride. For a few decades, nations live according to one paradigm. Then it stops working and gets destroyed. When the time comes to build a new paradigm, progressives talk about economic redistribution; conservatives talk about cultural and civic repair. History shows that you need both: Recovery from national crisis demands comprehensive reinvention at all levels of society. If you look back across the centuries, you find that this process requires several interconnected efforts. – David Brooks, I should have seen this coming

This is wonderful!

Moved more money out of stocks

Having a nice chat with Aaron M.

SBIRs

  • 9:00 Standup
  • Helped Ron with more cites
  • S&T meeting this afternoon with Aaron?
  • More KA