Phil 9.11.2024

It was a lovely early fall day 23 years ago. I don’t remember a cloud in the sky. Man, those memories are vivid.

Catonsville cleanup day 12:00 – 2:00. Nope, it’s the 14th. Don’t know how I got confused.

SBIRs

  • 12:00 CEO Employee town hall
  • 1:00 AI demo. I think this is just a capability thing?
  • Finished the first pass of the white paper!

Phil 9.10.2024

Baiting the bot

  • LLM chatbots can be engaged in endless “conversations” by considerably simpler text generation bots. This has some interesting implications.

SBIRs

  • 9:00 Standup
  • More white paper – got through the research objectives

Phil 9.9.2024

SBIRs

  • Added a bunch of links to the USNA sources for the capstone project
  • 10:30 NG demo meeting?
  • Made good progress on the white paper

Also took a big load of basement to the local acceptance facility. They don’t take paint, but the big one in Cockeysville takes… well, pretty much everything. I’ll load up today and make another run tomorrow.

Phil 9.6.2024

Unexpected Benefits of Self-Modeling in Neural Systems

  • Self-models have been a topic of great interest for decades in studies of human cognition and more recently in machine learning. Yet what benefits do self-models confer? Here we show that when artificial networks learn to predict their internal states as an auxiliary task, they change in a fundamental way. To better perform the self-model task, the network learns to make itself simpler, more regularized, more parameter-efficient, and therefore more amenable to being predictively modeled. To test the hypothesis of self-regularizing through self-modeling, we used a range of network architectures performing three classification tasks across two modalities. In all cases, adding self-modeling caused a significant reduction in network complexity. The reduction was observed in two ways. First, the distribution of weights was narrower when self-modeling was present. Second, a measure of network complexity, the real log canonical threshold (RLCT), was smaller when self-modeling was present. Not only were measures of complexity reduced, but the reduction became more pronounced as greater training weight was placed on the auxiliary task of self-modeling. These results strongly support the hypothesis that self-modeling is more than simply a network learning to predict itself. The learning has a restructuring effect, reducing complexity and increasing parameter efficiency. This self-regularization may help explain some of the benefits of self-models reported in recent machine learning literature, as well as the adaptive value of self-models to biological systems. In particular, these findings may shed light on the possible interaction between the ability to model oneself and the ability to be more easily modeled by others in a social or cooperative context.
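One way to picture the auxiliary task described above is as an extra loss term. The sketch below is a minimal illustration, not the paper's implementation: it assumes the self-model error is a mean squared error between the network's actual hidden activations and its own prediction of them, and that the two losses are combined as a weighted sum. The names `self_model_loss`, `combined_loss`, and `aux_weight` are hypothetical.

```python
def self_model_loss(hidden, predicted_hidden):
    """Mean squared error between actual and self-predicted activations (assumed form)."""
    if not hidden or len(hidden) != len(predicted_hidden):
        raise ValueError("activation vectors must be non-empty and the same length")
    return sum((h - p) ** 2 for h, p in zip(hidden, predicted_hidden)) / len(hidden)


def combined_loss(task_loss, hidden, predicted_hidden, aux_weight=1.0):
    """Primary task loss plus the weighted auxiliary self-modeling loss (assumed weighted sum)."""
    return task_loss + aux_weight * self_model_loss(hidden, predicted_hidden)


# Toy values, made up for illustration only
hidden = [0.5, -1.0, 2.0]
predicted = [0.0, -1.0, 1.0]
total = combined_loss(task_loss=0.3, hidden=hidden, predicted_hidden=predicted, aux_weight=2.0)
```

Raising `aux_weight` corresponds to placing more training weight on the self-modeling task, which is the knob the abstract says made the complexity reduction more pronounced.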

Chores

  • House – Done
  • Bills – Done
  • Lawn – done
  • Groceries – done
  • See if I can fix the door on the truck.
  • Start moving things out of the basement and into the garage – ordered boxes
  • T.W. Ellis – done

Phil 9.5.2024

Dialect prejudice predicts AI decisions about people’s character, employability, and criminality

  • Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models’ overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.

SBIRs

  • Finished and sent ONR email
  • Worked on the white paper. Mostly collecting things and fleshing out the project.
  • And I made a picture!
  • 2:00 SimAccel meeting
  • 3:05 LM collaboration meeting
  • It’s interesting to me how these meetings went. Lots of discussion on how to integrate the work discussed in the white paper, but really, it was an excuse for them to “put AI in the system.” I think this is going to be hard to keep on track and the amount of money will pull everyone onto the project. And that will be the end of our IRAD department.
  • 4:30 Book club

GPT Agents

  • 2:45 meeting. Will need to drop at 3:05. Made some organizational progress, and found out that there are no page limits, so the summaries don’t have to be so strict.

Phil 9.4.2024

Beijing-Backed Trolls Target U.S. Voters as Election Nears (MSN paywall-free link)

  • “One of the world’s largest covert online influence operations, an operation run by Chinese state-linked actors, has become more aggressive in its efforts to infiltrate and sway U.S. political conversations ahead of the election,” said Jack Stubbs, chief intelligence officer at the research firm Graphika, which published the report Tuesday on Spamouflage’s alleged activities.

Two RT Employees Indicted for Covertly Funding and Directing U.S. Company that Published Thousands of Videos in Furtherance of Russian Interests

  • The indictment states the company described itself on its website as “a network of heterodox commentators that focus on Western political and cultural issues.” Tennessee-based company Tenet Media has the same message on its homepage. The indictment states the Tennessee-based company was incorporated around Jan. 19, 2022, which matches records from the Tennessee Secretary of State’s Office. The indictment says the company applied to the Tennessee Department of State to conduct business on May 22, 2023.

SBIRs

  • Need to send an email to here. The email has to go out very soon, and a response needs to come back ASAP. Need to integrate the PM’s interests, the Capstone goals, and an overarching theme of LLMs as underutilized latent knowledge systems. Once that’s done, see if we go direct to proposal regardless. Need to look through what’s required. Written. Need to get approval/edits.
  • 10:30 Trade show demo planning? Yup. Fun!
  • Meeting with Aaron about prompt swarms?

Phil 9.3.2024

That is looking like a very pretty week. Except for Saturday, that is.

Work on content for Wolfram

SBIRs

  • 9:00 Sprint demos
  • 3:00 Sprint planning – Well, it looks like I’m probably not going to get to work on NNMs unless some funding comes in. I’m tasked to find opportunities for other projects, and to write control code for another opportunity. This is not exactly motivating. I’ve mapped out the weeks I can take off, and I’m not going to be heroic on this.
  • I did find a good potential opportunity that is worth reaching out to, and if they want a proposal, it will kill some time through the end of the month. So the email has to go out very soon, and a response needs to come back ASAP. I’ll work on that tomorrow. Need to integrate the PM’s interests, the Capstone goals, and an overarching theme of LLMs as underutilized latent knowledge systems.

GPT Agents

  • Add the new critique. Done. Still half-baked though

Phil 9.2.2024

It’s Labor Day, so I think a local ride, get some groceries, and clean up a few outstanding tasks.

Also, I need to pick out some stuff for the basement and finish laundry

And, it’s a good day to get stuff done for Wolfram

Phil 8.31.2024

Sleeper Social Bots: a new generation of AI disinformation bots are already a political threat

  • This paper presents a study on the growing threat of “sleeper social bots,” AI-driven social bots in the political landscape, created to spread disinformation and manipulate public opinion. We based the name sleeper social bots on their ability to pass as humans on social platforms, where they’re embedded like political “sleeper” agents, making them harder to detect and more disruptive. To illustrate the threat these bots pose, our research team at the University of Southern California constructed a demonstration using a private Mastodon server, where ChatGPT-driven bots, programmed with distinct personalities and political viewpoints, engaged in discussions with human participants about a fictional electoral proposition. Our preliminary findings suggest these bots can convincingly pass as human users, actively participate in conversations, and effectively disseminate disinformation. Moreover, they can adapt their arguments based on the responses of human interlocutors, showcasing their dynamic and persuasive capabilities. College students participating in initial experiments failed to identify our bots, underscoring the urgent need for increased awareness and education about the dangers of AI-driven disinformation, and in particular, disinformation spread by bots. The implications of our research point to the significant challenges posed by social bots in the upcoming 2024 U.S. presidential election and beyond.

Phil 8.30.2024

Chores! Rain! The radar says that everything is moving out to the East, but still misting here

Everything, everywhere, is all the same: Cognitive Domain Operations: The PLA’s New Holistic Concept for Influence Operations

Need to work on the critique section a bit.

Need to read Diffusion Models Are Real-Time Game Engines

Got the recumbent over to Aaron, made it to the point that he could ride around the parking lot. Let me tell you, recumbents are not easy bikes to ride!

Flu and Covid shots!

Phil 8.29.2024

Bunch of interesting papers came across my feeds today:

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

  • Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. Code is released as open source.

MiniCPM-V: A GPT-4V Level MLLM on Your Phone (Important for black hat / white hat AI)

  • The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs need to be deployed on high-performing cloud servers, which greatly limits their application scopes such as mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) Strong performance, outperforming GPT-4V-1106, Gemini Pro and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks, (2) strong OCR capability and 1.8M pixel high-resolution image perception at any aspect ratio, (3) trustworthy behavior with low hallucination rates, (4) multilingual support for 30+ languages, and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: The model sizes for achieving usable (e.g., GPT-4V) level performance are rapidly decreasing, along with the fast growth of end-side computation capacity. This jointly shows that GPT-4V level MLLMs deployed on end devices are becoming increasingly possible, unlocking a wider spectrum of real-world AI applications in the near future.

Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models

  • Recent advances in AI have been significantly driven by the capabilities of large language models (LLMs) to solve complex problems in ways that resemble human thinking. However, there is an ongoing debate about the extent to which LLMs are capable of actual reasoning. Central to this debate are two key probabilistic concepts that are essential for connecting causes to their effects: the probability of necessity (PN) and the probability of sufficiency (PS). This paper introduces a framework that is both theoretical and practical, aimed at assessing how effectively LLMs are able to replicate real-world reasoning mechanisms using these probabilistic measures. By viewing LLMs as abstract machines that process information through a natural language interface, we examine the conditions under which it is possible to compute suitable approximations of PN and PS. Our research marks an important step towards gaining a deeper understanding of when LLMs are capable of reasoning, as illustrated by a series of math examples.
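For reference, the two quantities named in the abstract have simple closed forms in one special case. The sketch below uses Pearl's identities, which hold only when the cause is exogenous and the effect is monotonic in the cause (so P(y|x) is at least P(y|x')). It is not the framework the paper introduces, and the function names and sample probabilities are made up for illustration.

```python
def probability_of_necessity(p_y_given_x, p_y_given_not_x):
    """PN = (P(y|x) - P(y|x')) / P(y|x), valid under exogeneity and monotonicity."""
    if p_y_given_x <= 0:
        raise ValueError("P(y|x) must be positive")
    return (p_y_given_x - p_y_given_not_x) / p_y_given_x


def probability_of_sufficiency(p_y_given_x, p_y_given_not_x):
    """PS = (P(y|x) - P(y|x')) / (1 - P(y|x')), valid under the same assumptions."""
    if p_y_given_not_x >= 1:
        raise ValueError("P(y|x') must be less than 1")
    return (p_y_given_x - p_y_given_not_x) / (1 - p_y_given_not_x)


# Made-up conditional probabilities: P(y|x) = 0.8, P(y|x') = 0.2
pn = probability_of_necessity(0.8, 0.2)
ps = probability_of_sufficiency(0.8, 0.2)
```

Read loosely, PN asks "would the effect have been absent without the cause?" and PS asks "would the cause have produced the effect where it was absent?".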

The ATLAS Matrix shows the progression of tactics used in attacks as columns from left to right, with ML techniques belonging to each tactic.

SBIRs

  • Add headers and footers to the white paper, go over once more with Aaron, and send to Orest. Done. Sent to ARL!
  • 1:00 Tbolt meeting. Look over new documentation. Looks like we’re going to do something. Communication on ActiveMQ
  • 4:30: Book club

GPT Agents

  • 3:00 Meeting. Need to finish refactoring paper before then

Phil 8.28.2024

It is going to be hot today. Ride early!

SBIRs

  • Looks like a light day. I’m going to work on the NNM white paper and try to get it to the point to submit. Done! Two pages!
  • Ping MARCOM about interview request
  • 3:00 WHAIM – changed to work on the NNM white paper
  • 1:00 – 4:00 PWND2 industry day. Interesting, but not our thing

Phil 8.27.2024

SBIRs

  • Good lord, I got a (positive!) response from the ARL! In less than 12 hours! Looks like nothing too formal for the white paper: “It’s to help me understand where a new project might fit, with relatively little effort on the part of a PI compared with writing a full (NSF-style/scope) proposal.”
  • 9:00 standup
  • 1:00 Thunderbolt
  • 5:00 S3i meeting

GPT Agents. Need to do more refactoring of the paper

Phil 8.26.2024

Longest ride of the season this past Saturday. Beautiful, but I was barely in good enough shape.

SBIRs

  • Ethics training if I can log in. Nope – still locked out. Ok, now I can get in, but there is nothing there? Even weirder, the course I should take is marked as complete. Not sure what to do at this point since I can’t take a completed course, but I did download the cert if someone changes things again.
  • S3i meeting
  • Other training? Sent email to T. Looks like the system says I’m done
  • Need to figure out a WHAI or NNM demo that can be done by the end of the year. So about 16 weeks, when you pull out TDay and Xmas
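A quick sanity check on that week count, as a sketch. It assumes the clock starts on this entry's date (8.26.2024), runs to December 31, and that Thanksgiving and Christmas each cost one full week; the variable names are just for illustration.

```python
from datetime import date

start = date(2024, 8, 26)   # date of this entry
end = date(2024, 12, 31)    # assumed "end of the year" deadline

# Subtracting two dates gives a timedelta; .days is the whole-day count
total_weeks = (end - start).days / 7

holiday_weeks = 2           # assumption: one week each for Thanksgiving and Christmas
working_weeks = total_weeks - holiday_weeks

print(f"{total_weeks:.1f} calendar weeks, about {round(working_weeks)} working weeks")
```

That comes out to a little over 18 calendar weeks, so "about 16" holds up once the two holiday weeks are pulled out.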