Monthly Archives: October 2024

Phil 10.31.2024

Tasks

  • Call Jim Donnie’s

SBIRs

  • When building the randomizer (a rough parameter-sampling sketch follows this list):
    • Sign of the weave trig functions
    • Size of the envelope
    • Range +/-
    • Height +/-
  • 9:00 standup – done
  • Write a nice reply to RSA to see if he could provide any introductions to people who might be interested in supporting NNM research – done
  • 4:30 Book club – done
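
A minimal sketch of that parameter randomization, assuming the weave is a trig term layered on an underlying path; the names, ranges, and the sine form below are placeholders rather than the actual generator's values.

import numpy as np
from dataclasses import dataclass

@dataclass
class WeaveParams:
    trig_sign: int        # +1 or -1: sign of the weave trig functions
    envelope: float       # size of the envelope (weave amplitude)
    range_offset: float   # range +/-
    height_offset: float  # height +/-

def random_weave_params(rng: np.random.Generator) -> WeaveParams:
    """Draw one set of weave parameters. All ranges are placeholders."""
    return WeaveParams(
        trig_sign=int(rng.choice([-1, 1])),
        envelope=rng.uniform(0.5, 2.0),
        range_offset=rng.uniform(-1.0, 1.0),
        height_offset=rng.uniform(-1.0, 1.0),
    )

rng = np.random.default_rng(0)
p = random_weave_params(rng)
t = np.linspace(0, 10, 1_000)
weave = p.height_offset + p.envelope * np.sin(p.trig_sign * t + p.range_offset)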

Phil 10.30.2024

I watched KH’s “closing argument” speech and it was quite good. At the same time, Aaron Rupar put together a back-to-back sample of DJT speeches from the beginning of his first campaign and his speech from yesterday. The change in Trump’s energy is stunning.

I’ve also been thinking about ways to detect manipulative images for WH/BH/AI. It could be easier to reverse-engineer a prompt from the image and then have an LLM examine that prompt for manipulative intent. It looks like the tools exist in some form, starting with a CLIP-based prompt generator.
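
As a sketch of that two-step idea: recover an approximate generating prompt from the image, then hand it to an LLM for an intent read. This assumes the clip-interrogator package for the first step and the OpenAI chat API for the second; the model names and the judging instructions are placeholders.

from PIL import Image
from clip_interrogator import Config, Interrogator  # assumed image-to-prompt tool
from openai import OpenAI

# Step 1: reverse-engineer an approximate generating prompt from the image
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
recovered_prompt = ci.interrogate(Image.open("suspect_image.png").convert("RGB"))

# Step 2: ask an LLM whether that prompt reads as manipulative
client = OpenAI()
judgment = client.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[
        {"role": "system",
         "content": "You evaluate image-generation prompts for signs of manipulative "
                    "or deceptive intent. Give a judgment and a short rationale."},
        {"role": "user", "content": f"Prompt recovered from an image:\n{recovered_prompt}"},
    ],
)
print(judgment.choices[0].message.content)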

Tasks

SBIRs

  • 9:00 RayTune
  • Continue with trajectory experimentation. I realize that I can break a trajectory up into parts. Also, I need to start working in 3D.
  • Looks like I can generate 50k trajectories of 1,000 samples in a bit over 3 seconds! This may work.
  • And I was able to split the trajectory into parts and work on them separately; a rough vectorized sketch is below.
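
For reference, a rough sketch of how that kind of throughput is possible: generate all 50k trajectories as one big NumPy array and split along the sample axis. The path and weave terms here are placeholders, not the real generator.

import numpy as np

N_TRAJ, N_SAMPLES = 50_000, 1_000
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, N_SAMPLES, dtype=np.float32)  # shared time base

# Per-trajectory weave parameters, drawn all at once
sign = rng.choice([-1.0, 1.0], size=(N_TRAJ, 1)).astype(np.float32)
amp = rng.uniform(0.5, 2.0, size=(N_TRAJ, 1)).astype(np.float32)
freq = rng.uniform(0.5, 3.0, size=(N_TRAJ, 1)).astype(np.float32)

# Underlying path plus horizontal/vertical weave, as one (N_TRAJ, N_SAMPLES, 3) array
x = t + amp * np.sin(sign * freq * t)
y = amp * np.cos(sign * freq * t)
z = 0.1 * t + 0.0 * x                      # placeholder third dimension
traj = np.stack([x, y, z], axis=-1)        # roughly 600 MB at float32

# Break each trajectory into parts (here, quarters) to work on separately
parts = np.split(traj, 4, axis=1)          # four (N_TRAJ, 250, 3) chunks
print(traj.shape, [p.shape for p in parts])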

Phil 10.29.2024

Experimental narratives: A comparison of human crowdsourced storytelling and AI storytelling | Humanities and Social Sciences Communications

  • The paper proposes a framework that combines behavioral and computational experiments employing fictional prompts as a novel tool for investigating cultural artifacts and social biases in storytelling both by humans and generative AI. The study analyzes 250 stories authored by crowdworkers in June 2019 and 80 stories generated by GPT-3.5 and GPT-4 in March 2023 by merging methods from narratology and inferential statistics. Both crowdworkers and large language models responded to identical prompts about creating and falling in love with an artificial human. The proposed experimental paradigm allows a direct and controlled comparison between human and LLM-generated storytelling. Responses to the Pygmalionesque prompts confirm the pervasive presence of the Pygmalion myth in the collective imaginary of both humans and large language models. All solicited narratives present a scientific or technological pursuit. The analysis reveals that narratives from GPT-3.5 and particularly GPT-4 are more progressive in terms of gender roles and sexuality than those written by humans. While AI narratives with default settings and no additional prompting can occasionally provide innovative plot twists, they offer less imaginative scenarios and rhetoric than human-authored texts. The proposed framework argues that fiction can be used as a window into human and AI-based collective imaginary and social dimensions.

Tasks

  • Call Jim Donnie’s
  • Halloween treats

SBIRs

  • Start on trajectory experimentation
  • 9:00 Standup
  • 10:00 LM/SA chat

Phil 10.28.2024

Just a bit over a week until they start counting votes.

This looks like a nice first pass at creating code documentation: lmdocs: Generative AI for code documentation

Tasks

  • Call Jim Donnie’s
  • Vote! Plenty of time between the morning and afternoon meetings – done!

SBIRs

  • Start looking at the trade show project. I think the first thing I’ll do is set up an Overleaf project, then create a data generator to ease back into coding.
    • Underlying curve with additional horizontal and vertical weave patterns.
    • Goal is to generate at least 10,000 samples fast.
    • Calculate intersections of a straight line with points on the curve. For each point, calculate the time for iterators on the two lines to intersect. It might be possible to project this into a 2D space, since in this case the lines are functions, which means the intersection is a function, too.
    • Or maybe, just have the data generator extrapolate a straight line, calculate the intercept to that, and see if at that time the two source lines are within a threshold. I think I like that. This should be pretty fast and generate nice data (a rough sketch of this labeling approach follows this list).
  • Do a getting started on PyTorch 2.5.
  • Train a model to predict something that supports the heat map display. I think it could simply be the distance between the points at the time of intersection with the projected line.
  • 10:00 – 11:30 SimAccel review. Some nice stuff! I need to talk to Ron about using some of the (RayTune at least?) pipeline for the demo project, because I kind of like being able to specify datasets and a range of architectures and let it decide what the best/fastest model is for learning a new trajectory/intersection dataset.
  • Uploaded the proposal to the ASRC overleaf. Some last-second tweaks, so redid that.
  • 3:00 Tradeshow demo tagup.
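
One reading of the generate-and-label idea from the data generator bullets above, sketched with placeholder curves and thresholds: build a weaving curve, extrapolate a straight line from its start, pick a stand-in for the projected intercept time, and record whether (and by how much) the curve and line miss each other there. That miss distance is the kind of quantity the heat-map model could be trained to predict.

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 1_000)

def weave_curve(t, sign, amp, freq):
    """Underlying linear path with horizontal and vertical weave terms (placeholder)."""
    x = t + amp * np.sin(sign * freq * t)
    y = amp * np.cos(sign * freq * t)
    return np.stack([x, y], axis=-1)

def make_sample(threshold=0.5):
    curve = weave_curve(t, rng.choice([-1, 1]), rng.uniform(0.5, 2.0), rng.uniform(0.5, 3.0))
    # Extrapolate a straight line from the curve's first two points
    p0, v = curve[0], curve[1] - curve[0]
    line = p0 + v * np.arange(len(t))[:, None]
    # Stand-in for the projected intercept time: the step where the straight
    # line passes closest to the curve's end point
    t_hit = int(np.argmin(np.linalg.norm(line - curve[-1], axis=1)))
    miss = float(np.linalg.norm(curve[t_hit] - line[t_hit]))  # distance at that time
    return curve, line, miss, miss < threshold  # regression and classification targets

curve, line, miss, hit = make_sample()
print(f"miss distance {miss:.3f}, within threshold: {hit}")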

Phil 10.25.2024

Found a good source for election interference that’s been vetted by the IC: Election Security | Cybersecurity and Infrastructure Security Agency CISA

Day off for real this time!

Put together an intercept version for the M2M ride. They are supposed to leave at 9:00, so I’ll leave at 9:30?

Chores

  • Get stove, dishwasher, and washing machine running – done
  • Scrap metal and textiles – done
  • Groceries – done
  • Clean house – done
  • Dishes – done
  • Vote! Tried, but there was a line and I didn’t have time
  • Bills – done

Phil 10.24.2024

Today in malicious use of AI: American creating deep fakes targeting Harris works with Russian intel, documents show

  • The documents show that John Mark Dougan, who also served in the U.S. Marines and has long claimed to be working independently of the Russian government, was provided funding by an officer from the GRU, Russia’s military intelligence service. Some of the payments were made after fake news sites he created began to have difficulty accessing Western artificial intelligence systems this spring and he needed an AI generator — a tool that can be prompted to create text, photos and video.

And today in Unintended Consequences for vulnerable groups: Can A.I. Be Blamed for a Teen’s Suicide?

  • “It’s going to be super, super helpful to a lot of people who are lonely or depressed,” Noam Shazeer, one of the founders of Character.AI, said on a podcast last year.
  • Now, when a headline is in the form of a question, the rule of thumb is that the answer is “no.” However, this aligns more with what could happen if weapons-grade AI, used in an apparently innocuous app, identified easily manipulable targets and exploited them. Replika is another example of this sort of accidental effect that could easily be weaponized.

10:00 MCC meeting. Turns out that AdAstra might have the capability. I mean, it should!

SBIRs

  • 9:00 standup
  • Wrap up proposal? Made a quad_main.tex file to hold the chart. Smallest LaTeX ever!
\documentclass[12pt]{article}
% style/govstyle is assumed to pull in fancyhdr and graphicx
\usepackage{style/govstyle}

% Footers only: classification marking and chart name, no header rule
\pagestyle{fancy}
\rfoot{UNCLASSIFIED}
\lfoot{\textbf{ADS Quad Chart}}
\renewcommand{\headrulewidth}{0pt}

\begin{document}
\pagenumbering{gobble} % no page number on the single chart page

% The quad chart itself is a pre-built PDF, dropped in rotated and boxed
\fbox{\includegraphics[scale=0.8,angle=-90]{assets/Quad-chart.pdf}}

\end{document}
  • More tweaks on the proposal – nothing major
  • 4:30 book club – fun! Quick!

GPT Agents

  • 2:45 meeting – Nope? Wrong link?

Phil 10.22.2024

Move some $$ around for contracting. May have to fire up the home equity LoC.

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review

  • Of 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented.

SBIRs

  • 9:00 standup
  • Work on proposal. I think I’ll finish up the Technical section and start to figure out the SOW – done with the first draft of both! Tomorrow is the Quad chart.

Phil 10.21.2024

SBIRs

  • Work on proposal. SOW and Quad chart, plus some more on the technical section. Got more done on technical, pulled out D2A because it was too much and not related. Changed CwoC to be more about using NNs to understand communication effectiveness WRT bandwidth and latency
  • 12:50 USNA meeting. They showed their poster, which was good looking, but didn’t make that much sense
  • 3:00 Tradeshow demo. Coming along. Nice box! I’ll need to be able to ssh into it to develop on it.

Phil 10.18.2024

There is a blissful lack of grim news to wake up to. This is about as big as it gets at 6:30 AM EST.

Chores

  • Clean house – done
  • Bills – done
  • Dishes – done
  • Lawn – done
  • Pick up truck, probably – needs more work
  • Also, I welded a thing and remembered to turn off the nitrogen!

SBIRs

  • Need to spend 2 hours on the proposal – wound up spending much more time on this because for some reason the technical section needed to be done today. So dumb

GPT Agents

  • 4:15 Alden meeting – went well, though I swear we spent too much time on improving a strawman. Verified that you can get logprobs out of the legacy Completions API.
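
For the record, a minimal sketch of what that logprobs call looks like against the legacy Completions endpoint; the model, prompt, and top-5 setting are just placeholders.

from openai import OpenAI

client = OpenAI()
resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",   # a legacy-completions-capable model
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,          # top-5 alternatives for each generated token
    temperature=0.0,
)
lp = resp.choices[0].logprobs
print(lp.tokens)          # generated tokens
print(lp.token_logprobs)  # logprob of each generated token
print(lp.top_logprobs)    # top alternatives per position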

Phil 10.17.2024

SBIRs

  • 9:00 Standup
  • 11:30 Catch up with Orest
  • 4:30 book club
  • Working a lot on the first pass of the proposal. Doing NNMs now. Got everything but BH/WH/AI done. Thinking about using the term “weapons-grade AI,” since “weaponized” is an overused term that has lost impact.

GPT Agents

  • 2:45 LLM meeting – we’re sending the paper in for an initial reaction

Phil 10.16.2024

Brrr! Fall is here!

SBIRs

  • Starting to work on the proposal. I want to use the concept of cyberspace, but as understood through embeddings. The hook is that the term “cyberspace” came out too early. It represented a need on the part of people experiencing the internet to be able to “navigate,” as opposed to “search.” It turned out that search was an easier problem, so we now have a search-based methodology for finding new content. Even recommender algorithms are search – they just use latent terms to promote items that will get your attention.
  • But with embeddings came a way to comprehend the vast amounts of online information in a spatial sense. The discovery that embeddings in deep neural network Language Models have a spatial relationship to one another that reflects human understanding is profound. That the equation king – man + woman = queen works in embedding space and matches our intuitive understanding implies that these models, trained on vast amounts of human-generated text, represent human understanding of information, belief, and opinion in discoverable ways (a quick check with classic word vectors is after this list).
  • Games are at their core a way of exploring a domain constrained by rules in search of a winning condition. Neural networks and embeddings provide a new way to quantify that process, increase understanding, and increase the likelihood that novel winning solutions will not be overlooked.
  • Made a lot of progress. Let’s see how tomorrow goes. Book club!
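
As a quick sanity check on the king – man + woman = queen claim, classic word2vec vectors via gensim behave this way. The downloadable model name below is the standard Google News set, and the first call fetches roughly 1.6 GB.

import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # downloads the vectors on first use

# "king - man + woman" in embedding space
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# 'queen' is typically the top hit, which is the spatial-relationship claim above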

Phil 10.15.2024

Nice break:

Breaking News in the Hands of a Few: Newsbrokering on X During the Trump Assassination Attempts

  • A small number of accounts were highly prominent in the discourse on X/Twitter surrounding both the July and September 2024 assassination attempts on Donald Trump.
  • We conceptualize the behaviors of these accounts as newsbrokering — the selective curation and dissemination of information by news influencers during breaking news events.
  • Five of the nine most prominent newsbrokering accounts had previously been suspended from X or other platforms. 
  • Traditional news outlets, despite having significantly more followers and twice as many tweets, struggled to compete with newsbrokering accounts in terms of audience engagement. 
  • In both events, newsbrokering accounts not only curated and disseminated information but also often framed it to fit existing narratives and conspiratorial themes surrounding the assassination attempts. 
  • Social media exhibits a trend toward the oligarchization of the news environment, where a few accounts can dominate important discourse. 

SBIRs

  • 9:00 Standup
  • 11:00 BAA meeting – checked the Overleaf – no new content. Looks like the CwoC piece may not make sense, but the NNM approach might. I also wonder if we could do some D2A where the current state of the game/sim is used as the configuration file and the likely outcomes are instantaneously calculated.

GPT Agents

  • Found this nice quote in an article on the 2024 Nobel prize for Economics:
  • “Countries with “inclusive” institutions that protected personal property rights and allowed for widespread economic participation tended to end up on a pathway to longer-term prosperity. Those that had what the researchers called “extractive” institutions — ones that helped elites to maintain control, but which gave workers little hope of sharing in the wealth — merely provided short-term gains for the people in power.”

Russia’s Global Information Operations Have Grown Up – Foreign Policy

  • What began with Russian trolls on Facebook will require a lot more coordination to root out.

This threat hunter chases U.S. foes exploiting AI to sway the election

  • So far, the 52-year-old Englishman says Russia and other foreign actors are largely “experimenting” with AI, often in amateurish and bumbling campaigns that have limited reach with U.S. voters. But OpenAI and the U.S. government are bracing for Russia, Iran and other nations to become more effective with AI, and their best hope of parrying that is by exposing and blunting operations before they gain traction.

Phil 10.10.2024

Looks like Florida is getting hammered

Differential Transformer

  • Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates attention scores as the difference between two separate softmax attention maps. The subtraction cancels noise, promoting the emergence of sparse attention patterns. Experimental results on language modeling show that Diff Transformer outperforms Transformer in various settings of scaling up model size and training tokens. More intriguingly, it offers notable advantages in practical applications, such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reduction of activation outliers. By being less distracted by irrelevant context, Diff Transformer can mitigate hallucination in question answering and text summarization. For in-context learning, Diff Transformer not only enhances accuracy but is also more robust to order permutation, which was considered as a chronic robustness issue. The results position Diff Transformer as a highly effective and promising architecture to advance large language models.
  • It’s interesting that “hallucinations,” which are interpolations between explicitly trained points, may be a function of noise. This could be tuned to adjust the amount of “artistic license” in a model. Extremely noisy models may be the most interesting.
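
A minimal single-head sketch of the differential-attention idea as the abstract describes it: two softmax attention maps subtracted with a weight λ. The paper’s actual formulation re-parameterizes λ and adds per-head normalization, so treat this as illustrative only.

import torch
import torch.nn.functional as F

def diff_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """x: (batch, seq, d_model); W*: (d_model, d_head) projections; lam is the λ weight."""
    d = Wq1.shape[1]
    a1 = F.softmax((x @ Wq1) @ (x @ Wk1).transpose(-2, -1) / d ** 0.5, dim=-1)
    a2 = F.softmax((x @ Wq2) @ (x @ Wk2).transpose(-2, -1) / d ** 0.5, dim=-1)
    # The subtraction cancels attention mass the two maps agree on ("noise")
    return (a1 - lam * a2) @ (x @ Wv)

d_model, d_head = 64, 16
x = torch.randn(2, 10, d_model)
W = [torch.randn(d_model, d_head) * d_model ** -0.5 for _ in range(5)]
print(diff_attention(x, *W).shape)  # torch.Size([2, 10, 16])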

SBIRs

  • 9:00 standup. Also, do slides for Monday and put in a story for this new white paper
  • 9:30 Pre-proposal launch meeting
  • Do the lunchtime ride around 11:00
  • 12:50 USNA
  • 2:00 Gigantor
  • 4:30 Book club

GPT Agents

  • 2:45 LLM Meeting. Should have lots to talk about