Monthly Archives: December 2024

Phil 12.27.2024

Universality of representation in biological and artificial neural networks

  • Many artificial neural networks (ANNs) trained with ecologically plausible objectives on naturalistic data align with behavior and neural representations in biological systems. Here, we show that this alignment is a consequence of convergence onto the same representations by high-performing ANNs and by brains. We developed a method to identify stimuli that systematically vary the degree of inter-model representation agreement. Across language and vision, we then showed that stimuli from high- and low-agreement sets predictably modulated model-to-brain alignment. We also examined which stimulus features distinguish high- from low-agreement sentences and images. Our results establish representation universality as a core component in the model-to-brain alignment and provide a new approach for using ANNs to uncover the structure of biological representations and computations.
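
A quick sketch of what "inter-model representation agreement" might look like in practice, just to pin the idea down. This is my own toy code, not the paper's method: it assumes you have activation matrices from two models for the same stimuli and scores their agreement with linear CKA.

# Illustrative sketch, not the paper's method: score agreement between two
# models' activations on the same stimuli with linear CKA.
import numpy as np

def linear_cka(X, Y):
    # X, Y: (n_stimuli, n_features) activation matrices
    X = X - X.mean(axis=0)                       # column-center both
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return hsic / norm

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 512))             # hypothetical model A activations
acts_b = acts_a @ rng.normal(size=(512, 256))    # a correlated stand-in for model B
print("agreement (CKA):", round(linear_cka(acts_a, acts_b), 3))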

Well, here’s a weapon: StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models

  • Predicting and reasoning how a video would make a human feel is crucial for developing socially intelligent systems. Although Multimodal Large Language Models (MLLMs) have shown impressive video understanding capabilities, they tend to focus more on the semantic content of videos, often overlooking emotional stimuli. Hence, most existing MLLMs fall short in estimating viewers’ emotional reactions and providing plausible explanations. To address this issue, we propose StimuVAR, a spatiotemporal Stimuli-aware framework for Video Affective Reasoning (VAR) with MLLMs. StimuVAR incorporates a two-level stimuli-aware mechanism: frame-level awareness and token-level awareness. Frame-level awareness involves sampling video frames with events that are most likely to evoke viewers’ emotions. Token-level awareness performs tube selection in the token space to make the MLLM concentrate on emotion-triggered spatiotemporal regions. Furthermore, we create VAR instruction data to perform affective training, steering MLLMs’ reasoning strengths towards emotional focus and thereby enhancing their affective reasoning ability. To thoroughly assess the effectiveness of VAR, we provide a comprehensive evaluation protocol with extensive metrics. StimuVAR is the first MLLM-based method for viewer-centered VAR. Experiments demonstrate its superiority in understanding viewers’ emotional responses to videos and providing coherent and insightful explanations.
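
As a rough illustration of the frame-level awareness idea (my own sketch; the saliency scorer and names here are hypothetical, not StimuVAR's code), frame selection boils down to keeping the top-k frames under some emotion-salience score while preserving temporal order:

# Illustrative sketch only: keep the k frames most likely to carry emotional
# stimuli, given a hypothetical per-frame saliency score.
import numpy as np

def select_salient_frames(frames, saliency_scores, k=8):
    # frames: list of frame arrays; saliency_scores: one float per frame
    top = np.argsort(saliency_scores)[::-1][:k]    # indices of the k highest scores
    return [frames[i] for i in sorted(top)]        # keep temporal order

frames = [np.zeros((224, 224, 3)) for _ in range(64)]     # stand-in video
scores = np.random.default_rng(1).random(len(frames))     # stand-in saliency scores
print(len(select_salient_frames(frames, scores)), "frames kept for the MLLM")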

Phil 12.25.2024

Happy Isaac Newton’s birthday for those who celebrate! He’d be 382 years old today. Calculus is old.

Comparing cooperative geometric puzzle solving in ants versus humans

  • Biological ensembles use collective intelligence to tackle challenges together, but suboptimal coordination can undermine the effectiveness of group cognition. Testing whether collective cognition exceeds that of the individual is often impractical since different organizational scales tend to face disjoint problems. One exception is the problem of navigating large loads through complex environments and toward a given target. People and ants stand out in their ability to efficiently perform this task not just individually but also as a group. This provides a rare opportunity to empirically compare problem-solving skills and cognitive traits across species and group sizes. Here, we challenge people and ants with the same “piano-movers” load maneuvering puzzle and show that while ants perform more efficiently in larger groups, the opposite is true for humans. We find that although individual ants cannot grasp the global nature of the puzzle, their collective motion translates into emergent cognitive skills. They encode short-term memory in their internally ordered state and this allows for enhanced group performance. People comprehend the puzzle in a way that allows them to explore a reduced search space and, on average, outperform ants. However, when communication is restricted, groups of people resort to the most obvious maneuvers to facilitate consensus. This is reminiscent of ant behavior, and negatively impacts their performance. Our results exemplify how simple minds can easily enjoy scalability while complex brains require extensive communication to cooperate efficiently.

Phil 12.24.2024

Why Misinformation Must Not Be Ignored

  • Recent academic debate has seen the emergence of the claim that misinformation is not a significant societal problem. We argue that the arguments used to support this minimizing position are flawed, particularly if interpreted (e.g., by policymakers or the public) as suggesting that misinformation can be safely ignored. Here, we rebut the two main claims, namely that misinformation is not of substantive concern (a) due to its low incidence and (b) because it has no causal influence on notable political or behavioral outcomes. Through a critical review of the current literature, we demonstrate that (a) the prevalence of misinformation is nonnegligible if reasonably inclusive definitions are applied and that (b) misinformation has causal impacts on important beliefs and behaviors. Both scholars and policymakers should therefore continue to take misinformation seriously.

Contextual Backpropagation Loops: Amplifying Deep Reasoning with Iterative Top-Down Feedback

  • Deep neural networks typically rely on a single forward pass for inference, which can limit their capacity to resolve ambiguous inputs. We introduce Contextual Backpropagation Loops (CBLs) as an iterative mechanism that incorporates top-down feedback to refine intermediate representations, thereby improving accuracy and robustness. This repeated process mirrors how humans continuously re-interpret sensory information in daily life, by checking and re-checking our perceptions using contextual cues. Our results suggest that CBLs can offer a straightforward yet powerful way to incorporate such contextual reasoning in modern deep learning architectures.
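
A minimal toy sketch of the top-down feedback idea, assuming a two-layer network where the higher layer's output is projected back to re-shape the lower representation over a few iterations. This is illustrative only, not the authors' implementation:

# Toy iterative top-down feedback loop (not the paper's code): the higher
# layer's code is fed back to refine the lower-level representation.
import numpy as np

rng = np.random.default_rng(0)
W_up = rng.normal(size=(64, 32)) * 0.1       # bottom-up weights (toy)
W_down = rng.normal(size=(32, 64)) * 0.1     # top-down feedback weights (toy)

def relu(z):
    return np.maximum(z, 0.0)

def forward_with_feedback(x, n_loops=3):
    h = relu(x)                              # first bottom-up pass
    for _ in range(n_loops):
        top = relu(h @ W_up)                 # higher-level, contextual code
        h = relu(x + top @ W_down)           # re-interpret the input with top-down context
    return top

x = rng.normal(size=(1, 64))                 # one ambiguous input vector
print(forward_with_feedback(x).shape)        # (1, 32)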

GPT Agents

  • Put the images in the paper and added a paragraph of description for each.

Phil 12.23.2024

Saw this on Mastodon: “A lot of people compare Trump 2.0 to Julius Caesar, but he reminds me more of Sulla—the man who set the template for Caesar’s rise. Sulla’s personal vendettas, and power grabs led to the collapse of the Roman Republic and paved the way for the Empire.

Caesar had a grand vision for Rome’s future, but Sulla was driven by personal grievances and revenge. Trump follows this model.

Sulla also had Catulus and Crassus. Trump 2.0 had McConnell and Musk.

Mitch McConnell = Catulus + Cicero: Like Catulus, McConnell’s conservative and focused on maintaining old power structures. But like Cicero, he’s a master strategist—calculating, maneuvering, and holding on to influence in a crumbling system.

Elon Musk = Crassus: Rich, opportunistic, and power-hungry. Crassus used wealth to buy influence, and Musk does the same today in tech. Both are masters of leveraging money to shift power and reshape their worlds.

Without key figures like McConnell and Musk, Trump 2.0 would not have gotten back in. Let’s hope the republic can survive this and not end up with a Caesar down the track. But without smashing the oligarchs/corporations and removing the money from the US government I can’t see it surviving.

#uspol #HistoryRhymes #rome”

GPT Agents

  • Tweaking figures – done! (hopefully)

Saw Flow. Excellent!

Phil 12.21.2024

Winter solstice! Tomorrow will be one second longer!

Scaling test-time compute – a Hugging Face Space by HuggingFaceH4

  • Over the last few years, the scaling of train-time compute has dominated the progress of large language models (LLMs). Although this paradigm has proven to be remarkably effective, the resources needed to pretrain ever larger models are becoming prohibitively expensive, with billion-dollar clusters already on the horizon. This trend has sparked significant interest in a complementary approach: test-time compute scaling. Rather than relying on ever-larger pretraining budgets, test-time methods use dynamic inference strategies that allow models to “think longer” on harder problems. A prominent example is OpenAI’s o1 model, which shows consistent improvement on difficult math problems as one increases the amount of test-time compute.
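
A minimal sketch of one common test-time scaling strategy, best-of-N sampling against a verifier. The generate/score functions here are stand-ins, and this is not necessarily the recipe the Space or o1 uses:

# Illustrative best-of-N sketch; generate_answer and verifier_score are
# stand-ins for an LLM sampling call and a reward model, not real APIs.
import random

def generate_answer(problem):
    return f"candidate {random.random():.3f} for {problem!r}"

def verifier_score(problem, answer):
    return random.random()

def best_of_n(problem, n=16):
    # More test-time compute means more samples to choose among
    candidates = [generate_answer(problem) for _ in range(n)]
    return max(candidates, key=lambda a: verifier_score(problem, a))

print(best_of_n("hard math problem", n=16))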

Another crazy AI slop thing: BBC complains to Apple over misleading shooting headline

  • Apple Intelligence, launched in the UK earlier this week, uses artificial intelligence (AI) to summarise and group together notifications. This week, the AI-powered summary falsely made it appear BBC News had published an article claiming Luigi Mangione, the man arrested following the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. He has not.

The unbearable slowness of being: Why do we live at 10 bits/s? (ArXiv link)

  • This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at ∼1,000,000,000 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: what neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the “outer” brain handles fast high-dimensional sensory and motor signals, whereas the “inner” brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.
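
As a back-of-the-envelope check on the ~10 bits/s figure, here is typing as the behavior, using my own assumed numbers (a fast typist's speed and a Shannon-style ~1 bit per English character), not values taken from the paper:

# Back-of-the-envelope only; the speed and entropy values are my assumptions.
words_per_minute = 120   # fast typist (assumption)
chars_per_word = 5       # common convention
bits_per_char = 1.0      # roughly Shannon's entropy estimate for English text
chars_per_second = words_per_minute * chars_per_word / 60
print("throughput ~", chars_per_second * bits_per_char, "bits/s")  # ~10 bits/s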

GPT Agents

  • Worked some more on the “for profit” diagram. Need to start on the “egalitarian” diagram.

Tasks

  • Drained the washing machine, so hopefully that will help. On the ride today, Ross suggested that I put the machines in the garage. “That’s silly,” I think. “The garage is full of bikes and stuff from the basement!”
  • But bikes weigh less than washing machines and are far less likely to mess up a floor that may be soft in places. So I brought some of the bikes back into the basement and muscled (washers are heavy!) the washer and dryer into the garage, where they can sit out the cold snap:
  • Laundry! Bring soap! Done!

Phil 12.20.2024

We are at the bottom of the curve:

Tasks

  • Bills – done
  • 8:00 Floor – done! Now it needs to cure
  • Clean house – done
  • Dishes – done
  • Wrap gifts
  • 12:00 Greg – done
  • 2:30 WSJ – done. Fun!

GPT Agents

  • Made a lot of assets yesterday. Need to start assembling them. I think I’m going to make two figures, one extractive and one inclusive. Did the “for profit” figure.

Phil 12.19.2024

From Brad DeLong’s Substack. I think the point of ChatGPT being easy, convincing, and “sloppy” is an important triangulation on human nature, particularly in younger, less experienced people in academia – e.g. students.

  • Education & MAMLMs: Josh Gans’s view is that our students already find it much easier to ask questions of ChatGPT than to go to office hours or email and get an answer back a day or two later, and so they will ask ChatGPT the questions. The result is that the average quality of the answers they get back will be low: ChatGPT has been designed and trained to exhibit mammoth amounts of verbal linguistic fluency—it can be quite persuasive—but its level of substantive knowledge and misinformation is that of your average internet s***poster.
  • Yes, it is possible, through “prompt engineering”, to do something to direct ChatGPT’s attention to that part of its lossy-compressed training data that contains reliable information. But our students do not know how to do that. And even those who claim that they do know admit that it is a black and unreliable art.

APpaREnTLy THiS iS hoW yoU JaIlBreAk AI

  • Anthropic, one of the leading AI companies and the developer of the Claude family of Large Language Models (LLMs), has released research showing that the process for getting LLMs to do what they’re not supposed to is still pretty easy and can be automated. SomETIMeS alL it tAKeS Is typing prOMptS Like thiS.
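
For illustration only (not Anthropic's actual method or code), the kind of random-capitalization prompt augmentation the headline is mimicking can be generated like this:

# Illustrative only: randomly re-capitalize a prompt, the style of
# augmentation the article's headline is poking fun at.
import random

def shuffle_caps(prompt, p=0.5, seed=None):
    rng = random.Random(seed)
    return "".join(ch.upper() if rng.random() < p else ch.lower() for ch in prompt)

print(shuffle_caps("apparently this is how you jailbreak ai"))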

SBIRs

  • Waiting for responses to the draft. I think I’ll also do a spreadsheet for hours while I’m waiting. Done. And then there were requests and now they are done too.
  • 9:00 Standup – done
  • 4:30 Last book club of the year – done. Finish the book next time

GPT Agents

  • Reviewed changes for the article
  • Need to get a first draft of the diagram. I think I need the points on the graph, and then the arrows for control, value, and content(?). Maybe do this in Gephi? Downloaded the new version. Going to give it a shot. Nah. Not enough control. Back to Illustrator

Phil 12.18.2024

Johns Hopkins is still doing tracking of COVID. Here’s Maryland:

Replication for Language Models: Problems, Principles, and Best Practice for Political Science

  • Excitement about Large Language Models (LMs) abounds. These tools require minimal researcher input and yet make it possible to annotate and generate large quantities of data. While LMs are promising, there has been almost no systematic research into the reproducibility of research using them. This is a potential problem for scientific integrity. We give a theoretical framework for replication in the discipline and show that much LM work is wanting. We demonstrate the problem empirically using a rolling iterated replication design in which we compare crowdsourcing and LMs on multiple repeated tasks, over many months. We find that LMs can be (very) accurate, but the observed variance in performance is often unacceptably high. In many cases the LM findings cannot be re-run, let alone replicated. This affects “downstream” results. We conclude with recommendations for best practice, including the use of locally versioned ‘open source’ LMs.

import pandas as pd
import tkinter as tk
from tkinter import filedialog

# Hide the empty root window that tkinter would otherwise pop up
root = tk.Tk()
root.withdraw()

filename = filedialog.askopenfilename(
    filetypes=(("XLSX files", "*.xlsx"), ("All Files", "*.*")),
    title="Load XLSX Files")
if filename:
    print("opening {}".format(filename))

    df = pd.read_excel(filename)
    # Print a LaTeX table wrapper around the DataFrame contents
    print("\\begin{table}[]\n\\centering")
    print(df.to_latex())
    print("\\caption{Caption}\n\\label{tab:my_label}\n\\end{table}")

SBIRs

  • Got some feedback. Need to roll it in. Also, shorter bios and start trimming to 5 pages.
  • Maaaaaaaaaaaaayyyyyyyybeeee get back to some coding.

GPT Agents

  • Start working on diagram. Maybe tweak the paper to mention the above?

Phil 12.17.2024

Learned a new word today:

Tasks

  • Ping Carlos – done
  • Check for TW Ellis – done
  • Call Dentist – and now I have a new filling

SBIRs

  • Frame out a rough draft – done

Phil 12.16.2024

Email for Carlos

SBIRs

  • 9:00 Sprint Demos – done
  • 3:00 Sprint Planning – stories are written
  • Started on the SBIR proposal. Template is done
  • Two unplanned meetings that might mean a trip to Huntsville again
  • Got together with Aaron to figure out the approach to the SBIR

Phil 12.14.2024

I Traded My News Apps for Rumble, the Right-Wing YouTube. Here’s What I Saw

  • Blame for any hiccups in Mr. Trump’s strategy was assigned to Democrats or even Republicans who were not sufficiently obedient.

Speaking of detecting and disrupting manipulation: A phone company developed an AI ‘granny’ to beat scammers at their own game. The number seeding is a literal counterattack/exploit in its own right

  • O2, the company behind the scam-baiting granny, said the AI technology can keep scammers on the phone for 40 minutes at a time. Daisy was trained with the help of YouTuber and software engineer Jim Browning, who has made an online career exposing scammers to his community of 4.4 million subscribers.
  • In order to bait scammers into time-wasting calls, the company utilized the practice of “number seeding,” which put the AI granny’s number on lists used by scammers to find their victims. The granny gimmick’s goal is twofold: to keep scammers away from real people and to raise awareness about the dangers of risky phone hoaxes.

Phil 12.13.2024

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

  • Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the problem of indirect data leaking, where models are iteratively improved by using data coming from users. In this work, we conduct the first systematic analysis of work using OpenAI’s GPT-3.5 and GPT-4, the most prominently used LLMs today, in the context of data contamination. By analysing 255 papers and considering OpenAI’s data usage policy, we extensively document the amount of data leaked to these models during the first year after the model’s release. We report that these models have been globally exposed to ∼4.7M samples from 263 benchmarks. At the same time, we document a number of evaluation malpractices emerging in the reviewed papers, such as unfair or missing baseline comparisons and reproducibility issues. We release our results as a collaborative project on https://leak-llm.github.io/, where other researchers can contribute to our efforts.

SBIRs

  • 9:00 Meeting
  • 12:50 USNA

Phil 12.12.2024

Atlantic circulation collapse? New clues on the fate of a crucial conveyor belt

  • So in summary, we have at least some reassurance from the North Atlantic data that a full-on AMOC collapse hasn’t begun. And it’s unlikely that any future collapse would reach its end point any sooner than the early to mid-2100s. Yet there’s also legitimate concern – stoked by recent work from climate modelers and statisticians – that a tipping point toward eventual collapse could arrive as soon as the next several decades, especially if fossil-fuel emissions aren’t cut sharply. In part two of this two-part post, we’ll look at some of the new research on early-warning signs of AMOC collapse, what those scientists and other assessments are telling us about the threat, and how we can help limit the odds of an AMOC collapse happening in the first place.

SBIRs

  • 9:00 standup
  • 11:00 drop off car / pick up car – done
  • 2:30 “AI” Meeting – could not work it in
  • 4:30 Book club – Rukan couldn’t make it
  • Good progress today, but still not quite right

GPT Agents

  • LLM meeting – worked on the diagram for the egalitarian AI paper

Phil 12.11.24

Ugh:

SBIRs

  • See how the line segment intersection code improves things. Maybe make some test files?
  • Also, I want to see how fast I could calculate a 256×256 grid given a source, destination, and a spectator.

GPT Agents

  • 3:00 Alden meeting. Bring up the GPTZero / Chess model as human result.
  • Work on book?

Phil 12.10.2024

Open source maintainers are drowning in junk bug reports written by AI

  • “Recently I’ve noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports to open source projects,” he wrote, pointing to similar findings from the Curl project in January. “These reports appear at first glance to be potentially legitimate and thus require time to refute.”

SBIRs

  • 9:00 standup
  • Finish refactoring and integrate with trajectories?

GPT Agents

  • Add a section on Salt Typhoon