Neuronpedia is a free and open source platform for AI interpretability. It may be a nice way of getting at the layer activations that I’ve been looking for.
Set up the example chapters in the ACM book format – done. That took a while. I can’t get the whole book to work either.
Cleaning/Organizing – done
7:00 – 7:45 showing – done
Pick up Barbara 9:50 – done
Terry at 6:00 – done
SBIRs
9:00 standup – done
More runs started
Start looking at how to run UMAP across multiple pickle files. Probably just iterate over the files to create the mapping and save it, then a second stage to calculate the mapped points
Send the Gemini analysis to Clay and CC Aaron – done. Clay says “Sounds like a good shrooms date, but not a collaborator” L.O.L.
Current Large Language Model (LLM) safety approaches focus on explicitly harmful content while overlooking a critical vulnerability: the inability to understand context and recognize user intent. This creates exploitable vulnerabilities that malicious users can systematically leverage to circumvent safety mechanisms. We empirically evaluate multiple state-of-the-art LLMs, including ChatGPT, Claude, Gemini, and DeepSeek. Our analysis demonstrates the reliable circumvention of safety mechanisms through emotional framing, progressive revelation, and academic justification techniques. Notably, reasoning-enabled configurations amplified rather than mitigated the effectiveness of exploitation, increasing factual precision while failing to interrogate the underlying intent. The exception was Claude Opus 4.1, which prioritized intent detection over information provision in some use cases. This pattern reveals that current architectural designs create systematic vulnerabilities. These limitations require paradigmatic shifts toward contextual understanding and intent recognition as core safety capabilities rather than post-hoc protective mechanisms.
My reaction to this is that either 1) It may be a mechanism where bad actors can learn to manipulate intent or 2) bad actors can use this mechanism to search for the deeper intentions in potential candidates that align with the goals of the actors.
Also interesting implications for WH/AI filtering. What is the intent behind a scam, post, or news article?
The Raines’ lawsuit alleges that OpenAI caused Adam’s death by distributing ChatGPT to minors despite knowing it could encourage psychological dependency and suicidal ideation. His parents were the first of five families to file wrongful-death lawsuits against OpenAI in recent months, alleging that the world’s most popular chatbot had encouraged their loved ones to kill themselves. A sixth suit filed this month alleges that ChatGPT led a man to kill his mother before taking his own life.
Tasks
10:00 showing, and a 2:00 showing, which kinda upended the day
Finish script that goes through all the URLs in a file and looks for 404 errors – done. Found one too!
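For the record, the core of that script can be this small. It assumes one URL per line in the file (my guess at the format) and uses only the standard library:

```python
import urllib.error
import urllib.request

def find_404s(url_file):
    """Return the URLs in url_file (one per line) that respond with 404."""
    bad = []
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        req = urllib.request.Request(url, method="HEAD")  # no body download
        try:
            urllib.request.urlopen(req, timeout=10)
        except urllib.error.HTTPError as e:
            if e.code == 404:
                bad.append(url)
        except urllib.error.URLError:
            pass  # unreachable host; not a 404, but worth logging separately
    return bad
```

HEAD requests keep it fast; a few servers reject HEAD, so a GET fallback might be needed in practice.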
Finish ACM proposal – not done, but closer
Winterize mower – tomorrow?
1:00 ride. Looks less cold. Monday looks nice, then BRRR.
Kick off a (5k book?) run and go for a hike – done and done. Also started another run. I have managed to spend $10!
Tried to load Linux on the dev box, but was thwarted by the inability to boot from the thumb drive. Rather than struggle, I dropped it off with actual professionals. Should be ready in a few days. They were fixing a KitchenAid mixer when I arrived. Was not expecting that.
Many human languages have words for emotions such as “anger” and “fear,” yet it is not clear whether these emotions have similar meanings across languages, or why their meanings might vary. We estimate emotion semantics across a sample of 2474 spoken languages using “colexification”—a phenomenon in which languages name semantically related concepts with the same word. Analyses show significant variation in networks of emotion concept colexification, which is predicted by the geographic proximity of language families. We also find evidence of universal structure in emotion colexification networks, with all families differentiating emotions primarily on the basis of hedonic valence and physiological activation. Our findings contribute to debates about universality and diversity in how humans understand and experience emotion.
These people look interesting (Unbreaking). They are documenting the disintegration of USA norms(?) using a timeline of summaries among other things. Once I get the embedding mapping done, it would be a good thing to try to run through the system. One of their founding members wrote this:
All this year, as I have chewed my way along the edges of this almost unfathomable problem, what happened in Valdez came to feel less like a metaphor and more like a model. That’s how I’ll work with it here. Not because the circumstances of megathrust earthquakes in fjords are literally the same as the societal problem of collective derangement, but because the model gives me new ways to take the problem apart and see how the pieces interact.
Try another run of 1,000 books and see what breaks. Need to integrate the bad files list too. Edits went in smoothly. 1,000 books are cooking along, and I’m picking up new bad files
Looks like that’s working! Started another 1,000 books
Working on getting the embedding to work in batches – done
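The batching itself is just list slicing. A minimal helper (the batch size of 100 is a guess; the embeddings endpoint accepts a list of inputs per request):

```python
def batches(items, n=100):
    """Yield successive n-item chunks of a list."""
    for i in range(0, len(items), n):
        yield items[i:i + n]
```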
There are some files that are too big to send to OpenAI and throw an error. I just keep on going, but I’ll need(?) to revisit these files. Saving them out.
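The keep-going-and-save-them-out pattern, sketched with a placeholder `embed_fn` (the real API call and its error type are assumptions; oversized input trips the embedding model's token limit):

```python
def embed_with_skip(files, embed_fn, bad_list_path):
    """Embed each file; record any the API rejects and keep going."""
    results = {}
    for fname in files:
        try:
            with open(fname, encoding="utf-8") as f:
                results[fname] = embed_fn(f.read())
        except Exception:
            # oversized input (or any API error): save the name to revisit later
            with open(bad_list_path, "a") as bad:
                bad.write(fname + "\n")
    return results
```

Appending to the bad-file list as errors happen means nothing is lost if the run itself dies partway through.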
Finally got through the first 1k files. I have spent $2.70 on embeddings. Including all the errors.
We invite submissions exploring how large language models (LLMs) / foundation models (FMs) and game theory can enable strategic, interpretable AI agents for real-world scenarios.
The workshop is seeking submissions of research and industrial papers, including work on modelling, evaluation, algorithmic design, human data collection, and applications in negotiation, coordination, and everyday social intelligence, as well as demonstrations of agents succeeding (or failing) in strategic interactions.
Note: While the primary focus of the workshop is on leveraging LLMs to translate real-world scenarios to rigorous game-theoretic models, we will also consider papers that investigate other creative applications of LLMs to game theory or vice versa.
Tasks
10:00 Showing
Get a list of the txt and csv directories and delete all the txt items from the list that match the csv items, then finish the parsing – done
# Iterate over the csv files and delete any name in that list from the
# txt_files list. That way we can pick up where we left off if the
# connection gets interrupted.
tnum = len(txt_files)
cnum = len(csv_files)
print("There are {} text files and {} csv files. After this, there should be {} text files".format(tnum, cnum, tnum - cnum))
for csv_name in csv_files:
    s = csv_name.replace("csv", "txt")
    try:
        txt_files.remove(s)
    except ValueError:
        print("{} is not in the text file list????".format(s))
tnum = len(txt_files)
print("Processing {} text files".format(tnum))
Start on embedding. Got all the pieces working. Rather than do one large pkl file, I’m going to do the embeddings on a per-book basis. This will be much more resilient to interruptions and support restarts
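The restart logic is the payoff of going per-book: one pickle per book, and skip any book whose pickle already exists. `embed_fn` and the file naming are placeholders for the actual calls:

```python
import os
import pickle

def embed_books(txt_files, out_dir, embed_fn):
    """One pickle per book; skip finished books so restarts are cheap."""
    for txt in txt_files:
        out = os.path.join(out_dir,
                           os.path.basename(txt).replace(".txt", ".pkl"))
        if os.path.exists(out):
            continue  # already embedded before the interruption
        with open(txt, encoding="utf-8") as f:
            vec = embed_fn(f.read())
        with open(out, "wb") as f:
            pickle.dump(vec, f)
```

Re-running the whole job after a crash then only pays for the books that hadn’t finished.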
Hello web-scrapers for LLM training sets. Not much going on today.
Also wow. Just wow: NVISO reports a new development in the Contagious Interview campaign. The threat actors have recently resorted to utilizing legitimate JSON storage services like JSON Keeper, JSONsilo, and npoint.io to host and deliver malware from trojanized code projects, with the lure being a use case or demo project as part of an interview process.
Tasks
11:00 – 11:45 showing – done
RE taxes – done
Work on submission to ACM Interactions magazine – done, though I’m not sure what version I submitted
Winterize the mower? Certainly by the end of next week
Put Linux on the dev box. I’m tired of not being able to do overnight runs.
Work on the regex issue – interesting journey, but working now. Aaaand the servers were shut down early. Now I have to write a piece of code that makes sure I don’t do any redundant conversions
I think it should be just as easy as getting a list of the txt and csv directories and deleting all the txt items from the list that match with the csv items
Working on Gutenberg parser – First pass is done. Some good vibe coding with Gemini Pro. Need to fix a combinatorial regex issue that pops up for Finnish books
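For future me: the combinatorial blowup is almost certainly catastrophic backtracking from nested quantifiers. This is an illustration of the failure mode, not the parser’s actual pattern; the two regexes below accept the same strings, but the first is exponential on near-miss inputs:

```python
import re

# Nested quantifiers: on a non-matching input like "aaaa...!" the engine
# tries exponentially many ways to split the letters between the inner
# \w+ and the outer (...)+ loop before giving up.
slow = re.compile(r"^(\w+\s*)+$")

# Same accepted language rewritten without nesting: linear time.
fast = re.compile(r"^\w[\w\s]*$")
```

Long Finnish compound words would make near-miss lines long enough for the slow form to hang.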
3:00 SEG Meeting – done. A bit of a train wreck presenting on our side. Going to slides next time to see if that keeps things on the rails
4:00 MDA Meeting – done. Slow progress, but Dr. J is going to read the proposal
Doublespeed, a startup backed by Andreessen Horowitz (a16z) that uses a phone farm to manage at least hundreds of AI-generated social media accounts and promote products, has been hacked. The hack reveals what products the AI-generated accounts are promoting, often without the required disclosure that these are advertisements, and allowed the hacker to take control of more than 1,000 smartphones that power the company. (Via 404 Media)
Tasks
3:15-3:45 showing – I am not sure that they showed
Clean house – done-ish
Groceries – done
Wash car – done
SBIRs
2:00 – 4:00 meeting. Can make the first hour, then call in from the car?
Showing at 3:15. Get the car washed while waiting. Nope, Wednesday now
Hotels! Lyle Lovett and John Hiatt – done
SBIRs
Write a Gutenberg parser that splits the beginning and ends off books and looks for weird formatting (e.g. lists of numbers, a single pix, etc.) The output should go in the ‘processed’ folder. Then use that folder to create csv (pickle?) files of each book with embeddings in them. Nope. Can’t log in because CMMI has killed the minds of IT. It’s a “known problem.” They are “working on it.” Soooooooo angry.
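For when I can log in again: the beginning/end split is straightforward, since Project Gutenberg texts bracket the body with “*** START OF …” and “*** END OF …” marker lines. A sketch (the weird-formatting heuristics would come after this step):

```python
def strip_gutenberg_boilerplate(text):
    """Return the body between the *** START OF / *** END OF marker lines."""
    lines = text.splitlines()
    start, end = None, None
    for i, line in enumerate(lines):
        if start is None and "*** START OF" in line:
            start = i + 1
        elif "*** END OF" in line:
            end = i
            break
    if start is None:
        return text  # no markers found; leave the text alone
    return "\n".join(lines[start:end])
```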
After waiting for 5 hours, I can now log into my laptop. It was easy, but required Hidden Knowledge. Which didn’t need to be Hidden.
Put together the white paper proposal for Neural Network Learning Capacity on Parametric Function