Need to make a poster and submit by the 28th to the Digital Platforms and Societal Harms event. Probably show the 3 types of attacks (email examples) and mitigation. I could bring a laptop with ContextExplorer too.
Work on the MAST whitepaper, then get together with Aaron at 1:00. Made good progress. The goal is to have a first draft by Friday COB.
10:00 JSC Data Review. There is a lot. Ron’s going to do some summary statistics.
Maybe more scale paper this evening? Yup, finished Arms Control
MDA meeting from yesterday because Zac is back now. Done. Need to find out from Bob what the best target is.
More scale paper. Got started on the Arms Control section, which is coming along nicely. It seems that arms control is most effective when powers are not in open conflict (e.g. the Cold War). Which is mostly the case now, though I wonder how much the Russia-Ukraine war would affect that. I think that there would be more focus on AI-enhanced weapons? Which might make things easier for an agreement on Societal AI Weapons.
Need to get some work done on the MAST white paper
GPT Agents
Progress on getting lists of deans and chairs together to ask for participation.
…for 18 different tasks selected to be realistic samples of the kinds of work done at an elite consulting company, consultants using ChatGPT-4 outperformed those who did not, by a lot. On every dimension. Every way we measured performance.
I guess we’ll see what is going on with the server today?
9:00 Standup
GPT IRAD decision?
11:30 CSC
More scale paper. Need to start looking for some pix. Finished the disruption section. I think counterattack is an extension of disruption, and should be written that way. Of course, there’s a lot of groundwork that would have to be done in advance to put all the actors in place. That’s a tricky issue that’s worth discussing.
Tweaked the template for the Dahlgren paper and added some links to examples of prompt engineering to produce JSON files
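The prompt-to-JSON pattern those examples cover might look roughly like this. This is just an illustrative sketch, not the actual examples linked in the template; the prompt wording, field names, and helper functions here are all placeholders:

```python
import json

# Illustrative wrapper: instruct the model to answer only in JSON,
# then validate whatever comes back with the standard library parser.
def make_json_prompt(task: str, fields: list[str]) -> str:
    schema = ", ".join(f'"{f}": <value>' for f in fields)
    return (
        f"{task}\n"
        f"Respond with only a JSON object of the form {{{schema}}} "
        "and no other text."
    )

def parse_response(text: str) -> dict:
    # Raises json.JSONDecodeError if the model strayed from the format
    return json.loads(text)

prompt = make_json_prompt("Summarize the meeting notes.", ["summary", "action_items"])
```

The parse step matters as much as the prompt: treating a malformed response as a hard error (rather than silently ignoring it) is what makes the JSON output usable downstream.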
Add a 0.5 point story for AI ethics
GPT Agents
2:00 UMBC Meeting. Test the new ContextTest and walk through the IRB – done with the latter. Need to tweak the former – done
Add education history to work history prompt – done
Add “I assert that I am at least 18 years old” – done
Add recruitment email and screenshots to attachments – done
Change REI to Amazon – done
Draft email for all department chairs that includes an introduction of what the study is and who we are.
Working on venues for the scale paper/book. Need to start filling out the “defense” section. Started. Finished “Detection.” Next is “Disruption.”
Wrote up a short Python script that runs the loops that we think would generate the trajectories that we (think?) we need. I just realized that there needs to be a “trim” function that removes the beginning and end so we only have computable data.
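A minimal sketch of what that trim function might look like, assuming the trajectories come in as lists of samples and the non-computable part is a fixed number of warm-up and cool-down steps (the function name and cutoff counts here are placeholders, not the actual script):

```python
def trim(trajectory, head=10, tail=10):
    """Drop the assumed non-computable samples from both ends of a trajectory."""
    if len(trajectory) <= head + tail:
        return []  # nothing computable left after trimming
    return trajectory[head:len(trajectory) - tail]

# Example: a 100-sample trajectory keeps the middle 80 samples
traj = list(range(100))
kept = trim(traj)  # samples 10 through 89
```

If the unusable region varies per run, the fixed `head`/`tail` counts could be replaced with a predicate that scans inward from each end, but the slicing shape stays the same.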
10:00 meeting with Rukan. The machine is hanging on file access because read permissions have been changed
3:00 AI Ethics meeting. Do homework! Done. Shiny, yet bad videos
Registered for the Digital Platforms and Societal Harms event
GPT Agents
Looks like we meet at 2:00 on Thursdays
Got a good start on the IRB! Need some guidance to finish
Our security people have decided that collaborative writing using overleaf is too much of a threat so they will not allow it. On top of all their other policies, I am very close to quitting.
We need another story. In this case, it’s another war room vignette, but this time from the defense’s side. Maybe with M again? Of course, part of this is figuring out what defenses might actually look like. One thing I’d like to re-use is the idea of diverse operator teams looking for misbehaving models. In this case though, the models are trained to be honeypots for attacks maybe? They go along in their day-to-day, sending emails, running dummy companies, having dates, etc. When they start acting too aligned, then it’s time to start looking for trouble. Maybe digital twins of important people?
Had a good chat with Rukan yesterday. What worked with the hdfproc data didn’t work with the new offsets? He’s going to run some tests
I really want to add a new project to the LLM IRAD. Something like NNMap-enabled group support. Need a better name, some slides (mentioning “killer app” and all the possible uses), and a schedule.
Tweaked the Jan6 AI subsection to integrate better into the rest of the section
Need to add a “Detect and Defend” section
Need to add an “AI Arms Control for Societal AI Weapons” section. Show that this is in everyone’s best interests. Authoritarian regimes are potentially at greater risk, particularly for Spanner and Lobotomy attacks.
Yesterday must have been pretty busy. I never made any notes.
I wrote a pitch to RadioLab about doing a story on the “living in a simulation” thing. I also turned that into a blog post
SBIRs
Had a good discussion on SEGs white paper. They are reviewing their changes and will get back to me with their final today. Hopefully.
Got some good stuff done on the Scale paper. More today
GPT Agents
Made a lot of progress here! All the new variables are in. I added some instructions. Still need to add the prompt titles and randomize – DONE! I think it’s ready to try out again, though I need to flip the switch back to GPT-4.