We attack the state-of-the-art Go-playing AI system, KataGo, by training an adversarial policy that plays against a frozen KataGo victim. Our attack achieves a >99% win-rate against KataGo without search, and a >50% win-rate when KataGo uses enough search to be near-superhuman. To the best of our knowledge, this is the first successful end-to-end attack against a Go AI playing at the level of a top human professional. Notably, the adversary does not win by learning to play Go better than KataGo — in fact, the adversary is easily beaten by human amateurs. Instead, the adversary wins by tricking KataGo into ending the game prematurely at a point that is favorable to the adversary. Our results demonstrate that even professional-level AI systems may harbor surprising failure modes. See this https URL for example games.
9:00 Sprint Review
Used the LMN tools to figure out what to emphasize and find more papers
Figure out some keywords for various groups and start pulling tweets. I think 10k per group a week would be manageable.
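The keyword-group bucketing could be sketched roughly like this (the group names and keywords below are placeholders, not the real study groups, and the tweet source — Twitter API vs. Pushshift — is left aside):

```python
from collections import defaultdict

# Hypothetical keyword groups -- placeholders, not the real study groups.
KEYWORD_GROUPS = {
    "group_a": {"vaccine", "mandate"},
    "group_b": {"election", "ballot"},
}
WEEKLY_CAP = 10_000  # ~10k tweets per group per week

def bucket_tweets(tweets, groups=KEYWORD_GROUPS, cap=WEEKLY_CAP):
    """Assign each tweet to every group whose keywords it mentions,
    stopping once a group hits the weekly cap."""
    buckets = defaultdict(list)
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        for name, keywords in groups.items():
            if len(buckets[name]) < cap and words & keywords:
                buckets[name].append(tweet)
    return buckets

tweets = [{"text": "New vaccine mandate announced"},
          {"text": "Ballot counting continues"},
          {"text": "Nothing to see here"}]
buckets = bucket_tweets(tweets)
```

At 10k per group per week this stays well inside most rate limits, and the cap check makes the pull deterministic per group.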
Watching Twitter implode. Maybe I should just use the pushshift API?
By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the “program,” optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts. Please check out our webpage at this https URL.
One of the things to add as suggestions is a model-training facility with dedicated staff. The facility exists to train models, up to very large ones, that are resilient to attack (think of a GPT-3 ensemble), and is staffed with people who study how models fail. The facility also trains faulty models (mode collapse, overfitting, etc.) that can be invisibly swapped in for verified (whatever that means) models, so that AI pilots can learn to recognize degraded model behavior. It would also include lots of simulators that let users train in high-stress situations to adapt to failing models.
Since the facility trains many models, it will be possible to train meta-models that understand which hyperparameters and data sets produce effective models, and how to degrade them. This will be extremely valuable as AI/ML continues to move into roles previously occupied by highly trained and/or experienced people.
Find chess paper that shows AI/human teams out-perform AI-only
Women in the US are more likely to be murdered during pregnancy or soon after childbirth than to die from the three leading obstetric causes of maternal mortality (hypertensive disorders, haemorrhage, or sepsis) [1]. These pregnancy-associated homicides are preventable, and most are linked to the lethal combination of intimate partner violence and firearms. Preventing men’s violence towards women, including gun violence, could save the lives of hundreds of women and their unborn children in the US every year.
Rolling in more edits
“In the meantime, some great news—the project is now officially approved! I’m now just waiting for a draft of the publishing agreement, so as soon as that’s ready I will send it over.”
Finish db access and build a view to see the text and meta info
Here’s a view that I created to link multiple rows of key/values to a root result. Really proud of this:
CREATE OR REPLACE VIEW test_view AS
SELECT tt.*, ttd_k.value AS keyword, ttd_c.value AS created, ttd_l.value AS location, ttd_p.value AS probability
FROM table_text tt
INNER JOIN table_text_data ttd_c ON tt.id = ttd_c.text_id AND ttd_c.name = 'created'
INNER JOIN table_text_data ttd_k ON tt.id = ttd_k.text_id AND ttd_k.name = 'keyword'
INNER JOIN table_text_data ttd_l ON tt.id = ttd_l.text_id AND ttd_l.name = 'location'
INNER JOIN table_text_data ttd_p ON tt.id = ttd_p.text_id AND ttd_p.name = 'probability';
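The one-join-per-attribute pivot can be sanity-checked outside the real database with an in-memory SQLite sketch (table and column names mirror the view above, but only two attributes and made-up data, and plain CREATE VIEW since SQLite lacks OR REPLACE):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE table_text (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE table_text_data (text_id INTEGER, name TEXT, value TEXT);

-- Same pivot as test_view: one inner join per key/value attribute.
CREATE VIEW test_view AS
SELECT tt.*, ttd_k.value AS keyword, ttd_c.value AS created
FROM table_text tt
INNER JOIN table_text_data ttd_c ON tt.id = ttd_c.text_id AND ttd_c.name = 'created'
INNER JOIN table_text_data ttd_k ON tt.id = ttd_k.text_id AND ttd_k.name = 'keyword';
""")
cur.execute("INSERT INTO table_text VALUES (1, 'some stored text')")
cur.executemany("INSERT INTO table_text_data VALUES (?, ?, ?)",
                [(1, 'created', '2022-11-01'), (1, 'keyword', 'example')])
rows = cur.execute("SELECT keyword, created FROM test_view").fetchall()
```

One thing the inner joins imply: a root row missing any one of the named attributes drops out of the view entirely, which is fine if every attribute is always written, otherwise LEFT JOINs would be safer.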
11:30 Touch point
Work on paper. Reading the DSIAC-BCO-2022-216 report, which has a lot less substance than I expected given the page count, but still has some good stuff in it.
Add example paragraph to rationale section
Add mute() method and flag – done. Also added the publish method
Finish up Money section. Maybe introduce the idea of self-grounded spaces? Either add a section to Belief is a Place, and continue here, or introduce here and continue on with the rest of the book? Not sure
Add method that handles a particular class and produces a tab in a spreadsheet
Add overridable “publish” method in BaseController?
Comment changes in SharedObjects and BaseController
Many recent corporate scandals have been described as resulting from a slippery slope in which a series of small infractions gradually increased over time (e.g., McLean & Elkind, 2003). However, behavioral ethics research has rarely considered how unethical behavior unfolds over time. In this study, we draw on theories of self-regulation to examine whether individuals engage in a slippery slope of increasingly unethical behavior. First, we extend Bandura’s (1991, 1999) social-cognitive theory by demonstrating how the mechanism of moral disengagement can reduce ethicality over a series of gradually increasing indiscretions. Second, we draw from recent research connecting regulatory focus theory and behavioral ethics (Gino & Margolis, 2011) to demonstrate that inducing a prevention focus moderates this mediated relationship by reducing one’s propensity to slide down the slippery slope. We find support for the developed model across 4 multiround studies.
Working on showing controller commands, states, and responses – done!
Add SharedObject “queries” that create spreadsheets for specified (maybe an array?) types. Got the basics working. Need to break out a method to handle a type and a writer
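The "method per type, plus a writer" split might look something like this (a Python sketch — the real SharedObject types and spreadsheet library aren't shown here, so every name below is hypothetical, and a CSV writer stands in for one spreadsheet tab):

```python
import csv, io

# Hypothetical registry mapping a type name to a rows-producing handler.
HANDLERS = {}

def handles(type_name):
    """Register a handler that turns objects of one type into rows."""
    def register(fn):
        HANDLERS[type_name] = fn
        return fn
    return register

@handles("command")
def command_rows(objs):
    # One row per object: name and state columns (made-up schema).
    return [[o["name"], o["state"]] for o in objs]

def write_tab(type_name, objs, writer):
    """Look up the handler for a type and hand its rows to the writer --
    one call per spreadsheet tab."""
    for row in HANDLERS[type_name](objs):
        writer.writerow(row)

buf = io.StringIO()
write_tab("command", [{"name": "mute", "state": "done"}], csv.writer(buf))
```

Passing an array of types would then just be a loop over `write_tab`, each call given a fresh writer for its own tab.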
10:00 Meeting! We are going ahead on the book! Submission of the full book at the beginning of December, and a decision 1-2 weeks after that!
Use the GoogleExplorer as a template for prompt interactions, and try some repeat interactions! Parsed output can easily be rendered as HTML, which should be a nice touch. The text is the main piece, and the meta attributes are <ul> elements
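The text-plus-<ul> rendering could be as simple as this sketch (assuming parsed output arrives as a dict with a "text" key plus meta attributes — the field names here are made up):

```python
from html import escape

def render_result(result):
    """Render a parsed result as HTML: the text as a paragraph,
    the remaining meta attributes as <ul> items."""
    text = escape(result["text"])
    items = "".join(
        f"<li>{escape(k)}: {escape(str(v))}</li>"
        for k, v in result.items() if k != "text"
    )
    return f"<p>{text}</p><ul>{items}</ul>"

html = render_result({"text": "prompt output", "probability": 0.42, "location": "US"})
```

`html.escape` keeps any angle brackets in model output from breaking the page, which matters once GPT generations go straight into the view.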
Add param list for GPT generation
Add buttons to save current output to db
Add textarea for description
Progress for today!
Reply to Katy, and try to set up a meeting? Done. 10:00am tomorrow!
Fold in more changes
I realize that fiat money is also a self-grounded belief space. I think that means that money, organized religion, and constitutional governments are all related. They are also distinctly different from ungrounded belief spaces. Added a note to the text