OpenAI has been busy. First, they have some tutorials about interfacing with document collections using embeddings. Looks like a simpler version of GPT-Index.
Second, they wrote up a report on using LLMs for misinformation and what to do about that:
Generative language models have improved drastically, and can now produce realistic text outputs that are difficult to distinguish from human-written content. For malicious actors, these language models bring the promise of automating the creation of convincing and misleading text for use in influence operations. This report assesses how language models might change influence operations in the future, and what steps can be taken to mitigate this threat. We lay out possible changes to the actors, behaviors, and content of online influence operations, and provide a framework for stages of the language model-to-influence operations pipeline that mitigations could target (model construction, model access, content dissemination, and belief formation). While no reasonable mitigation can be expected to fully prevent the threat of AI-enabled influence operations, a combination of multiple mitigations may make an important difference.
At The Markup we pioneered an array of scientifically inspired methods that used automation and computational power to supercharge our journalism. Reflecting on our work, I came up with 10 of the most important lessons I’ve learned using this approach.
Book
Proofing chapters. Finished up to chapter 10. Minor tweaks
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.
And I found the Trump campaign trip I’ve been looking for!
SBIRs
Finished the second draft! Need to send it out for some external sanity check. The SLT would like to see it too.
9:15 standup – done
11:30 CSC touch point
2:00 MORS meeting with Aaron – done! Sent off to SLT
This is true! I’ve put together a spreadsheet so you can see for yourself
SBIRs
More FOM stuff. Maybe a meeting at 2:00?
MORS paper with Aaron. Nope, but did finish the second draft.
GPT Agents
4:00 Meeting
Went on a bit of a tangent discussing Bostrom’s paperclip conjecture and how recommender algorithms could be a version of it, driven by a human/AI combination rather than an AGI. The problem is that at the scales these systems operate, it is not clear what the objective function means, and whether we are, in fact, destroying the world by creating algorithms that optimize for one thing but do so in ways that are ultimately destructive to humans. A possible venue is the 5th AAAI/ACM Conference on AI, Ethics, and Society. Papers are due on March 5.
Got the SkyLatex link for review. Need to be done by Feb 10
SBIRs
9:00 Sprint planning. Maybe get a chance to start with GPT-index? Done
Continue with second draft
Pre meeting with Aaron and Rukan
Meeting with Loren. Since the goal is for each ship to “imagine” what the other ships are seeing and their FOM predictions, we need an easy way of positioning each ship with respect to the threat in some shared, generalizable frame. And wrt the HGV, propagation seems… hard. Does it make more sense to simply simulate whether an interception occurs at any time in the (recorded) flight paths? Then we can train the models on that.
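One way to think about the shared frame: express every ship’s position relative to the threat and its track. A minimal, purely illustrative sketch (the positions, names, and 2D simplification are all made up for this example):

```python
import numpy as np

def threat_relative(ship_pos: np.ndarray, threat_pos: np.ndarray, threat_vel: np.ndarray) -> np.ndarray:
    """Express a ship's position in a frame centered on the threat,
    with the x-axis aligned to the threat's velocity (2D for simplicity)."""
    heading = threat_vel / np.linalg.norm(threat_vel)      # unit vector along the threat track
    normal = np.array([-heading[1], heading[0]])           # perpendicular to the track
    offset = ship_pos - threat_pos                         # vector from threat to ship
    return np.array([offset @ heading, offset @ normal])   # (along-track, cross-track)

# Two ships looking at the same hypothetical threat end up described the same way
threat_pos = np.array([100.0, 50.0])
threat_vel = np.array([-5.0, 0.0])
for name, pos in {"ship_a": np.array([0.0, 0.0]), "ship_b": np.array([20.0, 80.0])}.items():
    print(name, threat_relative(pos, threat_pos, threat_vel))
```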
GPT Agents
Get embedding to work – done! Now I need to reduce and cluster
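A minimal sketch of the reduce-and-cluster step, assuming the embeddings are already in a NumPy array (the dimensionality, cluster count, and random stand-in data are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# One row per response, e.g. 1536-dim OpenAI embeddings (random stand-in here)
embeddings = np.random.rand(200, 1536)

# Reduce to 2D for plotting and inspection
projection = PCA(n_components=2).fit_transform(embeddings)

# Cluster in the full-dimensional space; 8 clusters is just a guess
kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(embeddings)

for cluster_id, (x, y) in zip(kmeans.labels_[:5], projection[:5]):
    print(f"cluster {cluster_id}: projected to ({x:.3f}, {y:.3f})")
```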
ChatGPT appears to be back! Tried asking it for ways to market the book. It came back with some good suggestions.
GPT Agents
Continue working on automation
Start on embedding?
I think a larger test of shorter responses may be good for a proof-of-concept. About 64-128 tokens may be the sweet spot
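Capping the response length is just a matter of setting max_tokens. A hedged sketch, assuming the pre-chat openai 0.x completion API (the model name and prompt are placeholders):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Ask for several short completions at once; 128 tokens keeps them in the 64-128 range
response = openai.Completion.create(
    model="text-davinci-003",   # assumed model; swap in whatever is being tested
    prompt="Describe a vacation you took to Mars as if writing a postcard.",
    max_tokens=128,
    temperature=0.7,
    n=5,                        # number of responses per call
)

for choice in response["choices"]:
    print(choice["text"].strip(), "\n---")
```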
SBIRs
9:00 Sprint demos
2:00 MDA weekly meeting
Book
Based on the ChatGPT suggestions, I’m going to reach out to Wayne to see if he can connect me with a good reviewer. Maybe Ben Shneiderman? I could also try Roger, since he and his wife know everyone at UMD
Loading and saving of parameter files, so that it’s easy to store and try things from one session to the next
Setting up automation
Need to strip things like multiple CRs and collapse them into a single one. I was getting one extra response. Fixed: I was calling the GPT twice, so it wasn’t one extra, it was double.
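The stripping itself is a one-liner with a regex. A minimal sketch, assuming the raw GPT response is a plain string:

```python
import re

raw = "First response.\n\n\n\nSecond line after several blank lines.\r\n\r\nThird."

# Collapse any run of carriage returns / newlines into a single newline
cleaned = re.sub(r"[\r\n]+", "\n", raw).strip()
print(cleaned)
```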
One of Stacy’s friends had a really good idea: image and text generators could be a real boon for self-help vision boards and planning. A lot has been written about this, so the models should be very good at handling the basic needs of most folks.
There is a pressing need to understand belief in false conspiracies. Past work has focused on the needs and motivations of conspiracy believers, as well as the role of overreliance on intuition. Here, we propose an alternative driver of belief in conspiracies: overconfidence. Across eight studies with 4,181 U.S. adults, conspiracy believers not only relied more on intuition, but also overestimated their performance on numeracy and perception tests (i.e. were overconfident in their own abilities). This relationship with overconfidence was robust to controlling for analytic thinking, need for uniqueness, and narcissism, and was strongest for the most fringe conspiracies. We also found that conspiracy believers – particularly overconfident ones – massively overestimated (>4x) how much others agree with them: Although conspiracy beliefs were in the majority in only 12% of 150 conspiracies across three studies, conspiracy believers thought themselves to be in the majority 93% of the time.
I think this could have an effect on stampede behavior more broadly. Something to the effect that when rulers (people with dominant power over others) are overconfident, they can more easily head off in the direction of social realities (e.g., conspiracy theories, but also beliefs like VW being able to get away with cheating on emissions, or the USA not failing in Afghanistan).
Overconfidence is a sort of dimension reduction since there is no need to look for complicated, nuanced positions. The most emotionally attractive answer is selected for and concentrates the overconfident.
An implication for diversity injection is that the “landing page” for diversity has to be simple and emotionally attractive.
Wire up the loading of generator and embedding params. Maybe while I’m at it, read in a file with prompts and params? Done!
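A minimal sketch of the parameter-file round trip, assuming the generator and embedding params live in plain dictionaries and JSON is an acceptable on-disk format (the file name, keys, and values are placeholders):

```python
import json

params = {
    "generator": {"model": "text-davinci-003", "max_tokens": 128, "temperature": 0.7},
    "embedding": {"model": "text-embedding-ada-002"},
    "prompt": "Describe a vacation you took to Mars.",
}

# Save the current session's settings...
with open("session_params.json", "w") as f:
    json.dump(params, f, indent=4)

# ...and load them back next time
with open("session_params.json", "r") as f:
    loaded = json.load(f)
print(loaded["generator"]["max_tokens"])
```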
Had a thought that, rather than clustering, I could just work on distances and the number of connections at a given distance. A node with too many connections is like “the”, and nodes with only two connections (the predecessor and successor in the narrative) may not be that interesting and could be discarded. Something to think about.
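A minimal sketch of the idea: compute pairwise cosine distances over the embeddings, connect nodes closer than a threshold, and look at each node’s degree (the threshold, sizes, and random stand-in data are arbitrary placeholders):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

embeddings = np.random.rand(50, 1536)               # stand-in for real embeddings
dist = squareform(pdist(embeddings, metric="cosine"))

threshold = 0.3                                     # "connected" if closer than this
adjacency = (dist < threshold) & ~np.eye(len(dist), dtype=bool)
degree = adjacency.sum(axis=1)

# Very high degree ~ a "the"-like hub; degree of exactly 2 ~ just predecessor/successor
hubs = np.where(degree > 10)[0]
chains = np.where(degree == 2)[0]
print(f"hub nodes: {hubs}, chain-only nodes: {chains}")
```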
We’ve put together a notebook on GitHub to help you learn how to create embeddings with the Cohere API and then leverage the Vertex AI Matching Engine to create and query an index. The notebook includes code samples and step-by-step instructions for using the Cohere Embed endpoint to quickly capture semantic information about input data, and then applying the Vertex AI Matching Engine’s Approximate Nearest Neighbor (ANN) service to find similar texts.
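The Cohere half of that pipeline is a short call. A hedged sketch of just the embedding step (the API key and texts are placeholders; building and querying the Vertex AI Matching Engine index is left to the linked notebook rather than reproduced here):

```python
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")  # placeholder key

texts = [
    "The ship reported a new contact to the northeast.",
    "Embeddings capture semantic similarity between documents.",
]

# One embedding vector per input text; these would then be loaded into a
# Matching Engine index for ANN queries (see the notebook for that part)
response = co.embed(texts=texts)
print(len(response.embeddings), "embeddings of length", len(response.embeddings[0]))
```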
Need to create a table for generator params and embedding params – Done! Hooked them up to the table_run and am saving them out on a per run basis. Next is to load them in
SBIRs
Did a first pass at the ChatGPT slide deck
Need to check out Lauren’s slides. Done – looks good. Still waiting for HGV video
9:15 standup
Do a full read-through and tweak of the paper, then ping Angela and Paul about whether they would be interested in reading it. Pinged, but only got to page 10 or so. A good deal of rewriting and cleanup.
Help Aaron with CONOPS paper? Done
Finished up the slide deck and integrated Loren’s content.
Introduction to pynytimes – The New York Times is one of the most trusted news sources around the world. All their article metadata is easily available using their API, which is publicly available to everyone (though only for non-commercial use). All this data can be queried using a REST API; however, setting it up can be quite time-consuming. This library solves that problem, so you can easily and quickly query the API without having to worry about the specific implementation.
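A hedged sketch of what using the library looks like, based on the pynytimes documentation (the API key and query are placeholders):

```python
from pynytimes import NYTAPI

# Requires a (free) developer key from https://developer.nytimes.com
nyt = NYTAPI("YOUR_NYT_API_KEY", parse_dates=True)

# Search article metadata; returns a list of dicts with headline, date, url, etc.
articles = nyt.article_search(query="artificial intelligence", results=10)
for article in articles:
    print(article["headline"]["main"], article["pub_date"])
```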
A few weeks ago, ChatGPT emerged and launched the public discourse into a set of obscure acronyms: RLHF, SFT, IFT, CoT, and more, all attributed to the success of ChatGPT. What are these obscure acronyms and why are they so important? We surveyed all the important papers on these topics to categorize these works, summarize takeaways from what has been done, and share what remains to be shown.
SBIRs
9:15 Standup
10:00 Q3 Slides meeting with Loren
1:00 Bi-weekly
Get any responses back on paper (HA!) and get ready to send out
GPT Agents
Set up schema. I’m thinking four tables: 1) Experiment (name, date, user, run number), 2) Experiment params, 3) Text (text, embedding, projection, cluster ID), 4) Cluster (experiment, cluster_number, cluster_name, include/exclude) – done (rough sketch below)
Add automation fields and buttons – done
For development, load result text automatically – done
Hooking up DB to App. Got a lot done. Experiments, runs, and text are stored using test data.
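Here’s a rough sketch of the four-table schema, in SQLite form for illustration (the table and column names follow my notes above; the exact types and foreign keys are guesses, and the real app may use a different database):

```python
import sqlite3

conn = sqlite3.connect("gpt_agents.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS experiment (
    id INTEGER PRIMARY KEY, name TEXT, date TEXT, user TEXT, run_number INTEGER
);
CREATE TABLE IF NOT EXISTS experiment_params (
    id INTEGER PRIMARY KEY, experiment_id INTEGER REFERENCES experiment(id),
    generator_params TEXT, embedding_params TEXT
);
CREATE TABLE IF NOT EXISTS text_result (
    id INTEGER PRIMARY KEY, experiment_id INTEGER REFERENCES experiment(id),
    text TEXT, embedding BLOB, projection TEXT, cluster_id INTEGER
);
CREATE TABLE IF NOT EXISTS cluster (
    id INTEGER PRIMARY KEY, experiment_id INTEGER REFERENCES experiment(id),
    cluster_number INTEGER, cluster_name TEXT, include INTEGER
);
""")
conn.commit()
```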
Today’s large language models (LLMs) routinely generate coherent, grammatical and seemingly meaningful paragraphs of text. This achievement has led to speculation that these networks are — or will soon become — “thinking machines”, capable of performing tasks that require abstract knowledge and reasoning. Here, we review the capabilities of LLMs by considering their performance on two different aspects of language use: ‘formal linguistic competence’, which includes knowledge of rules and patterns of a given language, and ‘functional linguistic competence’, a host of cognitive abilities required for language understanding and use in the real world. Drawing on evidence from cognitive neuroscience, we show that formal competence in humans relies on specialized language processing mechanisms, whereas functional competence recruits multiple extralinguistic capacities that comprise human thought, such as formal reasoning, world knowledge, situation modeling, and social cognition. In line with this distinction, LLMs show impressive (although imperfect) performance on tasks requiring formal linguistic competence, but fail on many tests requiring functional competence. Based on this evidence, we argue that (1) contemporary LLMs should be taken seriously as models of formal linguistic skills; (2) models that master real-life language use would need to incorporate or develop not only a core language module, but also multiple non-language-specific cognitive capacities required for modeling thought. Overall, a distinction between formal and functional linguistic competence helps clarify the discourse surrounding LLMs’ potential and provides a path toward building models that understand and use language in human-like ways.
Starting to read the documentation for GPT Index. It looks very thorough and capable. I need to get a charge number so I can dig into it and get paid.
SBIRs
Working on the slide deck
Contract stuff
GPT Agents
Got the parsing done. Need to work on saving the results to the db and getting the embeddings. Also, I’ll need to set up a looping system that runs the prompt a specific number of times and does the parsing and storing. Something like “automate” with a field for how many times.
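The “automate” loop could look something like this minimal sketch, with stand-in functions where the real GPT call, db write, and embedding requests would go (the canned response, prompt, and run count are placeholders):

```python
import re

def run_prompt(prompt: str) -> str:
    """Stand-in for the real GPT call; returns a canned multi-line response."""
    return "First tweet about the book.\n\n\nSecond tweet.\nThird tweet."

def parse_response(text: str) -> list[str]:
    """Collapse repeated newlines and split the response into individual items."""
    return [line for line in re.sub(r"[\r\n]+", "\n", text).split("\n") if line.strip()]

def store_and_embed(items: list[str]) -> None:
    """Stand-in for writing items to the db and requesting their embeddings."""
    print(f"stored {len(items)} items")

def automate(prompt: str, num_runs: int) -> None:
    """Run the prompt num_runs times, parsing and storing each response."""
    for run in range(num_runs):
        items = parse_response(run_prompt(prompt))
        store_and_embed(items)

automate("List three tweets that would market the book.", num_runs=3)
```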
It uses GPT Index, a project consisting of a set of data structures designed to make it easier to use large external knowledge bases with LLMs. Lots of documentation here.
I realize that this approach could replace the finetuning of the GPT-2 models. This makes it extremely general. There could be a book agent, Twitter agent, or even a SBIR agent.
And because it is finding text “chunks” based on embedding similarity to the prompt, it can point back to the sources. That makes research into a subject where a corpus already exists much better.
Going to dig into this some more.
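A hedged sketch of what the basic GPT Index flow looks like in its documentation as of now (the data folder and query are placeholders, and the API may well change in later versions):

```python
from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load a local corpus (e.g. book chapters, tweets, SBIR docs) from a folder
documents = SimpleDirectoryReader("data").load_data()

# Build an embedding-backed index over the text chunks
index = GPTSimpleVectorIndex(documents)

# Query the index; the response can point back to the source chunks it used
response = index.query("What does the corpus say about belief stampedes?")
print(response)
```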
Elicit.org is also doing something like this. Here’s the thread: