Phil 8.19.21

“Before we say ‘explainable AI’ we must decide WHAT is it that we wish to explain. Are we about to explain the function that the system fitted to the data? Or are we about to explain the world behind the data? Science writers seem unaware of the difference.” – Judea Pearl

Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1 – Introduction

  • From the syllabus: To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide a solid introduction to the field of reinforcement learning and students will learn about the core challenges and approaches, including generalization and exploration. Through a combination of lectures, and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will include the basics of reinforcement learning as well as deep reinforcement learning — an extremely promising new area that combines deep learning techniques with reinforcement learning.

GPT-Agents

  • Generate synthesized data – running
  • Calculate sentiment
  • Create spreadsheets (make a new directory for review-stars)

SBIR(s)

  • 9:15 standup – done
  • 10:30 NASA meeting – done. Write up a 3-page version that describes a minimum viable project and then the future work that extends the MVP
  • Ping Zach for a meeting to set up project – done
  • Start framing out paper on Overleaf – done
  • EXPENSE REPORT – this is chewing up hours. I STILL don’t have a code that works

JuryRoom

  • Write some rants for Tamahau

Phil 8.18.21

https://twitter.com/naunihalpublic/status/1427999617539522561

GPT-Agents

  • Finished creating the 50k, 25k, and 12k models
  • Uploading to repo – done
  • Generate synthesized data – running
  • Calculate sentiment
  • Create spreadsheets (make a new directory for review-stars)

SBIR(s)

  • Meeting with Ron at 9:00 – lots of various details about phase 1 and LAIC
  • Read through and write up paragraphs for NASA – I am getting confused, but I managed to write up an approach that I think makes sense. Sent it off to John, and we’ll have a meeting about it tomorrow morning
  • Ping Zach for a meeting to set up project
  • EXPENSE REPORT – this is chewing up hours. I don’t have a code that works

Phil 8.17.21

I want to write a paper about the one unambiguously good option that AI/ML + simulation provides: problem-domain exploration and the industrialization of imagination. The failures in Vietnam, Iraq, and Afghanistan – not to mention 9/11 and Pearl Harbor – have all been described as failures of imagination. These failures exist at multiple levels: the tactical (think Jimmy Doolittle) and the strategic (human nature). AI/ML allows us to safely explore these domains before the unimaginable occurs. Because these potentials can be visualized as narratives, it is possible to present them broadly and compellingly, and to increase the effectiveness and resiliency of our choices in combat and combat-adjacent domains.

  • Enhanced simulation means that ML can explore tactical options
    • Deliver the right amount of energy in the right place for the lowest cost
  • Language model maps means that ML can explore strategic options
    • And maybe avoid a fourth Vietnam

labml.ai Annotated PyTorch Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes. We believe these would help you understand these algorithms better. We are actively maintaining this repo and adding new implementations.

GPT-Agents

  • Need to do some preliminary (e.g. stars) evaluations on the synthesized and ground truth data before meeting
  • 3:30 Meeting
    • Went over results
    • Make a new 50k, 25k, and 12k model and do the same tests
    • Sent Shimei a set of CSV files for
  • On the Opportunities and Risks of Foundation Models
    • AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on conventional deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

SBIR(s)

  • Something something NASA proposal?
  • Meeting with Rukan
  • Sprint planning

Phil 8.16.21

Rather than say anything about Afghanistan here, I’d rather urge you to go read Thieves of State, by Sarah Chayes. Or, if you only have a few minutes, this blog post: The Ides of August

SBIR(s)

  • Sprint demos – done!
  • Lots more training – done!

GPT_Agents

  • Generated 10k synthetic reviews and added sentiment. Need to do that for ground truth now
  • Got that done. Next let’s see how they compare

Phil 8.13.21

This looks super interesting for building domain-specific belief maps:

https://twitter.com/ssgrn/status/1425615542837075968?s=12

Here’s a link to the paper: DEMix Layers: Disentangling Domains for Modular Language Modeling

  • We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity, increase training efficiency, and enable rapid adaptation with little overhead. We show that mixing experts during inference, using a parameter-free weighted ensemble, allows the model to better generalize to heterogeneous or unseen domains. We also show that experts can be added to iteratively incorporate new domains without forgetting older ones, and that experts can be removed to restrict access to unwanted domains, without additional training. Overall, these results demonstrate benefits of explicitly conditioning on textual domains during language modeling.
  • Git repo: github.com/kernelmachine/demix

GPT Agents

  • Get the review extraction working and produce some content. Got everything running and generating 10,000 reviews. We’ll see how the pattern of stars looks first, and then do a sentiment run on the stored data
  • Export the DB and run sentiment analysis
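The export-plus-sentiment step can be sketched end to end. This is a toy sketch, assuming a SQLite file with a hypothetical `reviews(id, stars, review)` table (the real schema isn’t in these notes), and the lexicon scorer is only a stand-in for whatever sentiment package actually gets used:

```python
import csv
import sqlite3

# Tiny stand-in lexicon; a real run would use an actual sentiment model.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def naive_sentiment(text: str) -> float:
    """Toy lexicon score clamped to [-1, 1]."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1.0, min(1.0, score / max(len(words), 1) * 10))

def export_with_sentiment(db_path: str, out_csv: str) -> int:
    """Dump (id, stars, review) rows to a CSV with a sentiment column; returns the row count."""
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT id, stars, review FROM reviews").fetchall()
    con.close()
    with open(out_csv, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "stars", "review", "sentiment"])
        for rid, stars, review in rows:
            w.writerow([rid, stars, review, naive_sentiment(review)])
    return len(rows)
```

The CSV then goes straight into the spreadsheet/review-stars comparison step.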

SBIR(s)

  • Had a long talk yesterday with Aaron about what to do with MARE. I think it becomes the framework for training and using our enhanced simulation scenario explorer. Basically AlphaZero but for physics-based games like tennis.
  • Got Andrew to buy off on the LAIC stories and show me how to put them properly(!) in Jira, so I’ll do that today
  • Endless, mind-numbing training
  • EXPENSE REPORT

Book

  • Skipping this week – Michelle has meetings

Phil 8.12.21

Just back from a conference in Huntsville. Lots of very expensive ways to deliver energy to a point in space at a particular time. I need to write up my thoughts in more detail later. Also EXPENSE REPORT!

Announcing AI21 Studio and Jurassic-1 Language Models

  • We are thrilled to announce the launch of AI21 Studio, our new developer platform where you can use our state-of-the-art Jurassic-1 language models to build your own applications and services. Jurassic-1 models come in two sizes, where the Jumbo version, at 178B parameters, is the largest and most sophisticated language model ever released for general use by developers. AI21 Studio is currently in open beta, allowing anyone to sign up and immediately start querying Jurassic-1 using our API and interactive web environment.

Research community dynamics behind popular AI benchmarks

  • The widespread use of experimental benchmarks in AI research has created competition and collaboration dynamics that are still poorly understood. Here we provide an innovative methodology to explore these dynamics and analyse the way different entrants in these challenges, from academia to tech giants, behave and react depending on their own or others’ achievements. We perform an analysis of 25 popular benchmarks in AI from Papers With Code, with around 2,000 result entries overall, connected with their underlying research papers. We identify links between researchers and institutions (that is, communities) beyond the standard co-authorship relations, and we explore a series of hypotheses about their behavior as well as some aggregated results in terms of activity, performance jumps and efficiency. We characterize the dynamics of research communities at different levels of abstraction, including organization, affiliation, trajectories, results and activity. We find that hybrid, multi-institution and persevering communities are more likely to improve state-of-the-art performance, which becomes a watershed for many community members. Although the results cannot be extrapolated beyond our selection of popular machine learning benchmarks, the methodology can be extended to other areas of artificial intelligence or robotics, and combined with bibliometric studies.

The Learning on Graphs and Geometry Reading Group

Alpha Zero’s “Alien” Chess Shows the Power, and the Peculiarity, of AI

  • What’s also remarkable, though, Hassabis explained, is that it sometimes makes seemingly crazy sacrifices, like offering up a bishop and queen to exploit a positional advantage that led to victory. Such sacrifices of high-value pieces are normally rare. In another case the program moved its queen to the corner of the board, a very bizarre trick with a surprising positional value. “It’s like chess from another dimension,” Hassabis said.

SBIR

  • Standup – done
  • Respond to Steve – done multiple
  • Schedule story time with Andrew – done. Now I just need to put them in Jira
  • Schedule golf with Aaron? Done! Sim first (using MARE and enhanced sim), then prototype, then build a trade show version (indoor so no weather), then try fielding at some willing golf course? Paul could probably help with that

Phil 8.9.21

Nice ride on Saturday. An 18mph average pace and I still got dropped by the lead group! But I did hang on for over 40 miles

Book

  • Want a \TODO{write something here} that can disappear as needed? Use these two versions of TODO:
%\newcommand\TODO[1]{\textcolor{red}{(TODO: #1)}} % show (requires \usepackage{xcolor})
\newcommand\TODO[1]{} % hide
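Usage in the manuscript then looks like this (the note text is just an example); flipping which \newcommand line is commented out shows or hides every note in the document at once:

```latex
% With the "show" definition active, the note renders inline in red;
% with the "hide" definition active, \TODO{...} expands to nothing.
The fine-tuned model matches the ground-truth
distribution\TODO{re-check with the 50k model} fairly well.
```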

SBIRs

  • Go over stories with Aaron?
  • MARCOM meeting
  • Off to the SMD symposium

GPT Agents

  • Setting up the DB to handle sentiment and PoS – done
  • Generating and parsing the review/stars model. When there is an exception thrown while debugging, the IDE loses the ability to edit?
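The DB setup reduces to adding two columns. A minimal sketch, assuming SQLite and a hypothetical `reviews` table (the real DB and column names aren’t recorded here); the check against `PRAGMA table_info` makes it safe to run more than once:

```python
import sqlite3

def add_nlp_columns(con: sqlite3.Connection) -> None:
    """Add sentiment and part-of-speech columns to an existing reviews table, if missing."""
    existing = {row[1] for row in con.execute("PRAGMA table_info(reviews)")}
    if "sentiment" not in existing:
        con.execute("ALTER TABLE reviews ADD COLUMN sentiment REAL")
    if "pos_tags" not in existing:
        # e.g. a JSON-encoded list of (token, tag) pairs
        con.execute("ALTER TABLE reviews ADD COLUMN pos_tags TEXT")
    con.commit()
```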

Phil 8.7.21

There is a version of DALL-E at huggingface for text to image! (huggingface.co/spaces/flax-community/dalle-mini)

A man in a room:

A woman in a room:

Need to fix my timesheet for Monday

A Network Framework of Cultural History

  • The emergent processes driving cultural history are a product of complex interactions among large numbers of individuals, determined by difficult-to-quantify historical conditions. To characterize these processes we have reconstructed aggregate intellectual mobility over two millennia through the birth and death locations of more than 150,000 notable individuals. The tools of network and complexity theory were then used to identify characteristic statistical patterns and determine the cultural and historical relevance of deviations. The resulting network of locations provides a macroscopic perspective of cultural history, which helps us to retrace cultural narratives of Europe and North America using large-scale visualization and quantitative dynamical tools and to derive historical trends of cultural centers beyond the scope of specific events or narrow time intervals.

Phil 8.6.21

Had to get my truck serviced yesterday (oil change and recalls) which took a bunch of hours, so I brought my bike and went on a really nice ride on a wonderful day

Speaking of the truck: these folks (103 Creek Ridge Road, Greensboro, North Carolina 27406) will install lift kits from these folks. I could also do wheels and tires. Stay at Haw River State Park?

Book

  • Put the proposal in the Overleaf folder using proposal.tex as the root document
  • 2:00 Meeting. We are getting very close! I need to make the TODOs vanish

SBIRs

  • The Delta tix did not save to PDF worth a damn, so I created a new document with screenshots that didn’t suck. Delta is horrible and expensive. I think if I have to go to Huntsville again I’ll try to take the train
  • Steve has many questions. Did some answering and pointed him at Microsoft Flight Simulator, which is getting more amazing all the time. It’s over 40 years old!

More stupid travel stuff. Clay suggests American for next time

Phil 8.4.2021

Finished Stewardship of global collective behavior. It’s quite good and a nice way to frame all this research

  • Collective behavior provides a framework for understanding how the actions and properties of groups emerge from the way individuals generate and share information. In humans, information flows were initially shaped by natural selection yet are increasingly structured by emerging communication technologies. Our larger, more complex social networks now transfer high-fidelity information over vast distances at low cost. The digital age and the rise of social media have accelerated changes to our social systems, with poorly understood functional consequences. This gap in our knowledge represents a principal challenge to scientific progress, democracy, and actions to address global crises. We argue that the study of collective behavior must rise to a “crisis discipline” just as medicine, conservation, and climate science have, with a focus on providing actionable insight to policymakers and regulators for the stewardship of social systems.

Put my bids in for ICTAI-2021 reviews

GPT Agents

  • Building the 6-epoch review/stars models – done! Need to verify they work

SBIR

Phil 8.3.21

Examining the consumption of radical content on YouTube

  • Daily share of news consumption on YouTube, a social media platform with more than 2 billion monthly users, has increased in the last few years. Constructing a large dataset of users’ trajectories across the full political spectrum during 2016–2019, we identify several distinct communities of news consumers, including “far-right” and “anti-woke.” Far right is small and not increasing in size over the observation period, while anti-woke is growing, and both grow in consumption per user. We find little evidence that the YouTube recommendation algorithm is driving attention to this content. Our results indicate that trends in video-based political news consumption are determined by a complicated combination of user preferences, platform features, and the supply-and-demand dynamics of the broader web.

GPT Agents

  • I now have 3 and 6 epoch runs for name, review, stars models.
  • Evaluate stars to see how much has changed
  • Maybe try to train up a bigger model? Start with the xl model and step back to find the largest model that will fit. Then train that with the name, review, stars corpora
    • Nope, the 117m model is the biggest that will fit. When I’ve got the time try the Huggingface Course and see how to do cloud training
  • 3:00 Meeting. Went over results and the mapping tool proposal
    • Need to adjust the counts to relative percent for easier compare
    • Try training a model from scratch on the stars/votes corpora? That way we could see if it learns the ratios better. This could be an artifact of fine-tuning
    • Create models for review+star since the name sets up the review
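The relative-percent adjustment is just a normalization; a minimal sketch, with made-up counts standing in for the real runs:

```python
def to_percent(counts: dict[int, int]) -> dict[int, float]:
    """Convert raw star counts to percentages so differently sized runs line up."""
    total = sum(counts.values())
    if total == 0:
        return {star: 0.0 for star in counts}
    return {star: 100.0 * n / total for star, n in counts.items()}

# Hypothetical star -> count histograms: ground truth vs. a synthesized run.
ground = {1: 100, 2: 50, 3: 150, 4: 300, 5: 400}
synth = {1: 12, 2: 4, 3: 13, 4: 28, 5: 43}
gt_pct, sy_pct = to_percent(ground), to_percent(synth)
```

Once both are in percent, the per-star differences can be compared directly regardless of how many reviews each run produced.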

SBIRs

  • Sprint planning
    • Plan LM Epic – DSR-646
    • SMD conference – DSR-645
  • Long-ish chat with Rukan about transforms in scene graphs

Phil 8.2.2021

Set up oil change and recall service

GPT Agents

  • Running the ensemble of 3-epoch models to see how much variation there is
  • Create review corpora
  • Train model(s?). Start with the default 3 epoch since that seems to work well
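For the ensemble runs, the variation question reduces to a per-star mean and spread across models. A sketch, assuming each run is summarized as a star-to-fraction mapping (the real output format isn’t recorded here):

```python
from statistics import mean, stdev

def ensemble_spread(runs: list[dict[int, float]]) -> dict[int, tuple[float, float]]:
    """Per-star mean and standard deviation of the fraction each star rating gets,
    across several runs of the same model configuration."""
    stars = sorted(runs[0])
    return {s: (mean(r[s] for r in runs), stdev(r[s] for r in runs)) for s in stars}
```

A small standard deviation per star would mean the 3-epoch models agree with each other, so any mismatch with ground truth is systematic rather than run-to-run noise.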

SBIR

  • Sprint demos
  • Meeting with Steve and Rukan about next steps
  • Last-second changes to the SMD trip. I swear that I am never going to speak at a conference that doesn’t have a clear, easy-to-find schedule
  • Need to come up with stories

Phil 7.31.21

I had some perplexing runs trying different epoch counts on corpora recently. Here’s the 6k:

As you can see, only the 32-epoch model figures out that there are no 0-star ratings, though the best overall match is the 16-epoch run.

The 100k is weirder, and makes more sense when you look at the raw data:

As you can see, the 16- and 32-epoch models miss the two-star rating entirely, though aside(!) from that the fit isn’t bad:

Looking at this, I’d say that 2-4 epochs work well, and when I take out the explicit epoch count, run_clm determines that the number of epochs should be 3:

[INFO|trainer.py:1164] 2021-07-31 06:31:02,447 >> ***** Running training *****
[INFO|trainer.py:1165] 2021-07-31 06:31:02,447 >>   Num examples = 1649
[INFO|trainer.py:1166] 2021-07-31 06:31:02,448 >>   Num Epochs = 3
[INFO|trainer.py:1167] 2021-07-31 06:31:02,448 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1168] 2021-07-31 06:31:02,449 >>   Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:1169] 2021-07-31 06:31:02,449 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1170] 2021-07-31 06:31:02,449 >>   Total optimization steps = 4947
{'loss': 0.1737, 'learning_rate': 4.494643218111988e-05, 'epoch': 0.3}
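The step count in that log is self-consistent: with batch size 1 and no gradient accumulation, total optimization steps is just examples × epochs. A quick check (a simplified version of what the Trainer reports, not its actual implementation):

```python
import math

def total_steps(num_examples: int, epochs: int,
                batch_size: int = 1, grad_accum: int = 1) -> int:
    """Optimization steps: ceil(examples / effective batch) per epoch, times epochs."""
    return math.ceil(num_examples / (batch_size * grad_accum)) * epochs

# The log above: 1649 examples, 3 epochs, batch size 1 -> 4947 steps
```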

I’m going to try an ensemble of these 3-epoch models to see what that looks like

Phil 7.30.21

https://twitter.com/maithra_raghu/status/1420804708629766146

Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization

  • The successes of deep learning critically rely on the ability of neural networks to output meaningful predictions on unseen data — generalization. Yet despite its criticality, there remain fundamental open questions on how neural networks generalize. How much do neural networks rely on memorization — seeing highly similar training examples — and how much are they capable of human-intelligence styled reasoning — identifying abstract rules underlying the data? In this paper we introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization. While PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty, they all have a simple underlying rule. One part of the PVR task input acts as a pointer, giving the location of a different part of the input, which forms the value (and output). We demonstrate that this task structure provides a rich testbed for understanding generalization, with our empirical study showing large variations in neural network performance based on dataset size, task complexity and model architecture. The interaction of position, values and the pointer rule also allow the development of nuanced tests of generalization, by introducing distribution shift and increasing functional complexity. These reveal both subtle failures and surprising successes, suggesting many promising directions of exploration on this benchmark.

https://twitter.com/stefan_fee/status/1421081942792015880

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

  • This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x’ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g. the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website this http URL including constantly-updated survey, and paperlist.
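The paradigm in that abstract can be shown with a toy: the task input is wrapped in a template with an unfilled slot, a language model fills the slot, and a verbalizer maps the filler back to a task label. Everything below is illustrative, not from the paper; the “LM” here is a hard-coded stand-in:

```python
# Template with one unfilled slot, in the style of prompt-based learning.
TEMPLATE = "Review: {x} Overall, it was a [MASK] experience."

def make_prompt(x: str) -> str:
    """Turn the raw input x into the prompt x' with an unfilled slot."""
    return TEMPLATE.format(x=x)

def fake_lm_fill(prompt: str) -> str:
    """Stand-in for a masked LM: picks a filler word by keyword match."""
    return "great" if "loved" in prompt else "terrible"

def classify(x: str) -> str:
    """Verbalizer step: map the filled slot back to a task label y."""
    word = fake_lm_fill(make_prompt(x))
    return {"great": "positive", "terrible": "negative"}[word]
```

The point of the real framework is that only the template and verbalizer change per task, while the pre-trained LM stays fixed.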

GPT Agents

  • This looks interesting for paragraph clustering: Sentence Transformers in the Hugging Face Hub
    • Sentence Transformers is a framework for sentence, paragraph and image embeddings. This allows you to derive semantically meaningful embeddings, which is useful for applications such as semantic search or multi-lingual zero-shot classification. As part of the Sentence Transformers v2 release, there are a lot of cool new features:
      • Sharing your models in the Hub easily.
      • Widgets and Inference API for sentence embeddings and sentence similarity.
      • Better sentence-embeddings models available (benchmark and models in the Hub).
  • Finished the 6k epoch tests yesterday. Maybe finish creating models for 100k today?
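For the paragraph-clustering idea, the embeddings would come from a sentence-transformers model (`SentenceTransformer.encode`); the step after that can be as simple as a greedy cosine-similarity pass. A sketch on stubbed vectors, with the 0.8 threshold chosen arbitrarily:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def greedy_cluster(embeddings: list[list[float]], threshold: float = 0.8) -> list[int]:
    """Assign each vector to the first cluster whose seed it matches, else start a new cluster."""
    seeds: list[list[float]] = []
    labels: list[int] = []
    for vec in embeddings:
        for i, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(i)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels
```

With real paragraph embeddings in place of the stubs, the label list groups paragraphs by topic; a proper run would likely use k-means or community detection instead of this greedy pass.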

SBIR

  • Abstract! Done!
  • See how Rukan is doing. Tell him about cubes and other shapes – done. Looks good

Book

  • Meeting with Michelle at 2:00. Worked on the positioning statement. It’s almost ready to go to an agent!

4:00 NLP Meeting – cancelled