Monthly Archives: August 2021

Phil 8.31.21

So we’re officially done in Afghanistan now? One of these years, I’m going to try to figure out what the response to 9/11 cost, what the expectations were, and what actually happened.


  • Working with Zach on the webapp. We may be able to do all this with websockets and no server
  • Sprint planning – done
  • Starting on websockets. Installed websockets. I also installed asyncio, but it turns out it’s part of the Python standard library. That’s nice! Uninstalled it and everything still works
  • The hello world works!
  • Took a detour down SSL and got stuck on cert format issues? Look at that later
  • Sending data to the browser:

That works too!


  • Still cranking on generating reviews with the untrained model
  • 3:00 Meeting. Made a bet with Shimei that the 800k chess model has forgotten that the Queen could drink tea. We’ll see if we can prompt the model to talk about something other than chess next week

Phil 8.30.21

If you want to summarize your research in a sentence… have an AI do it. SciTLDR sums up papers given an abstract, intro & conclusion. And it works impressively well: (Via Twitter)

The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers

  • Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.


  • Got the client communicating with the server using Websockets and the server relaying those messages to RabbitMQ!
  • Sprint Demos and story writing today
  • Starting to look at Docker for this effort

GPT Agents

  • Finish 1-5 star parser and start run on GPT-large, then GPT. Curious what we’ll get
    • Verified that everything seems to be working on a small run. Lots of parsing to get star values
    • Trying a full-sized run of 100 batches of 10 experiments with 10 return sequences
  • OpenAI: The fine-tuning endpoint is now ready, and we’re excited to share it with you! Here’s how to get started: link

Phil 8.28.2021

ETA Prediction with Graph Neural Networks in Google Maps

  • Travel-time prediction constitutes a task of high importance in transportation networks, with web mapping services like Google Maps regularly serving vast quantities of travel time queries from users and enterprises alike. Further, such a task requires accounting for complex spatiotemporal interactions (modelling both the topological properties of the road network and anticipating events — such as rush hours — that may occur in the future). Hence, it is an ideal target for graph representation learning at scale. Here we present a graph neural network estimator for estimated time of arrival (ETA) which we have deployed in production at Google Maps. While our main architecture consists of standard GNN building blocks, we further detail the usage of training schedule methods such as MetaGradients in order to make our model robust and production-ready. We also provide prescriptive studies: ablating on various architectural decisions and training regimes, and qualitative analyses on real-world situations where our model provides a competitive edge. Our GNN proved powerful when deployed, significantly reducing negative ETA outcomes in several regions compared to the previous production baseline (40+% in cities like Sydney).
  • I think that the GNNs should be usable to produce the maps themselves. Need to try this with simulation
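As a starting point for that simulation experiment, here is a minimal sketch of the operation a GNN repeats at every layer. In message passing, each node updates its state from its neighbors' states. The three-segment road graph, its travel-time values and the fixed 0.5 mixing weight are all hypothetical. The production model in the paper uses learned vector transforms and MetaGradients, and none of that is reproduced here.

```python
from typing import Dict, List


def message_passing_step(
    features: Dict[str, float],
    neighbors: Dict[str, List[str]],
    self_weight: float = 0.5,
) -> Dict[str, float]:
    """One synchronous round of mean-aggregation message passing.

    Each node's new value mixes its own feature with the mean of its
    neighbors' (old) features. Real GNNs use learned vector transforms;
    a fixed scalar weight keeps this sketch dependency-free.
    """
    updated = {}
    for node, value in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            neighbor_mean = sum(features[n] for n in nbrs) / len(nbrs)
        else:
            neighbor_mean = value  # isolated node keeps its own signal
        updated[node] = self_weight * value + (1.0 - self_weight) * neighbor_mean
    return updated


# Hypothetical road segments with made-up travel-time features (minutes).
segments = {"A": 2.0, "B": 4.0, "C": 6.0}
road_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

h = segments
for _ in range(2):  # two layers = information from up to two hops away
    h = message_passing_step(h, road_graph)
print(h)  # {'A': 3.5, 'B': 4.0, 'C': 4.5}
```

Stacking rounds is what lets a segment's estimate depend on traffic several hops away.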

Created a folder for Graph Neural Network research

Ride down to DC today for this and hopefully not get wet!

Phil 8.27.21

Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning

  • Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 2-3 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. We host the results and videos at this https URL and isaac gym can be downloaded at this https URL.


  • 1:00 meeting with Rukan
  • Write some on the paper
  • Do slides for demos
    • Add ‘assist Steve’ story
  • Update repo and switch to dev. Verify that everything still works – it does! And receives messages as well. Oddly, it seems to be splitting the messages between the Python and TypeScript listeners:
SvelteKit console logs are black and Python is blue
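One likely explanation for the split, offered as a guess: the two listeners may be consuming from the same RabbitMQ queue. In that case, RabbitMQ dispatches messages round-robin between consumers (the "work queue" pattern), so each listener sees only part of the stream. If both listeners should see every message, each one typically gets its own queue bound to a fanout exchange. The toy model below is plain Python, not the RabbitMQ or pika API. It only mimics the two dispatch behaviors so the difference is visible.

```python
from itertools import cycle
from typing import Dict, List


def shared_queue(messages: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Toy model of one queue with several consumers: round-robin dispatch."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(msg)
    return received


def fanout(messages: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Toy model of a fanout exchange with one queue per consumer: everyone gets a copy."""
    return {c: list(messages) for c in consumers}


msgs = ["m1", "m2", "m3", "m4"]
listeners = ["python", "typescript"]
print(shared_queue(msgs, listeners))
# {'python': ['m1', 'm3'], 'typescript': ['m2', 'm4']}
print(fanout(msgs, listeners))
# {'python': ['m1', 'm2', 'm3', 'm4'], 'typescript': ['m1', 'm2', 'm3', 'm4']}
```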

GPT Agents

  • Make some spreadsheets that compare the stars/sentiment properties of the different models. Done. The models are remarkably stable, even down to 3k. They make more mistakes with the specific meta training, but that seems to be about it?
  • Trying to generate reviews from the untrained gpt2 models. The 117M model was (probably?) too small, so I’m trying the 774M model without finetuning. It requires two passes – the first creates the review (using a bigger prompt), and then I use the result and tack on “{}. I give it a star rating of”. Then I need to parse the ratings, which can be numbers or strings. I’ve kind of run out of energy, so I’ll finish later.
  • Start trying to figure out a posterior test?
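The second pass and the rating parse described above can be sketched as follows. The suffix string is copied from the note. The word list, the regular expression and the helper names are hypothetical choices. Real generations will contain phrasings this misses, and it returns None rather than guessing.

```python
import re
from typing import Optional

# Suffix appended to a generated review in the second pass (from the note above).
RATING_SUFFIX = "I give it a star rating of"

# Hypothetical mapping for ratings the model spells out as words.
WORD_TO_STARS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# First standalone digit 1-5 or number word; \b keeps "10" or "honestly" from matching.
RATING_RE = re.compile(r"\b([1-5]|one|two|three|four|five)\b", re.IGNORECASE)


def build_rating_prompt(review: str) -> str:
    """Second-pass prompt: the generated review followed by the rating cue."""
    return "{}. {}".format(review.rstrip(". "), RATING_SUFFIX)


def parse_stars(continuation: str) -> Optional[int]:
    """Return the first 1-5 rating found in the model's continuation, or None."""
    match = RATING_RE.search(continuation)
    if match is None:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else WORD_TO_STARS[token]


print(build_rating_prompt("Great coffee, slow service."))
# Great coffee, slow service. I give it a star rating of
print(parse_stars(" 4 stars because the staff were friendly"))  # 4
print(parse_stars(" Five! Would come back."))  # 5
print(parse_stars(" zero, honestly"))  # None
```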

Phil 8.26.21


GPT Agents

  • Brought all the model outputs from LIWC (manual here) and put them into a single spreadsheet. All the models are surprisingly stable, except for word count (WC):
Largest variation from max to min
  • Here are some values beyond WC:
    • Analytical thinking — a high number reflects formal, logical, and hierarchical thinking; lower numbers reflect more informal, personal, here-and-now, and narrative thinking
    • Clout — a high number suggests that the author is speaking from the perspective of high expertise and is confident; low Clout numbers suggest a more tentative, humble, even anxious style.
    • Authentic — higher numbers are associated with a more honest, personal, and disclosing text; lower numbers suggest a more guarded, distanced form of discourse.
    • Emotional tone — a high number is associated with a more positive, upbeat style; a low number reveals greater anxiety, sadness, or hostility. A number around 50 suggests either a lack of emotionality or different levels of ambivalence
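Finding which LIWC measure varies most across models comes down to max minus min per column. Below is a small sketch, assuming each model's LIWC summary has been loaded into a dict. The model names and numbers are placeholders, not the actual results. One caveat, as we understand LIWC2015: WC is a raw word count, while Analytic, Clout, Authentic and Tone are summary scores on a 0-100 scale. Raw spreads are therefore not directly comparable between WC and the others.

```python
from typing import Dict, List, Tuple


def spread_by_metric(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (metric, max - min across models), largest spread first."""
    metrics = next(iter(scores.values())).keys()
    spreads = []
    for metric in metrics:
        values = [model_scores[metric] for model_scores in scores.values()]
        spreads.append((metric, max(values) - min(values)))
    # sorted() is stable, so ties keep their original column order
    return sorted(spreads, key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only (hypothetical models).
liwc = {
    "model_a": {"WC": 60.0, "Analytic": 40.0, "Clout": 55.0, "Authentic": 30.0, "Tone": 80.0},
    "model_b": {"WC": 45.0, "Analytic": 42.0, "Clout": 54.0, "Authentic": 33.0, "Tone": 78.0},
}
print(spread_by_metric(liwc))
# [('WC', 15.0), ('Authentic', 3.0), ('Analytic', 2.0), ('Tone', 2.0), ('Clout', 1.0)]
```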

Phil 8.25.21

GPT Agents

  • Build a spreadsheet (and template?) for the LIWC data


  • 7:30 meeting with Zach. Good progress. We’re almost at the point of using RabbitMQ to talk between server-side TypeScript and server-side Python. And we made a cool diagram


  • 7:00 meeting

Phil 8.24.21

Learning to predict the cosmological structure formation

  • Matter evolved under the influence of gravity from minuscule density fluctuations. Nonperturbative structure formed hierarchically over all scales and developed non-Gaussian features in the Universe, known as the cosmic web. To fully understand the structure formation of the Universe is one of the holy grails of modern astrophysics. Astrophysicists survey large volumes of the Universe and use a large ensemble of computer simulations to compare with the observed data to extract the full information of our own Universe. However, to evolve billions of particles over billions of years, even with the simplest physics, is a daunting task. We build a deep neural network, the Deep Density Displacement Model (D³M), which learns from a set of prerun numerical simulations, to predict the nonlinear large-scale structure of the Universe with the Zel’dovich Approximation (ZA), an analytical approximation based on perturbation theory, as the input. Our extensive analysis demonstrates that D³M outperforms the second-order perturbation theory (2LPT), the commonly used fast-approximate simulation method, in predicting cosmic structure in the nonlinear regime. We also show that D³M is able to accurately extrapolate far beyond its training data and predict structure formation for significantly different cosmological parameters. Our study proves that deep learning is a practical and accurate alternative to approximate 3D simulations of the gravitational structure formation of the Universe.
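For reference, the Zel’dovich Approximation that D³M takes as input is first-order Lagrangian perturbation theory. Each particle moves in a straight line from its initial (Lagrangian) position q, along a displacement field Ψ scaled by the linear growth factor D(t). The relation below is the standard textbook form, not taken from the paper:

```latex
\mathbf{x}(\mathbf{q}, t) = \mathbf{q} + D(t)\,\boldsymbol{\Psi}(\mathbf{q}),
\qquad
\nabla_{\mathbf{q}} \cdot \boldsymbol{\Psi}(\mathbf{q}) = -\delta_0(\mathbf{q})
```

Here δ₀ is the linear density contrast normalized to D = 1. 2LPT adds a second-order correction to this displacement. As we read the abstract (and as the model's name suggests), D³M instead learns to map the ZA displacement to the one produced by full simulations.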


  • Generating content for the small-corpora models. 6k is done, 3k is done
  • Generated sentiment
  • Do this to speed up loading a MySQL database (via Stack Overflow)
mysql> use db_name;

mysql> SET autocommit=0 ; source the_sql_file.sql ; COMMIT ;
  • 3:00 Meeting
  • Set up a paper repo in Overleaf and start to rough it out
  • Need to get the spreadsheets built for the 3k and 6k models
  • Build a spreadsheet (and template?) for the LIWC data
  • Sent Shimei reviews from the 50k, 25k, 12k, 6k, and 3k models
  • One clearly observable result is that the model tends to amplify the number of items that are more frequent in the training corpora and reduce the number of items that are less common. However, the tokens within a review seem to be unchanged. The average number of stars associated with a POSITIVE or NEGATIVE review seems very resilient.
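Both observations above can be checked with simple counts. Below is a minimal sketch, assuming each review has been reduced to a (stars, sentiment label) pair. The tiny corpora are invented for illustration and are not the actual data. An amplification ratio above 1 means a star value is over-represented in the generated text relative to the training corpus; below 1 means it is under-represented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Review = Tuple[int, str]  # (stars, sentiment label); hypothetical record shape


def proportions(items: Iterable) -> Dict:
    """Share of each distinct item in the iterable."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def amplification(train: List[Review], generated: List[Review]) -> Dict[int, float]:
    """Generated share / training share for each star value seen in training."""
    p_train = proportions(stars for stars, _ in train)
    p_gen = proportions(stars for stars, _ in generated)
    return {s: p_gen.get(s, 0.0) / p for s, p in sorted(p_train.items())}


def mean_stars_by_label(reviews: List[Review]) -> Dict[str, float]:
    """Average star rating for each sentiment label."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for stars, label in reviews:
        by_label[label].append(stars)
    return {label: sum(v) / len(v) for label, v in sorted(by_label.items())}


# Invented toy corpora: 5-star reviews are common in training and grow in generation.
train = [(5, "POSITIVE")] * 6 + [(4, "POSITIVE")] * 2 + [(1, "NEGATIVE")] * 2
generated = [(5, "POSITIVE")] * 8 + [(4, "POSITIVE")] * 1 + [(1, "NEGATIVE")] * 1
print(amplification(train, generated))
print(mean_stars_by_label(train), mean_stars_by_label(generated))
```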


  • Writing the consumer
  • That’s working too!
  • Seems plenty speedy when batched up, too
  • 9:15 standup
  • 1:00 Meeting about the sim for ARL. Going to talk about missile command, where the physics are simple, but the tactics are difficult.


  • Clean up chapter thumbnails. Done!

Phil 8.23.21


  • Was getting started with Zach and then lost power from about 8:30 to 1:30
  • Looking into RabbitMQ
  • Finishing up the NASA initial writeup

GPT Agents

  • Based on the good results, trying 6k and 3k models just to see how small we can get
  • Trained up in less than 30 minutes! Generating content now

Phil 8.20.21

Need to look at this article in Science that does some multidimensional similarity mapping between COVID-19 variants.

  • Derek Smith, an evolutionary biologist at the University of Cambridge, has worked for decades on visualizing immune evasion in the influenza virus in so-called antigenic maps. The farther apart two variants are on Smith’s maps, the less well antibodies against one virus protect against the other. In a recently published preprint, Smith’s group, together with David Montefiori’s group at Duke University, has applied the approach to mapping the most important variants of SARS-CoV-2

The Geometry of Shape Space: Application to Influenza

  • Shape space was proposed over 20 years ago as a conceptual formalism in which to represent antibody/antigen binding. It has since played a key role in computational immunology. Antigens and antibodies are considered to be points in an abstract “shape space”, where coordinates of points in this space represent generalized physico-chemical properties associated with various (unspecified) physical properties related to binding, such as geometric shape, hydrophobicity, charge, etc. Distances in shape space between points representing antibodies and (the shape complement) of antigens are assumed to be related to their affinity, with small distances corresponding to high affinity.
  • In this paper, we provide algorithms, related to metric and ordinal multidimensional scaling algorithms first developed in the mathematical psychology literature, which construct explicit, quantitative coordinates for points in shape space given experimental data such as hemagglutination inhibition assays, or other general affinity assays. Previously, such coordinates had been conceptual constructs and totally implicit. The dimension of shape space deduced from hemagglutination inhibition assays for influenza is low, approximately five dimensional.
  • The deduction of the explicit geometry of shape space given experimental affinity data provides new ways to quantify the similarity of antibodies to antibodies, antigens to antigens, and the affinity of antigens to antibodies. This has potential utility in, e.g. strain selection decisions for annual influenza vaccines, among other applications. The analysis techniques presented here are not restricted to the analysis of antibody–antigen interactions and are generally applicable to affinity data resulting from binding assays.
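The coordinate-recovery idea (metric multidimensional scaling) can be sketched as minimizing "stress", the squared mismatch between map distances and target distances. This is a plain gradient-descent version with made-up target distances, not the authors' algorithm. In the influenza setting, the targets would be derived from assay data such as hemagglutination inhibition titers.

```python
import math
import random
from typing import List

Point = List[float]


def stress(coords: List[Point], target: List[List[float]]) -> float:
    """Raw stress: sum over pairs of (map distance - target distance)^2."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total


def metric_mds(target: List[List[float]], dim: int = 2, steps: int = 5000,
               lr: float = 0.01, seed: int = 0) -> List[Point]:
    """Fit coordinates whose pairwise distances match `target`, by gradient descent on stress."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-9  # avoid divide-by-zero
                # d(stress)/d(x_i) = sum_j 2 (d_ij - t_ij) (x_i - x_j) / d_ij
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(dim):
                    grads[i][k] += scale * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(dim):
                coords[i][k] -= lr * grads[i][k]
    return coords


# Hypothetical target distances: a 3-4-5 right triangle.
target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
coords = metric_mds(target)
print(stress(coords, target))
```

The recovered map is only defined up to rotation, reflection and translation, which is also true of antigenic maps.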


  • Meeting with Zach on the Webapp Framework. Made a lot of progress, though I only kind of know what’s going on. We were able to access MySQL on the server and add a D3 chart:
Behold! SvelteKit with D3 and MySql!
  • Working on the NASA proposal

GPT Agents

  • Make spreadsheets for other models and compare to 100k

Phil 8.19.21

“Before we say ‘explainable AI’ we must decide WHAT is it that we wish to explain. Are we about to explain the function that the system fitted to the data? or are we about to explain the world behind the data? Science writers seem unaware of the difference.” – Judea Pearl

Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1 – Introduction

  • From the syllabus: To realize the dreams and impact of AI requires autonomous systems that learn to make good decisions. Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling and healthcare. This class will provide a solid introduction to the field of reinforcement learning and students will learn about the core challenges and approaches, including generalization and exploration. Through a combination of lectures, and written and coding assignments, students will become well versed in key ideas and techniques for RL. Assignments will include the basics of reinforcement learning as well as deep reinforcement learning — an extremely promising new area that combines deep learning techniques with reinforcement learning.


  • Generate synthesized data – running
  • Calculate sentiment
  • Create spreadsheets (make a new directory for review-stars)


  • 9:15 standup – done
  • 10:30 NASA meeting – done. Write up a 3-page version that describes a minimum viable project and then future work that extends the MVP
  • Ping Zach for a meeting to set up project – done
  • Start framing out paper on Overleaf – done
  • EXPENSE REPORT – this is chewing up hours. I STILL don’t have a code that works


  • Write some rants for Tamahau

Phil 8.18.21


  • Finished creating the 50k, 25k, and 12k models
  • Uploading to repo – done
  • Generate synthesized data – running
  • Calculate sentiment
  • Create spreadsheets (make a new directory for review-stars)


  • Meeting with Ron at 9:00 – lots of various details about phase 1 and LAIC
  • Read through and write up paragraphs for NASA – I am becoming confused, but managed to write up an approach on what I think makes sense. Sent it off to John, and we’ll have a meeting about it tomorrow morning
  • Ping Zach for a meeting to set up project
  • EXPENSE REPORT – this is chewing up hours. I don’t have a code that works

Phil 8.17.21

I want to write a paper about the one unambiguously good option that AI/ML + simulation provides – problem domain exploration and the industrialization of imagination. The failures in Vietnam, Iraq, and Afghanistan, not to mention 9/11 and Pearl Harbor, have all been described as failures of imagination. These failures exist at multiple levels – the tactical (think Jimmy Doolittle) and the strategic (human nature). AI/ML allows us to safely explore these domains before the unimaginable occurs. Because these potentials can be visualized in narratives, it is possible to broadly and compellingly present these possibilities, and increase the effectiveness and resiliency of our choices in combat and combat-adjacent domains.

  • Enhanced simulation means that ML can explore tactical options
    • Deliver the right amount of energy in the right place for the lowest cost
  • Language model maps means that ML can explore strategic options
    • And maybe avoid a fourth Vietnam

Annotated PyTorch Paper Implementations

This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations, and the website renders these as side-by-side formatted notes. We believe these would help you understand these algorithms better. We are actively maintaining this repo and adding new implementations.


  • Need to do some preliminary (e.g. stars) evaluations on the synthesized and ground truth data before meeting
  • 3:30 Meeting
    • Went over results
    • Make a new 50k, 25k, and 12k model and do the same tests
    • Sent Shimei a set of CSV files for
  • On the Opportunities and Risks of Foundation Models
    • AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on conventional deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.


  • Something something NASA proposal?
  • Meeting with Rukan
  • Sprint planning

Phil 8.16.21

Rather than say anything about Afghanistan here, I’d rather urge you to go read Thieves of State, by Sarah Chayes. Or, if you only have a few minutes, this blog post: The Ides of August


  • Sprint demos – done!
  • Lots more training – done!


  • Generated 10k synthetic reviews and added sentiment. Need to do that for ground truth now
  • Got that done. Next let’s see how they compare

Phil 8.13.21

This looks super interesting for building domain-specific belief maps:

Here’s a link to the paper: DEMix Layers: Disentangling Domains for Modular Language Modeling

  • We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text. A DEMix layer is a collection of expert feedforward networks, each specialized to a domain, that makes the LM modular: experts can be mixed, added or removed after initial training. Extensive experiments with autoregressive transformer LMs (up to 1.3B parameters) show that DEMix layers reduce test-time perplexity, increase training efficiency, and enable rapid adaptation with little overhead. We show that mixing experts during inference, using a parameter-free weighted ensemble, allows the model to better generalize to heterogeneous or unseen domains. We also show that experts can be added to iteratively incorporate new domains without forgetting older ones, and that experts can be removed to restrict access to unwanted domains, without additional training. Overall, these results demonstrate benefits of explicitly conditioning on textual domains during language modeling.
  • Git repo:
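The "parameter-free weighted ensemble" can be illustrated as follows, with the caveat that this is our simplified reading of the abstract. Treat the domain as a latent variable. Turn each expert's likelihood of the context so far into a posterior weight via Bayes' rule (a uniform prior is assumed here). Then mix the experts' next-token distributions with those weights. The expert names, probabilities and two-token vocabulary are invented, and the function names are hypothetical.

```python
import math
from typing import Dict


def domain_posterior(context_logliks: Dict[str, float],
                     prior: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule in log space: p(domain | context) is proportional to p(context | domain) p(domain)."""
    log_joint = {d: context_logliks[d] + math.log(prior[d]) for d in context_logliks}
    m = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {d: math.exp(v - m) for d, v in log_joint.items()}
    z = sum(unnorm.values())
    return {d: v / z for d, v in unnorm.items()}


def mix_experts(expert_next_token: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted ensemble of each expert's next-token distribution."""
    mixed: Dict[str, float] = {}
    for domain, dist in expert_next_token.items():
        for token, p in dist.items():
            mixed[token] = mixed.get(token, 0.0) + weights[domain] * p
    return mixed


# Hypothetical: the context is 3x more likely under the "legal" expert.
logliks = {"legal": math.log(0.75), "medical": math.log(0.25)}
uniform = {"legal": 0.5, "medical": 0.5}
w = domain_posterior(logliks, uniform)
experts = {
    "legal": {"court": 0.8, "dose": 0.2},
    "medical": {"court": 0.1, "dose": 0.9},
}
print(w)                        # roughly {'legal': 0.75, 'medical': 0.25}
print(mix_experts(experts, w))  # roughly {'court': 0.625, 'dose': 0.375}
```

Because the weights come from the experts' own likelihoods, no extra parameters are trained, which is the property the abstract highlights.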

GPT Agents

  • Get the review extraction working and produce some content. Got everything running and generating 10,000 reviews. We’ll see how the pattern of stars looks first, and then do a sentiment run on the stored data
  • Export the DB and run sentiment analysis


  • Had a long talk yesterday with Aaron about what to do with MARE. I think it becomes the framework for training and using our enhanced simulation scenario explorer. Basically AlphaZero but for physics-based games like tennis.
  • Got Andrew to buy off on the LAIC stories and show me how to put them properly(!) in Jira, so I’ll do that today
  • Endless, mind-numbing training


  • Skipping this week – Michelle has meetings