Phil 3.10.19

Learning to Speak and Act in a Fantasy Text Adventure Game

  • We introduce a large scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

New run in the dungeon. Exciting!

Finished my pass through Antonio’s paper

Zoe Keating (May 1) or Imogen Heap (May 3)?

Phil 3.9.19

Understanding China’s AI Strategy

  • In my interactions with Chinese government officials, they demonstrated remarkably keen understanding of the issues surrounding AI and international security. It is clear that China’s government views AI as a high strategic priority and is devoting the required resources to cultivate AI expertise and strategic thinking among its national security community. This includes knowledge of U.S. AI policy discussions. I believe it is vital that the U.S. policymaking community similarly prioritize cultivating expertise and understanding of AI developments in China.

Russian Trolls Shift Strategy to Disrupt U.S. Election in 2020

  • Russian internet trolls appear to be shifting strategy in their efforts to disrupt the 2020 U.S. elections, promoting politically divisive messages through phony social media accounts instead of creating propaganda themselves, cybersecurity experts say.

Backup phone

Work on SASO paper – started

Rachel’s dungeon run is tomorrow! Maybe cross 10,000 posts?

Look at using BERT and the full Word2Vec model for analyzing posts

The Promise of Hierarchical Reinforcement Learning

  • To really understand the need for a hierarchical structure in the learning algorithm and in order to make the bridge between RL and HRL, we need to remember what we are trying to solve: MDPs. HRL methods learn a policy made up of multiple layers, each of which is responsible for control at a different level of temporal abstraction. Indeed, the key innovation of the HRL is to extend the set of available actions so that the agent can now choose to perform not only elementary actions, but also macro-actions, i.e. sequences of lower-level actions. Hence, with actions that are extended over time, we must take into account the time elapsed between decision-making moments. Luckily, MDP planning and learning algorithms can easily be extended to accommodate HRL.
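The macro-action idea above can be shown with a toy sketch (my own illustration, not from the article): a macro is a fixed sequence of primitive actions, and because k steps of real time elapse inside it, rewards are discounted within the macro and the next decision point is discounted by gamma^k. The corridor task and all names here are made up.

```python
# Toy illustration of macro-actions / SMDP-style discounting.
GAMMA = 0.9

def run_macro(state, macro, step_fn):
    """Execute a fixed sequence of primitive actions; return the
    discounted reward accumulated inside the macro, the elapsed
    steps k, and the resulting state."""
    total, k = 0.0, 0
    for action in macro:
        state, reward = step_fn(state, action)
        total += (GAMMA ** k) * reward
        k += 1
    return total, k, state

# A 1-D corridor: move right (+1) or left (-1); reward 1.0 on reaching 3.
def step(state, action):
    state = state + (1 if action == "right" else -1)
    return state, (1.0 if state == 3 else 0.0)

# The macro-action "go right three times" reaches the goal from state 0;
# the goal reward arrives at k=2, so it is worth GAMMA**2 here.
ret, k, final = run_macro(0, ["right", "right", "right"], step)
```

A higher-level policy would then choose among macros like this one exactly as it chooses among primitive actions, bootstrapping with gamma^k instead of gamma.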

Phil 3.7.19

Day 2 of the TF Dev summit. Worth the money, though much less research-y and more implementation and production-y

Google Cloud has FedRAMP certification – see details here.

Live Transcribe

Coral: On Device Transfer learning (paper)

TF 2.0 API changes and behavior changes

  • Best practices (link: )
  • Declare variables at the beginning of the code
  • Keras Functional API
    • The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
  • Autograd can automatically differentiate native Python and Numpy code. It can handle a large subset of Python’s features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation), which means it can efficiently take gradients of scalar-valued functions with respect to array-valued arguments, as well as forward-mode differentiation, and the two can be composed arbitrarily. The main intended application of Autograd is gradient-based optimization. For more information, check out the tutorial and the examples directory.
  • JAX is Autograd and XLA, brought together for high-performance machine learning research. With its updated version of Autograd, JAX can automatically differentiate native Python and NumPy functions. It can differentiate through loops, branches, recursion, and closures, and it can take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation) via grad as well as forward-mode differentiation, and the two can be composed arbitrarily to any order.
  • Effective TF 2.0: There are multiple changes in TensorFlow 2.0 to make TensorFlow users more productive. TensorFlow 2.0 removes redundant APIs, makes APIs more consistent (Unified RNNs, Unified Optimizers), and better integrates with the Python runtime with Eager execution.
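The differentiation machinery described in the Autograd/JAX notes above can be illustrated with a toy (and emphatically not how those libraries are implemented internally): forward-mode differentiation falls out of dual-number arithmetic in a few lines.

```python
# Toy forward-mode autodiff via dual numbers: a value carries its
# derivative along as the "epsilon" part, and the product rule is
# baked into multiplication.
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (a + a'e)(b + b'e) = ab + (a'b + ab')e
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __rmul__ = __mul__

def deriv(f, x):
    """d/dx f at x, by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).eps

# f(x) = 3x^2 + 2x, so f'(4) = 6*4 + 2 = 26
d = deriv(lambda x: 3 * x * x + 2 * x, 4.0)
```

Reverse mode (backpropagation) records the computation and sweeps backwards instead, which is why it is the efficient choice for scalar losses over array arguments.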

Phil 3.6.19

5:00 – ASRC TL

  • Got a lot done on the BAA on the flight yesterday
  • Wrote up a description of LMN and CM for Eric V.
  • Reading more of the Handbook of Latent Semantic Analysis. It’s giving me some good ideas for calculating similarities of posts using Word2Vec and comparing the average vector for each post
  • Antonio got an extension to the 12th. Need to see what he’s up to. Wow, there’s a lot there now. Made some comments about what I’d like to see. I’ll pull down the document to read later
  • Continued to tweak the slides
  • TF Dev conference main sessions today. Breakouts tomorrow.
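The average-vector comparison mentioned above can be sketched as follows. The 3-d vectors are toy stand-ins; in practice each word's vector would come from a trained Word2Vec model, and EMBED and the words are made up for illustration.

```python
import math

# Sketch of "average vector per post" similarity: mean the word
# vectors in each post, then compare posts by cosine similarity.
EMBED = {
    "orc":    [0.9, 0.1, 0.0],
    "attack": [0.8, 0.2, 0.1],
    "stairs": [0.1, 0.9, 0.2],
    "vines":  [0.0, 0.8, 0.3],
}

def post_vector(words):
    """Mean of the vectors for words present in the embedding."""
    vecs = [EMBED[w] for w in words if w in EMBED]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

combat = cosine(post_vector(["orc", "attack"]), post_vector(["attack", "orc"]))
mixed = cosine(post_vector(["orc", "attack"]), post_vector(["stairs", "vines"]))
```

Two posts about the same topic score near 1.0 regardless of word order; posts about different topics score lower.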

Phil 3.4.19

7:00 – 5:00 ASRC

  • Build an interactive SequenceAnalyzer. The adjustments are
    • Number of buckets
    • Percentages for each analytic (percentages to keep/discard)
    • Selectable skip words that can be added to a list (in the db?)
  • Algorithm
    1. Find the most common words across all groups, these are skip_words
    2. Find the most common words along the entire series of posts per player and eliminate them
    3. Find the most common/central words across all sequences and keep those as belief places
    4. For each sequence by group, find the most common/central words after the belief places. These are the belief spaces.
    5. Build an adjacency matrix of players, groups, places and spaces
    6. Build submatrices for centrality calculations? This could be done rather than finding the most common words
    7. Possible word2vec variations?
      1. It seems to me that I might be able to use direction cosines and dynamic time warping to calculate the similarity of posts and align them better than the overall scaling that I’m doing now. DM posts introducing a room should align perfectly, and then other scaling could happen between those areas of greatest alignment
  • Display
    • Menu:
      • Save spreadsheet (includes config, included words, posts(?), trajectories)
      • load data
      • select database
      • select group within db
      • load/save config file
      • clear all
    • Fields
      • percent for A1, A2, A3, A4
      • Centrality/Sum switch
      • BOW/TF-IDF switch
      • Word2vec switch?
    • Textarea (areas? tabbed?)
      • Table with rows as sequence step. Columns are grouped by places, spaces, groups, and players
    • Work on Antonio’s paper – got a first draft of the introduction and motivation
    • BAA
      • Upload latex and references to laptop
    • Haircut! Pack!
    • Model-Based Reinforcement Learning for Atari
      • Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction — substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with orders of magnitude fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games and achieve competitive results with only 100K interactions between the agent and the environment (400K frames), which corresponds to about two hours of real-time play.


Phil 3.3.19

Once more, icky weather makes me productive

  • Ingested all the runs into the db. We are at 7,246 posts
  • Reworking the 5 bucket analysis
  • Building better ignore files and rebuilding bucket spreadsheets. It turns out that for tymora1, names took up 25% of the BOW, so I increased the fraction saved to the trimmed spreadsheets to 50%
  • Building bucket spreadsheets and saving the centrality vector
  • Here’s what I’ve got so far: ThreeRuns
  • Trajectories: Trajectories
  • First map: firstMap
  • Here it is annotated: firstMapAnnotated
  • Some thoughts. I think this is still “zoomed out” too far. Changing the granularity should help some. I need to automate some of my tools though. The other issue is how I’m assembling my sequences.

Phil 3.2.19

Updating SheetToMap to take comma separated cell names. Lines 180 – 193. I think I’ll need an iterating compare function. Nope, wound up doing something simpler

for (String colName : colNames) {
    String curCells = tm.get(colName);
    String[] cellArray = curCells.split("\\|\\|"); // <--- new!
    for (String curCell : cellArray) {
        addNode(curCell, rowName);
        if (prevCell != null && !curCell.equals(prevCell)) {
            String edgeName = curCell + "+" + prevCell;
            if (graph.getEdge(edgeName) == null) {
                try {
                    graph.addEdge(edgeName, curCell, prevCell);
                    System.out.println("adding edge [" + edgeName + "]");
                } catch (EdgeRejectedException e) {
                    System.out.println("didn't add edge [" + edgeName + "]");
                }
            }
        }
        prevCell = curCell;
        //System.out.print(curCell + ", ");
    }
    // reset to the root of this cell so the next column links root-to-root
    prevCell = cellArray[0];
}

Updating GPM to generate comma separated cell names in trajectories

  • need to get the previous n cell names
  • Need to change the cellName val in FlockingBeliefCA to be a stack of tail length. Done.
  • Parsed the strings in SheetToMap. Each cell has a root name (the first) which connects to the roots of the previous cell. The root then links to the subsequent names in the chain of names that are separated by “||”
    "cell_[4, 5]||cell_[4, 4]||cell_[4, 3]||cell_[4, 2]||cell_[4, 1]"
  • Seems to be working: tailtest
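The root-linking scheme described above can be sketched in a few lines (function and variable names are mine; the real version lives in SheetToMap):

```python
# Each cell string is a "||"-separated chain. The first entry is the
# root: it links to the previous cell's root, and then to each later
# name in its own chain, skipping duplicate edges.
def chain_edges(cells):
    edges, prev_root = [], None
    for cell in cells:
        names = cell.split("||")
        root = names[0]
        if prev_root is not None and prev_root != root:
            edges.append((root, prev_root))  # root -> previous root
        for name in names[1:]:
            e = (root, name)                 # root -> rest of its chain
            if e not in edges:
                edges.append(e)
        prev_root = root
    return edges

edges = chain_edges([
    "cell_[4, 5]||cell_[4, 4]||cell_[4, 3]",
    "cell_[5, 5]||cell_[4, 5]",
])
```

The duplicate check plays the role of the graph.getEdge() null test in the Java version.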

Phil 3.1.19

7:00 – ASRC

  • Got accepted to the TF dev conference. The flight out is expensive… Sent Eric V. a note asking for permission to go, but bought tix anyway given the short fuse
  • Downloaded the full slack data
  • Working on white paper. The single file was getting unwieldy, so I broke it up
  • Found Speeding up Parliamentary Decision Making for Cyber Counter-Attack, which argues for the possibility of pre-authorizing automated response
  • Up to six pages. In the middle of the cyberdefense section

Phil 2.28.19

7:00 – very, very, late ASRC

  • Tomorrow is March! I need to write a few paragraphs for Antonio this weekend
  • YouTube stops recommending alt-right channels
    • For the first two weeks of February, YouTube was recommending videos from at least one of these major alt-right channels on more than one in every thirteen randomly selected videos (7.8%). From February 15th, this number has dropped to less than one in two hundred and fifty (0.4%).
  • Working on text splitting Group1 in the PHPBB database
    • Updated the view so the same queries work
    • Discovered that you can do this: …, “message” as type, …. That gives you a column named type filled with “message”. Via stackoverflow
    • Mostly working, I’m missing the last bucket for some reason. But it’s good overlap with the Slack data.
    • Was debugging on my office box, and was wondering where all the data after the troll was! Ooops, not loaded
    • Changed the time tests to be > ts1 and <= ts2
  • Working on the white paper. Deep into strategy, Cyberdefense, and the evolution towards automatic active response in cyber.
  • Looooooooooooooooooooooooooong meeting of Shimei’s group. Interesting but difficult paper: Learning Dynamic Embeddings from Temporal Interaction Networks
  • Emily’s run in the dungeon finishes tonight!
  • Looks like I’m going to the TF Dev conference after all….
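The literal-column trick noted above, demonstrated with stdlib sqlite3 (a toy in-memory table; the real queries run against the phpbb database, and the table/column names here are made up):

```python
import sqlite3

# Selecting a string literal "as type" gives every returned row a
# constant type column -- handy for tagging rows when unioning
# different sources (e.g. messages vs. other post kinds).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE posts (author TEXT, body TEXT)")
con.execute("INSERT INTO posts VALUES ('phil', 'first blood')")
rows = con.execute(
    "SELECT author, body, 'message' AS type FROM posts"
).fetchall()
```

Every row comes back with 'message' in the type column even though no such column exists in the table.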

Phil 2.27.19

7:00 – 5:30 ASRC

  • Getting closer to the goal by being less capable
    • Understanding how systems with many semi-autonomous parts reach a desired target is a key question in biology (e.g., Drosophila larvae seeking food), engineering (e.g., driverless navigation), medicine (e.g., reliable movement for brain-damaged individuals), and socioeconomics (e.g., bottom-up goal-driven human organizations). Centralized systems perform better with better components. Here, we show, by contrast, that a decentralized entity is more efficient at reaching a target when its components are less capable. Our findings reproduce experimental results for a living organism, predict that autonomous vehicles may perform better with simpler components, offer a fresh explanation for why biological evolution jumped from decentralized to centralized design, suggest how efficient movement might be achieved despite damaged centralized function, and provide a formula predicting the optimum capability of a system’s components so that it comes as close as possible to its target or goal.
  • Nice chat with Greg last night. He likes the “Bones in a Hut” and “Stampede Theory” phrases. It turns out the domains are available…
    • Thinking that the title of the book could be “Stampede Theory: Why Group Think Happens, and why Diversity is the First, Best Answer”. Maybe structure the iConference talk around that as well.
  • Guidance from Antonio: In the meantime, if you have an idea on how to structure the Introduction, please go on considering that we want to put the decision logic inside each Autonomous Car that will be able to select passengers and help them in a self-organized manner.
  • Try out the splitter on the Tymora1 text.
    • Incorporate the ignore.xml when reading the text
    • If things look promising, then add changes to the phpbb code and try on that text as well.
    • At this point I’m just looking at overlapping lists of words that become something like a sand chart. I wonder if I can use the Eigenvector values to become a percentage connectivity/weight? Weights
    • Ok – I have to say that I’m pretty happy with this. These are centrality using top 25% BOW from the Slack text of Tymora1. I think that the way to use this is to have each group be an “agent” that has cluster of words for each step: Top 10
    • Based on this, I’d say add an “Evolving Networks of words” section to the dissertation. Have to find that WordRank paper
  • Working on white paper. Lit review today, plus fix anything that I might have broken…
    • Added section on cybersecurity that got lost in the update fiasco
    • Aaron found a good paper on the lack of advantage that the US has in AI, particularly wrt China
  • Avoiding working on white paper by writing a generator for Aaron. Done!
  • Cortex is an open-source platform for building, deploying, and managing machine learning applications in production. It is designed for any developer who wants to build machine learning powered services without having to worry about infrastructure challenges like configuring data pipelines, continuous deployment, and dependency management. Cortex is actively maintained by Cortex Labs. We’re a venture-backed team of infrastructure engineers and we’re hiring.
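The eigenvector-as-percentage-weight idea from the splitter notes above could be sketched with plain power iteration (toy 3-node graph; the real input would be the player/place adjacency matrix):

```python
# Power-iterate (A + I) and L1-normalize so the leading eigenvector
# reads as percentage connectivity weights. Adding the identity avoids
# oscillation on bipartite graphs without changing the eigenvectors.
ADJ = [
    [0, 1, 1],  # node 0 is linked to nodes 1 and 2
    [1, 0, 0],
    [1, 0, 0],
]

def eigen_weights(adj, iters=200):
    n = len(adj)
    v = [1.0 / n] * n
    for _ in range(iters):
        v = [v[i] + sum(adj[i][j] * v[j] for j in range(n))
             for i in range(n)]
        s = sum(v)
        v = [x / s for x in v]  # weights sum to 1 -> read as percentages
    return v

weights = eigen_weights(ADJ)
```

The hub (node 0) ends up with the largest share, and because the vector sums to 1, each entry can be read directly as a percentage weight.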

Phil 2.26.19

7:00 – 3:00 ASRC

    • Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
    • More white paper. Add Flynn’s thoughts about cyber security – see notes from yesterday
    • Reconnected with Antonio. He’d like me to write the introduction and motivation for his SASO paper
    • Add time bucketing to postanalyzer. I’m really starting to want to add a UI
      • Looks done. Try it out next time
        Running query for Poe in subject peanutgallery between 23:56 and 00:45
        Running query for Dungeon Master in subject peanutgallery between 23:56 and 00:45
        Running query for Lord Javelin in subject peanutgallery between 23:56 and 00:45
        Running query for memoriesmaze in subject peanutgallery between 23:56 and 00:45
        Running query for Linda in subject peanutgallery between 23:56 and 00:45
        Running query for phil in subject peanutgallery between 23:56 and 00:45
        Running query for Lorelai in subject peanutgallery between 23:56 and 00:45
        Running query for Bren'Dralagon in subject peanutgallery between 23:56 and 00:45
        Running query for Shelton Herrington in subject peanutgallery between 23:56 and 00:45
        Running query for Keiri'to in subject peanutgallery between 23:56 and 00:45
    • More white paper. Got through the introduction and background. Hopefully didn’t lose anything when I had to resynchronize with the repository that I hadn’t updated from


Phil 2.25.19

7:00 – 2:30 ASRC TL

2:30 – 4:30 PhD

  • Fix directory code of LMN so that it remembers the input and output directories – done
  • Add time bucketing capabilities. Do this by taking the complete conversation and splitting the results into N sublists. Take the beginning and ending time from each list and then use those to set the timestamp start and stop for each player’s posts.
  • Thinking about a time-series LMN tool that can chart the relative occurrence of the sorted terms over time. I think this could be done with tkinter. I would need to create an executable as described here, though the easiest answer seems to be pyinstaller.
  • Here are two papers that show the advantages of herding over nomadic behavior:
    • Phagotrophy by a flagellate selects for colonial prey: A possible origin of multicellularity
      • Predation was a powerful selective force promoting increased morphological complexity in a unicellular prey held in constant environmental conditions. The green alga, Chlorella vulgaris, is a well-studied eukaryote, which has retained its normal unicellular form in cultures in our laboratories for thousands of generations. For the experiments reported here, steady-state unicellular C. vulgaris continuous cultures were inoculated with the predator Ochromonas vallescia, a phagotrophic flagellated protist (‘flagellate’). Within less than 100 generations of the prey, a multicellular Chlorella growth form became dominant in the culture (subsequently repeated in other cultures). The prey Chlorella first formed globose clusters of tens to hundreds of cells. After about 10–20 generations in the presence of the phagotroph, eight-celled colonies predominated. These colonies retained the eight-celled form indefinitely in continuous culture and when plated onto agar. These self-replicating, stable colonies were virtually immune to predation by the flagellate, but small enough that each Chlorella cell was exposed directly to the nutrient medium.
    • De novo origins of multicellularity in response to predation
      • The transition from unicellular to multicellular life was one of a few major events in the history of life that created new opportunities for more complex biological systems to evolve. Predation is hypothesized as one selective pressure that may have driven the evolution of multicellularity. Here we show that de novo origins of simple multicellularity can evolve in response to predation. We subjected outcrossed populations of the unicellular green alga Chlamydomonas reinhardtii to selection by the filter-feeding predator Paramecium tetraurelia. Two of five experimental populations evolved multicellular structures not observed in unselected control populations within ~750 asexual generations. Considerable variation exists in the evolved multicellular life cycles, with both cell number and propagule size varying among isolates. Survival assays show that evolved multicellular traits provide effective protection against predation. These results support the hypothesis that selection imposed by predators may have played a role in some origins of multicellularity. SpontaniousClustering
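The bucketing scheme described above (split the full conversation into N sublists, then use each sublist's first and last timestamps as query bounds) might look like this. The (timestamp, author) tuples are toy stand-ins for db rows.

```python
# Split a time-ordered conversation into N buckets and return the
# (ts_start, ts_end) bounds of each, which then become the per-player
# query windows.
def time_buckets(posts, n):
    """posts: time-sorted (timestamp, author) tuples ->
    list of (ts_start, ts_end), one per bucket."""
    size = max(1, len(posts) // n)
    bounds = []
    for i in range(0, len(posts), size):
        chunk = posts[i:i + size]
        bounds.append((chunk[0][0], chunk[-1][0]))
    return bounds

posts = [(t, "phil") for t in [0, 5, 7, 12, 20, 31]]
bounds = time_buckets(posts, 3)
```

Each bounds pair then feeds a "> ts1 and <= ts2" query per player, matching the half-open time test noted earlier.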

Phil 2.24.19

It is a miserable, rainy morning, so I’m working on extracting text blocks for analytics. Once I try the various packages on those blocks, I’ll work on breaking them into buckets.

Ok, that’s coming along well. Here’s an example:

Bren'Dralagon: Pushing through the vines, he steps out to meet the Orc..
(unknown distance clarity, if possible, rush down the stairs to the attack)

Bren'Dralagon: kk

Shelton Herrington: RIP

Keiri'to: first blood

Bren'Dralagon: *Hmm, my tailor will have questions on where that came from*

Shelton Herrington: how far across is the hazard? impossible to jump over?

Shelton Herrington: ok

Bren'Dralagon: close enough to attack?

Shelton Herrington: understood, just checking

Bren'Dralagon: if charging is allowed, since i just moved forward and would be turning i doubt it?, i'll charge

Lorelai: I thought the vines were (mostly) gone?

Shelton Herrington: *"this ingress is a formidable enemy"*

Bren'Dralagon: *Remind me to have those stairs cleaned. I know a guy*

Shelton Herrington: do i have a line of sight to either?

Now that I have some text, I’ll try the tools listed here: The whole suite is known as the Suite of Automatic Linguistic Analysis Tools (SALAT).

Which means… (bear with me here)

That these are tools for creating word salat!

I’ll be here all night folks. Be sure to try the fish…

Played with the tools, but I need a list of words to analyze the docs with respect to. LMN does a good job of this, so I tried it using the broken-out player and DM. It looks super interesting. This is BOW with the non-topic words “these, those, get, etc” ignored:


Based on what I see here, I’m going to work on the bucketing and see if the top words change over time. If they do, then we can build a map in fewer steps
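A quick way to run that check (do the top words change over time?) is to bucket the posts in order and compare each bucket's most common non-ignored words. Toy posts and ignore list below; the real version would use the ignore files and db text.

```python
from collections import Counter

IGNORE = {"these", "those", "get", "the", "to"}

# Bucket the posts in time order and list each bucket's top-k words,
# skipping the non-topic ignore words.
def top_words_per_bucket(posts, n_buckets, k=2):
    size = max(1, len(posts) // n_buckets)
    result = []
    for i in range(0, len(posts), size):
        c = Counter(w for p in posts[i:i + size]
                    for w in p.lower().split() if w not in IGNORE)
        result.append([w for w, _ in c.most_common(k)])
    return result

posts = ["the orc charges", "attack the orc",
         "stairs down", "the stairs to the vault"]
tops = top_words_per_bucket(posts, 2)
```

If the top word shifts between buckets (here from a combat word to a location word), the sequence of buckets itself traces a path for the map.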