Category Archives: Conferences

Phil 7.3.19

Continuing with the ICML 2019 Tutorial: Recent Advances in Population-Based Search for Deep Neural Networks. Wow. Lots of implications for diversity science. They need to read Martindale though.

This also looks good: it uses the above concepts of Quality Diversity to create map-like structures in low dimensions.

  • Autonomous skill discovery with Quality-Diversity and Unsupervised Descriptors
    • Quality-Diversity optimization is a new family of optimization algorithms that, instead of searching for a single optimal solution to solving a task, searches for a large collection of solutions that all solve the task in a different way. This approach is particularly promising for learning behavioral repertoires in robotics, as such a diversity of behaviors enables robots to be more versatile and resilient. However, these algorithms require the user to manually define behavioral descriptors, which are used to determine whether two solutions are different or similar. The choice of a behavioral descriptor is crucial, as it completely changes the solution types that the algorithm derives. In this paper, we introduce a new method to automatically define this descriptor by combining Quality-Diversity algorithms with unsupervised dimensionality reduction algorithms. This approach enables robots to autonomously discover the range of their capabilities while interacting with their environment. The results from two experimental scenarios demonstrate that robots can autonomously discover a large range of possible behaviors, without any prior knowledge about their morphology and environment. Furthermore, these behaviors are deemed to be similar to handcrafted solutions that use domain knowledge and significantly more diverse than those obtained with existing unsupervised methods.
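Here's a minimal sketch of the Quality-Diversity loop the abstract describes: a MAP-Elites-style archive over a discretized behavior space. The evaluate and descriptor callables are placeholders for illustration; in the paper the descriptor is learned with unsupervised dimensionality reduction rather than handcrafted like this:

    import numpy as np

    def map_elites(evaluate, descriptor, dim, bins=20, iters=10000):
        # Archive maps a cell of the discretized descriptor space to the
        # best (fitness, solution) pair found so far in that cell.
        archive = {}
        for _ in range(iters):
            if archive and np.random.rand() < 0.9:
                # Mutate a randomly chosen elite from the archive
                keys = list(archive.keys())
                parent = archive[keys[np.random.randint(len(keys))]][1]
                x = parent + 0.1 * np.random.randn(dim)
            else:
                x = np.random.randn(dim)  # random bootstrap solution
            fitness = evaluate(x)
            # descriptor() is assumed to return values in [0, 1]^d
            cell = tuple(np.clip((np.asarray(descriptor(x)) * bins).astype(int), 0, bins - 1))
            if cell not in archive or fitness > archive[cell][0]:
                archive[cell] = (fitness, x)
        return archive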

Back to the Dissertation

  • Added notes from Monday’s dungeon run
  • Added adversarial herding
  • At 111 pages!

Phil 6.11.19

ASRC GEOS 7:00 – 5:30

  • Some interesting stuff from ICML 2019
    • The Evolved Transformer
      • Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments – the Evolved Transformer – demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT’14 English-German; at smaller sizes, it achieves the same quality as the original “big” Transformer with 37.6% less parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of ~7M parameters.
    • DBSCAN++: Towards fast and scalable density clustering (sketch after this list)
      • DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest.
    • Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits (sketch after this list)
      • We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We analyze Giro in a Bernoulli bandit and derive a bound on its n-round regret, where Δ is the difference in the expected rewards of the optimal and the best suboptimal arms, and K is the number of arms. The main advantage of our exploration design is that it easily generalizes to structured problems. To show this, we propose contextual Giro with an arbitrary reward generalization model. We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well.
    • Guided evolutionary strategies: Augmenting random search with surrogate gradients (sketch after this list)
      • Many applications in machine learning require optimizing a function whose true gradient is inaccessible, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications or training networks with discrete variables). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a first-order optimizer. We analytically and numerically characterize the trade-offs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems, demonstrating an improvement over both standard evolutionary strategies and first-order methods that directly follow the surrogate gradient.
    • 2019 Workshop on Human In the Loop Learning (HILL)
      • This workshop is a joint effort between the 4th ICML Workshop on Human Interpretability in Machine Learning (WHI) and the ICML 2019 Workshop on Interactive Data Analysis System (IDAS). We have combined our forces this year to run Human in the Loop Learning (HILL) in conjunction with ICML 2019!
      • The workshop will bring together researchers and practitioners who study interpretable and interactive learning systems with applications in large scale data processing, data annotations, data visualization, human-assisted data integration, systems and tools to interpret machine learning models as well as algorithm designs for active learning, online learning, and interpretable machine learning algorithms. The target audience for the workshop includes people who are interested in using machines to solve problems by having a human be an integral part of the process. This workshop serves as a platform where researchers can discuss approaches that bridge the gap between humans and machines and get the best of both worlds.
    • More JASSS paper
    • Start on clustering hyperparameter search
      • Created ClusterEvaluator. Going to use learning_optimizer as the search space evaluator – Done
    • Waikato meeting
      • Extract data from the PHP and Slack DBs for Tony and JASSS
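A rough sketch of the DBSCAN++ idea from the abstract above: run the density clustering on a uniform subset of points, then attach everything else to its nearest clustered subset point. This is a loose approximation for intuition only (the real algorithm computes subset densities against the full dataset and comes with consistency guarantees); the sklearn calls and defaults are just what I'd reach for:

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.neighbors import NearestNeighbors

    def dbscan_on_subset(X, eps=0.5, min_samples=5, frac=0.1, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=max(1, int(frac * len(X))), replace=False)
        # Density clustering only on the chosen subset of points
        sub_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
        # Every remaining point inherits the label of its nearest subset
        # point (noise stays -1), so the full dataset gets labeled cheaply
        nn = NearestNeighbors(n_neighbors=1).fit(X[idx])
        _, nearest = nn.kneighbors(X)
        return sub_labels[nearest[:, 0]]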
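A sketch of the Giro arm-selection step for the Bernoulli case, as I read the abstract: augment each arm's history with pseudo ones and zeros, bootstrap it, and pull the arm with the highest bootstrap mean. Treating a as "pseudo rewards of each kind per observed reward" is my reading of the design parameter:

    import numpy as np

    def giro_choose_arm(histories, a=1):
        # histories: list of lists of observed 0/1 rewards, one per arm
        best_arm, best_mean = None, -np.inf
        for arm, h in enumerate(histories):
            if len(h) == 0:
                return arm  # pull each arm once before bootstrapping
            s = len(h)
            # Pseudo rewards keep the bootstrap mean optimistic often enough
            augmented = np.concatenate([h, np.ones(a * s), np.zeros(a * s)])
            boot = np.random.choice(augmented, size=augmented.size, replace=True)
            if boot.mean() > best_mean:
                best_arm, best_mean = arm, boot.mean()
        return best_arm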
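And a sketch of the Guided Evolutionary Strategies estimator: antithetic perturbations drawn from a Gaussian whose covariance mixes an isotropic term with the subspace spanned by the surrogate gradient. Using a single guiding vector (k = 1) and dropping the paper's overall scale constant are simplifications on my part:

    import numpy as np

    def guided_es_gradient(f, theta, surrogate_grad, alpha=0.5, sigma=0.1, pairs=10):
        n = theta.size
        u = surrogate_grad / np.linalg.norm(surrogate_grad)  # guiding subspace, k = 1
        g = np.zeros(n)
        for _ in range(pairs):
            # eps ~ N(0, sigma^2 * (alpha/n * I + (1 - alpha) * u u^T))
            eps = sigma * (np.sqrt(alpha / n) * np.random.randn(n)
                           + np.sqrt(1 - alpha) * np.random.randn() * u)
            g += (f(theta + eps) - f(theta - eps)) * eps  # antithetic samples
        return g / (2 * pairs * sigma ** 2)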

Phil 6.6.19

7:00 – 3:00 ASRC PM Summit

  • 75th anniversary of D-Day (image: naval bombardments on D-Day)
  • Research talk today at the conference. Much networking yesterday.
    • The talk went well. More opportunities for networking. Maybe some ML for 3D printing?
  • Copied the CHIPLAY paper to a new GROUP 2020 folder and changed it to the ACM small article format
  • Simplicial models of social contagion (sketch after this list)
    • Complex networks have been successfully used to describe the spread of diseases in populations of interacting individuals. Conversely, pairwise interactions are often not enough to characterize social contagion processes such as opinion formation or the adoption of novelties, where complex mechanisms of influence and reinforcement are at work. Here we introduce a higher-order model of social contagion in which a social system is represented by a simplicial complex and contagion can occur through interactions in groups of different sizes. Numerical simulations of the model on both empirical and synthetic simplicial complexes highlight the emergence of novel phenomena such as a discontinuous transition induced by higher-order interactions. We show analytically that the transition is discontinuous and that a bistable region appears where healthy and endemic states co-exist. Our results help explain why critical masses are required to initiate social changes and contribute to the understanding of higher-order interactions in complex systems.
  • This is wild: Randomly wired neural networks and state-of-the-art accuracy? Yes it works.
  • This is sad: Training a single AI model can emit as much carbon as five cars in their lifetimes
  • Came home and slept 2 1/2 hours. Very cooked.
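A quick numerical sketch of the mean-field picture in the simplicial contagion paper above, as I read it: pairwise infection on links plus a second-order term for 2-simplex (triangle) reinforcement. The parameter values here are made up, but they land in the bistable region, so two different initial densities settle into different states:

    import numpy as np

    def simulate(rho0, beta, beta_d, k=20.0, k_d=6.0, mu=1.0, dt=0.01, steps=5000):
        # Euler integration of
        # d(rho)/dt = -mu*rho + beta*k*rho*(1-rho) + beta_d*k_d*rho^2*(1-rho)
        rho = rho0
        for _ in range(steps):
            drho = (-mu * rho + beta * k * rho * (1 - rho)
                    + beta_d * k_d * rho ** 2 * (1 - rho))
            rho = float(np.clip(rho + dt * drho, 0.0, 1.0))
        return rho

    # Same infection rates, different starting densities: the low start dies
    # out while the high start reaches an endemic state (bistability).
    print(simulate(0.01, beta=0.04, beta_d=0.5), simulate(0.4, beta=0.04, beta_d=0.5))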

Phil 6.4.19

7:00 – 4:00 ASRC NASA GEOS

  • Continuing to read Colin Martindale’s Cognitive Psychology, a Neural Network Approach, which is absolutely bonkers for something written decades ago. Ordered two more copies.
  • JASSS Paper. Adding footnotes to figures, which is tricky.
  • Dissertation
    • Took the chapter numbers out of the file names, since these things seem to be sliding around quite a bit
  • Registered for Politics and Computational Social Science (PACSS) Conference
  • GROUP paper?
  • Waveform clustering
    • Adding noise to the float_functions class. Here’s the waveform without and with some (0.1) noise.
    • Installed fastdtw for python
    • DTW is working on the lines in the csv. Identical lines have zero distance; noisy lines have some. Need to think about some kind of normalizing measure. Maybe divide by the number of points? (See the sketch after this list.)
    • Need to iterate as nested loops over all the rows. Skip when i == j – done
    • Need to build a Dataframe of distances from one row to the next – done
    • Here are the two curves to compare (image: TwoCurves)
    • And here’s the DTW result (image: DTW)
  • Good Waikato meeting. We’ll try to run a jury next week. Also, meetings have been moved to 6:30 EST
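As a sketch of the normalization idea above: fastdtw returns both the distance and the warping path, so dividing by the path length gives a per-point score that should be roughly comparable across waveforms of different lengths:

    from fastdtw import fastdtw

    def normalized_dtw(a, b):
        # Scalar series, so plain absolute difference as the point distance
        distance, path = fastdtw(a, b, dist=lambda p, q: abs(p - q))
        return distance / len(path)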

Phil 5.30.19

7:00 – 2:30 NASA GEOS

  • CHI Play reviews should come back today!
    • Darn – rejected. From the reviews, it looks like we are in the same space, but going a different direction – an alignment problem. Need to read the reviews in detail though.
    • Some discussion with Wayne about GROUP
  • More JASSS paper
    • Added some broader thoughts to the conclusion and punched up the subjective/objective map difference
  • Start writing proposal for Bruce
    • Simple simulation baseline for model building
    • Develop models for
      • Extrapolating multivariate (family) values, including error conditions
      • Classify errors
      • An explainable model whose sensor inputs drive its controls, producing outputs that are evaluated against the original inputs using RL
      • “Safer” ML using Sanhedrin approach
  • EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling
    • In our ICML 2019 paper, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, we propose a novel model scaling method that uses a simple yet highly effective compound coefficient to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients. Powered by this novel scaling method and recent progress on AutoML, we have developed a family of models, called EfficientNets, which surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster). (Image: EfficientNet)
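The compound scaling trick above is small enough to write down. My recollection of the paper's base coefficients is alpha = 1.2, beta = 1.1, gamma = 1.15, chosen so that alpha * beta^2 * gamma^2 is roughly 2 (doubling FLOPS per unit of phi); treat the numbers as an assumption:

    def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
        depth = alpha ** phi        # multiplier on the number of layers
        width = beta ** phi         # multiplier on the number of channels
        resolution = gamma ** phi   # multiplier on the input image size
        return depth, width, resolution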

Phil 4.8.19

7:00 – ASRC PhD

  • Meeting with Wayne and Aaron last night. Wayne doesn’t think the venue is right for the papers in their current form. Rewrite, combining the papers into a “using FTRPGs as a source for science” framing.
  • Still need a venue for the mapping.

Phil 3.12.19

7:00 – 4:00 ASRC PhD



Phil 3.10.19

Learning to Speak and Act in a Fantasy Text Adventure Game

  • We introduce a large scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

New run in the dungeon. Exciting!

Finished my pass through Antonio’s paper

Zoe Keating (May 1) or Imogen Heap (May 3)?

Phil 3.9.19

Understanding China’s AI Strategy

  • In my interactions with Chinese government officials, they demonstrated remarkably keen understanding of the issues surrounding AI and international security. It is clear that China’s government views AI as a high strategic priority and is devoting the required resources to cultivate AI expertise and strategic thinking among its national security community. This includes knowledge of U.S. AI policy discussions. I believe it is vital that the U.S. policymaking community similarly prioritize cultivating expertise and understanding of AI developments in China.

Russian Trolls Shift Strategy to Disrupt U.S. Election in 2020

  • Russian internet trolls appear to be shifting strategy in their efforts to disrupt the 2020 U.S. elections, promoting politically divisive messages through phony social media accounts instead of creating propaganda themselves, cybersecurity experts say.

Backup phone

Work on SASO paper – started

Rachel’s dungeon run is tomorrow! Maybe cross 10,000 posts?

Look at using BERT and the full Word2Vec model for analyzing posts

The Promise of Hierarchical Reinforcement Learning

  • To really understand the need for a hierarchical structure in the learning algorithm and in order to make the bridge between RL and HRL, we need to remember what we are trying to solve: MDPs. HRL methods learn a policy made up of multiple layers, each of which is responsible for control at a different level of temporal abstraction. Indeed, the key innovation of the HRL is to extend the set of available actions so that the agent can now choose to perform not only elementary actions, but also macro-actions, i.e. sequences of lower-level actions. Hence, with actions that are extended over time, we must take into account the time elapsed between decision-making moments. Luckily, MDP planning and learning algorithms can easily be extended to accommodate HRL.
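A toy illustration of the macro-action idea: an option is just a named sequence of primitive actions that runs to completion before the agent chooses again, so decisions are separated by variable amounts of elapsed time. The option names and the env.step() interface are made up for the sketch:

    PRIMITIVES = {"left", "right", "forward"}
    OPTIONS = {
        "turn_around": ["left", "left"],
        "advance_3": ["forward", "forward", "forward"],
    }

    def execute(env, choice):
        # A primitive runs for one step; an option runs for several.
        steps = OPTIONS.get(choice, [choice])
        total_reward, elapsed = 0.0, 0
        for action in steps:
            reward, done = env.step(action)  # assumed environment interface
            total_reward += reward
            elapsed += 1
            if done:
                break
        return total_reward, elapsed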

Phil 3.7.19

Day 2 of the TF Dev summit. Worth the money, though much less research-y and more implementation and production-y

Google Cloud has FedRAMP certification; see details here.

Live Transcribe

Coral: On Device Transfer learning (paper)

TF 2.0 API changes and behavior changes

  • Best practices (link: )
  • Declare variables at the beginning of the code
  • Keras Functional API (sketch after this list)
    • The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
  • Autograd can automatically differentiate native Python and Numpy code. It can handle a large subset of Python’s features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation), which means it can efficiently take gradients of scalar-valued functions with respect to array-valued arguments, as well as forward-mode differentiation, and the two can be composed arbitrarily. The main intended application of Autograd is gradient-based optimization. For more information, check out the tutorial and the examples directory.
  • JAX is Autograd and XLA, brought together for high-performance machine learning research. With its updated version of Autograd, JAX can automatically differentiate native Python and NumPy functions. It can differentiate through loops, branches, recursion, and closures, and it can take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation) via grad as well as forward-mode differentiation, and the two can be composed arbitrarily to any order.
  • Effective TF 2.0: There are multiple changes in TensorFlow 2.0 to make TensorFlow users more productive. TensorFlow 2.0 removes redundant APIs, makes APIs more consistent (Unified RNNs, Unified Optimizers), and better integrates with the Python runtime with Eager execution.
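A minimal sketch of the functional API bullet above: two inputs sharing one layer, merged into a small DAG with two outputs. Shapes and layer names are arbitrary:

    from tensorflow import keras
    from tensorflow.keras import layers

    inp_a = keras.Input(shape=(16,), name="a")
    inp_b = keras.Input(shape=(16,), name="b")
    shared = layers.Dense(32, activation="relu")   # one layer, one set of weights
    merged = layers.concatenate([shared(inp_a), shared(inp_b)])
    out_main = layers.Dense(1, activation="sigmoid", name="main")(merged)
    out_aux = layers.Dense(1, activation="sigmoid", name="aux")(merged)

    # A model over the graph of layers: multi-input, multi-output
    model = keras.Model(inputs=[inp_a, inp_b], outputs=[out_main, out_aux])
    model.compile(optimizer="adam", loss="binary_crossentropy")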

Phil 3.6.19

5:00 – ASRC TL

  • Got a lot done on the BAA on the flight yesterday
  • Wrote up a description of LMN and CM for Eric V.
  • Reading more of the Handbook of Latent Semantic Analysis. It’s giving me some good ideas for calculating similarities of posts using Word2Vec and comparing the average vector for each post (sketch after this list)
  • Antonio got an extension to the 12th. Need to see what he’s up to. Wow, there’s a lot there now. Made some comments about what I’d like to see. I’ll pull down the document to read later
  • Continued to tweak the slides
  • TF Dev conference main sessions today. Breakouts tomorrow.
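A sketch of the average-vector comparison idea, assuming gensim and some pretrained word2vec binary (the path below is hypothetical):

    import numpy as np
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

    def post_vector(tokens):
        # Average the vectors of the in-vocabulary words in a post
        vecs = [wv[t] for t in tokens if t in wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

    def post_similarity(tokens_a, tokens_b):
        # Cosine similarity between the two average vectors
        a, b = post_vector(tokens_a), post_vector(tokens_b)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 0.0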

Phil 1.25.19

7:00 – 5:30 ASRC NASA/PhD

    • Practical Deep Learning for Coders, v3
    • Continuing Clockwork Muse (reviews on Amazon are… amazingly thorough), which is a slog but an interesting slog. Martindale is talking about how the pattern of increasing arousal potential and primordial/stylistic content is self-similar across scales, from the individual work to populations and careers.
    • Had a bunch of thoughts about primordial content and the ending of the current dungeon.
    • Last day of working on NOAA. I think there is a better way to add/subtract months; see StackOverflow
    • Finish review of CHI paper. Mention Myanmar and that most fake news sharing is done by a tiny fraction of the users, so finding the heuristics of those users is a critical question. Done!
    • Setting up Fake news on Twitter during the 2016 U.S. presidential election as the next paper in the queue. The references look extensive (69!) and good.
    • TFW you don’t want any fancy modulo in your math confusing you (see the divmod sketch after this list):
      def add_month(year: int, month: int, offset: int) -> [int, int]:
          # print ("original date = {}/{}, offset = {}".format(month, year, offset))
          new_month = month + offset
          new_year = year
          while new_month < 1:
              new_month += 12
              new_year -= 1
          while new_month > 12:
              new_month -= 12
              new_year += 1
          return new_month, new_year
    • Got a version of the prediction system running on QA. Next week I start something new
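For the record, the “fancy modulo” version avoided above is just a divmod in zero-based months; a minimal sketch, returning the same (month, year) order:

    def add_month_divmod(year: int, month: int, offset: int) -> (int, int):
        # Work in zero-based months so the year carry falls out of divmod
        year_delta, zero_based_month = divmod(month - 1 + offset, 12)
        return zero_based_month + 1, year + year_delta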


Phil 1.24.19

7:00 – 4:30 ASRC NASA/PhD

  • Fake news on Twitter during the 2016 U.S. presidential election
    • The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.
  • One Simple Trick is now live on IEEE!
  • Antibubbles is going well
  • Work on CHI review. Mention this: Less than you think: Prevalence and predictors of fake news dissemination on Facebook
  • Starting to work on the Slack data ingestion and database population. I really want a file dialog to navigate to the Slack folders. StackOverflow suggests tkinter. And lo, it worked just like that:
    import tkinter as tk
    from tkinter import filedialog
    root = tk.Tk()
    file_path = filedialog.askopenfilename()
  • More beating on the prediction pipeline
    • Load up all the parts of the prediction histories and entries – done
    • Store the raw data in the various prediction tables – done
    • populate PredictedAvailableUDO table – done
    • There’s an error in interpolate that I’m not handling correctly, and I’m too cooked to be able to see it. Tomorrow. (Image: interpolatebug)