Phil 6.13.19

7:00 – 5:30 ASRC GEOS

PapersAndDatasets

  • Style Transfer in Text: Exploration and Evaluation
    • The ability to transfer styles of texts or images is an important measurement of the advancement of artificial intelligence (AI). However, progress in language style transfer lags behind other domains, such as computer vision, mainly because of the lack of parallel data and reliable evaluation metrics. In response to the challenge of lacking parallel data, we explore learning style transfer from non-parallel data. We propose two models to achieve this goal. The key idea behind the proposed models is to learn separate content representations and style representations using adversarial networks. Considering the problem of lacking principled evaluation metrics, we propose two novel evaluation metrics that measure two aspects of style transfer: transfer strength and content preservation. We benchmark our models and the evaluation metrics on two style transfer tasks: paper-news title transfer, and positive-negative review transfer. Results show that the proposed content preservation metric is highly correlated with human judgments, and the proposed models are able to generate sentences with similar content preservation scores but higher style transfer strength compared to an autoencoder.
  • Different Spirals of Sameness: A Study of Content Sharing in Mainstream and Alternative Media
    • In this paper, we analyze content sharing between news sources in the alternative and mainstream media using a dataset of 713K articles and 194 sources. We find that content sharing happens in tightly formed communities, and these communities represent relatively homogeneous portions of the media landscape. Through a mixed-method analysis, we find several primary content sharing behaviors. First, we find that the vast majority of shared articles are only shared with similar news sources (i.e. same community). Second, we find that despite these echo-chambers of sharing, specific sources, such as The Drudge Report, mix content from both mainstream and conspiracy communities. Third, we show that while these differing communities do not always share news articles, they do report on the same events, but often with competing and counter-narratives. Overall, we find that the news is homogeneous within communities and diverse in between, creating different spirals of sameness.
  • Fear of missing out, or FOMO, is “a pervasive apprehension that others might be having rewarding experiences from which one is absent”.[2] This social anxiety[3] is characterized by “a desire to stay continually connected with what others are doing”.[2] FOMO is also defined as a fear of regret,[4] which may lead to a compulsive concern that one might miss an opportunity for social interaction, a novel experience, a profitable investment, or other satisfying events.[5] In other words, FOMO perpetuates the fear of having made the wrong decision on how to spend time since “you can imagine how things could be different”.
  • In financial panics, when social facts dominate over objective ones, the behavior is still called herding.
  • More JASSS paper
    • Finished implementing Wayne’s suggestions
    • Save out DB – done
  • More clustering. Add options for column headers and row indices – done
  • Calculating the DTW for the initial data – stillllllll running.
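The style-transfer abstract above scores outputs on content preservation. Here is a minimal sketch of that kind of metric, assuming it is a cosine similarity between sentence embeddings pooled from word vectors. The min/max/mean pooling and the tiny `TOY_VECTORS` vocabulary are assumptions for illustration, not the paper's code or data. Transfer strength would additionally need a trained style classifier, which is not shown.

```python
import math

# Hypothetical 2-D "word vectors"; a real setup would load pretrained embeddings.
TOY_VECTORS = {
    "the": [0.1, 0.2],
    "food": [0.9, 0.1],
    "was": [0.1, 0.1],
    "great": [0.2, 0.9],
    "awful": [-0.2, -0.9],
}


def sentence_embedding(tokens, vectors):
    """Concatenate the element-wise min, max and mean of the known word vectors."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    if not vecs:
        raise ValueError("none of the tokens have vectors")
    columns = list(zip(*vecs))  # one tuple per embedding dimension
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    means = [sum(c) / len(c) for c in columns]
    return mins + maxs + means


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def content_preservation(source, transferred, vectors=TOY_VECTORS):
    """Cosine similarity between pooled embeddings of source and transferred sentences."""
    return cosine(sentence_embedding(source, vectors), sentence_embedding(transferred, vectors))


src = "the food was great".split()
out = "the food was awful".split()
print(round(content_preservation(src, out), 3))
```

Pooling keeps the score independent of sentence length, at the cost of ignoring word order.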

Phil 6.12.19

7:00 – 5:30 ASRC GEOS

Phil 6.11.19

ASRC GEOS 7:00 – 5:30

  • Some interesting stuff from ICML 2019
    • The Evolved Transformer
      • Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by the recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting by seeding our initial population with the Transformer. To directly search on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer, demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT’14 English-German; at smaller sizes, it achieves the same quality as the original “big” Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of ~7M parameters.
    • DBSCAN++: Towards fast and scalable density clustering
      • DBSCAN is a classical density-based clustering procedure with tremendous practical relevance. However, DBSCAN implicitly needs to compute the empirical density for each sample point, leading to a quadratic worst-case time complexity, which is too slow on large datasets. We propose DBSCAN++, a simple modification of DBSCAN which only requires computing the densities for a chosen subset of points. We show empirically that, compared to traditional DBSCAN, DBSCAN++ can provide not only competitive performance but also added robustness in the bandwidth hyperparameter while taking a fraction of the runtime. We also present statistical consistency guarantees showing the trade-off between computational cost and estimation rates. Surprisingly, up to a certain point, we can enjoy the same estimation rates while lowering computational cost, showing that DBSCAN++ is a sub-quadratic algorithm that attains minimax optimal rates for level-set estimation, a quality that may be of independent interest
    • Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
      • We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We analyze Giro in a Bernoulli bandit and derive a $O(K \Delta^{-1} \log n)$ bound on its $n$-round regret, where $\Delta$ is the difference in the expected rewards of the optimal and the best suboptimal arms, and $K$ is the number of arms. The main advantage of our exploration design is that it easily generalizes to structured problems. To show this, we propose contextual Giro with an arbitrary reward generalization model. We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well.
    • Guided evolutionary strategies: Augmenting random search with surrogate gradients
      • Many applications in machine learning require optimizing a function whose true gradient is inaccessible, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications or training networks with discrete variables). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a first-order optimizer. We analytically and numerically characterize the trade-offs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems, demonstrating an improvement over both standard evolutionary strategies and first-order methods that directly follow the surrogate gradient
    • 2019 Workshop on Human In the Loop Learning (HILL)
      • This workshop is a joint effort between the 4th ICML Workshop on Human Interpretability in Machine Learning (WHI) and the ICML 2019 Workshop on Interactive Data Analysis System (IDAS). We have combined our forces this year to run Human in the Loop Learning (HILL) in conjunction with ICML 2019!
      • The workshop will bring together researchers and practitioners who study interpretable and interactive learning systems with applications in large scale data processing, data annotations, data visualization, human-assisted data integration, systems and tools to interpret machine learning models as well as algorithm designs for active learning, online learning, and interpretable machine learning algorithms. The target audience for the workshop includes people who are interested in using machines to solve problems by having a human be an integral part of the process. This workshop serves as a platform where researchers can discuss approaches that bridge the gap between humans and machines and get the best of both worlds.
  • More JASSS paper
  • Start on clustering hyperparameter search
    • Created ClusterEvaluator. Going to use learning_optimizer as the search space evaluator – Done
  • Waikato meeting
    • Extract data from the PHP and Slack DBs for Tony and JASSS
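The DBSCAN++ abstract above hinges on estimating density for only a subset of points. The sketch below follows that idea under stated assumptions. It picks the subset with greedy K-center selection, marks sampled points with at least `min_pts` neighbours within `eps` as core points, links core points closer than `eps`, and assigns every point to the cluster of its nearest core point. Labelling points farther than `eps` from every core point as noise (-1) is a DBSCAN-style convention assumed here, not necessarily the paper's. The density step costs O(m·n) instead of O(n²).

```python
import math


def k_center(points, m):
    """Greedy K-center: pick up to m indices that are spread out over the data."""
    chosen = [0]
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(m, len(points)):
        idx = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[idx] == 0:  # every point already coincides with a chosen one
            break
        chosen.append(idx)
        nearest = [min(nearest[i], math.dist(points[i], points[idx])) for i in range(len(points))]
    return chosen


def dbscan_pp(points, eps, min_pts, m):
    """DBSCAN++-style clustering: densities are computed only for m sampled points."""
    sample = k_center(points, m)
    # Each sampled point's density is still measured against the full data set.
    core = [i for i in sample
            if sum(1 for q in points if math.dist(points[i], q) <= eps) >= min_pts]
    # Union-find over core points that lie within eps of each other.
    parent = {i: i for i in core}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a_pos, a in enumerate(core):
        for b in core[a_pos + 1:]:
            if math.dist(points[a], points[b]) <= eps:
                parent[find(a)] = find(b)
    # Assign every point to the cluster of its nearest core point; -1 means noise.
    cluster_ids, labels = {}, []
    for p in points:
        if not core:
            labels.append(-1)
            continue
        nearest_core = min(core, key=lambda i: math.dist(p, points[i]))
        if math.dist(p, points[nearest_core]) > eps:
            labels.append(-1)
        else:
            labels.append(cluster_ids.setdefault(find(nearest_core), len(cluster_ids)))
    return labels


points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (20, 20)]
print(dbscan_pp(points, eps=0.5, min_pts=3, m=6))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```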
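The Giro abstract above describes exploration by bootstrapping an arm's reward history padded with pseudo rewards. Here is a minimal sketch for a Bernoulli bandit. It assumes one pseudo reward of 0 and one of 1 (`a=1`) are appended to an arm's history each time the arm is pulled, and that untried arms are pulled first. The arm probabilities are made up for the example.

```python
import random


def giro(arm_probs, horizon, a=1, seed=0):
    """Bootstrap-exploration bandit on simulated Bernoulli arms; returns (pulls, total reward)."""
    rng = random.Random(seed)
    k = len(arm_probs)
    history = [[] for _ in range(k)]
    pulls = [0] * k
    total = 0
    for _ in range(horizon):
        estimates = []
        for h in history:
            if not h:
                estimates.append(float("inf"))  # force one pull of every arm
            else:
                resample = rng.choices(h, k=len(h))  # non-parametric bootstrap
                estimates.append(sum(resample) / len(resample))
        arm = max(range(k), key=lambda i: estimates[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        history[arm].append(reward)
        history[arm].extend([0] * a + [1] * a)  # pseudo rewards keep the mean optimistic
        pulls[arm] += 1
        total += reward
    return pulls, total


pulls, total = giro([0.2, 0.8], horizon=2000)
print(pulls, total)
```

The pseudo rewards are what make this more than greedy resampling: with only real rewards, an arm that returned a single 0 would never be sampled above zero again.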

Phil 6.10.19

ASRC GEOS 7:00 – 3:00

  • I’ve been thinking about the implications of this article: Training a single AI model can emit as much carbon as five cars in their lifetimes
    • There is something in this that has to do with the idea of cost. NN architectures have no direct concept of cost. Inevitably, the “current best network” takes a building full of specialized processors 200 hours to train. This has been true for Inception, AmoebaNet, and AlphaGo. I wonder what would happen if there were a cost for computation that was part of the fitness function?
    • My sense is that evolution has two interrelated parameters:
      • a mutation needs to “work better” (whatever that means in the context) than the current version
      • the organism that embodies the mutation has to reproduce
    • In other words, neural structures in our brains have an unbroken chain of history back to the initial sensor neurons in multicellular organisms. Mutations that didn’t survive long enough to have an effect, or whose carriers weren’t able to reproduce, didn’t get passed on.
    • Randomness is important too. Systems that are too similar are vulnerable. Consider aspen trees that have given up on sexual reproduction and are essentially all clones reproducing by rhizome. These live long enough to have an impact on the environment, particularly where they can crowd out other species, but the species itself is doomed.
    • I’d like to see an approach to developing NNs that involves more of the constraints of “natural” evolution. I think it would lead to better, and potentially less destructive results.
  • SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods [1-7] and representing the only possible consistent and locally accurate additive feature attribution method based on expectations (see our papers for details).
  • Working on clustering. I’ve been going around in circles on how to take a set of relative distance measures and use them as a basis for clustering. To revisit, here’s a screenshot of a spreadsheet containing the DTW distances from every sequence to every other sequence: [image: DTW]
  • My approach is to treat each row of relative distances as a high-dimensional coordinate (in this case, 50 dimensions) and cluster with respect to the point that each row defines. This takes care of the problem that the data is very symmetric about the diagonal: using this approach, an orange/green coordinate is in a different location from the mirrored green/orange coordinate. It’s basically the difference between (1, 2) and (2, 1). That should be a reliable clustering mechanism. Here are the results:
           cluster_id
    ts_0            0
    ts_1            0
    ts_2            0
    ts_3            0
    ts_4            0
    ts_5            0
    ts_6            0
    ts_7            0
    ts_8            0
    ts_9            0
    ts_10           0
    ts_11           0
    ts_12           0
    ts_13           0
    ts_14           0
    ts_15           0
    ts_16           0
    ts_17           0
    ts_18           0
    ts_19           0
    ts_20           0
    ts_21           0
    ts_22           0
    ts_23           0
    ts_24           0
    ts_25           1
    ts_26           1
    ts_27           1
    ts_28           1
    ts_29           1
    ts_30           1
    ts_31           1
    ts_32           1
    ts_33           1
    ts_34           1
    ts_35           1
    ts_36           1
    ts_37           1
    ts_38           1
    ts_39           1
    ts_40           1
    ts_41           1
    ts_42           1
    ts_43           1
    ts_44           1
    ts_45           1
    ts_46           1
    ts_47           1
    ts_48           1
    ts_49           1
  • First-Order Adversarial Vulnerability of Neural Networks and Input Dimension
    • Carl-Johann Simon-Gabriel, Yann Ollivier, Bernhard Schölkopf, Léon Bottou, David Lopez-Paz
    • Over the past few years, neural networks were proven vulnerable to adversarial images: Targeted but imperceptible image perturbations lead to drastically different predictions. We show that adversarial vulnerability increases with the gradients of the training objective when viewed as a function of the inputs. Surprisingly, vulnerability does not depend on network topology: For many standard network architectures, we prove that at initialization, the l1-norm of these gradients grows as the square root of the input dimension, leaving the networks increasingly vulnerable with growing image size. We empirically show that this dimension-dependence persists after either usual or robust training, but gets attenuated with higher regularization.
  • More JASSS paper. Through the corrections up to the Results section. Kind of surprised to be leaning so hard on Homer, but I need a familiar story from before world maps.
  • Oh yeah, the Age of Discovery correlates with the development of the Mercator projection and usable world maps
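The row-as-coordinate clustering idea above can be sketched without any library. Each row of the distance matrix becomes a point, and plain k-means groups the rows. This is an assumption about how the clustering could be done, not the ClusterEvaluator code. Running scikit-learn's `KMeans` on the same rows would be the usual production choice. The 4×4 matrix below is a made-up example with two obvious groups.

```python
import math


def kmeans_rows(distances, k, iterations=100):
    """k-means where each row of a distance matrix is treated as one point."""
    rows = [list(map(float, r)) for r in distances]
    # Deterministic farthest-point initialisation so results are repeatable.
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


D = [[0, 1, 10, 10],
     [1, 0, 10, 10],
     [10, 10, 0, 1],
     [10, 10, 1, 0]]
print(kmeans_rows(D, k=2))  # [0, 0, 1, 1]
```

Because row i has its zero at position i, the mirrored pairs land at different coordinates, which is the asymmetry the notes rely on.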
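The SHAP entry above rests on Shapley values. For a handful of features these can be computed exactly by enumerating every coalition. This sketch assumes absent features are replaced by a fixed `baseline` vector, which is one of several conventions. It is not the `shap` library's API, and its cost grows exponentially with the number of features, which is why practical explainers rely on approximations or model-specific shortcuts.

```python
import math
from itertools import combinations


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalitions of the other features, sizes 0..n-1
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def linear(z):
    return 2 * z[0] + 3 * z[1] - z[2]


print([round(v, 6) for v in shapley_values(linear, [1, 2, 3], [0, 0, 0])])  # [2.0, 6.0, -3.0]
```

For a linear model each value reduces to w_i(x_i − b_i), which makes a handy sanity check. The values also sum to f(x) − f(baseline), the "additive" property SHAP is named for.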
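The abstract's link between the ℓ1-norm of the input gradient and adversarial vulnerability follows from a first-order expansion. As a worked sketch, assume an ℓ∞-bounded perturbation of size ε (the usual image-attack setting, for which ℓ1 is the dual norm) and input dimension d:

```latex
\begin{aligned}
\delta\mathcal{L} &= \mathcal{L}(x+\delta) - \mathcal{L}(x) \approx \partial_x\mathcal{L}(x)\cdot\delta \\
\max_{\|\delta\|_\infty \le \epsilon} \partial_x\mathcal{L}(x)\cdot\delta
  &= \epsilon\,\|\partial_x\mathcal{L}(x)\|_1
  \quad \text{attained at } \delta = \epsilon\,\operatorname{sign}\big(\partial_x\mathcal{L}(x)\big) \\
\|\partial_x\mathcal{L}(x)\|_1 \propto \sqrt{d}
  \;&\Rightarrow\; \max_{\|\delta\|_\infty \le \epsilon} \delta\mathcal{L} \propto \epsilon\sqrt{d}
\end{aligned}
```

So at a fixed, imperceptible ε, the loss change an attacker can force grows with image size, which is the dimension dependence the abstract describes.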

Phil 6.7.19

7:00 – 4:30 ASRC GEOS

  • Expense report
  • Learned how to handle overtime
  • Dissertation. At 68 pages into the Very Horrible First Draft (VHFD)
  • Meeting with Wayne. Walked through JASSS paper and CHIPLAY reviews
  • Set arguments to DTW systems so that a specified number of rows can be evaluated to support parallelization – done [image: Split]
  • Start clustering? Nope. Wrote up report instead
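The parallelization item above splits the DTW distance matrix by rows. One way to sketch that split, assuming each job receives a contiguous `[start, end)` row range: `row_ranges` and `distance_rows` are hypothetical helpers, and `l1` stands in for the DTW distance. Merging the per-range results reproduces the full matrix, so each range can be computed in a separate process or command-line run.

```python
def row_ranges(n_rows, n_chunks):
    """Split rows 0..n_rows-1 into n_chunks contiguous [start, end) ranges of near-equal size."""
    base, extra = divmod(n_rows, n_chunks)
    ranges, start = [], 0
    for c in range(n_chunks):
        end = start + base + (1 if c < extra else 0)  # first `extra` chunks get one more row
        ranges.append((start, end))
        start = end
    return ranges


def distance_rows(series, start, end, dist):
    """Distances from rows start..end-1 to every row: the unit of work for one job."""
    return {
        i: [0.0 if i == j else dist(series[i], series[j]) for j in range(len(series))]
        for i in range(start, end)
    }


def l1(a, b):
    # Stand-in distance; swap in a DTW function for the real computation.
    return sum(abs(x - y) for x, y in zip(a, b))


series = [[0, 1, 2], [0, 1, 3], [5, 5, 5], [5, 6, 5], [9, 9, 9]]
merged = {}
for start, end in row_ranges(len(series), 2):
    merged.update(distance_rows(series, start, end, l1))  # each call could run in its own process
full = distance_rows(series, 0, len(series), l1)
print(row_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Contiguous ranges keep the bookkeeping trivial: the row index alone says which job produced a result.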

Phil 6.6.19

7:00 – 3:00 ASRC PM Summit

  • 75th anniversary of D-Day [image: Naval Bombardments on D-Day]
  • Research talk today at the conference. Much networking yesterday.
    • The talk went well. More opportunities for networking. Maybe some ML for 3D printing?
  • Copied the CHIPLAY paper to a new GROUP 2020 folder and changed it to the ACM small article format
  • Simplicial models of social contagion
    • Complex networks have been successfully used to describe the spread of diseases in populations of interacting individuals. Conversely, pairwise interactions are often not enough to characterize social contagion processes such as opinion formation or the adoption of novelties, where complex mechanisms of influence and reinforcement are at work. Here we introduce a higher-order model of social contagion in which a social system is represented by a simplicial complex and contagion can occur through interactions in groups of different sizes. Numerical simulations of the model on both empirical and synthetic simplicial complexes highlight the emergence of novel phenomena such as a discontinuous transition induced by higher-order interactions. We show analytically that the transition is discontinuous and that a bistable region appears where healthy and endemic states co-exist. Our results help explain why critical masses are required to initiate social changes and contribute to the understanding of higher-order interactions in complex systems.
  • This is wild: Randomly wired neural networks and state-of-the-art accuracy? Yes it works.
  • This is sad: Training a single AI model can emit as much carbon as five cars in their lifetimes
  • Came home and slept 2 1/2 hours. Very cooked.
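The simplicial-contagion abstract above adds group (2-simplex) infection on top of pairwise infection. Here is a minimal discrete-time sketch, assuming SIS dynamics with synchronous updates. A susceptible node is infected by each infected neighbour with probability `beta1`, and by each triangle whose other two members are both infected with probability `beta2`. Infected nodes recover with probability `mu`. The parameter names and the small example complex are assumptions, not the paper's code.

```python
import random
from itertools import combinations


def simulate_scm(edges, triangles, beta1, beta2, mu, initial, steps, seed=0):
    """Return the number of infected nodes after each step (index 0 is the initial state)."""
    rng = random.Random(seed)
    infected = set(initial)
    history = [len(infected)]
    for _ in range(steps):
        new_infected = set()
        for i, j in edges:  # pairwise (1-simplex) exposure, in both directions
            for src, dst in ((i, j), (j, i)):
                if src in infected and dst not in infected and rng.random() < beta1:
                    new_infected.add(dst)
        for tri in triangles:  # group (2-simplex) exposure needs both other members infected
            for node in tri:
                others = [u for u in tri if u != node]
                if node not in infected and all(u in infected for u in others) and rng.random() < beta2:
                    new_infected.add(node)
        recovered = {i for i in infected if rng.random() < mu}
        infected = (infected - recovered) | new_infected  # synchronous update
        history.append(len(infected))
    return history


nodes = range(6)
edges = list(combinations(nodes, 2))
triangles = list(combinations(nodes, 3))
print(simulate_scm(edges, triangles, beta1=0.05, beta2=0.4, mu=0.2, initial={0, 1}, steps=20))
```

Sweeping `beta2` while holding `beta1` fixed is the natural experiment for looking for the discontinuous transition the abstract mentions.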

Phil 6.4.19

7:00 – 4:00 ASRC NASA GEOS

  • Continuing to read Colin Martindale’s Cognitive Psychology: A Neural-Network Approach, which is absolutely bonkers for something written decades ago. Ordered two more copies.
  • JASSS Paper. Adding footnotes to figures, which is tricky.
  • Dissertation
    • Took the chapter numbers out of the file names, since these things seem to be sliding around quite a bit
  • Registered for Politics and Computational Social Science (PACSS) Conference
  • GROUP paper?
  • Waveform clustering
    • Adding noise to the float_functions class. Here’s the waveform without and with some (0.1) noise:
    • Installed fastdtw for python
    • DTW is working on the lines in the csv. Identical lines have zero distance, noise has some. Need to think about some kind of normalizing measure. Maybe divide by the number of points?
    • Need to iterate as nested loops over all the rows. Skip when i == j – done
    • Need to build a DataFrame of distances from one row to the next – done
    • Here are the two curves to compare: [image: TwoCurves]
    • And here’s the DTW result: [image: DTW]
  • Good Waikato meeting. We’ll try to run a jury next week. Also, meetings have been moved to 6:30 EST
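The waveform-clustering steps above (noise, DTW, a nested loop that skips i == j, and normalization) can be sketched in plain Python. This uses the exact O(n·m) DTW recurrence with absolute-difference cost. fastdtw, which the notes use, approximates the same quantity and returns a `(distance, path)` pair. Uniform noise and dividing by the longer series' length are assumptions standing in for the float_functions noise and the "divide by the number of points" idea.

```python
import math
import random


def dtw(a, b):
    """Exact dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three neighbouring alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def add_noise(values, amount, rng):
    """Add uniform noise in [-amount, amount] to every sample (assumed noise model)."""
    return [v + rng.uniform(-amount, amount) for v in values]


def distance_matrix(rows, normalize=True):
    """All-pairs DTW; the diagonal is skipped (left at 0), as in the i == j rule."""
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = dtw(rows[i], rows[j])
            # assumed normalization: divide by the number of points in the longer series
            out[i][j] = d / max(len(rows[i]), len(rows[j])) if normalize else d
    return out


rng = random.Random(1)
wave = [math.sin(2 * math.pi * t / 50) for t in range(100)]
noisy = add_noise(wave, 0.1, rng)
matrix = distance_matrix([wave, wave, noisy])
print(matrix[0][1])  # 0.0: identical rows have zero distance
print(matrix[0][2] > 0)  # True: noise adds some distance
```

Wrapping `matrix` in `pandas.DataFrame(matrix, index=names, columns=names)`, with `names` a hypothetical list of series labels, gives a row-labelled table like the one used for clustering.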

Phil 5.31.19

7:00 – 3:00 NASA GEOS

  • Got a proposal from Panos and his group. Michael Mayo is interested in running Google’s Universal Sentence Encoder on the data
  • Defending Against Neural Fake News
    • Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. 
      Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary’s point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like “Link Found Between Vaccines and Autism,” Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation.
    • Developing robust verification techniques against generators like Grover is critical. We find that the best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias — and sampling strategies that alleviate its effects — both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.
  • Retooling CHIPLAY for GROUP. Deadline is June 21
  • More JASSS tweaking:
    • Switch the URLs in the paper to antibubbles to anonymize – done

Phil 5.30.19

7:00 – 2:30 NASA GEOS

  • CHI Play reviews should come back today!
    • Darn – rejected. From the reviews, it looks like we are in the same space, but going a different direction – an alignment problem. Need to read the reviews in detail though.
    • Some discussion with Wayne about GROUP
  • More JASSS paper
    • Added some broader thoughts to the conclusion and punched up the subjective/objective map difference
  • Start writing proposal for Bruce
    • Simple simulation baseline for model building
    • Develop models for
      • Extrapolating multivariate (family) values, including error conditions
      • Classify errors
      • Explainable model in which sensor inputs drive the model’s controls, producing outputs that are evaluated against the original inputs using RL
      • “Safer” ML using Sanhedrin approach
  • EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling
    • In our ICML 2019 paper, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, we propose a novel model scaling method that uses a simple yet highly effective compound coefficient to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients. Powered by this novel scaling method and recent progress on AutoML, we have developed a family of models, called EfficientNets, which surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster). [image: EfficientNet]
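The compound scaling rule above can be written down directly. In the EfficientNet paper, depth, width and input resolution are scaled as α^φ, β^φ and γ^φ, with α·β²·γ² ≈ 2 (α, β, γ ≥ 1), so that FLOPS grow roughly as 2^φ. The paper reports α = 1.2, β = 1.1, γ = 1.15 for the B0 baseline. This sketch only computes the multipliers. The released B1 to B7 models round and adjust the resulting sizes, so they will not match these numbers exactly.

```python
# Coefficients reported in the EfficientNet paper for scaling up the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution, FLOPS) multipliers for compound coefficient phi."""
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    # FLOPS grow roughly with depth * width^2 * resolution^2, i.e. about 2 ** phi here.
    flops = (alpha * beta ** 2 * gamma ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    print(phi, [round(x, 2) for x in compound_scale(phi)])
```

Width and resolution enter the FLOPS estimate squared because doubling either one roughly quadruples the work per layer, while doubling depth only doubles it.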

Phil 5.28.19

7:00 – 5:00 ASRC NASA GEOS

  • Factors Motivating Customization and Echo Chamber Creation Within Digital News Environments
    • With the influx of content being shared through social media, mobile apps, and other digital sources – including fake news and misinformation – most news consumers experience some degree of information overload. To combat these feelings of unease associated with the sheer volume of news content, some consumers tailor their news ecosystems and purposefully include or exclude content from specific sources or individuals. This study explores customization on social media and news platforms through a survey (N = 317) of adults regarding their digital news habits. Findings suggest that consumers who diversify their online news streams report lower levels of anxiety related to current events and highlight differences in reported anxiety levels and customization practices across the political spectrum. This study provides important insights into how perceived information overload, anxiety around current events, political affiliations and partisanship, and demographic characteristics may contribute to tailoring practices related to news consumption in social media environments. We discuss these findings in terms of their implications for industry, policy, and theory
  • More JASSS paper
  • Installing new IntelliJ and re-indexing
  • Discovered a few bugs in JsonUtils.find. Fixed them and submitted a version to StackOverflow. Eeeep!

Phil 5.26.19

Tikkun olam (Hebrew for “world repair”) has come to connote social action and the pursuit of social justice. The phrase has origins in classical rabbinic literature and in Lurianic kabbalah, a major strand of Jewish mysticism originating with the work of the 16th-century kabbalist Isaac Luria.

Cooperation in large-scale human societies — What, if anything, makes it unique, and how did it evolve?

  • There is much controversy about whether the cooperative behaviours underlying the functioning of human societies can be explained by individual self-interest. Confusion over this has frustrated the understanding of how large-scale societies could ever have evolved and be maintained. To clarify this situation, we here show that two questions need to be disentangled and resolved. First, how exactly do individual social interactions in small- and large-scale societies differ? We address this question by analysing whether the exchange and collective action dilemmas in large-scale societies differ qualitatively from those in small-scale societies, or whether the difference is only quantitative. Second, are the decision-making mechanisms used by individuals to choose their cooperative actions driven by self-interest? We address this question by extracting three types of individual decision-making mechanism (three types of “minds”) that have been assumed in the literature, and compare the extent to which these decision-making mechanisms are sensitive to individual material payoff. After addressing the above questions, we ask: what was the key change from other primates that allowed for cooperative behaviours to be maintained as the scale of societies grew? We conclude that if individuals are not able to refine the social interaction mechanisms underpinning cooperation, i.e., change the rules of exchange and collective action dilemmas, then new mechanisms of transmission of traits between individuals are necessary. Examples are conformity-biased or prestige-biased social learning, as stressed by the cultural group selection hypothesis. But if individuals can refine and adjust their social interaction mechanisms, then no new transmission mechanisms are necessary and cooperative acts can be sustained in large-scale societies entirely by way of self-interest, as stressed by the institutional path hypothesis.
Overall, our analysis contributes to the theoretical foundation of the evolution of human social behaviour.