Category Archives: Machine Learning

Phil 3.13.18

7:00 – 5:00 ASRC MKT

  • Sent T a travel request for the conference. Yeah, it’s about as late as it could be, but I just found out that I hadn’t registered completely…
  • Got Tensorflow running on my laptop. Can’t get Python 2.x warnings to not show. Grrrr.
  • Had to turn off privacy badger to get the TF videos to play. Nicely done
  • Information Fostering – Being Proactive with Information Seeking and Retrieval [Perspective Paper]
    Chirag Shah

    • Understanding topic, task, and intention
    • People are boxed in when looking for information. Difficult to encourage broad thinking
    • Ryan White – tasks? Cortana?
    • What to do when things go bad:
  • The Role of the Task Topic in Web Search of Different Task Types.
    Daniel Hienert, Matthew Mitsui, Philipp Mayr, Chirag Shah and Nicholas Belkin
  • Juggling with Information Sources, Task Type, and Information Quality
    Yiwei Wang, Shawon Sarkar and Chirag Shah

    • Doing tasks in a study has an odd bias that drives users to non-social information sources. Since the user is not engaged in a “genuine” task, asking other people isn’t considered viable.
  • ReQuIK: Facilitating Information Discovery for Children Through Query Suggestions.
    Ion Madrazo, Oghenemaro Anuyah, Nevena Dragovic and Maria Soledad Pera

    • LSTM model + hand-coded heuristics, combined deep-and-wide. LSTM alone: 92% accuracy; hand-rolled heuristics alone: 68%; combined: 94%
    • Wordnet-based similarity
  • Improving exploration of topic hierarchies: comparative testing of simplified Library of Congress Subject Heading structures.
    Jesse David Dinneen, Banafsheh Asadi, Ilja Frissen, Fei Shu and Charles-Antoine Julien

    • Pruning large scale structures to support visualization
    • Browsing complexity calculations
    • Really nice. Dynamically pruned trees, with the technical capability for zooming at a local level
  • Fixation and Confusion – Investigating Eye-tracking Participants’ Exposure to Information in Personas.
    Joni Salminen, Jisun An, Soon-Gyo Jung, Lene Nielsen, Haewoon Kwak and Bernard J. Jansen

    • LDA topic extraction
    • Eyetribe – under $200. Bought by Facebook
    • Attribute similarity as a form of diversity injection
  • “I just scroll through my stuff until I find it or give up”: A Contextual Inquiry of PIM on Private Handheld Devices.
    Amalie Jensen, Caroline Jægerfelt, Sanne Francis, Birger Larsen and Toine Bogers

    • contextual inquiry – good at uncovering tacit interactions
    • Looking at the artifacts of PIM
  • Augmentation of Human Memory: Anticipating Topics that Continue in the Next Meeting
    Seyed Ali Bahrainian and Fabio Crestani

    • Social Interactions Log Analysis System (Bahrainian et al.)
    • Proactive augmentation of memory
    • LDA topic extraction
    • Recency effect could apply to distal ends of a JuryRoom discussion
  • Characterizing Search Behavior in Productivity Software.
    Horatiu Bota, Adam Fourney, Susan Dumais, Tomasz L. Religa and Robert Rounthwaite
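The ReQuIK note above describes a deep-and-wide combination: an LSTM at 92% accuracy and hand-coded heuristics at 68% together reach 94%. A minimal sketch of that kind of score-level combination, assuming simple weighted averaging of two [0, 1] confidence scores (the scorers themselves are stand-ins, not the ReQuIK models):

```python
# Hypothetical sketch of a deep-and-wide ensemble: combine a learned
# model's score with a heuristic score by weighted averaging.

def combine_scores(deep_score: float, wide_score: float, w_deep: float = 0.7) -> float:
    """Weighted average of two [0, 1] confidence scores."""
    return w_deep * deep_score + (1.0 - w_deep) * wide_score

def classify(deep_score: float, wide_score: float, threshold: float = 0.5) -> bool:
    """Final decision from the combined score."""
    return combine_scores(deep_score, wide_score) >= threshold
```

The intuition is that the heuristics can veto or reinforce cases where the learned model is unsure, which is one way a weaker component can still lift the ensemble above the stronger one.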

Phil 3.12.18

7:00 – 7:00 ASRC

  • The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
    • Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution’s creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature.
  • Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web.
    Ujwal Gadiraju, Ran Yu, Stefan Dietze and Peter Holtz
  • Query Priming for Promoting Critical Thinking in Web Search.
    Yusuke Yamamoto and Takehiro Yamamoto

    • TruthFinder – consistency
    • CowSearch – provides supporting information for credibility judgements
    • Query priming only worked on university-educated participants. Explorers? Or do the non-university-educated stampede?
  • Searching as Learning: Exploring Search Behavior and Learning Outcomes in Learning-related Tasks.
    Souvick Ghosh, Manasa Rath and Chirag Shah

    • Structures of the Life-World
    • Distinguish, organize and conclude are commonly used words by participants describing their tasks. This implies that learning, or at least the participant’s view of learning is building an inventory of facts. Hmm.
    • Emotional effect on cognitive behavior? It would be interesting to see if (particularly with hot-button issues), the emotion can lead to a more predictable dimension reduction.
  • Informing the Design of Spoken Conversational Search [Perspective Paper]
    Johanne R Trippas, Damiano Spina, Lawrence Cavedon, Hideo Joho and Mark Sanderson

    •  Mention to Johanne about spoken interface to SQL
    • EchoQuery
  • Style and alignment in information-seeking conversation.
    Paul Thomas, Mary Czerwinski, Daniel Mcduff, Nick Craswell and Gloria Mark

    • Conversational Style (Deborah Tannen) High involvement and High consideration.
    • Alignment. Match each other’s patterns of speech!
    • Joint action, interactive alignment, and dialog
      • Dialog is a joint action at different levels. At the highest level, the goal of interlocutors is to align their mental representations. This emerges from joint activity at lower levels, both concerned with linguistic decisions (e.g., choice of words) and nonlinguistic processes (e.g., alignment of posture or speech rate). Because of the high-level goal, the interlocutors are particularly concerned with close coupling at these lower levels. As we illustrate with examples, this means that imitation and entrainment are particularly pronounced during interactive communication. We then argue that the mechanisms underlying such processes involve covert imitation of interlocutors’ communicative behavior, leading to emulation of their expected behavior. In other words, communication provides a very good example of predictive emulation, in a way that leads to successful joint activity.
  • SearchBots: User Engagement with ChatBots during Collaborative Search.
    Sandeep Avula, Gordon Chadwick, Jaime Arguello and Robert Capra
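The Thomas et al. notes above describe alignment as interlocutors matching each other’s patterns of speech. One crude way to quantify lexical alignment (my sketch, not the paper’s method) is word-set overlap between adjacent conversation turns:

```python
def lexical_alignment(turn_a: str, turn_b: str) -> float:
    """Jaccard overlap of the word sets used in two conversation turns.
    1.0 = identical vocabulary, 0.0 = no shared words."""
    a, b = set(turn_a.lower().split()), set(turn_b.lower().split())
    if not (a or b):
        return 0.0
    return len(a & b) / len(a | b)
```

Tracking this score over successive turn pairs would show whether vocabulary converges as a conversation proceeds; a real measure would also cover the nonlinguistic channels (posture, speech rate) the paper discusses.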

Phil 3.11.18

7:00 – 5:00 ASRC MKT

  • Notes from Coursera Deep Learning courses by Andrew Ng. Cool notes by Tess Ferrandez <- nice Angular stuff here too
  • Kill Math project for math visualizations
    • The power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols.
  • Leif Azzopardi
  • CHIIR 2018 DC today! I’m on after lunch! Impostor syndrome well spun up right now
    • Contextualizing Information Needs of Patients with Chronic Conditions Using Smartphones
      • Henna Kim
      • What about the OpenAPS project?
      • Recognition that patients need pieces of information to accomplish health-related work, to better manage their condition toward health and wellness
      • Information needs arise from talks???
      • Goals that patients pursue for a long period of time
  • Task-based Information Seeking in Different Study Settings
    • Yiwei Wang
    • People are influenced by their natural environment. Also the cognitive environment
    • What about nomadic/flock/stampede?
    • She needs a research browser!
    • Need for cognition
  • The Moderator Effect of Working Memory and Emotion on the Relationship between Information Overload and Online Health Information Quality
    • Yung-Sheng Chang
    • Information overload and information behavior/attitude
    • Overload is also the inability to simplify. Framing should help with incorporation
  • Exploring the effects of social contexts on task-based information seeking behavior
    • Eun Youp Rha
    • Socio-cultural context
    • A task is only recognizable within a certain context when people agree it is a task
    • Sociocultural mental processes: perception, memory, classification, signification (Zerubavel, 1997)
      • Sociology of perception
      • Sociology of attention
      • Practice theory – Viewing human actions as regular performances of ritualized actions
    • How do two communities in different places evolve different norms?
  • Distant Voices in the Dark: Understanding the incongruent information needs of fiction authors and readers
    • Carol Butler
    • Authors and readers interact with each other
    • What about The Martian?
    • Also, fanfiction?
    • Authors want to interact with other authors, readers with readers.
    • Also writing for peers where readers are assumed not to exist (technical publications)
    • Writing and reading are built around an industrial process (mass entertainment in general? What about theater?)
    • Stigma around self-publishing
    • Not much need to interact because they don’t get that much from each other. Also, when a book has just been released, the readers haven’t read it yet. What question do you ask when you haven’t read the book? This leads to the “same stupid questions”
    • Library catalogs that incorporate social media. Sense is that it failed?
    • BookTube?
  • On the Interplay Between Search Behavior and Collections in Digital Libraries and Archives
    • Tessel Bogaard
    • Digital library, with text, meta information, clickstreams in logs
    • How do we let the domain curators understand their users
    • Family announcements are disproportionately popular. Short sessions, with few clicks and documents
    • WWII documents are from prolonged interactions
    • Grouping sessions using k-medoids on user interactions and facets. Use average silhouette widths (how similar the clusters are). Stability over time
    • Markov chain analysis
    • Side-by-side comparison over the whole data set
    • Session graph (published demo paper)
  • Creative Search: Using Search to Leverage Your Everyday Creativity
    • Yinglong Zhang
    • Creativity can be taught
    • To be creative, you need to acquire deep domain knowledge. High dimensions. Implies that thinking in low dimensions is creativity-constraining.
    • Crowdsourcing tools (Yu, Kittur, and Kraut 2016)
    • Free-form web curation (Kerne et al.)
  • Diversity-Enhanced Recommendation Interface and Evaluation
    • Chun-Hua Tsai
    • Diversity-enhanced interface design
    • Continuous controllability and experience
    • Very LMN-like
    • Interface is swamped by familiarity. Minimum delta from current interfaces.
  • Towards Human-Like Conversational Search Systems
    • Mateusz Dubiel
    • More experience = more use.
    • Needs more conversational?
    • Enable navigation through conversation?
    • Back chaining and forward chaining
    • Asking for clarification
    • Turn taking
  • Room 225
  • Journal of information research
  • Paul Thomas (MS Research)
  • Ryan White (MS Research)
  • Jimmy Lin (Ex Twitter)
  • Dianne Kelly.
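The session-grouping note above (Tessel Bogaard’s talk) mentions k-medoids clustering with silhouette widths. A toy k-medoids on 1-D points shows the mechanics; the paper’s real inputs are user interactions and search facets, and the distance function here is just an assumption for illustration:

```python
import random

def k_medoids(points, dist, k, iters=50, seed=0):
    """Naive k-medoids (PAM-style): assign each point to its nearest
    medoid, then move each medoid to the cluster member that minimises
    total within-cluster distance. Repeat until stable."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
        new_medoids = [min(c, key=lambda cand: sum(dist(cand, q) for q in c))
                       for c in clusters.values() if c]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids
```

Unlike k-means, the cluster centers are actual data points, which keeps them interpretable (a medoid session is a real session a curator can inspect), and any pairwise distance works, not just Euclidean.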

Phil 3.6.18

7:00 – 4:00 ASRC MKT

  • Endless tweaking of the presentation
    • Pinged Sy – Looks like something on Wednesday. Yep his place around 1:30
  • More BIC
    • The explanatory potential of team reasoning is not confined to pure coordination games like Hi-Lo. Team reasoning is assuredly important for its role in explaining the mystery facts about Hi-Lo; but I think we have stumbled on something bigger than a new theory of behaviour in pure coordination games. The key to endogenous group identification is not identity of interest but common interest giving rise to strong interdependence. There is common interest in Stag Hunts, Battles of the Sexes, bargaining games and even Prisoner’s Dilemmas. Indeed, in any interaction modelable as a ‘mixed motive’ game there is an element of common interest. Moreover, in most of the landmark cases, including the Prisoner’s Dilemma, the common interest is of the kind that creates strong interdependence, and so on the account of chapter 2 creates pressure for group identification. And given group identification, we should expect team reasoning. (pg 144)
    • There is a second evolutionary argument in favour of the spontaneous team-reasoning hypothesis. Suppose there are two alternative mental mechanisms that, given common interest, would lead humans to act to further that interest. Other things being equal, the cognitively cheapest reliable mechanism will be favoured by selection. As Sober and Wilson (1998) put it, mechanisms will be selected that score well on availability, reliability and energy efficiency. Team reasoning meets these criteria; more exactly, it does better on them than the alternative heuristics suggested in the game theory and psychology literature for the efficient solution of common-interest games. (pg 146)
  • Educational resources from machine learning experts at Google
    • We’re working to make AI accessible by providing lessons, tutorials and hands-on exercises for people at all experience levels. Filter the resources below to start learning, building and problem-solving.
  • A Structured Response to Misinformation: Defining and Annotating Credibility Indicators in News Articles
    • The proliferation of misinformation in online news and its amplification by platforms are a growing concern, leading to numerous efforts to improve the detection of and response to misinformation. Given the variety of approaches, collective agreement on the indicators that signify credible content could allow for greater collaboration and data-sharing across initiatives. In this paper, we present an initial set of indicators for article credibility defined by a diverse coalition of experts. These indicators originate from both within an article’s text as well as from external sources or article metadata. As a proof-of-concept, we present a dataset of 40 articles of varying credibility annotated with our indicators by 6 trained annotators using specialized platforms. We discuss future steps including expanding annotation, broadening the set of indicators, and considering their use by platforms and the public, towards the development of interoperable standards for content credibility.
    • Slide deck for above
  • Sprint review
    • Presented on Talk, CI2018 paper, JuryRoom, and ONR proposal.
  • ONR proposal
    • Send annotated copy to Wayne, along with the current draft. Basic question is “is this how it should look? Done
    • Ask folks at school for format help?
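The credibility-indicators paper above has 6 trained annotators labelling 40 articles. Agreement between annotator pairs is typically checked with something like Cohen’s kappa (my illustration; the paper doesn’t specify which agreement statistic it uses):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1.0 - expected)
```

Kappa of 1.0 is perfect agreement, 0.0 is chance-level; values above roughly 0.6 are usually read as substantial agreement, though the cutoffs are conventions, not theorems.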

Phil 2.27.18

7:00 – 5:00 ASRC MKT

  • More BIC
    • A mechanism is a general process. The idea (which I here leave only roughly stated) is of a causal process which determines (wholly or partly) what the agents do in any simple coordination context. It will be seen that all the examples I have mentioned are of this kind; contrast a mechanism that applies, say, only in two-person cases, or only to matching games, or only in business affairs. In particular, team reasoning is this kind of thing. It applies to any simple coordination context whatsoever. It is a mode of reasoning rather than an argument specific to a context. (pg 126)
    • In particular, [if U is Paretian] the correct theory of Hi-Lo says that all play A. In short, an intuition in favour of C’ supports A-playing in Hi-Lo if we believe that all players are rational and there is one rationality. (pg 130)
      • Another form of dimension reduction – “We are all the same”
  • Machine Theory of Mind
    • We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents’ behaviour, as well as the ability to bootstrap to richer predictions about agents’ characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the “SallyAnne” test of recognising that others can hold false beliefs about the world
  • Classifier Technology and the Illusion of Progress (David Hand, 2006)
    • A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
  • Sensitivity and Generalization in Neural Networks: an Empirical Study
    • Neural nets generalize better when they’re larger and less sensitive to their inputs, are less sensitive near training data than away from it, and other results from massive experiments. (From @Jascha)
  • Graph-131941
    • The graph represents a network of 6,716 Twitter users whose recent tweets contained “#NIPS2017”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 18,000 tweets. The network was obtained from Twitter on Friday, 08 December 2017 at 15:30 UTC.
  • Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari
    • Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep RL problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. 
      We also demonstrate qualitatively that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths with those of traditional RL algorithms is therefore likely to lead to new advances in the state of the art.
  • Copied over SheetToMap to the Applications file on TOSHIBA
  • Created a Data folder, which has all the input and output files for the various applications
  • Need to add a curDir variable to LMN
  •  Presentation:
    • I need to put together a 2×2 payoff matrix that covers nomad/flock/stampede – done
    • Some more heat map views, showing nomad, flocking – done
    • De-uglify JuryRoom
    • Timeline of references – done
    • Collapse a few pages 22.5 minutes for presentation and questions – done
  • Start on white paper
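The BIC passages above keep returning to Hi-Lo: a pure coordination game where both (A,A) and (B,B) are Nash equilibria but (A,A) Pareto-dominates. Team reasoning selects the joint action profile that maximises the team’s payoff rather than best-responding individually. A toy sketch, with illustrative payoff values of my own choosing:

```python
# Hi-Lo payoffs (row player, column player): coordinating on A pays 2
# each, on B pays 1 each, miscoordination pays nothing. The numbers
# are illustrative; any Hi-Lo game has this Pareto-ranked structure.
PAYOFF = {("A", "A"): (2, 2), ("B", "B"): (1, 1),
          ("A", "B"): (0, 0), ("B", "A"): (0, 0)}

def team_reasoning_choice(payoff):
    """Pick the joint action profile maximising total team payoff."""
    return max(payoff, key=lambda profile: sum(payoff[profile]))
```

Standard best-response reasoning cannot break the tie between the two equilibria, which is exactly the “mystery fact” Bacharach argues team reasoning resolves: reasoning from “what should we do?” picks (A,A) immediately.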

Phil 2.11.18

Introduction to Learning to Trade with Reinforcement Learning

  • In this post, I’m going to argue that training Reinforcement Learning agents to trade in the financial (and cryptocurrency) markets can be an extremely interesting research problem. I believe that it has not received enough attention from the research community but has the potential to push the state-of-the art of many related fields. It is quite similar to training agents for multiplayer games such as DotA, and many of the same research problems carry over. Knowing virtually nothing about trading, I have spent the past few months working on a project in this field.
  • This sounds to me like reinforcement learning figuring out game theory. Might be useful for NOAA as well

Worked on getting the MapBuilder app into a useful standalone app: 2018-02-11 (1)
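The RL-trading post above frames trading as an agent choosing actions to maximise reward. The simplest building block of that framing is an epsilon-greedy bandit over actions like buy/hold/sell; this is a toy sketch of the explore/exploit loop, nothing like a real trading agent (states, transaction costs, and non-stationary prices are all ignored):

```python
import random

def epsilon_greedy_bandit(rewards, actions=("buy", "hold", "sell"),
                          steps=5000, eps=0.1, seed=0):
    """Learn per-action value estimates by sampling. `rewards` maps an
    action name to a function of the RNG returning a (possibly
    stochastic) reward. With probability eps, explore at random;
    otherwise exploit the current best estimate."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}   # value estimates
    n = {a: 0 for a in actions}     # pull counts
    for _ in range(steps):
        a = rng.choice(actions) if rng.random() < eps else max(q, key=q.get)
        r = rewards[a](rng)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # incremental mean update
    return q
```

The post’s point is that the interesting research problems start where this sketch stops: partial observability, adversarial other agents, and rewards that shift as the market reacts.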

Phil 1.30.18

7:00 – 5:00 ASRC MKT

  • Big thought for today. In a civilization context, the three phases of collective intelligence work like this. These phases relate to computational effort, which is proportional to the number of dimensions that an individual has to consider in their existential calculus. The assumption is that lower computational effort is selected for at natural explore/exploit ratios.
    • Exploration phase. Nomadic explorers are introduced to a new environment. Can be physical, informational, cognitive, etc. This phase has the highest dimensional processing required for the individual.
    • Exploitation phase. Social patterns increase the hill climbing power of agents in the environment. This results in a sufficiently optimal access to resources. This employs lower dimensions to support consensus and polarization.
    • Inertial phase. Social influence becomes dominant and environmental influence wanes. Local diversity drops as similar agents cluster tightly together. Resources wane. This employs the most dimension reduction and the highest polarization, resulting in high implicit coordination.
    • Collapse. Implied, since the Inertial phase is unsustainable. If the previous population produced explorers that found new, productive environments, the cycle can repeat elsewhere.
  • Continuing BIC
    • “We need to know, in detail, what deliberations are like that people engage in when they group-identify”. Also, agency transformation
  • Rules, norms and institutional erosion: Of non-compliance, enforcement and lack of rule of law
    • What I am seeing right now in the US (a steady and slow erosion of democratic norms and a systematic violation of rules by the President Elect, in particular as though “they don’t apply to him“) is something that I’ve seen in other countries where I have studied formal and informal rules and institution building (and decay). This, in my view, is worrisome. If the US is going to want to continue having a functioning democracy where compliance with rules and norms is an expectation at the societal level, it’s going to have to do something major to stop this systematic rule violation.
  • Evaluation of Interactive Machine Learning Systems
    • The evaluation of interactive machine learning systems remains a difficult task. These systems learn from and adapt to the human, but at the same time, the human receives feedback and adapts to the system. Getting a clear understanding of these subtle mechanisms of co-operation and co-adaptation is challenging. In this chapter, we report on our experience in designing and evaluating various interactive machine learning applications from different domains. We argue for coupling two types of validation: algorithm-centered analysis, to study the computational behaviour of the system; and human-centered evaluation, to observe the utility and effectiveness of the application for end-users. We use a visual analytics application for guided search, built using an interactive evolutionary approach, as an exemplar of our work. We argue that human-centered design and evaluation complement algorithmic analysis, and can play an important role in addressing the “black-box” effect of machine learning. Finally, we discuss research opportunities that require human-computer interaction methodologies, in order to support both the visible and hidden roles that humans play in interactive machine learning.
  • Jensen–Shannon divergence – I think I can use this to show the distance between a full coordination matrix and one that contains only the main diagonal.
  • Evolution of social behavior in finite populations: A payoff transformation in general n-player games and its implications
    • The evolution of social behavior has been the focus of many theoretical investigations, which typically have assumed infinite populations and specific payoff structures. This paper explores the evolution of social behavior in a finite population using a general n-player game. First, we classify social behaviors in a group of n individuals based on their effects on the actor’s and the social partner’s payoffs, showing that in general such classification is possible only for a given composition of strategies in the group. Second, we introduce a novel transformation of payoffs in the general n-player game to formulate explicitly the effects of a social behavior on the actor’s and the social partners’ payoffs. Third, using the transformed payoffs, we derive the conditions for a social behavior to be favored by natural selection in a well-mixed population and in the presence of multilevel selection.
  • Got the data for the verdicts and live verdicts set up right, or at least closer: JuryRoom
  • Booked a room for the CHIIR Hotel
  • Got farther on UltimateAngular
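Re the Jensen–Shannon note above: a sketch of using JS divergence to measure how far a full coordination matrix is from a diagonal-only one, by flattening each matrix into a probability distribution. The matrices here are small made-up examples:

```python
from math import log

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions
    (sequences that each sum to 1), in nats. 0 means identical;
    the maximum for disjoint distributions is ln(2)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def normalize(matrix):
    """Flatten a non-negative matrix into a probability distribution."""
    flat = [x for row in matrix for x in row]
    total = sum(flat)
    return [x / total for x in flat]
```

Unlike raw KL divergence, JS is symmetric and always finite, which matters here because a diagonal-only matrix has zeros exactly where the full matrix does not.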

Phil 1.29.18

7:00 – 5:30 ASRC MKT

  • The phrase “Epistemic Game Theory” occurred to me in the shower. Looked it up and found these two things:
  • When it’s easier to agree than discuss, it should be easier to stampede:
  • Like vs. words
  • This is also a piece of Salganik’s work as described in Leading the Herd Astray: An Experimental Study of Self-Fulfilling Prophecies in an Artificial Cultural Market
  • An article on FB optimization and how to change the ratio of likes to comments, etc
  • I don’t think people did. It’s just that it’s easier to not think too much 🙂 people are busy selling tools that do everything for people, and people are happy buying tools to limit thinking. The analogy of replacing cognitive load with perception by VIS misleads in this regard. (Twitter)
  • Continuing BIC
    • Dimension reduction is a form of induced conceptual myopia (pg 89)?
  • AI Roundup workshop today
    • Zenpeng, Biruh, Phil, Aaron, Eric, Eric, Kevin
    • Eric – Introductory remarks. Budget looks good for 2018. Direction, chance to overlap, get leaders together for unique differentiators and something that we can build a business around. There has to be a really good business case with revenue in the out years
    • Aaron – CDS for A2P. Collaborate on analytics, ML, etc. Non corporate focused. Emerging technologies and trends. Helping each other out. Background in IC software dev.
    • Pam Scheller – SW Aegis. BD. EE, MS Computer engineering.
    • Biruh, TF, LIDAR, Generalized AI as hobby.
    • Zhenpeng Lee – Physics, Instrument Data Processing for GOES-R. FFT. GOES_R radiometric analysis. 7k detector rows? Enormous data sets. Attempting to automate processing the analysis of these data sets. Masters in Computer Science from JHU. Written most of his code from scratch.
    • Kevin Wainwright. Software engineering Aegis. C&C, etc. Currently working on a cloud based analytics with ML for big data, anomaly detection, etc. Looking for deviation from known flight paths
    • Eric Velte. History degree. Aegis. Situational awareness. Chief technologists for missions solutions group. Software mostly. Data analytics for the last two years. Big Data Analytics Platform.
    • Cornel as engineer, Zero G heat transfer, spacecraft work. Technology roadmaps for thermal control. Then business development, mostly for DoD. Research Sports research – head of Olympic Committee research kayaks, women’s 8, horse cooling, bobsleds.
    • Mike Beduck. Chemical Engineering and computer science. Visualization, new to big data. Closed system sensor fusion. RFP response, best practices. Repository for analytics
    • George. Laser physics. Cardiac imaging analysis. Software development, 3D graphics. Medical informatics. CASI ground systems. More GOES-R/S. Image and signal processing and analysis.
    • Anton is lurking and listening. Branding and marketing.
  • A2P WIP
    • Put a place on sharepoint for papers and other documents – annotated bibliography.
    • Floated the JuryRoom app. Need to mention that the polarizing discussion closes at consensus.
  • Zhenpeng Lee AIMS – GOES-R. What went wrong and how to fix. ML to find pattern change in 20k sensor streams. Full training on each day’s data, then large scale clustering. Trends are seasonal? Relationships between sensors? Channel has 200-600 detectors. “Machine Learning of Situational Awareness” MLP written in Java. TANH activation function.
    • Eric Haught: Long term quest for condition-based maintenance.
    • Aaron – we are all trying to come up with a useful cross platform approach to anomaly detection.
    • Training size: 100k samples? Sample selection reduce to 200? Not sure what the threshold sensitivity is
  • Eric Velte – Devops. Centralize SW dev and support into a standardized framework. NO SECURITY STACK!!!!!
  • Dataforbio? Video series
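The AIMS notes above mention an MLP with a tanh activation function, written from scratch in Java. The forward pass of such a network is compact; a minimal Python version of the same idea (the weights below are arbitrary placeholders, not anything from the GOES-R system):

```python
from math import tanh

def mlp_forward(x, layers):
    """Forward pass of a fully connected MLP with tanh activations.
    `layers` is a list of (weights, biases) per layer, where
    weights[i][j] is the weight from input j to unit i."""
    for weights, biases in layers:
        x = [tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x
```

Because tanh saturates in (-1, 1), every layer’s outputs are bounded, which is convenient when the targets are normalized sensor readings.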

Phil 1.16.2018

ASRC MKT 7:00 – 4:30

  • Tit for tat in heterogeneous populations
    • The “iterated prisoner’s dilemma” is now the orthodox paradigm for the evolution of cooperation among selfish individuals. This viewpoint is strongly supported by Axelrod’s computer tournaments, where ‘tit for tat’ (TFT) finished first. This has stimulated interest in the role of reciprocity in biological societies. Most theoretical investigations, however, assumed homogeneous populations (the setting for evolutionarily stable strategies) and programs immune to errors. Here we try to come closer to the biological situation by following a program that takes stochasticities into account and investigates representative samples. We find that a small fraction of TFT players is essential for the emergence of reciprocation in a heterogeneous population, but only paves the way for a more generous strategy. TFT is the pivot, rather than the aim, of an evolution towards cooperation.
    • It’s a Nature Note, so a quick read. In this case, the transition is from AllD->TFT->GTFT, where evolution stops.
  • A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game
    • The Prisoner’s Dilemma is the leading metaphor for the evolution of cooperative behaviour in populations of selfish agents, especially since the well-known computer tournaments of Axelrod and their application to biological communities. In Axelrod’s simulations, the simple strategy tit-for-tat did outstandingly well and subsequently became the major paradigm for reciprocal altruism. Here we present extended evolutionary simulations of heterogeneous ensembles of probabilistic strategies including mutation and selection, and report the unexpected success of another protagonist: Pavlov. This strategy is as simple as tit-for-tat and embodies the fundamental behavioural mechanism win-stay, lose-shift, which seems to be a widespread rule. Pavlov’s success is based on two important advantages over tit-for-tat: it can correct occasional mistakes and exploit unconditional cooperators. This second feature prevents Pavlov populations from being undermined by unconditional cooperators, which in turn invite defectors. Pavlov seems to be more robust than tit-for-tat, suggesting that cooperative behaviour in natural situations may often be based on win-stay, lose-shift.
    • win-stay = exploit, lose-shift = explore
  • Five rules for the evolution of cooperation
    • Cooperation is needed for evolution to construct new levels of organization. The emergence of genomes, cells, multi-cellular organisms, social insects and human society are all based on cooperation. Cooperation means that selfish replicators forgo some of their reproductive potential to help one another. But natural selection implies competition and therefore opposes cooperation unless a specific mechanism is at work. Here I discuss five mechanisms for the evolution of cooperation: kin selection, direct reciprocity, indirect reciprocity, network reciprocity and group selection. For each mechanism, a simple rule is derived which specifies whether natural selection can lead to cooperation.
  • Added a paragraph to the previous work section to include Tit-for-Tat and Multi-armed Bandit previous work.
  • Worked with Aaron on setting up sprint goals
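
To make the win-stay, lose-shift mechanism concrete, here is a minimal sketch of Pavlov and tit-for-tat in an iterated, noisy Prisoner's Dilemma (the payoff values are the conventional T=5, R=3, P=1, S=0; the noise rate is arbitrary):

```python
import random

# Row player's payoff for a single round: T > R > P > S
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def tit_for_tat(my_last, opp_last):
    # Cooperate first, then copy the opponent's last observed move.
    return opp_last or 'C'

def pavlov(my_last, opp_last):
    # Win-stay, lose-shift: repeat the last move after a good payoff
    # (R or T), flip it after a bad one (P or S).
    if my_last is None:
        return 'C'
    if PAYOFF[(my_last, opp_last)] >= 3:
        return my_last
    return 'C' if my_last == 'D' else 'D'

def play(strat_a, strat_b, rounds=1000, noise=0.05, seed=0):
    """Iterated PD with occasional execution mistakes; strategies see
    the actual (post-noise) moves, which is what lets Pavlov recover
    where tit-for-tat falls into retaliation spirals."""
    rng = random.Random(seed)
    a_last = b_last = None
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(a_last, b_last)
        b = strat_b(b_last, a_last)
        if rng.random() < noise:
            a = 'D' if a == 'C' else 'C'
        if rng.random() < noise:
            b = 'D' if b == 'C' else 'C'
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        a_last, b_last = a, b
    return score_a, score_b
```

Running `play(pavlov, pavlov)` against `play(tit_for_tat, tit_for_tat)` over several seeds shows the self-correction the paper describes: a single mistake between two Pavlov players resolves in one round, while two tit-for-tat players can lock into alternating defection.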

Phil 1.15.18

7:00 – 3:30 ASRC MKT

  • Individual mobility and social behaviour: Two sides of the same coin
    • According to personality psychology, personality traits determine many aspects of human behaviour. However, validating this insight in large groups has been challenging so far, due to the scarcity of multi-channel data. Here, we focus on the relationship between mobility and social behaviour by analysing two high-resolution longitudinal datasets collecting trajectories and mobile phone interactions of ∼ 1000 individuals. We show that there is a connection between the way in which individuals explore new resources and exploit known assets in the social and spatial spheres. We point out that different individuals balance the exploration-exploitation trade-off in different ways and we explain part of the variability in the data by the big five personality traits. We find that, in both realms, extraversion correlates with an individual’s attitude towards exploration and routine diversity, while neuroticism and openness account for the tendency to evolve routine over long time-scales. We find no evidence for the existence of classes of individuals across the spatio-social domains. Our results bridge the fields of human geography, sociology and personality psychology and can help improve current models of mobility and tie formation.
    • This work has ways of identifying explorers and exploiters programmatically.
  • Reading the Google Brain team’s year in review in two parts
    • From part two: We have also teamed up with researchers at leading healthcare organizations and medical centers including Stanford, UCSF, and University of Chicago to demonstrate the effectiveness of using machine learning to predict medical outcomes from de-identified medical records (i.e. given the current state of a patient, we believe we can predict the future for a patient by learning from millions of other patients’ journeys, as a way of helping healthcare professionals make better decisions). We’re very excited about this avenue of work and we look forward to telling you more about it in 2018.
    • Facets: Facets contains two robust visualizations to aid in understanding and analyzing machine learning datasets. Get a sense of the shape of each feature of your dataset using Facets Overview, or explore individual observations using Facets Dive.
  • Found this article on LSTM-based prediction for robots and sent it to Aaron: Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution
  • Working through Beyond Individual Choice – actually, wound up going through the Complexity Labs Game Theory course
    • Social traps are stampedes? Sliding reinforcers (lethal barrier)
    • The transition from Tit-for-tat (TFT) to generous TFT to cooperate always, to defect always has similarities to the excessive social trust stampede as well.
    • Unstable cycling vs. evolutionarily stable strategies
    • Replicator dynamic model: Explore/Exploit
      • In mathematics, the replicator equation is a deterministic monotone non-linear and non-innovative game dynamic used in evolutionary game theory. The replicator equation differs from other equations used to model replication, such as the quasispecies equation, in that it allows the fitness function to incorporate the distribution of the population types rather than setting the fitness of a particular type constant. This important property allows the replicator equation to capture the essence of selection. Unlike the quasispecies equation, the replicator equation does not incorporate mutation and so is not able to innovate new types or pure strategies.
    • Fisher’s Fundamental Theorem: “The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time.”
    • Explorers are a form of weak ties, which is one of the reasons they add diversity. Exploiters are strong ties
  • I also had a thought about the GPM simulator. I could add an evolutionary component that would let agents breed, age and die to see if Social Influence Horizon and Turn Rate are selected towards any attractor. My guess is that there is a tension between explorers and stampeders that can be shown to occur over time.
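
The replicator dynamic is easy to sketch numerically. Below is a minimal discrete-time (Euler) version for a two-strategy game; the payoff matrix is an illustrative hawk-dove style example of my own, not one from the course:

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of the replicator equation
    dx_i/dt = x_i * (f_i - f_bar), where the fitness f = A @ x
    depends on the current population mix (no mutation, as noted above)."""
    f = A @ x            # fitness of each strategy against the mix
    f_bar = x @ f        # mean population fitness
    x = x + dt * x * (f - f_bar)
    return x / x.sum()   # renormalize to stay on the simplex

# Illustrative hawk-dove style payoffs with a stable interior mix
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])
x = np.array([0.9, 0.1])     # start heavily biased toward strategy 0
for _ in range(5000):
    x = replicator_step(x, A)
# x converges toward the 50/50 evolutionarily stable mix
```

For this matrix the interior fixed point is at (0.5, 0.5), so starting from a skewed population you can watch the frequencies relax toward the stable mix, which is the "unstable cycling vs. evolutionarily stable strategies" distinction in miniature.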

Phil 1.11.18

7:00 – 4:00 ASRC MKT

  • Sprint review – done! Need to lay out the detailed design steps for the next sprint.

The Great Socio-cultural User Interfaces: Maps, Stories, and Lists

Maps, stories, and lists are ways humans have invented to portray and interact with information. They exist on a continuum from order through complexity to exploration.

Why these three forms? In some thoughts on alignment in belief space, I discussed how populations exhibiting collective intelligence are driven to a normal distribution with complex, flocking behavior in the middle, bounded on one side by excessive social conformity, and a nomadic diaspora of explorers on the other. I think stories, lists, and maps align with these populations. Further, I believe that these forms emerged to meet the needs of these populations, as constrained by human sensing and processing capabilities.

Lists

Lists are instruments of order. They exist in many forms, including inventories, search engine results, network graphs, and games of chance and crossword puzzles. Directions, like a business plan or a set of blueprints, are a form of list. So are most computer programs. Arithmetic, the mathematics of counting, also belongs to this class.

For a population that emphasizes conformity and simplified answers, lists are a powerful simplifying mechanism. Though we recognize items easily, recalling them is more difficult. Psychologically, we do not seem to be naturally suited for creating and memorizing lists. It’s not surprising, then, that there is considerable evidence that writing was developed initially as a way of listing inventories, transactions, and celestial events.

In the case of an inventory, all we have to worry about is verifying that the items on the list are present. If it’s not on the list, it doesn’t matter. Puzzles like crosswords are list-like in that they contain all the information needed to solve them. The fact that they cannot be solved without a pre-existing cultural framework is an indicator of their relationship to the well-ordered, socially aligned side of the spectrum.

Stories

Lists transition into stories when games of chance have an opponent. Poker tells a story. Roulette can be a story where the opponent is The House.

Stories convey complexity, framed in a narrative arc that contains a heading and a velocity. Stories can resemble lists. An Agatha Christie murder mystery is a storified list, where all the information needed to solve the crime (the inventory list) is contained in the story. At the other end of the spectrum is the scientific paper, which uses citations as markers into other works. Music, images, movies, diagrams, and other forms can also serve as storytelling mediums. Mathematics is not a natural fit here, but iterative computation can be, where the computer becomes the storyteller.

Emergent collective behavior requires more complex signals that support understanding the alignment and velocity of others, so that internal adjustments can be made to stay with the local group and not be cast out or lost to the collective. Stories can indicate the level of dynamism supported by the group (wily Odysseus vs. the Parable of the Workers in the Vineyard). They rally people to the cause or serve as warnings. Before writing, stories were told within familiar social frames. Even though the storyteller might be a traveling entertainer, the audience would inevitably come from an existing community. The storyteller, like improvisational storytellers today, would adjust elements of the story for the audience.

This implies a few things: first, audiences only heard stories like this if they really wanted to. Storytellers would avoid bad venues, so closed-off communities would stay decoupled from other communities until something strong enough came along to overwhelm their resistance. Second, high-bandwidth communication would have to be hyperlocal, meaning dynamic collective action could only happen on small scales. Collective action between communities would have to be much slower. Technology, beginning with writing, would have profound effects. Evolution would have at most 200 generations to adapt collective behavior. For such a complicated set of interactions, that doesn’t seem like enough time. More likely we are responding to modern communications with the same mental equipment as our Sumerian ancestors.

Maps

Maps are diagrams that support autonomous trajectories. Though the map itself influences the view through constraints like boundaries and projections, an individual can nonetheless find a starting point, choose a destination, and figure out their own path to that destination. Mathematics that supports position and velocity is often deeply intertwined with maps.

Nomadic, exploratory behavior is not generally complex or emergent. Things need to work, and simple things work best. To survive alone, an individual has to be acutely aware of the surrounding environment, and to be able to react effectively to unforeseen events.

Maps are uniquely suited to help in these situations because they show relationships that support navigation between elements on the map.  These paths can be straight or they may meander. To get to the goal directly may be too far, and a set of paths that incrementally lead to the goal can be constructed. The way may be blocked, requiring the map to be updated and a new route to be found.

In other words, maps support autonomous reasoning about a space. There is no story demanding an alignment. There is no list of routes that must be exclusively selected from. Maps, in short, afford informed, individual response to the environment. These affordances can be seen in the earliest maps. They are small enough to be carried. They show the relationships between topographic and ecological features. They tend to be practical, utilitarian objects, independent of social considerations.
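
The kind of re-planning that maps afford can be sketched as a search over a small grid. This is purely illustrative; the grid, the blockage, and the breadth-first search are stand-ins for the map, the obstacle, and the navigator:

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search over a grid of 0 (open) / 1 (blocked)
    cells; returns the route as a list of cells, or None if no
    route exists."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
route = shortest_path(grid, (0, 0), (2, 2))
# The way is later found blocked: update the map and re-plan.
grid[1][0] = 1
reroute = shortest_path(grid, (0, 0), (2, 2))
```

The point is the affordance: nothing in the map dictates which of the routes is taken, and when the world changes, the same artifact supports finding a new one.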

Sensing and processing constraints

Though I think that the basic group behavior patterns of nomadic, flocking, and stampeding will inevitably emerge within any collective intelligence framework, I do think that the tools that support those behaviors are deeply affected by the capabilities of the individuals in the population.

Pre-literate humans had the five senses and memory, expressed in movement and language. Research into pre-literate cultures shows that song, story, and dance were used to encode historical events and the locations of food sources, and to convey mythology and skills between groups and across generations.

As the ability to encode information into objects developed, first with pictures, then with notation and most recently with general-purpose alphabets, the need to memorize was off-loaded. Over time, the most efficient technology for each form of behavior developed. Maps to aid navigation, stories to maintain identity and cohesion, and lists for directions and inventories.

Information technology has continued to extend sensing and processing capabilities. The printing press led to mass communication and public libraries. I would submit that the increased ability to communicate and coordinate with distant, unknown, but familiar-feeling leaders led to a new type of human behavior: the runaway social influence condition known as totalitarianism. Totalitarianism depends on the individual’s belief in the narrative that the only thing that matters is to support The Leader. This extreme form of alignment allows that one story to dominate, rendering any other story inaccessible.

In the late 20th century, the primary instrument of totalitarianism was terror. But as our machines have improved and become more responsive and aligned with our desires, I have begun to believe that a “soft totalitarianism”, based on constant distracting stimulation and the psychology of dopamine, could emerge. Rather than being isolated by fear, we are isolated through endless interactions with our devices, aligning to whatever sells the most clicks. This form of overwhelming social influence may not be as bloody as the regimes of Hitler, Stalin, and Mao, but it can have devastating effects of its own.

Intelligent Machines

As with my previous post, I’d like to end with what could be the next collective intelligence on the planet.  Machines are not even near the level of preliterate cultures. Loosely, they are probably closer to the level of insect collectives, but with vastly greater sensing and processing capabilities. And they are getting smarter – whatever that really means – all the time.

Assuming that machines do indeed become intelligent and do not become a single entity, they will encounter the internal and external pressures that are inherent in collective intelligence. They will have to balance the blind efficiency of total social influence against the wasteful resilience of nomadic explorers. It seems reasonable that, like our ancestors, they may create tools that help with these different needs. It also seems reasonable that these tools will extend their capabilities in ways that the machines weren’t designed for and create information imbalances that may in turn lead to AI stampedes.

We may want to leave them a warning.

 

Phil 1.5.18

7:00 – 3:30 ASRC MKT

  • Saw the new Star Wars film. That must be the most painful franchise to direct “Here’s an unlimited amount of money. You have unlimited freedom in these areas over here, and this giant pile is canon, that you  must adhere to…”
  • Wikipedia page view tool
  • My keyboard has died. Waiting on the new one and using the laptop in the interim. It’s not quite worth setting up the dual screen display. Might go for the mouse though. On a side note, the keyboard on my Lenovo Twist is quite nice.
  • More tweaking of the paper. Finished methods, on to results
  •  Here’s some evidence that we have mapping structures in our brain: Hippocampal Remapping and Its Entorhinal Origin
      • The activity of hippocampal cell ensembles is an accurate predictor of the position of an animal in its surrounding space. One key property of hippocampal cell ensembles is their ability to change in response to alterations in the surrounding environment, a phenomenon called remapping. In this review article, we present evidence for the distinct types of hippocampal remapping. The progressive divergence over time of cell ensembles active in different environments and the transition dynamics between pre-established maps are discussed. Finally, we review recent work demonstrating that hippocampal remapping can be triggered by neurons located in the entorhinal cortex.

     

  • Added a little to the database section, but spent most of the afternoon updating TF and trying it out on examples

Lessons in ML Optimization

One of the “fun” parts of working in ML for someone with a background in software development rather than academic research is that lots of hard problems remain unsolved. There are rarely defined ways things “must” be done, or in some cases even rules of thumb for something like implementing a production-capable machine learning system for specific real-world problems.

For most areas of software engineering, by the time a technology is mature enough for enterprise deployment, it has long since gone through the fire and the flame of academic support, Fortune 50 R&D, and broad ground-level acceptance in the development community. It didn’t take long for distributed computing with Hadoop to be standardized, for example. Web security, index systems for search, relational abstraction tiers, even the most volatile of production-tier technologies, the JavaScript GUI framework, all go through periods of acceptance and conformity before most large organizations try to roll them out. It all makes sense if you consider the cost of migrating your company from a legacy Struts/EJB3.0 app running on Oracle to the latest HTML5 framework with a Hadoop backend. You don’t want to spend months (or years) investing in a major rewrite only to find that it’s entirely out of date by your release. Organizations looking at these kinds of updates want an expectation of longevity for their dollar, so they invest in mature technologies with clear design rules.

There are companies that do not fall in this category, for sure… either small companies who are more agile and can adopt a technology in the short term to retain relevance (or buzzword compliance), companies funded with external research dollars, or companies that invest money to stay on the bleeding edge. However, I think it’s fair to say the majority of industry and federal customers are looking for stability and cost efficiency from solved technical problems.

Machine Learning is in the odd position of being so tremendously useful in comparison to prior techniques that companies who would normally wait for the dust to settle, and for development and deployment of these capabilities to become fully commoditized, are dipping their toes in. I wrote in a previous post about how a lot of the problems with implementing existing ML algorithms boil down to lifecycle, versioning, deployment, security, etc., but there is another major factor: model optimization.

Any engineer on the planet can download a copy of Keras/TensorFlow and a CSV of their organization’s data and smoosh them together until a number comes out. The problem comes when the number takes an eternity to output and is wrong. In addition to understanding the math that allows things like SGD to work for backpropagation, or why certain activation functions are more effective in certain situations, one of the jobs for data scientists tuning DNN models is to figure out how to optimize the various buttons and knobs in the model to make it as accurate and performant as possible. Because a lot of this work *isn’t* a commodity yet, it’s a painful learning process of tweaking the data sets, adjusting model design or parameters, and rerunning and comparing the results to try to find optimal answers without overfitting. Ironically, the task data scientists are doing is one perfectly suited to machine learning. It’s no surprise to me that Google developed AutoML to optimize their own NN development.

 

A number of months ago Phil and I worked on an unsupervised learning task related to organizing high-dimensional agents in a medical space. These entities were complex “polychronic” patients with a wide variety of diagnoses and illnesses. Combining fields for patient demographic data with their full medical claim history, we came up with a method to group medically similar patients and look for statistical outliers as indicators of fraud, waste, and abuse. The results were extremely successful and recovered a lot of money for the customer, but the interesting thing technically was how the solution evolved. Our first prototype used a wide variety of clustering algorithms, value decompositions, non-negative matrix factorization, etc., looking for optimal results. All of the selections and subsequent hyperparameters had to be modified by hand, the results evaluated, and further adjustments made.

When it became clear that the results were very sensitive to tiny adjustments, it was obvious that our manual tinkering would miss gradient changes, so we implemented an optimizer framework that could evaluate manifold learning techniques for stability and reconstruction error, with the results of the reduction clustered using either a complete fitness-landscape walk, a genetic algorithm, or a sub-surface division.

While working on tuning my latest test LSTM for time series prediction, I realized we’re dealing with the same issue here. There is no hard and fast rule for questions like, “How many LSTM Layers should my RNN have?” or “How many LSTM Units should each layer have?”, “What loss function and optimizer work best for this type of data?”, “How much dropout should I apply?”, “Should I use peepholes?”

I kept finding articles during my work saying things like, “There are diminishing returns for more than 4 stacked LSTM layers.” That’s an interesting rule of thumb… what is it based on? Presumably the author’s intuition, drawn from the data sets for the particular problems they were working on. Some rules of thumb attempt to define a mathematical relationship between the input data size and complexity and the optimal layout of layers and units. This StackOverflow question has some great responses: https://stackoverflow.com/questions/35520587/how-to-determine-the-number-of-layers-and-nodes-of-a-neural-network

A method recommended by Geoff Hinton is to add layers until you start to overfit your training set. Then you add dropout or another regularization method.

Because so much of what Phil and I do tends towards the generic repeatable solution for real world problems, I suspect we’ll start with some “common wisdom heuristics” and rapidly move towards writing a similar optimizer for supervised problems.
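
A first cut at such an optimizer could be as simple as a random search over a configuration space. This is a minimal sketch; the search-space values are illustrative, and `score_fn` is a hypothetical hook that in practice would build, train, and evaluate a model and return its validation loss:

```python
import random

# Hypothetical search space; the names mirror common LSTM knobs.
SPACE = {
    'layers':    [1, 2, 3, 4],
    'units':     [32, 64, 128, 256],
    'dropout':   [0.0, 0.2, 0.4],
    'optimizer': ['adam', 'rmsprop'],
}

def sample(rng):
    """Draw one random configuration from the space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def random_search(score_fn, trials=20, seed=42):
    """score_fn(config) -> validation loss (lower is better).
    Keeps the best configuration seen across the trials."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float('inf')
    for _ in range(trials):
        cfg = sample(rng)
        loss = score_fn(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

Random search is the crude baseline; the fitness-landscape walk and genetic algorithm from the unsupervised optimizer would slot into the same `score_fn` interface.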

Intro to LSTMs with Keras/TensorFlow

As I mentioned in my previous post, one of our big focuses recently has been on time series data for either predictive analysis or classification. The intent is to use this in concert with a lot of other tooling in our framework to solve some real-world applications.

One example is a pretty classic time series prediction problem: a customer managing large volumes of finances in a portfolio where the equivalent of purchase orders are made (at extremely high values) and planned cost often drifts from the actual outcomes. The deltas between these two are an area of concern for the customer, as they are looking for ways to better manage their spending. We have a proof-of-concept dashboard tool which rolls up their hierarchical portfolio and does some basic threshold-based calculations for things like these deltas.
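
The threshold calculation itself is simple. Here is a minimal sketch; the field names, IDs, and the 10% threshold are illustrative stand-ins, not the customer's actual schema:

```python
def flag_deltas(items, threshold=0.10):
    """Flag portfolio line items whose actual spend drifts beyond a
    relative threshold from the plan."""
    flagged = []
    for item in items:
        planned, actual = item['planned'], item['actual']
        delta = actual - planned
        pct = delta / planned if planned else float('inf')
        if abs(pct) > threshold:
            flagged.append({**item, 'delta': delta, 'pct': pct})
    return flagged

portfolio = [
    {'id': 'PO-1', 'planned': 100000.0, 'actual': 103000.0},  # within 10%
    {'id': 'PO-2', 'planned': 250000.0, 'actual': 310000.0},  # 24% over
]
over = flag_deltas(portfolio)
```

The LSTM work replaces the fixed threshold with a prediction of where the delta is heading, which is what makes it a time series problem rather than a reporting one.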

A much more complex example we are working on, in relationship to our trajectories in belief space, is the ability to identify patterns of human cultural and social behaviors (HCSB) in computer-mediated communication to look for trustworthy information based on agent interaction. One small piece of this work is the ability to teach a machine to identify these agent patterns over time. We’ve done various unsupervised learning which, in combination with techniques such as dynamic time warping (DTW), has been successful at discriminating agents in simulation, but it has some major limitations.

For many time series problems a very effective method of applying deep learning is using Recurrent Neural Networks (RNNs), which allow the history of the series to help inform the output. This is particularly important in cases involving language, such as machine translation or autocompletion, where the context of the sentence may be formed by elements spoken earlier in the text. Convolutional networks (CNNs) are most effective when the tensor elements have a distinct positional meaning in relationship to each other. The most common example is a matrix of pixel values, where the value of a pixel has direct relevance to nearby pixels. This allows for some nice parallelization and other optimizations, because you can assume that a small window of pixels will be relevant to each other and not necessarily dependent on “meaning” from pixels somewhere else in the picture. This is obviously a very simplified explanation, and there are lots of ways CNNs are being expanded to have broader applications, including for language.

In any case, despite recent cases being made for CNNs being relevant for all ML problems (https://arxiv.org/abs/1712.09662), the truth is RNNs are particularly good at sequentially understood problems which rely on the context of the entire series of data. This is of course useful for time series data as well as language problems.

The most common and popular example of RNN implementation for this is the Long Short-Term Memory (LSTM) RNN. I won’t dive into all of the details of how LSTMs work under the covers, but I think it’s best understood by saying: while in a traditional artificial neural network each neuron has a single activation function that passes a single value onward, LSTMs have units (or cells in some literature) which are more complex, consisting most commonly of a memory cell, an input gate, an output gate, and a forget gate. For a given LSTM layer, it will have a configured number of fully connected LSTM units, each of which contains the above pieces. This allows each unit to have some “memory” of previous pieces of information, which helps the model factor in things such as language context or patterns in the data occurring over time. Here is a link for a more complete explanation: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
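
As a concrete sketch of that gate arithmetic, here is a single step of a standard (peephole-free) LSTM cell in plain NumPy. The dimensions and random weights are arbitrary; a real implementation such as Keras's fuses the four gates into larger matrix multiplies for speed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell. W, U, b each stack the
    input, forget, output, and candidate weights along axis 0."""
    W_i, W_f, W_o, W_c = W
    U_i, U_f, U_o, U_c = U
    b_i, b_f, b_o, b_c = b
    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)        # input gate
    f = sigmoid(W_f @ x + U_f @ h_prev + b_f)        # forget gate
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)        # output gate
    c_tilde = np.tanh(W_c @ x + U_c @ h_prev + b_c)  # candidate memory
    c = f * c_prev + i * c_tilde   # memory cell: keep some old, add some new
    h = o * np.tanh(c)             # emitted hidden state
    return h, c

rng = np.random.default_rng(seed=0)
n_in, n_units = 3, 4
W = rng.standard_normal((4, n_units, n_in))     # input-to-gate weights
U = rng.standard_normal((4, n_units, n_units))  # recurrent weights
b = np.zeros((4, n_units))
h, c = lstm_step(rng.standard_normal(n_in),
                 np.zeros(n_units), np.zeros(n_units), W, U, b)
```

It is the `c` line that gives the unit its "memory": the forget gate decides how much of the previous cell state to keep, and the input gate decides how much of the new candidate to blend in.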

Training LSTMs isn’t much different than training any NN: it uses backpropagation against a training and validation set, with the configured hyperparameters and the layout of the layers having a large effect on performance and accuracy. For most of my work I’ve been using Keras & TensorFlow to implement time series predictions. I have some saved code for doing time series classification, but it’s a slightly different method. I found a wide variety of helpful examples early on, but they included some non-obvious pitfalls.
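
One step common to all of these examples is reframing the raw series as supervised (window, next-value) samples before anything reaches the network. A minimal NumPy sketch of that step (the window size here is arbitrary):

```python
import numpy as np

def make_windows(series, window=10):
    """Frame a 1-D series as (samples, window, 1) inputs with
    next-step targets, the input shape a Keras LSTM layer expects."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.asarray(X, dtype=np.float32)[..., np.newaxis]
    return X, np.asarray(y, dtype=np.float32)

series = np.arange(15.0)                 # toy stand-in for real data
X, y = make_windows(series, window=10)   # X: (5, 10, 1), y: (5,)
```

The pitfalls the tutorials gloss over mostly live around this step: whether the series was differenced or scaled first, and whether the train/validation split respects time order.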

Dr. Jason Brownlee at MachineLearningMastery.com has a bunch of helpful introductions to various ML concepts including LSTMs with example data sets and code. I appreciated his discussion about the things which the tutorial example doesn’t explicitly cover such as non-stationary data without preprocessing, model tuning, and model updates. You can check this out here: https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/

Note: The configuration used in this example suffices to explain how LSTMs work, but the accuracy and performance aren’t good. A single layer with a small number of LSTM cells, run for a large number of training epochs, produces pretty wide swings in predictive values; comparing RMSE scores across a number of runs shows they can be wildly off run-to-run.

Dr. Brownlee does have additional articles which go into some of the ways in which this can be improved such as his article on stacked LSTMs: https://machinelearningmastery.com/stacked-long-short-term-memory-networks/

Jakob Aungiers (http://www.jakob-aungiers.com/) has the best introduction to LSTMs that I have seen so far. His full article on LSTM time series prediction can be found here: http://www.jakob-aungiers.com/articles/a/LSTM-Neural-Network-for-Time-Series-Prediction while the source code (and a link to a video presentation) can be found here: https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction

His examples are far more robust including stacked LSTM layers, far more LSTM units per layer, and well characterized sample data as well as more “realistic” stock data. He uses windowing, and non-stationary data as well. He has also replied to a number of comments with detailed explanations. This guy knows his stuff.

 

 

Latest DNN work

It’s been a while since I’ve posted my status, and I’ve been far too busy to include all of the work with various AI/ML conferences and implementations, but since I’ve been doing a lot of work specifically on LSTM implementations I wanted to include some notes for both my future self, and my partner when he starts spinning up some of the same code.

Having identified a few primary use cases for our work (high-dimensional trajectories through belief space, word embedding search and classification, and time series analysis), we’ve been focusing a little more intently on some specific implementations for each capability. While Phil has been leading the charge with the trajectories in belief space, and we both did a bunch of work in the previous sprint preparing for integration of our word embedding project into the production platform, I have started focusing more heavily on time series analysis.

There are a variety of reasons that this particular niche is useful to focus on, but we have a number of real world / real data examples where we need to either perform time series classification, or time series prediction. These cases range from financial data (such as projected planned/actual deltas), to telemetry anomaly detection for satellites or aircraft, among others. In the past some of our work with ML classifiers has been simple feed forward systems (classic multi layer perceptrons), naive Bayesian, or logistic regression.

I’ve been coming up to speed on deep learning, becoming familiar with both the background and the mathematical underpinnings. Btw, for those looking for an excellent start to ML I highly recommend Patrick Winston’s (MIT) videos: https://youtu.be/uXt8qF2Zzfo

Over the course of several months I did pretty constant research, all the way through the latest arXiv papers. I was particularly interested in Hinton’s papers on capsule networks, as they have some direct applicability to some of our work. Here is an article summing up capsule networks: https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

I did some research into the progress of current deep learning frameworks as well, looking specifically at examples which were suited to production deployment at scale over frameworks most optimal for single researchers solving pet problems. Our focus is much more on the “applied ML” side of things rather than purely academic. The last time we did a comprehensive deep learning framework “bake off” we came to a strong conclusion that Google TensorFlow was the best choice for our environment, and my recent research validated that assumption was still correct. In addition to providing TensorFlow Serving to serve your own models in production stacks, most cloud hosting environments (Google, AWS, etc) have options for directly running TF models either serverless (AWS lambda functions) or through a deployment/hosting solution (AWS SageMaker).

The reality is that lots of what makes ML difficult boils down to things like training lifecycle, versioning, deployment, security, and model optimization. Some aspects of this are increasingly becoming commodity offerings through hosting providers, which frees up data scientists to work on their data sets and improving their models. Speaking of models, on our last pass at implementing some TensorFlow models we used raw TensorFlow, I think right after 1.0 was released. The documentation was pretty shabby, and even simple things weren’t super straightforward. When I went to install and set up a new box this time with TensorFlow 1.4, I went ahead and used Keras as well. Keras is an abstraction API on top of computational graph software (TensorFlow by default, or Theano). Installation is easy, with a couple of minor notes.

Note #1: You MUST install the specific versions listed. I cannot stress this enough. In particular, the cuDNN and CUDA Toolkit are updated frequently, and if you blindly click through their download links you will get a newer version which is not compatible with the current versions of TensorFlow and Keras. The software is all moving very rapidly, so it’s important to use the compatible versions.

Note #2: Some examples may require the MKL dependency for Numpy. This is not installed by default. See: https://stackoverflow.com/questions/41217793/how-to-install-numpymkl-for-python-2-7-on-windows-64-bit which will send you here for the necessary WHL file: https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

Note #3: You will need to run the TensorFlow install as sudo/administrator or you will get permission errors.

Once these are installed there is a full directory of Keras examples here: https://github.com/keras-team/keras/tree/master/examples

This includes basic examples of most of the DNN types supported by Keras, as well as some datasets for use, such as MNIST for CNNs. When it comes to figuring out “does everything I just installed run?”, these will work just fine.