Category Archives: Machine Learning

Phil 12.9.19

7:00 – 8:00 ASRC

  • Saw this on Twitter this morning: Training Agents using Upside-Down Reinforcement Learning (a toy sketch of the core idea is at the end of this entry)
    • Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), that solves RL problems primarily using supervised learning techniques. Many of its main principles are outlined in a companion report [34]. Here we present the first concrete implementation of UDRL and demonstrate its feasibility on certain episodic learning problems. Experimental results show that its performance can be surprisingly competitive with, and even exceed that of traditional baseline algorithms developed over decades of research.
  • I wonder how it compares with Stuart Russell’s paper Cooperative Inverse Reinforcement Learning
    • For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial- information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.
  • Dissertation
    • In the Ethics section, change ‘civilization’ to ‘culture’, and frame it in terms of the simulation – done
    • Last slide should be ‘Thanks for coming to my TED talk’
    • Ping Don’s composer and choreographer, if I can find them
    • Cool! A T-O style universe map (Unmismoobjetivo, via Wikipedia). The logarithmic distance effect is something that I need to look into.
  • Evolver
    • Quickstart
    • User’s guide
    • Finished commenting!
    • Flailing on getting the documentation tools to work.
  • ML Seminar
    • Double Crab Cake Platter (2) – 2 Vegetables – $34.00
    • Went over the Evolver. The Ensemble charts really make an impression, but overall, the code walkthrough is too difficult – there are too many moving parts. I need to write a paper with screengrabs that walk through the whole process. I’ll need to evaluate against Bayesian tuners, but I also have architecture search
    • The venue could be IEEE ICTAI 2020: The IEEE International Conference on Tools with Artificial Intelligence (ICTAI) is a leading Conference of AI in the Computer Society providing a major international forum where the creation and exchange of ideas related to artificial intelligence are fostered among academia, industry, and government agencies. It will be in Baltimore, I think.
  • Meeting with Aaron. He thinks that part of the ethics discussion needs to be an addressing of the status quo
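
To pin down the UDRL trick from the bullet above, here is a toy sketch of the core idea as I read the abstract: treat the desired return and horizon as a "command" input and fit the policy with plain supervised learning on past episodes. Everything below (the chain environment, the table-lookup behavior function) is invented for illustration; the actual paper uses a neural network and a real benchmark.

    import random
    from collections import defaultdict

    # Toy chain environment: states 0..GOAL, actions -1/+1, reward 1.0 only on reaching GOAL.
    GOAL = 3
    MAX_STEPS = 12

    def step(state: int, action: int):
        s = max(0, min(GOAL, state + action))
        return s, (1.0 if s == GOAL else 0.0), s == GOAL

    # 1) Gather experience with a random policy.
    episodes = []
    for _ in range(500):
        s, done, traj = 0, False, []
        for _ in range(MAX_STEPS):
            a = random.choice([-1, 1])
            s2, r, done = step(s, a)
            traj.append((s, a, r))
            s = s2
            if done:
                break
        episodes.append(traj)

    # 2) "Train" the behavior function by supervised learning on (state, command) -> action.
    #    Command = (desired return, desired horizon); a lookup table stands in for the network.
    behavior = defaultdict(list)
    for traj in episodes:
        for i, (s, a, _) in enumerate(traj):
            return_to_go = sum(r for _, _, r in traj[i:])
            horizon_to_go = len(traj) - i
            behavior[(s, return_to_go, horizon_to_go)].append(a)

    def act(s: int, desired_return: float, desired_horizon: int) -> int:
        candidates = behavior.get((s, desired_return, desired_horizon))
        return random.choice(candidates) if candidates else random.choice([-1, 1])

    # 3) At run time, command the return we want and let the supervised model produce actions.
    s, done = 0, False
    for _ in range(MAX_STEPS):
        a = act(s, desired_return=1.0, desired_horizon=GOAL - s)
        s, _, done = step(s, a)
        if done:
            break
    print("Commanded return 1.0, reached goal:", done)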

Phil 12.7.19

You can now have an AI DM. AI Dungeon 2. Here’s an article about it: You can do nearly anything you want in this incredible AI-powered game. It looks like a GPT-2 model trained with chooseyouradventure. Here’s the “how we did it”. Wow

The Toxins We Carry (Whitney Phillips)

  • My proposal is that we begin thinking ecologically, an approach I explore with Ryan Milner, a communication scholar, in our forthcoming book You Are Here: A Field Guide for Navigating Polluted Information. From an ecological perspective, Wardle’s term “information pollution” makes perfect sense. Building on Wardle’s definition, we use the inverted form “polluted information” to emphasize the state of being polluted and to underscore connections between online and offline toxicity. One of the most important of these connections is just how little motives matter to outcomes. Online and off, pollution still spreads, and still has consequences downstream, whether it’s introduced to the environment willfully, carelessly, or as the result of sincere efforts to help. The impact of industrial-scale polluters online—the bigots, abusers, and chaos agents, along with the social platforms that enable them—should not be minimized. But less obvious suspects can do just as much damage. The truth is one of them.
  • Taking an ecological approach to misinformation

Phil 12.5.19

ASRC GOES 7:00 – 4:30, 6:30 – 7:00

  • Write up something for Erik and John?
  • Send gdoc link to Bruce – done
  • apply for TF Dev invite – done
  • Schedule physical! – done
  • Dissertation – more Designing for populations
  • Evolver
    • Comment EvolutionaryOptimizer – almost done
    • Comment ModelWriter
    • Quickstart
    • User’s guide
    • Comment the excel utils?
  • Waikato meeting with Alex and Panos

Phil 12.4.19

7:00 – 8:00 ASRC GOES

  • Dissertation – back to designing for populations
  • Timesheet revisions
  • Applying for MS Project
  • Evolver – more documentation
  • GOES Meeting
    • Bought a copy of MS Project for $15
    • Send Erik a note about permission to charge for TF Dev Conf
    • Good chat with Bruce about many things, including CASSIE as a Cloud service
    • Re-send links to common satellite dictionary
    • Vadim got a pendulum working
  • Meeting with Roger
    • Got a tour of the new building
    • Lots of VR discussion
    • Some academic future options

Phil 12.3.19

7:00 – 4:00 ASRC GOES

  • Dissertation – reworked the last paragraph of the Reflection and reflex section
  • Evolver – more documentation
  • Send this out to the HCC mailing list: The introvert’s academic “alternative networking” guide
  • Arpita’s proposal defense
    • Stanford: Open information extraction (open IE) refers to the extraction of relation tuples, typically binary relations, from plain text, such as (Mark Zuckerberg; founded; Facebook). The central difference from other information extraction is that the schema for these relations does not need to be specified in advance; typically the relation name is just the text linking two arguments. For example, Barack Obama was born in Hawaii would create a triple (Barack Obama; was born in; Hawaii), corresponding to the open domain relation was-born-in(Barack-Obama, Hawaii).
    • Open Information Extraction 5
    • UKG Open Information Extraction
    • Supervised Ensemble of Open IE
    • Datasets
      • AW-OIE
      • AW-OIE-C
      • WEB
      • NYT
      • PENN
    • Why the choice of 100 dimensions for your semantic embedding? How does it compare to other dimensionalities?
    • Contextual embedding for NLP?
    • Input-Output Hidden Markov Model (version on GitHub)

Phil 12.2.19

December! Yikes!

7:00 – 8:00 ASRC GOES

  • Dissertation
    • Designing for populations
  • Evolver
    • Oh, boy – big IDE updates. Hoping nothing breaks
      • Had to connect back to python
      • TF still works!
    • Commenting and documenting
      • Finished ValueAxis.py
      • Starting TF2OptomizerBase.py
  • ML seminar (food from La Madeleine!)
  • Meeting with Aaron M

Phil 11.27.19

7:00 – 3:00 ASRC GOES

  • Dissertation – Added a bit at the beginning of the discussion section to explain why this should fit in the HCI universe. Started working on the Non-human agents part, and am explaining why systems like GPT-2 create their own low-dimensional spaces due to the cost of implementation and the incentives of research
  • Evolver – Commenting and tweaking
    • Done with ValueAxis.py, which contains
      • class ValueAxisType(Enum):
      • class ValueAxis:
      • class EvolveAxis:
      • Example usage, evaluation and class exercising code using
        if __name__ == '__main__':
  • Ran out of space on my primary drive and had to drop everything and fix that

Phil 11.25.19

7:00 – 7:00 ASRC GOES

  • Dissertation – more discussion
    • Added Clark’s Grounding in communication to the lit review
    • Added more to the diversity section. Need to fold ecosystem thinking in
  • Evolver – get copied state nailed down
    • That seems to be working in the test harness:
      vzfunc[0]: Zfunc
      d1={'Zfunc': 2.5, 'Zfunc_function': 'plus_func', 'Zvals1': 1.0, 'Zvals2': 1.5}
      d2={'Zfunc': 2.5, 'Zfunc_function': 'plus_func', 'Zvals1': 1.0, 'Zvals2': 1.5}
      ------------
      vzfunc[1]: Zfunc
      d1={'Zfunc': 4.5, 'Zfunc_function': 'div_func', 'Zvals1': 4.5, 'Zvals2': 1.0}
      d2={'Zfunc': 4.5, 'Zfunc_function': 'div_func', 'Zvals1': 4.5, 'Zvals2': 1.0}
      ------------
      vzfunc[2]: Zfunc
      d1={'Zfunc': 3.5, 'Zfunc_function': 'mult_func', 'Zvals1': 1.0, 'Zvals2': 3.5}
      d2={'Zfunc': 3.5, 'Zfunc_function': 'mult_func', 'Zvals1': 1.0, 'Zvals2': 3.5}
      ------------
      vzfunc[3]: Zfunc
      d1={'Zfunc': 7.5, 'Zfunc_function': 'plus_func', 'Zvals1': 3.5, 'Zvals2': 4.0}
      d2={'Zfunc': 7.5, 'Zfunc_function': 'plus_func', 'Zvals1': 3.5, 'Zvals2': 4.0}
    • Still not setting the values of the EvolveAxis History_list correctly when breeding genomes, I think
  • Fika – slides are done-ish
  • ML – seminar
    • Good point – I need to visit each committee member to walk them through the dissertation (possibly with slides?) sometime in January. Also, use the conclusions to build a TL;DR version.
  • Meeting with Aaron – nope


Phil 11.21.19

7:00 – 4:30 ASRC GOES

  • Dissertation
    • Good progress on discussion section
    • I have 222 hours to charge for the rest of the year!
  • Evolver
    • Working out index-based calculations in the test case
    • Found a HUGE bug. I was copying EvolveAxis pointers not values
    • Fixed with copy.deepcopy()
    • Need to add a set_value() for crossover (see the sketch at the end of this entry)
  • Several hours with Aaron on vehicle identification
  • Nextgen schedule plan – trying to get MSProject
  • JuryRoom Meeting
    • Moved time to 6:30
    • Need to write up a peer review use case
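
For the record, a minimal illustration of the pointer-vs-value bug and the deepcopy() fix, plus the sort of set_value() that crossover needs. The Axis class below is a stripped-down stand-in, not the real EvolveAxis:

    import copy
    import random

    class Axis:
        """Stripped-down stand-in for EvolveAxis, just enough to show the copy bug."""
        def __init__(self, name: str, range_array):
            self.name = name
            self.range_array = range_array
            self.cur_value = random.choice(range_array)

        def set_value(self, v):
            # what crossover needs: force this axis to a value inherited from a parent
            if v in self.range_array:
                self.cur_value = v

    parent_genome = [Axis("X", [0.0, 0.25, 0.5]), Axis("Y", [1, 2, 3])]

    aliased_child = list(parent_genome)        # copies the pointers: the child's axes ARE the parent's axes
    safe_child = copy.deepcopy(parent_genome)  # copies the values: fully independent genome

    aliased_child[0].set_value(0.5)
    print(parent_genome[0].cur_value)          # 0.5 -- the parent changed too (the bug)
    safe_child[0].set_value(0.25)
    print(parent_genome[0].cur_value)          # still 0.5 -- deepcopy keeps the parent intact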

Phil 11.20.19

7:00 – 5:00 ASRC

  • Reading User Experience as a Legitimacy Trap, by Paul Dourish. Solid stuff.
    • Why are HCI researchers and practitioners now on the wrong side of many of the problematic developments in the contemporary technology landscape? Why is it so challenging for us to reformulate the objectives of our discipline and the central values of our educational programs? It is because those were not the basis upon which we argued for the legitimacy of our practice. By legitimizing HCI and its role in technology production in terms of user experience, user delight, and user acceptance—which were only ever means toward other ends—we have ceded the space from which we could argue for the considerations that were actually at the center of the discipline’s ambitions (to nurture and sustain human dignity and flourishing).
      • I think I can cite this in the conclusions section, where I think I need to address the issue that some might not consider this appropriate research for an HCI PhD
  • Dissertation
    • More discussion. Send a note out to folks to workshop on Friday?
    • Mostly spent my time cleaning up the beginning. Didn’t write much new, but clarified and tightened up.
    • Found the original Bellman cite for the curse of dimensionality 
  • Evolver
    • Need to change chromosomes so that they point to the history index in the genome. The args Dict for the user function can be created from that, and the value/parameter spreadsheet can be too.
    • That reconstruction will need to ripple through the arguments axis to the function as well. That might be the problem that I was having yesterday. (A sketch of the index-based reconstruction is at the end of this entry.)
  • AIMS Telemetry meeting
    • Need to start an MS-Project chart for nextGen efforts. ASRC doesn’t seem to have Project in its stack?
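
A sketch of the index-based idea from the Evolver bullets above (names invented here; the real genome/chromosome classes look different): the chromosome stores only indices into the genome's history, and both the args Dict for the user function and the value/parameter spreadsheet row get rebuilt from those indices:

    # Hypothetical shapes, just to pin the idea down; the real classes hold more state.
    # Genome history: each axis keeps every value it has taken, in order.
    history = {
        "learning_rate": [0.01, 0.005, 0.0025],
        "num_layers":    [2, 4, 8],
    }

    # Chromosome: a per-axis *index* into that history instead of the raw value.
    chromosome = {"learning_rate": 2, "num_layers": 1}

    def build_args(chromosome: dict, history: dict) -> dict:
        """Rebuild the kwargs for the user's evaluation function from the indices."""
        return {name: history[name][idx] for name, idx in chromosome.items()}

    args = build_args(chromosome, history)      # {'learning_rate': 0.0025, 'num_layers': 4}

    # The same reconstruction feeds the value/parameter spreadsheet row.
    spreadsheet_row = [args[name] for name in sorted(args)]
    print(args, spreadsheet_row)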

Phil 11.19.19

7:00 – 4:00 ASRC GOES

  • Dissertation
  • Evolver
    • Work on getting all the functions and Evolver->Evolver stacks to put their arguments and return values in the spreadsheet. Then adjust the chromosome so that secondary and tertiary values are permuted correctly. I think everything will have to be listed, but certain parts will need to be frozen.
    • Make sure that genomes don’t repeat. Making progress, but it’s complex and slow going. Right now it only avoids repeats on the value, and I don’t think that’s quite right (see the sketch below).
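
One way the repeat check could work (a sketch, and only my guess at what "don't repeat" should mean): key the dedup on the genome's full parameter assignment rather than on its fitness value:

    # Sketch: dedupe candidate genomes on their full parameter assignment.
    seen = set()

    def is_new(genome_values: dict) -> bool:
        """genome_values: axis name -> current value, e.g. {'X': 0.25, 'func': 'plus_func'}."""
        key = frozenset(genome_values.items())
        if key in seen:
            return False
        seen.add(key)
        return True

    print(is_new({"X": 0.25, "func": "plus_func"}))  # True, first time seen
    print(is_new({"X": 0.25, "func": "plus_func"}))  # False, exact duplicate
    print(is_new({"X": 0.50, "func": "plus_func"}))  # True, differs on X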

Phil 11.18.19

7:00 – 4:00 ASRC GOES

  • Dissertation
    • Finished my notes on the introduction to History of Cartography
    • Started in on the discussion, which is a poorly organized mess
  • Evolver
    • Moving the optimization to a hyperparameter folder in TimeSeriesML2. Validating – it works!
    • Make sure that genomes don’t repeat. Making progress, but it’s complex and slow going. Right now it doesn’t repeat on the value, but I don’t think that’s quite right
    • Getting the parameters to print in the spreadsheet history. That’s mostly working, but the function cur_value isn’t working quite right. This may be affecting the evolution of the system, which hits a plateau.
  • Meeting with Aaron M. Went over the discussion debris, and worked towards getting things to behave. Need to define what a phase is, and remove occurrences of social influence distance. Also discussed getting an editor. My bibfile is a mess

Phil 11.14.19

7:00 – 3:30 ASRC GOES

  • Dissertation – Done with Human Study!
  • Evolver
      • Work on parameter passing and function storing
      • You can use the * operator before an iterable to expand it within the function call. For example:
        timeseries_list = [timeseries1, timeseries2, ...]
        r = scikits.timeseries.lib.reportlib.Report(*timeseries_list)
      • Here’s the running code with variable arguments
        def plus_func(v1:float, v2:float) -> float:
            return v1 + v2
        
        def minus_func(v1:float, v2:float) -> float:
            return v1 - v2
        
        def mult_func(v1:float, v2:float) -> float:
            return v1 * v2
        
        def div_func(v1:float, v2:float) -> float:
            return v1 / v2
        
        if __name__ == '__main__':
            func_array = [plus_func, minus_func, mult_func, div_func]
        
            vf = EvolveAxis("func", ValueAxisType.FUNCTION, range_array=func_array)
            v1 = EvolveAxis("X", ValueAxisType.FLOAT, parent=vf, min=-5, max=5, step=0.25)
            v2 = EvolveAxis("Y", ValueAxisType.FLOAT, parent=vf, min=-5, max=5, step=0.25)
        
            for f in func_array:
                result = vf.get_random_val()
                print("------------\nresult = {}\n{}".format(result, vf.to_string()))
      • And here’s the output
        ------------
        result = -1.0
        func: cur_value = div_func
        	X: cur_value = -1.75
        	Y: cur_value = 1.75
        ------------
        result = -2.75
        func: cur_value = plus_func
        	X: cur_value = -0.25
        	Y: cur_value = -2.5
        ------------
        result = 3.375
        func: cur_value = mult_func
        	X: cur_value = -0.75
        	Y: cur_value = -4.5
        ------------
        result = -5.0
        func: cur_value = div_func
        	X: cur_value = -3.75
        	Y: cur_value = 0.75
      • Now I need to get this to work with different functions with different arg lists. I think I can do this with an EvolveAxis containing a list of EvolveAxis with functions. Done, I think. Here’s what the calling code looks like:
        # create a set of functions that all take two arguments
        func_array = [plus_func, minus_func, mult_func, div_func]
        vf = EvolveAxis("func", ValueAxisType.FUNCTION, range_array=func_array)
        v1 = EvolveAxis("X", ValueAxisType.FLOAT, parent=vf, min=-5, max=5, step=0.25)
        v2 = EvolveAxis("Y", ValueAxisType.FLOAT, parent=vf, min=-5, max=5, step=0.25)
        
        # create a single function that takes no arguments
        vp = EvolveAxis("random", ValueAxisType.FUNCTION, range_array=[random.random])
        
        # create a set of Axis from the previous function evolve args
        axis_list = [vf, vp]
        vv = EvolveAxis("meta", ValueAxisType.VALUEAXIS, range_array=axis_list)
        
        # run four times
        for i in range(4):
            result = vv.get_random_val()
            print("------------\nresult = {}\n{}".format(result, vv.to_string()))
      • Here’s the output. The random function has all the decimal places:
        ------------
        result = 0.03223958125899473
        meta: cur_value = 0.8840652389671935
        ------------
        result = -0.75
        meta: cur_value = -0.75
        ------------
        result = -3.5
        meta: cur_value = -3.5
        ------------
        result = 0.7762888191296017
        meta: cur_value = 0.13200324934487906
      • Verified that everything still works with the EvolutionaryOptimizer. Now I need to make sure that the new mutations include these new dimensions (see the sketch at the end of this entry)


  • I think I should also move TF2OptimizationTestBase to TimeSeriesML2?
  • Starting Human Compatible
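
And a sketch of what "mutations include the new dimensions" might mean: mutation recurses into child axes (the function's argument axes and any nested EvolveAxis), so every level can change. The ToyAxis class below is a stand-in invented for illustration, not the real EvolveAxis:

    import random

    class ToyAxis:
        """Stand-in for EvolveAxis: a value range plus optional child axes."""
        def __init__(self, name, range_array, children=None):
            self.name = name
            self.range_array = range_array
            self.children = children or []
            self.cur_value = random.choice(range_array)

        def mutate(self, prob: float = 0.5):
            # mutate this axis...
            if random.random() < prob:
                self.cur_value = random.choice(self.range_array)
            # ...and recurse so nested dimensions (function args, meta axes) can change too
            for child in self.children:
                child.mutate(prob)

        def to_string(self, indent: str = "") -> str:
            s = "{}{}: cur_value = {}\n".format(indent, self.name, self.cur_value)
            return s + "".join(c.to_string(indent + "\t") for c in self.children)

    x = ToyAxis("X", [i * 0.25 for i in range(-20, 21)])
    y = ToyAxis("Y", [i * 0.25 for i in range(-20, 21)])
    func = ToyAxis("func", ["plus_func", "minus_func", "mult_func", "div_func"], children=[x, y])
    meta = ToyAxis("meta", ["func", "random"], children=[func])

    meta.mutate()
    print(meta.to_string())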

Phil 11.13.19

7:00 – 3:00 ASRC

3rd Annual DoD AI Industry Day

From Stuart Russell, via BBC Business Daily and the AI Alignment podcast:

Although people have argued that this creates a filter bubble or a little echo chamber where you only see stuff that you like and you don’t see anything outside of your comfort zone. That’s true. It might tend to cause your interests to become narrower, but actually that isn’t really what happened and that’s not what the algorithms are doing. The algorithms are not trying to show you the stuff you like. They’re trying to turn you into predictable clickers. They seem to have figured out that they can do that by gradually modifying your preferences and they can do that by feeding you material. That’s basically, if you think of a spectrum of preferences, it’s to one side or the other because they want to drive you to an extreme. At the extremes of the political spectrum or the ecological spectrum or whatever image you want to look at. You’re apparently a more predictable clicker and so they can monetize you more effectively.

So this is just a consequence of reinforcement learning algorithms that optimize click-through. And in retrospect, we now understand that optimizing click-through was a mistake. That was the wrong objective. But you know, it’s kind of too late and in fact it’s still going on and we can’t undo it. We can’t switch off these systems because they’re so tied in to our everyday lives and there’s so much economic incentive to keep them going.

So I want people in general to kind of understand what is the effect of operating these narrow optimizing systems that pursue these fixed and incorrect objectives. The effect of those on our world is already pretty big. Some people argue that corporations pursuing the maximization of profit have the same property. They’re kind of like AI systems. They’re kind of super intelligent because they think over long time scales, they have massive information, resources and so on. They happen to have human components, but when you put a couple of hundred thousand humans together into one of these corporations, they kind of have this super intelligent understanding, manipulation capabilities and so on.

  • Predicting human decisions with behavioral theories and machine learning
    • Behavioral decision theories aim to explain human behavior. Can they help predict it? An open tournament for prediction of human choices in fundamental economic decision tasks is presented. The results suggest that integration of certain behavioral theories as features in machine learning systems provides the best predictions. Surprisingly, the most useful theories for prediction build on basic properties of human and animal learning and are very different from mainstream decision theories that focus on deviations from rational choice. Moreover, we find that theoretical features should be based not only on qualitative behavioral insights (e.g. loss aversion), but also on quantitative behavioral foresights generated by functional descriptive models (e.g. Prospect Theory). Our analysis prescribes a recipe for derivation of explainable, useful predictions of human decisions.
  • Adversarial Policies: Attacking Deep Reinforcement Learning
    • Deep reinforcement learning (RL) policies are known to be vulnerable to adversarial perturbations to their observations, similar to adversarial examples for classifiers. However, an attacker is not usually able to directly modify another agent’s observations. This might lead one to wonder: is it possible to attack an RL agent simply by choosing an adversarial policy acting in a multi-agent environment so as to create natural observations that are adversarial? We demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots with proprioceptive observations, against state-of-the-art victims trained via self-play to be robust to opponents. The adversarial policies reliably win against the victims but generate seemingly random and uncoordinated behavior. We find that these policies are more successful in high-dimensional environments, and induce substantially different activations in the victim policy network than when the victim plays against a normal opponent. Videos are available at this http URL.