7:00 – 8:00 ASRC
- Saw this on Twitter this morning: Training Agents using Upside-Down Reinforcement Learning
- Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative: Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), which solves RL problems primarily using supervised learning techniques. Many of its main principles are outlined in a companion report [34]. Here we present the first concrete implementation of UDRL and demonstrate its feasibility on certain episodic learning problems. Experimental results show that its performance can be surprisingly competitive with, and even exceed, that of traditional baseline algorithms developed over decades of research.
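- To check that I actually understand the abstract, a minimal sketch of the data-relabeling idea as I read it (variable names are mine, not the paper's):

```python
import numpy as np

# Toy version of UDRL's relabeling trick (my naming, not the paper's):
# turn past episodes into supervised pairs that map
# (state, desired_return, desired_horizon) -> action.

def make_training_pairs(episodes):
    """For each step t, the action actually taken becomes the label for the
    command (return actually achieved from t, steps actually remaining)."""
    X, y = [], []
    for states, actions, rewards in episodes:
        T = len(rewards)
        for t in range(T):
            desired_return = sum(rewards[t:])  # hindsight: what WAS achieved
            desired_horizon = T - t            # hindsight: steps remaining
            X.append(np.concatenate([states[t],
                                     [desired_return, desired_horizon]]))
            y.append(actions[t])
    return np.array(X), np.array(y)

# Example: one 3-step episode in a 2-D state space.
episode = ([np.array([0.0, 1.0]), np.array([0.5, 1.0]), np.array([1.0, 1.0])],
           [0, 1, 0],           # actions taken
           [0.0, 1.0, 2.0])     # rewards received
X, y = make_training_pairs([episode])
# Any off-the-shelf classifier can now fit(X, y); at test time you feed it
# the command you WANT (high desired return, chosen horizon) plus the state.
```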
- I wonder how it compares with Stuart Russell’s paper Cooperative Inverse Reinforcement Learning
- For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.
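- For comparison, my toy mental model of the inference half of CIRL (not the paper's algorithm; their point is that optimal play goes beyond this kind of passive belief updating, toward active teaching and learning):

```python
import numpy as np

# The robot doesn't know the human's reward weights theta, so it keeps a
# posterior over candidates and updates it after watching the human act,
# assuming a noisily-rational (softmax) human. All numbers are made up.

thetas = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # candidate rewards
belief = np.ones(len(thetas)) / len(thetas)              # uniform prior
action_features = np.array([[1.0, 0.2], [0.1, 1.0], [0.6, 0.6]])

def human_action_probs(theta, beta=5.0):
    """P(action) proportional to exp(beta * theta . phi(action))."""
    utilities = action_features @ theta
    exp_u = np.exp(beta * (utilities - utilities.max()))
    return exp_u / exp_u.sum()

def update_belief(belief, observed_action):
    """Bayes rule: reweight each theta by how likely it makes the action
    the human actually took."""
    likelihoods = np.array([human_action_probs(th)[observed_action]
                            for th in thetas])
    posterior = belief * likelihoods
    return posterior / posterior.sum()

belief = update_belief(belief, observed_action=1)
print(belief)  # mass shifts toward thetas that favor the second feature
```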
- Dissertation
- In the Ethics section, change ‘civilization’ to ‘culture’, and frame it in terms of the simulation – done
- Last slide should be ‘Thanks for coming to my TED talk’
- Ping Don’s composer and choreographer, if I can find them
- Cool! A T-O style universe map (Unmismoobjetivo, via Wikipedia). The logarithmic distance effect is something that I need to look into:
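- Quick toy to see what a logarithmic radial scale buys you (distances are rough, from memory):

```python
import math

# Log-distance scaling: plot radius grows with the log of true distance, so
# the Moon and the CMB fit on the same disk. Distances in light-years.
objects = {
    "Moon": 4.1e-8,
    "Sun": 1.6e-5,
    "Proxima Centauri": 4.2,
    "Galactic Center": 2.6e4,
    "Andromeda": 2.5e6,
    "CMB": 4.6e10,
}
r_min = min(objects.values())
for name, d in objects.items():
    r_plot = math.log10(d / r_min)  # log-scaled plot radius, Moon at 0
    print(f"{name:>16}: {d:10.3g} ly -> radius {r_plot:5.2f}")
```

- Eighteen orders of magnitude in distance collapse to a plot radius of about 18, which is why the whole observable universe fits in one image.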
- Evolver
- Quickstart
- User’s guide
- Finished commenting!
- Flailing on getting the documentation tools to work.
- Installing Sphinx
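- Setup notes: pip install sphinx, then sphinx-quickstart docs. The generated conf.py then needs roughly this (paths and names are guesses for my layout):

```python
# docs/conf.py -- minimal sketch for autodoc-style API docs (assumed layout:
# package source one directory above docs/).
import os
import sys
sys.path.insert(0, os.path.abspath(".."))  # let autodoc import the package

project = "Evolver"
extensions = [
    "sphinx.ext.autodoc",   # pull docstrings out of the code
    "sphinx.ext.napoleon",  # understand Google/NumPy docstring style
]
html_theme = "alabaster"    # the default theme; fine for now
```

- After that, sphinx-apidoc -o docs . followed by make html should produce the skeleton for the User's guide.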
- ML Seminar
- Double Crab Cake Platter (2) – 2 Vegetables – $34.00
- Went over the Evolver. The Ensemble charts really make an impression, but overall, the code walkthrough is too difficult – there are too many moving parts. I need to write a paper with screengrabs that walks through the whole process (toy version of the loop below). I’ll need to evaluate against Bayesian tuners, but I also have architecture search.
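- The core loop the screengrabs need to walk through is basically this (a toy sketch with placeholder names, not the real Evolver code):

```python
import math
import random

# Toy evolutionary hyperparameter search: keep an elite, refill with mutants.
def random_genome():
    return {"lr": 10 ** random.uniform(-5, -1),
            "layers": random.randint(1, 8)}

def mutate(genome):
    child = dict(genome)
    if random.random() < 0.5:
        child["lr"] *= 10 ** random.uniform(-0.5, 0.5)  # nudge in log space
    else:
        child["layers"] = max(1, child["layers"] + random.choice([-1, 1]))
    return child

def fitness(genome):
    # Stand-in for a real training run; peak at lr=1e-3, layers=4.
    return -abs(math.log10(genome["lr"]) + 3) - abs(genome["layers"] - 4)

population = [random_genome() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    elite = population[:5]
    population = elite + [mutate(random.choice(elite)) for _ in range(15)]

best = max(population, key=fitness)
print(best, fitness(best))
```

- A Bayesian tuner would replace the mutate/elite steps with a surrogate model and an acquisition function; that's the head-to-head the evaluation section needs.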
- The venue could be IEEE ICTAI 2020: The IEEE International Conference on Tools with Artificial Intelligence (ICTAI) is a leading Conference of AI in the Computer Society providing a major international forum where the creation and exchange of ideas related to artificial intelligence are fostered among academia, industry, and government agencies. It will be in Baltimore, I think.
- Meeting with Aaron. He thinks that part of the ethics discussion needs to address the status quo.