Phil 2.9.2023

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

  • Recent works successfully leveraged Large Language Models’ (LLM) abilities to capture abstract knowledge about world’s physics to solve decision-making problems. Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.

My Twitter tools still appear to be working…


  • Schedule physical


  • 9:15 standup
  • FOM meeting today?
  • Continue with embedding work – the pull blew up on the second attempt, after I broke the first file. Added some error handling. This will work much better when I’m using the db
  • My embeddings appear to be wrong! There are just 10 distinct embeddings. Maybe it was a bug on OpenAI’s side. Need to do another pull.
  • Made a version of the embedding pull that sends lists of texts rather than one text per request, which should speed things up.
  • Since I have to wipe the embeddings, I’m going to try storing the np.arrays as blobs using dumps()
  • Get Orest’s help with signoff – sent an email
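
A rough sketch of the blob idea from the notes above, plus a check for the "only 10 distinct embeddings" problem. The SQLite table, column names, and helper functions here are made up for illustration and are not the actual db schema. ndarray.dumps() returns a pickle of the array, so reading it back goes through pickle.loads(), which should only ever be used on data from a trusted source.

```python
import pickle
import sqlite3

import numpy as np


def store_embedding(conn, text_id, embedding):
    """Pickle the array with ndarray.dumps() and store it in a BLOB column."""
    blob = embedding.dumps()  # bytes containing a pickle of the array
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (text_id, vec) VALUES (?, ?)",
        (text_id, blob),
    )


def load_embedding(conn, text_id):
    """Return the stored array, or None if there is no row for text_id."""
    row = conn.execute(
        "SELECT vec FROM embeddings WHERE text_id = ?", (text_id,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


def count_distinct(embeddings, decimals=6):
    """Count unique vectors after rounding away floating-point noise."""
    stacked = np.round(np.vstack(embeddings), decimals)
    return len(np.unique(stacked, axis=0))


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (text_id INTEGER PRIMARY KEY, vec BLOB)")

# Toy vectors: the first two are deliberately identical.
vectors = [
    np.array([0.1, 0.2, 0.3]),
    np.array([0.1, 0.2, 0.3]),
    np.array([0.4, 0.5, 0.6]),
]
for i, vec in enumerate(vectors):
    store_embedding(conn, i, vec)
conn.commit()

restored = [load_embedding(conn, i) for i in range(len(vectors))]
print(count_distinct(restored), "distinct embeddings out of", len(restored))
```

Running a distinct count like this right after each pull would flag a repeat of the duplicate-embedding problem before the bad values get written over the good ones.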


  • Respond to Alden’s invite
  • Find some additional cites for the paper, and something about how LLMs can generate toxic content, though in this case it may be a feature. If there is room, add something about how the ability to better target minority groups and their views is a two-edged sword, and how the biases of LLMs may affect the generation of keywords.


  • Starting the “Scale” paper, which will be part 2 of the new book: “Speed and Scale, Societal Defense in the age of the Singularity”
  • The Era of the Algorithm
    • The internet age has made democracies exploitable. As an act of societal self-defense, it is necessary to strengthen the critical thinking of the young generation.