Phil 2.27.2023

The Stanford Human Preferences Dataset (SHP) is a dataset of 385K collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to legal advice. The preferences are meant to reflect the helpfulness of one response over another, and are intended to be used for training RLHF reward models and NLG evaluation models.
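A note to self on how a preference pair like this typically becomes a training signal for a reward model. This is a minimal sketch assuming the common Bradley-Terry pairwise loss, not anything taken from the SHP release itself. The function name and the example reward values are made up for illustration.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the preferred response wins,
    assuming a Bradley-Terry model: P(win) = sigmoid(r_preferred - r_rejected)."""
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for two responses to the same post.
loss_when_model_agrees = pairwise_preference_loss(2.0, 0.5)
loss_when_model_disagrees = pairwise_preference_loss(0.5, 2.0)
print(loss_when_model_agrees, loss_when_model_disagrees)
```

The loss is small when the model scores the human-preferred response higher and grows roughly linearly with the margin when it scores the rejected response higher.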


  • Checked to see that all the other pictures are a good size. Yes! And the art row is intact, so phew.


  • At GSAW. The conference is not at the hotel. Fortunately, I brought a bike!
  • Panel is today after lunch

GPT Agents

  • Added summary and narrative to generation options
  • Stubbed out saving the relevant project data to the narrative maps input JSON file


  • “NOVA – I’m not sure what that means – we have some great acronym guys. Anyway, we have SUPERNOVA connecting the NOVAs”
  • Really thinking about the humans as the only attack surface that matters. Social hacking for everything, as long as you’re patient. This is going to be the real power of AI “social weapons.” They take advantage of intrinsic human bias and use it to shovel sand into high-tech adversaries. So how do you detect that? Is “death by PowerPoint” an example of a successful attack?