Phil 1.13.2023

Brian is coming over this evening to do Bossomatic things




Phil 1.12.2023

Dinner at 6:00, talk at 7:00

GPT Agents


  • 9:15 standup
  • Finish first pass of editing – done
  • Talk to Rukan about loss function. Everything looks really good, enough for a paper that shows binary encoding trains faster and is more accurate than one-hot. We also discussed ways of indicating confidence.
  • Register for GSAW and get tix, hotel – done!
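The binary-vs-one-hot point can be made concrete with a small sketch (the function names here are mine, not from Rukan's code): a one-hot target needs one output per class, while a binary target only needs about log2(N) outputs.

```python
import numpy as np

def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Classic one-hot target: num_classes outputs, exactly one set to 1."""
    v = np.zeros(num_classes)
    v[index] = 1.0
    return v

def binary(index: int, num_bits: int) -> np.ndarray:
    """Binary target: only ceil(log2(num_classes)) outputs needed (little-endian)."""
    return np.array([(index >> i) & 1 for i in range(num_bits)], dtype=float)

# Encoding class 5 out of 16:
print(one_hot(5, 16))  # 16 outputs, one active
print(binary(5, 4))    # 4 outputs: [1. 0. 1. 0.]
```

With 16 classes the binary target is a quarter of the size of the one-hot target, which is the kind of compression that could plausibly speed up training.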

Phil 1.11.2023

Mastering Diverse Domains through World Models

  • General intelligence requires solving tasks across many domains. Current reinforcement learning algorithms carry this potential but are held back by the resources and knowledge required to tune them for new tasks. We present DreamerV3, a general and scalable algorithm based on world models that outperforms previous approaches across a wide range of domains with fixed hyperparameters. These domains include continuous and discrete actions, visual and low-dimensional inputs, 2D and 3D worlds, different data budgets, reward frequencies, and reward scales. We observe favorable scaling properties of DreamerV3, with larger models directly translating to higher data-efficiency and final performance. Applied out of the box, DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula, a long-standing challenge in artificial intelligence. Our general algorithm makes reinforcement learning broadly applicable and allows scaling to hard decision making problems.


  • Still no word back from Disney. If I don’t get anything back by the 15th, I’ll put in Fair Use for cultural criticism and see what happens

GPT Agents


  • MORS rewrite
  • Beyond Single-Mindedness: A Figure-Ground Reversal for the Cognitive Sciences
    • A fundamental fact about human minds is that they are never truly alone: all minds are steeped in situated interaction. That social interaction matters is recognized by any experimentalist who seeks to exclude its influence by studying individuals in isolation. On this view, interaction complicates cognition. Here, we explore the more radical stance that interaction co-constitutes cognition: that we benefit from looking beyond single minds toward cognition as a process involving interacting minds. All around the cognitive sciences, there are approaches that put interaction center stage. Their diverse and pluralistic origins may obscure the fact that collectively, they harbor insights and methods that can respecify foundational assumptions and fuel novel interdisciplinary work. What might the cognitive sciences gain from stronger interactional foundations? This represents, we believe, one of the key questions for the future. Writing as a transdisciplinary collective assembled from across the classic cognitive science hexagon and beyond, we highlight the opportunity for a figure-ground reversal that puts interaction at the heart of cognition. The interactive stance is a way of seeing that deserves to be a key part of the conceptual toolkit of cognitive scientists.
  • Register for GSAW and get tix, hotel

Phil 1.10.2023

Space is a latent sequence: Structured sequence learning as a unified theory of representation in the hippocampus

  • Fascinating and puzzling phenomena, such as landmark vector cells, splitter cells, and event-specific representations to name a few, are regularly discovered in the hippocampus. Without a unifying principle that can explain these divergent observations, each experiment seemingly discovers a new anomaly or coding type. Here, we provide a unifying principle that the mental representation of space is an emergent property of latent higher-order sequence learning. Treating space as a sequence resolves myriad phenomena, and suggests that the place-field mapping methodology where sequential neuron responses are interpreted in spatial and Euclidean terms might itself be a source of anomalies. Our model, called Clone-structured Causal Graph (CSCG), uses a specific higher-order graph scaffolding to learn latent representations by mapping sensory inputs to unique contexts. Learning to compress sequential and episodic experiences using CSCGs result in the emergence of cognitive maps – mental representations of spatial and conceptual relationships in an environment that are suited for planning, introspection, consolidation, and abstraction. We demonstrate that over a dozen different hippocampal phenomena, ranging from those reported in classic experiments to the most recent ones, are succinctly and mechanistically explained by our model.

The Association of Professional Futurists is a global community of futurists advancing professional foresight. Our credentialed members help their clients and employers anticipate and influence the future.

GPT Agents


  • 9:15 – 9:30 Standup
  • 1:00 – 1:30 biweekly meeting
  • 2:00 – 3:00 Weekly status report
  • 3:00 – 5:00 MORS Journal meeting with Aaron

Phil 1.8.2023

GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the algorithm of General Language Model (GLM). It is designed to support inference tasks with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. With INT4 quantization, the hardware requirements can further be reduced to a single server with 4 * RTX 3090 (24G) with almost no performance degradation. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English).
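A back-of-the-envelope check of those hardware numbers (my arithmetic, not from the GLM-130B docs):

```python
params = 130e9  # 130 billion parameters

# 2 bytes/param at FP16 -> 260 GB of weights, hence the 8 x A100-40G (320 GB) server
fp16_gb = params * 2 / 1e9

# 0.5 bytes/param at INT4 -> 65 GB of weights, hence 4 x RTX 3090-24G (96 GB) suffices
int4_gb = params * 0.5 / 1e9

print(fp16_gb, int4_gb)  # 260.0 65.0
```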

Phil 1.5.2023

In this video, we will learn how to use the Cohere Embed API endpoint to generate language embeddings using a large language model (LLM) and then index those embeddings in the Pinecone vector database for fast and scalable vector search. Cohere is an AI company that allows us to use state-of-the-art large language models (LLMs) in NLP. The Cohere Embed endpoint we use in this video gives us access to models similar to other popular LLMs like OpenAI’s GPT 3, particularly their recent offerings via OpenAI Embeddings like the text-embedding-ada-002 model. Pinecone is a vector database company allowing us to use state-of-the-art vector search through millions or even billions of data points.
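The embed-then-index-then-search pattern described here boils down to nearest-neighbor lookup over vectors. A self-contained toy version with random stand-in embeddings and cosine similarity (this is not the Cohere or Pinecone API, just the underlying idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real embeddings; in the video these come from Cohere's Embed
# endpoint and are stored in Pinecone rather than a NumPy array.
corpus = ["doc about cats", "doc about dogs", "doc about finance"]
index = rng.normal(size=(len(corpus), 8))  # one 8-d vector per document

def search(query_vec, top_k=2):
    # Cosine similarity: dot product of L2-normalized vectors
    a = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = a @ q
    best = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

# A query identical to a stored vector returns that document first, score 1.0
print(search(index[0]))
```

Pinecone's value-add over this sketch is approximate nearest-neighbor search that stays fast at millions or billions of vectors, where the brute-force dot product above would not.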

GPT Agents


  • 9:15 standup
  • Submit request for GSAW – started
  • Package up code for Rukan – done
  • Floated the idea of a GPT brownbag


  • Updated my financial paperwork

Phil 1.4.2023

Roadmap for Researchers on Priorities Related to Information Integrity Research and Development

  • We in the Federal Government developed this Roadmap through sustained discussion across Federal agencies and through numerous consultations with academic researchers, commercial entities, international partners, those adversely affected by corrupted information, former government employees across the political spectrum, and others seeking to address information integrity challenges.

GPT Takes the Bar Exam

  • Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as “the Bar Exam,” as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant completes at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score under the rate required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in “AI?” In this research, we document our experimental evaluation of the performance of OpenAI’s `text-davinci-003` model, often-referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5’s zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5’s zero-shot performance. For best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5’s ranking of responses is also highly-correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.

From the ICML Call for papers:


  • Meeting with Aaron? Most of the day, actually
  • Meeting with Rukan? Comment and zip up code with example spreadsheets by Thursday


  • Roll in Shimei’s changes – done
  • 4:00 Meeting. A lot of discussion about students cheating with the GPT. For foreign students in particular, the pressures to succeed and get an advanced degree seem to outweigh the penalties of plagiarism, much less using LLMs to write text. It’s a point that should be added to the essay.

Phil 1.2.23

Idly looking forward to 2.3.23


  • Add zero-day exploit example – done
  • Finish application – add “Neural Narrative Mapping”
  • Work on Shimei’s suggestions


  • Chapter asset form?
  • Fix payment form

Phil 12.23.2022

I wrote a new blog post! Some thoughts on the ChatGPT

Mastodon Digest

  • This is a Python project that generates a digest of popular Mastodon posts from your home timeline. The digest is generated locally. The digests present two lists: posts from users you follow, and boosts from your followers. Each list is constructed by respecting your server-side content filters and identifying content that you haven’t yet interacted with. Digests are automatically opened locally in your web browser. You can adjust the digest algorithm to suit your liking (see Command arguments).

Really not feeling motivated. It’s been raining for 36 hours or so, and then it’s going to get cold. By Tuesday, things should be getting back to seasonal, and then even a little nice by Friday


  • More MORS. Get a first pass through the conclusions – done! Currently at 18 pages with references
  • Nice chat with Aaron to wrap up the year.

Phil 12.22.2022

The days are (marginally) getting longer


  • Fill out Disney form – done. They say 10 days?


  • 9:15 standup
  • In the time between these, submit expense report – done
  • 10:00 CWOC meeting
  • Then a quiet day of MORS writing. Maybe “proposition” rather than “lesson”. Heck, make a variable

GPT Agents

  • Nice talk with Shimei last night
  • Add some detail and justification for the creation of models from keyword data

Phil 12.21.2022

Shortest day of the year! It gets better from here



  • Early morning helping Rukan with getting everything done
  • Need to make videos when they are ready. Change all the raid numbers to NINE
  • Working on some test files to train the NN to choose the nth-best choice – done
  • MORS – going to set up the History section to have numbered lessons
  • Submit for reimbursement!
import pandas as pd
from random import random
from pathlib import Path

from typing import List

class FindLowest:

    def __init__(self, num_items:int, size:int, rows:int = 100):
        self.num_items = num_items
        self.size = size
        self.rows = rows
        self.input_matrix = []
        self.output_matrix = []

    def int_to_bin_list(self, val:int, places:int = 16) -> List:
        # Little-endian bit list, e.g. 5 -> [1, 0, 1, 0] for places = 4
        l = []
        for i in range(places):
            b = int(val & 1 << i != 0)
            l.append(b)  # this append was missing, so the list came back empty
        return l

    def calc_data(self, bin_list_len:int = 4):
        row = 0
        self.input_matrix = []
        self.output_matrix = []

        for r in range(self.rows):
            i = r % self.num_items
            d = {}
            for j in range(self.size):
                d[j] = random()
            # sort by value so position i in the key list is the (i+1)th-lowest choice
            sd = dict(sorted(d.items(), key=lambda item: item[1]))
            best_choice = list(sd.keys())[i]
            bc_list = self.int_to_bin_list(best_choice, bin_list_len)
            id_list = self.int_to_bin_list(i, bin_list_len)
            input_d = {}
            output_d = {}
            for k in range(bin_list_len):  # was "i", which clobbered the outer index
                input_d["b{}".format(k)] = id_list[k]
                output_d["b{}".format(k)] = bc_list[k]
            # store the rows so to_csv() has something to write
            self.input_matrix.append(input_d)
            self.output_matrix.append(output_d)
            print("row {}: input_d = {}, output_d = {}".format(row, input_d, output_d))
            row += 1

    def to_csv(self, prefix:str, directory:str = None):
        if directory is None:
            directory = str(Path.home())
        df = pd.DataFrame(self.input_matrix)
        filename = "{}/{}_input.csv".format(directory, prefix)
        print("saving {}".format(filename))
        df.to_csv(filename, index=False)

        df = pd.DataFrame(self.output_matrix)
        filename = "{}/{}_output.csv".format(directory, prefix)
        print("saving {}".format(filename))
        df.to_csv(filename, index=False)

def main():
    fl = FindLowest(5, 10)
    fl.calc_data()
    fl.to_csv("nth_best")  # writes nth_best_input.csv / nth_best_output.csv to the home dir

if __name__ == "__main__":
    main()

GPT Agents

  • Start on paper? At least get the template up and copy stuff over from the other doc
  • 4:00 Meeting

Phil 12.20.22

Agreed to go on this podcast – should be interesting

Elixir: Train a Large Language Model on a Small GPU Cluster

  • In recent years, the number of parameters of one deep learning (DL) model has been growing much faster than the growth of GPU memory space. People who are inaccessible to a large number of GPUs resort to heterogeneous training systems for storing model parameters in CPU memory. Existing heterogeneous systems are based on parallelization plans in the scope of the whole model. They apply a consistent parallel training method for all the operators in the computation. Therefore, engineers need to pay a huge effort to incorporate a new type of model parallelism and patch its compatibility with other parallelisms. For example, Mixture-of-Experts (MoE) is still incompatible with ZeRO-3 in Deepspeed. Also, current systems face efficiency problems on small scale, since they are designed and tuned for large-scale training. In this paper, we propose Elixir, a new parallel heterogeneous training system, which is designed for efficiency and flexibility. Elixir utilizes memory resources and computing resources of both GPU and CPU. For flexibility, Elixir generates parallelization plans in the granularity of operators. Any new type of model parallelism can be incorporated by assigning a parallel pattern to the operator. For efficiency, Elixir implements a hierarchical distributed memory management scheme to accelerate inter-GPU communications and CPU-GPU data transmissions. As a result, Elixir can train a 30B OPT model on an A100 with 40GB CUDA memory, meanwhile reaching 84% efficiency of Pytorch GPU training. With its super-linear scalability, the training efficiency becomes the same as Pytorch GPU training on multiple GPUs. Also, large MoE models can be trained 5.3x faster than dense models of the same size. Now Elixir is integrated into ColossalAI and is available on its main branch.

I think the ChatGPT article should be on teaching critical thinking with large language models

  • On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
    • Generating a chain of thought (CoT) can increase large language model (LLM) performance on a wide range of tasks. Zero-shot CoT evaluations, however, have been conducted primarily on logical tasks (e.g. arithmetic, commonsense QA). In this paper, we perform a controlled evaluation of zero-shot CoT across two sensitive domains: harmful questions and stereotype benchmarks. We find that using zero-shot CoT reasoning in a prompt can significantly increase a model’s likelihood to produce undesirable output. Without future advances in alignment or explicit mitigation instructions, zero-shot CoT should be avoided on tasks where models can make inferences about marginalized groups or harmful topics.
  • ChatGPT Has Infiltrated Twitter Replies


  • Read Fair Use chapter from The Librarian’s Guide to Intellectual Property in the Digital Age. Done. It makes me think that I can redraw the images as sketches and should be ok.


  • Sprint planning, looks like make videos and work on JMOR paper
  • Submit paperwork for MORS membership