
Phil 9.8.2022

TOLKIEN’S ILLUSTRATORS: SERGEY YUKHIMOV

https://img0.liveinternet.ru/images/attach/c/0//51/804/51804945_37315702.jpg

SBIRs

  • Need to start the MORS slide deck
  • Need to start demo slides
  • 9:15 Standup
  • Sent a note to Erika Mackin – done
  • Tweaked MapDisplay1 so that it works again. Need to port it to the laptop
  • Looks like the presentation is next Friday at 1:00

GPT Agents

  • Got a good run of threads, but I need to verify a few things:
    • There is an imbalance. Is this because the run ended improperly? Looks like it all checks out. There are simply more threads with “ivermectin” based on the sample
    • The view is broken for threads. Need to fix or make a new one. Looks like the experiment_id was being set to -1. Fixed – I wasn’t passing the value in to run_thread_query() (see the sketch after this list)
    • Back up the DB – done
    • Specify the current experiment somewhere, and indicate if there are threads (in label) – done
  • Continue with EmbeddingExplorer
Imbalanced threads because people like to talk about ivermectin
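
A minimal illustration of the experiment_id bug, with a hypothetical signature for run_thread_query() (the real function and table names in the codebase may differ): a default of -1 quietly masks the missing argument at the call site.

# Hypothetical sketch of the experiment_id bug above. The real run_thread_query()
# signature and table names are different; this only shows the failure mode.
def run_thread_query(keyword: str, experiment_id: int = -1) -> str:
    # build a thread query scoped to one experiment
    return ("SELECT * FROM table_thread WHERE keyword = '{}' "
            "AND experiment_id = {}".format(keyword, experiment_id))

# Before the fix: the selected experiment was never passed in, so the default -1 was used
print(run_thread_query("ivermectin"))
# After the fix: the currently selected experiment is passed through explicitly
print(run_thread_query("ivermectin", experiment_id=2))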

Phil 9.7.2022

Yay! Got a prescription!

My guess is that Trump made a deal with the Saudis for nuclear information on Israel and Iran

Book

  • Need to respond to Brenda’s email and set up a meeting on Friday?

SBIRs

  • Really good conversation with Aaron about CWoC. The idea that lower parts of the hierarchy could simulate higher levels is very cool. It could even be a separate model for each layer, trained on what that part of the hierarchy can be aware of and the commands that it gets. That way, it could “interpolate” across times when communication fails (a rough sketch follows this list).
  • Need to set up a separate root document for research that has tasks broken down by people with a small introduction and then room for their documentation. Include a ToC. – done
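
A rough sketch of the per-layer idea above. Every name here is hypothetical, not anything in the actual CWoC code: each level keeps a small model trained on the commands it normally receives and falls back to that model’s prediction when the link to the level above drops.

# Hedged sketch: a hierarchy node that "interpolates" across communication failures
# by predicting the next command from its own learned model. The predictor here is
# just a callable placeholder -- in practice it would be a model trained per layer.
from typing import Callable, List, Optional

class HierarchyNode:
    def __init__(self, name: str, predictor: Callable[[List[str]], str]):
        self.name = name
        self.predictor = predictor      # per-layer model of "what commands do I usually get?"
        self.history: List[str] = []    # the commands this layer has actually received

    def step(self, command_from_above: Optional[str]) -> str:
        if command_from_above is None:
            # comms are down: simulate the level above from this layer's own history
            command_from_above = self.predictor(self.history)
        self.history.append(command_from_above)
        return command_from_above

# toy predictor: repeat the last command seen (a real one would be a trained model)
node = HierarchyNode("platoon", predictor=lambda hist: hist[-1] if hist else "hold")
print(node.step("advance"))   # normal command from above
print(node.step(None))        # link lost -> predicted "advance"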

GPT Agents

Phil 9.6.2022

Set up a monthly contribution to the UNHCR

Book

  • Adding a bit on beauty for diversity injection

GPT Agents

  • Start on the GPT and Embedding interfaces. Prompt the GPT with something like “Once upon a time there was” and set the number of times to run and the number of tokens. Split on sentences (r"\.|!|\?") and get the embeddings for each. Then cluster and extract topics (using EmbeddingExplorer pointing at a different db). Build maps! A rough sketch of this pipeline follows this list.
  • Continue fleshing out the Twitter embedding app
  • OK, what I really wound up doing was getting threading to work on TweetDownloader and fixing an interesting bug in the sampled-day method. When I wrote it, I assumed that the number of tweets per day is reasonably constant. Not true. So as a bit of a hack, I moved the endpoint of the query to include the entire day and used REPLACE INTO rather than INSERT. Much better results so far. Will work on the other stuff tomorrow.
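
A rough sketch of that prompt-split-embed-cluster pipeline, assuming the 2022-era openai Python client. The model names are placeholders and the k-means step stands in for whatever EmbeddingExplorer actually does; this is not the code in the app.

# Hedged sketch of the pipeline in the bullets above: prompt the model several times,
# split the generations into sentences, embed each sentence, then cluster the
# embeddings into rough topics. Requires openai.api_key to be set.
import re
from typing import List

import numpy as np
import openai
from sklearn.cluster import KMeans

def generate_sentences(prompt: str, runs: int, max_tokens: int) -> List[str]:
    # ask for `runs` completions, then split each one on sentence boundaries
    response = openai.Completion.create(
        model="text-davinci-002", prompt=prompt, n=runs, max_tokens=max_tokens)
    sentences: List[str] = []
    for choice in response["choices"]:
        sentences += [s.strip() for s in re.split(r"\.|!|\?", choice["text"]) if s.strip()]
    return sentences

def cluster_sentences(sentences: List[str], num_clusters: int = 5) -> List[int]:
    # one embedding per sentence, then a simple k-means grouping
    response = openai.Embedding.create(model="text-similarity-ada-001", input=sentences)
    mat = np.array([d["embedding"] for d in response["data"]])
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(mat).tolist()

if __name__ == "__main__":
    sents = generate_sentences("Once upon a time there was", runs=10, max_tokens=64)
    for label, sent in zip(cluster_sentences(sents), sents):
        print(label, sent)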

SBIRs

  • Need to read this carefully. I like the fact that it uses minGPT: Transformers are Sample Efficient World Models
    • Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at this https URL.
  • Delivered the quarterly report.
https://twitter.com/jerclifton/status/1565397169623797760

Book

  • Brenda has started a readthrough and will get back to me with comments
  • Need to add a reference to “beauty” in the diversity injection chapter and reference Transcendence

SBIRs

  • Finish first pass of quarterly report
  • MinGPT!!!!
https://twitter.com/hardmaru/status/1565569808548376576

GPT Agents

  • Added parens to the Twitter query. You can now do foo OR bar OR (Frodo AND Gandalf) OR (Sauron AND Saruman). A minimal query-builder sketch follows this list.
  • Thinking about combining the GPT and Embedding interfaces. Prompt the GPT with something like “Once upon a time there was” and set the number of times to run and the number of tokens. Split on sentences (r"\.|!|\?") and get the embeddings for each. Then cluster and extract topics (using EmbeddingExplorer pointing at a different db). Build maps!
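
A minimal sketch of what the parenthesized query assembly could look like. The helper below is hypothetical, not the actual KeywordExplorer code; note that the Twitter v2 search syntax treats a space between terms as an implicit AND, so an explicit “AND” in the UI string would need to be stripped before the API call.

# Hedged sketch of building a Twitter search query with ORed terms and ANDed groups.
from typing import List, Union

def build_query(terms: List[Union[str, List[str]]]) -> str:
    parts = []
    for t in terms:
        if isinstance(t, list):
            # keywords that must appear together get wrapped in parens
            parts.append("({})".format(" AND ".join(t)))
        else:
            parts.append(t)
    return " OR ".join(parts)

query = build_query(["foo", "bar", ["Frodo", "Gandalf"], ["Sauron", "Saruman"]])
print(query)                        # foo OR bar OR (Frodo AND Gandalf) OR (Sauron AND Saruman)
print(query.replace(" AND ", " "))  # space form accepted by the v2 search endpoint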

Phil 8.31.2022

A house made of bones in the middle of a desert

Heavenbanning is real in the world: We flooded our dating app with bots…to scam scammers

DALL·E Editor Guide

  • The DALL·E editor interface helps you edit images through inpainting and outpainting, giving you more control over your creative vision.

Book

  • I finished the new proposal last night, so now I need to send it off today – DONE!!!
  • Working on appt with Brenda

SBIRs

  • Tweak Ron’s stuff and add an intro
  • Tweak Rukan’s stuff and add an intro for interpolation and regression sections. Include an intro to Perceptrons and then Attention-MLPs.
  • Start the final pass through the doc

GPT Agents

  • Move “selected experiment” and “keyword” out of the tabs
  • Add a “Create Corpora” tab
  • General TODOs:
    • Implement threads, and make sure that the extended queries work
    • Implement calls to GPT embeddings and verify on small dataset. I could even try Aaron’s Wikipedia cats vs computers idea but use tweets
    • Need to treat AND like OR so that tweets containing multiple keywords work

Phil 8.30.2022

Back from more travels! It’s good to be home

SBIRs

  • Submit expenses
  • 9:00 Sprint planning – done
  • 2:00 MDA meeting
  • Quarterly report

GPT Agents

  • Working on EmbeddingExplorer
Progress for the last couple of days
  • 3:30 Meeting

Book

  • Finish proposal
  • Schedule Thursday meeting

Phil 8.26.2022

Umberto Eco: A Practical List for Identifying Fascists

Multi Dimensional and Domain Operations (MDDO)

  • Cebrowski referred to the traditional concept of combat power being measured by the movement of traditional physical forces through time and space in a physical domain. But to do this, military forces need information and control from the information domain. Winning a conflict happens in the intangible cognitive domain, in the mind of the individual war-fighter, with feelings of success or failure. Collectively, these individual minds make up the social domain: the shared societal awareness and understandings reflected in culture, values, attitudes, and beliefs.

SBIRs

  • Need to either pick up docs or get them printed – getting them printed
  • Quarterly report – Adjusting slots for Rukan’s work. Ron looks like he’s doing better

Book

  • Working on the Elsevier proposal. Finding reviewers

GPT Agents

  • Top2Vec is hanging on the import! Going to try it on another box and see if that will still happen. If it does, then I’m going to use OpenAI’s embeddings until it’s fixed
  • Started!

Phil 8.25.2022

SBIRs

  • Was at the Tech Summit for the last two days. Good to see people again!
    • Pinged Jennifer about Elicit
  • Trip
    • Tix! – done
    • Hotel! – done
    • Car! – done
    • Slides! – done, but not printed
  • Continue on Quarterly report
    • 9:00 Meeting with Ron

Book

  • Respond to Katy’s letter – figuring out who would be good to send this to for review
    • Ping Brenda. I think we’ll need to meet next week – done
    • Wound up writing a short Python program to scan through my book to find what’s cited most. Mostly based on this. Handy!
import re
from tkinter import filedialog
from typing import Dict, List

import PyPDF2

# scan a PDF of the book and count how often each bracketed citation (e.g. "[42]") appears
filename = filedialog.askopenfilename(filetypes=(("pdf files", "*.pdf"),), title="Load pdf File")
if filename:
    print("opening {}".format(filename))
    # open the pdf file
    reader = PyPDF2.PdfFileReader(filename)
    counts: Dict[str, int] = {}

    # get the number of pages
    num_pages = reader.getNumPages()
    print("There are {} pages".format(num_pages))
    # extract the text of each page and tally citation markers like "[12]"
    for i in range(num_pages):
        page_obj = reader.getPage(i)
        text = page_obj.extractText()
        reml: List[str] = re.findall(r"\[\d+\]", text)
        for r in reml:
            counts[r] = counts.get(r, 0) + 1

    # print the citations, most frequently cited first
    ds = dict(sorted(counts.items(), key=lambda x: x[1], reverse=True))
    for k, v in ds.items():
        print("{} = {}".format(k, v))

GPT-Agents

  • Is today the day to try Top2Vec? Sure hope so!
  • Started poking around. It’s hanging on the import. That is really odd

Phil 8.22.2022

Book

  • I seem to have an encouraging nibble from Elsevier, via a side door from another author’s acquisition editor. We will see how that goes
  • Working on grammar
  • Setting up a meeting with my new copy editor. Thursday?

SBIRs

  • Watch Rukan’s talk
  • More quarterly report

GPT Agents

  • Bail on this week’s meeting – done
  • Still need to poke at Top2Vec

Phil 8.19.2022

Book

  • Moved the definitions to the front and tweaked a bit. I also discovered that MSWord will open a PDF and then you can use the grammar checking feature, which is nice
  • Send proposal to NYU. Looks pretty straightforward – done!
  • Assemble the annotated ToC + Book and send it to Brenda – done!
  • Set up a meeting to see Rukan walk through his slides

SBIRs

  • Might go to Huntsville on the 29th?
  • Quarterly report

GPT Agents

Phil 8.18.2022

OPT model for online use: opt.alpa.ai

Book

  • Got a decline from Princeton
  • Sent off the last proposal to University of Toronto. Next are the for-profit presses
  • Got a nice decline from the University of Toronto, that suggested I look at NYU press
  • Sent an example proposal to Jimmy, and got a very encouraging response back!
    • I took a look through Dr. Feldman’s proposal. It’s fascinating stuff; very appropriate and timely with regard to how people get their messaging from social media.  I’ve forwarded it to one of my colleagues who signs in the social sciences area to see if it might be a good fit for her portfolio. If she thinks it is, I’d be happy to make introductions.
    • With regard to the proposal itself, I don’t really have any comments. It’s well done and compelling. I just have one comment about the sample chapter, unrelated to the proposal as a whole.  Dr. Feldman includes several 3rd party figures that are not properly attributed, and with therefore questionable permissions.  For instance, Figure 2: Ruby Bridges with U.S. Marshals.  The caption should include information about where the image came from and should indicate that permission was given to use it.  The same goes for figures 1 and 4.  Perhaps he’s planning to sort that all out once he has a contract but authors do need to be very careful about any third party content they include, even in a sample chapter.
  • Getting in contact with a copy editor via school – got the email and sent an intro. She’s $20/hr. I need to send a sample

SBIRs

  • Continue to push MORS uphill – Submitted!
  • More markdown documentation for Chirp – Done! Also updated PyPI – Done!
  • 9:15 standup
  • 11:30 CSC
  • Quarterly report

GPT Agents

  • Install and play with Top2Vec?

Phil 8.17.2022

GPT-Agents

  • Continue Chirp submission
  • See if Top2Vec works, and if it can tell the difference between ivermectin and paxlovid posts (a hedged sketch follows this list)
  • IUI 2023
  • I have a fun idea for a paper. Use a mad-libs approach to mindalle prompt generation and see how well the system(s) perform as the prompts go from normal to borderline. We could use machine image description to validate.
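
A minimal sketch of that ivermectin-vs-paxlovid test, assuming the tweets for both keywords are already available as lists of strings. load_tweets() is a placeholder for the database pull; the Top2Vec calls are the library’s documented ones.

# Hedged sketch: feed both keyword sets to Top2Vec and eyeball whether the
# discovered topics separate the two drugs' vocabularies.
from top2vec import Top2Vec

def load_tweets(keyword: str) -> list:
    # placeholder: pull the tweet texts for this keyword out of the DB
    raise NotImplementedError

docs = load_tweets("ivermectin") + load_tweets("paxlovid")
model = Top2Vec(docs, speed="learn", workers=4)

print("topics found:", model.get_num_topics())
topic_words, word_scores, topic_nums = model.get_topics()
for words, num in zip(topic_words, topic_nums):
    # if the model is doing its job, some topics should be dominated by one drug's vocabulary
    print(num, words[:10])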

SBIRs

  • Quarterly report
  • Chirp
  • MORS
  • RCSNN/3D graphics

Book

  • University of Toronto Press? Looked at this last night, but I had just come back from the dentist and didn’t have the motivation. It’s a letter, so it should be straightforward. Then I think the National Academies Press is pretty vague, and may just be a letter too. Then it’s time to poke at the for-profit academic presses
  • Send Jimmy an example proposal that he can pass on for a sanity check with his editor.

Phil 8.16.2022

You can really tell that the days are getting shorter

Efficient Training of Language Models to Fill in the Middle (This is basically the reverse GPT concept)

  • We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset, which simply moves a span of text from the middle of a document to its end. While this data augmentation has garnered much interest in recent years, we provide extensive evidence that training models with a large fraction of data transformed in this way does not harm the original left-to-right generative capability, as measured by perplexity and sampling evaluations across a wide range of scales. Given the usefulness, simplicity, and efficiency of training models to fill-in-the-middle (FIM), we suggest that future autoregressive language models be trained with FIM by default. To this end, we run a series of ablations on key hyperparameters, such as the data transformation frequency, the structure of the transformation, and the method of selecting the infill span. We use these ablations to prescribe strong default settings and best practices to train FIM models. We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research.
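
The data transformation the abstract describes is simple enough to sketch: cut a span out of the middle of a document and move it to the end, with sentinel strings marking the pieces. The sentinel strings below are placeholders, not the special tokens used in the released models.

# Hedged sketch of the fill-in-the-middle (FIM) transformation: move a middle span
# to the end of the document so the model learns to generate it last.
import random

def fim_transform(doc: str, rng: random.Random) -> str:
    # pick two cut points that split the document into prefix / middle / suffix
    i, j = sorted(rng.sample(range(len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # prefix-suffix-middle ordering, marked with placeholder sentinels
    return "<PRE>" + prefix + "<SUF>" + suffix + "<MID>" + middle

rng = random.Random(0)
print(fim_transform("The quick brown fox jumps over the lazy dog.", rng))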

Patching open-vocabulary models by interpolating weights

  • Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
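
The core of the patching idea above is just a weighted average of the zero-shot and fine-tuned weights. A minimal sketch, with numpy arrays standing in for the actual CLIP state dicts:

# Hedged sketch of weight-space patching as described in the abstract: interpolate
# between the pre-fine-tuning and post-fine-tuning weights with a mixing coefficient
# alpha chosen on held-out data.
from typing import Dict
import numpy as np

def patch_weights(zero_shot: Dict[str, np.ndarray],
                  fine_tuned: Dict[str, np.ndarray],
                  alpha: float) -> Dict[str, np.ndarray]:
    # alpha = 0 keeps the zero-shot model, alpha = 1 keeps the fine-tuned one
    return {k: (1.0 - alpha) * zero_shot[k] + alpha * fine_tuned[k] for k in zero_shot}

theta_zs = {"layer.weight": np.zeros((2, 2))}
theta_ft = {"layer.weight": np.ones((2, 2))}
print(patch_weights(theta_zs, theta_ft, alpha=0.5)["layer.weight"])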

Alex Jones and the Lie Economy

  • Discerning audiences who stumble on Jones’ show turn him off, but his message excites the credulous who, if they don’t fully subscribe to the man’s views, want to hear more of the same. Lies are almost always more exciting and exploitable than dull truths. Having culled the impressionable from the doubting and boosted their pulse rate, he turns them over to his merchandising wing where he sells survivalist gear and health supplements like Brain Force Ultra, Winter Sun Plus Vitamin D and a variety of “Superblue Silver” products (immune gargle, toothpaste and wound dressing) that Jones claimed could mitigate Covid. It’s not incidental that the products he hawks are presented as the fix for coming apocalyptic perils predicted on his shows. Citing court filings submitted by Jones’ attorneys in discovery, HuffPost reports that InfoWars collected $165 million in sales of these products from September 2015 to the end of 2018.

GPT-Agents

  • Continue Chirp submission
  • See if Top2Vec works, and if it can tell the difference between ivermectin and paxlovid posts
  • 3:30 Meeting
  • IUI 2023

SBIRs

  • 8:30 SEG staffing changes
  • 9:00 Sprint planning
    • Chirp
    • MORS
    • Quarterly Report
    • RCSNN/3D graphics

Book

  • University of Toronto press?
  • Started on the Strategy and Tactics in Online Conflict proposal

Phil 8.15.2022

Book

  • Rejection from Columbia
  • Looked at how to hire a copy editor a bit. Found this and this
  • Need to continue submissions, and then start followups.
  • Submitted to McGill-Queen’s. It’s a Canadian school, and I used the deep bias chapter, which has the indigenous school fiasco

GPT Agents

  • Working on Chirp submission – finished the (a?) video and edited it down to three minutes. If I have more time I’ll redo it
  • Tweaked the KeywordExplorer UI a bit

SBIRs

  • Working on quarterly report

Phil 8.12.2022

Baseball tix!

Social Simulacra: Creating Populated Prototypes for Social Computing Systems

  • Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer’s description of a community’s design — goal, rules, and member personas — and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of “what if?” scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models’ training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.

SBIRs

  • Submit an abstract by 19 August for the opportunity to participate in MORS’ one-of-a-kind event held at the new IDA Center from 27-29 September! With high-level speakers including Dr. Baruch Fischhoff, Dr. Kristen Kulinowski, Dr. Michael Ford and Dr. Ryan Barrett, the Emerging Techniques Forum (ETF) is one you will not want to miss this year. All abstracts must be submitted in an unclassified format and must be 1,500 characters (including spaces) or fewer, without images or videos. If you are submitting an abstract for the classified session, indicate the classification level at the time of submission.
    • Mostly done. Need some additional paperwork filled out. Sent that off as well
  • Prep slides for Sprint review and finish off tasks – done

GPT Agents

  • Deploy updated versions for Chirp
  • Test and validate balanced pull
  • Run balanced and proportional 10,000 tweet pulls for ivermectin and paxlovid (balanced vs. proportional allocation is sketched after this list)
  • Try running Top2Vec on tweets to see what the topic spaces look like
  • Try to get some threads in those two spaces and use those to show trajectories through topics
  • If there are enough intersecting trajectories, then create narrative embedding space
  • Had a good talk with Aaron yesterday about his discord group and how that could be a nice source of maps.
  • Submit KeywordExplorer to Chirp Developer Challenge by Aug 19
    • Content discovery apps
    • Include an App built with the required developer tools and meets the above Project Requirements.
    • Include a text description that should explain the features and functionality of your App.
    • Include a description of which category you are submitting to.
    • Include a link to a fully deployed app. 
    • Provide Twitter handle associated with the developer account.
    • Include a demonstration video of your App. The video portion of the submission:
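
A minimal sketch of the difference between a “balanced” and a “proportional” pull. The daily volumes below are made up; in the app they would come from Twitter’s counts endpoint, and the real allocation code may differ.

# Hedged sketch: split a total tweet budget across keywords either equally
# (balanced) or in proportion to each keyword's share of the posted volume.
from typing import Dict

def allocate(counts: Dict[str, int], total: int, balanced: bool) -> Dict[str, int]:
    if balanced:
        # same number of tweets per keyword, regardless of how much was posted
        return {k: total // len(counts) for k in counts}
    # otherwise sample in proportion to each keyword's share of the total volume
    grand = sum(counts.values())
    return {k: round(total * c / grand) for k, c in counts.items()}

daily = {"ivermectin": 80_000, "paxlovid": 20_000}   # hypothetical daily volumes
print(allocate(daily, 10_000, balanced=True))    # {'ivermectin': 5000, 'paxlovid': 5000}
print(allocate(daily, 10_000, balanced=False))   # {'ivermectin': 8000, 'paxlovid': 2000}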

Book

  • Submitted to Columbia University Press