
Phil 7.20.2023

GPT Agents

  • Looking for a straightforward way to build a webapp that has a simple front end (submit email page, then page protected by a GUID that has the IRB statement and the experiment(s)). The last time I did this was in 2015 with Angular. The code still works, amazingly enough, so I could just try reusing that. It looks like I have all the books still, so maybe that’s not the worst answer. A rough sketch of the flow is after this list.
  • Got ahold of Zach, and we’ll put together something more modern using Supabase and solid.js. This should be fun!
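
Roughly the flow I have in mind, sketched in Python/Flask just to pin down the pieces – the real version will be Supabase + solid.js, and send_mail() here is a made-up placeholder:

# Rough sketch of the email -> GUID-protected page flow. Flask and the
# send_mail() helper are stand-ins; the real thing would be Supabase/solid.js.
import uuid
from flask import Flask, request, abort

app = Flask(__name__)
tokens = {}  # guid -> email; a real version would persist this in the db

@app.route("/signup", methods=["POST"])
def signup():
    email = request.form["email"]
    guid = str(uuid.uuid4())
    tokens[guid] = email
    send_mail(email, f"https://example.org/experiment/{guid}")  # hypothetical mailer
    return "Check your email for the experiment link."

@app.route("/experiment/<guid>")
def experiment(guid):
    if guid not in tokens:
        abort(403)  # unknown or expired link
    return "IRB statement, then the experiment(s), goes here."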

SBIRs

  • 9:00 standup
  • 11:30 touchpoint
  • More paper. Good, albeit halting progress. I should be able to finish the analysis section for vignette 1 tomorrow

Phil 7.19.2023

How is ChatGPT’s behavior changing over time?

  • GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous questions, 3) generating code, and 4) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the same LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLM quality.
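
The kind of longitudinal check they describe is easy to rough out. A minimal sketch, assuming the pinned snapshots gpt-4-0314 and gpt-4-0613 and the pre-1.0 openai Python client – the prompt wording and test numbers are mine, not the paper’s:

# Sketch: compare two pinned GPT-4 snapshots on prime identification.
# Accuracy is exact-match against a locally computed ground truth.
import openai
from sympy import isprime

def ask(model, n):
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Is {n} a prime number? Answer Yes or No only."}],
        temperature=0)
    return resp["choices"][0]["message"]["content"].strip().lower()

def accuracy(model, numbers):
    correct = sum(ask(model, n).startswith("yes") == isprime(n) for n in numbers)
    return correct / len(numbers)

numbers = [101, 221, 437, 1013, 7919]  # tiny illustrative test set
for model in ["gpt-4-0314", "gpt-4-0613"]:
    print(model, accuracy(model, numbers))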
https://huggingface.co/chat/

Tag up with Adti to discuss UMBC HCC PhD program

SBIRs

  • Went up to NJ yesterday for a meeting and Identrust stuff. Got everything done!
  • Also had a good discussion with Aaron about the Scale paper and how to tie it into a wargame demo. Found this Wikipedia entry and this pdf as well.
  • Need to prep for ML capabilities meeting
  • USNA Intern prep

GPT Agents

  • 4:00 UMBC meeting

Phil 7.17.2023

Tasks

  • Check with Rheena on guarantor stuff
  • Did what I could about car rental

SBIRs

  • Demo slides
  • Weekly meeting. Check overleaf before
  • Work on Scale paper

GPT Agents

  • Sent out experiment email
  • Ping Zach about test website?

Phil 7.14.2023

This is wild:

Severe Depressive Symptoms Exacerbate the Relationship Between Conspiracy Beliefs and Voting for Election Doubters

  • Two of the most significant concerns about the contemporary United States are the erosion of democratic institutions and the increase in rates of depression. The researchers provide evidence linking these phenomena. They use a survey (N=11,517) to show a relationship between COVID-19 conspiracy beliefs and the endorsement of the 2020 election fraud claim as well as voting, in 2022, for gubernatorial candidates who cast doubt on the 2020 election results. The authors further predict and find that the presence of severe depressive symptoms exacerbates these relationships. An increase in depression among COVID-19 conspiracy believers is positively associated with voters casting their ballots for candidates who question the foundation of democratic legitimacy. The results highlight how interventions to address mental health can improve the country’s political health.

SBIRs

  • JSC meeting at 10:00

GPT Agents

  • Write up experiment email
  • Add human-readable text to sources
create or replace view parsed_text_view as
    select t.id, t.source, s.text_name, t.parsed_text
    from table_parsed_text t
        inner join table_source s on t.source = s.id;
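
And a quick sketch of reading it back from Python – psycopg2 here, with placeholder connection parameters, and “quran” just as an example text_name:

# Sketch: pull parsed text plus its human-readable source name from the view.
# Connection details are placeholders for whatever the real db uses.
import psycopg2

conn = psycopg2.connect(dbname="gpt_agents", user="postgres", password="...", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        "select id, text_name, parsed_text from parsed_text_view where text_name = %s",
        ("quran",))
    for row_id, text_name, parsed_text in cur.fetchall():
        print(row_id, text_name, parsed_text[:80])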

Phil 7.13.2023

Vacation is over. Back to the mines.

SBIRs

  • 9:15 standup
  • Need to look at what I have as deliverables – MDA only
  • Start writing abstract for Emerging Techniques forum – got permission
  • Reply to Chris K – done
  • Identrust? Yup. Form is done and now needs to be notarized
  • MDA subject meeting

GPT Agents

  • Write up resume experiment thoughts
  • Need to get back to mapmaking

Phil 7.2.2023

On Hate Scaling Laws For Data-Swamps

  • “Scale the model, scale the data, scale the GPU-farms” is the reigning sentiment in the world of generative AI today. While model scaling has been extensively studied, data scaling and its downstream impacts remain underexplored. This is especially of critical importance in the context of visio-linguistic datasets whose main source is the World Wide Web, condensed and packaged as the CommonCrawl dump. This large scale data-dump, which is known to have numerous drawbacks, is repeatedly mined and serves as the data-motherlode for large generative models. In this paper, we: 1) investigate the effect of scaling datasets on hateful content through a comparative audit of the LAION-400M and LAION-2B-en, containing 400 million and 2 billion samples respectively, and 2) evaluate the downstream impact of scale on visio-linguistic models trained on these dataset variants by measuring racial bias of the models trained on them using the Chicago Face Dataset (CFD) as a probe. Our results show that 1) the presence of hateful content in datasets, when measured with a Hate Content Rate (HCR) metric on the inferences of the Pysentimiento hate-detection Natural Language Processing (NLP) model, increased by nearly 12% and 2) societal biases and negative stereotypes were also exacerbated with scale on the models we evaluated. As scale increased, the tendency of the model to associate images of human faces with the “human being” class over 7 other offensive classes reduced by half. Furthermore, for the Black female category, the tendency of the model to associate their faces with the “criminal” class doubled, while quintupling for Black male faces. We present a qualitative and historical analysis of the model audit results, reflect on our findings and their implications for dataset curation practice, and close with a summary of our findings and potential future work to be done in this area.
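
Their Hate Content Rate measurement is roughly this shape – a minimal sketch using the pysentimiento hate-speech analyzer the paper names; the captions list is a stand-in for the LAION alt-text:

# Sketch: estimate a Hate Content Rate (HCR) over a set of captions using
# pysentimiento's hate-speech model, roughly as the audit describes.
from pysentimiento import create_analyzer

analyzer = create_analyzer(task="hate_speech", lang="en")

def hate_content_rate(captions):
    # the hate_speech analyzer returns a (possibly empty) list of labels, e.g. 'hateful'
    flagged = sum(1 for c in captions if "hateful" in analyzer.predict(c).output)
    return flagged / len(captions)

captions = ["a photo of a dog on a beach", "an example alt-text string"]  # stand-ins
print(f"HCR: {hate_content_rate(captions):.3f}")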

Phil 6.29.2023

Welcome to the future (From the Washington Post)

Textbooks Are All You Need

  • We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of “textbook quality” data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.
  • This makes me think that smaller models trained on better data, combined with context prompting, might be a good approach for trustworthy agents. In addition to the data used for the response text, you could also provide style text in the prompt. Possibly few-shot prompting? I could try that with davinci.
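
A minimal sketch of what that might look like with the completions API and text-davinci-003 – the style text and the few-shot pair are placeholders, just to show the prompt shape:

# Sketch: few-shot prompt with a "style" context block, sent to a davinci
# completions model. Style text and the example Q/A pair are placeholders.
import openai

style_text = "Write in short, plain, declarative sentences."
few_shot = (
    "Q: Summarize the Gettysburg Address in one sentence.\n"
    "A: Lincoln honors the dead by rededicating the nation to liberty and equality.\n\n"
)

prompt = f"{style_text}\n\n{few_shot}Q: Summarize the First Amendment in one sentence.\nA:"

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=100,
    temperature=0.7)
print(resp["choices"][0]["text"].strip())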

Phil 6.23.2023

Chores today, pack tomorrow. I am so burned out.

I added the Quran to the db last night and was having a little trouble downloading it from svn, but it seems to be working now.

SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking

  • In many domains, autoregressive models can attain high likelihood on the task of predicting the next observation. However, this maximum-likelihood (MLE) objective does not necessarily match a downstream use-case of autoregressively generating high-quality sequences. The MLE objective weights sequences proportionally to their frequency under the data distribution, with no guidance for the model’s behaviour out of distribution (OOD): leading to compounding error during autoregressive generation. In order to address this compounding error problem, we formulate sequence generation as an imitation learning (IL) problem. This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset, including divergences with weight on OOD generated sequences. The IL framework also allows us to incorporate backtracking by introducing a backspace action into the generation process. This further mitigates the compounding error problem by allowing the model to revert a sampled token if it takes the sequence OOD. Our resulting method, SequenceMatch, can be implemented without adversarial training or major architectural changes. We identify the SequenceMatch-χ2 divergence as a more suitable training objective for autoregressive models which are used for generation. We show that empirically, SequenceMatch training leads to improvements over MLE on text generation with language models.
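
As I read it, the backspace action amounts to something like this at decode time – purely my own toy illustration, not the paper’s code:

# Toy illustration of a <bksp> action during autoregressive decoding: if the
# model samples the backspace token, the last emitted token is removed instead
# of the sequence growing, letting it back out of an OOD prefix.
BACKSPACE = "<bksp>"

def decode(sample_next_token, max_steps=50):
    seq = []
    for _ in range(max_steps):
        tok = sample_next_token(seq)  # model conditioned on the current prefix
        if tok == "<eos>":
            break
        if tok == BACKSPACE:
            if seq:
                seq.pop()  # revert the last token
        else:
            seq.append(tok)
    return seq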

GPT Agents

  • I tried having one of the LLMs describe my research, which it missed completely. I’m going to try to use my CV as context and see if that works as well. If it does, then I can use the faculty at UMBC to evaluate themselves, which should be kind of fun.
  • Works quite well, though the model sometimes can’t figure out the publications – need to work on that. The context prompts are spot on, while the no-context prompts are wildly hallucinatory.
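
The comparison itself is basically this pattern, with the CV pasted in as a system message – model name, file path, and wording here are placeholders:

# Sketch: ask the same question with and without a CV as context, to compare
# grounded answers against hallucinated ones.
import openai

cv_text = open("feldman_cv.txt").read()  # placeholder path
question = "Describe Phil Feldman's research and list a few representative publications."

def ask(context=None):
    messages = []
    if context:
        messages.append({"role": "system",
                         "content": f"Answer using only this CV:\n{context}"})
    messages.append({"role": "user", "content": question})
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages, temperature=0)
    return resp["choices"][0]["message"]["content"]

print("WITH CONTEXT:\n", ask(cv_text))
print("\nNO CONTEXT:\n", ask())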

SBIRs

  • Status report (again)
  • JSC meeting
  • More story

Phil 6.22.2023

Via Twitter

Trip

  • Cancel CA hotels – done
  • Get Astoria hotel – done
  • Get Seattle airport hotel with shuttle service – done
  • Tell Sande we’re getting home a day early

SBIRs

  • 9:00 standup
  • See what it takes to run JavaUtils in VS Code – got everything working. You need the Java extensions and to point to the jar files (settings snippet after this list)
  • More reading, maybe start writing. Good start! Borrowing Nema
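
For reference, the jar-pointing part is the referencedLibraries setting in .vscode/settings.json (the paths below are placeholders):

{
    "java.project.referencedLibraries": [
        "lib/**/*.jar",
        "path/to/JavaUtils.jar"
    ]
}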

Phil 6.20.2023

TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI

  • While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful — particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.

SBIRs

  • 9:00 Sprint demos. Need to make slides – done
  • 10:30 Overleaf meeting – nope
  • 1:00 Q4/Q5 presentation – done, but I need to do it again
  • 2:00 Sprint planning – done
  • Working on the scale paper

Phil 6.17.2023

Back from New York! Seriously, West Point is Hogwarts:

Enabling delightful user experiences via predictive models of human attention

  • In this blog, we present two papers (one from CVPR 2022, and one just accepted to CVPR 2023) that highlight our recent research in the area of human attention modeling: “Deep Saliency Prior for Reducing Visual Distraction” and “Learning from Unique Perspectives: User-aware Saliency Modeling”, together with recent research on saliency driven progressive loading for image compression (1, 2). We showcase how predictive models of human attention can enable delightful user experiences such as image editing to minimize visual clutter, distraction or artifacts, image compression for faster loading of webpages or apps, and guiding ML models towards more intuitive human-like interpretation and model performance. We focus on image editing and image compression, and discuss recent advances in modeling in the context of these applications.

Phil 6.15.2023

We’re excited to introduce the first AI model based on a key component of LeCun’s vision. This model, the Image Joint Embedding Predictive Architecture (I-JEPA), learns by creating an internal model of the outside world, which compares abstract representations of images (rather than comparing the pixels themselves). I-JEPA delivers strong performance on multiple computer vision tasks, and it’s much more computationally efficient than other widely used computer vision models. The representations learned by I-JEPA can also be used for many different applications without needing extensive fine tuning. For example, we train a 632M parameter visual transformer model using 16 A100 GPUs in under 72 hours, and it achieves state-of-the-art performance for low-shot classification on ImageNet, with only 12 labeled examples per class. Other methods typically take two to 10 times more GPU-hours and achieve worse error rates when trained with the same amount of data.