Phil 7.1.2022

The Tour de France starts today!

I heard about Futureshape this morning. Might ping them about Belief Maps and TACJ

Beyond neural scaling laws: beating power law scaling via data pruning

  • Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how both in theory and practice we can break beyond power law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.


  • International driver’s license – done
  • Delta and Iberia apps – done
  • Synchronize laptop – done
  • International calling plan – done
  • Financial notifications (Visa and BoA?) – done


  • Write cover letter
  • Write audience section

Phil 6.30.2022

Get Iberia App!

Penrose is a platform that enables people to create beautiful diagrams just by typing mathematical notation in plain text. The goal is to make it easy for non-experts to create and explore high-quality diagrams and provide deeper insight into challenging technical concepts. We aim to democratize the process of creating visual intuition.


  • Created a “cover_letter” folder and added the template
  • Updated my CV in LaTeX and online


  • Continue with RCSNN paper. Finish today? Done! Nine pages and just under 3,000 words
  • 9:15 standup
  • Rukan is running on the Lambda box! He is using VSCode locally and connecting to the remote box via ssh.
  • Revisit sharpened cosine similarity as an attention mechanism?

Phil 6.29.2022

Fact-checking movement grapples with a world awash in false claims

  • The torrent of false information, such as the election-fraud claims that led to the assault on the U.S. Capitol, Russian disinformation about the invasion of Ukraine and pseudoscientific assertions about the coronavirus pandemic, has emerged despite the astonishing growth of the fact-checking movement. In 2021, there were 391 active fact-checking projects, according to an annual census by the Duke Reporters’ Lab, up from 168 in 2016.


  • Add a “Clamped” button so that each keyword is limited to max number of pulls but not balanced beyond that
  • Add a “Percent” button so that a certain percentage of tweets are gathered with no max but a minimum of 10
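A minimal sketch of how the two modes might compute per-keyword targets, assuming the per-keyword tweet counts are already known. The function names, the dictionary layout, and the sample counts are hypothetical and not taken from the app:

```python
import math


def clamped_targets(counts: dict[str, int], max_pulls: int) -> dict[str, int]:
    """'Clamped' mode: cap each keyword at max_pulls, with no balancing beyond that."""
    return {kw: min(n, max_pulls) for kw, n in counts.items()}


def percent_targets(counts: dict[str, int], percent: float, minimum: int = 10) -> dict[str, int]:
    """'Percent' mode: take a percentage of each keyword's tweets, no max, at least `minimum`."""
    return {kw: max(minimum, math.ceil(n * percent / 100.0)) for kw, n in counts.items()}


# Made-up counts, for illustration only
counts = {"chinavirus": 1200, "china virus": 300, "covid": 40}
print(clamped_targets(counts, 500))
print(percent_targets(counts, 10))
```

Note that in this sketch the minimum of 10 is a target, so a keyword with fewer than 10 tweets available would still come up short.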


  • Grinding through the RCSNN paper. I’m beginning to think that this will be harder than the “golf ball interceptor”
  • We’re still having problems with IT getting the server up

Phil 6.28.2022

Dentist at 9:40

IPhone at 5:30


  • Lots of interaction with Steve
  • Working on lists of comparables for RCSNN


  • Updated the repo
  • Ordered Association of University Presses Directory 2021
  • I’m starting to think about a second book about how egalitarian systems can actively disrupt authoritarian systems in technology-mediated contexts using the concepts of belief space.
    • This is tangentially related. It is about attacking hackers: Lamboozling Attackers: A New Generation of Deception
      • Imagine a world in which developers and operators of systems exploit attackers as much as attackers exploit defenders. By leveraging system-design knowledge and modern computing to deploy deception environments, software engineering teams can successfully bamboozle attackers for fun and profit while deepening systems resilience.

GPT Agents

  • Everything looks like it’s working when I peruse the DB. Need to do some counts of the output
  • 3:30 Meeting tonight

Phil 6.27.2022

Went to see some minor league ball yesterday and stayed in Rehoboth, so I got to work late

Go Shorebirds!



  • I think I’ve got balanced downloading working. Will test tomorrow


  • Weekly 2:00 meeting. The server is almost up!
  • 1:00 meeting with Steve explaining transformers

Phil 6.24.2022

Check MVA – done!

Change tire

Set up a longer ride for tomorrow – done!

Combining interventions to reduce the spread of viral misinformation

  • Misinformation online poses a range of threats, from subverting democratic processes to undermining public health measures. Proposed solutions range from encouraging more selective sharing by individuals to removing false content and accounts that create or promote it. Here we provide a framework to evaluate interventions aimed at reducing viral misinformation online both in isolation and when used in combination. We begin by deriving a generative model of viral misinformation spread, inspired by research on infectious disease. By applying this model to a large corpus (10.5 million tweets) of misinformation events that occurred during the 2020 US election, we reveal that commonly proposed interventions are unlikely to be effective in isolation. However, our framework demonstrates that a combined approach can achieve a substantial reduction in the prevalence of misinformation. Our results highlight a practical path forward as misinformation online continues to threaten vaccination efforts, equity and democratic processes around the globe.



  • 3:00 Meeting with Nabeel
  • Add pulls of more than 500. Hopefully it’s just uncommenting and verifying?


  • Start reading file documentation – done with HDFProc
  • Asked for the exe and a test file

Phil 6.23.2022

Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.

A portrait photo of a kangaroo wearing an orange hoodie and blue sunglasses standing on the grass in front of the Sydney Opera House holding a sign on the chest that says Welcome Friends!

It is interesting that the size of the model does not matter as much for abstract concepts


Pinged Nabeel. If that goes well, maybe we can set up a tiny working group? Maybe add Lynnette? Others? We’ll see


  • I’ve kind of run out of things to do as far as direct content, so I’m playing around with titles using GPT-3. I have the [insert] prompt between Title: and a brief description from the proposal. The results are pretty interesting:
    • Title: The Coordination of Nature
    • Author: John Clippinger
    • Publisher: MIT Press
    • Genre: Technology, Human Nature, Coordination, Communication, Law
  • I then added Belief is a Place: to the prompt and got these:
    • Title: Belief is a Place: The Evolution of Human Nature and the Emergence of Communication Technologies
    • Author: Dr. Robert E. Markley
    • Publisher: Cambridge University Press (2021)
    • Title: Belief is a Place: Human Nature and the Evolution of Communication
    • Author: Andrew Odlyzko
    • University of Wisconsin Press (Here’s the Author portal)
    • Title: Belief is a Place: How Human Nature and Technology Shapes our Lives
    • Author: William H. Calvin (look up publishers!)
  • This is interesting. It thinks that it is a book that exists already! I have to say, it looks interesting. Ordered…
  • Also, I just discovered Gaia Vince, who has written a book called Transcendence: How Humans Evolved through Fire, Language, Beauty, and Time
    • Humans are a planet-altering force. Gaia Vince argues that our unique ability – compared with other species – to determine the course of our own destiny rests on a special relationship between our genes, environment and culture going back into deep time. It is our collective culture, rather than our individual intelligence, that makes humans unique. Vince shows how four evolutionary drivers – Fire, Language, Beauty and Time – are further transforming our species into a transcendent superorganism: a hyper-cooperative mass of humanity that she calls Homo omnis. Drawing on leading-edge advances in population genetics, archaeology, palaeontology and neuroscience, Transcendence compels us to reimagine ourselves, showing us to be on the brink of something grander – and potentially more destructive.


Phil 6.22.2022

I think this is one of those interesting posts about how AI is a tool like other tools. It’s a valid point, but I’m not so sure. In my creative experience there is an initial creative part and then an extensive editing part. Generating that initial content is hard in cases like writing, graphic arts, choreography, etc. It’s not as hard when working with found objects (like photography or this piece by Marcel Duchamp). AI models like DALL-E and GPT-3 change this balance and make the initial creation more like working from found objects that are latent in the models.
Look! Trajectories!

Online Coordination: Methods and Comparative Case Studies of Coordinated Groups across Four Events in the United States

  • Coordinated groups of user accounts working together in online social media can be used to manipulate the online discourse and thus is an important area of study. In this study, we work towards a general theory of coordination. There are many ways to coordinate groups online: semantic, social, referral and many more. Each represents a coordination dimension, where the more dimensions of coordination are present for one event, the stronger the coordination present. We build on existing approaches that detect coordinated groups by identifying high levels of synchronized actions within a specified time window. A key concern with this approach is the selection of the time window. We propose a method that selects the optimal window size to accurately capture local coordination while avoiding the capture of coincidental synchronicity. With this enhanced method of coordination detection, we perform a comparative study across four events: US Elections Primaries 2020, Reopen America 2020, Capitol Riots 2021 and COVID Vaccine Release 2021. Herein, we explore the following three dimensions of coordination for each event — semantic, referral and social coordination — and perform group and user analysis within and among the events. This allows us to expose different user coordination behavior patterns and identify narratives and user support themes, hence estimating the degree and theme of coordination.

Pinged Nabeel. We’ll see where that goes

Sent intro to Shannon for Aaron – nope, can’t test out of a BS?


  • Send a note to MIT Press and see if I can get a proposal template – done!

GPT Agents

  • Test some downloads
  • Create experiment, query, tweet database (users table later, based on SELECT DISTINCT on user ids from the tweet table) – got the experiment and query DBs working. Tweets are harder
  • Coding
    • Remove Sample Multiple – done
    • Remove Balance (Individual == balanced. Maybe rename?) – done
    • Create a unique experiment name (each time the App is launched?) – done
  • See if I can have a chat with Aaron about clustering
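A rough sketch of deriving the users table from the tweet table with SELECT DISTINCT. It uses an in-memory SQLite database as a stand-in for whatever database the project actually runs on, and the table and column names (table_tweet, table_user, user_id) are assumptions for illustration:

```python
import sqlite3


def build_users_table(conn: sqlite3.Connection) -> int:
    """Populate a users table from the distinct user ids in the tweet table.

    Returns the number of rows in the users table afterwards.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS table_user (id INTEGER PRIMARY KEY)")
    # INSERT OR IGNORE is SQLite syntax; it makes the call safe to repeat
    conn.execute(
        "INSERT OR IGNORE INTO table_user (id) "
        "SELECT DISTINCT user_id FROM table_tweet"
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM table_user").fetchone()[0]


# Hypothetical tweet table with two tweets from one user and one from another
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_tweet (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO table_tweet (id, user_id, text) VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 10, "b"), (3, 20, "c")],
)
print(build_users_table(conn))
```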


  • Set up a file for papers/websites/repos of monolithic models and start looking
  • Set up an Overleaf doc for the survey?

Phil 6.21.2022

Summer solstice!

Tix and hotel – verify airport shuttle – done


  • Sprint review yesterday
  • MDA meeting. Spent most of the time with Chris I. going over the file. Turns out there is a manual, so I need to read that
  • Sprint planning
    • Research monolithic model to compare with RCSNN
    • Continue with RCSNN tool. Start eval?
    • MDA mgmt
    • Support simaccel
    • Set up acct for remote dev on the new server
    • Support Rukan


  • Contact Nabil Galini to set up a chat
  • Add option to clamp to sample size rather than always downloading all the smallest sample
  • Make sure to pull in the full day – done


  • Pull out extra detail from overview – done!
  • Reword “examine” in chapter detail – done!
  • Submit! – done

Phil 6.17.2022

Safer is a supertanker in advanced state of decay that will break apart or explode if the world does not act. The result will be an environmental and humanitarian catastrophe centered on the coast of a country already devastated by seven years of war and affecting the entire region. The UN is ready to stage an emergency operation to address this threat, but work will only begin when we have the necessary funds.

TOKEN is a MASK: Few-shot Named Entity Recognition with Pre-trained Language Models

  • Transferring knowledge from one domain to another is of practical importance for many tasks in natural language processing, especially when the amount of available data in the target domain is limited. In this work, we propose a novel few-shot approach to domain adaptation in the context of Named Entity Recognition (NER). We propose a two-step approach consisting of a variable base module and a template module that leverages the knowledge captured in pre-trained language models with the help of simple descriptive patterns. Our approach is simple yet versatile and can be applied in few-shot and zero-shot settings. Evaluating our lightweight approach across a number of different datasets shows that it can boost the performance of state-of-the-art baselines by 2-5% F1-score.


  • Finish proposal? Yes!
  • Gita Manaktala (Information Science and Communication editor) oversees the MIT Press’s book acquisitions and works closely with our other editors. She acquires her own list of books in the areas of information science, communication, and internet studies. Her interests include networked communication, news and information, privacy, data security, and access to knowledge.
  • Katie Helke | Editor: I acquire trade books, professional books, crossover books, and (very occasionally) textbooks. Head here if you’d like to learn more about those different book types and some other random publishing stuff that may or may not be useful to you. Head here if you’d like to learn more about the MIT Press, its history, and some of its current initiatives.


  • Continue working on balanced pull. I think I finally got the math right


  • Demo slides

Phil 6.15.2022

Overview and key findings of the 2022 Digital News Report

  • A clear throughline in this year’s report is the changing habits of younger groups, specifically those under 30, whom news organisations often struggle to reach. Throughout this Executive Summary, and in a separate chapter, we find that this group that has grown up with social media is not just different but more different than they were in the past. We also explore their use of newer visual networks for news such as TikTok and Instagram, with support from a detailed qualitative study in three countries (UK, US, and Brazil).

GPT Agents

  • Met with Shimei and Jimmy.
    • added “ OR ” to the input list to handle things like “chinavirus OR china virus”
    • Still working towards initial corpus creation. Need to store the queries in the db too
    • No meeting next week
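One possible way to handle an entry like “chinavirus OR china virus” is to split on the “ OR ” separator and quote any multi-word alternative so it is treated as a phrase. This is only a sketch under that assumption. The function name is hypothetical, and how the app actually treats the separator is not shown in these notes:

```python
def build_query(entry: str) -> str:
    """Turn one keyword entry into a query string.

    Alternatives are separated by " OR "; multi-word alternatives are
    wrapped in double quotes so they read as a single phrase.
    """
    terms = [t.strip() for t in entry.split(" OR ") if t.strip()]
    quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
    return " OR ".join(quoted)


print(build_query("chinavirus OR china virus"))
```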


  • Working on Stripe proposal. Set up the root tex file and reworked the OUP parts. I need to do a first person author bio and a lite version of the comparables
  • Need to write a marketing plan


  • Work on RCSNN App
    • Filtering – done
    • Color coding dictionary entries by type – got colors working. It is not obvious!
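    # assumes "import tkinter as tk" and that self.tk_text is a tk.Text widget
    def color_text(self, target:str, c:str='red'):
        # clear any earlier tag with this name before re-tagging
        self.tk_text.tag_remove(target, '1.0', tk.END)

        # find the first occurrence of the target; search() returns '' if there is none
        start_pos = self.tk_text.search(target, '1.0', stopindex=tk.END)
        if not start_pos:
            return
        # Text indices are "line.column", so the end is the start column plus the target length
        spl = start_pos.split('.')
        index = int(spl[1])
        end_pos = "{}.{}".format(spl[0], index+len(target))
        print("{} pos = {}, end = {}".format(target, start_pos, end_pos))
        self.tk_text.tag_add(target, start_pos, end_pos)
        self.tk_text.tag_config(target, foreground=c)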
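The method above only colors the first match. As a sketch of the index arithmetic needed to color every match, here is a plain-Python helper that computes the "line.column" start and end indices for each occurrence. It assumes the target never spans a line break, and find_spans is a hypothetical name, not part of the RCSNN app:

```python
def find_spans(text: str, target: str) -> list[tuple[str, str]]:
    """Return (start, end) indices in Tk Text "line.column" form for every occurrence.

    Lines are 1-based and columns are 0-based, matching the Text widget.
    """
    spans = []
    if not target:
        return spans
    for line_no, line in enumerate(text.split("\n"), start=1):
        col = line.find(target)
        while col != -1:
            start = "{}.{}".format(line_no, col)
            end = "{}.{}".format(line_no, col + len(target))
            spans.append((start, end))
            # continue searching after the current match
            col = line.find(target, col + len(target))
    return spans


print(find_spans("cmd: go\nstate: go go", "go"))
```

Each pair could then be passed to tag_add() in place of the single start_pos and end_pos.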

Phil 6.14.2022

Did travel insurance!


  • Backed up to svn
  • Sent letters to Oxford, Cambridge, and Stripe
  • If that doesn’t work, I think it’s time to hire an editor

GPT Agents

  • Still working on collecting balanced data. I think the trick will be to look for the lowest number of tweets per day starting at the first day of collection and work forward, collecting that many tweets from each keyword, then repeat
  • For unbalanced, just make the one request and go forward in time until the corpus size is reached?
  • 3:30 Meeting
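The balanced idea above can be sketched as a planning step: for each day, take the lowest per-keyword count, pull that many tweets from every keyword, and move forward until the corpus size is reached. This is a sketch under assumptions, not the app's code. The per-day counts are taken as already known, and the function name and data layout are hypothetical:

```python
def balanced_plan(daily_counts: dict[str, list[int]], corpus_size: int) -> list[int]:
    """Return how many tweets to pull per keyword on each day.

    daily_counts maps keyword -> tweets available per day, first day first.
    """
    num_days = min(len(days) for days in daily_counts.values())
    num_keywords = len(daily_counts)
    plan = []
    total = 0
    for day in range(num_days):
        # stop once another balanced pull would overshoot the corpus size
        remaining_per_keyword = (corpus_size - total) // num_keywords
        if remaining_per_keyword <= 0:
            break
        # the scarcest keyword on this day sets the pull size for all keywords
        lowest = min(days[day] for days in daily_counts.values())
        per_keyword = min(lowest, remaining_per_keyword)
        plan.append(per_keyword)
        total += per_keyword * num_keywords
    return plan


# Made-up daily counts for two keywords over three days
print(balanced_plan({"a": [100, 50, 80], "b": [30, 60, 10]}, 150))
```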


  • 9:15 standup. Need to get reacquainted with the RCSNN codebase and tool
  • 1:00 server status meeting
  • 1:30 resync meeting

Phil 6.13.2022


  • Finished? The proposal rewrite


  • More on FMDS. Need to write an abstract
  • MDA Meeting at 2:00

GPT Agents

  • Calculate rough tweet rate per keyword. Actually used the count interface and that works well.

Phil 6.12.2022

The town crier

  • Six years into the grass-roots movement unleashed by Donald Trump in his first presidential campaign, Angela Rubino is a case study in what that movement is becoming. Suspicious of almost everything, trusting of almost nothing, believing in almost no one other than those who share her unease, she has in many ways become a citizen of a parallel America — not just red America, but another America entirely, one she believes to be awash in domestic enemies, stolen elections, immigrant invaders, sexual predators, the machinations of a global elite and other fresh nightmares revealed by the minute on her social media scrolls. She is known online as “Burnitdown.”


  • Working on the proposal