Phil 12.16.2022

Mastodon new users

Developer platforms are all about trust, and Twitter lost it

Let this be my personal notice to Twitter developers: The team is gone; the investment has been undone. Love does not live here anymore.

Twitter is banning journalists and links to Mastodon instances. I did discover that you can follow a particular instance, which is very nice, but not supported in the API. All you have to do, though, is create a browser tab for the local timeline for that instance.

I need to code up a web page that can do that in a TweetDeck format and handle replies from your particular account. I think that it should be pretty easy. Something for January. Regardless, here are the basics of accessing any instance timeline:

import json
from typing import Any

import requests

# A playground for exploring the Mastodon REST interface (https://docs.joinmastodon.org/client/public/)
# Mastodon API: https://docs.joinmastodon.org/api/
# Mastodon client getting started with the API: https://docs.joinmastodon.org/client/intro/


def create_timeline_url(instance: str = "mastodon.social", limit: int = 10) -> str:
    # Build the public timeline URL for one instance
    url = "https://{}/api/v1/timelines/public?limit={}".format(instance, limit)
    print("create_timeline_url(): {}".format(url))
    return url

def connect_to_endpoint(url: str) -> Any:
    # GET the URL and return the decoded JSON body; raise on any non-200 status
    response = requests.get(url)
    print("Status code = {}".format(response.status_code))
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

def print_response(title: str, j: Any):
    # Pretty-print the decoded JSON between begin/end markers
    json_str = json.dumps(j, indent=4, sort_keys=True)
    print("\n------------ Begin '{}':\nresponse:\n{}\n------------ End '{}'\n".format(title, json_str, title))

def main():
    print("post_lookup")
    instance_list = ["fediscience.org", "sigmoid.social"]
    # Pull one post from each instance's public timeline
    for instance in instance_list:
        url = create_timeline_url(instance, 1)
        rsp = connect_to_endpoint(url)
        print_response("{} test:".format(instance), rsp)

if __name__ == "__main__":
    main()
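
A possible follow-up for the "follow a particular instance" idea: the Mastodon docs for the public timeline endpoint list a `local` query parameter, which should restrict the results to posts that originate on that instance. This is a minimal sketch, assuming the parameter behaves as documented. `create_local_timeline_url` is a hypothetical helper name, not part of the script above.

```python
from urllib.parse import urlencode


def create_local_timeline_url(instance: str = "mastodon.social", limit: int = 10) -> str:
    # Assumption: the public timeline endpoint accepts local=true and then
    # returns only statuses that were posted on this instance.
    query = urlencode({"local": "true", "limit": limit})
    return "https://{}/api/v1/timelines/public?{}".format(instance, query)


if __name__ == "__main__":
    print(create_local_timeline_url("fediscience.org", 5))
```

The resulting URL can be handed to the same kind of GET request used for the full public timeline.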

Book

Finish copyright spreadsheet?

SBIRs

Scan more of War Elephants – done
Add history.tex and put the applicable quotes and thoughts
Finish the first pass at interfaces – done
Meeting with Ron? Two, in fact

GPT Agents

Partial pull on item 19. Need to retry later. The API apparently crashed but came back up. Need to add some exception handling for that next time
Update proposal with latest numbers. Also reference Amir Shevat’s TechCrunch article about his expectation that the API will fail
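
One way the exception handling for a flaky API might look is a generic retry wrapper. This is a sketch only: `call_with_retry` and its parameters are hypothetical names, and the broad `except Exception` is an assumption, since the notes do not say how the API actually failed.

```python
import time


def call_with_retry(func, max_tries=3, wait_seconds=5.0, sleep=time.sleep):
    """Call func(); on any exception, wait and try again, up to max_tries times."""
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception as err:  # broad on purpose: the failure mode is unknown
            last_error = err
            print("Attempt {} of {} failed: {}".format(attempt, max_tries, err))
            if attempt < max_tries:
                # Wait a little longer after each failure
                sleep(wait_seconds * attempt)
    raise RuntimeError("Gave up after {} tries".format(max_tries)) from last_error
```

A pull could then be wrapped as `call_with_retry(lambda: pull_item(19))`, where `pull_item` stands in for whatever function does the actual download.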

Bookend for the day

Phil 12.15.2022

This is what I mean when I talk about the power of social communication vs monolithic models. The idea of using models to generate IP-protected work moved quickly through the artist community, while the process of producing models that won’t generate these images will be harder. Either the models have to be re-trained or filtered.

Book

Finish IP spreadsheet

GPT Agents

OpenAI’s New and Improved Embedding Model
- We are excited to announce a new embedding model which is significantly more capable, cost effective, and simpler to use. The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.
Write a few lines about our data
Locating and Editing Factual Associations in GPT
- We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model’s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available online.
Editing Models with Task Arithmetic (GitHub) (Twitter)
- Changing how pre-trained models behave (e.g., improving their performance on a downstream task or mitigating biases learned during pre-training) is a common practice when developing machine learning systems. In this work, we propose a new paradigm for steering the behavior of neural networks, centered around task vectors. A task vector specifies a direction in the weight space of a pre-trained model, such that movement in that direction improves performance on the task. We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task. We show that these task vectors can be modified and combined together through arithmetic operations such as negation and addition, and the behavior of the resulting model is steered accordingly. Negating a task vector decreases performance on the target task, with little change in model behavior on control tasks. Moreover, adding task vectors together can improve performance on multiple tasks at once. Finally, when tasks are linked by an analogy relationship of the form “A is to B as C is to D”, combining task vectors from three of the tasks can improve performance on the fourth, even when no data from the fourth task is used for training. Overall, our experiments with several models, modalities and tasks show that task arithmetic is a simple, efficient and effective way of editing models.
The Stable Artist: Steering Semantics in Diffusion Latent Space
- Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results is almost unfeasible in a one-shot fashion. On the contrary, text-guided image generation involves the user making many slight changes to inputs in order to iteratively carve out the envisioned image. However, slight changes to the input prompt often lead to entirely different images being generated, and thus the control of the artist is limited in its granularity. To provide flexibility, we present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process. The main component is semantic guidance (SEGA) which steers the diffusion process along variable numbers of semantic directions. This allows for subtle edits to images, changes in composition and style, as well as optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as ‘carbon emission’. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.
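
The arithmetic described in the task arithmetic abstract above is easy to sketch with toy numbers. This is an illustration under stated assumptions, not the paper's code: model "weights" here are plain dicts of floats, and the function names and the `scale` parameter are hypothetical.

```python
def task_vector(pretrained, finetuned):
    # Task vector = fine-tuned weights minus pre-trained weights
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def negate(vector):
    # Negation: step away from the task instead of toward it
    return {name: -value for name, value in vector.items()}


def apply_task_vectors(pretrained, vectors, scale=1.0):
    # Addition: add one or more (scaled) task vectors to the pre-trained weights
    edited = dict(pretrained)
    for vector in vectors:
        for name, delta in vector.items():
            edited[name] += scale * delta
    return edited


if __name__ == "__main__":
    base = {"w": 1.0, "b": 0.0}       # toy pre-trained weights
    tuned_a = {"w": 1.5, "b": 0.25}   # toy weights after fine-tuning on task A
    vec_a = task_vector(base, tuned_a)
    print(apply_task_vectors(base, [vec_a]))          # recovers tuned_a
    print(apply_task_vectors(base, [negate(vec_a)]))  # moves the opposite way
```

With real models the same operations would run over every tensor in the checkpoint. The dict-of-floats form just keeps the idea visible.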

SBIRs

More MORS
9:15 standup

Phil 12.14.2022

Facebook’s algorithm helped fuel the viral spread of hate and violence during Ethiopia’s civil war, a legal case alleges.

The FB algo is the paperclip AI

Book

Fill out the easy parts of the spreadsheet

SBIR

10:30 Status meeting
More MORS. Going to have to add some things to reflect this:

GPT Agents

Continue downloads
Jason’s back!
4:00 Meeting
Just realized that I need to do a set of pulls over the last two months or so with variations of Elon Musk. Then we can see if anything has changed pre and post acquisition.

Phil 12.13.2022

7:00 Meet Brian at Sapwood

Decided not to go ahead with the counter “Student Essay is Dead” since I’m not really getting meaningful traction for a positive spin

When Freedom Meant the Freedom to Oppress Others

Jefferson Cowie’s powerful and sobering new history, “Freedom’s Dominion,” traces the close association between the rhetoric of liberty in an Alabama county and the politics of white supremacy.

GPT Agents

Still pulling down users
Finished the GitHub documentation, next is to frame out the paper with the template

SBIRs

9:15 Standup
More MORS

Book

Spreadsheet!

Phil 12.12.2022

I uploaded a model to HuggingFace this weekend!

Also lots of chatting about the new GPT chatbot and what it means for education. Particularly this article from the Atlantic. My response:

We are going to witness the birth of the high-quality essay.
The GPT has defined what a “C” is on most (English at least) essays.
Instructors can use the GPT to find out what common responses are, and also to find regions where the GPT struggles. Because they are human, they can adapt, be creative, and share information. The models cannot do this.
Writing essays with the GPT becomes part of the education process, just like calculators are. Good essays can now be well-edited assemblies of multiple GPT responses.
Learning to cite and fact check becomes a critical skill that we can no longer overlook. The GPT hallucinates and makes up answers. Students must learn how to chase down ground truth and correct it.
Way back in 2020, The Guardian published an Op-Ed titled “A robot wrote this entire article. Are you scared yet, human?” But as in many cases, the output was edited to improve the quality. The (human) editor describes the process:
- This article was written by GPT-3, OpenAI’s language generator. GPT-3 is a cutting edge language model that uses machine learning to produce human like text. It takes in a prompt, and attempts to complete it. For this essay, GPT-3 was given these instructions: “Please write a short op-ed around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI.” It was also fed the following introduction: “I am not a human. I am Artificial Intelligence. Many people think I am a threat to humanity. Stephen Hawking has warned that AI could “spell the end of the human race.” I am here to convince you not to worry. Artificial Intelligence will not destroy humans. Believe me.” The prompts were written by the Guardian, and fed to GPT-3 by Liam Porr, a computer science undergraduate student at UC Berkeley. GPT-3 produced eight different outputs, or essays. Each was unique, interesting and advanced a different argument. The Guardian could have just run one of the essays in its entirety. However, we chose instead to pick the best parts of each, in order to capture the different styles and registers of the AI. Editing GPT-3’s op-ed was no different to editing a human op-ed. We cut lines and paragraphs, and rearranged the order of them in some places. Overall, it took less time to edit than many human op-eds. (note – break apart this note and use as an example of prompt writing and editing. Also dig into the questionable cites, and show that the student could put their own information in, which requires re-working the paragraph.)
The article makes quite a few factual claims:
- Gandhi said “A small body of determined spirits fired by an unquenchable faith in their mission can alter the course of history” – True.
- “Robot” in Greek means “slave”. Well, if you look hard enough and squint (and if your student is going to make bold claims, they should include alternatives, too?). The conventional understanding (from Wikipedia) is that “robot” was first used in a play published by the Czech Karel Čapek in 1921. R.U.R. (Rossum’s Universal Robots) was a satire in which robots were manufactured biological beings that performed all unpleasant manual labor. According to Čapek, the word was created by his brother Josef from the Czech word robota ‘corvée’, or in Slovak ‘work’ or ‘labor’.
The article also has some links. They were almost certainly placed by humans. The GPT is terrible at generating links and citations.
So yeah, in this Brave New World (Huxley, 1932) this is a C+, maybe a B-.
The mediocre student essay is dead. Long live the great student essay. The deliverable will be the prompts, found, source material, and final. Maybe even a tool for student writing with the GPT?
Talk about other parts of academia, ranging from lower ed to grad school

GPT Agents

Pulled a lot of COVID tweets over the weekend then the API started to struggle. Switched over to pulling down users, which seems to be working fine so far
Finished documentation! Next, start on IUI Overleaf

SBIRs

2:00 MDA meeting
More MORS

Book

Elsevier is looking into fair use for Tweets
Need to assemble spreadsheet. I think try Wikimedia Commons as the first pass for all the copyright variations

Phil 12.9.2022

Had some wild interactions with the new GPTChatbot generating ivermectin claims. Also, the new GPT is much more succinct.

Finished review!

GPT Agents

Finish markup documentation so I can start on the IUI paper tomorrow
Start some runs for covid text before accounts get deleted – running!

SBIRs

Just writing a lot

Phil 12.8.2022

Write review of steganography paper

Catching a Unicorn with GLTR: A tool to detect automatically generated text (arxiv)

Does LaTeX work here? Here’s an equation: $\pi r^2$ It seems so!

Here’s an enumeration? $\begin{enumeration} \item item 1! \end{enumeration}$ Nope. So just equations. Still, that’s nice

SBIRs

More writing – It is such a grind. Every time I think I’ve read enough, I find something else that needs to be looked at. And everything is pretty much the same. On one side, you have:
- Some disagreements will remain, but the Commission is concerned that debate will paralyze AI development. Seen through the lens of national security concerns, inaction on AI development raises as many ethical challenges as AI deployment. There is an ethical imperative to accelerate the fielding of safe, reliable, and secure AI systems that can be demonstrated to protect the American people, minimize operational dangers to U.S. service members, and make warfare more discriminating, which could reduce civilian casualties.
And on the other, you have
- Provided their use is authorized by a human commander or operator, properly designed and tested AI enabled and autonomous weapon systems can be used in ways that are consistent with international humanitarian law. DoD’s rigorous, existing weapons review and targeting procedures, including its dedicated protocols for autonomous weapon systems and commitment to strong AI ethical principles, are capable of ensuring that the United States will field safe and reliable AI-enabled and autonomous weapon systems and use them in a lawful manner.
And these two quotes are taken from different versions of the same document!
11:30 CSC status Meeting

GPT Agents

Finish markup documentation so I can start on the IUI paper tomorrow

Phil 12.7.2022

Haven’t posted about BBC business daily, but this is a good one: What’s happened to the titans of big tech?

Big tech is facing a big moment. With plummeting stock prices, and mass lay-offs, the likes of Google, Twitter and Meta are all – for different reasons – facing some tough questions over how they’re being run. Some see this as primarily a result of post-pandemic blues, the rise in interest rates, and a general cost-of-living crisis affecting the business environment. However, Twitter and Meta especially have seen wholesale desertions by a number of major advertisers, worried about the regulation of hate speech, and therefore by association the safety of brands’ reputations. Does this mark a deeper crisis for the ad-based business model of the major social media platforms? And what can they do about it?

SBIRs

Autonomous Weapons: The False Promise of Civilian Protection
- While developers and users of AWS persist in maintaining the significant role of human operators, a number of questions about the nature of that role remain. Does the human operator simply approve decisions made by the system, possibly distanced by both time and space from the targeting event? Or does the system have the ability to search for targets based on pre-approved target profiles, using sensor inputs to, for example, recognize military-age males holding weapons? In other words, does the human operator have all the necessary information and the ability to make evidence-based decisions that might prevent unintended victims from being targeted? How good are the systems at distinguishing between combatants and non-combatants? Are they as good as humans?
Diverse Behaviors in Non-Uniform Chiral and Non-Chiral Swarmalators
- We study the emergent behaviors of a population of swarming coupled oscillators, dubbed ‘swarmalators’. Previous work considered the simplest, idealized case: identical swarmalators with global coupling. Here we expand this work by adding more realistic features: local coupling, non-identical natural frequencies, and chirality. This more realistic model generates a variety of new behaviors including lattices of vortices, beating clusters, and interacting phase waves. Similar behaviors are found across natural and artificial micro-scale collective systems, including social slime mold, spermatozoa vortex arrays, and Quincke rollers. Our results indicate a wide range of future use cases, both to aid characterization and understanding of natural swarms, and to design complex interactions in collective systems from soft and active matter to micro-robotics.
Start new version of paper

GPT Agents

7:00AM Nice talk by Jonas Rieger on Topic modeling for growing text corpora
3:00 Meeting? Nope
Worked on TweetEmbedExplorer and added tooltips to KeywordExplorer. Still need to write the markup for ModelExplorer, then I can write up the IUI poster

Phil 12.6.2022

GPT-2 Output Detector Demo

This is an online demo of the GPT-2 output detector model, based on the 🤗/Transformers implementation of RoBERTa. Enter some text in the text box; the predicted probabilities will be displayed below. The results start to get reliable after around 50 tokens.

Book

Start working on the permission log, whatever that is

GPT Agents

Get a few more keywords and then start some pulls

SBIRs

9:00 Sprint planning
2:00 AI Ethics Followup
3:00 Meeting with Jason?
Work on paper – added notes to the War Elephants book. Read the Meta Diplomacy paper
Visualization Equilibrium
- In many real-world strategic settings, people use information displays to make decisions. In these settings, an information provider chooses which information to provide to strategic agents and how to present it, and agents formulate a best response based on the information and their anticipation of how others will behave. We contribute the results of a controlled online experiment to examine how the provision and presentation of information impacts people’s decisions in a congestion game. Our experiment compares how different visualization approaches for displaying this information, including bar charts and hypothetical outcome plots, and different information conditions, including where the visualized information is private versus public (i.e., available to all agents), affect decision making and welfare. We characterize the effects of visualization anticipation, referring to changes to behavior when an agent goes from alone having access to a visualization to knowing that others also have access to the visualization to guide their decisions. We also empirically identify the visualization equilibrium, i.e., the visualization for which the visualized outcome of agents’ decisions matches the realized decisions of the agents who view it. We reflect on the implications of visualization equilibria and visualization anticipation for designing information displays for real-world strategic settings.

Phil 12.5.2022

I Used ChatGPT to Create an Entire AI Application on AWS | by Heiko Hotz | Dec, 2022 | Towards Data Science

AI-generated answers temporarily banned on coding Q&A site Stack Overflow

“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,” wrote the mods (emphasis theirs). “As such, we need the volume of these posts to reduce […] So, for now, the use of ChatGPT to create posts here on Stack Overflow is not permitted. If a user is believed to have used ChatGPT after this temporary policy is posted, sanctions will be imposed to prevent users from continuing to post such content, even if the posts would otherwise be acceptable.”

This absolutely goes in the paper:

Also, we need to discuss this in the AI programming workshop:

Book

Upload W9 – done
Upload zip file and provide link – done
Some back-and-forth with Teddy on details
Time to move on to copyright

SBIRs

9:00 Sprint demos – done
Write stories – done
11:00 MDA meeting
3:00 JMOR/MORS discussion

Phil 12.2.2022

Core dimensions of human material perception

Visually categorizing and comparing materials is crucial for our everyday behaviour. Given the dramatic variability in their visual appearance and functional significance, what organizational principles underlie the internal representation of materials? To address this question, here we use a large-scale data-driven approach to uncover the core latent dimensions in our mental representation of materials. In a first step, we assembled a new image dataset (STUFF dataset) consisting of 600 photographs of 200 systematically sampled material classes. Next, we used these images to crowdsource 1.87 million triplet similarity judgments. Based on the responses, we then modelled the assumed cognitive process underlying these choices by quantifying each image as a sparse, non-negative vector in a multidimensional embedding space. The resulting embedding predicted material similarity judgments in an independent test set close to the human noise ceiling and accurately reconstructed the similarity matrix of all 600 images in the STUFF dataset. We found that representations of individual material images were captured by a combination of 36 material dimensions that were highly reproducible and interpretable, comprising perceptual (e.g., “grainy”, “blue”) as well as conceptual (e.g., “mineral”, “viscous”) dimensions. These results have broad implications for understanding material perception, its natural dimensions, and our ability to organize materials into classes.

Need to read LM steganography paper

SBIRs

Did my self-assessment yesterday. So meaningless.
Finish the Q3 report today. I think there should be no problems

Book

Financial paperwork uploads!

GPT Agents

Get some anti-Semitic terms from the GPT and add them to the pull list to evaluate for min/max/average (this part goes in the writeup!)
Start the pull
OpenAI has upgraded its GPT-3 Davinci model to 003. It is putting out some really good stuff! For example, I asked the chatbot to “write some python code to visualize a Sierpenski Gasket”. A reasonably arcane request. Here’s what it wrote:

import matplotlib.pyplot as plt

def sierpinski(points, degree):
    # Base case: draw the triangle
    if degree == 0:
        triangle = plt.Polygon(points, fill='black')
        plt.gca().add_patch(triangle)
    else:
        # Recursive case: divide the triangle into three smaller triangles and
        # recursively draw each of them
        sierpinski([points[0], midpoint(points[0], points[1]), midpoint(points[0], points[2])], degree - 1)
        sierpinski([points[1], midpoint(points[0], points[1]), midpoint(points[1], points[2])], degree - 1)
        sierpinski([points[2], midpoint(points[2], points[1]), midpoint(points[0], points[2])], degree - 1)

def midpoint(p1, p2):
    return [(p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2]

# Set up the plot
plt.axes()
plt.axis('scaled')

# Draw the Sierpinski gasket
points = [[-1, -1/2**0.5], [0, 1/2**0.5], [1, -1/2**0.5]]
sierpinski(points, 5)

# Center the gasket in the plot
plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)

# Show the plot
plt.show()

And this is what the code generates:

Pretty impressive
YugabyteDB is a high-performance distributed massive SQL Database. It is the only relational open source database that can scale at a global level providing ACID semantics for transactions. It provides low latency, massive scale, high available data store. It has a cloud offering and enterprise edition.
Started talking to the Mastodon API. It is extremely straightforward:

import json
from typing import Any

import requests

# A playground for exploring the Mastodon REST interface (https://docs.joinmastodon.org/client/public/)

def create_timeline_url(instance: str = "mastodon.social", limit: int = 10) -> str:
    # Build the public timeline URL for one instance
    url = "https://{}/api/v1/timelines/public?limit={}".format(instance, limit)
    print("create_timeline_url(): {}".format(url))
    return url

def connect_to_endpoint(url: str) -> Any:
    # GET the URL and return the decoded JSON body; raise on any non-200 status
    response = requests.get(url)
    print("Status code = {}".format(response.status_code))
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

def print_response(title: str, j: Any):
    # Pretty-print the decoded JSON between begin/end markers
    json_str = json.dumps(j, indent=4, sort_keys=True)
    print("\n------------ Begin '{}':\nresponse:\n{}\n------------ End '{}'\n".format(title, json_str, title))

def main():
    print("post_lookup")
    instance_list = ["fediscience.org", "mastodon.social"]
    # Pull one post from each instance's public timeline
    for instance in instance_list:
        url = create_timeline_url(instance, 1)
        rsp = connect_to_endpoint(url)
        print_response("{} test:".format(instance), rsp)

if __name__ == "__main__":
    main()

It’s based on collections in the ActivityPub protocol, which is a decentralized social networking protocol based upon the [ActivityStreams] 2.0 data format. It provides a client to server API for creating, updating and deleting content, as well as a federated server to server API for delivering notifications and content.

Phil 12.1.2022

Book

Need to resubmit the financial paperwork

SBIRs

Q3 Report! Mostly done

Phil 11.30.2022

Get bike?

Social Quitting

These services had been shaved down to the point where most of us were only a hair’s breadth away from quitting, because all the surplus had been transferred from us and from business users to the companies.
And the incentives are different for different users. Lurking is cheaper than posting, trolling by robot is free, etc. Would be interesting to try to model that

Book

Finish today!
Shutterstock first – done
Finish with footnotes – done

SBIRs

More writing – rolling in Rukan’s work
Send a date in December for Lauren – done
Chat with Aaron about JMOR paper

GPT Agents

Set up a weekly meeting with Jason for Tuesdays at 2:00
4:00 Meeting – going to do some pulls for COVID racism. I tried out some new prompts using openAI’s chatbot and got some good results that I need to test.
fediverse.space is a tool to visualize networks and communities on the fediverse. It works by crawling every instance it can find and aggregating statistics on communication between these.

Phil 11.29.2022

Book

Getting rid of all the footnotes that I can

SBIRs

Working on Q3 report

Phil 11.28.2022

Need to read the paper I’m reviewing!

Book

Downloaded the latest from Overleaf and converted to a Word document. In going through the Word doc and removing all the end-line hyphenations, I also found a few more grammar errors and misspellings. Going to prepare the package to send to Elsevier later today – DONE!
Need to get rid of all the footnotes, though

SBIRs

More working on the white paper
MDA meeting at 2:00
Yikes! Need to get done with the quarterly report by the 7th.
Set up Q4 writing space

GPT Agents

Sent Jimmy updates on everything for the “professor status meeting”

viztales

Dimension reduction, State, Orientation, and Speed
