“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,” wrote the mods (emphasis theirs). “As such, we need the volume of these posts to reduce […] So, for now, the use of ChatGPT to create posts here on Stack Overflow is not permitted. If a user is believed to have used ChatGPT after this temporary policy is posted, sanctions will be imposed to prevent users from continuing to post such content, even if the posts would otherwise be acceptable.”
This absolutely goes in the paper:
Also, we need to discuss this in the AI programming workshop:
Visually categorizing and comparing materials is crucial for our everyday behaviour. Given the dramatic variability in their visual appearance and functional significance, what organizational principles underlie the internal representation of materials? To address this question, here we use a large-scale data-driven approach to uncover the core latent dimensions in our mental representation of materials. In a first step, we assembled a new image dataset (STUFF dataset) consisting of 600 photographs of 200 systematically sampled material classes. Next, we used these images to crowdsource 1.87 million triplet similarity judgments. Based on the responses, we then modelled the assumed cognitive process underlying these choices by quantifying each image as a sparse, non-negative vector in a multidimensional embedding space. The resulting embedding predicted material similarity judgments in an independent test set close to the human noise ceiling and accurately reconstructed the similarity matrix of all 600 images in the STUFF dataset. We found that representations of individual material images were captured by a combination of 36 material dimensions that were highly reproducible and interpretable, comprising perceptual (e.g., “grainy”, “blue”) as well as conceptual (e.g., “mineral”, “viscous”) dimensions. These results have broad implications for understanding material perception, its natural dimensions, and our ability to organize materials into classes.
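For the workshop, the modelling step reads like a SPoSE-style embedding: each image is a sparse, non-negative vector, and the probability that a pair is judged “most similar” within a triplet is a softmax over the pairwise dot products. A toy numpy sketch of that choice rule – my reading of the abstract, not their code, with a random stand-in embedding:

import numpy as np

# Toy SPoSE-style triplet choice rule, as I read the abstract: images are
# sparse non-negative vectors, and the "most similar pair" choice is a
# softmax over pairwise dot-product similarities. Embedding is random here.
rng = np.random.default_rng(0)
n_images, dims = 600, 36  # sizes from the paper
X = np.maximum(rng.normal(size=(n_images, dims)), 0)  # non-negative, ~half zeros

def triplet_choice_probs(i: int, j: int, k: int):
    # Probability that each pair is judged most similar within the triplet
    sims = np.array([X[i] @ X[j], X[i] @ X[k], X[j] @ X[k]])
    e = np.exp(sims - sims.max())  # numerically stable softmax
    return e / e.sum()

print(triplet_choice_probs(0, 1, 2))  # probs for pairs (i,j), (i,k), (j,k)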
Need to read LM steganography paper
SBIRs
Did my self-assessment yesterday. So meaningless.
Finish the Q3 report today. I don’t expect any problems
Book
Financial paperwork uploads!
GPT Agents
Get some anti-Semitic terms from the GPT and add them to the pull list to evaluate for min/max/average (this part goes in the writeup!)
Start the pull
OpenAI has upgraded GPT-3’s Davinci model to 003, and it is putting out some really good stuff! For example, I asked it to “write some python code to visualize a Sierpinski Gasket” – a reasonably arcane request. Here’s what it wrote:
import matplotlib.pyplot as plt

def sierpinski(points, degree):
    # Base case: draw the triangle
    if degree == 0:
        triangle = plt.Polygon(points, fill='black')
        plt.gca().add_patch(triangle)
    else:
        # Recursive case: divide the triangle into three smaller triangles and
        # recursively draw each of them
        sierpinski([points[0], midpoint(points[0], points[1]), midpoint(points[0], points[2])], degree - 1)
        sierpinski([points[1], midpoint(points[0], points[1]), midpoint(points[1], points[2])], degree - 1)
        sierpinski([points[2], midpoint(points[2], points[1]), midpoint(points[0], points[2])], degree - 1)

def midpoint(p1, p2):
    return [(p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2]

# Set up the plot
plt.axes()
plt.axis('scaled')

# Draw the Sierpinski gasket
points = [[-1, -1/2**0.5], [0, 1/2**0.5], [1, -1/2**0.5]]
sierpinski(points, 5)

# Center the gasket in the plot
plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)

# Show the plot
plt.show()
And this is what the code generates:
Pretty impressive
YugabyteDB is a high-performance distributed SQL database. It is the only open source relational database that can scale globally while providing ACID transaction semantics, offering a low-latency, massively scalable, highly available data store. It has a cloud offering and an enterprise edition.
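Since the YSQL API is supposed to be PostgreSQL wire-compatible, a stock Postgres driver should just work. A minimal connectivity sketch, assuming a local cluster with the defaults (port 5433, “yugabyte” user and database) – those are assumptions about the deployment, not something I’ve tested:

import psycopg2

# Connect to YugabyteDB's PostgreSQL-compatible YSQL API. Host, port, and
# credentials below are the documented local-cluster defaults (assumed).
conn = psycopg2.connect(
    host="localhost",
    port=5433,           # YSQL default port
    dbname="yugabyte",   # default database
    user="yugabyte",
    password="yugabyte",
)
with conn, conn.cursor() as cur:
    # Plain SQL; the cluster handles distribution and ACID semantics
    cur.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    cur.execute(
        "INSERT INTO kv (k, v) VALUES (%s, %s) "
        "ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v",
        ("greeting", "hello"),
    )
    cur.execute("SELECT v FROM kv WHERE k = %s", ("greeting",))
    print(cur.fetchone()[0])
conn.close()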
Started talking to the Mastodon API. It is extremely straightforward:
import json
import requests

# A playground for exploring the Mastodon REST interface (https://docs.joinmastodon.org/client/public/)

def create_timeline_url(instance: str = "mastodon.social", limit: int = 10) -> str:
    # Build the URL for an instance's public timeline
    url = "https://{}/api/v1/timelines/public?limit={}".format(instance, limit)
    print("create_timeline_url(): {}".format(url))
    return url

def connect_to_endpoint(url: str) -> list:
    # GET the endpoint and return the parsed JSON (a list of statuses)
    response = requests.request("GET", url)
    print("Status code: {}".format(response.status_code))
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

def print_response(title: str, j: list):
    json_str = json.dumps(j, indent=4, sort_keys=True)
    print("\n------------ Begin '{}':\nresponse:\n{}\n------------ End '{}'\n".format(title, json_str, title))

def main():
    print("post_lookup")
    instance_list = ["fediscience.org", "mastodon.social"]
    for instance in instance_list:
        url = create_timeline_url(instance, 1)
        rsp = connect_to_endpoint(url)
        print_response("{} test".format(instance), rsp)

if __name__ == "__main__":
    main()
It’s based on collections in the ActivityPub protocol, a decentralized social networking protocol based upon the ActivityStreams 2.0 data format. It provides a client-to-server API for creating, updating, and deleting content, as well as a federated server-to-server API for delivering notifications and content.
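For reference, the objects that the client-to-server API traffics in are just ActivityStreams 2.0 documents. A hand-written sketch of a “Create” activity, based on the spec’s examples (the actor URL and content are made up):

import json

# Sketch of an ActivityStreams 2.0 "Create" activity wrapping a Note, the
# sort of payload a client POSTs to its outbox. All URLs are placeholders.
create_note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.social/users/alice",  # hypothetical actor
    "object": {
        "type": "Note",
        "content": "Hello, fediverse!",
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
    },
}
print(json.dumps(create_note, indent=2))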
These services had been shaved down to the point where most of us were only a hair’s breadth away from quitting, because all the surplus had been transferred from us and from business users to the companies.
And the incentives are different for different users: lurking is cheaper than posting, trolling by robot is free, etc. It would be interesting to try to model that.
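A toy sketch of what that model might look like, with completely invented per-action costs just to make the asymmetry concrete:

# Toy incentive model. The costs are invented placeholders; the point is that
# when an action's cost approaches zero (bot trolling), the volume a fixed
# effort budget buys dwarfs everything else.
ACTION_COST = {"lurk": 0.01, "post": 1.0, "bot_troll": 0.0001}

def actions_per_budget(budget: float) -> dict:
    # How many of each action a user type can "afford" with the same budget
    return {action: budget / cost for action, cost in ACTION_COST.items()}

print(actions_per_budget(10.0))
# {'lurk': 1000.0, 'post': 10.0, 'bot_troll': 100000.0}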
Book
Finish today!
Shutterstock first – done
Finish with footnotes – done
SBIRs
More writing – rolling in Rukan’s work
Send a date in December for Lauren – done
Chat with Aaron about JMOR paper
GPT Agents
Set up a weekly meeting with Jason for Tuesdays at 2:00
4:00 Meeting – going to do some pulls for COVID racism. I tried out some new prompts using OpenAI’s chatbot and got some good results that I need to test.
fediverse.space is a tool to visualize networks and communities on the fediverse. It works by crawling every instance it can find and aggregating statistics on communication between them.
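A sketch of the crawling idea using Mastodon’s /api/v1/instance/peers endpoint, which (as I understand it) returns the domains an instance federates with – some instances disable it:

import requests
from collections import deque

# Breadth-first crawl of the fediverse via each instance's public peers list.
# A sketch only: no rate limiting, politeness delays, or persistent storage.
def crawl_peers(seed: str = "mastodon.social", max_instances: int = 50) -> set:
    seen = {seed}
    queue = deque([seed])
    while queue and len(seen) < max_instances:
        instance = queue.popleft()
        try:
            rsp = requests.get("https://{}/api/v1/instance/peers".format(instance), timeout=5)
            rsp.raise_for_status()
        except requests.RequestException:
            continue  # unreachable, or the peers endpoint is disabled
        for peer in rsp.json():
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen

print(len(crawl_peers()))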
Downloaded the latest from Overleaf and converted to a Word document. In going through the Word doc and removing all the end-line hyphenations (a regex sketch for next time is below), I also found a few more grammar errors and misspellings. Going to prepare the package to send to Elsevier later today – DONE!
Need to get rid of all the footnotes, though
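Side note on that hyphenation pass – next time, a quick regex over the exported text might save the manual sweep. A sketch, assuming a hyphen at end-of-line always marks a split word (it won’t always; genuine hyphenated compounds that break across lines will get merged too):

import re

# Join words split by end-line hyphenation in plain text. Over-merges real
# hyphenated compounds that happen to break across lines, so review the diff.
def dehyphenate(text: str) -> str:
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)

print(dehyphenate("hyphen-\nation"))  # -> 'hyphenation'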
SBIRs
More working on the white paper
MDA meeting at 2:00
Yikes! Need to get done with the quarterly report by the 7th.
Set up Q4 writing space
GPT Agents
Sent Jimmy updates on everything for the “professor status meeting”
This Stability-AI repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following list provides an overview of all currently available models. More coming soon.
Adjective ordering preferences stand as perhaps one of the best candidates for a true linguistic universal: When multiple adjectives are strung together in service of modifying some noun, speakers of different languages—from English to Mandarin to Hebrew—exhibit robust and reliable preferences concerning the relative order of those adjectives. More importantly, despite the diversity of the languages investigated, the very same preferences surface over and over again. This tantalizing regularity has led to decades of research pursuing the source of these preferences. This article offers an overview of the findings and proposals that have resulted.
Disinformation Watch is a fortnightly newsletter covering the latest news about disinformation, including case studies, research and reporting from the BBC, international media and leading experts in the field.
Book
Working on chasing down pictures that I can use. Folks, I strongly suggest never using images that might have copyright issues as placeholders. You can get very attached to them!
Finished with figures. Here’s an example of what needed to be done. Here’s the before, with the placeholder:
And here’s the after, using assets from Wikimedia and an hour or so with Illustrator:
It looks better, I think, but it was a lot of work.
The key to our achievement was developing new techniques at the intersection of two completely different areas of AI research: strategic reasoning, as used in agents like AlphaGo and Pluribus, and natural language processing, as used in models like GPT-3, BlenderBot 3, LaMDA, and OPT-175B. CICERO can deduce, for example, that later in the game it will need the support of one particular player, and then craft a strategy to win that person’s favor – and even recognize the risks and opportunities that that player sees from their particular point of view.
Need to look at the Mastodon API with an eye towards anonymous journalism
Had a good chat with Aaron about how population thinking is kind of like NN models, with all the odd artifacts and the dimension reduction required for the loss function. This tends to explain how companies like Facebook approximate the canonical paperclip-maximizer AI, consuming everything to create engagement and grow the network.
Discovered gbif.org, which has a lot of tracked wildlife data, including white storks. I can make a new map with Plotly Express.
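A sketch of what that stork map could look like, using GBIF’s public occurrence-search API with Plotly Express. The parameter and field names are from the GBIF docs as I remember them, so worth verifying:

import pandas as pd
import plotly.express as px
import requests

# Pull white stork (Ciconia ciconia) occurrence records from GBIF's public
# search API and map them. A sketch: one page of 300 records, no paging.
rsp = requests.get(
    "https://api.gbif.org/v1/occurrence/search",
    params={"scientificName": "Ciconia ciconia", "hasCoordinate": "true", "limit": 300},
    timeout=30,
)
rsp.raise_for_status()
records = rsp.json()["results"]

df = pd.DataFrame([
    {"lat": r["decimalLatitude"], "lon": r["decimalLongitude"], "country": r.get("country", "?")}
    for r in records
])
fig = px.scatter_geo(df, lat="lat", lon="lon", hover_name="country",
                     title="White stork occurrences (GBIF sample)")
fig.show()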
Got my metronomes yesterday! Need to get a lightweight platform and see if the experiment works
SBIRs
9:15 standup
Great discussion with Aaron about the JMOR paper. Looking at War Elephants and Mahouts (mAIhouts? Nah) as a useful metaphor for models and handlers. Aaron’s going to write a short story introduction.
Galactica is an AI trained on humanity’s scientific knowledge. You can use it as a new interface to access and manipulate what we know about the universe. (Made by Papers with Code, Meta AI)
In “Emergent Abilities of Large Language Models,” recently published in the Transactions on Machine Learning Research (TMLR), we discuss the phenomena of emergent abilities, which we define as abilities that are not present in small models but are present in larger models. More specifically, we study emergence by analyzing the performance of language models as a function of language model scale, as measured by total floating point operations (FLOPs), or how much compute was used to train the language model. However, we also explore emergence as a function of other variables, such as dataset size or number of model parameters (see the paper for full details). Overall, we present dozens of examples of emergent abilities that result from scaling up language models. The existence of such emergent abilities raises the question of whether additional scaling could potentially further expand the range of capabilities of language models.
Book
More migration. Done with part one! Don is working on getting me a studio.
Tweaked my Twitter counts to work with other languages. Here are the trends for “world cup” in Persian:
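The tweak itself is basically just a lang: operator in the query. A sketch against the v2 recent counts endpoint, assuming a bearer token in TWITTER_BEARER_TOKEN (endpoint and operators per the v2 docs as I recall them):

import os
import requests

# Daily tweet counts for "world cup" restricted to Persian (lang:fa) via the
# Twitter API v2 recent counts endpoint. Assumes TWITTER_BEARER_TOKEN is set.
rsp = requests.get(
    "https://api.twitter.com/2/tweets/counts/recent",
    headers={"Authorization": "Bearer {}".format(os.environ["TWITTER_BEARER_TOKEN"])},
    params={"query": '"world cup" lang:fa', "granularity": "day"},
    timeout=30,
)
rsp.raise_for_status()
for bucket in rsp.json()["data"]:
    print(bucket["start"], bucket["tweet_count"])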
Trying the Fentanyl pull again with more traps for zero-tweet results – done! Got all the user info as well: a total of 5,507,159 tweets and 2,402,215 users.
Book
More migration
Order metronomes! Done!
SBIRs
Had a discussion with Aaron and Rukan about what our first model should be – commands or target priority. We’ll start with targets and the idea that there might be multiple NNs within a controller.