Empirical inquiries of political news consumption are typically based on analysis at the level of the news source: a given web domain can be assigned a partisanship score reflective of its relative tendency to be shared by Democrats or Republicans. This practical, tractable approach represents an important methodological advance which has allowed for large-scale empirical studies of how democratic citizens consume political information online. However, despite strong evidence that information sharing is dominated by in-group bias, previous work has also found that most users are exposed to information from a balanced variety of mainstream sources. Such conflicting findings around filter bubbles and echo chambers highlight the need to estimate partisanship at the more fine-grained level of individual stories. It may be that individuals tend to consume politically homogeneous content which originates from a relatively heterogeneous collection of sources. Rather than never sharing stories associated with their political opponents, partisans may selectively share out-group content precisely when that information is favorable to them. Using a panel of 1.6 million Twitter users linked to administrative data, we test this dynamic by examining within-domain sharing patterns by user partisanship over time. Consistent with previous work, we find that, in aggregate, partisans do consume news from a variety of sources. However, we find notable story-level differences suggesting that, despite the heterogeneity of sources, the news curated from partisans' social networks contains politically homogeneous information. Our findings suggest that domain-level analyses of information sharing give a false impression of exposure to politically diverse content, and raise new concerns regarding polarization in the consumption and sharing of digital media.
This really fits with my experience, where Fox News viewers share links to the NYTimes that are mentioned on Fox, often without reading them
Add 200k data to rollup spreadsheet
Here’s the 200k added to the stars counts for each model vs the Yelp 75k ground truth
It seems to be better at the lower star counts, but worse at 5. To make sure this wasn’t an artifact of the training data, here’s a measure of the error vs the specific data used to create the training corpora:
Use the same corpus size (100k) and the same number of training steps, and prompt with "stars = ".
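A minimal sketch of the kind of comparison above — per-star relative error between a model's generated star counts and the ground truth counts (the numbers below are made-up placeholders, not the real Yelp 75k figures):

```python
# Compare a model's star-count distribution against ground truth.
# Counts here are illustrative placeholders only.
ground_truth = {1: 10000, 2: 8000, 3: 12000, 4: 20000, 5: 25000}
model_counts = {1: 11000, 2: 7500, 3: 13000, 4: 18000, 5: 30000}

def per_star_error(model, truth):
    """Relative error per star rating: |model - truth| / truth."""
    return {s: abs(model[s] - truth[s]) / truth[s] for s in truth}

errors = per_star_error(model_counts, ground_truth)
# With these placeholder counts the 5-star error is the largest,
# mirroring the observation that the model is worse at 5 stars.
```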
Fine-tuning a pretrained model: In this tutorial, we will show you how to fine-tune a pretrained model from the Transformers library. In TensorFlow, models can be directly trained using Keras and the fit method. In PyTorch, there is no generic training loop so the 🤗 Transformers library provides an API with the class Trainer to let you fine-tune or train a model from scratch easily. Then we will show you how to alternatively write the whole training loop in PyTorch.
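The "whole training loop in PyTorch" that the tutorial refers to boils down to a few repeated steps (zero gradients, forward, loss, backward, optimizer step). A minimal self-contained sketch on synthetic data, with a plain nn.Linear standing in for a pretrained model — this is the generic loop that the Trainer class wraps, not the Trainer API itself:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data standing in for a real dataset
X = torch.randn(64, 4)
y = X @ torch.tensor([1.0, -2.0, 0.5, 3.0]) + 0.1

model = nn.Linear(4, 1)               # stand-in for a pretrained model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):               # the loop Trainer/fit hide from you
    optimizer.zero_grad()             # clear accumulated gradients
    pred = model(X).squeeze(-1)       # forward pass
    loss = loss_fn(pred, y)
    loss.backward()                   # backprop
    optimizer.step()                  # parameter update
    losses.append(loss.item())
# loss should fall essentially to zero on this noise-free toy problem
```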
The Momo Challenge is a repackaged version of an older, nearly identical and largely debunked suicide game called the Blue Whale Game. In 2017 scary warnings circulated on social media asking parents, teachers, and police to beware of a hidden threat to children: a sinister online “game” that can lead to death.
Create a 200k corpus and start training the model. Started training at 9:00am
Pushshift will soon provide API endpoints to do audio-to-text transcription (at a combined aggregate rate of 10 minutes of audio per second). The upcoming API will also have convenience methods where you can provide a link to a YouTube video and…
Starting the run that creates synthetic entries for all the models – done
4:00 – 6:00 Arpita’s defense! Looks like she did a great job. It makes me wonder whether larger databases with features that indicate where data came from for multiple platforms could be better than separate smaller models for each platform. In the satellite anomaly detection systems I build, we typically train one model per vehicle. This work implies that it might be better to have one model for all vehicles, with domain knowledge about each vehicle included in the feature vector.
…when cities in China, Europe, and finally the United States descended into lockdown, there was no mass panic. There was fear, yes, plenty of it—but that fear did not lead to irrational, hysterical, or violent group behavior. Our fear did not lead to looting, pogroms, or unrest. The fearful of Wuhan did not rise up in rebellion against the Communist Party; even when Italian doctors began rationing medical equipment and supplies, the fearful of Milan did not loot stores or disrupt the medical system; the fearful of New York did not duel each other to the death over toilet paper rolls.
A real-world example of stampede vs. flock behavior?
There is also a more detailed exploration here. What I think is really important here is the idea that populations that would not normally be included in a stampede can be “pulled along” by the structure of the surrounding belief environment.
… the lower-left quadrant (counties which are deep blue politically but also have a low vaccination rate), since that would seem, on the surface, to go completely against my “Red/Blue = Vaccination Status” theme. What’s going on there? Shouldn’t these deep-blue counties have higher-than-average vaccination rates? … 62 of them are more than 40% Black (in fact, 55 of those are majority Black counties). Of the remaining 13 counties, 7 are majority Native American (over 80% of the population, in fact), while 1 in Texas (Zavala County) is “91.22% Hispanic or Latino of any race” according to Wikipedia.
Working on slide deck for a bit before I head home
Chat yesterday with Aaron about the proposal. We still don’t have Peter’s contributions. Maybe Loren can do it?
Had a thought about the communication without coordination concept. The simulations can be compressed using something like run-length encoding at some specified quantization level. The compressed models can be compared (maybe just as number of bytes?) to give an idea of how well they are in agreement. There should be some level of granularity at which the representations (and hence the underlying models) diverge. That should be an indication of how much, and what kind of, coordinating data is needed.
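A quick sketch of the idea, assuming the simulations produce scalar time series (all values and thresholds here are invented for illustration): quantize, run-length encode, and check whether the compressed representations still match at a given granularity.

```python
def quantize(series, step):
    """Snap each value to the nearest multiple of `step`."""
    return [round(x / step) * step for x in series]

def rle(series):
    """Run-length encode: list of (value, run_length) pairs."""
    runs = []
    for x in series:
        if runs and runs[-1][0] == x:
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            runs.append((x, 1))
    return runs

def agree(sim_a, sim_b, step):
    """Do two simulations produce the same compressed
    representation at this quantization level?"""
    return rle(quantize(sim_a, step)) == rle(quantize(sim_b, step))

# Two toy simulations that match coarsely but diverge at fine grain
sim1 = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
sim2 = [0.05, 0.12, 0.22, 1.04, 1.13, 1.18]
# agree(sim1, sim2, step=1.0)  -> coarse models match
# agree(sim1, sim2, step=0.01) -> finer granularity diverges
```

The granularity where `agree` flips from True to False is the candidate signal for how much coordinating data the two parties would need to exchange.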
We’re releasing an analysis showing that since 2012, the amount of compute used in the largest AI training runs has been increasing exponentially with a 3.4-month doubling time (by comparison, Moore’s Law had a 2-year doubling period). Since 2012, this metric has grown by more than 300,000x (a 2-year doubling period would yield only a 7x increase). Improvements in compute have been a key component of AI progress, so as long as this trend continues, it’s worth preparing for the implications of systems far outside today’s capabilities.
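The figures in that analysis are internally consistent, as a quick back-of-the-envelope check shows:

```python
import math

growth = 300_000           # reported total growth since 2012
doubling_months = 3.4      # reported doubling time

doublings = math.log2(growth)           # ~18.2 doublings
months = doublings * doubling_months    # ~62 months, i.e. ~5.2 years
# So 300,000x at a 3.4-month doubling time takes roughly five years,
# matching the 2012-onward window. Over that same span, Moore's-law
# pace (24-month doubling) gives only about 2**(62/24) ~ 6x, close
# to the 7x the post quotes.
```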
NIST outlines the approach in A Proposal for Identifying and Managing Bias in Artificial Intelligence (NIST Special Publication 1270), a new publication that forms part of the agency’s broader effort to support the development of trustworthy and responsible AI. NIST is accepting comments on the document until Sept. 10, 2021 (extended from the original deadline of Aug. 5, 2021), and the authors will use the public’s responses to help shape the agenda of several collaborative virtual events NIST will hold in coming months. This series of events is intended to engage the stakeholder community and allow them to provide feedback and recommendations for mitigating the risk of bias in AI.
The silos of political groupthink created by social media have turned out to be ideal settings for the germination and dissemination of extremist ideas and alternative realities. To date, the most significant and frightening cultic phenomenon to arise from social media is QAnon. According to some observers, the QAnon movement does not qualify as a proper cult, because it lacks a single charismatic leader. Donald Trump is a hero of the movement, but not its controller. “Q,” the online presence whose gnomic briefings—“Q drops”—form the basis of the QAnon mythology, is arguably a leader of sorts, but the army of “gurus” and “promoters” who decode, interpret, and embroider Q’s utterances have shown themselves perfectly capable of generating doctrine and inciting violence in the absence of Q’s directives. (Q has not posted anything since December, but the prophecies and conspiracies have continued to proliferate.) It’s possible that our traditional definitions of what constitutes a cult organization will have to adapt to the Internet age and a new model of crowdsourced cult.
Back up db
More discussion with Jarod
Laic proposal? Take the map article and chess paper and use them as a base. Done! That was too much work
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
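The "repeated sampling" result is reported via the pass@k metric. The Codex paper gives an unbiased estimator — the probability that at least one of k samples drawn from n total samples is correct, given that c of the n pass the unit tests — which can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n-c, k) / C(n, k), the chance that at least one of k
    samples (out of n generated, c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers (not from the paper): with 100 samples per
# problem and 30 passing, pass@1 is just the raw pass rate, while
# pass@10 is far higher -- why repeated sampling helps so much.
p1 = pass_at_k(100, 30, 1)    # 0.30
p10 = pass_at_k(100, 30, 10)  # ~0.98
```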
Today’s white evangelicals in the U.S.—along with many conservative white Catholics and mainline Protestants—imagine themselves to be the persecuted faithful, victims of state oppression in the mold of biblical apocalypses. While this might seem ludicrous to outsiders, it aptly captures their sense of the disorder of the last half century as they’ve been compelled to share cultural and political power with other groups. As it did centuries ago, apocalypse channels the persecuted group’s fear, focusing their resentment and properly directing their anger. Apocalypse’s crucial component for U.S. politics today is this extreme moral dualism, not the imminent End Times.
Sent a note with preliminary results to the team
Back up db
Put some text together for Jarod’s proposal – done
Ever since OpenAI released the weights and code for their CLIP model, various hackers, artists, researchers, and deep learning enthusiasts have figured out how to utilize CLIP as an effective “natural language steering wheel” for various generative models, allowing artists to create all sorts of interesting visual art merely by inputting some text – a caption, a poem, a lyric, a word – to one of these models.
So what is a prompt? A prompt is a piece of text inserted in the input examples, so that the original task can be formulated as a (masked) language modeling problem. For example, say we want to classify the sentiment of the movie review “No reason to watch”; we can append the prompt “It was” to the sentence, getting “No reason to watch. It was ____”. It is natural to expect the language model to assign a higher probability to “terrible” than to “great”. This piece reviews recent advances in prompting for large language models.
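The transformation being described — raw input, cloze template, label words — can be sketched in plain Python. The template and verbalizer below are illustrative stand-ins, not from any specific paper:

```python
TEMPLATE = "{review} It was {mask}."

# Verbalizer: label words whose masked-LM probabilities stand in
# for class scores (hypothetical choice of words)
VERBALIZER = {"positive": "great", "negative": "terrible"}

def to_prompt(review: str, mask_token: str = "[MASK]") -> str:
    """Wrap a raw input in the cloze template so a masked LM can
    fill in a label word at the mask position."""
    return TEMPLATE.format(review=review, mask=mask_token)

prompt = to_prompt("No reason to watch.")
# -> "No reason to watch. It was [MASK]."
```

A masked LM would then be asked to score each verbalizer word at the mask position, and the highest-probability label word decides the class.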
Hate speech is a critical problem on social media, which is often accused of enabling the spread of hatred and igniting violence. Hate speech detection requires overwhelming computing resources for online monitoring as well as thousands of human experts for daily screening of suspected posts or tweets. Recently, deep learning (DL)-based solutions have been proposed for hate speech detection, using modest-sized datasets of a few thousand sequences. While these methods perform well on the specific datasets, their ability to generalize to new hate speech sequences is limited. As a data-driven approach, DL is known to surpass other methods whenever training-set size and diversity are scaled up. Therefore, we first present a dataset of 1 million hate and non-hate sequences, produced by a deep generative model. We further utilize the generated data to train a well-studied DL detector, demonstrating significant performance improvements across five hate speech datasets.
The Zipp warranty worked! Dinged rim replaced for free!
Start adding initial text
4:00 Meeting with Michelle
Write code to do the database pulls to count stars and votes
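A minimal sketch of the star-count pull, shown against an in-memory SQLite database (the table_synth_review name is from the notes above; the column names are guesses at the local schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_synth_review (stars INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO table_synth_review VALUES (?, ?)",
    [(5, "great"), (5, "loved it"), (1, "awful"), (3, "ok")],
)

# Count reviews per star rating -- the shape of pull the rollup needs
star_counts = dict(conn.execute(
    "SELECT stars, COUNT(*) FROM table_synth_review GROUP BY stars"
).fetchall())
# star_counts -> {1: 1, 3: 1, 5: 2}
```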
Important to avoid NaNs in the dataframe, which xlsxwriter can’t handle:

import pandas as pd

for t in tag_list:
    ws = worksheet_dict[t]           # target worksheet for this tag
    df = pd.DataFrame(combined_dict[t])
    df = df.fillna(0)                # <------ THIS: xlsxwriter chokes on NaN
    stx.write_dataframe(ws, df, row=1, avg=True)
The idea of the Western construct of time as a source of neurosis came up yesterday. I found this, which kind of supports the idea. It also ties into the thing that we’re trying to work out with indigenous software practices, which might not have the same focus on scheduling and individual adherence to a schedule?
The goal of this paper is to introduce Phenomenology and the Cognitive Sciences’ thematic issue on disordered temporalities. The authors begin by discussing the main reason for the neglect of temporal experience in present-day psychiatric nosologies, namely, its reduction to clock time. Methodological challenges facing research on temporal experience include addressing the felt sense of time, its structure, and its pre-reflective aspects in the life-world setting. In the second part, the paper covers the contributions to the thematic issue concerning temporal experience in anxiety, depression, mania, addiction, post-traumatic stress disorder, autism, and in recovery from psychosis. The authors argue in favor of integrative and cross-disciplinary approaches. In conclusion, they present time as a significant aspect of human suffering.
The model finished training on Thursday, and I got the model putting values into table_synth_review, with an entry in the experiment table as well
Today, do ten 1,000 review runs and compare. Then compare to the actual data
Got some nice internally consistent runs!
Sentiment analyzer on actual data. Test whether predicted sentiment matches stars
Binary threshold accuracy?
Correlation analysis of confidence vs stars?
Does the sentiment analyzer work? Evaluate sentiment analyzer vs star ratings?
Does the ground truth sentiment distribution in the GPT data match the ground truth distribution in the actual data?
Does the predicted sentiment distribution in the GPT data match the ground truth distribution in the actual data?
Does predicted sentiment distribution in GPT data match predicted distribution in the data?
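One simple way to attack the first couple of these questions — binarize the stars and score predicted sentiment against them. The thresholds and the toy data are illustrative, not a fixed choice:

```python
def stars_to_sentiment(stars, pos_threshold=4):
    """Treat >= 4 stars as positive ground truth, <= 2 as negative;
    3-star reviews are ambiguous and dropped (assumed convention)."""
    if stars >= pos_threshold:
        return "positive"
    if stars <= 2:
        return "negative"
    return None

def binary_accuracy(star_ratings, predicted):
    """Fraction of non-ambiguous reviews where predicted sentiment
    matches the star-derived label (binary threshold accuracy)."""
    pairs = [(stars_to_sentiment(s), p)
             for s, p in zip(star_ratings, predicted)]
    scored = [(t, p) for t, p in pairs if t is not None]
    return sum(t == p for t, p in scored) / len(scored)

# Toy check: four usable reviews (the 3-star one is dropped),
# three of which match
acc = binary_accuracy(
    [5, 1, 3, 4, 2],
    ["positive", "negative", "positive", "negative", "negative"],
)
```

The distribution-matching questions below would reuse the same star-to-sentiment mapping, comparing label frequencies across the GPT and actual datasets instead of per-review matches.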
Here’s more data:
9:00 Sprint review (slides!) – done
Phase 2 proposal – updated. Now I need to fill in some initial working text