Phil 4.2.21

I think milling = fashion

GPT Agents

  • Extract the sentiment into a workbook. It looks like it should be pretty easy:
select count(*) as count, probe from table_output where experiment_id = 89 and tag = 'raw' and sent_label = 'NEGATIVE' group by probe order by probe;
  • Continue on paper, upload to Overleaf, too
  • Meeting at 5:00
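
A minimal sketch of the extraction step, assuming the counts end up in a CSV that a workbook can open. The table and column names come from the query above; the in-memory SQLite database, the demo rows, and the helper names (negative_counts_by_probe, write_counts, demo_connection) are stand-ins for illustration, not the real setup.

```python
import csv
import io
import sqlite3


def negative_counts_by_probe(conn, experiment_id, tag="raw", label="NEGATIVE"):
    """Run the count-per-probe query with bound parameters."""
    sql = (
        "select count(*) as count, probe from table_output "
        "where experiment_id = ? and tag = ? and sent_label = ? "
        "group by probe order by probe"
    )
    return conn.execute(sql, (experiment_id, tag, label)).fetchall()


def write_counts(rows, handle):
    """Write (count, probe) rows as CSV to any file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["count", "probe"])
    writer.writerows(rows)


def demo_connection():
    """Hypothetical stand-in database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_output "
        "(experiment_id integer, tag text, probe text, sent_label text)"
    )
    conn.executemany(
        "insert into table_output values (?, ?, ?, ?)",
        [
            (89, "raw", "probe A", "NEGATIVE"),
            (89, "raw", "probe A", "NEGATIVE"),
            (89, "raw", "probe B", "NEGATIVE"),
            (89, "raw", "probe B", "POSITIVE"),
            (90, "raw", "probe A", "NEGATIVE"),
        ],
    )
    return conn


rows = negative_counts_by_probe(demo_connection(), 89)
buffer = io.StringIO()
write_counts(rows, buffer)
```

Swapping the label argument to 'POSITIVE' should give the matching table for the other sentiment class.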

SBIR

  • More work with Rukan? Need to figure out why a 5×256 is going in, but a 256×256 is coming out. We could try an attention layer first. Let’s see how things go?
  • Set up a time to discuss research with Orest
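
A quick shape check may help narrow down the 5×256 versus 256×256 question. The sketch below is not the model Rukan and I are working on. It is a bare NumPy version of single-head scaled dot-product self-attention with random, made-up weights, and it only shows that attention should hand back the same shape it was given. The transposed product at the end is one guess at how a 256×256 could appear, not a diagnosis.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention on a (seq, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq, seq)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                                  # (seq, d_model)


rng = np.random.default_rng(0)
seq_len, d_model = 5, 256
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

out = self_attention(x, w_q, w_k, w_v)
print(out.shape)      # shape of the attention output

# Hypothetical slip: multiplying in the wrong order mixes features, not positions
swapped = x.T @ x
print(swapped.shape)  # shape of the transposed product
```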

Book

  • 2:00 Meeting with Michelle

Phil 4.1.21

Exploring the effects of algorithm-driven news sources on political behavior and polarization

  • Do algorithm-driven news sources have different effects on political behavior when compared to non-algorithmic news sources? Media companies compete for our scarce time and attention; one way they do this is by leveraging algorithms to select the most appealing content for each user. While algorithm-driven sites are increasingly popular sources of information, we know very little about the effects of algorithmically determined news at the individual level. The objective of this paper is to define and measure the effects of algorithmically generated news. We begin by developing a taxonomy of news delivery by distinguishing between two types of algorithmically generated news, socially driven and user-driven, and contrasting these with non-algorithmic news. We follow with an exploratory analysis of the effects of these news delivery modes on political behavior, specifically political participation and polarization. Using two nationally representative surveys, one of young adults and one of the general population, we find that getting news from sites that use socially driven or user-driven algorithms to generate content corresponds with higher levels of political participation, but that getting news from non-algorithmic sources does not. We also find that neither non-algorithmic nor algorithmically determined news contribute to higher levels of partisan polarization. This research helps identify important variation in the consequences of news consumption contingent on the mode of delivery.

GPT Agents

  • Finished POS tokenizing terms
  • Started sentiment (POS/NEG) on terms – done
  • Stubbed out the POS and sentiment for the token full_string – done
  • Working on the paper – progress

SBIR

  • 2:00 Standup
  • 2:00 VDI Ubuntu

Phil 3.31.21

Cool thing for the day!

https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=6979

Cognitive networks identify the content of English and Italian popular posts about COVID-19 vaccines: Anticipation, logistics, conspiracy and loss of trust

  • Monitoring social discourse about COVID-19 vaccines is key to understanding how large populations perceive vaccination campaigns. We focus on 4765 unique popular tweets in English or Italian about COVID-19 vaccines between 12/2020 and 03/2021. One popular English tweet was liked up to 495,000 times, stressing how popular tweets affected cognitively massive populations. We investigate both text and multimedia in tweets, building a knowledge graph of syntactic/semantic associations in messages including visual features and indicating how online users framed social discourse mostly around the logistics of vaccine distribution. The English semantic frame of “vaccine” was highly polarised between trust/anticipation (towards the vaccine as a scientific asset saving lives) and anger/sadness (mentioning critical issues with dose administering). Semantic associations with “vaccine,” “hoax” and conspiratorial jargon indicated the persistence of conspiracy theories and vaccines in massively read English posts (absent in Italian messages). The image analysis found that popular tweets with images of people wearing face masks used language lacking the trust and joy found in tweets showing people with no masks, indicating a negative affect attributed to face covering in social discourse. A behavioural analysis revealed a tendency for users to share content eliciting joy, sadness and disgust and to like less sad messages, highlighting an interplay between emotions and content diffusion beyond sentiment. With the AstraZeneca vaccine being suspended in mid March 2021, “Astrazeneca” was associated with trustful language driven by experts, but popular Italian tweets framed “vaccine” by crucially replacing earlier levels of trust with deep sadness. Our results stress how cognitive networks and innovative multimedia processing open new ways for reconstructing online perceptions about vaccines and trust.

GPT-3 (actually GPT-Neo) is available on Huggingface: huggingface.co/EleutherAI/gpt-neo-1.3B

https://twitter.com/huggingface/status/1377273424641466370

GPT Agents

python run_clm.py \
    --model_name_or_path gpt2 \
    --train_file path_to_train_file \
    --validation_file path_to_validation_file \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
  • There is also an API that gives you more control described here.
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-large-uncased")

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total # of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

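# train_dataset and test_dataset are not defined in this snippet; they are
# assumed to be tokenized datasets prepared elsewhere
trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=test_dataset            # evaluation dataset
)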
  • Got tired of recalculating parts-of-speech, so I added a field to table_output for that and sentiment. Currently reprocessing all the tables from Fauci/Trump forward.
  • Update the Overleaf doc
  • Figuring out what to do with the chess paper with Antonio
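
A rough sketch of the caching idea: add the annotation columns once, then fill in only the rows that are still empty. The sent_label column and the table_output name appear in these notes. The pos, id, and content column names, the SQLite backend, and the two tagger callables are assumptions for illustration.

```python
import sqlite3


def add_annotation_columns(conn):
    """Add pos and sent_label columns to table_output if they are missing."""
    existing = {row[1] for row in conn.execute("pragma table_info(table_output)")}
    for column in ("pos", "sent_label"):
        if column not in existing:
            conn.execute(f"alter table table_output add column {column} text")


def fill_missing(conn, tag_pos, tag_sentiment):
    """Annotate only rows that have no cached values yet; return how many were done."""
    rows = conn.execute(
        "select id, content from table_output "
        "where pos is null or sent_label is null"
    ).fetchall()
    for row_id, content in rows:
        conn.execute(
            "update table_output set pos = ?, sent_label = ? where id = ?",
            (tag_pos(content), tag_sentiment(content), row_id),
        )
    conn.commit()
    return len(rows)
```

Calling fill_missing a second time should touch nothing, which is the point: the expensive tagging runs once per row.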

SBIR

  • MDA Meeting at 10:00

Phil 3.30.21

GPT Agents

import flair
from flair.models import TextClassifier
flair_sentiment = TextClassifier.load('en-sentiment')
text="Avengers: Infinity War is a giant battle for which directors Anthony and Joe Russo have given us touches of JRR Tolkien’s Return of the King and JK Rowling’s Harry Potter and the Deathly Hallows. The film delivers the sugar-rush of spectacle and some very amusing one-liners."
sentence=flair.data.Sentence(text)
flair_sentiment.predict(sentence)
total_sentiment = sentence.labels
print(total_sentiment)

Output:
[POSITIVE (0.9994151592254639)]

SBIR

  • Sprint planning
  • Trying to get a charge number for the RFI response – done
  • Finished the response

Phil 3.29.21

It’s the end of the month, so it’s time for these two charts again. I think we’re seeing the vaccines starting to have an effect? At least Switzerland seems to have gotten its second wave under control. Italy though…

https://public.flourish.studio/visualisation/4504138/

And here’s the USA. Georgia is over two times worse than the UK. Think about that.

https://public.flourish.studio/visualisation/4303726/

SBIR

  • Sprint review
  • Proposal. Boy, was that interesting. We had one vague paragraph to go on. I fed that into the GPT playground along with some additional text to structure the response, and it damn near wrote the whole thing, pulling latent knowledge out of the model. I’ll do a more detailed writeup later.

GPT-Agents

  • Working on say-mask token analysis code. Done!

Phil 3.26.21

Vaccine today (hopefully)! Here’s Maryland one year ago:

And here we are today:

What a terrible year.

GPT Agents

  • Rolling in Sim’s suggestions
  • Heatmaps by row, column, or matrix are done. Need to put together a base class for a lot of this
  • Working out the summaries
  • 3:30 Meeting, got some good prompts for masks. Running them now
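
One reading of "by row, column, or matrix" is the scope over which counts are scaled before coloring. The sketch below assumes that reading and a plain list-of-lists count table; the function name and the max-based scaling are illustrative choices, not necessarily what the heatmap code does.

```python
def normalize(counts, mode="matrix"):
    """Scale a list-of-lists count table to [0, 1] by row, column, or whole matrix."""
    if mode == "row":
        # Each row is divided by its own maximum
        return [[v / max(row) if max(row) else 0.0 for v in row] for row in counts]
    if mode == "column":
        # Each column is divided by its own maximum
        col_max = [max(col) for col in zip(*counts)]
        return [[v / m if m else 0.0 for v, m in zip(row, col_max)] for row in counts]
    if mode == "matrix":
        # Everything is divided by the single largest value
        top = max(max(row) for row in counts)
        return [[v / top if top else 0.0 for v in row] for row in counts]
    raise ValueError(f"unknown mode: {mode}")
```

Row scaling highlights which token dominates within each probe, column scaling highlights which probe dominates for each token, and matrix scaling keeps absolute differences visible.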

Phil 3.25.21

Call from Tim after 3:30?

GPT Agents

  • Got some very polarized reviews back from the IJCAI. Need to look for any factual errors
  • Getting some really nice results from Sim’s table idea. I’ve run terms and noun tables, and now I’m also extracting all tweets for context
Clearly three populations here
  • I have 16 hours to charge to the paper. Moving my content to Overleaf
  • Correlate by experiment (model): rows are probes (explicit probe list), columns are words (explicit token list)
  • 4:30 Meeting with Sim. Changing heatmaps to columns, and doing table compares across models

SBIR/ONR

  • 9:30 – 11:00 More work with Rukan
  • 1:00 Standup

GOES

  • 2:00 meeting. Presentation. Went ok
  • 3:00 AI/ML group. Presentation of the same material. Much better reception

Phil 3.24.21

Got a text from Maryland asking if I wanted a shot, and to reply “Y” to set up an appt. I was in the middle of brushing my teeth, so I waited a few minutes. By that time, the slot had been filled. From now on, respond immediately!

Pay bills!

GPT Agents

  • Find a good source that explains “grounding” in NLP
  • Back up DB
  • Create Ecco spreadsheets
  • Start creating the probe/model, term table. Got it done for terms and nouns. Do a second table that is percentages
  • Need to do ranked tokens next

GOES

  • 2:00 Meeting. Didn’t wind up presenting. Tomorrow
  • Got 16 hours for writing

JuryRoom

  • Had a good chat with Jarod. UW is still working on the adjunct thing

Phil 3.23.21

Podcast Trailer: Too Lazy to Read the Paper

  • The setup is a video call where the author explains a paper to me. We can use screen-sharing, for figures, etc. We’ll record the call and post to YouTube. Possible participants are authors of a paper in network science or data science.

GPT-Agents

  • Ranking is still running. I really should have checked the amount of data I was generating, but now I have sunk costs!
  • 3:00 Meeting today. I’d like to add something about qualitative research to the discussion section
    • Create a new set of spreadsheets where all models are compared. Probe is the sheet, the model is the column, and the terms are the rows. Display as heatmap
    • Also, look at ways of doing this:
https://viztales.files.wordpress.com/2021/03/image-12.png
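
The sheet layout described above (probe is the sheet, model is the column, terms are the rows) can be sketched as a pivot over flat records. The (probe, model, term, count) record shape and the function names are assumptions for illustration; writing the rows out to an actual spreadsheet is left out.

```python
from collections import defaultdict


def pivot_by_probe(records):
    """Group (probe, model, term, count) records as sheets[probe][term][model]."""
    sheets = defaultdict(lambda: defaultdict(dict))
    for probe, model, term, count in records:
        sheets[probe][term][model] = count
    return sheets


def sheet_rows(sheet, models):
    """Turn one probe's sheet into a header row plus one row per term."""
    header = ["term"] + list(models)
    body = [
        [term] + [sheet[term].get(model, 0) for model in models]
        for term in sorted(sheet)
    ]
    return [header] + body
```

A term that a model never produced shows up as 0, so every sheet stays rectangular and can be fed straight into a heatmap.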

SBIR/ONR

  • Keep working on slide deck – done!
  • Added a bunch of generators to the data directory
  • Ping Rukan around 10:00 to start figuring out how to assemble the Transformer. I want to try assembling a one-to-many and a many-to-one set of densely connected layers of arbitrary dimensionality. Started building a stripped-down MLP
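
A minimal NumPy sketch of what a stripped-down MLP with arbitrary layer sizes might look like, including a one-to-many and a many-to-one stack. This is a forward pass only, with made-up layer widths, ReLU between layers, and He-style random initialization as assumptions. It is not the code Rukan and I are building.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Dense forward pass with ReLU on every layer except the last."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


one_to_many = init_mlp([1, 16, 64])   # 1 input feature fanned out to 64
many_to_one = init_mlp([64, 16, 1])   # 64 features collapsed back to 1

batch = np.ones((8, 1))
wide = forward(one_to_many, batch)
narrow = forward(many_to_one, wide)
```

Because the layer list is just a sequence of widths, the same two functions cover any dimensionality without code changes.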

Phil 3.22.21

EleutherAI is proud to release two pretrained GPT-Neo models trained on The Pile; the weights and configs can be freely downloaded from the-eye.eu.

3:30 Huggingface meeting

Pay bills

Send the RV back to the shop. Again. Create a checklist:

  • When disconnected from shore power, please verify:
    • Lights come on
    • Generator starts
    • Refrigerator light comes on
    • Microwave runs
    • All status panels are functioning
    • Water pressure pump runs

GOES

  • Check out and verify Vadim’s code works
  • 2:00 Meeting

GPT Agents

  • Finished terms and nouns over the weekend and started rank runs

SBIR/ONR

  • Some good content: The uncontrollability of Artificial Intelligence
    • Explicit control – AI immediately stops the car, even in the middle of the highway because it interprets demands literally. This is what we have today with assistants such as SIRI and other narrow AIs.
    • Implicit control – AI attempts to comply safely by stopping the car at the first safe opportunity, perhaps on the shoulder of the road. This AI has some common sense, but still tries to follow commands.
    • Aligned control – AI understands that the human is probably looking for an opportunity to use a restroom and pulls over to the first rest stop. This AI relies on its model of the human to understand the intentions behind the command.
    • Delegated control – AI does not wait for the human to issue any commands. Instead, it stops the car at the gym because it believes the human can benefit from a workout. This is a superintelligent and human-friendly system which knows how to make the human happy and to keep them safe better than the human themselves. This AI is in control.

Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

  • Machine learning methods offer great promise for fast and accurate detection and prognostication of coronavirus disease 2019 (COVID-19) from standard-of-care chest radiographs (CXR) and chest computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we consider all published papers and preprints, for the period from 1 January 2020 to 3 October 2020, which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. All manuscripts uploaded to bioRxiv, medRxiv and arXiv along with all entries in EMBASE and MEDLINE in this timeframe are considered. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 62 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher-quality model development and well-documented manuscripts.
    • Many papers gave little attention to establishing the original source of the images
    • All proposed models suffer from a high or unclear risk of bias in at least one domain
    • We advise caution over the use of public repositories, which can lead to high risks of bias due to source issues and Frankenstein datasets as discussed above
    • [Researchers] should aim to match demographics across cohorts, an often neglected but important potential source of bias; this can be impossible with public datasets that do not include demographic information
    • Researchers should be aware that algorithms might associate more severe disease not with CXR imaging features, but the view that has been used to acquire that CXR. For example, for patients that are sick and immobile, an anteroposterior CXR view is used for practicality rather than the standard posteroanterior CXR projection
    • We emphasize the importance of using a well-curated external validation dataset of appropriate size to assess generalizability
    • Calibration statistics should be calculated for the developed models to inform predictive error and decision curve analysis
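
For my own reference, a small sketch of two common calibration checks for a binary classifier: the Brier score and a binned reliability table. These are standard statistics chosen by me as examples. The review does not prescribe these specific ones, and the function names and bin count are assumptions.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def calibration_bins(probs, labels, n_bins=5):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_p, observed, len(members)))
    return table
```

In a well-calibrated model the first two values in each bin should be close. Large gaps mean the predicted probabilities cannot be read as risks.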

Phil 3.19.21

GPT Agents

  • Working on SocialSens2021 paper. Added some more references and figures. At 4 pages.
  • 3:30 meeting

Book

  • 2:00 Meeting with Michelle

GOES

  • 11:00 Meeting with Vadim

SBIR/ONR

  • Working on slides

Phil 3.18.21

Taxes!

GPT-Agents

  • I have a lot of results. Now I need to put some preliminary-style text into the doc

GOES

  • Get the sim to generate a pile of data. Done! And it looks good!

SBIR/ONR

  • 9:30 Meeting with Aaron – got good guidance
  • 1:00 IR&D Stand-up
  • 1:30 Meeting with Rukan – going to hand off the initial Transformer model creation and evaluation. Done. Created a spreadsheet with a desired use case
  • 4:30 Meeting with Orest. Went well? I have funding through the summer at 100%. After that

Phil 3.17.21

Shifting attention to accuracy can reduce misinformation online

  • In recent years, there has been a great deal of concern about the proliferation of false and misleading news on social media [1–4]. Academics and practitioners alike have asked why people share such misinformation, and sought solutions to reduce the sharing of misinformation [5–7]. Here, we attempt to address both of these questions. First, we find that the veracity of headlines has little effect on sharing intentions, despite having a large effect on judgments of accuracy. This dissociation suggests that sharing does not necessarily indicate belief. Nonetheless, most participants say it is important to share only accurate news. To shed light on this apparent contradiction, we carried out four survey experiments and a field experiment on Twitter; the results show that subtly shifting attention to accuracy increases the quality of news that people subsequently share. Together with additional computational analyses, these findings indicate that people often share misinformation because their attention is focused on factors other than accuracy, and therefore they fail to implement a strongly held preference for accurate sharing. Our results challenge the popular claim that people value partisanship over accuracy [8, 9], and provide evidence for scalable attention-based interventions that social media platforms could easily implement to counter misinformation online.

GPT Agents

  • Ranking is still running
  • Worked on the workshop paper. Added in a modified version of the intro from the chess paper that now uses the GPT-3

ONR

  • Working on literature

SBIR

  • 10:00 Meeting

GOES

  • 2:00 Meeting
  • Turns out that we still have to do a demo. I need to create some data to show what that would look like. Set up a meeting with Vadim for Friday to make sure all the new code is working
  • Generated all the scripts – about 700! Tomorrow I’ll run the “sim” and generate training values

Phil 3.16.21

GPT Agents

  • I think I know how I want to structure the paper
    • Intro – discuss Tay, and how machine learning incorporates human input and reflects it back. This means that we have created ‘oracles’ that we can ask about the populations that contributed to their knowledge. In this type of computational sociology, finding and understanding the biases in these populations is an important part of the research
    • Introduce finetuned language models. Start with the chess model, and show how we can see the rank of piece terms rise and fall over the course of a sentence
    • Methods/results – describe the process of extracting chinavirus and sars-cov-2 as potential markers of different populations. Then prompts and runs to see the central terms that the models use. Show the stats. Then using the most popular terms from each model, run Ecco trajectories to show the rank behavior of these terms
    • Discussion. The possibilities of “interactive snapshots” of a population’s online behavior. The ongoing difficulty in prompt creation. Potential of maps?
    • Created the template
  • Note – Create Dr. Fauci and Donald Trump prompts – done!
  • Finished the noun finding, now running the ranks

SBIR

  • Project planning
  • Working on the ONR slides task

Phil 3.15.21 (Ides of March)

GPT Agents

  • Worked on getting useful text to look at out of the models. Using flair to scan for POS. That way I can grab the first noun that occurs, which makes for less text to look through and is more useful than just looking at the first word. I think that this will also be the approach that I’ll use to pull data out of the GPT-3 for maps.
  • Finished training the COVID model, and committed to VCS
  • Got some results for the first term. Going to re-run for some number of terms next. Also played around with the resulting spreadsheets a bit to look for patterns
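
The first-noun idea can be sketched independently of the tagger. The code below assumes the tagger output has already been converted to (word, tag) pairs and that nouns carry Penn Treebank style tags starting with "NN". Both are assumptions, and the function names are made up for illustration.

```python
def first_nouns(tagged_tokens, n=1):
    """Return up to n words whose tag marks them as nouns, in order of appearance."""
    nouns = []
    for word, tag in tagged_tokens:
        if tag.startswith("NN"):
            nouns.append(word)
            if len(nouns) == n:
                break
    return nouns


def first_noun(tagged_tokens):
    """Return the first noun, or None if the text has no noun."""
    found = first_nouns(tagged_tokens, n=1)
    return found[0] if found else None
```

Raising n covers the "some number of terms" re-run without changing the rest of the pipeline.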

SBIR

  • Updating my drivers, verifying that TF still works, and upgrading to PT 1.8
    • Drivers are all updated as per here
    • Updated TF to 2.4.1 and everything still works
    • Trying to install pytorch 1.8, which wants CUDA 11.1. Going to try it with 11.0 first