- During 2020, based on changes in the use of language on Twitter, three distinct phases were identified. The first was the refusal phase: people in the US refused to accept reality despite the increasing numbers of deaths in other countries. The second was the anger phase, which started after the announcement of the first death in the country: people’s fear translated into anger about the looming feeling that things were about to change. The third phase was the acceptance phase, which started after the authorities imposed physical-distancing measures: people found a “new normal” for their daily activities. During the year, as cases surged in waves, so did anger, re-emerging cyclically at each wave. These results suggest the concrete future possibility of embedding epidemic psychology derived from the use of language on social media into more traditional epidemiological models.
- We examine potential bias in Facebook’s 10-trillion cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter conclusions that are drawn from the Facebook dataset. Because of this 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data.
- SMD: Register for conference, book hotel and flight – Done. That took hours! There are few flights to Huntsville, and I had to use Delta, which doesn’t integrate with car rental. A lot of hotels were booked, too.
- LAIC: Finish writeup. Had a meeting with Aaron about what was needed, and then we worked on his SPIE presentation.
- IRAD: Coordinate with Rukan. We reworked the JSON file so that it is human readable. Rukan will use this as a basis for rendering the targets and platforms, as well as running the scenarios. Once that’s working, we’ll add ordnance.
- Set up finetuning folder. See if it works with the current setup and upgrade PyTorch, TF, and Transformers as needed. Could not get that to work, so I went back to the run_clm CI approach. I rebased the transformers project and found that there are now TF and PyTorch versions. I’m using PyTorch for this.
- Try training using the 6k corpus for fixed epochs.
- Built models for 1, 2, 4, 8, 16, and 32 epochs. You can see the formatting results improve until 16 epochs. I need to build spreadsheets that show the values and compare them to the ground truth to see if this is a better approximation. Then move on to larger corpora.
- 7:00 Meeting
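
For reference, the fixed-epoch sweep noted above (1, 2, 4, 8, 16, and 32 epochs on the 6k corpus) could be scripted roughly like this. This is only a sketch: the corpus file name `corpus_6k.txt`, the base model `gpt2`, and the `models/` output layout are placeholders I'm assuming for illustration, not the actual setup. The flags are the standard ones accepted by the Transformers `run_clm.py` example script.

```python
import shlex

# Epoch counts from the sweep described in the notes.
EPOCH_COUNTS = [1, 2, 4, 8, 16, 32]


def build_run_clm_command(epochs, train_file="corpus_6k.txt",
                          base_model="gpt2", output_root="models"):
    """Build the argument list for one fixed-epoch run_clm.py training run.

    train_file, base_model, and output_root are hypothetical placeholders.
    """
    return [
        "python", "run_clm.py",
        "--model_name_or_path", base_model,
        "--train_file", train_file,
        "--do_train",
        "--num_train_epochs", str(epochs),
        # One output folder per epoch count so the models can be compared later.
        "--output_dir", f"{output_root}/epochs_{epochs:02d}",
    ]


# One command per epoch count, in sweep order.
commands = [build_run_clm_command(n) for n in EPOCH_COUNTS]

if __name__ == "__main__":
    # Print the commands rather than running them, so they can be checked first.
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping each model in its own `epochs_NN` folder should make it easier to generate text from every model and line the values up against the ground truth in a spreadsheet.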