Phil 1.19.21

Feeling like the inauguration will go smoothly, but holding my breath anyway

Fixing disinformation won’t save us (Ethan Zuckerman)

  • There have been countless fact-checking and other efforts designed to rid social media of misinformation. They’re not going to work until the party and the major ideological amplifiers start explicitly renouncing these points of view. The signs are not good – while Fox News was willing to declare that Joe Biden had won the election, they are still providing platforms for people denying the facts of the victory. And a majority of Republican representatives voted to overturn a democratic election. Until there are consequences for perpetuating those falsehoods, don’t count on changes to the media to solve this problem

The end of the Trump-Fox feedback loop

  • Twitter’s January 8 decision to permanently suspend Trump’s account closed a rare window into a president’s mindset and policymaking that we are unlikely to ever see again. For the past four years, I documented the sources of the president’s grievances and obsessions, matching Trump’s tweets to the television segments he was watching. The president’s TV addiction inspired at least 1,375 tweets dating back to September 1, 2018. The vast majority came in response to his favorite programs on the pro-Trump Fox News and Fox Business networks. 

But if there ever was a coda for the Trump years, this has got to be it:

https://twitter.com/johnastoehr/status/1351607351275528192
https://jalammar.github.io/hidden-states/

Book

  • Start on diversity injection section
  • Research note: Examining false beliefs about voter fraud in the wake of the 2020 Presidential Election
    • The 2020 U.S. Presidential Election saw an unprecedented number of false claims alleging election fraud and arguing that Donald Trump was the actual winner of the election. Here we report a survey exploring belief in these false claims that was conducted three days after Biden was declared the winner. We find that a majority of Trump voters in our sample – particularly those who were more politically knowledgeable and more closely following election news – falsely believed that election fraud was widespread, and that Trump won the election. Thus, false beliefs about the election are not merely a fringe phenomenon. We also find that Trump conceding or losing his legal challenges would likely lead a majority of Trump voters to accept Biden’s victory as legitimate, although 40% said they would continue to view Biden as illegitimate regardless. Finally, we found that levels of partisan spite and endorsement of violence were equivalent between Trump and Biden voters.

MDS

  • Meeting with Aaron today to discuss next steps and how to combine with his project?
  • Still need to be able to access the VPN – more paperwork. Wheee!

GOES

  • Continue with the new TopController
  • Reading in and stepping through the script. Now I need to slew through the points and return DONE when the L2 distance to the target is within a threshold
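A minimal sketch of the slew-and-threshold idea, not the actual TopController code. The function names, the per-tick `max_step` limit, and the "TIMEOUT" status are assumptions made up for illustration; only the L2-distance threshold test comes from the note above.

```python
import math


def slew_toward(current, target, max_step):
    """Move `current` toward `target` by at most `max_step` (L2 distance)."""
    dist = math.dist(current, target)
    if dist <= max_step:
        return tuple(target)
    scale = max_step / dist
    return tuple(c + (t - c) * scale for c, t in zip(current, target))


def run_points(start, points, max_step, threshold, max_iters=10_000):
    """Slew through each point in order; report DONE once the last one is reached.

    `max_iters` is a hypothetical safety limit so a bad threshold cannot loop forever.
    """
    pos = tuple(start)
    for target in points:
        for _ in range(max_iters):
            # The "done" test: L2 distance to the target is within the threshold
            if math.dist(pos, target) <= threshold:
                break
            pos = slew_toward(pos, target, max_step)
        else:
            return "TIMEOUT", pos
    return "DONE", pos
```

Something like `run_points((0, 0, 0), script_points, max_step=1.0, threshold=0.01)` would then step through a loaded script one point at a time.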

GPT Agents

Phil 1.15.21

My Former Hasidic Community Still Supports Trump. I’m Not Surprised.

  • The thing is, contemporary Hasidic sects are designed for authoritarian control. Each Hasidic sect, from Bobov to Viznitz to Satmar to Skver, are run by what is called a “grand rabbi.” These rabbis are demanding patriarchs. They expect women to wear particular shades of stockings, men to dress identically, congregants to receive their blessings before making any personal life decisions, and they believe in a world where Hasids are the only Jews worth mentioning. Most importantly, Hasidic grand rabbis center their congregants’ worlds around themselves. They are populist leaders of miniature nations. Congregants have paintings and photographs of grand rabbis around their homes, sacrifice family time for tisches (Friday night gatherings) with their leaders, and would do anything to protect the power of their particular grand rabbi.

Book

  • Working on Making better Human-Computer Interfaces for Populations. Finished my first pass at The signature of dangerous misinformation section
  • 2:00 Meeting with Michelle

GOES

  • Decided to build out a sandbox ScriptReaderScratch RCS controller to work out the file loading and playback. Rather than AngleController, I’ll have a method that interpolates to the newest target. That should be enough to let me work out the details without breaking anything

GPT Agents

  • Start pulling off pages from paper
  • 3:30 meeting

Phil 1.14.21

Today’s webpage from the Washington Post:

https://viztales.files.wordpress.com/2021/01/image-9.png

Book

  • Working on Making better Human-Computer Interfaces for Populations

GOES

  • 10:00 meeting with Vadim, then start working on TopController and AngleController
  • Have AngleController return DONE when it’s sufficiently close to its goal
  • Have TopController load the script, send a command to AngleController to get to the start point of the maneuver
  • Upon DONE from AngleController, run the script at the specified speed. While the script is running, just re-issue the TO_ANGLE command.
  • Once the script is done, wait for the AngleController to reach the goal
  • Another option is to leave the AngleController running and handle the logic in TopController. Need to think about that.
  • Customer meeting at 2:00
  • Submitted spreadsheet to Eric about bandwidth needs
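The TopController/AngleController hand-off described above could be sketched as a small state machine. This is only an illustration under assumptions: `AngleControllerStub`, the state names, the single-angle model, and "one script entry per tick" (in place of a real playback speed) are all invented here and are not the real controllers.

```python
from enum import Enum, auto


class State(Enum):
    MOVE_TO_START = auto()
    RUN_SCRIPT = auto()
    WAIT_FOR_GOAL = auto()
    DONE = auto()


class AngleControllerStub:
    """Hypothetical stand-in: slews one angle toward the commanded goal."""

    def __init__(self, angle=0.0, rate=5.0, tolerance=0.1):
        self.angle, self.goal = angle, angle
        self.rate, self.tolerance = rate, tolerance

    def to_angle(self, goal):
        self.goal = goal

    def step(self):
        delta = self.goal - self.angle
        self.angle += max(-self.rate, min(self.rate, delta))
        # Return DONE when sufficiently close to the goal
        return "DONE" if abs(self.goal - self.angle) <= self.tolerance else "EXECUTING"


class TopControllerSketch:
    def __init__(self, script, angle_controller):
        self.script = script          # list of target angles, one per tick (assumption)
        self.ac = angle_controller
        self.index = 0
        self.state = State.MOVE_TO_START
        self.ac.to_angle(script[0])   # go to the start point of the maneuver

    def step(self):
        status = self.ac.step()
        if self.state is State.MOVE_TO_START and status == "DONE":
            self.state = State.RUN_SCRIPT
        elif self.state is State.RUN_SCRIPT:
            # While the script runs, just re-issue TO_ANGLE with the next target
            self.index += 1
            if self.index < len(self.script):
                self.ac.to_angle(self.script[self.index])
            else:
                self.state = State.WAIT_FOR_GOAL
        elif self.state is State.WAIT_FOR_GOAL and status == "DONE":
            self.state = State.DONE
        return self.state
```

The alternative mentioned above (leave AngleController running and keep all the logic in TopController) would mostly mean dropping the DONE checks from the stub and doing the distance test in `TopControllerSketch.step`.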

ML meeting at 3:30 – wound up being an evening of financial and cycling advice!

Phil 1.13.21

2021 continues to produce surprises: Several senior Republicans join impeachment push

I was looking at a McSweeney’s article (LEST WE FORGET THE HORRORS: A CATALOG OF TRUMP’S WORST CRUELTIES, COLLUSIONS, CORRUPTIONS, AND CRIMES). It references a pile of Trump Tweets that now look like this:

https://twitter.com/realDonaldTrump/status/411247268763676673

History has been deplatformed. Now what?

Good article on QAnon: QAnon reshaped Trump’s party and radicalized believers. The Capitol siege may just be the start.

Speaking of Twitter, this is a good thread on how to write and contest ML conference papers

https://twitter.com/bneyshabur/status/1349225436153319429

ParametricUMAP allows users to train a neural network to optimize the embedding, resulting in a direct neural net based mapping from source data to embedding. This allows for extremely fast inference (embedding of new data points), orders of magnitude faster than standard UMAP. It also provides facilities for an inverse transform, mapping from the embedding space to the original data space that is both far faster and more robust than that provided by standard UMAP. Since network architectures can be user provided this also allows for CNN and RNN based UMAP embeddings for images or sequences.

Book

  • Continue Making better Human-Computer Interfaces for Populations

GOES

  • Add in mapping to script reader, verify by adding legends

MDS

  • Status meeting. Maybe produce a spreadsheet to walk through that shows a time series of inputs and a calculation for each set? I think the inputs can be a column of six (for now?) variables as a set of rows, with the prediction calculations shown below that. Make a DataFrame and see what that looks like.

Phil 1.12.21

From Charlottesville to the Capitol: how rightwing impunity fueled the pro-Trump mob

  • The playbook for the Maga invasion of the nation’s Capitol building on Wednesday has been developing for years in plain sight, at far-right rallies in cities like Charlottesville, Berkeley and Portland, and then, in the past year, at state capitols across the country, where heavily armed white protesters have forced their way into legislative chambers to accuse politicians of tyranny and treason.

Here’s what seems to have happened with the Parler hack. The data may be available for research

Nice paper on training a model to generate synthetic data for better classification training: Reducing AI bias with Synthetic data. It uses Gretel’s gretel-synthetics library. It’s free to use during the beta period; not sure about after, or what the pricing will be. They are hiring, with about seven openings at the moment, so they are burning through someone’s money.

GPT Agents

  • Finish abstract submission – done
  • Make an Overleaf project for qualitative paper?

GOES

  • Finish up the ManeuverReader – done! Here’s the original, with some large number of points that is subsampled to 100 points and stored as a json file
  • Here’s a reconstructed version that uses 1/3 (33) steps through the file. You can see a little roughness, but with more points it’s indistinguishable from the original pulled off influxDB:
  • And here’s a snippet of the json file
{
  "title": "test",
  "speed_multiple": 1.0,
  "read_fmt": "%H:%M:%S",
  "mapping": {
    "GNC_AC_MOM_GEN_MOMBODY_X": "pitch",
    "GNC_AC_MOM_GEN_MOMBODY_Y": "roll",
    "GNC_AC_MOM_GEN_MOMBODY_Z": "yaw"
  },
  "duration": "00:29:56",
  "samples": [
    {
      "timestamp": "00:00:00",
      "values": [
        {
          "name": "GNC_AC_MOM_GEN_MOMBODY_X",
          "value": 63.68628693
        },
        {
          "name": "GNC_AC_MOM_GEN_MOMBODY_Y",
          "value": 1.657353401
        },
        {
          "name": "GNC_AC_MOM_GEN_MOMBODY_Z",
          "value": -3.304497004
        }
      ]
    },
  • start to integrate into TopController
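A rough sketch of reading a file in this format and reconstructing values between the stored samples. This is not the actual ManeuverReader: `load_maneuver`, `interpolate`, and the linear interpolation are assumptions, and the second sample in `EXAMPLE` is invented purely so the snippet runs (only the first sample comes from the file above).

```python
import json
from datetime import datetime

# First sample is from the snippet above; the second is made up for illustration.
EXAMPLE = """
{
  "title": "test", "speed_multiple": 1.0, "read_fmt": "%H:%M:%S",
  "mapping": {"GNC_AC_MOM_GEN_MOMBODY_X": "pitch"},
  "duration": "00:00:10",
  "samples": [
    {"timestamp": "00:00:00",
     "values": [{"name": "GNC_AC_MOM_GEN_MOMBODY_X", "value": 63.68628693}]},
    {"timestamp": "00:00:10",
     "values": [{"name": "GNC_AC_MOM_GEN_MOMBODY_X", "value": 64.68628693}]}
  ]
}
"""


def load_maneuver(text):
    """Parse a script into a list of (seconds_from_first_sample, {axis: value})."""
    doc = json.loads(text)
    fmt, mapping = doc["read_fmt"], doc["mapping"]
    samples, first = [], None
    for s in doc["samples"]:
        t = datetime.strptime(s["timestamp"], fmt)
        if first is None:
            first = t
        # Translate database names to sim names using the "mapping" block
        axes = {mapping[v["name"]]: v["value"] for v in s["values"]}
        samples.append(((t - first).total_seconds(), axes))
    return samples


def interpolate(samples, t, axis):
    """Linearly interpolate one axis at time t (seconds), clamping at both ends."""
    if t <= samples[0][0]:
        return samples[0][1][axis]
    for (t0, a0), (t1, a1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return a1[axis]
            frac = (t - t0) / (t1 - t0)
            return a0[axis] + frac * (a1[axis] - a0[axis])
    return samples[-1][1][axis]
```

Stepping through a third of the file, as in the reconstructed plot, would then be something like `load_maneuver(text)[::3]` before interpolating.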

Phil 1.11.21

Book – Not much, just jotting down notes

GPT Agents

  • Working on submitting

GOES

  • Working on script generator

MDS

  • Trying to find the right charge number
  • Made slide deck for today’s meeting and presented overview and next steps

Phil 1.8.21

GOES

  • Work on script generator and reader

Book

  • Working on Hierarchies, Networks, and Technology. New technologies may follow the same arc as writing and printing: an initial hierarchy that produces influence networks, which counter (to a degree) the more aggressive aspects of a dominance hierarchy
  • Meeting with Michelle

MDS

  • Discussion with Aaron about phase2
  • Wrote up thoughts and sent to Clay

Phil 1.7.21

https://twitter.com/andrewheiss/status/1347029129535889410

And just so we remember, the pandemic is not going well here. For comparison, the battle that took the most American lives was Antietam, where there were 3,675 fatalities if you count both sides.

Source: New York Times, 1.7.21

Need to look into replacing JetBrains

GOES

  • Slide deck for 2:00 meeting
  • 11:00 AI-ML meeting
  • 2:00 Sim discussion. We have until the end of March to come up with a compelling demo
  • More script generator. I need to write a method that searches through a Measurement list looking for the last value before a datetime
    • Need to map the names in the database to the desired name for the sim
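One possible way to do the "last value before a datetime" search is a binary search over the timestamps. This is a sketch only: the `Measurement` fields and `NAME_MAP` are assumed shapes (the one mapping entry shown is borrowed from the maneuver JSON elsewhere in these notes), and the list is assumed to be sorted by time.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Measurement:
    # Hypothetical shape; the real Measurement class may differ
    name: str
    time: datetime
    value: float


# Database name -> name used in the sim (illustrative entry only)
NAME_MAP = {"GNC_AC_MOM_GEN_MOMBODY_X": "pitch"}


def last_before(measurements, when):
    """Return the latest Measurement strictly before `when`, or None.

    Assumes `measurements` is already sorted by time.
    """
    times = [m.time for m in measurements]
    i = bisect_left(times, when)  # index of the first entry with time >= when
    return measurements[i - 1] if i > 0 else None


def sim_name(measurement):
    """Map a database name to the sim name, falling back to the original."""
    return NAME_MAP.get(measurement.name, measurement.name)
```

Building the `times` list on every call is O(n); if this gets called a lot, caching it alongside the Measurement list would keep each lookup at O(log n).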

MDS

  • Write up notes from meeting and distribute
  • Write up a couple of paragraphs for Clay

GPT Agents

  • More coding
  • ML group meeting

Book

  • More Hierarchies, Networks, and Technology

Phil 1.6.21

Georgia is looking promising! Maryland is trying to be more flexible in its vaccinations!

And then later in the afternoon, this happened:

https://twitter.com/igorbobic/status/1346906369232920576

MDS

  • 10:00 meeting with Aaron and Peter
    • Create pipeline for data (what does it look like?) and FOM evaluation (input/output)
    • Who creates the sim and generates the data?
    • Who feeds that into the FOM?
    • Can all this run locally?
    • Write up notes
  • Meeting with Clay
    • Write up two paragraphs on phase 2 thoughts

GOES

  • Slide deck for tomorrow’s meeting
  • More work on script generator

GPT Agents

  • Created a local version of the IJCAI paper project. Need to fit the format and then create the Overleaf project and share with Antonio
  • Do some coding, dammit

JuryRoom

  • 5:00 Huri Whakatau meeting

Book

Phil 1.5.21

Voting in Georgia today. I am pessimistic but hopeful about the outcome

GPT Agents

  • I’m not sure if the meeting is today at 3:30 or Friday at 4:00?
    • It was today. Continuing on trying to figure out the best way to understand the behavior of the model. One of the interesting findings for today was that if the data isn’t in the dataset, then the model will start generating tokens at the meta wrapper.
  • More coding

Book

  • Working on what’s become Hierarchies, Networks, and Technology, and I think I’m now happy with where it’s going. It makes sense to use as the end of the chapter as well
  • Made a cool figure:
https://viztales.files.wordpress.com/2021/01/democracies_and_technology.png

GOES

  • The Lambda box was cancelled. Sigh
  • 11:00 Meeting with Vadim
  • I’m going to start on a script-reading capability for TopController. I think it should be a JSON or XML file that contains the following elements:
    • Absolute or relative move
    • axis name
    • Target (HPR or XYZ)
    • Timestamp
    • Required accuracy
  • So a move could be a series of HPR coordinates that ‘play’. The first step is a MOVE command which includes the filename. The TopController opens the file (or fails and reports it), loads the move into memory and begins to step through it based on the timestamp. On reaching the end of the file and when the AngleController reports success/failure, the TopController reports DONE and is ready for the next MOVE
  • Downloading the yaw flip maneuver from influx:
https://viztales.files.wordpress.com/2021/01/image.png
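To make the proposed script elements concrete, here is one way the move file could look as JSON, with a round trip through a small dataclass. Everything here is a guess at a format, not a decision: the field names, the `"steps"` key, and the rule that a relative step is added to the previous target are all assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class MoveStep:
    relative: bool     # absolute or relative move
    axis: str          # axis name
    target: tuple      # HPR or XYZ triple
    timestamp: float   # seconds from the start of the move
    accuracy: float    # required accuracy before the step counts as reached


def dump_move(steps):
    """Serialize a list of MoveStep objects to a JSON string."""
    return json.dumps({"steps": [asdict(s) for s in steps]}, indent=2)


def load_move(text):
    """Parse the JSON back into MoveStep objects (targets restored to tuples)."""
    return [MoveStep(**{**d, "target": tuple(d["target"])})
            for d in json.loads(text)["steps"]]


def resolve_targets(steps, start):
    """Turn relative steps into absolute targets, in timestamp order."""
    current = tuple(start)
    resolved = []
    for s in sorted(steps, key=lambda step: step.timestamp):
        if s.relative:
            current = tuple(c + t for c, t in zip(current, s.target))
        else:
            current = tuple(s.target)
        resolved.append((s.timestamp, current))
    return resolved
```

A MOVE command would then carry just the filename, and the TopController would call something like `load_move(open(filename).read())`, reporting a failure if the file cannot be opened or parsed.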

Phil 1.4.21

Have to get my fingers used to typing a new date

Book:

  • Working on the section about displaying. I found Mike, the chimp that used the kerosene cans. There’s apparently a paper as well, so I put in a request
  • Loading data about democracies from here (ourworldindata.org/democracy) into my db for better queries and charts. I want to look at recent changes in authoritarian systems as social technologies have changed in the last couple of decades

GOES

  • 11:00 Meeting with Vadim
  • More sparring with Biruh?

MDS

  • Need some kind of kickoff with the technical folks?

GPT Agents

Phil 12.30.20

Last work day of the year.

Still looking at COVID deaths. Here’s what’s going on in a sample of countries as of today

https://public.flourish.studio/visualisation/4504138/

And here are the worst performing states over the duration of the epidemic. Georgia continues to be a mess. Those states at the bottom are coming up fast…

https://public.flourish.studio/visualisation/4812886/

Book

  • Working on importing and transcribing the debate. Since the original won’t upload, I pulled the video into Adobe Premiere and cut off the head and tail, then exported as an AVI. We’ll see how that works. Nope – it’s ENORMOUS! Trying other formats and getting progressively more annoyed. Aaaaand never got it to work. At least not today.
  • I did start editing the whole video down to just the displays

GPT Agents

  • Need to start coding. Going to talk to Stacey about that before I start.
  • Got some good advice and started.
  • As I’m coding, it looks like I’m making a nice set of tags for a training set. I wonder how small a set could be used to train something like BERT. Here’s an article:
  • interpreting GPT: the logit lens
    • Other work on interpreting transformer internals has focused mostly on what the attention is looking at. The logit lens focuses on what GPT “believes” after each step of processing, rather than how it updates that belief inside the step.

GOES

  • Sent a note to Biruh asking how the servers will handle interactive video. He said that I could keep the server at home. So he just hates workstations? Anyway, lots of back and forth. Not sure where it’s going.

Phil 12.29.20

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

  • Although pretrained language models can be fine-tuned to produce state-of-the-art results for a very wide range of language understanding tasks, the dynamics of this process are not well understood, especially in the low data regime. Why can we use relatively vanilla gradient descent algorithms (e.g., without strong regularization) to tune a model with hundreds of millions of parameters on datasets with only hundreds or thousands of labeled examples? In this paper, we argue that analyzing fine-tuning through the lens of intrinsic dimension provides us with empirical and theoretical intuitions to explain this remarkable phenomenon. We empirically show that common pre-trained models have a very low intrinsic dimension; in other words, there exists a low dimension reparameterization that is as effective for fine-tuning as the full parameter space. For example, by optimizing only 200 trainable parameters randomly projected back into the full space, we can tune a RoBERTa model to achieve 90% of the full parameter performance levels on MRPC. Furthermore, we empirically show that pre-training implicitly minimizes intrinsic dimension and, perhaps surprisingly, larger models tend to have lower intrinsic dimension after a fixed number of pre-training updates, at least in part explaining their extreme effectiveness. Lastly, we connect intrinsic dimensionality with low dimensional task representations and compression based generalization bounds to provide intrinsic-dimension-based generalization bounds that are independent of the full parameter count.

GPT Agents

  • Working on getting the data out of the database in a useful way, so I learned how to create a view that combines multiple rows:
create or replace view combined as
select distinct t_1.root_id, t_1.experiment_id, t_1.probe as 'probe', DATE_FORMAT(t_1.content, "%M, %Y") as 'date', t_2.content as 'text'
from table_output as t_1
inner join table_output as t_2
on t_1.root_id = t_2.root_id and t_1.tag = 'date' and t_2.tag = 'trimmed';
  • What’s nice about this is that I can now order results by date which gives a better way of looking through the data
  • Imported the query output spreadsheet into NVivo and flailed with the importer a bit. I think I need to create a script that iterates over all the probes and creates a spreadsheet for each. It also needs to split off the probe from the content. Maybe remove the links as well? I’m conflicted about that because linking is an important thing. Maybe produce two files?
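A sketch of the per-probe export idea, using an in-memory SQLite database as a stand-in so it runs anywhere. Assumptions: the rows are invented sample data, the table has only the columns the view needs, and the date is kept as the raw ISO string (SQLite has no DATE_FORMAT) so that ordering by it is chronological. The real database is MySQL, so treat this as an outline only.

```python
import csv
import io
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table table_output (root_id int, experiment_id int, probe text, tag text, content text);
-- Invented sample rows: one 'date' row and one 'trimmed' row per root_id
insert into table_output values
  (1, 1, 'probe A', 'date', '2020-03-01'),
  (1, 1, 'probe A', 'trimmed', 'first text'),
  (2, 1, 'probe B', 'date', '2020-01-01'),
  (2, 1, 'probe B', 'trimmed', 'second text');
-- Same self-join as the view above: pair each date row with its text row
create view combined as
select distinct t_1.root_id, t_1.experiment_id, t_1.probe as "probe",
       t_1.content as "date", t_2.content as "text"
from table_output as t_1
inner join table_output as t_2
  on t_1.root_id = t_2.root_id and t_1.tag = 'date' and t_2.tag = 'trimmed';
""")


def rows_by_probe(connection):
    """Group (date, text) rows by probe, so the probe is split off from the content."""
    groups = defaultdict(list)
    query = 'select "probe", "date", "text" from combined order by "date"'
    for probe, date, text in connection.execute(query):
        groups[probe].append((date, text))
    return dict(groups)


def to_csv(rows):
    """Render one probe's rows as CSV text, one spreadsheet per probe."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "text"])
    writer.writerows(rows)
    return buf.getvalue()
```

Producing two files per probe (links kept and links stripped) would just mean calling `to_csv` twice with a filtered copy of the rows.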

Book

  • Working on coding the Biden-Trump debate in NVivo. Had to buy a transcription license. Can’t upload the video???