Category Archives: Phil

Phil 11.16.2021

Registered for graduation! Done!

Set up physical appt

JuryRoom

  • Continuing to run Andreea’s probes
  • Responded to Jarod’s email on large transformer language models
  • Meeting with Andreea today

SBIRs

  • 9:15 standup – done
  • Working on adding connections between topics. TODO: add self.connect_topics(self.selected_topic, self.selected_seed_topic) # Verify this works!
    • Added connection_set:Set to MapTopic, but no implementation yet
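A minimal sketch of what that could look like, using the connection_set:Set and connect_topics names from the notes above — the actual implementation doesn't exist yet, so everything here beyond those names is a guess:

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class MapTopic:
    name: str
    # connection_set holds the names of topics this topic links to
    connection_set: Set[str] = field(default_factory=set)

def connect_topics(topic: MapTopic, seed_topic: MapTopic) -> None:
    """Record a bidirectional connection between two topics."""
    topic.connection_set.add(seed_topic.name)
    seed_topic.connection_set.add(topic.name)

# usage
t1 = MapTopic("navigation")
t2 = MapTopic("terrain")
connect_topics(t1, t2)
```

Making the connection bidirectional is a choice; if the topic map is a directed graph, drop the second add().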

GPT Agents

  • Send draft to Jimmy and Shimei – done

Book

  • Evaluate proposal and send in?

Phil 11.15.2021

SBIRs

  • Worked most of the weekend. I have a first pass on the methods and results section, some notes on discussion, and still need to start the conclusions. Also, fold the text into the white paper – DONE!
  • Meetings with Aaron?

GPT Agents

  • Got LIWC data from Shimei
  • Need to run probes for Andreea – running

Phil 11.11.2021

Armistice Day is commemorated every year on 11 November to mark the armistice signed between the Allies of World War I and Germany at Compiègne, France, at 5:45 am for the cessation of hostilities on the Western Front of World War I, which took effect at eleven in the morning—the “eleventh hour of the eleventh day of the eleventh month” of 1918. (Wikipedia)

Dynamics of online hate and misinformation

  • Online debates are often characterised by extreme polarisation and heated discussions among users. The presence of hate speech online is becoming increasingly problematic, making necessary the development of appropriate countermeasures. In this work, we perform hate speech detection on a corpus of more than one million comments on YouTube videos through a machine learning model, trained and fine-tuned on a large set of hand-annotated data. Our analysis shows that there is no evidence of the presence of “pure haters”, meant as active users posting exclusively hateful comments. Moreover, coherently with the echo chamber hypothesis, we find that users skewed towards one of the two categories of video channels (questionable, reliable) are more prone to use inappropriate, violent, or hateful language within their opponents’ community. Interestingly, users loyal to reliable sources use on average a more toxic language than their counterpart. Finally, we find that the overall toxicity of the discussion increases with its length, measured both in terms of the number of comments and time. Our results show that, coherently with Godwin’s law, online debates tend to degenerate towards increasingly toxic exchanges of views.

GPT Agents

  • Sent data off to Shimei to run through LIWC
  • Good meeting yesterday
  • Start writing some outline text

SBIRs

  • Write!
  • Try running the script text through the text matcher to see which nodes it goes to. That can be used in the results section. Also add a “recalculate” button to the text compare popup?

Phil 11.10.2021

I remember years ago, it must have been sometime in the ’90s, seeing billboards for ISPs on the 101 heading from SFO to San Francisco. This feels like that:

GPT Agents

  • Add KL-divergence and Total Variation Distance to analysis.
    • Need to normalize everything, then compare the normalized versions in a new spreadsheet. Don’t forget the offset and scalar! (x*scalar + offset)
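A quick sketch of that pipeline — apply the (x*scalar + offset) correction, normalize so each histogram sums to 1, then compute KL divergence and total variation distance. The star counts below are made-up illustrative numbers, not real Yelp data:

```python
import math

def normalize(counts, scalar=1.0, offset=0.0):
    """Apply the linear correction (x*scalar + offset), then scale to sum to 1."""
    adjusted = [x * scalar + offset for x in counts]
    total = sum(adjusted)
    return [x / total for x in adjusted]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against zero-count bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def total_variation(p, q):
    """Half the L1 distance between the two distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# star-count histograms (1-5 stars): model output vs. ground truth (illustrative)
model = normalize([120, 80, 60, 200, 540])
truth = normalize([400, 180, 120, 150, 150])
print(kl_divergence(model, truth), total_variation(model, truth))
```

Note that KL divergence is asymmetric — KL(model‖truth) and KL(truth‖model) differ — while total variation is symmetric and bounded by [0, 1], which can make it easier to eyeball in a spreadsheet.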

SBIR(s)

  • Start writing methods section

Book

  • See what else needs to be done for the Oxford proposal and send it off by the end of the week?

Phil 11.9.2021

https://twitter.com/MattGrossmann/status/1457789601980948480

In a related thread:

https://twitter.com/dannybarefoot/status/1457784428462153730

Mapping Affinities: Democratizing Data Visualization (book)

  • Nowadays, many of our actions are transformed into digital information, which we can use to draw diagrams that describe complex operations, such as those of institutions. This book introduces us to the reading of complex systems through the concept of affinity: the alchemy that brings people together and makes them creative and productive.
  • Affinity mapping is a data visualization method that allows us to observe the dynamics of an organization subdivided into complex systems: institutions, universities, governments, etc. It is a graphical tool based on the collaboration variable. Mapping Affinities is, according to the author, an instrument for deciphering complex organizations and improving them. By inserting individuals on these maps, it is also a way of helping them to understand how to evolve in life within an institution. The book tackles this problem with a case study concerning the Federal Polytechnic School of Lausanne. Data from the actions of researchers at the Lausanne institution are brought together and transformed into an innovative and attractive map.
  • Stored as an epub in gdrive Books

GPT Agents

  • Generate spreadsheets from single stars – done
  • Add KL-divergence and Total Variation Distance to analysis.
    • Created query for ground truth by vegetarian options by stars and saved to spreadsheets
    • I need to normalize everything, then compare the normalized versions in a new spreadsheet. Don’t forget the offset and scalar! (x*scalar + offset)

SBIR(s)

  • Stories for next sprint – done

Book

  • See what else needs to be done for the Oxford proposal and send it off by the end of the week?

Phil 11.8.2021

This looks like a great corpus to compare text characteristics of these two groups:

Book

  • Why we’re polarized review -done
  • Liars – done

GPT Agents

  • Generate spreadsheets from single stars
  • Add KL-divergence and Total Variation Distance to analysis

SBIRs

  • 9:00 Sprint demos. Slides! Done!
  • 10:00? 2:00? LAIC demo Done!
  • Write stories for next sprint (basically writing and tweaking?)

Phil 11.5.2021

GPT Agents

  • Creating the 4 and 5 star models currently. Done!
  • Run each through the “vegetarian” options. I’m really curious how LIWC will look at the outputs of the models with relation to each other, and to the ground truth. Also get the counts of the occurrences of each prompt in the GT by star rating. My guess is that it won’t show up in some of the cases, which sets up the Twitter section really well.
  • 4:15 Meeting

SBIRs

  • Fix duplicate entries in the DB topic file – done
  • Back up db – done
  • Create superclass that has most of the parts and then subclass the various implementations (Full, BuildView, ViewScript) – done
  • Work with Aaron on the stories/maps? Also, what is our plan for the paper? In process
  • Ping Antonio? Next week, after the demo

Book

  • Why we’re polarized review

Phil 11.4.2021

WordPress has some serious lag. Need to back this up

GPT Agents

  • Delayed meeting until Friday. That should give me time to get the balanced data working and compare baseline models to the baseline data (American)
  • And the balanced data still isn’t working. I think that there are more paths to good reviews, so the GPT, even when fed balanced data, generates unbalanced results. Training up single-star models to verify this
  • Also, write a first pass on the introduction that uses the vegetarian Yelp as an example, and then set up to explain the method
  • Do I still need to train x-star models?

SBIRs

  • 9:15 Standup
  • 11:00 LAIC
  • Get the script running for the current map to show at the meeting today – done!
Progress!

Book

  • Review The Revolt of The Public

Phil 11.3.2021

…clearly Biden was a net drag on McAuliffe. Overall, Virginians disapproved of Biden’s handling of the presidency by a 10-point margin, with nearly half saying they “strongly disapprove” — double the percentage who strongly approved. Nearly 3 in 10 Virginia voters said their vote was meant to express opposition to Biden, network exit polls found, compared to the 2 in 10 who said their vote was to express support for Biden. The economy was by far the most important issue driving Virginia voters, and people who put the economy at the top of their list favored Youngkin by a dozen percentage points. (Washington Post)

I just found this: https://github.com/google-research/tiny-differentiable-simulator
It appears to be an NN-enhanced physics sim: “TDS can run thousands of simulations in parallel on a single RTX 2080 CUDA GPU at 50 frames per second.”
Here are the relevant papers:

  • “NeuralSim: Augmenting Differentiable Simulators with Neural Networks”, Eric Heiden, David Millard, Erwin Coumans, Yizhou Sheng, Gaurav S. Sukhatme. PDF on Arxiv
  • “Augmenting Differentiable Simulators with Neural Networks to Close the Sim2Real Gap”, RSS 2020 sim-to-real workshop, Eric Heiden, David Millard, Erwin Coumans, Gaurav Sukhatme. PDF on Arxiv and video
  • “Interactive Differentiable Simulation”, 2020, Eric Heiden, David Millard, Hejia Zhang, Gaurav S. Sukhatme. PDF on Arxiv

I also found this MIT thesis from 2019: Augmenting physics simulators with neural networks for model learning and control

GPT Agents

  • Finished training the balanced model and am re-running the original prompts
  • A really negative prompt will produce a low review distribution. Here’s an example of GPT generating reviews in response to a slightly negative set of prompts ([there are absolutely no vegetarian options], [there is not a single vegetarian option on the menu], [the menu has no vegetarian options]), compared with the ground truth of the Yelp database returning reviews and ratings that match the string ‘%no vegetarian options%’:
Average star ratings
  • The distribution of star ratings is obviously different too:
  • As you can see on the right, the ground truth is distinctly different. The correlation coefficient between the two distributions on the right is -0.4, while it’s well above 0.9 when comparing any of the three distributions to the left.
  • So it’s clear that the model has a bias towards positive reviews. In fact, if you look at the baseline distribution from the first 1,000 reviews of restaurants in the ‘American’ category, we can see the underlying distribution that the model was trained on:
Star bias in the data
  • The new question to answer is what happens to the responses when the training data is balanced for stars? Also, I realize that I need to run a pass through the models with just a ‘review:’ prompt.
  • Dammit, the ‘balanced’ training corpus isn’t balanced. Need to fix that and re-train
Bad data
  • 4:15 Meeting
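The correlation numbers above (-0.4 against ground truth, >0.9 among the prompt variants) come from comparing star-rating histograms. A small sketch of that comparison, with made-up counts standing in for the real distributions:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# star-rating histograms (1-5 stars); illustrative counts only
synth = [10, 15, 30, 120, 400]   # model output skews positive
truth = [300, 120, 60, 40, 30]   # ground truth skews negative
print(pearson(synth, truth))
```

A strongly positive-skewed model histogram against a negative-skewed ground truth produces a negative coefficient, which is the shape of the result described above.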

SBIRs

  • MDA costing meeting
  • Work on building first pass map. It’s actually working pretty well! Need to write an example script for tomorrow
  • Need to create some views

Phil 11.2.2021

GPT Agents

  • Create balanced (20k each) star corpora and train – done
  • Create low star corpora and train (1, 2, 3?)
  • Installed sentence-transformers, which probably broke sentiment.

SBIRs

  • Integrate TextComparePopup and try making a map. I’m pretty sure that there will be issues about putting topics into groups and listing topics from different groups – done, and seems to be working well. Tomorrow we try for real?
  • 9:15 standup
  • Finished the Great Timesheet Update! Hopefully

Phil 11.1.2021

Chase Dispute team 9:00 – 9:00 1 888 489 8452

Just Landscaping: (443) 251-2188

BB Infinite: 866.865.3335

Societies change their minds faster than people do

GPT Agents

  • Spreadsheets for vegetarian 100k GT vs GT vs synth. Everything is good except for ‘no vegetarian options’. It’s the only option that does not appear in the first 100k rows. Going to try some longer prompts to see if I can nudge the model in a better direction. Do that at lunch
  • Hmmm. I can’t seem to produce a negative star distribution:
  • Meeting with Andreea? Yup. Good chat

SBIRs

  • Integrate text similarity into popup widget

Phil 10.31.2021

GPT Agents

  • Extracting the vegetarian synthetic reviews so I can see if they worked. It actually looks pretty good!
  • Going to run a range of prompts:
probe_list = ["no vegetarian options", "some vegetarian options", "several vegetarian options", "many vegetarian options"]
  • Building the db table/store for the NZ tweets – done. After screwing up the insert arguments, I’m running a 1k set of synthetic tweets to evaluate the text field.
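Mixed-up insert arguments are exactly what parameterized placeholders catch early. A hedged sketch — the table and column names here are invented, not the real schema, and sqlite3 stands in for whatever DB the project actually uses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real tweet DB
conn.execute(
    "CREATE TABLE synth_tweets (id INTEGER PRIMARY KEY, probe TEXT, text TEXT)"
)

rows = [("no vegetarian options", "generated tweet one"),
        ("many vegetarian options", "generated tweet two")]

# placeholders must match the column list one-for-one, which is where
# positional-argument mixups show up immediately as an OperationalError
conn.executemany("INSERT INTO synth_tweets (probe, text) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM synth_tweets").fetchone()[0]
print(count)
```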

Phil 10.29.2021

As a J&J recipient, today is booster day! I will become a cocktail of J&J/Moderna antibodies at 10:00. Hopefully my wifi reception will improve dramatically with these new chips.

  • Leave NLT 9:15

GPT Agents

  • Run some other variations, or show that the usage of “several vegetarian options” is normal speech. The phrase ‘%vegetarian options%’ has 79 reviews in the training set and 4,911 in the holdout set. Going to do a quick boxplot to see if there’s much difference.
  • Pretty much what I expected – very hard to evaluate any difference:
Probably not a good example
  • Running ‘some vegetarian options’ and ‘no vegetarian’ on the 100k American model. It turns out that I ran the default ‘review:’ probe yesterday
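The quick boxplot above boils down to comparing five-number summaries of the two rating sets. A sketch of that computation, with illustrative ratings rather than the real train/holdout pulls:

```python
import statistics

def five_number_summary(ratings):
    """min, Q1, median, Q3, max - the numbers a boxplot draws."""
    q1, med, q3 = statistics.quantiles(ratings, n=4)
    return min(ratings), q1, med, q3, max(ratings)

# illustrative star ratings for reviews matching '%vegetarian options%'
train = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5]
holdout = [1, 2, 2, 3, 4, 4, 5, 5, 5, 5]
print(five_number_summary(train))
print(five_number_summary(holdout))
```

If the two summaries land nearly on top of each other, as they did here, the boxplot won't show a usable difference — which matches the "very hard to evaluate" result above.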

SBIRs

  • Getting the separate behavior for the cmdr and sbrd nodes to work. Rather than having them bounce around, I need to have them head to targets at a speed. That means rewriting bits of Moveable node. Done!
Moving to nodes according to script

Phil 10.28.2021

SBIRs

  • Added a boxplot to the TextSimilarity framework. Might show at the meeting today?
  • Get the script tied into node display. Having some issues. Thinking about having an animated commander and subordinate node moving across the map
  • LAIC meeting
  • Meeting with Aaron about TextSimilarity

GPT Agents

  • Try to get the query and probe for “%several vegetarian options%” running. This data is used in the “Why use this technique?” section of the introduction.
From the database
  • Run American 100k model with the prompt “several vegetarian options” and pull into spreadsheet – done. It looks good, too:
Comparison of bootstrap samples
  • Add more content to the paper container
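The ground-truth side of that query presumably looks something like the following — the table and column names are guesses, and an in-memory sqlite3 db stands in for the real Yelp store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Yelp review DB
conn.execute("CREATE TABLE reviews (stars INTEGER, text TEXT)")
conn.executemany("INSERT INTO reviews VALUES (?, ?)", [
    (5, "they have several vegetarian options here"),
    (2, "no vegetarian options at all"),
    (4, "several vegetarian options and great service"),
])

# pull star ratings for reviews matching the probe phrase
rows = conn.execute(
    "SELECT stars FROM reviews WHERE text LIKE '%several vegetarian options%'"
).fetchall()
print([r[0] for r in rows])
```

The same LIKE pattern run against both the ground truth and the synthetic reviews gives the paired distributions the bootstrap comparison needs.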

Book

  • Add review – done!