Phil 9.29.20

Behind Twitter’s biased AI cropping and how to fix it.

In Isolating Times, Can Robo-Pets Provide Comfort?

COVID

  • Updated my DaysToZero code to put charts into the spreadsheet, which was pretty straightforward using xlsxwriter (a minimal sketch is after this list). There do seem to be three basic patterns:
  • The first is steady growth:
  • So no curve flattening here. The disease is moving pretty steadily through the population. The USA is mostly on this track as are countries like India and Chile. I think the difference that we see is related to a first, faster wave among the more vulnerable populations.
  • The second pattern is the ‘flattened curve’. Ireland shows this really well, as does New York state:
  • The last pattern is the ‘second wave’ pattern. Japan seems to be having one now:
  • So it looks like we are far from out of the woods on this, and letting your guard down is soundly punished.
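For reference, the xlsxwriter chart code follows this general pattern. This is a minimal sketch, not the actual DaysToZero code; the file, sheet, and series names are placeholders:

import xlsxwriter

# Workbook with one data sheet (names are placeholders)
workbook = xlsxwriter.Workbook('days_to_zero.xlsx')
worksheet = workbook.add_worksheet('data')

# Write a header row and some sample rows
rows = [(1, 100), (2, 150), (3, 225)]
worksheet.write_row(0, 0, ['day', 'cases'])
for i, (day, cases) in enumerate(rows, start=1):
    worksheet.write_row(i, 0, [day, cases])

# Build a line chart from the written cells and drop it into the sheet
chart = workbook.add_chart({'type': 'line'})
chart.add_series({
    'name': 'cases',
    'categories': ['data', 1, 0, len(rows), 0],
    'values': ['data', 1, 1, len(rows), 1],
})
chart.set_title({'name': 'Steady growth'})
worksheet.insert_chart('D2', chart)
workbook.close()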

#COVID

  • Started the db fixes for the “elderman” posts. Running. I’ve got about 100k bad posts. The fixes seem to be taking care of some, but this is going to require multiple passes (a rough sketch of one pass is below)
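One pass might look something like this. Purely a sketch: the table and column names (tweets, translation, needs_retranslation) and the connection details are placeholders, not the actual schema:

import pymysql

# Connect to the tweet db (credentials and db name are placeholders)
conn = pymysql.connect(host='localhost', user='phil', password='***', db='twitter_db')

with conn.cursor() as cursor:
    # Count the translations that still contain the bad token
    cursor.execute("SELECT COUNT(*) FROM tweets WHERE translation LIKE %s", ('%elderman%',))
    print('bad posts remaining: {}'.format(cursor.fetchone()[0]))

    # Flag those rows so the next translation pass picks them up again
    cursor.execute("UPDATE tweets SET needs_retranslation = 1 WHERE translation LIKE %s",
                   ('%elderman%',))
conn.commit()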

GOES

  • 2:00 Meeting with Vadim

GPT-2 Agents

  • Updated the document. Antonio’s going to submit. Fingers crossed! It would be nice to go to London in May
  • 3:30 meeting?
  • Need to back up and save the db. Done. It’s almost 11GB! Compressing. Backed up the compressed version, which is (only!) 3GB. (A sketch of the dump-and-compress step is below.)
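The dump-and-compress step is roughly this, assuming the MySQL db from the earlier posts; the user and db names are placeholders:

import gzip
import shutil
import subprocess

# Dump the database to a .sql file (mysqldump prompts for the password)
with open('twitter_db_backup.sql', 'wb') as out:
    subprocess.run(['mysqldump', '-u', 'phil', '-p', 'twitter_db'], stdout=out, check=True)

# Compress the ~11GB dump down to a few GB
with open('twitter_db_backup.sql', 'rb') as src, gzip.open('twitter_db_backup.sql.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)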

Phil 9.28.20

GPT-2 Agents

  • Working on paper – done for now, off to Antonio
  • The data extraction finished! There are 9,375,587 ingested tweets
  • Need to backup and store the db

GOES

  • GVSETS rehearsal at 3:30 – recorded!
  • 2:00 Meeting with Vadim

NESDIS

  • Finish paperwork

Book

  • Finish transcription

Phil 9.25.20

Call Competitive Cyclist about this

Book

  • Adding to these thoughts about using three stories to frame nomad, flock, and stampede behaviors.
  • Listening to BBC Business Daily on London’s dirty financial secrets. In the episode, Tom Burgis, author of a new book Kleptopia: How Dirty Money is Conquering the World, discusses money laundering. It’s making me think that although money is a dimension-reduction process, it’s a very peculiar one. The ability to simplify transactions and “store” the profits means that power accumulates with money. That money lets its owner pay people to add dimensions in other areas. This can be good, as with scientific research, or it can be bad, as with the creation of the byzantine dimensions of money laundering. In the middle somewhere are activities like high-speed trading.
  • The thing is, any increase in the ability to create social realities that exist independently of the environmental reality creates the conditions for stampedes. A lot of crime (scams, cons, embezzlement) depends on the creation of a social reality that overwhelms trustworthy information coming in through other channels.
  • More transcription

GPT-2 Agents

  • At 7.5M tweets so far
  • Got Antonio’s comments back. Need to roll them in

GOES

  • More rotations. I think we found the problems. The first was that the angles being fed to the absolute angle calculations were wrong. The second was that the sign of the pitch and yaw vectors flips near 180 degrees and we were not compensating for that.
  • Need to try a runthrough for the GVSETS slides. Too tired. Maybe over the weekend.

NESDIS

  • Gobs of paperwork

Phil 9.24.20

Fireplace (410) 203‐2876 Invoice No. 1837 Issued on Tue Jan 21, 2020

Book

GPT-2 Agents

  • About 6.5M tweets processed so far
  • 3:30 Meeting

#COVID

  • Still reading in labeled tweets – done!
  • Need to fix “elderman” translations, though there are some other bad/partial translations as well

GOES

  • Create logger for DDict – done!
  • Start rwheel coding for incremental rotations – started
  • 2:00 Meeting and demo. It went well, I think. We can run 100x speedup now!
  • Vadim has this thought: Was just thinking that at the next demo meeting, we should mention that this is adaptable to many different situations, including simulating the Launch Orbit Raising scenarios, specifically for the upcoming GOES-T launch. Since it’s a physics sim and we can make the various pieces move, such as deploying the solar panels.

JuryRoom

  • 5:30 Meeting

Phil 9.23.20

Fireplace (410) 203‐2876 Invoice No. 1837 Issued on Tue Jan 21, 2020

MD Food bank

Book

#COVID

  • Adding in the labeled values to the Arabic data
  • Fix “elderman” translations

GOES

  • 10:00 Meeting with Vadim. Worked through bugs in the angle adjustment code and added clamping. Still have problems at 120 degrees. Need to see if this is a blocker for the demo
    • Roll freezes at exactly 90 degrees. WTaF? Need to extend the DDict so that it can dump a column-format csv file that is incrementally updated (a minimal sketch is after this list)
    • Pitch works great. It’s neat to see how all the rwheels line up for that maneuver
  • 12:00 All hands
  • 2:00 Status meeting
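The incremental csv dump could be as simple as appending one row per step, with the header written on the first call. A sketch only; the real DDict/DataDictionary class isn’t shown here, so a plain dict subclass stands in for it:

import csv
import os

class DDict(dict):
    # Minimal stand-in for the project's DataDictionary class

    def dump_csv_row(self, filename: str):
        # Append the current values as one row, writing the header on first call
        new_file = not os.path.exists(filename)
        with open(filename, 'a', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=sorted(self.keys()))
            if new_file:
                writer.writeheader()
            writer.writerow(self)

# Usage: call once per timestep
dd = DDict(roll=89.9, pitch=10.0, yaw=0.5)
dd.dump_csv_row('angles.csv')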

Phil 9.22.20

https://public.flourish.studio/visualisation/3603910/

Today the USA passed 200,000 dead from COVID-19. That triggered a memory from the earlier days of the pandemic. Italy handled the virus poorly, and I remember thinking that would be the standard against which the USA would be judged. It should have been relatively easy to do better than Italy.

In the chart above, I scaled the deaths for selected countries so that they could be compared to the US directly. As you can see, we are now worse than any country in the EU, and staggeringly worse than South Korea.

Not sure what to do about that other than just be angry.

Book

  • Transcribing
  • Reading more on the money book, and coming to the conclusion that money provides consistent mathematical rules for transactions that promote coordination

GOES

  • 10:00 Meeting with Vadim. Walked through a lot and fixed a few things. There seems to be some problem related to 120 degrees, which is the angular spacing of the reaction wheels. I think that we are hitting a singularity. Working on a short-term fix, though I think the long-term fix is simply to use the reference frame code I’m working on
  • Speaking of which, I need a case for handling overlapping vectors (angle = 0): don’t use an angle that is too small, use the last angle if possible, and if no angles are available, wait until the next step and get larger angles. Done! (A rough sketch of these rules is below.)
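Those rules might look something like the following. The function and the MIN_ANGLE threshold are mine, not the project code:

import numpy as np

MIN_ANGLE = 1.0  # degrees; below this the axis is numerically unstable (threshold is a guess)

def usable_rotation(v1: np.ndarray, v2: np.ndarray, last_axis, last_angle):
    # Return (axis, angle) if the sweep between v1 and v2 is big enough,
    # fall back to the last good values, or return None to wait for a larger angle
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    if angle >= MIN_ANGLE:
        axis = np.cross(v1, v2)
        return axis / np.linalg.norm(axis), angle
    if last_axis is not None:
        return last_axis, last_angle  # reuse the previous step's rotation
    return None  # nothing usable yet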

#COVID

  • Processing labeled tweets

Phil 9.21.20

Welp, I missed my 3 day weekend getaway. I thought it was next week. Perfect weather too. Stupid.

Book

  • Continued transcribing

GOES

  • Going to try to see which vector sweeps out the largest arc and use the cross product for the vector to rotate all the objects in the frame (a sketch is after this list).
  • That worked!
  • It’s a little rough occasionally, but it tracks well
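The largest-arc check is basically this. A sketch, assuming lists of per-object vectors from the previous and current timesteps:

import numpy as np

def frame_rotation_axis(prev_vecs, cur_vecs):
    # Pick the frame vector that swept the largest arc between timesteps
    # and use its cross product as the rotation axis for the whole frame
    best_axis, best_angle = None, -1.0
    for p, c in zip(prev_vecs, cur_vecs):
        p_n = p / np.linalg.norm(p)
        c_n = c / np.linalg.norm(c)
        angle = np.degrees(np.arccos(np.clip(np.dot(p_n, c_n), -1.0, 1.0)))
        if angle > best_angle:
            best_angle = angle
            best_axis = np.cross(p_n, c_n)
    return best_axis, best_angle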

GPT-2 Agents

  • Somewhere north of 4M tweets ingested

Phil 9.18.20

Get deposit!

Opened a ticket for VPN access issues

Book

  • Scanned notes
  • Playing interview into Google Docs, where it is doing a reasonable job of transcription!
  • Meeting with Michelle. Went over the text a bit and decided to skip next week

GPT-2 Agents

  • Progress as of this morning:

GOES

  • Continue on incremental rotations. Here’s the reference and vehicle both using absolute angles
  • Now, here’s the reference using incremental and the vehicle using absolute
  • This is why we need to calculate the plane of rotation each time if we’re using incremental rotations.
  • Thinking about this some more. I can check to see which vector sweeps out the largest arc and use the cross product for the vector to rotate all the objects in the frame.
  • 10:00 meeting with Vadim. There’s an issue where the calculated angles loop from +180 to -180. Here’s the code I use to catch that. It’s dumb, but I can’t find better:
def closest_angle(a1, a2) -> float:
    # Wrap the difference between two headings so the +180/-180 jump is handled
    diff = a1 - a2
    if diff < 0:
        while abs(diff) < 180:
            diff += 360
        return diff
    while abs(diff) > 180:
        diff -= 360
    print("{} + {} = {}".format(a1, a2, diff))
    return diff

Phil 9.17.20

GPT-2 Agents

  • Almost 2,000,000 tweets ingested as of this morning

#COVID

  • Work on fixing all the text broken by “@xxxxx”
  • Ingest the rest of the labeled data

GOES

  • 10:00 Meeting with Vadim

Book:

  • 2:00 Meeting with Ted Hiebert, hopefully. Got the questions written and printed out. Tested the voice-to-text in Docs, which is extremely good.
  • Interview went well

ML group Meeting at 3:30

JuryRoom Meeting at 5:30

Phil 9.16.20

Finishing a review for Shimei. Done!

IRAD

  • Make a smaller slide deck – done

GOES

  • 10:00 Meeting with Vadim. We’ve still got this oscillation problem. Going to try using simplified rotations to see if that exposes the problem

#COVID

  • 3:00 Meeting – went over translation progress. Need to write a regex to remove “@xxxxxx “ (a sketch is below)
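The mention-stripping regex could be as simple as this; the exact pattern the project uses isn’t shown, so this is one plausible version:

import re

# Strip Twitter @mentions (plus any trailing whitespace) before translation
mention_pattern = re.compile(r'@\w+\s*')

def clean(text: str) -> str:
    return mention_pattern.sub('', text)

print(clean('@someuser hello there'))  # -> 'hello there'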

6:30 Dinner for Gary

Phil 9.15.20

#COVID

  • Translate CSV file and insert into db
  • My desktop box is so slammed that it can’t do anything else, so I’m moving this over to the laptop
  • Moved project to svn because github won’t let you store large files like models and databases
  • Getting the dev environment set up. Lost pip somehow – reinstalled
  • Got the model localized. Here’s the code to do it (from here):
from transformers.file_utils import hf_bucket_url, cached_path

src = 'ar'  # source language
trg = 'en'  # target language
pretrained_model_name = f'Helsinki-NLP/opus-mt-{src}-{trg}'
archive_file = hf_bucket_url(
    pretrained_model_name,
    filename='pytorch_model.bin',
    use_cdn=True,
)
resolved_archive_file = cached_path(archive_file)

print(resolved_archive_file)
  • Added a “label” column, and inserted the labels and translations
  • Need to back up the db
  • Sent the csv with translations out to the group

GPT-2 Agents

  • Postponed meeting
  • At over 700k ingested Tweets

GOES

  • 2:00 Meeting with Vadim

Book

  • Set up interview – done! Set up for 2:00 this Thursday

Phil 9.14.20

GPT-2 Agents

  • Added Antonio’s suggestions and uploaded the paper to Overleaf. Looks like about one more page to play with
  • Working on Twitter parser – done-ish?
  • Wrapped the parser in a glob loop and started the full run. 10,000 processed tweets and climbing! (A sketch of the loop is below.)
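The glob wrapper amounts to this; the path and the parse_file entry point are stand-ins for the real parser:

import glob
import json

def parse_file(path: str):
    # Stand-in for the project's tweet parser
    with open(path, encoding='utf-8') as f:
        tweets = json.load(f)
    print('{}: {} tweets'.format(path, len(tweets)))

for path in sorted(glob.glob('data/tweets/*.json')):
    parse_file(path)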

Marketing:

  • DATC Working group meeting and presentation.
  • Presentation was technical as specified, and sailed over the heads of everyone, I think

GOES

  • Postponed the meeting with Vadim because I thought the workshop ran all day

Phil 9.11.20

I would love to see a timeline of all the things that we’ve done in response to 9-11 and how they’ve worked out

GPT-2 Agents

  • Parse timestamps – done (a sketch of the created_at parsing is after this list)
  • Hand-annotate the schema?
    • datetime – done!
    • array/object
  • Start running analysis? Wrote a tweet to the db. Calling it a day
  • 2:30 Meeting with Shimei and Sim – started a google doc for tasking
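For the timestamp parsing, Twitter’s created_at strings follow a fixed format, so the conversion is one strptime call (assuming the standard API format):

from datetime import datetime

# Twitter's created_at looks like 'Wed Oct 10 20:19:24 +0000 2018';
# convert it to a datetime that a MySQL DATETIME column will accept
def parse_created_at(s: str) -> datetime:
    return datetime.strptime(s, '%a %b %d %H:%M:%S %z %Y')

dt = parse_created_at('Wed Oct 10 20:19:24 +0000 2018')
print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2018-10-10 20:19:24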

GOES

  • 10:00 Meeting with Vadim. I think the goal should be to tune the rwheels so that the yaw flip curves start to look more realistic, and then see how it works at speed. One of the issues that we need to think about is the role of mass in high-timestep physics. Would lower mass make better behavior?

Book

  • 2:00 Meeting with Michelle – tweaked the dimension reduction section
  • Write email to Theodore Hiebert – done
  • Looked a bit into voice recording and transcription. The best answer still seems to be Google? Here are other options

Phil 9.10.20

Didn’t get a chance to write a post yesterday, so I’ll just include yesterday’s progress.

Latent graph neural networks: Manifold learning 2.0?

  • Graph neural networks exploit relational inductive biases for data that come in the form of a graph. However, in many cases we do not have the graph readily available. Can graph deep learning still be applied in this case? In this post, I draw parallels between recent works on latent graph learning and older techniques of manifold learning.

Transformers are Graph Neural Networks

  • Through this post, I want to establish links between Graph Neural Networks (GNNs) and Transformers. I’ll talk about the intuitions behind model architectures in the NLP and GNN communities, make connections using equations and figures, and discuss how we could work together to drive progress.

GPT-2 Agents

  • Made good progress on the table builder. I have a MySql implementation that’s pretty much done
  • Made a pitch for IRAD funding
  • Working on the tweet parsing
    • I have an issue with the user table. Tweets have a many-to-one relationship with user, so it’s a special case.
    • Added “object” and “array” to the cooked dictionary so that I can figure out what to do
    • Got the main pieces working, but the arrays can contain objects and the schema generator doesn’t handle that.
    • I think I’m going to add DATETIME processing for now and call it a day. I can start ingesting over the weekend

#COVID

  • Didn’t make the progress I needed to on translating the text, so I asked for a week extension
  • Downloaded the CSV file. Looks the same as the other formats with a “Label” addition. Should be straightforward

GOES

  • Looks like Vadim fixed the transforms, so I’m off the hook
  • Registered for M&S Affinity Group. Looks like I’ll be speaking at 12:20 on Monday
  • 10:00 Meeting with Vadim
  • Updated the DataDictionary to sys.exit(-1) on a name redefinition
  • 11:00 Slides with T

Book

  • Write letter

ML-seminar (3:30 – 5:30)

  • My nomination for Adjunct Assistant Research Professor has been approved! Now I need to wait for the chain of approvals

JuryRoom (5:30 – 7:00)

  • Alex was the only one on. We discussed HTML, CSS, and LaTeX

Phil 9.8.20

Done with vacation. Which was very wet. The long weekend was a nice consolation prize.

Hmmm. Subversion repo can’t be reached. Give it an hour and try again at 8:00, otherwise open a ticket. Turned out to be the VPN. Switching locations fixed it.

Sturgis COVID outcomes:

  • “using data from the Centers for Disease Control and Prevention (CDC) and a synthetic control approach, we show that … following the Sturgis event, counties that contributed the highest inflows of rally attendees experienced a 7.0 to 12.5 percent increase in COVID-19 cases relative to counties that did not contribute inflows. … We conclude that the Sturgis Motorcycle Rally generated public health costs of approximately $12.2 billion.”

LaDDer: Latent Data Distribution Modelling with a Generative Prior

  • In this paper, we show that the performance of a learnt generative model is closely related to the model’s ability to accurately represent the inferred latent data distribution, i.e. its topology and structural properties. We propose LaDDer to achieve accurate modelling of the latent data distribution in a variational autoencoder framework and to facilitate better representation learning. The central idea of LaDDer is a meta-embedding concept, which uses multiple VAE models to learn an embedding of the embeddings, forming a ladder of encodings. We use a non-parametric mixture as the hyper prior for the innermost VAE and learn all the parameters in a unified variational framework. From extensive experiments, we show that our LaDDer model is able to accurately estimate complex latent distribution and results in improvement in the representation quality. We also propose a novel latent space interpolation method that utilises the derived data distribution.

An automated pipeline for the discovery of conspiracy and conspiracy theory narrative frameworks: Bridgegate, Pizzagate and storytelling on the web

  • Although a great deal of attention has been paid to how conspiracy theories circulate on social media, and the deleterious effect that they, and their factual counterpart conspiracies, have on political institutions, there has been little computational work done on describing their narrative structures. Predicating our work on narrative theory, we present an automated pipeline for the discovery and description of the generative narrative frameworks of conspiracy theories that circulate on social media, and actual conspiracies reported in the news media. We base this work on two separate comprehensive repositories of blog posts and news articles describing the well-known conspiracy theory Pizzagate from 2016, and the New Jersey political conspiracy Bridgegate from 2013. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative machine learning model where nodes represent actors/actants, and multi-edges and self-loops among nodes capture context-specific relationships. Posts and news items are viewed as samples of subgraphs of the hidden narrative framework network. The problem of reconstructing the underlying narrative structure is then posed as a latent model estimation problem. To derive the narrative frameworks in our target corpora, we automatically extract and aggregate the actants (people, places, objects) and their relationships from the posts and articles. We capture context specific actants and interactant relationships by developing a system of supernodes and subnodes. We use these to construct an actant-relationship network, which constitutes the underlying generative narrative framework for each of the corpora. We show how the Pizzagate framework relies on the conspiracy theorists’ interpretation of “hidden knowledge” to link otherwise unlinked domains of human interaction, and hypothesize that this multi-domain focus is an important feature of conspiracy theories. We contrast this to the single domain focus of an actual conspiracy. While Pizzagate relies on the alignment of multiple domains, Bridgegate remains firmly rooted in the single domain of New Jersey politics. We hypothesize that the narrative framework of a conspiracy theory might stabilize quickly in contrast to the narrative framework of an actual conspiracy, which might develop more slowly as revelations come to light. By highlighting the structural differences between the two narrative frameworks, our approach could be used by private and public analysts to help distinguish between conspiracy theories and conspiracies.

#COVID

  • Translate the annotated tweets and use them as a base for finding similar ones in the DB

GPT-2 Agents

  • Working on the schemas now
  • It looks like it may be possible to generate a full table representation using two packages. The first is genson, which you use to generate the schema. Then that schema is used by jsonschema2db, which should produce the tables (This only works for postgres, so I’ll have to install that). The last step is to insert the data, which is also handled by jsonschema2db
  • Schema generation worked like a charm
  • Got Postgres hooked up to IntelliJ, and working in Python too. It looks like jsonschema2db requires psycopg2-2.7.2, so I need to be careful
  • Created a table, put some data in it, and got the data out with Python. Basically, I copied the MySqlInterface over to PostgresInterface, made a few small changes and everything appears to be working.
  • Created the table, but it’s pretty bad. I think I need to write a recursive dict reader that either creates tables or inserts linked data into a table (a minimal sketch follows this list)
  • All tables are created from objects, and have a row_id that is the key value
    • Disregard the “$schema” field
    • Any item that is not an object is a field in the table
    • Any item that is an object gets an additional parent_row_id that points to the immediate parent’s row_id
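An in-memory sketch of those rules; the real version would emit CREATE TABLE / INSERT statements, and the names here are illustrative:

def walk_object(obj: dict, table: str, tables: dict, parent_row_id=None):
    # Recursively split a JSON object into flat table rows: each object becomes
    # a row with a row_id, scalars become fields, and nested objects point back
    # to their immediate parent via parent_row_id
    rows = tables.setdefault(table, [])
    row_id = len(rows)
    row = {'row_id': row_id}
    if parent_row_id is not None:
        row['parent_row_id'] = parent_row_id
    rows.append(row)
    for key, val in obj.items():
        if key == '$schema':
            continue  # disregard the schema marker
        if isinstance(val, dict):
            walk_object(val, key, tables, parent_row_id=row_id)
        else:
            row[key] = val
    return tables

tables = walk_object({'text': 'hi', 'user': {'name': 'phil'}}, 'tweet', {})
# {'tweet': [{'row_id': 0, 'text': 'hi'}],
#  'user': [{'row_id': 0, 'parent_row_id': 0, 'name': 'phil'}]}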

GOES

  • 2:00 meeting with Vadim