Category Archives: thesis

Phil 5.7.20

D20

  • Everything is silent again.

GPT-2 Agents

  • Continuing with PGNtoEnglish
    • Building out move text
    • Changing board to a dataframe, since I can display it as a table in pyplot – done!

[image: chessboard]

  • Here’s the code for making the chessboard table in pyplot:
    import pandas as pd
    import matplotlib.pyplot as plt
    from typing import List
    
    # note: 'pieces' is an Enum of piece strings (WHITE_ROOK, BLACK_PAWN, NONE, etc.)
    # defined elsewhere in the project
    
    class Chessboard():
        board:pd.DataFrame
        rows:List
        cols:List
    
        def __init__(self):
            self.reset()
    
        def reset(self):
            # files a-h across the columns, ranks 8 down to 1 so white is at the bottom
            self.cols = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
            self.rows = [8, 7, 6, 5, 4, 3, 2, 1]
            self.board = pd.DataFrame(columns=self.cols, index=self.rows)
            for number in self.rows:
                for letter in self.cols:
                    self.board.at[number, letter] = pieces.NONE.value
    
            self.populate_board()
            self.print_board()
    
        def populate_board(self):
            # white back rank (queen on d1, king on e1)
            self.board.at[1, 'a'] = pieces.WHITE_ROOK.value
            self.board.at[1, 'h'] = pieces.WHITE_ROOK.value
            self.board.at[1, 'b'] = pieces.WHITE_KNIGHT.value
            self.board.at[1, 'g'] = pieces.WHITE_KNIGHT.value
            self.board.at[1, 'c'] = pieces.WHITE_BISHOP.value
            self.board.at[1, 'f'] = pieces.WHITE_BISHOP.value
            self.board.at[1, 'd'] = pieces.WHITE_QUEEN.value
            self.board.at[1, 'e'] = pieces.WHITE_KING.value
    
            # black back rank (queen on d8, king on e8)
            self.board.at[8, 'a'] = pieces.BLACK_ROOK.value
            self.board.at[8, 'h'] = pieces.BLACK_ROOK.value
            self.board.at[8, 'b'] = pieces.BLACK_KNIGHT.value
            self.board.at[8, 'g'] = pieces.BLACK_KNIGHT.value
            self.board.at[8, 'c'] = pieces.BLACK_BISHOP.value
            self.board.at[8, 'f'] = pieces.BLACK_BISHOP.value
            self.board.at[8, 'd'] = pieces.BLACK_QUEEN.value
            self.board.at[8, 'e'] = pieces.BLACK_KING.value
    
            # pawns fill ranks 2 and 7
            for letter in self.cols:
                self.board.at[2, letter] = pieces.WHITE_PAWN.value
                self.board.at[7, letter] = pieces.BLACK_PAWN.value
    
        def print_board(self):
            fig, ax = plt.subplots()
    
            # hide the axes so only the table is drawn
            fig.patch.set_visible(False)
            ax.axis('off')
            ax.axis('tight')
    
            ax.table(cellText=self.board.values, colLabels=self.cols, rowLabels=self.rows, loc='center')
    
            fig.tight_layout()
    
            plt.show()
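
  • Instantiating the class is enough to draw the starting position, since __init__() calls reset():
    cb = Chessboard()  # populates and displays the starting position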

GOES

  • Continuing with the MLP sequence-to-sequence NN
  • Writing
  • Reading
    • Hmm. Just realized that the input vector being defined by the query is a bit problematic. I think I need to define the input vector size and then ensure that the query creates sufficient points. Fixed. It now stores the model with the specified input vector size:

[image: model_name]

  • And here’s the loaded model in newly-retrieved data:
  • Here’s the model learning two waveforms. Went from 400×2 neurons to 3200×2:
  • Combining with GAN
    • Subtract the sin from the noisy_sin to get the noise and train on that (see the sketch at the end of this list)
  • Start writing paper? What are other venues beyond GVSETS?
  • 2:00 status meeting
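
  • A one-line version of the noise-extraction idea above, with made-up array names (a sketch, not the GOES code):
    import numpy as np
    
    t = np.linspace(0, 2 * np.pi, 400)
    sin = np.sin(t)                                       # clean target waveform
    noisy_sin = sin + np.random.normal(0, 0.1, t.shape)   # stand-in for the noisy source
    noise = noisy_sin - sin                               # train the GAN on this residual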

JuryRoom

  • 3:30 Meeting
  • 6:00 Meeting

Phil 5.6.20

#COVID

  • I looked at the COVID-19-TweetIDs GitHub project, and it is in fact lists of ids:
    1219755883690774529
    1219755875407224832
    1219755707001659393
    1219755610494861312
    1219755586272813057
    1219755378428338181
    1219755293397012480
    1219755288988798981
    1219755197645279233
    1219755157438828545
  • These can work by appending that number to the string “twitter.com/anyuser/status/”, like this: twitter.com/anyuser/status/1219755883690774529
  • The way to get the text in Python appears to be tweepy. This snippet from stackoverflow appears to show how to do it, but I haven’t verified yet.
    import tweepy
    
    # placeholder credentials -- substitute your own app's keys
    consumer_key = 'xxxx'
    consumer_secret = 'xxxx'
    access_token = 'xxxx'
    access_token_secret = 'xxxx'
    
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    
    api = tweepy.API(auth)
    
    tweets = api.statuses_lookup(id_list) # id_list is the list of tweet ids
    tweet_txt = []
    for i in tweets:
        tweet_txt.append(i.text)
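
  • One wrinkle: statuses_lookup only accepts up to 100 ids per call, so the full ID lists from the GitHub project would need to be chunked. A rough sketch (hydrate_ids is my name, not from the snippet):
    def hydrate_ids(api, ids, chunk_size=100):
        # statuses_lookup accepts at most 100 ids per request
        texts = []
        for i in range(0, len(ids), chunk_size):
            for status in api.statuses_lookup(ids[i:i + chunk_size]):
                texts.append(status.text)
        return texts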


GPT-2 Agents

  • Continuing with PGNtoEnglish
    • Figuring out how to parse the moves text, using the wonderful regex101 site
  • 4:30 meeting
    • We set up an Overleaf project with the goal to submit to the Harvard/Kennedy Misinformation Review
    • We talked about the GPT-2 as a way of clustering tweets. Going to try finetuning with some Arabic novels first to see if it can work in that language

GOES

  • Continuing with the MLP sequence-to-sequence NN
    • Getting the data to fit into nice, rectangular arrays, which is not straightforward, since the time window of the query can return a varying number of results. So I have to run the query, then trim the arrays down so that they are all the length of the shortest (a sketch of this step follows this list). Here are the results:
  • I’ve got the training and prediction working pretty well. Stopping for the day
  • Tomorrow I’ll get the models to write out and read in
  • 2:00 status meeting
    • Two weeks to getting the sim running?
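
  • A minimal sketch of the trim-to-shortest step described above (names are mine, not the GOES code):
    import numpy as np
    
    def trim_to_shortest(arrays):
        # queries over a time window return varying numbers of rows, so
        # clip every result to the length of the shortest one
        shortest = min(len(a) for a in arrays)
        return np.array([a[:shortest] for a in arrays])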

Phil 5.5.20

D20Cubic

  • Just goes to show that you shouldn’t take regression fits as correct

GPT-2 Agents

  • More PGNtoEnglish
  • Discovered typing.TextIO. I love typing to death 🙂
  • Finished parsing meta information

#COVID

GOES

  • Progress meeting with Vadim and Isaac
  • Train and save a 2-layer, 400 neuron MLP. No ensembles for now
  • Set up GAN to add noise

 

Phil 5.4.20

It is a Chopin sort of morning

D20

  • Zach got maps and lists working over the weekend. Still a lot more to do though
  • Need to revisit the math to work over the past days

GPT-2 Agents

  • Working on PGN to English.
    • Added game class that contains all the information for a game and reads it in. Games are created and managed by the PGNtoEnglish class
  • Rebased the transformers project. It updates fast

GOES

  • Figure out how to save and load models. I’m really not sure what to save, since you need access to the latent space and the discriminator? So far, it’s:
    # assumes 'import os' and 'import tensorflow as tf' at module level
    def save_models(self, directory:str, prefix:str):
        p = os.getcwd()
        os.chdir(directory)
        self.d_model.save("{}_discriminator.tf".format(prefix))
        self.g_model.save("{}_generator.tf".format(prefix))
        self.gan_model.save("{}_GAN.tf".format(prefix))
        os.chdir(p)
    
    def load_models(self, directory:str, prefix:str):
        p = os.getcwd()
        os.chdir(directory)
        self.d_model = tf.keras.models.load_model("{}_discriminator.tf".format(prefix))
        self.g_model = tf.keras.models.load_model("{}_generator.tf".format(prefix))
        self.gan_model = tf.keras.models.load_model("{}_GAN.tf".format(prefix))
        os.chdir(p)
    • Here’s the initial run. Very nice for 10,000 epochs!

[images: acc_loss, GAN_inputs, GAN_trained]

    • And here’s the results from the loaded model:

[image: GAN_trained]

    • The discriminator works as well:
      real accuracy = 100.00%, fake accuracy = 100.00%
      real loss = 0.0154, fake loss = 0.0947
    • An odd thing is that I can save the GAN model, but can’t load it?
      ValueError: An empty Model cannot be used as a Layer.

      I can rebuild it from the loaded generator and discriminator models though (see the sketch at the end of this list)

  • Set up MLP to convert low-fidelity sin waves to high-fidelity
    • Get the training and test data from InfluxDB
      • input is square, output is sin, and the GAN should be trained on noisy_sin minus sin (the noise). Randomly move the sample through the domain
    • Got the queries working:
    • Train and save a 2-layer, 400 neuron MLP. No ensembles for now
  • Set up GAN to add noise
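
  • A minimal sketch of that GAN rebuild, assuming the usual Keras pattern of stacking the generator and a frozen discriminator (not the exact project code):
    import tensorflow as tf
    
    def rebuild_gan(g_model, d_model):
        # the composite model won't deserialize directly, but it can be
        # reassembled from the loaded generator and discriminator
        d_model.trainable = False  # only the generator trains through the GAN
        gan = tf.keras.Sequential([g_model, d_model])
        gan.compile(loss='binary_crossentropy', optimizer='adam')
        return gan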

Fika

  • Ask a question about what the ACM and CHI are doing, beyond providing publication venues, to fight the kind of misinformation that lets millions of people find fabricated evidence supporting dangerous behavior.
  • Effects of Credibility Indicators on Social Media News Sharing Intent
    • In recent years, social media services have been leveraged to spread fake news stories. Helping people spot fake stories by marking them with credibility indicators could dissuade them from sharing such stories, thus reducing their amplification. We carried out an online study (N = 1,512) to explore the impact of four types of credibility indicators on people’s intent to share news headlines with their friends on social media. We confirmed that credibility indicators can indeed decrease the propensity to share fake news. However, the impact of the indicators varied, with fact checking services being the most effective. We further found notable differences in responses to the indicators based on demographic and personal characteristics and social media usage frequency. Our findings have important implications for curbing the spread of misinformation via social media platforms.

Phil 4.5.20

The initial version of DaysToZero is up! Working on adding states now

[image: dtz_launch]

Got USA data working. New York looks very bad:

[image: New_York_4_5_2020]

Evaluating the fake news problem at the scale of the information ecosystem

  • “Fake news,” broadly defined as false or misleading information masquerading as legitimate news, is frequently asserted to be pervasive online with serious consequences for democracy. Using a unique multimode dataset that comprises a nationally representative sample of mobile, desktop, and television consumption, we refute this conventional wisdom on three levels. First, news consumption of any sort is heavily outweighed by other forms of media consumption, comprising at most 14.2% of Americans’ daily media diets. Second, to the extent that Americans do consume news, it is overwhelmingly from television, which accounts for roughly five times as much news consumption as online. Third, fake news comprises only 0.15% of Americans’ daily media diet. Our results suggest that the origins of public misinformedness and polarization are more likely to lie in the content of ordinary news or the avoidance of news altogether than in overt fakery.

Phil 3.27.20

Working with Zach and Aaron on the app. I think we’ll have something by this weekend

  • Added a starting zero on the regression
  • Added the regression to the json file, and posted to see if Zach can reach
  • Set up the hooks for export to an Excel workbook, with one tab per active country. I’ll work on that later today – done!

[image: countries]

Got clarification from Wayne on some edits. Going to turn those around this morning and try to submit before COB today. Maryland is at 580 confirmed cases as of yesterday. I’d expect to see nearly 800 when they update the site this morning. Sent over all the edits. It’s in!

[image: Maryland_3.26_2020]

Yup

[image: Maryland_3.27_2020]

ProQuest submission site.

Phil 3.25.20

Waking up to the news these days makes me want to stay in bed with the radio off

Working on automating the process of downloading the spreadsheet, parsing out the countries, and calculating daily rates. The goal is to have a website up this weekend so you can see how your country is doing.

Tasks

  • Set up converter class – done
  • download spreadsheet – done
  • parse out countries – working on it
  • Made mockups of the mobile and webpage displays, and refined a few times based on comments

Got notes for Chapter 11 from Wayne. Switching gears and rolling that in. Put in changes for all the items I could read. There are a few still outstanding. I’ll submit tonight if Wayne doesn’t come back for a discussion.

Back to Docker. Need to connect to WSL. Done!

Meetings

  • AIMS – status for all, plus technical glitches. We’ll try Teams next time. Vadim has made GREAT progress. We might be able to get a real Yaw Flip soon as well
  • A2P – Infor demo. Meh.

Stampede theory proposal deadline was delayed a couple of days

Phil 3.19.20

I found the data sources for the dashboard in the previous few posts. Yes, everything still looks grim:

So rather than working on my dissertation, I thought I’d take a look at the data for the last 9(!) days in Excel:

This is for the USA. The data is sorted based on the cumulative total of new cases confirmed. If you look at the chart on the right, everything is in line with a pandemic in exponential growth. However, that’s not the whole story.

I like to color code the cells in my spreadsheets because colors help me visualize patterns in the data that I wouldn’t otherwise see. And one of the things that really stands out here is the red rows with one yellow cell on the left. These are all cases where the rate of confirmed new cases dropped to zero overnight. And they’re not near each other. They are in WA, NY, and CA. Is this a measuring problem or is something going right in these places?

Maybe we’ll find out more in the next few days. Now that I know how to get the data, I can do some of my own visualizations that look for outliers. I can also train up some sequence-to-sequence ML models to extrapolate trends.

One more thing. I had heard earlier (Twitter, I think?) that Vietnam was handling the crisis well. And it looks like it was, but things are back to being bad:

Ok, back to work

8:00 – 4:30 ASRC PhD, GOES

  • Working on the process section – done!
  • Working on the TACJ bookend – done! Made a new figure:
  • Submitted to Wayne. Here’s hoping it doesn’t fall through the cracks
  • Neuroevolution of Self-Interpretable Agents
    • Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has similar properties as indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task-critical visual hints, they are able to generalize to environments where task irrelevant elements are modified while conventional methods fail.

Phil 3.18.20

7:00 – 5:00 ASRC GOES

Today’s dashboard snapshot (more data here). My thoughts today are about suppression and containment, which are laid out in the UK’s Imperial College COVID-19 report. The TL;DR is that suppression is the only strategy that doesn’t overwhelm healthcare. Suppression is fever clinics, contact tracing, and enforced isolation, away from all others (in China, this was special isolation clinics/dorms). This has clearly worked in China (and a town in Italy), though Hong Kong and Singapore seem to be succeeding in different (more cultural?) ways. The thing that strikes me is that suppression is just putting a lid on things. The moment the lid comes off, then infections start up again? I guess we’ll see over the next few months in China.

There appear to be vaccines in (human already!) testing. Normally, there is an extensive evaluation process to see if the treatment is dangerous, but that was sidestepped during the AIDS crisis (the parallel track policy). I wonder if at-risk populations (people older than 70?) will be allowed to use less-tested drugs. My guess is yes, probably within a month.

  • Finished all the dissertation revisions and made a document that contains only those revisions. Need to make a change table and then send (full and revisions only) to Wayne today.
    • Whoops! No I didn’t. After putting together the change table, I realize there are still a few things to do. Dammit!
  • Update SDaaS paper as per John’s edits
  • Phone call with Darren at 2:00
    • Start a google doc that has all the parts of a proposal, plus a good introduction.
    • Also the idea of sims came up again as ways to define, explain, train ML, and test a problem/solutions
  • AIMS meeting at 3:00

Phil 3.17.20

7:00 – ASRC PhD/GOES

Today’s view of the dashboard. Looking at the numbers, it’s pretty clear that China has things under control, which means that we can get an idea of what it will look like in the US on the other side. The symptomatic population was (3,111 deaths + 55,987 recovered) = 59,098. That means that the mortality rate for that (infected? symptomatic?) population (3,111/59,098) is 5.26%. The median age in China is 38.4 years. Interestingly, that’s about the same as the USA.

So, if you know 20 people who come down with symptoms, it looks like one probably won’t make it? The CDC says that between 160 million and 214 million people in the United States could be infected over the course of the epidemic. So that works out to 8.4M – 11.3M fatalities? That seems really high. For a comparison, cancer and heart disease kill roughly 1.2M/year in the US.

In a fit of unbridled optimism, I’m booking vacation flights for September – done! Got to use my cancelled TF Dev tix

  • Ok, back to finishing the dissertation. Boy, it is hard to concentrate.
    • Conclusions are done
    • Working on tying things back to the literature

Phil 3.16.20

7:00 – 5:00 ASRC PhD/GOES

  • Working from home for the duration of the COVID-19 pandemic. It’s estimated that we are approximately 10 days behind Italy, so I’m hoping that when things start to get better there, it will be a heads-up that things might start to get better here.

(Via Corriere della Sera)

  • Needless to say, things are not getting better there yet.
  • So, before the university gets to the point where it can’t handle the submission of the dissertation, I’m going to work on getting the revisions done and submitted.
    • Finished first pass through Limitations and Research chapter
    • Tried to start on fixing the conclusions but ran out of motivation
  • #COVID-19 meeting at noon
    • Set up folders for lit, assets, software and data
    • Started a rough draft of the (chi 2021?) paper
  • Write BSO about moving Mahler to Bach/Radiohead – done
  • Started to work through the SDaaS paper with John D.
  • From models of galaxies to atoms, simple AI shortcuts speed up simulations by billions of times
    • Modeling immensely complex natural phenomena such as how subatomic particles interact or how atmospheric haze affects climate can take many hours on even the fastest supercomputers. Emulators, algorithms that quickly approximate these detailed simulations, offer a shortcut. Now, work posted online shows how artificial intelligence (AI) can easily produce accurate emulators that can accelerate simulations across all of science by billions of times.

Johns Hopkins gets dashboard of the day

Phil 3.13.20

7:30 – 7:00 ASRC PhD

  • 2:00 Meeting with Daren D? Nope
  • Working on revisions – Finished the limitations and research agenda chapter body! Now I need to add the overview and the summary. Then on to the revisit of my hut.

Phil 3.12.20

7:00 – 6:00 ASRC GOES

Phil 3.11.20

7:00 – 5:00 ASRC GOES

  • A couple more paragraphs in the revisions
  • Working on the SDaaS paper. Getting close to finished
  • Mission meeting
    • Update status to delay deliverables
    • Still waiting on data
    • Simulation running – demo tomorrow
    • Evaluate against known yaw flip
    • White papers for John D
    • 20 sims so far
    • Need to install Influx, dammit!
    • Paragraph on 400 hrs
    • Paragraph on schedule
  • Sent Erik paragraphs

Phil 2.20.20

7:00 – ?? ASRC GOES / PhD

  • Defense
    • Fixes as per Wayne
    • Walkthrough and timing
    • Order food (sandwiches, dessert, water)
    • 1:00 – 2:00pm ITE 459
  • Set up dev box
    • Intellij
    • Project
    • FF
    • GitHub desktop
    • Set up non-admin user
    • detach admin account from MS
  • Waikato meeting at 6:30