Category Archives: COVID-19

Phil 6.2.20


Remember when all we had to worry about was dealing with a pandemic? Good times.

GPT-2 Agents

  • Downloaded a lot of PGN files. Looks like I could pull down the entire archive here. Need to write a script that pulls down the files and unzips them
  • Need to scan the directory and parse each pgn file – done
  • Created train (700,000 lines) and eval (100,000 lines) files
  • Feed into GPT-2! Seems to be cranking along:



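The directory scan and train/eval split described above might look something like the sketch below. It assumes the PGN files sit in one local directory and that a plain line-based split is good enough; `build_train_eval` is a hypothetical name, and the 0.125 eval fraction simply mirrors the 700,000/100,000 line counts.

```python
import tempfile
from pathlib import Path


def build_train_eval(pgn_dir: str, train_path: str, eval_path: str,
                     eval_fraction: float = 0.125) -> tuple:
    """Hypothetical helper: gather the non-blank lines of every .pgn file in
    pgn_dir (sorted by file name), write the first part to train_path and the
    last eval_fraction of them to eval_path. Returns (train_lines, eval_lines)."""
    lines = []
    for pgn_file in sorted(Path(pgn_dir).glob("*.pgn")):
        with open(pgn_file, encoding="utf-8", errors="replace") as f:
            lines.extend(line.rstrip("\n") for line in f if line.strip())
    split = int(len(lines) * (1.0 - eval_fraction))
    Path(train_path).write_text("\n".join(lines[:split]) + "\n", encoding="utf-8")
    Path(eval_path).write_text("\n".join(lines[split:]) + "\n", encoding="utf-8")
    return split, len(lines) - split


if __name__ == "__main__":
    # Tiny demo on a throwaway directory with two fake PGN files
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.pgn").write_text('[Event "A"]\n\n1. e4 e5\n', encoding="utf-8")
        Path(d, "b.pgn").write_text('[Event "B"]\n1. d4 d5\n', encoding="utf-8")
        print(build_train_eval(d, str(Path(d, "train.txt")), str(Path(d, "eval.txt"))))
```

Splitting by line rather than by game keeps the sketch short, but it can cut a game in half at the boundary. Splitting on `[Event` headers would avoid that.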
  • Submitted paper and slide deck
  • Putting together a brown-bag style presentation for the development of the GAN code
  • Ping Vadim to see what to do next?

ML seminar

  • Presented brown-bag talk
  • Need to share slides and put code on GitHub


Phil 5.26.20

Had a good, cathartic ride yesterday:

GPT-2 Agents

  • I’ve been working on the PGNtoEnglish class and was having an odd bug where occasionally a move would pull a piece from the other side of the board. Since it was intermittent, it took many print statements and a lot of searching through the logs for “black knight”


  • My problem was in forgetting how Python indexes into arrays. Here’s the code in question:


  • When I first wrote this, I had to deal with a lot of potential coordinates that were off the board, with indexes like (-2, -1) or (10, 8) for an 8×8 board. I thought I’d handle this with a try/except on IndexError (the bottom highlight). In other languages this would have worked, but Python allows negative indexes, so (-2, -1) quietly wraps around to the far side of the board instead of raising an error. Ooops! Adding a test for either index being negative (the top highlight) fixed that bug.
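Here’s a minimal, self-contained reproduction of that negative-index gotcha. The board layout (a list of lists indexed as `board[row][col]`) and the helper names are assumptions for illustration, not the actual PGNtoEnglish code.

```python
BOARD_SIZE = 8
# 8x8 board as a list of lists; board[row][col] holds a piece name or None
board = [[None] * BOARD_SIZE for _ in range(BOARD_SIZE)]
board[7][6] = "black knight"  # a piece on the far side of the board


def lookup_try_except(row: int, col: int):
    """The buggy pattern: rely on IndexError to catch off-board squares."""
    try:
        return board[row][col]
    except IndexError:
        return None


def on_board(row: int, col: int) -> bool:
    """Explicit bounds test: rejects negative indexes as well as ones >= 8."""
    return 0 <= row < BOARD_SIZE and 0 <= col < BOARD_SIZE


def lookup_checked(row: int, col: int):
    """Safe lookup: only index the board after the bounds test passes."""
    return board[row][col] if on_board(row, col) else None


print(lookup_try_except(10, 8))   # None: IndexError is raised and caught
print(lookup_try_except(-1, -2))  # black knight: board[-1][-2] is board[7][6]
print(lookup_checked(-1, -2))     # None
```

The explicit range check is also cheaper to reason about than exception handling, since it treats both kinds of off-board coordinates the same way.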


  • Ping Zach – done


  • Write up code review thoughts for Erik – done
  • Add n_critic to base class, along with adjustable false flag value
    • First, making sure that everything still works. Seems to.
    • Here’s the best I can do today, using the OneDGAN2a class with an RMSProp(lr=0.0005)


  • Assemble all the bits for an example
    • Verified that the InfluxTestTrainBase still works, and it’s using the InfluxDB values
    • Created a NoiseGAN2 with the same number of points as the InfluxTestTrainBase model – done. Looks really good on the noise, too:


  • How to trim the columns on a 2D Numpy array:
    results = self.ifq.run_query(self.bucket, begin, end, filter_str)
    results = self.ifq.to_nd_array(results)
    results = np.delete(results, slice(clamp, None), 1)
    predict_table = model.predict(results)
  • Here are all the parts nailed together:
  • Start the paper and the deck
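For reference, the `np.delete(results, slice(clamp, None), 1)` call in the column-trimming snippet removes every column from index `clamp` onward. The sketch below, on made-up data, checks that it matches ordinary column slicing. The slice is often simpler, with the caveat that it returns a view of the original array rather than a copy.

```python
import numpy as np

# A small stand-in for the query results: 3 rows x 5 columns
results = np.arange(15).reshape(3, 5)
clamp = 3  # keep only the first `clamp` columns

# np.delete with a slice along axis 1 drops every column from `clamp` onward (returns a copy)
trimmed_delete = np.delete(results, slice(clamp, None), 1)

# Plain slicing gives the same values (as a view into `results`)
trimmed_slice = results[:, :clamp]

print(trimmed_delete)
# [[ 0  1  2]
#  [ 5  6  7]
#  [10 11 12]]
print(np.array_equal(trimmed_delete, trimmed_slice))  # True
```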

ML Group

  • Need to create a walkthrough of coding practices for next week. I think I’ll use the trajectory of the GAN coding as the basis


Phil 5.25.20

GPT-2 Agents

  • Work on openings
  • Maybe create a database that contains games as collections of moves. A query could produce the text for the language model
  • Created a database for openings, since there are multiple versions of the same opening and I couldn’t just use the site as an index into a dict. I mean…


  • Chasing down more bugs. Did you know that ‘#’ means checkmate as well as ‘++’? Now you do!
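One way to sketch the “games as collections of moves” idea is with Python’s built-in sqlite3, used here purely for illustration (the actual database and schema aren’t described). Each move is a row keyed by game and ply, and a query reassembles a game into a line of text for the language model. The table names, columns, and sample game are all made up.

```python
import sqlite3

# In-memory database; the schema is hypothetical
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, white TEXT, black TEXT, opening TEXT);
    CREATE TABLE moves (game_id INTEGER REFERENCES games(id), ply INTEGER, san TEXT);
""")
conn.execute("INSERT INTO games VALUES (1, 'White Player', 'Black Player', 'Sicilian')")
conn.executemany(
    "INSERT INTO moves VALUES (1, ?, ?)",
    [(1, "e4"), (2, "c5"), (3, "Nf3"), (4, "d6")],
)


def game_text(conn: sqlite3.Connection, game_id: int) -> str:
    """Assemble one game's moves, in ply order, into a line of training text."""
    white, black, opening = conn.execute(
        "SELECT white, black, opening FROM games WHERE id = ?", (game_id,)
    ).fetchone()
    sans = [row[0] for row in conn.execute(
        "SELECT san FROM moves WHERE game_id = ? ORDER BY ply", (game_id,))]
    return "{} vs. {} ({}): {}".format(white, black, opening, " ".join(sans))


print(game_text(conn, 1))
# White Player vs. Black Player (Sicilian): e4 c5 Nf3 d6
```

Keeping openings in their own table (rather than a dict keyed by name) also sidesteps the problem of multiple versions of the same opening, since each version can get its own row.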


  • Rework the offsets to a y-day linear model rather than an x-y day linear model


  • Semester’s over, so ping Thom – done

Phil 5.19.20

Groceries today. In the Before Time, this meant swinging by the store for fresh fruit and veggies while picking up something interesting for dinner that night. Now it means going to two stores (because of shortages), standing in separated lines, and getting two weeks of food. Very not fun.

A Comprehensive guide to Fine-tuning Deep Learning Models in Keras (Part I)

Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

The Illustrated Transformer

Collective Intelligence 2020 explores the impact of technology and big data on the ways in which people come together to communicate, combine knowledge and get work done. (Thursday June 18)


Attention seq2seq example


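  • Tried to get the HuggingFace translator (colab here) to download and run, but I got this error for ‘Helsinki-NLP/opus-mt-en-ROMANCE’: ‘Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.’ I’m going to try downloading the model and vocab into a folder within the project to see if something’s missing, using these calls:
    # to put the model in a named directory, load the model and tokenizer and then save them
    # (MarianMTModel/MarianTokenizer are the transformers classes for the opus-mt models)
    from transformers import MarianMTModel, MarianTokenizer
    model = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-en-ROMANCE')
    tokenizer = MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-ROMANCE')
    model.save_pretrained('pretrained/opus-mt-en-ROMANCE')
    tokenizer.save_pretrained('pretrained/opus-mt-en-ROMANCE')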

GPT-2 Agents

  • Read in multiple games
    • Handled unconventional names
    • Handling moves split across lines – done
    • Need to handle promotion (a8=Q+), with piece change and added commentary. This is going to be tricky because the ‘Q’ is detected as a piece even though it appears after the location. Very special case.
      • Rule: a pawn promotion move has =Q, =N, =R, or =B appended to the destination square.
  • Create a short intro
  • Save to txt file
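Because the promotion piece always follows `=` and the destination square, anchoring a regular expression on that `=` sidesteps the problem of the `Q` being read as the moving piece. A sketch, assuming standard SAN; the regex and the `parse_promotion` helper are illustrative, not part of PGNtoEnglish.

```python
import re

# Pawn promotion in SAN: optional source file + 'x' for captures, the target
# square, then '=' and the new piece, then optional check/mate marks.
PROMOTION_RE = re.compile(
    r"^(?P<file>[a-h])?(?P<capture>x)?(?P<square>[a-h][1-8])"
    r"=(?P<piece>[QRBN])(?P<check>\+\+|\+|#)?$"
)

PIECE_NAMES = {"Q": "queen", "R": "rook", "B": "bishop", "N": "knight"}


def parse_promotion(move: str):
    """Return a dict describing a promotion move, or None if it isn't one."""
    m = PROMOTION_RE.match(move)
    if m is None:
        return None
    return {
        "square": m.group("square"),
        "promotes_to": PIECE_NAMES[m.group("piece")],
        "capture": m.group("capture") is not None,
        "check": m.group("check") or "",
    }


print(parse_promotion("a8=Q+"))
# {'square': 'a8', 'promotes_to': 'queen', 'capture': False, 'check': '+'}
print(parse_promotion("Qa8"))  # None: here the Q comes first, so it's a queen move
```

Checking for a promotion before the general move parser runs keeps the special case out of the normal piece-detection path.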


  • Read the GAN chapter of Generative Deep Learning last night, so now I have a better understanding of Oscillating Loss, Mode Collapse, Wasserstein Loss, the Lipschitz Constraint, and Gradient Penalty Loss. I think I also understand how to set up the callbacks to implement them
  • Since the MLP probably wasn’t the problem, go back to using that for the generator and focus on improving the discriminator.
  • That made a big difference!


  • The trained version is a pretty good example of mode collapse. I think we can work on improving the discriminator 🙂
  • This approach is already better at finding the amplitude in the noise samples from last week!


  • Ok, going back to the sin waves to work on mode collapse. I’m going to have lower-amplitude sin waves as well
  • That seems like a good place to start
  • Conv1D(filters=self.vector_size, kernel_size=4, strides=1, activation='relu', batch_input_shape=(self.num_samples, self.vector_size, 1))
  • Conv1D(filters=self.vector_size, kernel_size=6, strides=2, activation='relu', batch_input_shape=(self.num_samples, self.vector_size, 1))
  • Conv1D(filters=self.vector_size, kernel_size=8, strides=2, activation='relu', batch_input_shape=(self.num_samples, self.vector_size, 1))
  • The input vector size here is only 20 dimensions, so a kernel size of 16 covers 80% of the vector! Conv1D(filters=self.vector_size, kernel_size=16, strides=2, activation='relu', batch_input_shape=(self.num_samples, self.vector_size, 1))
  • Upped the vector size from 20 to 32
  • Tried using MaxPool1D but had weird reshape errors. Doing one more pass with two layers before starting to play with Wasserstein Loss, which I think is a better way to go. First, though, let’s try longer trainings.
  • 10,000 epochs:
  • 20,000 epochs:

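To see why the larger kernels squeeze a 20-element vector so hard, it helps to work out the output length. With Keras’ default `padding='valid'`, a Conv1D layer produces floor((n - kernel_size) / strides) + 1 outputs per filter. A small calculator in plain Python (no TensorFlow needed); `conv1d_output_length` is a hypothetical helper.

```python
def conv1d_output_length(n: int, kernel_size: int, strides: int = 1) -> int:
    """Output length of a 1D convolution with 'valid' padding (the Keras
    Conv1D default): floor((n - kernel_size) / strides) + 1."""
    if kernel_size > n:
        raise ValueError("kernel is larger than the input")
    return (n - kernel_size) // strides + 1


# The kernel/stride combinations tried above, on a 20-sample input vector
for k, s in [(4, 1), (6, 2), (8, 2), (16, 2)]:
    print("kernel={:2d} strides={} -> {} outputs".format(k, s, conv1d_output_length(20, k, s)))
```

That works out to 17, 8, 7, and 3 outputs respectively. Bumping the vector size to 32 takes the kernel_size=16, strides=2 case from 3 outputs to 9.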
Phil 5.15.20

Fridays are hard. I feel like I need a break from pushing this rock uphill alone. Nice day for a ride tomorrow, so a few of us will probably meet up.


  • Zach seems to be making progress in fits and starts. No word from Aaron
  • One way to make the system more responsive is to check whether the rates are above or below the regression. Rates above it can be flagged.
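A sketch of that flagging idea, assuming the regression is an ordinary least-squares line fit to the daily rates (the actual trending model isn’t shown here). `flag_above_regression` is a hypothetical helper and the data are made up.

```python
import numpy as np


def flag_above_regression(days: np.ndarray, rates: np.ndarray) -> np.ndarray:
    """Fit a straight line (least squares) to rates vs. days and return a
    boolean mask that is True wherever the observed rate exceeds the fit."""
    slope, intercept = np.polyfit(days, rates, 1)
    fitted = slope * days + intercept
    return rates > fitted


days = np.arange(7, dtype=float)
rates = np.array([1.0, 2.0, 3.0, 9.0, 5.0, 6.0, 7.0])  # made-up data with one spike
print(flag_above_regression(days, rates))
```

In practice a margin (for example, more than one or two standard deviations of the residuals) would keep small wiggles from being flagged.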

GPT-2 Agents

  • More PGNtoEnglish. Getting close I think.
  • Added pawn attack moves (diagonals)
  • Adding comment regex – done
  • Some problem handling this:
    Evaluating move [Re1 Qb6]
    search at (-6, -6) for black queen out of bounds
    search at (6, -6) for black queen out of bounds
    search at (0, -6) for black queen out of bounds
    search at (7, 7) for black queen out of bounds
    search at (-7, -7) for black queen out of bounds
    search at (7, -7) for black queen out of bounds
    search at (7, 0) for black queen out of bounds
    search at (0, -7) for black queen out of bounds
    raw: white: Re1, black: Qb6
    	expanded: white:  Fred Van der Vliet moves white rook from f1 to e1.
    	black: unset


  • Need to make the output of the generator work as input to the discriminator.
  • So I need to get from the input vector of latent noise to an output that is the size of the real data. It’s easy to do with Dense, but Dense and Conv1D don’t get along. I think I can get around that by reshaping the output of the Dense layer into something that a Conv1D can take. But that probably loses a lot of information, since each neuron will have some of each noise sample in it. But the information is noise in the first place, so it’s just resampled noise? The other option is to upsample, but that requires the latent vector to divide evenly into the input vector for the discriminator.
  • Here’s my code that does the change from a dense to Conv1D:
    self.g_model.add(Dense(self.vector_size*self.num_samples, activation='relu', batch_input_shape=(self.latent_dim, self.num_samples)))
    self.g_model.add(Reshape(target_shape=(self.vector_size, self.num_samples)))
    self.g_model.add(Conv1D(filters=self.vector_size, kernel_size=5, activation='tanh', batch_input_shape=(self.vector_size, self.num_samples, 1)))
  • The code that produces the latent noise is:
    def generate_latent_points(self, span: float = None) -> np.ndarray:
        # Gaussian latent noise; falls back to self.span when no span is passed
        scale = self.span if span is None else span
        x_input = np.random.randn(self.latent_dim * self.num_samples) * scale
        # reshape into a batch of inputs for the network
        x_input = x_input.reshape(self.latent_dim, self.num_samples)
        return x_input
  • The “real” values are:
    real_matrix = 
    [[-0.34737792 -0.7081109   0.93673414 -0.071527   -0.87720268]
     [ 0.99876073 -0.46088645 -0.61516785  0.97288676 -0.19455964]
     [ 0.97121222 -0.18755778 -0.81510907  0.8659679   0.09436946]
     [-0.72593361 -0.32328777  0.99500398 -0.50484775 -0.57482239]
     [ 0.72944027 -0.92555418  0.04089262  0.89151951 -0.78289867]
     [ 0.79514567 -0.88231211 -0.06080288  0.93291797 -0.71565884]
     [ 0.78083404 -0.89301473 -0.03758353  0.92429527 -0.73170157]
     [ 0.08266474 -0.94058595  0.70017899  0.3578314  -0.9979998 ]
     [-0.39534886 -0.67069473  0.95356385 -0.12295042 -0.85123299]
     [ 0.73424796  0.31175013 -0.99371562  0.5153131   0.56482379]]
  • The latent values are (note that the matrix is transposed):
    latent_matrix = 
    [[  8.73701754   6.10841293   9.31566343  -2.00751851   0.10715919
        6.94580853  -6.95308374   6.97502697 -11.09777023  -8.79311041]
     [ -3.61789323   0.11091496  10.94717459   3.14579647 -13.23974342
        2.78914476   9.40101397 -17.75756896   2.87461527   6.65877192]
     [  5.77331701   7.71326491   9.9877786   -3.81972802  -5.86490109
       -6.68585542 -13.59478633  -7.66952834 -10.78863284   5.9248856 ]
     [ -3.05226511  -5.36347909   1.3377953   14.87752343  -0.21993387
      -13.47737126   1.39357385  -1.85004465   6.83400948   1.21105276]]
  • The values created by the generator are:
    predict_matrix = 
    [[[-0.9839389   0.18747564 -0.9449842  -0.66334486 -0.9822154 ]]
     [[ 0.9514655  -0.9985579   0.76473945 -0.9985249  -0.9828463 ]]
     [[-0.58794653 -0.9982161   0.9855345  -0.93976855 -0.9999758 ]]
     [[-0.9987122   0.9480774  -0.80395573 -0.999845    0.06755089]]]
  • So now I need to get the number of rows up to the same value as the real data
  • Ok, so here’s how that works. We use tf.keras.layers.Reshape(), which is pretty simple: you pass the shape you want as the single argument. For these experiments, I had ten rows of 5 features, plus an extra dimension, so you would think that Reshape((10, 5, 1)) would be what you want.
  • Au contraire! Keras leaves the batch dimension free to vary, so it isn’t part of the target shape. The argument is actually (5, 1). Here are two versions. First is a generator using a Dense network:
    def define_generator_Dense(self) -> Sequential:
        self.g_model_Dense = Sequential()
        self.g_model_Dense.add(Dense(4, activation='relu', kernel_initializer='he_uniform', input_dim=self.latent_dim))
        self.g_model_Dense.add(Dense(self.vector_size, activation='tanh')) # activation was linear
        self.g_model_Dense.add(Reshape((self.vector_size, 1)))
        print("g_model_Dense.output_shape = {}".format(self.g_model_Dense.output_shape))
        # compile model
        loss_func = tf.keras.losses.BinaryCrossentropy()
        opt_func = tf.keras.optimizers.Adam(0.001)
        self.g_model_Dense.compile(loss=loss_func, optimizer=opt_func)
        return self.g_model_Dense
  • Second is a network using Conv1D layers
    def define_generator_Dense_to_CNN(self) -> Sequential:
        self.g_model_Dense_CNN = Sequential()
        self.g_model_Dense_CNN.add(Dense(self.num_samples * self.vector_size, activation='relu', batch_input_shape=(self.num_samples, self.latent_dim)))
        self.g_model_Dense_CNN.add(Reshape(target_shape=(self.num_samples, self.vector_size)))
        self.g_model_Dense_CNN.add(Conv1D(filters=self.vector_size, kernel_size=self.num_samples, activation='tanh', batch_input_shape=(self.num_samples, self.vector_size, 1))) # activation was linear
        self.g_model_Dense_CNN.add(Reshape((self.vector_size, 1)))
        # compile model
        loss_func = tf.keras.losses.BinaryCrossentropy()
        opt_func = tf.keras.optimizers.Adam(0.001)
        self.g_model_Dense_CNN.compile(loss=loss_func, optimizer=opt_func)
        return self.g_model_Dense_CNN


  • Both evaluated correctly against the discriminator, so I should be able to train the whole GAN, once it’s assembled. But that is not something to start at 4:30 on a Friday afternoon!
    real predict = (10, 1)[[0.42996567]
     [0.5600004 ]
     [0.5098837 ]
     [0.4046895 ]
     [0.4196912 ]
     [0.5080263 ]]
    gdense_mat predict = (10, 1)[[0.48928624]
     [0.5       ]
     [0.4949373 ]
     [0.5       ]
     [0.5973854 ]
     [0.5       ]
     [0.5183723 ]
     [0.4212265 ]]
    gdcnn_mat predict = (10, 1)[[0.48057705]
     [0.5026125 ]
     [0.4902147 ]
     [0.5988    ]

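As a rough illustration of the Reshape point, here’s a numpy sketch that mimics how a Keras `Reshape(target_shape)` layer treats its input: the batch dimension passes through untouched and only the per-sample shape changes, which is why the argument is (5, 1) rather than (10, 5, 1). This is plain numpy standing in for Keras, and `keras_style_reshape` is a made-up helper name.

```python
import numpy as np


def keras_style_reshape(batch: np.ndarray, target_shape: tuple) -> np.ndarray:
    """Mimic a Keras Reshape(target_shape) layer: the output shape is
    (batch_size,) + target_shape, so target_shape describes one sample."""
    return batch.reshape((-1,) + tuple(target_shape))


# Ten samples, each a flat vector of 5 values (like the generator's Dense output)
dense_out = np.zeros((10, 5))
print(keras_style_reshape(dense_out, (5, 1)).shape)  # (10, 5, 1)
```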

Phil 5.13.20



  • Zach appears happy with the changes


  • The Arabic finetuning didn’t work. Drat.
  • This could, though….


GPT-2 Agents

  • Need to handle hints, takes, check, and checkmate:
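    import re
    num_regex = re.compile('[^0-9]')        # removes everything except digits
    alpha_regex = re.compile('[0-9]')       # removes digits
    hints_regex = re.compile('[KQNBRx+#]')  # piece letters, captures, check/checkmate marks
    piece_regex = re.compile('[^KQNBR]')    # removes everything except piece letters
    def parse_move(m):
        hints = []
        cleaned = hints_regex.sub('', m)
        if '++' in m or '#' in m:  # PGN uses '#' (and some sources '++') for checkmate
            hints.append('checkmate')
        elif '+' in m:
            hints.append('check')
        if 'x' in m:
            hints.append('takes')
        if len(cleaned) > 2:
            # disambiguating file or rank, e.g. the 'b' in Nbd7
            hints.append(cleaned[0])
            cleaned = cleaned[1:]
        piece = piece_regex.sub('', m)
        num = num_regex.sub('', cleaned)
        letter = alpha_regex.sub('', cleaned)
        return "{}/{}: piece = [{}], square = ({}, {}), hints = {}".format(m, cleaned, piece, letter, num, hints)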
  • Roll this in tomorrow. Comments {Anything in curly brackets}? Also, it looks like there are more meta tags (Note player name with no comma!):
    [Black "Ding Liren"]
    [WhiteTitle "GM"]
    [BlackTitle "GM"]
    [Opening "Sicilian"]
    [Variation "Najdorf"]
    [WhiteFideId "2020009"]
    [BlackFideId "8603677"]

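Since the set of meta tags keeps growing, one option is to parse any `[Name "Value"]` line generically instead of matching specific tags. A sketch, assuming one tag pair per line as in the PGN export format; `TAG_RE` and `parse_tags` are illustrative names.

```python
import re

# A PGN tag pair looks like: [Name "Value"]; escaped quotes (\") are kept as-is
TAG_RE = re.compile(r'^\[(?P<name>\w+)\s+"(?P<value>(?:[^"\\]|\\.)*)"\]\s*$')


def parse_tags(lines) -> dict:
    """Collect every tag pair into a dict, whatever the tag name, so new
    tags like WhiteTitle or BlackFideId don't need special handling."""
    tags = {}
    for line in lines:
        m = TAG_RE.match(line.strip())
        if m:
            tags[m.group("name")] = m.group("value")
    return tags


header = '''[Black "Ding Liren"]
[WhiteTitle "GM"]
[Opening "Sicilian"]
[Variation "Najdorf"]
[BlackFideId "8603677"]'''
tags = parse_tags(header.splitlines())
print(tags["Black"], tags["Variation"])  # Ding Liren Najdorf
```

Because the player name is just the tag value, names with or without a comma come through unchanged; splitting them into first and last names is a separate step.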

  • More with TF-GAN from this Google course on GANs
    • Mode Collapse is why the GAN keeps generating a single waveform
    • Need to contact Joel Shor at Google. Sent a note on LinkedIn and a follow-up email to his Google account (from D:\Development\External\tf-gan\README)
  • GANSynth: Making music with GANs
    • In this post, we introduce GANSynth, a method for generating high-fidelity audio with Generative Adversarial Networks (GANs).
  • 10 Lessons I Learned Training GANs for one Year
  • Advanced Topics in GANs
  • As I lower the number of neurons in the generator, it starts to look better, but now there are odd artifacts in the untrained data:
    self.g_model.add(Dense(5, activation='relu', kernel_initializer='he_uniform', input_dim=self.latent_dimension))



  • Thought I’d try the TF-GAN examples, but I get many compatibility errors that make me think this does not work with TF 2.x. So I decided to try the Google Colab. Aaaaand that doesn’t work either:


  • Looking through Generative Deep Learning, it says that CNNs help make the discriminator better:
    • In the original GAN paper, dense layers were used in place of the convolutional layers. However, since then, it has been shown that convolutional layers give greater predictive power to the discriminator. You may see this type of GAN called a DCGAN (deep convolutional generative adversarial network) in the literature, but now essentially all GAN architectures contain convolutional layers, so the “DC” is implied when we talk about GANs. It is also common to see batch normalization layers in the discriminator for vanilla GANs, though we choose not to use them here for simplicity.
  • So, tf.keras.layers.Conv1D
  • 10:00 meeting with Vadim

Phil 5.11.20

Cut my hair for the second time. It looks ok from the front…

I’m also having dreams with crowds in them. Saturday night I dreamed I was at some job with a lot of people in a large building. Last night I dreamed I was sharing a dorm at the Naval Academy?

A foolproof way to shrink deep learning models

  • Train the model, prune its weakest connections, retrain the model at its fast, early training rate, and repeat, until the model is as tiny as you want. 
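The pruning step in that loop is easy to sketch. Below is a magnitude-based mask in numpy, under the assumption that “weakest” means smallest absolute weight; the retrain-at-the-early-learning-rate and repeat parts of the recipe are left out, and `magnitude_prune` is a hypothetical helper.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Return a 0/1 mask that zeroes the `fraction` of weights with the
    smallest absolute value (ties at the threshold are also pruned)."""
    k = int(fraction * weights.size)
    if k == 0:
        return np.ones_like(weights)
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)


w = np.array([[0.5, -0.05], [0.01, -2.0]])
mask = magnitude_prune(w, 0.5)
print(w * mask)  # the two smallest-magnitude weights are now zero
```

During retraining, the mask would be reapplied after each update so pruned connections stay at zero.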

Graph Neural Networks (GNN)

  • Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with arbitrary depth.
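A stripped-down picture of “message passing between the nodes”: each node replaces its state with the mean of its neighbors’ states. Real GNNs add learned weights and nonlinearities; this numpy sketch keeps only the propagation, and it assumes every node has at least one neighbor.

```python
import numpy as np

# Undirected graph as an adjacency matrix: edges 0-1 and 1-2 (a 3-node path)
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
# One scalar state per node
state = np.array([1.0, 0.0, 3.0])


def message_pass(adjacency: np.ndarray, state: np.ndarray) -> np.ndarray:
    """One round of message passing: each node's new state is the mean of
    its neighbors' states (an unweighted, unlearned update)."""
    degree = adjacency.sum(axis=1)
    return adjacency @ state / degree


step1 = message_pass(adjacency, state)
print(step1)  # node 1 now averages nodes 0 and 2
# Repeating the update lets information travel further: after two rounds,
# node 0's state depends on node 2, which is two hops away.
print(message_pass(adjacency, step1))
```

Stacking more rounds is what gives a GNN information from an arbitrarily deep neighborhood.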


  • Zach’s having issues getting the map to work on mobile
  • Need to start pulling off controlled entities like China and Diamond Princess
  • Made a duplicate of the trending code to play with

GPT-2 Agents

  • More PGNtoEnglish
  • I have pawns and knights moving!


  • With expanded text!
    • ‘Fred Van der Vliet moves white pawn from d2 to d4’
    • ‘Loek Van Wely moves black knight from g8 to f6’


  • Continue with NoiseGAN
  • Isolating noise. Done!


  • Now I need to subsample to produce the training and test sets. Seems to be working
  • Fitting the timeseries sampling into the GAN


  • Try training the GAN?


  • Community Spaces for Interdisciplinary Science and Engagement
    • Dr. Lisa Scheifele is an Associate Professor at Loyola University Maryland and head of the Build-a-Genome research network, where her research focuses on designing and programming cells for new and complex functions. She is also Executive Director at the Baltimore Underground Science Space (BUGSS) community lab. BUGSS provides unique and creative projects to members of the public who have few other opportunities to engage with modern science. As an informal and nontraditional science space, BUGSS’ activities blend biotechnology research, computational tools, artistic expression, and design principles to accomplish interdisciplinary projects driven by community interest and need.

Phil 5.8.20


  • Really have to fix the trending. Places like Brazil, where the disease is likely to be chronic, are not working anymore
  • Aaron and I agree that if the site’s not updated by 5/15, we pull it down

GPT-2 Agents

  • More PGNtoEnglish
  • Worked out a way to search for pieces in a rules-based range. It’ll work for pawns, knights, and kings right now. Will need to add rooks, bishops, and queens
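For rooks, bishops, and queens, a fixed set of offsets isn’t enough, because those pieces slide until something blocks them. One way to extend the rules-based search is to walk outward from the destination square along each direction until hitting the edge of the board or an occupied square. In this sketch the board is a hypothetical dict from 0-based (file, rank) tuples to piece names, with a1 = (0, 0); it isn’t PGNtoEnglish’s actual representation.

```python
BOARD_SIZE = 8

# Direction vectors (d_file, d_rank) for the sliding pieces
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
QUEEN_DIRS = ROOK_DIRS + BISHOP_DIRS


def find_slider(board: dict, target: tuple, piece: str, directions) -> list:
    """Walk outward from the target square along each direction and return
    the squares holding `piece`. A ray stops at the board edge or at the
    first occupied square, since sliding pieces can't jump."""
    found = []
    for d_file, d_rank in directions:
        f, r = target[0] + d_file, target[1] + d_rank
        while 0 <= f < BOARD_SIZE and 0 <= r < BOARD_SIZE:
            occupant = board.get((f, r))
            if occupant is not None:
                if occupant == piece:
                    found.append((f, r))
                break  # blocked either way
            f, r = f + d_file, r + d_rank
    return found


# Black queen on d8 (3, 7) reaching b6 (1, 5) along the diagonal
print(find_slider({(3, 7): "black queen"}, (1, 5), "black queen", QUEEN_DIRS))  # [(3, 7)]
```

The explicit `0 <= f < BOARD_SIZE` test on both ends also avoids relying on IndexError, which would let negative coordinates wrap around. If more than one square comes back, the SAN disambiguation hint (like the `b` in Nbd7) picks between them.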


  • Try finetuning the model on Arabic to see what happens. Don’t see the txt files?


  • The time taken for all the DB calls is substantial. I need to change the Measurements class so that there is a set of master Measurements that are big enough to subsample other Measurements from. Done. Much faster!
  • Start building noise query, possibly using a high pass filter? Otherwise, subtract the “real” signal from the simulated one
    • Starting with the subtraction, since I have to set up queries anyway, and this will help me debug them
    • Created NoiseGAN class that extends OneDGAN
    • Pulling over table building code from InfluxTestTrainBase()
    • Success!
    • "D:\Program Files\Python37\python.exe" D:/Development/Sandboxes/Influx2_ML/Influx2_ML/
      2020-05-08 14:45:36.077292: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudart64_101.dll
      query = from(bucket:"org_1_bucket") |> range(start:2020-04-13T13:30:00Z, stop:2020-04-13T13:40:00Z) |> filter(fn:(r) => r.type == "noisy_sin" and (r.period == "8"))
      vector size = 100, query returns = 590
    • Probably a good place to stop for the day
  • 10:00 Meeting. Vadim seems to be making good progress. Check in on Tuesday
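The two options for isolating the noise (a high-pass filter, or subtracting the “real” signal from the simulated one) can be compared on synthetic data. A numpy sketch with a made-up sine wave plus Gaussian noise; the moving-average filter is just one simple high-pass choice, and none of this touches the InfluxDB queries.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 100)
clean = np.sin(t)                               # the "real" signal
noisy = clean + rng.normal(0.0, 0.1, t.shape)   # simulated noisy_sin

# Option 1: subtract the known clean signal to recover the noise
noise_by_subtraction = noisy - clean


def high_pass(x: np.ndarray, window: int = 9) -> np.ndarray:
    """Crude high-pass filter: remove a centered moving average (the slow
    part of the signal) and keep what's left. The first and last few
    samples are distorted because np.convolve zero-pads the ends."""
    kernel = np.ones(window) / window
    return x - np.convolve(x, kernel, mode="same")


# Option 2: estimate the noise without knowing the clean signal
noise_by_filter = high_pass(noisy)
```

Subtraction is exact when the clean signal is known, which makes it a good way to debug the queries first. The filter version is the fallback when only the noisy series is available, at the cost of some leakage from the signal and from the smoothing itself.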

Phil 5.7.20


  • Everything is silent again.

GPT-2 Agents

  • Continuing with PGNtoEnglish
    • Building out move text
    • Changing board to a dataframe, since I can display it as a table in pyplot – done!


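  • Here’s the code for making the chessboard table in pyplot:
    from enum import Enum
    import pandas as pd
    import matplotlib.pyplot as plt
    # stand-in for the pieces enum that lives elsewhere in the project; these display values are placeholders
    class pieces(Enum):
        NONE = ''
        WHITE_PAWN = 'wP'
        WHITE_KNIGHT = 'wN'
        WHITE_BISHOP = 'wB'
        WHITE_ROOK = 'wR'
        WHITE_QUEEN = 'wQ'
        WHITE_KING = 'wK'
        BLACK_PAWN = 'bP'
        BLACK_KNIGHT = 'bN'
        BLACK_BISHOP = 'bB'
        BLACK_ROOK = 'bR'
        BLACK_QUEEN = 'bQ'
        BLACK_KING = 'bK'
    class Chessboard():
        def __init__(self):
            self.reset()
        def reset(self):
            self.cols = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
            self.rows = [8, 7, 6, 5, 4, 3, 2, 1]
            self.board = pd.DataFrame(columns=self.cols, index=self.rows)
            for number in self.rows:
                for letter in self.cols:
                    self.board.loc[number, letter] = pieces.NONE.value
        def populate_board(self):
            self.board.loc[1, 'a'] = pieces.WHITE_ROOK.value
            self.board.loc[1, 'h'] = pieces.WHITE_ROOK.value
            self.board.loc[1, 'b'] = pieces.WHITE_KNIGHT.value
            self.board.loc[1, 'g'] = pieces.WHITE_KNIGHT.value
            self.board.loc[1, 'c'] = pieces.WHITE_BISHOP.value
            self.board.loc[1, 'f'] = pieces.WHITE_BISHOP.value
            self.board.loc[1, 'd'] = pieces.WHITE_QUEEN.value
            self.board.loc[1, 'e'] = pieces.WHITE_KING.value
            self.board.loc[8, 'a'] = pieces.BLACK_ROOK.value
            self.board.loc[8, 'h'] = pieces.BLACK_ROOK.value
            self.board.loc[8, 'b'] = pieces.BLACK_KNIGHT.value
            self.board.loc[8, 'g'] = pieces.BLACK_KNIGHT.value
            self.board.loc[8, 'c'] = pieces.BLACK_BISHOP.value
            self.board.loc[8, 'f'] = pieces.BLACK_BISHOP.value
            self.board.loc[8, 'd'] = pieces.BLACK_QUEEN.value  # both queens start on the d-file
            self.board.loc[8, 'e'] = pieces.BLACK_KING.value
            for letter in self.cols:
                self.board.loc[2, letter] = pieces.WHITE_PAWN.value
                self.board.loc[7, letter] = pieces.BLACK_PAWN.value
        def print_board(self):
            fig, ax = plt.subplots()
            # hide axes so only the table shows
            ax.axis('off')
            ax.table(cellText=self.board.values, colLabels=self.cols, rowLabels=self.rows, loc='center')
            plt.show()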


  • Continuing with the MLP sequence-to-sequence NN
  • Writing
  • Reading
    • Hmm. Just realized that the input vector being defined by the query is a bit problematic. I think I need to define the input vector size and then ensure that the query creates sufficient points. Fixed. It now stores the model with the specified input vector size:


  • And here’s the loaded model on newly retrieved data:
  • Here’s the model learning two waveforms. Went from 400×2 neurons to 3200×2:
  • Combining with GAN
    • Subtract the sin from the noisy_sin to get the noise and train on that
  • Start writing paper? What are other venues beyond GVSETS?
  • 2:00 status meeting


  • 3:30 Meeting
  • 6:00 Meeting

Phil 5.6.20


  • I looked at the COVID-19-TweetIDs GitHub project, and it is, in fact, lists of IDs:
  • These can work by appending that number to the string “”, like this:
  • The way to get the text in Python appears to be tweepy. This snippet from Stack Overflow appears to show how to do it, but I haven’t verified it yet.
    import tweepy
    # fill in your own Twitter API credentials
    consumer_key = 'xxxx'
    consumer_secret = 'xxxx'
    access_token = 'xxxx'
    access_token_secret = 'xxxx'
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    tweets = api.statuses_lookup(id_list) # id_list is the list of tweet ids
    tweet_txt = []
    for i in tweets:
        tweet_txt.append(i.text)

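One wrinkle with that snippet: the Twitter v1.1 statuses/lookup endpoint behind `statuses_lookup` accepts at most 100 IDs per request, so a long ID list has to be batched. A sketch, assuming tweepy 3.x (tweepy 4.x renamed the method `lookup_statuses`); `chunk_ids` and `hydrate` are hypothetical helpers, and this hasn’t been run against the live API.

```python
from typing import Iterator, List


def chunk_ids(ids: List[int], size: int = 100) -> Iterator[List[int]]:
    """Yield successive batches of at most `size` tweet IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(api, ids: List[int]) -> List[str]:
    """Hypothetical helper: look up each batch and collect the tweet text.
    `api` is expected to be a tweepy.API object (tweepy 3.x statuses_lookup)."""
    texts = []
    for batch in chunk_ids(ids):
        for status in api.statuses_lookup(batch):
            texts.append(status.text)
    return texts
```

Deleted or protected tweets are simply missing from the response, so the number of texts can be smaller than the number of IDs.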

GPT-2 Agents

  • Continuing with PGNtoEnglish
    • Figuring out how to parse the moves text, using the wonderful regex101 site
  • 4:30 meeting
    • We set up an Overleaf project with the goal of submitting to the Harvard Kennedy School Misinformation Review
    • We talked about using GPT-2 as a way of clustering tweets. Going to try finetuning with some Arabic novels first to see if it can work in that language
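Here’s a rough sketch of the kind of movetext parsing being worked out: strip {curly-bracket} comments, move numbers, and the game result, then pair the remaining SAN tokens into (white, black) moves. These regexes are illustrative rather than the ones built on regex101; they ignore variations in parentheses and NAGs like $1, and they assume the game starts with White’s move.

```python
import re

COMMENT_RE = re.compile(r"\{[^}]*\}")            # {anything in curly brackets}
MOVE_NUMBER_RE = re.compile(r"\d+\.(\.\.)?")     # "12." or "12..."
RESULT_RE = re.compile(r"(1-0|0-1|1/2-1/2|\*)\s*$")


def parse_movetext(text: str) -> list:
    """Split PGN movetext into (white, black) pairs of SAN moves, dropping
    comments, move numbers, and the game result."""
    text = COMMENT_RE.sub(" ", text)
    text = RESULT_RE.sub("", text.strip())
    tokens = MOVE_NUMBER_RE.sub(" ", text).split()
    # pair up the plies; a trailing white move gets None for black
    return [(tokens[i], tokens[i + 1] if i + 1 < len(tokens) else None)
            for i in range(0, len(tokens), 2)]


moves = parse_movetext("1. e4 c5 {Sicilian} 2. Nf3 d6 3. d4 1-0")
print(moves)  # [('e4', 'c5'), ('Nf3', 'd6'), ('d4', None)]
```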


  • Continuing with the MLP sequence-to-sequence NN
    • Getting the data to fit into nice, rectangular arrays, which is not straightforward, since the time window of the query can return a varying number of results. So I have to run the query, then trim the arrays down so that they are all the length of the shortest. Here are the results:
  • I’ve got the training and prediction working pretty well. Stopping for the day
  • Tomorrow I’ll get the models to write out and read in
  • 2:00 status meeting
    • Two weeks to getting the sim running?
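The trim-to-shortest step can be written in a couple of lines. A sketch, assuming each query result arrives as a 1D sequence of values; `to_rectangular` is a hypothetical helper, and whether to drop samples from the end (as here) or the start depends on which end of the time window matters more.

```python
import numpy as np


def to_rectangular(rows: list) -> np.ndarray:
    """Trim every row to the length of the shortest one and stack them into
    a 2D array (extra samples at the end of longer rows are discarded)."""
    shortest = min(len(r) for r in rows)
    return np.array([r[:shortest] for r in rows])


# e.g. three query results that came back with different numbers of points
ragged = [[0.1, 0.2, 0.3, 0.4], [1.0, 1.1, 1.2], [2.0, 2.1, 2.2, 2.3, 2.4]]
print(to_rectangular(ragged).shape)  # (3, 3)
```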

Phil 5.5.20


  • Just goes to show that you shouldn’t take regression fits as correct

GPT-2 Agents

  • More PGNtoEnglish
  • Discovered typing.TextIO. I love typing to death 🙂
  • Finished parsing meta information



  • Progress meeting with Vadim and Isaac
  • Train and save a 2-layer, 400 neuron MLP. No ensembles for now
  • Set up GAN to add noise


Phil 5.1.20

Geez, it’s May! What a weird time


  • Chatted with Zach. He’s bogged down in database issues, but I think it’s coming along

GPT-2 Agents

  • Upgrade TF, Torch, transformers, Nvidia, and CUDA on laptop
  • Set up input and output files
  • Pull char count of probe out and add that to the total generated
  • Try training on Moby Dick as per these instructions
    • The following example fine-tunes GPT-2 on WikiText-2. We’re using the raw WikiText-2 (no tokens were replaced before the tokenization). The loss here is that of causal language modeling.
      export TRAIN_FILE=/path/to/dataset/wiki.train.raw
      export TEST_FILE=/path/to/dataset/wiki.test.raw
      python \
          --output_dir=output \
          --model_type=gpt2 \
          --model_name_or_path=gpt2 \
          --do_train \
          --train_data_file=$TRAIN_FILE \
          --do_eval \

      This takes about half an hour to train on a single K80 GPU and about one minute for the evaluation to run. It reaches a score of ~20 perplexity once fine-tuned on the dataset.

  • Ran with this command
    python --output_dir=output .\gpt2data\moby_dick_model --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=.\gptdata\moby_dick_train.txt --do_eval --eval_data_file=.\gptdata\moby_dick_test.txt

    Which started the task correctly, but…

    RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 8.00 GiB total capacity; 6.26 GiB already allocated; 77.55 MiB free; 6.31 GiB reserved in total by PyTorch)

    Guess I’ll try running it on my work machine. If it runs there, I guess it’s time to upgrade my graphics card

  • That was not the problem! There is something going on with the batch size. Added --per_gpu_train_batch_size=1
  • Couldn’t use links. os.path.isfile() chokes
  • The model doesn’t seem to be saved? Looks like it is:
    05/01/2020 09:43:49 - INFO - transformers.trainer -   Saving model checkpoint to output
    05/01/2020 09:43:49 - INFO - transformers.configuration_utils -   Configuration saved in output\config.json
    05/01/2020 09:43:50 - INFO - transformers.modeling_utils -   Model weights saved in output\pytorch_model.bin
    05/01/2020 09:43:50 - INFO - __main__ -   *** Evaluate ***
    05/01/2020 09:43:50 - INFO - transformers.trainer -   ***** Running Evaluation *****
    05/01/2020 09:43:50 - INFO - transformers.trainer -     Num examples = 97
    05/01/2020 09:43:50 - INFO - transformers.trainer -     Batch size = 16
    Evaluation: 100%|██████████| 7/7 [00:06<00:00,  1.00it/s]
    05/01/2020 09:43:57 - INFO - __main__ -   ***** Eval results *****
    05/01/2020 09:43:57 - INFO - __main__ -     perplexity = 43.311306196182095
  • Found it. It defaults to the output directory in transformers/examples
  • To get this version, which is a PyTorch model, you have to add the ‘from_pt=True’ argument:
    model = TFGPT2LMHeadModel.from_pretrained("../data/moby_dick_model", pad_token_id=tokenizer.eos_token_id, from_pt=True)
  • And the results are great!
    I enjoy walking with my cute dog:
    	[0]: I enjoy walking with my cute dog, and then I like to take pictures! But, as for you, you will have to go all the way round for the proper weather! Here, I have some water in my belly! How am I
    	[1]: I enjoy walking with my cute dog when I walk in the yard, and when we have been going in, I am always excited to try a little bit of the wildest stuff. I like to see my dogs do it. I like
    	[2]: I enjoy walking with my cute dog because he has no fear of you leaving him alone. In that case, let me explain that I am a retired Sperm Whale in my Sperm Whale breeding herd. I was recently the leader of the
    Far out in the uncharted backwaters of the unfashionable end:
    	[0]: Far out in the uncharted backwaters of the unfashionable end of the Indian Ocean, you will see whales of many great variety. “Wherever they go, their mouths may be wide open, or they may be so packed
    	[1]: Far out in the uncharted backwaters of the unfashionable end of the planet. On his way, it seemed that he was about to embark upon something which no mortal could have foreseen; it being the Cape Horn of the Pacific
    	[2]: Far out in the uncharted backwaters of the unfashionable end. A curious discovery is made of the whale-whale. How much is he? I wonder how many sperm whales have there! I am still trying to get
    It was a pleasure to burn. :
    	[0]: It was a pleasure to burn. His teeth were the first thing to slide down to the side of his cheeks—a pointless thing—while my face stood there in this hideous position. It was my last, and only,
    	[1]: It was a pleasure to burn. But, as the day wore on, another peculiarity was discovered in the method. When this first method was advanced to be used for preparing the best lye, it was found that it was, instead
    	[2]: It was a pleasure to burn. “Sir, “aye, that’s true—” said I with a sort of exasperation. I then took one of the other boats and in a very similar
    It was a bright cold day in April, and the clocks were striking thirteen. :
    	[0]: It was a bright cold day in April, and the clocks were striking thirteen. It seemed that Captain Peleg had had just arrived, and was sitting in his Captain-Commander's cabin, and was trying to get up some time; but Pe
    	[1]: It was a bright cold day in April, and the clocks were striking thirteen. One of us, who had been living in the tent for six days, still felt like the moon. I saw him. I saw him again. He looked just like
    	[2]: It was a bright cold day in April, and the clocks were striking thirteen. “Good afternoon, sir, it was the very first Sabbath of the year, and the New Year is the first time the people of the world have an


  • Need to get the chess database and build a corpus. Working on a PGN to English translator. Doesn’t look toooooo bad


    • Continue with GANs. Maybe explore 1D CNNs?
    • The high-frequency run actually looks pretty good:

      I think it may be a better use of my time to assemble all the components for a first-pass proof of concept

  • 10:00 Meeting with Vadim and Isaac
    • I walked through the whole controller architecture from the base class to the running version. Vadim will start implementing a Sim2 version using the base classes and the dictionary. Then we can work on writing to and reading from InfluxDB

Phil 4.30.20

Had some kind of power hiccup this morning and discovered that my computer was connected to the surge-suppressor part of the UPS. My box is now most unhappy as it recovers. On the plus side, computers recover from this sort of thing now.


  • Fixed the neighbor list and was pleasantly surprised that it worked for the states


  • Set up input and output files
  • Pull char count of probe out and add that to the total generated
  • Start looking into finetuning
    • Here are all the HuggingFace examples
      • export TRAIN_FILE=/path/to/dataset/wiki.train.raw
        export TEST_FILE=/path/to/dataset/wiki.test.raw
        python \
            --output_dir=output \
            --model_type=gpt2 \
            --model_name_or_path=gpt2 \
            --do_train \
            --train_data_file=$TRAIN_FILE \
            --do_eval \
            --eval_data_file=$TEST_FILE
      • Source is on GitHub
      • Tried running it without any arguments as a sanity check and got ImportError: cannot import name 'MODEL_WITH_LM_HEAD_MAPPING'. It turns out the script won’t work without PyTorch installed. With PyTorch in place, everything seems to be working now (the argument parser correctly complains that the required --output_dir is missing):
        usage: [-h] [--model_name_or_path MODEL_NAME_OR_PATH]
                                        [--model_type MODEL_TYPE]
                                        [--config_name CONFIG_NAME]
                                        [--tokenizer_name TOKENIZER_NAME]
                                        [--cache_dir CACHE_DIR]
                                        [--train_data_file TRAIN_DATA_FILE]
                                        [--eval_data_file EVAL_DATA_FILE]
                                        [--line_by_line] [--mlm]
                                        [--mlm_probability MLM_PROBABILITY]
                                        [--block_size BLOCK_SIZE] [--overwrite_cache]
                                        --output_dir OUTPUT_DIR
                                        [--overwrite_output_dir] [--do_train]
                                        [--do_eval] [--do_predict]
                                        [--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
                                        [--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
                                        [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                                        [--learning_rate LEARNING_RATE]
                                        [--weight_decay WEIGHT_DECAY]
                                        [--adam_epsilon ADAM_EPSILON]
                                        [--max_grad_norm MAX_GRAD_NORM]
                                        [--num_train_epochs NUM_TRAIN_EPOCHS]
                                        [--max_steps MAX_STEPS]
                                        [--warmup_steps WARMUP_STEPS]
                                        [--logging_dir LOGGING_DIR]
                                        [--logging_steps LOGGING_STEPS]
                                        [--save_steps SAVE_STEPS]
                                        [--save_total_limit SAVE_TOTAL_LIMIT]
                                        [--no_cuda] [--seed SEED] [--fp16]
                                        [--fp16_opt_level FP16_OPT_LEVEL]
                                        [--local_rank LOCAL_RANK]
        error: the following arguments are required: --output_dir

        And I still haven’t broken my text generation code. Astounding!

    • Moby Dick from Gutenberg
    • Chess
    • Covid tweets
    • Here’s the cite:
        title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
        author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R'emi Louf and Morgan Funtowicz and Jamie Brew},
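The note about pulling out the probe's character count comes down to a small piece of arithmetic. The generated text begins with the probe, as the "It was a bright cold day in April..." samples show, so a length budget meant for new text has to add `len(probe)`. Here is a minimal sketch that works in characters, as the note does. The function names are hypothetical.

```python
def generation_budget(probe: str, n_new_chars: int) -> int:
    """Total characters to allow: the probe's own length plus the new text wanted."""
    return len(probe) + n_new_chars


def trim_to_budget(generated: str, probe: str, n_new_chars: int) -> str:
    """Cut generated text (which starts with the probe) to the budgeted length."""
    if not generated.startswith(probe):
        raise ValueError("generated text is expected to begin with the probe")
    return generated[: generation_budget(probe, n_new_chars)]


def continuation_only(generated: str, probe: str) -> str:
    """Drop the probe so only the newly generated text remains."""
    return generated[len(probe):]
```

Without the `len(probe)` term, a 100-character budget on a 73-character probe would leave only 27 characters of actual continuation.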

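The fine-tuning command expects separate TRAIN_FILE and TEST_FILE text files. Each candidate corpus (Moby Dick, chess, Covid tweets) therefore needs splitting first. Below is a minimal sketch of that step, assuming line-oriented plain text. `split_lines` and `write_lines` are hypothetical helpers, not part of the Hugging Face examples.

```python
import random
from pathlib import Path
from typing import List, Sequence, Tuple


def split_lines(lines: Sequence[str], test_fraction: float = 0.1,
                shuffle: bool = False, seed: int = 42) -> Tuple[List[str], List[str]]:
    """Split a line-oriented corpus into (train, test) lists.

    shuffle=False keeps the text in order and holds out the tail, which keeps
    prose passages intact; shuffle=True suits corpora where each line is an
    independent record (one game or one tweet per line).
    """
    kept = [ln for ln in lines if ln.strip()]    # drop blank lines
    if shuffle:
        random.Random(seed).shuffle(kept)        # reproducible shuffle
    split_at = len(kept) - int(len(kept) * test_fraction)
    return kept[:split_at], kept[split_at:]


def write_lines(path: Path, lines: Sequence[str]) -> None:
    """Write one item per line as UTF-8, with a trailing newline."""
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A typical use reads a corpus with `Path(...).read_text(encoding="utf-8").splitlines()`, splits it, and writes the two halves to the paths that TRAIN_FILE and TEST_FILE point at.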

  • Set up meeting with Isaac and Vadim for control
  • Continue with GAN
    • Struggled with getting training to work for a while. I started by getting all the code to run, which included figuring out how the class labels worked (they just classify “real” vs. “fake”). Then my results were terrible, basically noise. So I went back and parameterized the training and the real-data generation so I could try it on a smaller vector size. That seems to be working. Here’s the untrained model on a time series four elements long:
    • And here’s the result after 10,000 epochs and a batch size of 64:
    • That’s clearly not an accident. So progress!
    • Played around with options based on this post and changed my Adam learning rate from 0.01 to 0.001, and the output activation from linear to tanh based on this random blog post. Better!
    • I don’t understand the loss/accuracy behavior, though

      I think this is a good starting point! This is with 16 points, and the real loss is clearly still improving:

    • Adding more variety to the inputs:
    • Tried adding layers. Nope, it converged on a single sine wave
    • Trying a bigger latent space: 16 dimensions, up from 5:
    • Splitting the difference and trying 8. Let’s see 5 again?
    • Hmmm. I think I like the 16 better. Let’s go back to that with a batch size of 128 rather than 64. Better? I think?
    • Let’s see what more samples do. Let’s try 100! Bad move. Let’s try 20, with a bigger random offset:
    • Ok, as a last thing for the day, I’m going to try more epochs. Going from 10,000 to 50,000:
    • It definitely finds the best curve to forge. Have to think about that
  • Status report – done
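The GAN notes above turn on three data-side pieces: real samples of a chosen vector length with a random offset, latent noise of a chosen dimension (5, 8, 16), and the “real” vs. “fake” class labels. Below is a minimal pure-Python sketch of just those pieces. It assumes sine-wave training data, as the notes suggest, and follows the common Keras-style recipe in which the generator is trained with its fakes labeled "real". All function names are hypothetical, the networks themselves are omitted, and this is not the author's GAN code.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def make_real_batch(batch_size: int, vec_len: int, max_offset: float,
                    rng: random.Random) -> List[Vector]:
    """Sample sine-wave snippets of length vec_len, each with a random phase offset.

    Every value lies in [-1, 1], one plausible reason a tanh generator output
    suits this data better than an unbounded linear output.
    """
    step = 2.0 * math.pi / vec_len             # one full period across the vector
    batch = []
    for _ in range(batch_size):
        offset = rng.uniform(0.0, max_offset)  # larger max_offset = more variety
        batch.append([math.sin(offset + i * step) for i in range(vec_len)])
    return batch


def sample_latent(batch_size: int, latent_dim: int, rng: random.Random) -> List[Vector]:
    """Gaussian noise vectors fed to the generator (latent_dim = 5, 8, 16, ...)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(batch_size)]


def discriminator_batch(real: Sequence[Vector],
                        fake: Sequence[Vector]) -> Tuple[List[Vector], List[int]]:
    """Stack real and generated samples with class labels 1 = "real", 0 = "fake"."""
    samples = list(real) + list(fake)
    labels = [1] * len(real) + [0] * len(fake)
    return samples, labels


def generator_targets(batch_size: int) -> List[int]:
    """Labels used when training the generator through the frozen discriminator:
    its fakes are marked "real" so the gradient pushes them toward fooling it."""
    return [1] * batch_size
```

The discriminator only ever answers "real or fake?". Nothing in this loss rewards variety across samples, which is consistent with the observation that the generator settles on the single best curve to forge.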