Category Archives: Simulation

Phil 7.2.20

Emergence of polarized ideological opinions in multidimensional topic spaces

  • Opinion polarization is on the rise, causing concerns for the openness of public debates. Additionally, extreme opinions on different topics often show significant correlations. The dynamics leading to these polarized ideological opinions pose a challenge: How can such correlations emerge, without assuming them a priori in the individual preferences or in a preexisting social structure? Here we propose a simple model that reproduces ideological opinion states found in survey data, even between rather unrelated, but sufficiently controversial, topics. Inspired by skew coordinate systems recently proposed in natural language processing models, we solidify these intuitions in a formalism where opinions evolve in a multidimensional space where topics form a non-orthogonal basis. The model features a phase transition between consensus, opinion polarization, and ideological states, which we analytically characterize as a function of the controversialness and overlap of the topics. Our findings shed light upon the mechanisms driving the emergence of ideology in the formation of opinions.

DtZ has broken



  • Continue working on the trajectory. I think that a plot that works entirely on distance to target can result in spirals, so there needs to be some kind of system that looks at the distance to the center line first. If that fails, move the last node from the trajectory list to a dirty list, restore the cur node to the previous one, and continue the search with the trajectory and dirty list nodes ignored?
  • Found an example to fix: A6 – H7
    • get_closest_node() line = [337.0, 44.0, 581.0, 499.0], cur_node = h1, node_list = [‘a6’, ‘b6’, ‘c7’, ‘d7’, ‘e6’, ‘c5’, ‘b7’, ‘g7’, ‘h6’, ‘g6’, ‘c6’, ‘e7’, ‘f7’, ‘g8’, ‘f6’, ‘d8’, ‘a8’, ‘e8’, ‘d6’, ‘b4’, ‘b8’, ‘c8’, ‘c4’, ‘e5’, ‘d5’, ‘d4’, ‘b5’, ‘c3’, ‘e4’, ‘f5’, ‘f8’, ‘f4’, ‘g5’, ‘g4’, ‘h5’, ‘h4’, ‘f3’, ‘d3’, ‘c2’, ‘e3’, ‘d2’, ‘e2’, ‘b2’, ‘b1’, ‘c1’, ‘e1’, ‘d1’, ‘a1’, ‘f1’, ‘g3’, ‘h3’, ‘g2’, ‘f2’, ‘g1’, ‘h2’, ‘h1’]
    • It does fine until it gets to E6, where it chooses c5
    • Adding a target distance-based search if the distance to line search fails seems to have fixed it:
      nlist = list(nx.all_neighbors(self.gml_model, cur_node))
      print("\tneighbors = {}".format(nlist))
      dist_dict = {}
      sx, sy = self.get_center(cur_node)
      for n in nlist:
          if n not in node_list:
              newx, newy = self.get_center(n)
              newa = [newx, newy]
              print("\tline dist checking {} at {}".format(n, newa))
              x, y = self.point_to_line([l[0], l[1]], [l[2], l[3]], newa)
              ca = [x, y]
              ib = self.is_between([sx, sy], [l[2], l[3]], [x, y])
              if ib:
                  # option 1: Find the closest to the line
                  dist = np.linalg.norm(np.array(newa)-np.array(ca))
                  dist_dict[n] = dist
                  print("\tis BETWEEN = {}, dist = {}".format(ib, dist))
      if len(dist_dict) == 0:
          ta = [self.get_center(self.target_node)]
          for n in nlist:
              if n not in node_list:
                  newx, newy = self.get_center(n)
                  newa = [newx, newy]
                  print("\ttarget dist checking {} at {}".format(n, newa))
                  # option 2: Find the closest to the target node
                  dist = np.linalg.norm(np.array(newa)-np.array(ta))
                  dist_dict[n] = dist
                  print("\tis CLOSEST: dist = {}".format(dist))
  • Got legal trajectories working. Below is a set of jumps that are legal (rook to c1, bishop to e3 and then h6, then rook the rest of the way). I think I also want to sort based on closest distance to the current node.
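
The two-stage choice described above (closest to the line first, closest to the target as a fallback) can be sketched without the graph code. This is a minimal, self-contained version under stated assumptions: `line_param`, `project`, and `pick_next` are hypothetical stand-ins for `point_to_line()`, `is_between()`, and `get_closest_node()`, whose actual implementations are not shown in these notes. "Between" is assumed here to mean "projects onto the line ahead of the current node and not past the end", and the coordinates are made up.

```python
import math

def line_param(a, b, p):
    """Position of p's projection along the line a->b (0 at a, 1 at b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)

def project(a, b, p):
    """Closest point to p on the infinite line through a and b."""
    t = line_param(a, b, p)
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

def pick_next(cur, target, line, candidates, visited):
    """Return the name of the next node, or None if every candidate was visited."""
    a, b = line
    t_cur = line_param(a, b, cur)
    dists = {}
    # option 1: among nodes ahead of us along the line, prefer the one closest to the line
    for name, pos in candidates.items():
        if name in visited:
            continue
        if t_cur < line_param(a, b, pos) <= 1.0:
            dists[name] = math.dist(pos, project(a, b, pos))
    # option 2: nothing qualified, so fall back to the node closest to the target
    if not dists:
        for name, pos in candidates.items():
            if name not in visited:
                dists[name] = math.dist(pos, target)
    return min(dists, key=dists.get) if dists else None

line = ((0.0, 0.0), (10.0, 0.0))
target = (10.0, 0.0)
candidates = {"a": (4.0, 3.0), "b": (5.0, 1.0), "c": (1.0, 0.0)}

first = pick_next((2.0, 1.0), target, line, candidates, visited=set())
fallback = pick_next((9.0, 1.0), target, line, candidates, visited={"b"})
print(first)     # b: ahead of the current node and closest to the line
print(fallback)  # a: nothing is ahead, so the closest unvisited node to the target wins
```

Keeping the fallback as a separate pass means the line-distance rule still dominates whenever it can make a choice, which is what avoids the spirals.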



  • Add InfluxDB streaming to DD
  • 10:00 Sim meeting
  • 2:00 Status meeting

Phil 5.14.20

GPT-2 Agents

  • Adding hints and meta information. Still need to handle special pawn moves and pull out comments
    • /[^{\}]+(?=})/g, from
  • Amusingly, my simple parser is now at 560 LOC and counting
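
For reference, the JavaScript pattern above translates directly to Python's `re` module. This is only a sketch of pulling brace-delimited comments out of PGN movetext, not the parser itself: `extract_comments` and `strip_comments` are hypothetical helper names, and the sample movetext is made up.

```python
import re

# Python equivalent of /[^{\}]+(?=})/g: a run of non-brace characters
# that is immediately followed by a closing brace
COMMENT_RE = re.compile(r"[^{}]+(?=\})")

def extract_comments(movetext):
    """Return the text of every {...} comment, in order."""
    return [c.strip() for c in COMMENT_RE.findall(movetext)]

def strip_comments(movetext):
    """Return the movetext with every {...} comment removed."""
    return re.sub(r"\s*\{[^{}]*\}", "", movetext)

sample = "1. e4 {King's pawn opening} e5 2. Nf3 {Attacks e5} Nc6"
comments = extract_comments(sample)
moves = strip_comments(sample)
print(comments)
print(moves)
```

Nested braces are not handled, which should be acceptable because PGN comments do not nest.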


  • Working on creating a discriminator using Conv1D layers.
    • With some help from Aaron, I got the discriminator working. There are some issues. I’m currently using batch_input_shape rather than input_shape, which means that a pre-sized batch is compiled in. The second issue is that the discriminator requires a 3D tensor to be fed in, which can’t be produced naturally with Dense/MLP. That means the Generator also has to use Conv1D at least at the output layer
    • I think this post: How to Develop 1D Convolutional Neural Network Models for Human Activity Recognition should help, but I don’t think I have the cognitive ability at the end of the day. Tomorrow
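
The shape mismatch above comes from Keras `Conv1D` expecting input of shape (batch, steps, channels), while a `Dense` layer emits (batch, units). Here is a minimal NumPy sketch of the reshaping involved, using made-up sizes (a batch of 4 vectors with 16 points each) rather than the actual model:

```python
import numpy as np

batch_size, vector_size = 4, 16  # hypothetical sizes, for illustration only

# What a Dense/MLP output layer produces: one flat vector per sample
dense_out = np.zeros((batch_size, vector_size))

# What Conv1D consumes: (batch, steps, channels).
# A single time series has exactly one channel.
conv_in = dense_out.reshape((batch_size, vector_size, 1))

print(dense_out.shape)  # (4, 16)
print(conv_in.shape)    # (4, 16, 1)
```

Inside a Keras model, the same step could be a `Reshape((vector_size, 1))` layer placed after a Dense output. That is one option, not necessarily the fix used here.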

Phil 5.8.20


  • Really have to fix the trending. Places like Brazil, where the disease is likely to be chronic, are not working any more
  • Aaron and I agree if the site’s not updated by 5/15 to pull it down

GPT-2 Agents

  • More PGNtoEnglish
  • Worked out a way to search for pieces in a rules-based range. It’ll work for pawns, knights, and kings right now. Will need to add rooks, bishops and queens


  • Try finetuning the model on Arabic to see what happens. Don’t see the txt files?


  • The time taken for all the DB calls is substantial. I need to change the Measurements class so that there is a set of master Measurements that are big enough to subsample other Measurements from. Done. Much faster!
  • Start building noise query, possibly using a high pass filter? Otherwise, subtract the “real” signal from the simulated one
    • Starting with the subtraction, since I have to set up queries anyway, and this will help me debug them
    • Created NoiseGAN class that extends OneDGAN
    • Pulling over table building code from InfluxTestTrainBase()
    • Success!
    • "D:\Program Files\Python37\python.exe" D:/Development/Sandboxes/Influx2_ML/Influx2_ML/
      2020-05-08 14:45:36.077292: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudart64_101.dll
      query = from(bucket:"org_1_bucket") |> range(start:2020-04-13T13:30:00Z, stop:2020-04-13T13:40:00Z) |> filter(fn:(r) => r.type == "noisy_sin" and (r.period == "8"))
      vector size = 100, query returns = 590
    • Probably a good place to stop for the day
  • 10:00 Meeting. Vadim seems to be making good progress. Check in on Tuesday
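
The subtraction approach to the noise query above is simple enough to check offline before wiring it to the database. The sketch below uses synthetic data in place of the InfluxDB results; the vector size of 100 and period of 8 are taken from the log output, while the noise level, the seed, and the function names are assumptions.

```python
import math
import random

def make_signals(n=100, period=8.0, noise_sd=0.1, seed=1):
    """Build a clean sin wave, the noise, and their sum (the 'noisy_sin')."""
    rng = random.Random(seed)
    clean = [math.sin(2.0 * math.pi * i / period) for i in range(n)]
    noise = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    noisy = [c + e for c, e in zip(clean, noise)]
    return clean, noisy, noise

def subtract(noisy, clean):
    """Recover the noise by subtracting the 'real' signal, sample by sample."""
    return [a - b for a, b in zip(noisy, clean)]

clean, noisy, noise = make_signals()
residual = subtract(noisy, clean)
max_error = max(abs(r - e) for r, e in zip(residual, noise))
print("samples = {}, max error = {:.2e}".format(len(residual), max_error))
```

This only works when the two series are sampled at the same timestamps, so the queries need identical range() arguments.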

Phil 5.4.20

It is a Chopin sort of morning


  • Zach got maps and lists working over the weekend. Still a lot more to do though
  • Need to revisit the math to work over the past days

GPT-2 Agents

  • Working on PGN to English.
    • Added game class that contains all the information for a game and reads it in. Games are created and managed by the PGNtoEnglish class
  • Rebased the transformers project. It updates fast


  • Figure out how to save and load models. I’m really not sure what to save, since you need access to the latent space and the discriminator? So far, it’s:
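    # The "_d", "_g" and "_gan" suffixes are placeholder file names, not the originals
    def save_models(self, directory:str, prefix:str):
        p = os.getcwd()
        self.d_model.save(os.path.join(p, directory, "{}_d".format(prefix)))
        self.g_model.save(os.path.join(p, directory, "{}_g".format(prefix)))
        self.gan_model.save(os.path.join(p, directory, "{}_gan".format(prefix)))
    def load_models(self, directory:str, prefix:str):
        p = os.getcwd()
        self.d_model = tf.keras.models.load_model(os.path.join(p, directory, "{}_d".format(prefix)))
        self.g_model = tf.keras.models.load_model(os.path.join(p, directory, "{}_g".format(prefix)))
        self.gan_model = tf.keras.models.load_model(os.path.join(p, directory, "{}_gan".format(prefix)))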
    • Here’s the initial run. Very nice for 10,000 epochs!


    • And here’s the results from the loaded model:


    • The discriminator works as well:
      real accuracy = 100.00%, fake accuracy = 100.00%
      real loss = 0.0154, fake loss = 0.0947%
    • An odd thing is that I can save the GAN model, but can’t load it?
      ValueError: An empty Model cannot be used as a Layer.

      I can rebuild it from the loaded generator and discriminator models though

  • Set up MLP to convert low-fidelity sin waves to high-fidelity
    • Get the training and test data from InfluxDB
      • input is square, output is sin, and the GAN should be noisy_sin minus sin. Randomly move the sample through the domain
    • Got the queries working:
    • Train and save a 2-layer, 400 neuron MLP. No ensembles for now
  • Set up GAN to add noise


  • Ask question about what the ACM and CHI are doing, beyond providing publication venues, to fight misinformation that lets millions of people find fabricated evidence that supports dangerous behavior.
  • Effects of Credibility Indicators on Social Media News Sharing Intent
    • In recent years, social media services have been leveraged to spread fake news stories. Helping people spot fake stories by marking them with credibility indicators could dissuade them from sharing such stories, thus reducing their amplification. We carried out an online study (N = 1,512) to explore the impact of four types of credibility indicators on people’s intent to share news headlines with their friends on social media. We confirmed that credibility indicators can indeed decrease the propensity to share fake news. However, the impact of the indicators varied, with fact checking services being the most effective. We further found notable differences in responses to the indicators based on demographic and personal characteristics and social media usage frequency. Our findings have important implications for curbing the spread of misinformation via social media platforms.

Phil 4.30.20

Had some kind of power hiccup this morning and discovered that my computer was connected to the surge-suppressor part of the UPS. My box is now most unhappy as it recovers. On the plus side, computers recover from this sort of thing now.


  • Fixed the neighbor list and was pleasantly surprised that it worked for the states


  • Set up input and output files
  • Pull char count of probe out and add that to the total generated
  • Start looking into finetuning
    • Here are all the huggingface examples
      • export TRAIN_FILE=/path/to/dataset/wiki.train.raw
        export TEST_FILE=/path/to/dataset/wiki.test.raw
        python \
            --output_dir=output \
            --model_type=gpt2 \
            --model_name_or_path=gpt2 \
            --do_train \
            --train_data_file=$TRAIN_FILE \
            --do_eval \
      • source in GitHub
      • Tried running without any arguments as a sanity check, and got this: huggingface ImportError: cannot import name ‘MODEL_WITH_LM_HEAD_MAPPING’. Turns out that it won’t work without PyTorch being installed. Everything seems to be working now:
        usage: [-h] [--model_name_or_path MODEL_NAME_OR_PATH]
                                        [--model_type MODEL_TYPE]
                                        [--config_name CONFIG_NAME]
                                        [--tokenizer_name TOKENIZER_NAME]
                                        [--cache_dir CACHE_DIR]
                                        [--train_data_file TRAIN_DATA_FILE]
                                        [--eval_data_file EVAL_DATA_FILE]
                                        [--line_by_line] [--mlm]
                                        [--mlm_probability MLM_PROBABILITY]
                                        [--block_size BLOCK_SIZE] [--overwrite_cache]
                                        --output_dir OUTPUT_DIR
                                        [--overwrite_output_dir] [--do_train]
                                        [--do_eval] [--do_predict]
                                        [--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
                                        [--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
                                        [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                                        [--learning_rate LEARNING_RATE]
                                        [--weight_decay WEIGHT_DECAY]
                                        [--adam_epsilon ADAM_EPSILON]
                                        [--max_grad_norm MAX_GRAD_NORM]
                                        [--num_train_epochs NUM_TRAIN_EPOCHS]
                                        [--max_steps MAX_STEPS]
                                        [--warmup_steps WARMUP_STEPS]
                                        [--logging_dir LOGGING_DIR]
                                        [--logging_steps LOGGING_STEPS]
                                        [--save_steps SAVE_STEPS]
                                        [--save_total_limit SAVE_TOTAL_LIMIT]
                                        [--no_cuda] [--seed SEED] [--fp16]
                                        [--fp16_opt_level FP16_OPT_LEVEL]
                                        [--local_rank LOCAL_RANK] error: the following arguments are required: --output_dir

        And I still haven’t broken my text generation code. Astounding!

    • Moby Dick from Gutenberg
    • Chess
    • Covid tweets
    • Here’s the cite:
        title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
        author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},


  • Set up meeting with Issac and Vadim for control
  • Continue with GAN
    • Struggled with getting training to work for a while. I started by getting all the code to work, which included figuring out how the class labels worked (they just classify “real” vs “fake”). Then my results were terrible, basically noise. So I went back and parameterized the training and real data generation to try it on a smaller vector size. That seems to be working. Here’s the untrained model on a time series four elements long: Four_element_untrained
    • And here’s the result after 10,000 epochs and a batch size of 64: Four_element_trained
    • That’s clearly not an accident. So progress!
    • Played around with options based on this post and changed my Adam value from 0.01 to 0.001, and the output function from linear to tanh based on this random blog post. Better! Four_element_trained
    • I do not understand the loss/accuracy behavior though

      I think this is a good starting point! This is 16 points, and clearly the real loss function is still improving: Four_element_trainedacc_loss

    • Adding more variety of inputs: GAN_trained
    • Trying adding layers. Nope, it generalized to a single sin wave
    • Trying a bigger latent space of 16 dimensions up from 5: GAN_trained
    • Splitting the difference and trying 8. Let’s see 5 again? GAN_trained
    • Hmmm. I think I like the 16 better. Let’s go back to that with a batch size of 128 rather than 64. Better? I think?
    • Let’s see what more samples does. Let’s try 100! Bad move. Let’s try 20, with a bigger random offset GAN_trained
    • Ok, as a last thing for the day, I’m going to try more epochs. Going from 10,000 to 50,000:
    • It definitely finds the best curve to forge. Have to think about that
  • Status report – done
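
The parameterized "real" data generation and the real/fake labels mentioned above can be sketched in plain Python. This is not the code used for these runs: the function names, the step size, and the convention of labeling real samples 1 and fake samples 0 are assumptions, chosen to match the four-element vectors, the batch size of 64, and the 5-dimensional latent space in the notes.

```python
import math
import random

def generate_real_samples(n_samples, vector_size, rng, step=0.5):
    """Each sample is vector_size points of sin(x), starting at a random offset."""
    samples = []
    for _ in range(n_samples):
        offset = rng.uniform(0.0, 2.0 * math.pi)
        samples.append([math.sin(offset + i * step) for i in range(vector_size)])
    labels = [1.0] * n_samples  # assumed convention: 1 = real
    return samples, labels

def generate_latent_points(n_samples, latent_dim, rng):
    """Gaussian noise vectors that a generator would map to fake samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]

rng = random.Random(0)
real, real_labels = generate_real_samples(n_samples=64, vector_size=4, rng=rng)
latent = generate_latent_points(n_samples=64, latent_dim=5, rng=rng)
fake_labels = [0.0] * len(latent)  # assumed convention: 0 = fake
print("real batch = {} x {}, latent batch = {} x {}".format(
    len(real), len(real[0]), len(latent), len(latent[0])))
```

Keeping `vector_size` and `latent_dim` as arguments is what makes it cheap to drop down to a tiny series when the full-size results look like noise.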

Phil 4.28.20


  • Upload paper to Overleaf – done!


  • Fix bug using this:
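    from scipy import stats
    import numpy as np

    # fit a line to the most recent samples only
    slope, intercept, r_value, p_value, std_err = stats.linregress(xsub, ysub)
    # slope, intercept = np.polyfit(x, y, 1)
    yn = np.polyval([slope, intercept], xsub)
    # with a falling trend, estimate how many steps until the line reaches zero
    steps = 0
    if slope < 0:
        steps = abs(y[-1] / slope)
    reg_x = []
    reg_y = []
    start = len(yl) - max_samples
    yval = intercept + slope * start
    for i in range(start, len(yl)-offset):
        # assumed: collect the regression line point by point
        reg_x.append(i)
        reg_y.append(yval)
        yval += slope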
  • Anything else?

GPT-2 Agents

  • Install and test GPT-2 Client
  • Failed spectacularly. It depends on a lot of TF1.x items, like There is an issue request in.
  • Checked out the project to see if anything could be done. “Fixed” the contrib library, but that just exposed other things. Uninstalled.
  • Tried using the upgrade tool described here, which did absolutely nothing, as near as I can tell


  • Continue figuring out GANs
  • Here are results using 2 latent dimensions, a matching hint, a line hint, and no hint
  • Here are results using 5 latent dimensions, a matching hint, a line hint, and no hint
  • Meeting at 10:00 with Vadim and Isaac
    • Wound up going over Isaac’s notes for Yaw Flip and learned a lot. He’s going to see if he can get the algorithm used for the maneuver. If so, we can build the control behavior around that. The goal is to minimize energy and indirectly fuel costs


Phil 4.27.20

Took the motorcycle for its weekly spin and rode past the BWI terminal. By far the most Zombie Apocalypse thing I’ve seen so far.

The repository contains an ongoing collection of tweets IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.


  • Reworked regression code to only use the last 14 days of data. It seems to take the slowing rate change into account better
  • That could be a nice interactive feature to add to the website. A js version of regression curve fitting is here.
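
The idea of regressing over only the last 14 days can be sketched without scipy as an ordinary least-squares fit over the tail of the series. The function names and the sample series are made up for illustration; the steps-to-zero rule mirrors the `abs(y[-1] / slope)` calculation in the regression code.

```python
def fit_tail(y, max_samples=14):
    """Least-squares line through the last max_samples points of y."""
    start = max(0, len(y) - max_samples)
    xs = list(range(start, len(y)))
    ys = y[start:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def steps_to_zero(last_value, slope):
    """Steps until a falling line reaches zero, or 0 if it is not falling."""
    return abs(last_value / slope) if slope < 0 else 0

# made-up series: 16 days of growth followed by 14 days of decline
series = [10.0 + 3.0 * d for d in range(16)] + [55.0 - 2.0 * k for k in range(14)]
slope, intercept = fit_tail(series, max_samples=14)
steps = steps_to_zero(series[-1], slope)
print("slope = {:.2f}, intercept = {:.2f}, steps to zero = {:.1f}".format(slope, intercept, steps))
```

A fit over the whole series would still show a rising trend here, while the 14-day window picks up the decline.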


  • Got Antonio’s revisions back and enbiggened the two chats for better readability

GPT-2 Agents

  • Going to try the GPT-2 Client and see how it works.
  • Whoops, needs TF 2.1. Upgraded that and the drivers – done


  • Step through the GAN code and look for ways of restricting the latent space to being near the simulation output
  • Here’s the GAN trying to fit a bit of a sin wave from the beginning of the day: GAN2Sin
  • And here’s the evolution of the GAN using hints and 5 latent dimensions from the end of the day: GAN_fit
  • And here are the accuracy outputs:
    epoch = 399, real accuracy = 87.99999952316284%, fake accuracy = 37.99999952316284%
    epoch = 799, real accuracy = 43.99999976158142%, fake accuracy = 56.99999928474426%
    epoch = 1199, real accuracy = 81.00000023841858%, fake accuracy = 25.999999046325684%
    epoch = 1599, real accuracy = 81.00000023841858%, fake accuracy = 40.99999964237213%
    epoch = 1999, real accuracy = 87.99999952316284%, fake accuracy = 25.999999046325684%
    epoch = 2399, real accuracy = 89.99999761581421%, fake accuracy = 20.000000298023224%
    epoch = 2799, real accuracy = 87.00000047683716%, fake accuracy = 46.00000083446503%
    epoch = 3199, real accuracy = 80.0000011920929%, fake accuracy = 47.999998927116394%
    epoch = 3599, real accuracy = 76.99999809265137%, fake accuracy = 43.99999976158142%
    epoch = 3999, real accuracy = 68.99999976158142%, fake accuracy = 30.000001192092896%
    epoch = 4399, real accuracy = 75.0%, fake accuracy = 33.000001311302185%
    epoch = 4799, real accuracy = 63.999998569488525%, fake accuracy = 28.00000011920929%
    epoch = 5199, real accuracy = 50.0%, fake accuracy = 56.00000023841858%
    epoch = 5599, real accuracy = 36.000001430511475%, fake accuracy = 56.00000023841858%
    epoch = 5999, real accuracy = 49.000000953674316%, fake accuracy = 60.00000238418579%
    epoch = 6399, real accuracy = 34.99999940395355%, fake accuracy = 58.99999737739563%
    epoch = 6799, real accuracy = 70.99999785423279%, fake accuracy = 43.00000071525574%
    epoch = 7199, real accuracy = 70.99999785423279%, fake accuracy = 30.000001192092896%
    epoch = 7599, real accuracy = 47.999998927116394%, fake accuracy = 50.0%
    epoch = 7999, real accuracy = 40.99999964237213%, fake accuracy = 52.99999713897705%
    epoch = 8399, real accuracy = 23.000000417232513%, fake accuracy = 82.99999833106995%
    epoch = 8799, real accuracy = 23.000000417232513%, fake accuracy = 75.0%
    epoch = 9199, real accuracy = 31.00000023841858%, fake accuracy = 69.9999988079071%
    epoch = 9599, real accuracy = 37.99999952316284%, fake accuracy = 68.00000071525574%
    epoch = 9999, real accuracy = 23.000000417232513%, fake accuracy = 83.99999737739563%
  • Found a bug in the short-regression code. Need to roll in the fixed regression
  • Here’s the working code:
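    from scipy import stats
    import numpy as np

    # fit a line to the most recent samples only
    slope, intercept, r_value, p_value, std_err = stats.linregress(xsub, ysub)
    # slope, intercept = np.polyfit(x, y, 1)
    yn = np.polyval([slope, intercept], xsub)
    # with a falling trend, estimate how many steps until the line reaches zero
    steps = 0
    if slope < 0:
        steps = abs(y[-1] / slope)
    reg_x = []
    reg_y = []
    start = len(yl) - max_samples
    yval = intercept + slope * start
    for i in range(start, len(yl)-offset):
        # assumed: collect the regression line point by point
        reg_x.append(i)
        reg_y.append(yval)
        yval += slope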


Phil 4.22.20

  • Amsterdam, 24 April 2020
  • This workshop aims to bring together researchers and practitioners from the emerging fields of Graph Representation Learning and Geometric Deep Learning. The workshop will feature invited talks and a poster session. There will be ample opportunity for discussion and networking.
  • Invited talks will be live-streamed on YouTube:
  • Looking for an online seminar that presents the latest advances in reinforcement learning theory? You just found it! We aim to bring you a virtual seminar (approximately) every Tuesday at 5pm UTC featuring the latest work in theoretical reinforcement learning.


  • Added P-threshold to json file. I’m concerned that everyone is too busy to participate any more. Aaron hasn’t even asked about the project since he got better and is complaining about how overworked he is. Zach seems to be equally busy. If no one steps up by the end of the week, I think it’s time to either take over the project entirely or shut it down.


  • Started working on Antonio’s changes
  • Changed the MappApp so that the trajectory lines are blue


  • Finish CNN chapter
  • Enable Tensorflow profiling
    • Installed the plugin: pip install tensorboard_plugin_profile
    • Updated setup_tensorboard():
      def setup_tensorboard(dir_str: str, windows_slashes:bool = True) -> List:
          if windows_slashes:
              dir_str = dir_str.replace("/", "\\")
              print("no file {} at {}".format(dir_str, os.getcwd()))
          # use TensorBoard, princess Aurora!
          callbacks = [tf.keras.callbacks.TensorBoard(log_dir=dir_str, profile_batch = '500,510')]
          return callbacks
  • Huh. Looks like scipy.misc.imresize() and scipy.misc.imread() are both deprecated and out of the library. Trying opencv
    • pip install opencv-python
    • Here’s how I did it, with some debugging to verify that everything was working correctly thrown in:
      import cv2
      import numpy as np

      img_names = ['cat.jpg', 'steam-locomotive.jpg']
      img_list = []
      for name in img_names:
          img = cv2.imread(name)
          res = np.array(cv2.resize(img, dsize=(32, 32), interpolation=cv2.INTER_CUBIC))
          # debugging: write the resized image back out so it can be inspected
          cv2.imwrite(name.replace(".jpg","_32x32.jpg"), res)
          img_list.append(res)
      # swap the row and column axes of every image, then scale to 0..1
      imgs = np.transpose(img_list, (0, 2, 1, 3))
      imgs = np.array(imgs) / 255
  • This forced me to go down a transpose() in multiple dimensions rabbit hole that’s worth documenting. First, here’s code that takes some tiny images in an array and transposes them:
    import numpy as np
    img_list = [
        # image 1
        [[[10, 20, 30],
          [11, 21, 31],
          [12, 22, 32],
          [13, 23, 33]],
         [[255, 255, 255],
          [48, 45, 58],
          [101, 150, 205],
          [255, 255, 255]],
         [[255, 255, 255],
          [43, 56, 75],
          [77, 110, 157],
          [255, 255, 255]],
         [[255, 255, 255],
          [236, 236, 238],
          [76, 104, 139],
          [255, 255, 255]]],
        # image 2
        [[[100, 200, 300],
          [101, 201, 301],
          [102, 202, 302],
          [103, 203, 303]],
         [[159, 146, 145],
          [89, 74, 76],
          [207, 207, 210],
          [212, 203, 203]],
         [[145, 155, 164],
          [52, 40, 36],
          [166, 160, 163],
          [136, 132, 134]],
         [[61, 56, 60],
          [36, 32, 35],
          [202, 195, 195],
          [172, 165, 177]]]]
    np_imgs = np.array(img_list)
    print("np_imgs shape = {}".format(np_imgs.shape))
    imgs = np.transpose(img_list, (0, 2, 1, 3))
    print("imgs shape = {}".format(imgs.shape))
    #imgs = np.array(imgs) / 255
    print("pix 0: \n{}".format(np_imgs[0]))
    print("transposed pix 0: \n{}".format(imgs[0]))
    print("pix 1: \n{}".format(np_imgs[1]))
    print("transposed pix 1: \n{}".format(imgs[1]))
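  • So, this is a 4-dimensional array with a shape of (2, 4, 4, 3): two images, each 4 x 4 pixels with 3 color channels. What we want to do is transpose each image (the inner 4, 4), which flips it across its main diagonal (a 90 degree rotation plus a mirror). The way to understand NumPy’s transpose is that it reorders the axes: the tuple says, for each output axis, which input axis to use. The trick is understanding how.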
  • For this matrix, applying a transpose that does nothing means writing this:
    imgs = np.transpose(img_list, (0, 1, 2, 3))
  • Think of it as an identity transpose. What we want to do is reverse the order of the inner 4, 4, which we do like this:
    imgs = np.transpose(img_list, (0, 2, 1, 3))
  • That’s it! Now the second “4” will be swapped with the first “4”. You can do this with any of the axes. So
    imgs = np.transpose(img_list, (3, 2, 1, 0))
  • Reverses everything!
  • Ok, so things are working, but the results are crap. Not really worrying about it for now because it’s CIFAR and I always have this problem:
    ./images\airplane.jpg = [8] ship
    ./images\automobile.jpg = [0] airplane 
    ./images\bird.jpg = [4] deer
    ./images\cat.jpg = [0] airplane 
    ./images\cat2.jpg = [6] frog
    ./images\cat3.jpg = [8] ship
    ./images\deer.jpg = [8] ship
    ./images\dog.jpg = [2] bird
    ./images\horse.jpg = [8] ship
    ./images\ship.jpg = [0] airplane 
    ./images\steam-locomotive.jpg = [2] bird
    ./images\truck.jpg = [3] cat
    [8 0 4 0 6 8 8 2 8 0 2 3]
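
Going back to the transpose() rabbit hole above, a compact way to see what the axis tuple does is to use a batch where the two inner axes have different sizes, so the swap shows up in the shape. This is a small sketch with made-up data (2 images, 3 rows, 4 columns, 3 channels); it also checks how a transpose relates to a true 90 degree rotation with `np.rot90`.

```python
import numpy as np

# made-up batch: 2 images, 3 rows, 4 columns, 3 channels
batch = np.arange(2 * 3 * 4 * 3).reshape(2, 3, 4, 3)

# output axis i takes input axis axes[i]
swapped = np.transpose(batch, (0, 2, 1, 3))         # rows and columns exchanged
reversed_axes = np.transpose(batch, (3, 2, 1, 0))   # every axis reversed

print(batch.shape)          # (2, 3, 4, 3)
print(swapped.shape)        # (2, 4, 3, 3)
print(reversed_axes.shape)  # (3, 4, 3, 2)

# the pixel at (row 0, col 2) of image 1 ends up at (row 2, col 0)
same_pixel = np.array_equal(swapped[1, 2, 0], batch[1, 0, 2])

# a transpose is a mirror across the diagonal; flipping the new row axis
# afterwards turns it into a 90 degree counterclockwise rotation
rotated = np.rot90(batch, k=1, axes=(1, 2))
matches_rot90 = np.array_equal(rotated, np.flip(swapped, axis=1))
print(same_pixel, matches_rot90)
```

So if a real rotation is wanted rather than a mirror, `np.rot90` with `axes=(1, 2)` does it in one call.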


  • Meeting

Phil 4.17.20

Can You Beat COVID-19 Without a Lockdown? Sweden Is Trying

I dug into the predictions that we generate. Comparing Finland, Norway, and Sweden, it looks like something that Sweden did could result in about 2,600 people dying who don’t have to:




  • IRS proposal – done!
  • A better snippet: the best way to cheat on taxes is  to deliberately lie to the IRS about what you earned over a year, what you spent over a year, and the ways you would fill out those forms. This is where “time of year” really comes into play. The IRS assumes you worked on April 15 through the 15th of the following year in order to report and pay taxes on your actual income from April 15 through the following year. I’ve put some pictures and thoughts below. There are some really great readers who have put some excellent guides and resources out there on this topic. If you have any additional questions, please feel free to leave a comment below and I will do my best to answer them.
  • Another good snippet: The best way to cheat on taxes is  to set up an LLC or other tax-sheltered company that makes up for your sloth in paying business taxes. By doing this, you can deduct the business expenses and pay your taxes at a much lower tax rate, while also getting a tax refund. So, for example, if your net operating income for 2014 was $5,000 and you think you should owe about $2,000 in taxes for 2015, I suggest you set up a  S-Corporation   for 2015 that only owes $500 in taxes. Then, you can send the IRS a check for the difference between the $2,000 difference you owe them and the $5,000 net operating income for 2015.


  • Finish first pass? Done! And sent to Antonio!


Shortcut Learning in Deep Neural Networks

  • Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today’s machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning’s problems can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

Phil 4.14.20

Fix siding from wind!


  • I want to try taking a second derivative of the rates to see what it looks like. There may be common features in the pattern of rates, or of the slopes of the regressions changing over time
  • I’m also getting worried about countries that don’t report well. I’d like to be able to use rates from neighboring countries as some kind of check
  • Work with Zach on cleanup and map integration?
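
The second-derivative idea above can be tried with simple finite differences before touching the real data. The numbers below are made up, and treating the daily change as the "rate" is an assumption about how the rates are computed.

```python
def diff(values):
    """Discrete derivative: the change between consecutive samples."""
    return [b - a for a, b in zip(values, values[1:])]

# made-up cumulative counts for ten days
cumulative = [1, 2, 4, 8, 13, 19, 24, 28, 30, 31]

rate = diff(cumulative)    # first derivative: new cases per day
acceleration = diff(rate)  # second derivative: change in the daily rate

# the first negative value marks where growth starts to slow
turning_point = next((i for i, a in enumerate(acceleration) if a < 0), None)

print("rate         = {}".format(rate))
print("acceleration = {}".format(acceleration))
print("growth starts slowing at index {}".format(turning_point))
```

Real data is noisy enough that the series would probably need smoothing first, since each difference amplifies day-to-day jitter.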

COVID Twitter

  • Finished ingesting the new data. It took almost 24 hours


  • Finished first pass of the introduction. Still at 14 pages


Phil 4.13.20

That was a very solitary weekend. I fixed some bikes, planted some herbs and vegetables, cleaned house, and procrastinated about pretty much everything else. I pinged Don and Wayne about D20 ideas, and got a ping for more info from Don, then silence. Everyone seems to be wrapped up tight in their worlds.

And for good reason. Maryland is looking grim:



  • Worked with Zach to get states in. It’s working!


COVID Twitter

  • Went looking for new data to ingest, but didn’t see anything new? It wasn’t there yet. Ingesting now
  • 1:30 Meeting


  • Reading through paper and pulling out all the parts from Simple Trick
  • Ping Antonio to let him know I’m working


  • Get absolute queries working in InfluxDB2. It took some looking, but here’s an example from the API reference on range(). Done!
    • Everything is in GMT. As usual, the parser is picky about the format, which is ISO-8601:
      range_args = "start:2020-04-13T13:30:00Z, stop:2020-04-13T13:30:10Z"
  • Start on TF2/GANs for converting square waves to noisy sin waves of varying frequencies using saved InfluxDB data
    • First, pull a square, sin, and noisy sin and plot using matplotlib so we know we have good vectors. Success!
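
Since the range() parser is picky about the timestamp format, it may be safer to build the arguments from `datetime` objects than to type them by hand. This is a minimal sketch using only the standard library; `range_args` is a hypothetical helper, and the bucket name is the one that appears in the query log elsewhere in these notes.

```python
from datetime import datetime, timedelta, timezone

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"  # ISO-8601 with a literal Z, so times must be UTC

def range_args(start, duration):
    """Build the start/stop arguments for an absolute Flux range() call."""
    start_utc = start.astimezone(timezone.utc)
    stop_utc = start_utc + duration
    return "start:{}, stop:{}".format(start_utc.strftime(ISO_FMT), stop_utc.strftime(ISO_FMT))

start = datetime(2020, 4, 13, 13, 30, 0, tzinfo=timezone.utc)
args = range_args(start, timedelta(seconds=10))
query = 'from(bucket:"org_1_bucket") |> range({})'.format(args)
print(args)
print(query)
```

Passing a timezone-aware `datetime` matters here: `astimezone()` converts it to UTC, which is what the trailing Z claims.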



Phil 4.3.20

Temp is up a bit this morning, which, of course, I’m overreacting to.

Need to get started on State information from here:

Generated some favicons from here, which, of course, we didn’t use

Getting close to something that we can release


  • Update Linux on laptop and try Influx there. Nope. The laptop is hosed.
  • Grabbing another computer to configure. I mean, worst case, I can set up the work laptop as an Ubuntu box. I’d love to know if Influx would work FIRST, though. Looks like I have to. My old dev box won’t boot. Backing up.
  • Installed Debian on the work laptop. It seems to be booting? Nope:
  • I guess we’ll try Ubuntu again? Nope. Trying one more variant.
  • Trying lubuntu. It uses different drivers for some things, and so far hasn’t frozen or blocked yet. It works!
  • And now the Docker version (docker run --name influxdb -p 9999:9999) works too. Maybe because the system got upgraded?
  • 11:00 IRAD Meeting
    • Send note about NOAA being a customer for simulated anomalies for machine learning

Phil 3.31.2020

I need to go grocery shopping today. A friend of mine has come down with the virus. He’s in his 30’s, and I’m feeling vulnerable. I went down to the shop and dug up my painting masks. Turns out I have a few, so that’s what I’m going shopping with. Here’s why, from the NY Times:

When researchers conducted systematic review of a variety of interventions used during the SARS outbreak in 2003, they found that washing hands more than 10 times daily was 55 percent effective in stopping virus transmission, while wearing a mask was actually more effective — at about 68 percent. Wearing gloves offered about the same amount of protection as frequent hand-washing, and combining all measures — hand-washing, masks, gloves and a protective gown — increased the intervention effectiveness to 91 percent.

Podcast with BBC’s misinformation reporter:


  • A friend of mine who works in Whitehall has told me that the army are going to be on the streets this week arresting people who don’t listen to this podcast. If that sounds familiar, you’ll be aware that this crisis has already been fertile ground for disinformation. Marianna Spring is a BBC specialist reporter covering disinformation and social media. In this fascinating interview, Marianna reveals how disinformation and misinformation gets so widely shared, why we share it, how to spot it, what the trends are, how it differs around the world and so much more. This is a brilliant insight not just into the sharing of inaccurate information, but into human behaviour.



  • Changed the calculations from the linear regression to handle cases where the virus is under control, like China – first pass is done
  • Have the linear regression only go back some number of weeks/months. I’m worried about missing a second wave
  • Need to add a disclaimer that the quality of the predictions depends on the quality of the data, and that we expect that as poorer countries come online, these trends may be erratic and inaccurate.
  • Add an UNSET state. The ETS will only set the state if it is UNSET. This lets regression predictions be used until the ETS is working well – done
  • I think showing the linear and ETS mean prediction is a good way to start including ETS values
  • Found the page that shows how to adjust parameters:


  • Try to create an image from the stored tar
  • Start setting up InfluxDB2
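The regression items in the list above (a limited look-back window, the UNSET state, and the linear/ETS mean) can be sketched roughly as follows. This is a sketch under stated assumptions, not the project's code: every name here is hypothetical, the state labels other than UNSET are invented for the example, and the slope threshold is arbitrary.

```python
from enum import Enum
from statistics import mean


class State(Enum):
    UNSET = 0
    GROWING = 1        # hypothetical label
    UNDER_CONTROL = 2  # hypothetical label


def fit_line(ys):
    """Ordinary least squares of ys against 0..n-1. Needs at least two points."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def windowed_prediction(series, window, days_ahead):
    """Fit only the last `window` samples so a second wave is not averaged away."""
    recent = series[-window:]
    slope, intercept = fit_line(recent)
    x_future = len(recent) - 1 + days_ahead
    return slope, slope * x_future + intercept


def regression_state(slope, flat_threshold=0.5):
    """Assumed rule: a nearly flat recent trend counts as under control."""
    return State.UNDER_CONTROL if slope <= flat_threshold else State.GROWING


def apply_ets_state(current, ets_state):
    """The ETS may only fill in a state that is still UNSET."""
    return ets_state if current is State.UNSET else current


def combined_prediction(linear_pred, ets_pred):
    """Mean of the linear and ETS predictions."""
    return (linear_pred + ets_pred) / 2.0
```

With a short window, a series that grew quickly and then flattened produces a small slope, which is the "virus is under control" case; a long window would still report the old growth.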

IRAD Meeting at 2:00

ML Group at 4:00

  • Put together a list of potential papers to present. No need, I’ll do infinitely wide networks
  • Had just a lovely online evening of figuring out how to use some (terrible!) webex tools, and trying to figure out Neural ODEs. It was an island of geeky normalcy for a few hours. This may be a more comprehensible writeup.

Phil 3.20.20

Yesterday, I looked at the confirmed cases from this dataset. Today, I thought I’d look at the death rates. These are actually from yesterday. Maybe I’ll update at the end of the day. Everything is on a logarithmic scale because on a linear scale it’s impossible to tell the difference between one crazy exponential rate and another (it may be a small-world power law as well, as per here). This is also with China excluded:

I mean, that’s not a good picture. I can see why California went on full non-essential lockdown today – we seem to be on the same trajectory as Iran, assuming the difference in slope is not related to manipulated or poorly-gathered information. South Korea, as per reports, really has appeared to adjust the trajectory. Note though, that the adjusted curve still seems to be exponential, but at a lower value.
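One way to put a number on "still exponential, but at a lower value": on a logarithmic scale an exponential curve is a straight line, and its slope gives the doubling time. The sketch below fits that slope with least squares. The function names are made up for this example, and it assumes strictly positive daily counts, not any particular dataset.

```python
import math


def growth_rate(counts):
    """Least-squares slope of ln(count) against the day index."""
    logs = [math.log(c) for c in counts]  # counts must all be > 0
    n = len(logs)
    x_bar = (n - 1) / 2
    y_bar = sum(logs) / n
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(logs))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx


def doubling_time(counts):
    """Days needed for the count to double at the fitted growth rate."""
    return math.log(2) / growth_rate(counts)
```

Two countries can both be on straight lines in the log plot while having very different doubling times, which is the difference in slope discussed above.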

My sense right now is that the economic impacts (however those would be charted) are going to look similar, with some kind of time delay that relates to spare capacity, like savings. My sense is that this is going to be bigger than the 2008 financial meltdown, but maybe in some kind of slow motion?

Since I can work from home, and work on government contracts, I’ve been sending money to food banks and similar charities. Hopefully, the best ways to contribute will become clear as the situation settles into the new “normal”. For some more thinking on the economic impact, there’s a short interview with John Ioannidis, who wrote in this article:

One of the bottom lines is that we don’t know how long social distancing measures and lockdowns can be maintained without major consequences to the economy, society, and mental health. Unpredictable evolutions may ensue, including financial crisis, unrest, civil strife, war, and a meltdown of the social fabric. 

“A fiasco in the making? As the coronavirus pandemic takes hold, we are making decisions without reliable data” – StatNews, 3/17/2020

I tend to agree that the world at large is focusing on one, large immediate problem when it needs to be focusing on two large immediate problems. And that’s probably too much to expect.

8:00 – 4:30 ASRC GOES

  • More interesting use of ML to enhance simulations: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
    • We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,ϕ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
  • Let’s see if we can get InfluxDB working in Docker and start to generate and store data
  • I found a wonderful thing! It looks like you can change the default settings for where applications and their data are saved! Here’s a screenshot of where in the settings:

Phil 3.19.20

I found the data sources for the dashboard in the previous few posts. Yes, everything still looks grim:

So rather than working on my dissertation, I thought I’d take a look at the data for the last 9(!) days in Excel:

This is for the USA. The data is sorted based on the cumulative total of new cases confirmed. If you look at the chart on the right, everything is in line with a pandemic in exponential growth. However, that’s not the whole story.

I like to color code the cells in my spreadsheets because colors help me visualize patterns in the data that I wouldn’t otherwise see. And one of the things that really stands out here is the red rows with one yellow cell on the left. These are all cases where the rate of confirmed new cases dropped to zero overnight. And they’re not near each other. They are in WA, NY, and CA. Is this a measuring problem or is something going right in these places?

Maybe we’ll find out more in the next few days. Now that I know how to get the data, I can do some of my own visualizations that look for outliers. I can also train up some sequence-to-sequence ML models to extrapolate trends.
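As a first pass at looking for those outliers outside of Excel, here is a small sketch that flags days where new confirmed cases drop to zero right after a day with growth. It assumes one list of cumulative counts per region; the function names and the `min_prior` parameter are hypothetical.

```python
def daily_new(cumulative):
    """Day-over-day differences of a cumulative count series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def sudden_zero_days(cumulative, min_prior=1):
    """Indices (into `cumulative`) of days with zero new cases that directly
    follow a day with at least `min_prior` new cases."""
    new = daily_new(cumulative)
    return [
        i + 1  # new[i] is the change from day i to day i + 1
        for i in range(1, len(new))
        if new[i] == 0 and new[i - 1] >= min_prior
    ]
```

This cannot say whether a flagged day is a reporting gap or a real change; it only points at the rows worth a closer look.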

One more thing. I had heard earlier (Twitter, I think?) that Vietnam was handling the crisis well. And it looks like it was, but things are back to being bad:

Ok, back to work

8:00 – 4:30 ASRC PhD, GOES

  • Working on the process section – done!
  • Working on the TACJ bookend – done! Made a new figure:
  • Submitted to Wayne. Here’s hoping it doesn’t fall through the cracks
  • Neuroevolution of Self-Interpretable Agents
    • Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has similar properties as indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task-critical visual hints, they are able to generalize to environments where task irrelevant elements are modified while conventional methods fail.