Category Archives: Writing

Phil 4.28.20

ACSOS

  • Upload paper to Overleaf – done!

D20

  • Fix bug using this:
    # fit a line to the most recent samples
    slope, intercept, r_value, p_value, std_err = stats.linregress(xsub, ysub)
    # slope, intercept = np.polyfit(x, y, 1)
    yn = np.polyval([slope, intercept], xsub)
    
    # if the trend is falling, estimate how many steps until it reaches zero
    steps = 0
    if slope < 0:
        steps = abs(y[-1] / slope)
    
    # points along the regression line, starting max_samples from the end
    reg_x = []
    reg_y = []
    start = len(yl) - max_samples
    yval = intercept + slope * start
    for i in range(start, len(yl)-offset):
        reg_x.append(i)
        reg_y.append(yval)
        yval += slope
  • Anything else?
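For reference, here is a minimal, self-contained sketch of the same idea: fit a line to the last few samples and estimate how many steps remain until the trend hits zero. It uses np.polyfit (the commented-out alternative above) rather than stats.linregress, and the function name days_to_zero, the max_samples default, and the sample rates are all made up for illustration, not the D20 code.

```python
import numpy as np


def days_to_zero(y, max_samples=7):
    """Fit a line to the last max_samples values and estimate steps until zero."""
    y = np.asarray(y, dtype=float)
    start = max(0, len(y) - max_samples)
    xsub = np.arange(start, len(y))  # absolute sample indices
    ysub = y[start:]
    slope, intercept = np.polyfit(xsub, ysub, 1)

    # only a falling trend ever reaches zero
    steps = 0.0
    if slope < 0:
        steps = abs(ysub[-1] / slope)

    # regression line evaluated at the same indices
    reg_y = intercept + slope * xsub
    return slope, intercept, steps, xsub, reg_y


rates = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0]  # made-up, perfectly linear data
slope, intercept, steps, reg_x, reg_y = days_to_zero(rates, max_samples=5)
print(slope, intercept, steps)
```

Because the x values are absolute indices, the intercept can be used directly to draw the line from any starting index, which is what the loop above does.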

GPT-2 Agents

  • Install and test GPT-2 Client
  • Failed spectacularly. It depends on a lot of TF1.x items, like tensorflow.contrib.training. There is an issue request in.
  • Checked out the project to see if anything could be done. “Fixed” the contrib library, but that just exposed other things. Uninstalled.
  • Tried using the upgrade tool described here, which did absolutely nothing, as near as I can tell

GOES

  • Continue figuring out GANs
  • Here are results using 2 latent dimensions, a matching hint, a line hint, and no hint
  • Here are results using 5 latent dimensions, a matching hint, a line hint, and no hint
  • Meeting at 10:00 with Vadim and Isaac
    • Wound up going over Isaac’s notes for Yaw Flip and learned a lot. He’s going to see if he can get the algorithm used for the maneuver. If so, we can build the control behavior around that. The goal is to minimize energy and indirectly fuel costs

 

Phil 4.24.20

It is very wet today

radar

Spent far too much time trying to upload a picture to the graduation site. It appears to be broken

D20

  • Changed the CONTROLLED days to < 2, since things are generally looking better

ACSOS

  • Sent the revised draft to Antonio

GPT-2 Agents

  • Found what appears to be just what I’m looking for. Searching on GitHub for GPT-2 tensorflow led me to this project, GPT-2 Client. I’ll give that a try and see how it works. The developer, Rishabh Anand, seems to have solid skills, so I have some hope that this could work. I do not have the energy to start this on a Friday and then switch to GANs for the rest of the day. Sunday looks like another wet one, so maybe then.

GOES

block_3_conv2

More looking at layers. This is Imagenet’s block3_conv3

  • Advanced CNNs
  • Start GANS? Yes!
    • Got this version working. Now I need to step through it. But here are some plots of it learning:
    • I had dreams about this, so I’m going to record the thinking here:
      • An MLP should be able to get from a simple simulation (square wave) to a more accurate(?) simulation (sine wave). The data set is various start points and frequency queries into the DB, with the matching (“real”/noisy) data as the test. My intuition is that the noise will be lost, so that’s the part we’re going to have to get back with the GAN.
      • So I think there is a two-step process
        • Train the initial NN that will produce the generalized solution
        • Use the output of the NN and the “real” data to train the GAN for fine tuning

Phil 4.23.20

Transformer Architecture: The Positional Encoding

  • In this article, I don’t plan to explain its architecture in depth as there are currently several great tutorials on this topic (here, here, and here), but alternatively, I want to discuss one specific part of the transformer’s architecture – the positional encoding.

D20

  • Add centroids for states – done
  • Return the number of neighbors as an argument – done
  • Chatted with Aaron and Zach. More desire to continue than abandon

ACSOS

  • More revisions. Swap steps for discussion and future work

GOES

    • IRS proposal went in yesterday
    • Continue with GANs
    • Using the VGG model now with much better results. Also figured out how to load weights and read the probabilities in the output layer: vgg
    • Same thing using the pre-trained model from Keras:
      from tensorflow.keras.applications.vgg16 import VGG16
      # prebuild model with pre-trained weights on imagenet
      model = VGG16(weights='imagenet', include_top=True)
      model.compile(optimizer='sgd', loss='categorical_crossentropy')

      vggPretrained

    • Trying to visualize a layer using this code. And using that code as a starting point, I had to explore how to slice up the tensors in the right way. A CNN layer has a set of “filters” that contain a square set of pixels. The data is stored as an array of pixels at each x, y, coordinate, so I had to figure out how to get one image at a time. Here’s my toy:
      import numpy as np
      import matplotlib.pyplot as plt
      
      n_rows = 4
      n_cols = 8
      depth = 4
      
      my_list = []
      
      for r in range(1, n_rows):
          row = []
          my_list.append(row)
          for c in range(1, n_cols):
              cell = []
              row.append(cell)
              for d in range(depth):
                  cell.append(d+c*10+r*100)
      
      print(my_list)
      nl = np.array(my_list)
      for d in range(depth):
          print("\nlayer {} = \n{}".format(d, nl[:, :, d]))
          plt.figure(d)
          plt.imshow(nl[:, :, d], aspect='auto', cmap='plasma')
      
      plt.show()
    • This gets features from a cat image at one of the pooling layers. The color map is completely arbitrary:
      # get the features from this block
      features = model.predict(x)
      print(features.shape)
      farray = np.array(features[0])
      print("{}".format(farray[:, :, 0]))
      
      for d in range(4):
          plt.figure(d)
          plt.imshow(farray[:, :, d], aspect='auto', cmap='plasma')
    • But we get some cool pix!

Phil 4.20.20

GOES

  • Reading the Distill article on Gaussian processes (highlighted page here)
  • Copy over neural-tangents code from notebook to IDE
  • Working on regression
  • Ran into a problem with Tensorboard
    Traceback (most recent call last):
      File "d:\program files\python37\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "d:\program files\python37\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "D:\Program Files\Python37\Scripts\tensorboard.exe\__main__.py", line 7, in <module>
      File "d:\program files\python37\lib\site-packages\tensorboard\main.py", line 75, in run_main
        app.run(tensorboard.main, flags_parser=tensorboard.configure)
      File "d:\program files\python37\lib\site-packages\absl\app.py", line 299, in run
        _run_main(main, args)
      File "d:\program files\python37\lib\site-packages\absl\app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 289, in main
        return runner(self.flags) or 0
      File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 305, in _run_serve_subcommand
        server = self._make_server()
      File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 409, in _make_server
        self.flags, self.plugin_loaders, self.assets_zip_provider
      File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 183, in standard_tensorboard_wsgi
        flags, plugin_loaders, data_provider, assets_zip_provider, multiplexer
      File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 272, in TensorBoardWSGIApp
        tbplugins, flags.path_prefix, data_provider, experimental_plugins
      File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 345, in __init__
        "Duplicate plugins for name %s" % plugin.plugin_name
    ValueError: Duplicate plugins for name projector
  • After poking around a bit online for “ValueError: Duplicate plugins for name projector”, I found this diagnostic, which basically asked me to reinstall everything*. That didn’t work, so I went into Python37\Lib\site-packages and deleted the offending packages by hand. Tensorboard now runs, but now I need to upgrade my CUDA so that I have cudart64_101.dll
    • Installed the minimum set of items from the Nvidia Package Launcher (cuda_10.1.105_418.96_win10.exe)
    • Installed the cuDNN drivers from here: https://developer.nvidia.com/rdp/cudnn-download
    • The regular (e.g. MNIST) demos work, but when I try the distribution code I get this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op ‘NcclAllReduce’. It turns out that there are only two viable MirroredStrategy cross-device operations for Windows, and the default is not one of them. These are the valid calls:
      distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
      distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
    • And this call is not
      # distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce()) # <-- not valid for Windows
  • Funny thing. After reinstalling and getting everything to work, I tried the diagnostic again. It seems it always says to reinstall everything
  • And Tensorboard is working! Here’s the call that puts data in the directory:
    linear_est = tf.estimator.LinearRegressor(feature_columns=feature_columns, model_dir = 'logs/boston/')
  • And when launched on the command line pointing at the same directory:
    D:\Development\Tutorials\Deep Learning with TensorFlow 2 and Keras\Chapter 3>tensorboard --logdir=.\logs\boston
    2020-04-20 11:36:42.999208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
    W0420 11:36:46.005735 18544 plugin_event_accumulator.py:300] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
    W0420 11:36:46.006743 18544 plugin_event_accumulator.py:312] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
    Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
    TensorBoard 2.1.1 at http://localhost:6006/ (Press CTRL+C to quit)
  • I got this! tensoboard
  • Of course, we’re not done yet. When attempting to use the Keras callback, I get the following error: tensorflow.python.eager.profiler.ProfilerNotRunningError: Cannot stop profiling. No profiler is running. It turns out that you have to specify the log folder like this
      • command line:
        tensorboard --logdir=.\logs
      • in code:
        logpath = '.\\logs'
  • That seems to be working! RunningTBNN
  • Finished regression chapter
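A rough sketch of how the MirroredStrategy workaround above could be made platform-aware, so the same script runs on Windows and elsewhere. The helper names cross_device_ops_name and make_strategy are hypothetical, and falling back to NcclAllReduce on every non-Windows system is an assumption (it needs NCCL and GPUs). TensorFlow is imported lazily so the selection logic can be checked without it.

```python
import platform


def cross_device_ops_name(system=None):
    """Return the tf.distribute cross-device ops class name for this platform."""
    system = system or platform.system()
    if system == "Windows":
        # NcclAllReduce has no registered kernel on Windows (see the error above)
        return "HierarchicalCopyAllReduce"
    # assumption: NCCL is available on other platforms
    return "NcclAllReduce"


def make_strategy(system=None):
    """Build a MirroredStrategy with a cross-device op that works on this OS."""
    import tensorflow as tf  # imported lazily; only needed when building the strategy

    ops_cls = getattr(tf.distribute, cross_device_ops_name(system))
    return tf.distribute.MirroredStrategy(cross_device_ops=ops_cls())


print(cross_device_ops_name("Windows"))
```

Calling make_strategy() in place of the hard-coded lines should give the same result on Windows as the HierarchicalCopyAllReduce call above.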

ASRC

  • Submitted RFI response for review

ACSOS

  • Got Antonio’s comments back

D20

  • Need to work on the math to find second bumps
    • If the rate has been < x% (maybe 2.5%), calculate an offset that leaves a value of 100 for each day. When the rate jumps by more than y% (e.g. going from 100 to 120 is 20%), freeze that number until the rate settles down again, then repeat the process
    • Change the number of samples to be the last x days
  • Work with Zach to get maps up?

ML seminar

Phil 4.17.20

Can You Beat COVID-19 Without a Lockdown? Sweden Is Trying

I dug into the predictions that we generate for daystozero.org. Comparing Finland, Norway, and Sweden, it looks like something that Sweden did could result in about 2,600 people dying who don’t have to:

FinNorSwe

D20

ASRC

  • IRS proposal – done!
  • A better snippet: the best way to cheat on taxes is  to deliberately lie to the IRS about what you earned over a year, what you spent over a year, and the ways you would fill out those forms. This is where “time of year” really comes into play. The IRS assumes you worked on April 15 through the 15th of the following year in order to report and pay taxes on your actual income from April 15 through the following year. I’ve put some pictures and thoughts below. There are some really great readers who have put some excellent guides and resources out there on this topic. If you have any additional questions, please feel free to leave a comment below and I will do my best to answer them.
  • Another good snippet: The best way to cheat on taxes is  to set up an LLC or other tax-sheltered company that makes up for your sloth in paying business taxes. By doing this, you can deduct the business expenses and pay your taxes at a much lower tax rate, while also getting a tax refund. So, for example, if your net operating income for 2014 was $5,000 and you think you should owe about $2,000 in taxes for 2015, I suggest you set up a  S-Corporation   for 2015 that only owes $500 in taxes. Then, you can send the IRS a check for the difference between the $2,000 difference you owe them and the $5,000 net operating income for 2015.

ACSOS

  • Finish first pass? Done! And sent to Antonio!

shortcuts

Shortcut Learning in Deep Neural Networks

  • Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today’s machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning’s problem can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

Phil 4.16.20

Fix siding!

SageMath

More on SageTeX here

D20

  • Playing around with something to indicate the linear fit to the data. Trying P value
  • Updated UI code so that the P value will display on the next build
  • Hopefully we try the world map code today?

GOES

IMDB_embedding

  • Learning more about multiple inputs to embedding and had to get the keras.utils.plot_model working, which failed with this error: ImportError: Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work. So I pip installed both, and had the same problem.
  • Had problems running the distribution samples. Upgraded tf to version 2.1. No problems and better performance
  • Finished chapter 2

ACSOS

  • Struggled with picture placement. Moving on.
  • Finished first pass. I need to add more ABM text, but I’m down to 10 pages plus references!

Multi-input and multi-output models

  • Here’s a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams. Let’s consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted, etc. The model will also be supervised via two loss functions. Using the main loss function earlier in a model is a good regularization mechanism for deep models.

 

Phil 4.15.20

Fix siding from wind!

D20

  • Talked to Aaron about taking a derivative of the regression slope to see what it looks like. There may be common features in the pattern of rates, or of the slopes of the regressions changing over time
  • Still worried about countries that don’t report well. I’d like to be able to use rates from neighboring countries as some kind of check
  • Got the first pass on a world map json file done
  • Spread of SARS-CoV-2 in the Icelandic Population
    • As of April 4, a total of 1221 of 9199 persons (13.3%) who were recruited for targeted testing had positive results for infection with SARS-CoV-2. Of those tested in the general population, 87 (0.8%) in the open-invitation screening and 13 (0.6%) in the random-population screening tested positive for the virus. In total, 6% of the population was screened. Most persons in the targeted-testing group who received positive tests early in the study had recently traveled internationally, in contrast to those who tested positive later in the study. Children under 10 years of age were less likely to receive a positive result than were persons 10 years of age or older, with percentages of 6.7% and 13.7%, respectively, for targeted testing; in the population screening, no child under 10 years of age had a positive result, as compared with 0.8% of those 10 years of age or older. Fewer females than males received positive results both in targeted testing (11.0% vs. 16.7%) and in population screening (0.6% vs. 0.9%). The haplotypes of the sequenced SARS-CoV-2 viruses were diverse and changed over time. The percentage of infected participants that was determined through population screening remained stable for the 20-day duration of screening.

ACSOS

  • Finished first pass of the lit review. Now at 13 pages

GOES

  • Start looking at GANs. Also work on fixing Optevolver for multiple CPUs
    • Starting Deep Learning with TensorFlow 2 and Keras: Regression, ConvNets, GANs, RNNs, NLP, and more with TensorFlow 2 and the Keras API, 2nd Edition. Chapter six is GANs, which is what I’m interested in, but I’m ok with getting some review in first.
    • Working on embeddings with the IMDB sentiment analysis project. It’s the first time I’ve seen an embedding layer which is 1) Cool, and 2) Something to play with. I’d noticed when I was working with Word2Vec for my research that embeddings didn’t seem to change shape much as a function of the number of dimensions. It seemed like a lot of information was being kept at very low dimensions, like three, rather than the more accepted 128 or so:

place-embeddings

    • Well, this example gave me an opportunity to test that with some accuracy numbers. Here’s what I get:

EmbeddingDimensions

    • That is super interesting. It basically means that model building, testing, and visualization can happen at low dimensions. That makes everything faster, with the roughly 10% improvement from adding dimensions likely left as one of the last steps.
    • Continuing with book.
  • Wrote up a response to Mike M’s questions about the white paper. Probably pointless, and has pretty much wasted my afternoon. And it was pointless! Now what?
  • Slides for John?

Phil 4.14.20

Fix siding from wind!

D20

  • I want to try taking a second derivative of the rates to see what it looks like. There may be common features in the pattern of rates, or of the slopes of the regressions changing over time
  • I’m also getting worried about countries that don’t report well. I’d like to be able to use rates from neighboring countries as some kind of check
  • Work with Zach on cleanup and map integration?
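A minimal sketch of the derivative idea above, assuming the rates are already a clean, evenly spaced daily series. The function name rate_derivatives and the sample data are made up for illustration. np.gradient uses central differences in the interior and one-sided differences at the ends, so the first and last values are less trustworthy.

```python
import numpy as np


def rate_derivatives(rates):
    """Return the first and second finite-difference derivatives of a daily series."""
    rates = np.asarray(rates, dtype=float)
    first = np.gradient(rates)   # day-over-day change in the rate
    second = np.gradient(first)  # how quickly that change is itself changing
    return first, second


rates = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])  # made-up data: t squared
first, second = rate_derivatives(rates)
print(first)
print(second)
```

With real, noisy data the series would probably need smoothing (or the regression slopes themselves) before differentiating, since each derivative amplifies noise.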

COVID Twitter

  • Finished ingesting the new data. It took almost 24 hours

ACSOS

  • Finished first pass of the introduction. Still at 14 pages

GOES

Phil 4.13.20

That was a very solitary weekend. I fixed some bikes, planted some herbs and vegetables, cleaned house, and procrastinated about pretty much everything else. I pinged Don and Wayne about D20 ideas, and got a ping for more info from Don, then silence. Everyone seems to be wrapped up tight in their worlds.

And for good reason. Maryland is looking grim:

Maryland_4_13_2020

D20

  • Worked with Zach to get states in. It’s working!

D20USA

COVID Twitter

  • Went looking for new data to ingest, but didn’t see anything new? It wasn’t there yet. Ingesting now
  • 1:30 Meeting

ACSOS

  • Reading through paper and pulling out all the parts from Simple Trick
  • Ping Antonio to let him know I’m working

GOES

  • Get absolute queries working in InfluxDB2. It took some looking, but here’s an example from the API reference on range(). Done!
    • Everything is in GMT. As usual, the parser is picky about the format, which is ISO-8601:
      range_args = "start:2020-04-13T13:30:00Z, stop:2020-04-13T13:30:10Z"
  • Start on TF2/GANs for converting square waves to noisy sin waves of varying frequencies using saved InfluxDB data
    • First, pull a square, sin, and noisy sin and plot using matplotlib so we know we have good vectors. Success!
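Since the parser is picky, here is a small sketch that builds the range arguments above from timezone-aware datetimes instead of hand-typed strings. The helper name flux_range_args is made up. It only formats the "start:..., stop:..." string shown above and does not talk to InfluxDB.

```python
from datetime import datetime, timedelta, timezone


def flux_range_args(start, stop):
    """Format two timezone-aware datetimes as ISO-8601 UTC range arguments."""
    def iso(dt):
        if dt.tzinfo is None:
            raise ValueError("expected a timezone-aware datetime")
        # convert to UTC (GMT) and use the trailing Z form
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return "start:{}, stop:{}".format(iso(start), iso(stop))


start = datetime(2020, 4, 13, 13, 30, 0, tzinfo=timezone.utc)
range_args = flux_range_args(start, start + timedelta(seconds=10))
print(range_args)
```

Passing a local time with its own tzinfo (for example UTC-4) gives the same UTC string, which avoids off-by-hours queries.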

Waveforms
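For anyone without the saved InfluxDB data, the three vectors can be approximated synthetically. This is only a sketch: the function name make_waveforms, the parameter defaults, and the Gaussian noise model are assumptions, not what the simulator actually stores.

```python
import numpy as np


def make_waveforms(frequency=1.0, start=0.0, n_samples=100, duration=2.0,
                   noise=0.1, seed=0):
    """Return time, square, sine, and noisy sine vectors of equal length."""
    t = start + np.linspace(0.0, duration, n_samples, endpoint=False)
    sin_wave = np.sin(2.0 * np.pi * frequency * t)
    # square wave with the same phase and frequency as the sine
    square_wave = np.where(sin_wave >= 0.0, 1.0, -1.0)
    # assumption: the "real" signal is the sine plus Gaussian noise
    rng = np.random.default_rng(seed)
    noisy_sin = sin_wave + rng.normal(0.0, noise, n_samples)
    return t, square_wave, sin_wave, noisy_sin


t, square_wave, sin_wave, noisy_sin = make_waveforms(frequency=2.0, start=0.25)
print(t.shape, square_wave[:5], noisy_sin[:5])
```

Varying start and frequency gives the kind of (square, noisy sine) pairs described for the training set, and each vector can be passed straight to matplotlib's plot for a sanity check.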

Fika

Phil 4.3.20

Temp is up a bit this morning, which, of course, I’m overreacting to.

Need to get started on State information from here: https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv

Generated some favicons from here: https://favicon.io/favicon-generator/, which, of course, we didn’t use

Getting close to something that we can release

GOES:

  • Update Linux on laptop and try Influx there. Nope. The laptop is hosed. hosed
  • Grabbing another computer to configure. I mean, worst case, I can set up the work laptop as an Ubuntu box. I’d love to know if Influx would work FIRST, though. Looks like I have to. My old dev box won’t boot. Backing up.
  • Installed Debian on the work laptop. It seems to be booting? Nope:
  • I guess we’ll try Ubuntu again? Nope. Trying one more variant.
  • Trying lubuntu. It uses different drivers for some things, and so far hasn’t frozen or blocked yet. It works!
  • And now the Docker version (docker run --name influxdb -p 9999:9999 quay.io/influxdb/influxdb:2.0.0-beta) works too. Maybe because the system got upgraded?
  • 11:00 IRAD Meeting
    • Send note about NOAA being a customer for simulated anomalies for machine learning

Phil 4.1.20

Working from home has a different rhythm. I work in segments with home chores mixed in. Today I’m doing this at 6:00, along with some coding. Then some morning exercise, breakfast, and work till noon. Ride, lunch and more work till about 3:00. By that time my brain is broken, and I take a break and do light chores. Today I may finally get my road bike ready for spring. Then simple work like commenting for a few hours. In the evenings I find I like watching shows about competent people fixing things and making them better. Bitchin’ Rides is extremely soothing.

D20:

  • Fixing dates
  • integrating the estimated deaths from rate and current deaths as area under the curve until zero.
  • Work on documentation. Also make sure word wrap works
  • This. Is. Bad.

Italy_4_1_2020

  • Once more, this is Italy. What I’ve done is round-tripped the rates to produce an estimate of total deaths. If calculating rates is taking the derivative, calculating a death prediction is integration. So, if the calculations are right, and Italy is at zero new deaths around April 17th, the toll is around 27 thousand total deaths. That’s about 0.045% of the population. If those numbers hold for the US at 327 million, that’s a total of 145,550. The White House is estimating numbers of 100,000 to 240,000, which means their average prediction is that we will fare worse than Italy.
  • Fixed bugs, worked with Zach, made progress. Aaron is starting to appear again!
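The integration step can be sketched in a few lines if the daily rate is assumed to fall along a straight line until it reaches zero, so the area under the curve is a triangle. The function names and all the numbers here are made up for illustration, not the D20 code or real data.

```python
def project_total(current_total, daily_rate, slope):
    """Integrate a linearly falling daily rate until it reaches zero.

    slope is the change in daily_rate per day and must be negative.
    Returns (days_to_zero, projected_total).
    """
    if slope >= 0:
        raise ValueError("rate is not falling, so it never reaches zero")
    days_to_zero = daily_rate / -slope
    # area of the triangle under the rate line
    additional = 0.5 * daily_rate * days_to_zero
    return days_to_zero, current_total + additional


def scale_to_population(total, population_from, population_to):
    """Apply one population's per-capita toll to another population."""
    return total / population_from * population_to


# made-up numbers: 1,000 so far, 100 per day, falling by 10 per day
days, total = project_total(current_total=1000.0, daily_rate=100.0, slope=-10.0)
scaled = scale_to_population(total, 1_000_000, 2_000_000)
print(days, total, scaled)
```

The scaling helper is the same back-of-the-envelope move as applying Italy's percentage to the US population.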

GOES

  • Tweak John’s slides
  • More on saving and restoring docker containers. I think I’m close. Then install InfluxDB and test if I can see the dashboard
  • Still having problems. I can create, run, add, delete, and tag the images, but I can’t run them. I think I’m getting ahead of myself. Back to reading

teminal

So it turns out that I was doing everything right but the load. Here’s how it works

  1. docker container run -it --name imagename some-os /bin/sh
  2. Install what needs to be installed. Poke around, save things, etc
  3. docker container commit imagename modified-os
  4. docker save modified-os > modified-os.tar
  5. docker rmi modified-os
  6. docker load < modified-os.tar
  7. docker container run -it --name imagename modified-os /bin/sh

teminal2

 

 

Phil 3.30.20

Today’s study in contrasts: Italy and the US:

COVID-19 projections for the US, from the Institute for Health Metrics and Evaluation (IHME):

IHME

Work on converting the ETS json file into spreadsheets to evaluate thresholds and labels – spreadsheet conversion is working. done! Now I need to figure out what those ETS parameters do!

Add a short bit to the D20 writeup that explains why linear interpolation isn’t the best option, and why we went with ETS – done

Work with Zach to get the website up today – working

Work this article into the exploit-space writeup: Why Is Cybersecurity Not a Human-Scale Problem Anymore?. Wow, actually, the company (Balbix) that was founded by the author (Gaurav Banga) seems to be doing most of what I was going to write about. Sent Darren a note to see if I should continue

Got a note from ProQuest saying my file needed to have blank pages at the beginning and end of the document. Fixed. And accepted!

  • Congratulations. Your submission, xxxxx has cleared all of the necessary checks and will soon be delivered to ProQuest for publishing.

Ok, back to Docker and building an InfluxDB image. Wow, that seems like a lifetime ago I was doing this

  • To save a custom image, create the container from a base image and then docker save image_name > image_name.tar. This puts it wherever you run the command in the system, Linux or Windows

#COVID-19 meeting at 1:30 today – proposal’s in. We have twitter data from January

SDaaS meeting at 4:00 today – postponed

Phil 3.12.20

7:00 – 6:00 ASRC GOES

Phil 3.11.20

7:00 – 5:00 ASRC GOES

  • A couple more paragraphs in the revisions
  • Working on the SDaaS paper. Getting close to finished
  • Mission meeting
    • Update status to delay deliverables
    • Still waiting on data
    • Simulation running – demo tomorrow
    • Evaluate against known yaw flip
    • White papers for John D
    • 20 sims so far
    • Need to install Influx, dammit!
    • Paragraph on 400 hrs
    • Paragraph on schedule
  • Sent Erik paragraphs

Phil 3.10.20

7:00 ASRC PhD

  • Good chat with Aaron M last night. I’ve incorporated comments into the new chapter
  • Put together a Saudi #corona doc
  • Meeting with Don today at 1:00
  • ML group in Hampden today

GOES

  • More SDaaS paper. Maybe finish first draft today?