Category Archives: Writing

Phil 4.28.20

ACSOS

Upload paper to Overleaf – done!

D20

Fix bug using this:

slope, intercept, r_value, p_value, std_err = stats.linregress(xsub, ysub)
# slope, intercept = np.polyfit(x, y, 1)
yn = np.polyval([slope, intercept], xsub)

steps = 0
if slope < 0:
    steps = abs(y[-1] / slope)

reg_x = []
reg_y = []
start = len(yl) - max_samples
yval = intercept + slope * start
for i in range(start, len(yl)-offset):
    reg_x.append(i)
    reg_y.append(yval)
    yval += slope
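
The fix above can be tried in isolation with a small sketch. It assumes `yl` holds the daily values, fits a line to the last `max_samples` points, and estimates how many steps remain until the values reach zero. The helper names `fit_line` and `steps_to_zero` are hypothetical, and a hand-rolled least-squares fit stands in for `stats.linregress` so the block has no dependencies:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_to_zero(yl, max_samples):
    """Fit the last max_samples values and estimate steps until zero."""
    start = max(0, len(yl) - max_samples)
    xsub = list(range(start, len(yl)))
    ysub = yl[start:]
    slope, intercept = fit_line(xsub, ysub)
    # Only a falling trend ever reaches zero; otherwise report 0 steps.
    steps = abs(ysub[-1] / slope) if slope < 0 else 0
    # Regression line sampled over the same window, for plotting.
    reg_y = [intercept + slope * x for x in xsub]
    return steps, xsub, reg_y


# Hypothetical series: flat at first, then falling by 2 per day.
yl = [20, 20, 20, 18, 16, 14, 12, 10]
steps, reg_x, reg_y = steps_to_zero(yl, max_samples=5)
print(steps, reg_x, reg_y)
```

Dividing the last observed value by the slope mirrors the snippet above; dividing the last fitted value instead would be less sensitive to a noisy final sample.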

Anything else?

GPT-2 Agents

Install and test GPT-2 Client
Failed spectacularly. It depends on a lot of TF1.x items, like tensorflow.contrib.training. There is an issue request in.
Checked out the project to see if anything could be done. “Fixed” the contrib library, but that just exposed other things. Uninstalled.
Tried using the upgrade tool described here, which did absolutely nothing, as near as I can tell

GOES

Continue figuring out GANs
Here are results using 2 latent dimensions, a matching hint, a line hint, and no hint
Here are results using 5 latent dimensions, a matching hint, a line hint, and no hint
Meeting at 10:00 with Vadim and Isaac
- Wound up going over Isaac’s notes for Yaw Flip and learned a lot. He’s going to see if he can get the algorithm used for the maneuver. If so, we can build the control behavior around that. The goal is to minimize energy and indirectly fuel costs

Phil 4.24.20

It is very wet today

radar

Spent far too much time trying to upload a picture to the graduation site. It appears to be broken

D20

Changed the CONTROLLED days to < 2, since things are generally looking better

ACSOS

Sent the revised draft to Antonio

GPT-2 Agents

Found what appears to be just what I’m looking for. Searching on GitHub for GPT-2 tensorflow led me to this project, GPT-2 Client. I’ll give that a try and see how it works. The developer, Rishabh Anand, seems to have solid skills, so I have some hope that this could work. I do not have the energy to start this on a Friday and then switch to GANs for the rest of the day. Sunday looks like another wet one, so maybe then.

GOES

More looking at layers. This is ImageNet’s block3_conv3

Advanced CNNs
Start GANs? Yes!
- Got this version working. Now I need to step through it. But here are some plots of it learning:

- I had dreams about this, so I’m going to record the thinking here:
  - An MLP should be able to get from a simple simulation (square wave) to a more accurate(?) simulation (sine wave). The data set is various start points and frequency queries into the DB, with matching (“real”/noisy) as the test. My intuition is that the noise will be lost, so that’s the part we’re going to have to get back with the GAN.
  - So I think there is a two-step process
    - Train the initial NN that will produce the generalized solution
    - Use the output of the NN and the “real” data to train the GAN for fine tuning

Phil 4.23.20

Transformer Architecture: The Positional Encoding

In this article, I don’t plan to explain its architecture in depth as there are currently several great tutorials on this topic (here, here, and here), but alternatively, I want to discuss one specific part of the transformer’s architecture – the positional encoding.

D20

Add centroids for states – done
Return the number of neighbors as an argument – done
Chatted with Aaron and Zach. More desire to continue than abandon

ACSOS

More revisions. Swap steps for discussion and future work

GOES

- IRS proposal went in yesterday
- Continue with GANs
- Using the VGG model now with much better results. Also figured out how to load weights and read the probabilities in the output layer:
- Same thing using the pre-trained model from Keras:
```
from tensorflow.keras.applications.vgg16 import VGG16
# prebuild model with pre-trained weights on imagenet
model = VGG16(weights='imagenet', include_top=True)
model.compile(optimizer='sgd', loss='categorical_crossentropy')
```
- Trying to visualize a layer using this code. Using that code as a starting point, I had to explore how to slice up the tensors in the right way. A CNN layer has a set of “filters” that each contain a square set of pixels. The data is stored as an array of filter values at each x, y coordinate, so I had to figure out how to get one image at a time. Here’s my toy:
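```
import numpy as np
import matplotlib.pyplot as plt

n_rows = 4
n_cols = 8
depth = 4

# build a [row][col][depth] nested list whose values encode their own position
my_list = []

# rows and columns are numbered from 1 so every value shows its row and column
for r in range(1, n_rows + 1):
    row = []
    my_list.append(row)
    for c in range(1, n_cols + 1):
        cell = []
        row.append(cell)
        for d in range(depth):
            cell.append(d+c*10+r*100)

print(my_list)
nl = np.array(my_list)
# slicing on the last axis pulls out one 2D "image" per depth layer
for d in range(depth):
    print("\nlayer {} = \n{}".format(d, nl[:, :, d]))
    plt.figure(d)
    plt.imshow(nl[:, :, d], aspect='auto', cmap='plasma')

plt.show()
```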
- This gets features from a cat image at one of the pooling layers. The color map is completely arbitrary:
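```
# get the features from this block
# (model and x are defined elsewhere: the network and the preprocessed image)
features = model.predict(x)
print(features.shape)
farray = np.array(features[0])
print("{}".format(farray[:, :, 0]))

# show the first four feature maps, one figure each
for d in range(4):
    plt.figure(d)
    plt.imshow(farray[:, :, d], aspect='auto', cmap='plasma')

plt.show()
```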
- But we get some cool pix!

Phil 4.20.20

GOES

Reading the Distill article on Gaussian processes (highlighted page here)
Copy over neural-tangents code from notebook to IDE
Working on regression

Ran into a problem with Tensorboard

Traceback (most recent call last):
  File "d:\program files\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\program files\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Program Files\Python37\Scripts\tensorboard.exe\__main__.py", line 7, in 
  File "d:\program files\python37\lib\site-packages\tensorboard\main.py", line 75, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "d:\program files\python37\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "d:\program files\python37\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 289, in main
    return runner(self.flags) or 0
  File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 305, in _run_serve_subcommand
    server = self._make_server()
  File "d:\program files\python37\lib\site-packages\tensorboard\program.py", line 409, in _make_server
    self.flags, self.plugin_loaders, self.assets_zip_provider
  File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 183, in standard_tensorboard_wsgi
    flags, plugin_loaders, data_provider, assets_zip_provider, multiplexer
  File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 272, in TensorBoardWSGIApp
    tbplugins, flags.path_prefix, data_provider, experimental_plugins
  File "d:\program files\python37\lib\site-packages\tensorboard\backend\application.py", line 345, in __init__
    "Duplicate plugins for name %s" % plugin.plugin_name
ValueError: Duplicate plugins for name projector

After poking around a bit online for the error (ValueError: Duplicate plugins for name projector), I found this diagnostic, which basically asked me to reinstall everything*. That didn’t work, so I went into Python37\Lib\site-packages and deleted by hand. Tensorboard now runs, but now I need to upgrade my CUDA so that I have cudart64_101.dll
- Installed the minimum set of items from the Nvidia Package Launcher (cuda_10.1.105_418.96_win10.exe)
- Installed the cuDNN drivers from here: https://developer.nvidia.com/rdp/cudnn-download
- The regular (e.g. MNIST) demos work, but when I tried the distribution code I got this error: tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op ‘NcclAllReduce’. It turns out that there are only two viable MirroredStrategy operations for Windows, and the default is not one of them. These are the valid calls:
```
distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
```
- And this call is not
```
# distribution = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.NcclAllReduce()) # <-- not valid for Windows
```
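
One way to keep the same script usable on both Windows and Linux is to choose the reduction by platform. This is only a sketch: the helper names `cross_device_ops_name` and `make_strategy` are hypothetical, and it assumes that every non-Windows system can use NCCL, which is a simplification. TensorFlow is imported lazily so the selection logic can be checked without it:

```python
import platform


def cross_device_ops_name(system=None):
    """Pick a cross-device reduction that has a kernel on this OS."""
    system = system or platform.system()
    # Assumption taken from the error above: NCCL is unavailable on Windows.
    if system == "Windows":
        return "HierarchicalCopyAllReduce"
    return "NcclAllReduce"


def make_strategy():
    """Build a MirroredStrategy; TensorFlow is imported only when called."""
    import tensorflow as tf

    ops_cls = getattr(tf.distribute, cross_device_ops_name())
    return tf.distribute.MirroredStrategy(cross_device_ops=ops_cls())


print(cross_device_ops_name("Windows"))
print(cross_device_ops_name("Linux"))
```

Calling `make_strategy()` then returns a strategy built with whichever reduction matches the machine it runs on.
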
Funny thing. After reinstalling and getting everything to work, I tried the diagnostic again. It seems it always says to reinstall everything

And Tensorboard is working! Here’s the call that puts data in the directory:

linear_est = tf.estimator.LinearRegressor(feature_columns=feature_columns, model_dir = 'logs/boston/')

And when launched on the command line pointing at the same directory:

D:\Development\Tutorials\Deep Learning with TensorFlow 2 and Keras\Chapter 3>tensorboard --logdir=.\logs\boston
2020-04-20 11:36:42.999208: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
W0420 11:36:46.005735 18544 plugin_event_accumulator.py:300] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
W0420 11:36:46.006743 18544 plugin_event_accumulator.py:312] Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.1.1 at http://localhost:6006/ (Press CTRL+C to quit)

I got this!
Of course, we’re not done yet. When attempting to use the Keras callback, I get the following error: tensorflow.python.eager.profiler.ProfilerNotRunningError: Cannot stop profiling. No profiler is running. It turns out that you have to specify the log folder like this:
  - command line:
```
tensorboard --logdir=.\logs
```
  - in code:
```
logpath = '.\\logs'
```
That seems to be working!
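
A minimal sketch of building that folder name in code, assuming the error came from how the path was written (that reading is a guess, not something confirmed here). The helpers `make_logpath` and `make_tensorboard_callback` are hypothetical names; `os.path.join` produces `.\logs` on Windows and `./logs` elsewhere, and TensorFlow is only imported when the callback is actually created:

```python
import os


def make_logpath(root="logs", run_name=None):
    """Join log folder parts with the platform's own separator."""
    parts = [".", root] + ([run_name] if run_name else [])
    return os.path.join(*parts)


def make_tensorboard_callback(logpath):
    """Create the Keras callback; TensorFlow is imported only when called."""
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(log_dir=logpath)


logpath = make_logpath()
print(logpath)
```

The same string can then be passed to `tensorboard --logdir` on the command line so both sides point at one folder.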
Finished regression chapter

ASRC

Submitted RFI response for review

ACSOS

Got Antonio’s comments back

D20

Need to work on the math to find second bumps
- If the rate has been < x% (maybe 2.5%), calculate an offset that leaves a value of 100 for each day. When the rate jumps more than y% (e.g. 100 – 120 = 20%), freeze that number until the rate settles down again and repeat the process
- Change the number of samples to be the last x days
Work with Zach to get maps up?

ML seminar

Worked on getting the neural tangents notebook running on my box but jaxlib is not ported to Windows. Sigh. Find another paper.
Maybe this? Bringing Stories Alive: Generating Interactive Fiction Worlds

Phil 4.17.20

Can You Beat COVID-19 Without a Lockdown? Sweden Is Trying

I dug into the predictions that we generate for daystozero.org. Comparing Finland, Norway, and Sweden, it looks like something that Sweden did could result in about 2,600 people dying who didn’t have to:

D20

Create distance-based sort lists for countries.
- Get Lat/Lon centroids from country data (gis.stackexchange.com/questions/71921/list-of-central-coordinates-centroid-for-all-countries)
- For each country
  - Find longitude offset
  - Subtract from all countries and wrap if past -180/+180
  - Calculate distance
  - Sort list
- Add to dict and create json file
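
The steps above can be sketched as follows. Instead of the longitude-offset-and-wrap approach in the list, this uses the haversine great-circle distance, which handles the -180/+180 wrap on its own. The centroid values are rough, illustrative numbers rather than the ones from the linked data, and the function names are hypothetical:

```python
import json
import math

# Approximate, illustrative centroids (latitude, longitude) in degrees.
CENTROIDS = {
    "Finland": (64.0, 26.0),
    "Sweden": (62.0, 15.0),
    "Norway": (62.0, 10.0),
    "Italy": (42.8, 12.8),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def neighbor_lists(centroids):
    """Map each country to the other countries, nearest first."""
    result = {}
    for name, here in centroids.items():
        others = [n for n in centroids if n != name]
        others.sort(key=lambda n: haversine_km(here, centroids[n]))
        result[name] = others
    return result


neighbors = neighbor_lists(CENTROIDS)
as_json = json.dumps(neighbors, indent=2)  # ready to write to a json file
print(neighbors["Finland"])
```

Centroid distance is a crude stand-in for "neighbor" when countries are large or oddly shaped, but it is cheap and gives a stable ordering.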

ASRC

IRS proposal – done!
A better snippet: the best way to cheat on taxes is to deliberately lie to the IRS about what you earned over a year, what you spent over a year, and the ways you would fill out those forms. This is where “time of year” really comes into play. The IRS assumes you worked on April 15 through the 15th of the following year in order to report and pay taxes on your actual income from April 15 through the following year. I’ve put some pictures and thoughts below. There are some really great readers who have put some excellent guides and resources out there on this topic. If you have any additional questions, please feel free to leave a comment below and I will do my best to answer them.
Another good snippet: The best way to cheat on taxes is to set up an LLC or other tax-sheltered company that makes up for your sloth in paying business taxes. By doing this, you can deduct the business expenses and pay your taxes at a much lower tax rate, while also getting a tax refund. So, for example, if your net operating income for 2014 was $5,000 and you think you should owe about $2,000 in taxes for 2015, I suggest you set up a S-Corporation for 2015 that only owes $500 in taxes. Then, you can send the IRS a check for the difference between the $2,000 difference you owe them and the $5,000 net operating income for 2015.

ACSOS

Finish first pass? Done! And sent to Antonio!

Shortcut Learning in Deep Neural Networks

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today’s machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning’s problem can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

Phil 4.16.20

Fix siding!

Phil 4.15.20

Fix siding from wind!

D20

Talked to Aaron about taking a derivative of the regression slope to see what it looks like. There may be common features in the pattern of rates, or of the slopes of the regressions changing over time
Still worried about countries that don’t report well. I’d like to be able to use rates from neighboring countries as some kind of check
Got the first pass on a world map json file done
Spread of SARS-CoV-2 in the Icelandic Population
- As of April 4, a total of 1221 of 9199 persons (13.3%) who were recruited for targeted testing had positive results for infection with SARS-CoV-2. Of those tested in the general population, 87 (0.8%) in the open-invitation screening and 13 (0.6%) in the random-population screening tested positive for the virus. In total, 6% of the population was screened. Most persons in the targeted-testing group who received positive tests early in the study had recently traveled internationally, in contrast to those who tested positive later in the study. Children under 10 years of age were less likely to receive a positive result than were persons 10 years of age or older, with percentages of 6.7% and 13.7%, respectively, for targeted testing; in the population screening, no child under 10 years of age had a positive result, as compared with 0.8% of those 10 years of age or older. Fewer females than males received positive results both in targeted testing (11.0% vs. 16.7%) and in population screening (0.6% vs. 0.9%). The haplotypes of the sequenced SARS-CoV-2 viruses were diverse and changed over time. The percentage of infected participants that was determined through population screening remained stable for the 20-day duration of screening.

ACSOS

Finished first pass of the lit review. Now at 13 pages

GOES

Start looking at GANs. Also work on fixing Optevolver for multiple CPUs
- Starting Deep Learning with TensorFlow 2 and Keras: Regression, ConvNets, GANs, RNNs, NLP, and more with TensorFlow 2 and the Keras API, 2nd Edition. Chapter six is GANs, which is what I’m interested in, but I’m ok with getting some review in first.
- Working on embeddings with the IMDB sentiment analysis project. It’s the first time I’ve seen an embedding layer which is 1) Cool, and 2) Something to play with. I’d noticed when I was working with Word2Vec for my research that embeddings didn’t seem to change shape much as a function of the number of dimensions. It seemed like a lot of information was being kept at very low dimensions, like three, rather than the more accepted 128 or so:

place-embeddings

- Well, this example gave me an opportunity to test that with some accuracy numbers. Here’s what I get:

EmbeddingDimensions

- That is super interesting. It basically means that model building, testing, and visualization can happen at low dimensions. That makes everything faster, and with about a 10% improvement likely as one of the last steps.
- Continuing with book.
Wrote up a response to Mike M’s questions about the white paper. Probably pointless, and has pretty much wasted my afternoon. And it was pointless! Now what?
Slides for John?

Phil 4.14.20

Fix siding from wind!

D20

I want to try taking a second derivative of the rates to see what it looks like. There may be common features in the pattern of rates, or of the slopes of the regressions changing over time
I’m also getting worried about countries that don’t report well. I’d like to be able to use rates from neighboring countries as some kind of check
Work with Zach on cleanup and map integration?

COVID Twitter

Finished ingesting the new data. It took almost 24 hours

ACSOS

Finished first pass of the introduction. Still at 14 pages

GOES

Start looking at GANs. Also work on fixing Optevolver for multiple CPUs
- Starting Deep Learning with TensorFlow 2 and Keras: Regression, ConvNets, GANs, RNNs, NLP, and more with TensorFlow 2 and the Keras API, 2nd Edition. Chapter six is GANs, which is what I’m interested in, but I’m ok with getting some review in first.
- Downloaded code samples. Turns out I have a Packt account
- Chapter 1!
Slides for John?

Phil 4.13.20

That was a very solitary weekend. I fixed some bikes, planted some herbs and vegetables, cleaned house, and procrastinated about pretty much everything else. I pinged Don and Wayne about D20 ideas, and got a ping for more info from Don, then silence. Everyone seems to be wrapped up tight in their worlds.

And for good reason. Maryland is looking grim:

D20

Worked with Zach to get states in. It’s working!

COVID Twitter

Went looking for new data to ingest, but didn’t see anything new? It wasn’t there yet. Ingesting now
1:30 Meeting

ACSOS

Reading through paper and pulling out all the parts from Simple Trick
Ping Antonio to let him know I’m working

GOES

Get absolute queries working in InfluxDB2. It took some looking, but here’s an example from the API reference on range(). Done!
- Everything is in GMT. As usual, the parser is picky about the format, which is ISO-8601:
```
range_args = "start:2020-04-13T13:30:00Z, stop:2020-04-13T13:30:10Z"
```
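
Since the parser is picky, it can help to generate the string from `datetime` objects instead of typing it. A small sketch, where `influx_range_args` is a hypothetical helper and the inputs are assumed to be timezone-aware datetimes:

```python
from datetime import datetime, timedelta, timezone


def influx_range_args(start, stop):
    """Format two datetimes as ISO-8601 UTC arguments for range()."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    # Convert to UTC first so the trailing 'Z' is truthful.
    start_utc = start.astimezone(timezone.utc)
    stop_utc = stop.astimezone(timezone.utc)
    return "start:{}, stop:{}".format(
        start_utc.strftime(fmt), stop_utc.strftime(fmt))


start = datetime(2020, 4, 13, 13, 30, 0, tzinfo=timezone.utc)
range_args = influx_range_args(start, start + timedelta(seconds=10))
print(range_args)
```

This reproduces the string above. A naive datetime would be treated as local time by `astimezone`, so passing aware values avoids surprises.
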
Start on TF2/GANs for converting square waves to noisy sine waves of varying frequencies using saved InfluxDB data
- First, pull a square, sine, and noisy sine wave and plot them using matplotlib so we know we have good vectors. Success!
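
For experimenting without the database, matching vectors can be synthesized directly. This is a stand-in for the InfluxDB pull, not the code used here: `make_waves` is a hypothetical helper, the noise is assumed to be Gaussian, and the square wave is simply the sign of the sine wave so the two stay in phase:

```python
import math
import random


def make_waves(n_samples=100, frequency=2.0, noise=0.1, seed=1):
    """Return matching square, sine, and noisy-sine vectors."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    sine, square, noisy = [], [], []
    for i in range(n_samples):
        t = i / n_samples
        s = math.sin(2.0 * math.pi * frequency * t)
        sine.append(s)
        # The square wave is just the sign of the sine wave.
        square.append(1.0 if s >= 0.0 else -1.0)
        noisy.append(s + rng.gauss(0.0, noise))
    return square, sine, noisy


square, sine, noisy = make_waves()
print(square[:3], sine[:3], noisy[:3])
```

Each list can be passed straight to `plt.plot` for the same sanity check.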

Waveforms

Fika

Phil 4.3.20

Temp is up a bit this morning, which, of course, I’m overreacting to.

Need to get started on State information from here: https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv

Generated some favicons from here: https://favicon.io/favicon-generator/, which, of course we didn’t use

Getting close to something that we can release

GOES:

Update Linux on laptop and try Influx there. Nope. The laptop is hosed.
Grabbing another computer to configure. I mean, worst case, I can set up the work laptop as an Ubuntu box. I’d love to know if Influx would work FIRST, though. Looks like I have to. My old dev box won’t boot. Backing up.
Installed Debian on the work laptop. It seems to be booting? Nope:
I guess we’ll try Ubuntu again? Nope. Trying one more variant.
Trying lubuntu. It uses different drivers for some things, and so far hasn’t frozen or blocked yet. It works!
And now the Docker version (docker run --name influxdb -p 9999:9999 quay.io/influxdb/influxdb:2.0.0-beta) works too. Maybe because the system got upgraded?
11:00 IRAD Meeting
- Send note about NOAA being a customer for simulated anomalies for machine learning

Phil 4.1.20

Working from home has a different rhythm. I work in segments with home chores mixed in. Today I’m doing this at 6:00, along with some coding. Then some morning exercise, breakfast, and work till noon. Ride, lunch and more work till about 3:00. By that time my brain is broken, and I take a break and do light chores. Today I may finally get my road bike ready for spring. Then simple work like commenting for a few hours. In the evenings I find I like watching shows about competent people fixing things and making them better. Bitchin’ Rides is extremely soothing.

D20:

Fixing dates
integrating the estimated deaths from rate and current deaths as area under the curve until zero.
Work on documentation. Also make sure word wrap works
This. Is. Bad.

Once more, this is Italy. What I’ve done is round-tripped the rates to produce an estimate of total deaths. If calculating rates is taking the derivative, calculating a death prediction is integration. So, if the calculations are right, and Italy is at zero new deaths around April 17th, the toll is around 27 thousand total deaths. That’s 0.04% of the population. If those numbers hold for the US at 327 million, that’s a total of 145,550. The White House is estimating numbers of 100,000 to 240,000, which means their average prediction is that we will fare worse than Italy.
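
The round trip can be sketched with simple arithmetic, assuming the daily death count falls along a straight regression line until it reaches zero. The remaining deaths are then the area of a triangle under that line. The function names and the example inputs are hypothetical, and the figure of 60 million for Italy's population is an assumed round number:

```python
def projected_total(current_total, daily_now, slope):
    """Add the area under a linearly falling daily-death line.

    slope is the (negative) change in daily deaths per day.
    """
    if slope >= 0:
        raise ValueError("the daily count must be falling to reach zero")
    days_to_zero = daily_now / -slope
    # Area of the triangle between today and the zero crossing.
    remaining = 0.5 * daily_now * days_to_zero
    return current_total + remaining, days_to_zero


def scale_to_population(total, source_pop, target_pop):
    """Apply one population's per-capita toll to another population."""
    return total / source_pop * target_pop


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
total, days = projected_total(current_total=12000, daily_now=800, slope=-50)
print(total, days)

# 27 thousand deaths scaled from an assumed 60 million to 327 million people.
us_estimate = scale_to_population(27000, 60_000_000, 327_000_000)
print(round(us_estimate))
```

A real fit would integrate whatever curve the regression produces; the triangle is just the simplest case of the same idea.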
Fixed bugs, worked with Zach, made progress. Aaron is starting to appear again!

GOES

Tweak John’s slides
More on saving and restoring docker containers. I think I’m close. Then install InfluxDB and test if I can see the dashboard
Still having problems. I can create, run, add, delete, and tag the images, but I can’t run them. I think I’m getting ahead of myself. Back to reading

teminal

So it turns out that I was doing everything right but the load. Here’s how it works

docker container run -it --name imagename some-os /bin/sh
Install what needs to be installed. Poke around, save things, etc
docker container commit imagename modified-os
docker save modified-os > modified-os.tar
docker rmi modified-os
docker load < modified-os.tar
docker container run -it --name imagename modified-os /bin/sh

teminal2

Phil 3.30.20

Today’s study in contrasts: Italy and the US:

COVID-19 projections for the US, from the Institute for Health Metrics and Evaluation (IHME):

Work on converting the ETS json file into spreadsheets to evaluate thresholds and labels – spreadsheet conversion is working. done! Now I need to figure out what those ETS parameters do!

Add a short bit to the D20 writeup that explains why linear interpolation isn’t the best option, and why we went with ETS – done

Work with Zach to get the website up today – working

Work this article into the exploit-space writeup: Why Is Cybersecurity Not a Human-Scale Problem Anymore? Wow, actually, the company (Balbix) that was founded by the author (Gaurav Banga) seems to be doing most of what I was going to write about. Sent Darren a note to see if I should continue

Got a note from ProQuest saying my file needed to have blank pages at the beginning and end of the document. Fixed. And accepted!

Congratulations. Your submission, xxxxx has cleared all of the necessary checks and will soon be delivered to ProQuest for publishing.

Ok, back to Docker and building an InfluxDB image. Wow, that seems like a lifetime ago I was doing this

To save a custom image, create the container from a base image and then docker save image_name > image_name.tar. This puts it wherever you run the command in the system, Linux or Windows

#COVID-19 meeting at 1:30 today – proposal’s in. We have twitter data from January

SDaaS meeting at 4:00 today – postponed

Phil 3.12.20

7:00 – 6:00 ASRC GOES

Checked that abstracts are OK by April 3rd for the IEEE journal, sent reply to Jon N.
1st meeting for Saudi #COVID19 is Monday at noon
Working on revisions. Going to add thoughts about embedding spaces and big data
- Finished assumptions
Working on SDaaS – done!
NSOF meeting
Docker on Windows WSL: https://nickjanetakis.com/blog/setting-up-docker-for-windows-and-wsl-to-work-flawlessly
InfluxDB 2 in a docker container: https://dev.to/influxdata/spinning-up-influxdb-2-0-alpha-with-docker-14k8

Phil 3.11.20

7:00 – 5:00 ASRC GOES

A couple more paragraphs in the revisions
Working on the SDaaS paper. Getting close to finished
Mission meeting
- Update status to delay deliverables
- Still waiting on data
- Simulation running – demo tomorrow
- Evaluate against known yaw flip
- White papers for John D
- 20 sims so far
- Need to install Influx, dammit!
- Paragraph on 400 hrs
- Paragraph on schedule
Sent Erik paragraphs

Phil 3.10.20

7:00 ASRC PhD

Good chat with Aaron M last night. I’ve incorporated comments into the new chapter
Put together a Saudi #corona doc
Meeting with Don today at 1:00
ML group in Hampden today

GOES

More SDaaS paper. Maybe finish first draft today?

viztales

Dimension reduction, State, Orientation, and Speed
