Phil 4.8.20


  • Talk to Zach about chart size bug?
  • We are going to need a top level dashboard, something like number of countries in the DANGER, WARNING, and CONTROLLED buckets
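A minimal sketch of the top-level dashboard count. It assumes each country already carries one of the DANGER, WARNING, or CONTROLLED labels as a string. The `country_states` mapping and its placeholder names are hypothetical test data, not real results.

```python
from collections import Counter

BUCKETS = ("DANGER", "WARNING", "CONTROLLED")

# Hypothetical placeholder data: country name -> current state label.
country_states = {
    "country_a": "DANGER",
    "country_b": "WARNING",
    "country_c": "WARNING",
    "country_d": "CONTROLLED",
}


def bucket_counts(states: dict) -> dict:
    """Count countries per dashboard bucket, reporting empty buckets as 0."""
    counts = Counter(states.values())
    return {bucket: counts[bucket] for bucket in BUCKETS}


print(bucket_counts(country_states))  # {'DANGER': 1, 'WARNING': 2, 'CONTROLLED': 1}
```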

COVID Twitter

  • Continue getting spreadsheets ingested.
  • Got the first one in, trying all of them now. Had to remember about INSERT IGNORE
  • It’s chugging along!
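MySQL's INSERT IGNORE drops any row that would duplicate an existing primary key (here the tweet GUID) instead of aborting the whole load. That is what makes re-running overlapping spreadsheets safe. It also downgrades some other errors to warnings, so it can hide genuinely bad rows. The sketch below swaps in SQLite's equivalent, INSERT OR IGNORE, only so it runs with the Python standard library. The two-column table is a hypothetical cut-down version of table_tweets.

```python
import sqlite3


def ingest(conn: sqlite3.Connection, rows) -> int:
    """Insert (guid, contents) rows, skipping GUIDs that are already stored.

    Returns the number of rows that were actually added.
    """
    before = conn.total_changes
    # SQLite's "INSERT OR IGNORE" plays the role of MySQL's "INSERT IGNORE":
    # a row whose primary key already exists is dropped instead of raising.
    conn.executemany(
        "INSERT OR IGNORE INTO table_tweets (GUID, contents) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_tweets (GUID INTEGER PRIMARY KEY, contents TEXT NOT NULL)")

first = ingest(conn, [(1, "first tweet"), (2, "second tweet")])
# The second spreadsheet overlaps the first: GUID 2 is a duplicate.
second = ingest(conn, [(2, "second tweet"), (3, "third tweet")])
print(first, second)  # 2 1
```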


  • Got the db behaving!
  • The first and most important thing is that you have to multiply Unix time (in seconds) by 1,000,000,000 for it to work, because InfluxDB’s default timestamp precision is nanoseconds. Got that from this page in the 1.7 guide.
  • Second is how tags can be added in code:
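    # self.name is a hypothetical stand-in for the measurement name, which was
    # lost from this snippet; t is a (value, unix_time_in_ns) pair
    p = Point(self.name).field(self.keyfield, t[0]).time(t[1])
    for key, val in self.tags_dict.items():
        p.tag(key, val)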

    That’s pretty nice.

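  • Another nice feature that I discovered looking through the code is that there is a to_line_protocol() method, which produces correct lines. It looks like the InfluxDB parser doesn’t like stray spaces: line protocol uses unescaped spaces to separate the measurement and tags from the fields, and the fields from the timestamp, so a space after a comma breaks the line. Here’s an example of correct lines that I am reading in: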
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.0 1586352302000000000
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.09983341664682815 1586352312000000000
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.19866933079506122 1586352322000000000
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.29552020666133955 1586352332000000000
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.3894183423086505 1586352342000000000
    measure_1,tagKey_1=tagValue_11,tagKey_2=tagValue_12,tagKey_3=tagValue_13 val_1=0.479425538604203 1586352352000000000

    The reason that I’m reading data in from a file is that direct SYNCHRONOUS writes to the database are pretty slow. Looking into that.

  • Coming up next, queries
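A minimal sketch of building line-protocol strings by hand and writing them in batches. It assumes the InfluxDB 2.x HTTP write endpoint (`/api/v2/write` with `org`, `bucket`, and `precision` query parameters and a `Token` authorization header); the helper names, URL, org, bucket, and token are placeholders. It escapes spaces, commas, and equals signs in tags, sorts tags by key, and multiplies Unix seconds by 1,000,000,000 because the default precision is nanoseconds. Sending many lines per request is the usual way around slow point-at-a-time synchronous writes; InfluxDB's docs suggest batches of around 5,000 lines.

```python
import urllib.parse
import urllib.request


def _escape(text: str, specials: str) -> str:
    """Backslash-escape characters that line protocol treats as delimiters."""
    for ch in specials:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line(measurement: str, tags: dict, fields: dict, unix_seconds: int) -> str:
    """Format one point as line protocol with a nanosecond timestamp.

    Every field is written as a float; integer fields would need an 'i' suffix.
    """
    head = _escape(measurement, ", ")
    for key, val in sorted(tags.items()):  # sorted tag keys are recommended
        head += "," + _escape(key, ",= ") + "=" + _escape(str(val), ",= ")
    body = ",".join(
        _escape(k, ",= ") + "=" + repr(float(v)) for k, v in fields.items()
    )
    # Default write precision is nanoseconds, hence seconds * 1e9.
    return f"{head} {body} {unix_seconds * 1_000_000_000}"


def write_batched(lines, url, org, bucket, token, batch_size=5000):
    """POST line-protocol strings to /api/v2/write, batch_size lines per request."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    for start in range(0, len(lines), batch_size):
        body = "\n".join(lines[start:start + batch_size]).encode("utf-8")
        request = urllib.request.Request(
            f"{url}/api/v2/write?{query}",
            data=body,
            headers={
                "Authorization": f"Token {token}",
                "Content-Type": "text/plain; charset=utf-8",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            response.read()  # a 204 means the batch was accepted


print(to_line("measure_1", {"tagKey_1": "tagValue_11"}, {"val_1": 0.0}, 1586352302))
# measure_1,tagKey_1=tagValue_11 val_1=0.0 1586352302000000000
```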

Phil 4.7.20


  • Talk to Zach about chart size bug?
  • We are going to need a top level dashboard, something like number of countries in the DANGER, WARNING, and CONTROLLED buckets
  • Should look into using scipy’s linregress method to get accuracy values – done!

COVID Twitter

    • Read xls files into db (using this)
    • Wow, you can recursively get files in three lines, including the import:
      import glob
      for filename in glob.iglob('./**/*.xls', recursive=True):
          print(filename)  # or load each spreadsheet here
    • Had to do a bunch of things to get Arabic to store correctly. I think I need to set the database to:
      alter database covid_misinfo character set utf8 collate utf8_general_ci;

      Then set the table to UTF-8, like so:

      DROP TABLE IF EXISTS `table_tweets`;
      /*!40101 SET @saved_cs_client     = @@character_set_client */;
      /*!40101 SET character_set_client = utf8 */;
      CREATE TABLE `table_tweets` (
        `GUID` bigint(20) NOT NULL,
        `date` datetime NOT NULL,
        `URL` varchar(255) DEFAULT NULL,
        `contents` mediumtext NOT NULL,
        `translation` varchar(255) DEFAULT NULL,
        `author` varchar(255) DEFAULT NULL,
        `name` varchar(255) DEFAULT NULL,
        `country` varchar(255) DEFAULT NULL,
        `city` varchar(255) DEFAULT NULL,
        `category` varchar(255) DEFAULT NULL,
        `emotion` varchar(255) DEFAULT NULL,
        `source` varchar(255) DEFAULT NULL,
        `gender` varchar(16) DEFAULT NULL,
        `posts` int(11) DEFAULT NULL,
        `followers` int(11) DEFAULT NULL,
        `following` int(11) DEFAULT NULL,
        `influence_score` float DEFAULT NULL,
        `post_title` varchar(255) DEFAULT NULL,
        `post_type` varchar(255) DEFAULT NULL,
        `image_url` varchar(255) DEFAULT NULL,
        `brand` varchar(255) DEFAULT NULL,
        PRIMARY KEY (`GUID`)
      ) DEFAULT CHARSET=utf8;
      /*!40101 SET character_set_client = @saved_cs_client */;

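      Anyway, it’s now working! (RT @naif_khalaf رحلة تطوير لقاح وقائي لمرض كورونا. استغرقت ٤ سنوات من المعمل لحيوانات التجارب للدراسات الحقلية على الإبل ثم للدراسة السريرية الأولية على البشر المتطوعين. ولازالت مستمرة.) Translation: “The journey of developing a preventive vaccine against coronavirus. It took 4 years, from the lab to experimental animals, to field studies on camels, then to the first clinical study on human volunteers. And it is still ongoing.”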
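MySQL's `utf8` character set is an alias for `utf8mb3`, which stores at most three bytes per character and so only covers code points up to U+FFFF. Arabic letters take two bytes each in UTF-8, so they fit; many emoji take four and need `utf8mb4`. A small check, assuming tweets are screened in Python before the insert (the function name is hypothetical):

```python
def fits_mysql_utf8(text: str) -> bool:
    """Return True if every character fits MySQL's 3-byte utf8 (utf8mb3).

    utf8mb3 only covers the Basic Multilingual Plane (code points up to
    U+FFFF); anything above that, such as most emoji, needs utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


arabic = "رحلة تطوير لقاح"  # fragment of the tweet stored above
print(fits_mysql_utf8(arabic))                # True
print(fits_mysql_utf8("vaccine \U0001F489"))  # False: U+1F489 is above U+FFFF
```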


  • Write, visualize, and query test data
        • Writing seems to be working? I don’t get any errors, but I can’t see anything show up
        • Here’s an example of the data in what I think is correct line format:
          measure_1, tagKey_1=tagValue_11 val_1=0.0 1586270395
          measure_1, tagKey_1=tagValue_11 val_1=0.09983341664682815 1586270405
          measure_1, tagKey_1=tagValue_11 val_1=0.19866933079506122 1586270415
          measure_1, tagKey_1=tagValue_11 val_1=0.2955202066613396 1586270425
          measure_1, tagKey_1=tagValue_11 val_1=0.3894183423086505 1586270435
          measure_1, tagKey_1=tagValue_11 val_1=0.479425538604203 1586270445
          measure_1, tagKey_1=tagValue_11 val_1=0.5646424733950355 1586270455
          measure_1, tagKey_1=tagValue_11 val_1=0.6442176872376911 1586270465
          measure_1, tagKey_1=tagValue_11 val_1=0.7173560908995228 1586270475
          measure_1, tagKey_1=tagValue_11 val_1=0.7833269096274834 1586270485
          measure_1, tagKey_1=tagValue_11 val_1=0.8414709848078965 1586270495
          measure_1, tagKey_1=tagValue_11 val_1=0.8912073600614354 1586270505

          Here’s how I’m writing it:

          from influxdb_client import InfluxDBClient, Point
          from influxdb_client.client.write_api import SYNCHRONOUS

          def to_influx(self, client:InfluxDBClient, bucket_name:str, org_name:str):
              write_api = client.write_api(write_options=SYNCHRONOUS)
              for t in self.measurement_list:
                  for key, val in self.tags_dict.items():
                      # self.name is a hypothetical stand-in for the measurement name lost from this snippet
                      p = Point(self.name).tag(key, val).field(self.keyfield, t[0])
                      write_api.write(bucket=bucket_name, org=org_name, record=p)
                      print("writing {}, {}={}, {}={} {}".format(self.name, key, val, self.keyfield, t[0], t[1]))

          That seems to work. Here’s the output while it’s storing:

          writing measure_10, tagKey_1=tagValue_101, val_10=-0.34248061846961253 1586277701
          writing measure_10, tagKey_1=tagValue_101, val_10=-0.2469736617366209 1586277691
          writing measure_10, tagKey_1=tagValue_101, val_10=-0.1489990258141953 1586277681
          writing measure_10, tagKey_1=tagValue_101, val_10=-0.04953564087836742 1586277671
          writing measure_10, tagKey_1=tagValue_101, val_10=0.05042268780681122 1586277661

          I get no warnings or errors, but the Data Explorer is blank.

        • Oh, you have to use Unix Timestamps in milliseconds (timestamp * 1000):
          mm.add_value(val, ts*1000)
        • Ok, it’s working, but my times are wrong.


  • 1:00 IRAD meeting

ML Seminar

Phil 4.6.20

Based on a chat with David K, I’m going to see if I can add a field for the detail view that says whether the estimate is better or worse than yesterday’s. Something like “Today’s estimate is x days better/worse than yesterday’s.”

  • And it seems to be working. Need to get it on the website next
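A minimal sketch of the detail-view sentence. It assumes both estimates are the predicted number of days until new deaths reach zero, so a smaller number today counts as better. The function name is hypothetical.

```python
def estimate_change_message(today_days: int, yesterday_days: int) -> str:
    """Compare today's and yesterday's days-to-zero estimates in one sentence."""
    diff = yesterday_days - today_days
    if diff == 0:
        return "Today's estimate is the same as yesterday's."
    direction = "better" if diff > 0 else "worse"  # fewer days is better
    days = abs(diff)
    unit = "day" if days == 1 else "days"
    return f"Today's estimate is {days} {unit} {direction} than yesterday's."


print(estimate_change_message(20, 23))  # Today's estimate is 3 days better than yesterday's.
```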

Get the Twitter-parser-to-MySQL converter built

  • Created the table in MySQL
  • Dumped the .sql file (with just the table) to src/data

Continue to set up influx on laptop.

    • Set a fixed IP address – done! In Lubuntu, it’s done through Settings->Advanced Network Configuration. I tried just setting the address manually, but it didn’t like that. So I let DHCP automatically find an address and didn’t delete the static one, and now I can reach both?
      Pinging with 32 bytes of data:
      Reply from bytes=32 time=1ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Ping statistics for
          Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
      Approximate round trip times in milli-seconds:
          Minimum = 0ms, Maximum = 1ms, Average = 0ms
      C:\Users\Phil>ping
      Pinging with 32 bytes of data:
      Reply from bytes=32 time=297ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Reply from bytes=32 time<1ms TTL=64
      Ping statistics for
          Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
      Approximate round trip times in milli-seconds:
          Minimum = 0ms, Maximum = 297ms, Average = 74ms
    • And I’m connected over the local network!
    • Generate a set of square and sine waves, then store and retrieve them.
    • Built a generator and can save to a file, but it looks like I need to use the API? Here’s the Python page.
    • How to do it?
      from influxdb_client import InfluxDBClient, WriteOptions

      _client = InfluxDBClient(url="http://localhost:9999", token="my-token", org="my-org")
      _write_client = _client.write_api(write_options=WriteOptions(batch_size=500,
                                                                   flush_interval=10_000))
      # Write Line Protocol formatted as string
      _write_client.write("my-bucket", "my-org", "h2o_feet,location=coyote_creek water_level=1.0 1")
      _write_client.write("my-bucket", "my-org", ["h2o_feet,location=coyote_creek water_level=2.0 2",
                                                  "h2o_feet,location=coyote_creek water_level=3.0 3"])
      # Flush any buffered batches and release resources
      _write_client.close()
      _client.close()
  • COVID Misinfo meeting
    • Talked about the lateral thinking paper, and that we basically want to automate that.
    • We’re going to put some tweet threads together for misinfo, ambiguous, and trustworthy and have a small Twitter party next week
    • Finish getting the extractor running. There will be other tables as we start to figure things out
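The square/sine test-data generator from the laptop setup can be sketched as below. This is an assumption about how the samples were produced: a sine advancing 0.1 radians per step, sampled every 10 seconds, reproduces the val_1 values and timestamp spacing in the line-protocol examples above, and the square wave is just the sign of that sine. The measurement names `sine` and `square` are hypothetical.

```python
import math


def waves(start: int, n: int, step_s: int = 10, radians_per_step: float = 0.1):
    """Yield (unix_seconds, sine, square) samples spaced step_s seconds apart."""
    for i in range(n):
        sine = math.sin(i * radians_per_step)
        square = 1.0 if sine >= 0 else -1.0  # square wave = sign of the sine
        yield start + i * step_s, sine, square


def wave_lines(start: int, n: int) -> list:
    """Format the samples as line protocol with nanosecond timestamps."""
    lines = []
    for ts, sine, square in waves(start, n):
        ns = ts * 1_000_000_000
        lines.append(f"sine,tagKey_1=tagValue_11 val_1={sine!r} {ns}")
        lines.append(f"square,tagKey_1=tagValue_11 val_1={square!r} {ns}")
    return lines


for line in wave_lines(1586352302, 3):
    print(line)
```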

And I am done for the day. I hate this fucking timeline



Phil 4.5.20

The initial version of DaysToZero is up! Working on adding states now


Got USA data working. New York looks very bad:


Evaluating the fake news problem at the scale of the information ecosystem

  • “Fake news,” broadly defined as false or misleading information masquerading as legitimate news, is frequently asserted to be pervasive online with serious consequences for democracy. Using a unique multimode dataset that comprises a nationally representative sample of mobile, desktop, and television consumption, we refute this conventional wisdom on three levels. First, news consumption of any sort is heavily outweighed by other forms of media consumption, comprising at most 14.2% of Americans’ daily media diets. Second, to the extent that Americans do consume news, it is overwhelmingly from television, which accounts for roughly five times as much as news consumption as online. Third, fake news comprises only 0.15% of Americans’ daily media diet. Our results suggest that the origins of public misinformedness and polarization are more likely to lie in the content of ordinary news or the avoidance of news altogether as they are in overt fakery.

Phil 4.3.20

Temp is up a bit this morning, which, of course, I’m overreacting to.

Need to get started on State information from here:

Generated some favicons from here, which, of course, we didn’t use

Getting close to something that we can release


  • Update Linux on laptop and try Influx there. Nope. The laptop is hosed.
  • Grabbing another computer to configure. I mean, worst case, I can set up the work laptop as an Ubuntu box. I’d love to know if Influx would work FIRST, though. Looks like I have to. My old dev box won’t boot. Backing up.
  • Installed Debian on the work laptop. It seems to be booting? Nope:
  • I guess we’ll try Ubuntu again? Nope. Trying one more variant.
  • Trying Lubuntu. It uses different drivers for some things, and so far hasn’t frozen or blocked yet. It works!
  • And now the Docker version (docker run --name influxdb -p 9999:9999) works too. Maybe because the system got upgraded?
  • 11:00 IRAD Meeting
    • Send note about NOAA being a customer for simulated anomalies for machine learning

Phil 4.2.20

Wake up, shower, write some code. The linear estimate is now integrated with the predictions. I think that tells the story well. Rather than Italy, let’s look at Switzerland:


Now I need to put together a punch list of final issues (I’m worried most about load times and performance under high demand), get the URL(s) and publish!

Punch list is done. Zach is getting the domains.

Here’s the UI description:

GOES: 8 hours

  • Status report for March
  • InfluxDB!
  • Hmm – I appear to have broken Docker? No, just being dumb with commands. Here’s what I needed:
    • docker container run -it --name influx2 ubuntu /bin/sh
    • Success!
  • Time to RTFM:
  • getting my Ubuntu image current, using
    • apt update
    • apt upgrade
    • apt install wget
    • wget
    • tar -xvzf influxdb_2.0.0-beta.7_linux_amd64.tar.gz
    • Created a user so I can have things like tab completion (adduser). Created phil with regular test pwd
    • Hmm influxd isn’t on the path. Going to try running it in its directory
    • Things are happening!
    • But the webserver isn’t visible at localhost:9999 (ERR_CONNECTION_REFUSED). Drat! Still happens when I run as root
  • Town hall meeting

Phil 4.1.20

Working from home has a different rhythm. I work in segments with home chores mixed in. Today I’m doing this at 6:00, along with some coding. Then some morning exercise, breakfast, and work till noon. Ride, lunch and more work till about 3:00. By that time my brain is broken, and I take a break and do light chores. Today I may finally get my road bike ready for spring. Then simple work like commenting for a few hours. In the evenings I find I like watching shows about competent people fixing things and making them better. Bitchin’ Rides is extremely soothing.


  • Fixing dates
  • integrating the estimated deaths from rate and current deaths as area under the curve until zero.
  • Work on documentation. Also make sure word wrap works
  • This. Is. Bad.


  • Once more, this is Italy. What I’ve done is round-tripped the rates to produce an estimate of total deaths. If calculating rates is taking the derivative, calculating a death prediction is integration. So, if the calculations are right, and Italy is at zero new deaths around April 17th, the toll is around 27 thousand total deaths. That’s roughly 0.045% of the population. If those numbers hold for the US at 327 million, that’s a total of 145,550. The White House is estimating 100,000 to 240,000 deaths, so the midpoint of their range (170,000) has us faring worse than Italy.
  • Fixed bugs, worked with Zach, made progress. Aaron is starting to appear again!
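The round trip from rates back to a death toll can be sketched as a discrete integration. This assumes "rate" means the day-over-day percentage increase in cumulative deaths, and that it falls by a constant amount per day until it reaches zero. The function name and inputs are hypothetical; the real DaysToZero code may integrate differently.

```python
def project_total_deaths(current_total: float, current_rate: float,
                         rate_slope: float) -> float:
    """Compound a linearly declining daily growth rate until it reaches zero.

    current_rate: today's day-over-day increase in cumulative deaths (0.1 = 10%).
    rate_slope:   change in that rate per day; must be negative to reach zero.
    Summing each day's new deaths (total * rate) is a discrete version of
    integrating the new-deaths curve out to the day it hits zero.
    """
    if rate_slope >= 0:
        raise ValueError("rate is not declining, so it never reaches zero")
    total, rate = current_total, current_rate
    while rate > 0:
        total += total * rate   # that day's new deaths
        rate += rate_slope      # the rate falls linearly
    return total


print(project_total_deaths(100, 0.1, -0.05))  # approximately 115.5
```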


  • Tweak John’s slides
  • More on saving and restoring docker containers. I think I’m close. Then install InfluxDB and test if I can see the dashboard
  • Still having problems. I can create, run, add, delete, and tag the images, but I can’t run them. I think I’m getting ahead of myself. Back to reading


So it turns out that I was doing everything right but the load. Here’s how it works:

  1. docker container run -it --name containername some-os /bin/sh
  2. Install what needs to be installed. Poke around, save things, etc.
  3. docker container commit containername modified-os
  4. docker save modified-os > modified-os.tar
  5. docker rmi modified-os
  6. docker load < modified-os.tar
  7. docker container run -it --name newcontainername modified-os /bin/sh




Phil 3.31.2020

I need to go grocery shopping today. A friend of mine has come down with the virus. He’s in his 30s, and I’m feeling vulnerable. I went down to the shop and dug up my painting masks. Turns out I have a few, so that’s what I’m going shopping with. Here’s why, from the NY Times:

When researchers conducted systematic review of a variety of interventions used during the SARS outbreak in 2003, they found that washing hands more than 10 times daily was 55 percent effective in stopping virus transmission, while wearing a mask was actually more effective — at about 68 percent. Wearing gloves offered about the same amount of protection as frequent hand-washing, and combining all measures — hand-washing, masks, gloves and a protective gown — increased the intervention effectiveness to 91 percent.

Podcast with BBC’s misinformation reporter:


  • A friend of mine who works in Whitehall has told me that the army are going to be on the streets this week arresting people who don’t listen to this podcast. If that sounds familiar, you’ll be aware that this crisis has already been fertile ground for disinformation. Marianna Spring is a BBC specialist reporter covering disinformation and social media. In this fascinating interview, Marianna reveals how disinformation and misinformation gets so widely shared, why we share it, how to spot it, what the trends are, how it differs around the world and so much more. This is a brilliant insight not just into the sharing of inaccurate information, but into human behaviour.



  • Changed the calculations from the linear regression to handle cases where the virus is under control, like China – first pass is done
  • Have the linear regression only go back some number of weeks/months. I’m worried about missing a second wave
  • Need to add a disclaimer that the quality of the predictions depends on the quality of the data, and that we expect that as poorer countries come online, these trends may be erratic and inaccurate.
  • Add an UNSET state. The ETS will only set the state if it is UNSET. This lets regression predictions be used until the ETS is working well – done
  • I think showing the linear and ETS mean prediction is a good way to start including ETS values
  • Found the page that shows how to adjust parameters:
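A minimal sketch of the UNSET rule. It assumes each country's prediction holds a single state that the regression writes first and that the ETS may only fill in while it is still UNSET. The `State` and `Prediction` names are hypothetical.

```python
from enum import Enum


class State(Enum):
    UNSET = "UNSET"
    DANGER = "DANGER"
    WARNING = "WARNING"
    CONTROLLED = "CONTROLLED"


class Prediction:
    """Holds one country's state; the regression has priority over the ETS."""

    def __init__(self) -> None:
        self.state = State.UNSET

    def set_from_regression(self, state: State) -> None:
        self.state = state

    def set_from_ets(self, state: State) -> None:
        # The ETS only fills in the state if nothing has claimed it yet.
        if self.state is State.UNSET:
            self.state = state


p = Prediction()
p.set_from_regression(State.WARNING)
p.set_from_ets(State.DANGER)
print(p.state)  # State.WARNING
```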


  • Try to create an image from the stored tar
  • Start setting up InfluxDB2

IRAD Meeting at 2:00

ML Group at 4:00

  • Put together a list of potential papers to present. No need, I’ll do infinitely wide networks
  • Had just a lovely online evening of figuring out how to use some (terrible!) webex tools, and trying to figure out Neural ODEs. It was an island of geeky normalcy for a few hours. This may be a more comprehensible writeup.

Phil 3.30.20

Today’s study in contrasts: Italy and the US:

COVID-19 projections for the US, from The Institute for Health Metrics and Evaluation (IHME):


Work on converting the ETS json file into spreadsheets to evaluate thresholds and labels – spreadsheet conversion is working. done! Now I need to figure out what those ETS parameters do!
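Assuming the ETS model is exponential smoothing with an additive trend (Holt's linear method), the two parameters that matter most are alpha, which controls how fast the smoothed level follows new observations, and beta, which controls how fast the trend follows changes in that level. This is a hand-rolled sketch, and the function name is hypothetical. Library implementations such as statsmodels also fit these values for you and may add damping or seasonality.

```python
def holt_forecast(series, alpha: float, beta: float, horizon: int) -> list:
    """Holt's linear-trend (additive) exponential smoothing.

    alpha in [0, 1]: weight on the newest observation when updating the level.
    beta in [0, 1]:  weight on the newest level change when updating the trend.
    Returns forecasts for the next `horizon` steps after the last observation.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[0])
    trend = float(series[1] - series[0])  # simple initial trend estimate
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


print(holt_forecast([1, 2, 3, 4], alpha=0.5, beta=0.5, horizon=2))  # [5.0, 6.0]
```

Values near 1 make the forecast chase the latest data points. Values near 0 lean on history, which is steadier but slower to notice a turn.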

Add a short bit to the D20 writeup that explains why linear interpolation isn’t the best option, and why we went with ETS – done

Work with Zach to get the website up today – working

Work this article into the exploit-space writeup: Why Is Cybersecurity Not a Human-Scale Problem Anymore?. Wow, actually, the company (Balbix) that was founded by the author (Gaurav Banga) seems to be doing most of what I was going to write about. Sent Darren a note to see if I should continue

Got a note from ProQuest saying my file needed to have blank pages at the beginning and end of the document. Fixed. And accepted!

  • Congratulations. Your submission, xxxxx has cleared all of the necessary checks and will soon be delivered to ProQuest for publishing.

Ok, back to Docker and building an InfluxDB image. Wow, that seems like a lifetime ago I was doing this

  • To save a custom image, create the container from a base image and then docker save image_name > image_name.tar. This puts it wherever you run the command in the system, Linux or Windows

#COVID-19 meeting at 1:30 today – proposal’s in. We have twitter data from January

SDaaS meeting at 4:00 today – postponed

Phil 3.28.20

From today’s spreadsheet: countries_2020-03-28_07-38



NY Times is starting to use rates as well: Some U.S. Cities Could Have Coronavirus Outbreaks Worse Than Wuhan’s

Interesting chat as an expert(?) on developing code in the future

Working on the ssh transfer in code using paramiko. This seems to be a good one.

It works!

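import os
import posixpath

import paramiko

filename = "C:/Development/Sandboxes/DaysToZero/data/external/countries_2020-03-28_07-38.xlsx"
remote_dir = "/home/"
client = paramiko.SSHClient()
# Accept unknown host keys automatically (convenient, but skips host verification)
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('', username='some_login', password='some_password')  # host elided
ftp_client = client.open_sftp()
#l = ftp_client.listdir()
# put() needs a full remote file path, not just a directory
ftp_client.put(filename, posixpath.join(remote_dir, os.path.basename(filename)))
ftp_client.close()
client.close()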

I should try to put all the pieces together, but I am just done, and am stress-scrolling through Twitter, which really doesn’t help. Getting away from the computer for a while

Phil 3.27.20

Working with Zach and Aaron on the app. I think we’ll have something by this weekend

  • Added a starting zero on the regression
  • Added the regression to the json file, and posted to see if Zach can reach it
  • Set up the hooks for export to an Excel workbook, with one tab per active country. I’ll work on that later today – done!

Got clarification from Wayne on some edits. Going to turn those around this morning and try to submit before COB today. Maryland is at 580 confirmed cases as of yesterday. I’d expect to see nearly 800 when they update the site this morning. Sent over all the edits. It’s in!




ProQuest submission site.

Phil 3.26.20

Updated the proposal

Found an example of diversity injection in the wild: Here’s a story about it from The Correspondent.

Working on the parser today

  • Tried using my ExcelUtils, which are barfing on all the text in the csv
  • Discovered DictReader from the csv library, which works perfectly!
  • Throwing away rows that have fewer than three data points
  • Collecting rows into countries – done
  • Parsing out dates and values – done
  • Working on getting totals – done
  • Working on calculating rates – done
  • Seeing if I can do a least squares regression to calculate a first pass – done? It doesn’t seem to quite work right on the actual data
  • Aaron added his pieces in and everything seems to be kind of working
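A minimal sketch of the parsing steps above using `csv.DictReader`. It assumes a wide layout with `Province/State`, `Country/Region`, `Lat`, and `Long` columns followed by one cumulative-count column per date. Those column names, the helper names, and the sample rows are assumptions for illustration, not the real dataset. "Data points" is taken to mean nonzero values, and the rate is the day-over-day percentage change.

```python
import csv
import io

META_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def country_totals(csv_text: str, min_points: int = 3) -> dict:
    """Collect rows into countries, summing provinces into one daily series.

    Rows with fewer than `min_points` nonzero values are thrown away.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in META_COLUMNS]
    totals = {}
    for row in reader:
        values = [float(row[d]) if row[d] else 0.0 for d in date_columns]
        if sum(1 for v in values if v > 0) < min_points:
            continue
        country = row["Country/Region"]
        if country in totals:
            totals[country] = [a + b for a, b in zip(totals[country], values)]
        else:
            totals[country] = values
    return totals


def daily_rates(series: list) -> list:
    """Day-over-day percentage change, skipping days whose previous total is zero."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:]) if prev > 0]


SAMPLE = """Province/State,Country/Region,Lat,Long,3/18/20,3/19/20,3/20/20,3/21/20
,CountryA,0,0,10,12,15,18
North,CountryB,0,0,1,2,4,8
South,CountryB,0,0,1,2,4,8
,CountryC,0,0,0,0,1,1
"""

totals = country_totals(SAMPLE)
print(sorted(totals))                   # ['CountryA', 'CountryB']
print(daily_rates(totals["CountryB"]))  # [1.0, 1.0, 1.0]
```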

Phil 3.25.20

Waking up to the news these days makes me want to stay in bed with the radio off

Working on automating the process of downloading the spreadsheet, parsing out the countries, and calculating daily rates. The goal is to have a website up this weekend so you can see how your country is doing.


  • Set up converter class – done
  • download spreadsheet – done
  • parse out countries – working on it
  • Made mockups of the mobile and webpage displays, and refined a few times based on comments

Got notes for Chapter 11 from Wayne. Switching gears and rolling that in. Put in changes for all the items I could read. There are a few still outstanding. I’ll submit tonight if Wayne doesn’t come back for a discussion.

Back to Docker. Need to connect to the WSL. Done!


  • AIMS – status for all, plus technical glitches. We’ll try Teams next time. Vadim has made GREAT progress. We might be able to get a real Yaw Flip soon as well
  • A2P – Infor demo. Meh.

Stampede theory proposal deadline was delayed a couple of days

Phil 3.24.20

Well, I’ve got more predictions using death rates as described in this post. Based on the latest dataset from here (Github), I’ve created a spreadsheet that does a linear (least squares) extrapolation for when the number of new deaths per day drops to zero:


China is in this group as a sanity check, and as you can see, it’s very near zero, and so is South Korea. Italy, Germany, Spain, Iran, and Indonesia are currently all in the middle, with 2-3 weeks to go if nothing changes. France, the Netherlands, and Switzerland are far enough out that I think these may be low-confidence values. The UK seems to be doing terribly. The worst performers are Belgium and the US, whose death rates are still going up, as indicated by the “-1” in the “days till” column.
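A minimal sketch of the "days till" calculation: fit a least-squares line to the daily rates and solve for where it crosses zero, returning -1 when the fitted rate is flat or still rising. It assumes one sample per day and uses plain Python rather than a library; `scipy.stats.linregress` returns the same slope and intercept, along with the correlation coefficient (`rvalue`), p-value, and standard error. The function names are hypothetical.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_till_zero(rates):
    """Days after the last sample until the fitted rate line reaches zero.

    Returns -1 when the fitted rate is flat or rising (no zero crossing ahead).
    """
    if len(rates) < 2:
        raise ValueError("need at least two daily rates")
    xs = list(range(len(rates)))  # one sample per day
    slope, intercept = least_squares(xs, rates)
    if slope >= 0:
        return -1
    zero_day = -intercept / slope
    return zero_day - xs[-1]


print(days_till_zero([0.5, 0.4, 0.3]))  # about 3 days
print(days_till_zero([0.1, 0.2, 0.3]))  # -1
```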

Here are plots of the data used to calculate the table. Because of the way Excel labels axes, not all of the charts have dates on the x-axis. They all end on the same date (3/21, the last day in the dataset with the two days I need to calculate rates), but some of them have fewer data points so that the time before the outbreak doesn’t influence the calculations:


Working on Hours for misinfo proposal


  • Had a good chat with Biruh about InfluxDB running in Docker. Since I’m running the Windows version, things are different enough that I’m going to need to download a linux distro image and run my own version of InfluxDB2 inside that. Which means I need to get smarter on Docker and making a custom image, etc. Got this book. We’ll see how that goes today.

ML Webex meeting

BART is the new BERT!


Phil 3.23.20

I think I found a way of looking at COVID-19 data in a way that makes intuitive sense to me – growth rate. Let’s revisit the scary dashboard:

This is a very dramatic presentation of information, and a good way of getting a sense of how things are going right now, which is to say, um… not well.

But if we look at the data (from here), we can break it down in different ways. I’m going to focus on the daily death rate. In other words, what is the percentage increase in deaths from one day to the next?

These still look horrible, but they do not appear to be getting worse. The curves are flattening. What happens if we look at the same data as a rate problem though?

That looks very different. After a big initial spike, both countries have a rate of decrease that fits pretty well to a linear trend. So what do we get if we plug the current rate of increase in and solve for zero? In other words, when are there no more new cases?

Italy’s current rate is 11.89%, or 0.1189. Iran is 7.66% or 0.0766. Using those values we get some good news:

  • Italy: 27 days, or April 19th
  • Iran: 15 days, or April 7th
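The "solve for zero" step is a straight-line extrapolation. Assuming the rate falls linearly from today's value r_0 with slope m per day, the zero crossing and the slopes implied by the Italy and Iran figures above are:

```latex
r(t) = r_0 + m\,t, \qquad r(t^{*}) = 0 \;\Longrightarrow\; t^{*} = -\frac{r_0}{m} \quad (m < 0)

\text{Italy: } m \approx -\frac{0.1189}{27} \approx -0.0044\ \text{per day}, \qquad
\text{Iran: } m \approx -\frac{0.0766}{15} \approx -0.0051\ \text{per day}
```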

Ok, so let’s look at the US. There’s not really enough data to do this on a state-by state basis yet, but there is plenty for the whole country:

This is not good. Our rate of increase is more than either Iran’s or Italy’s rate of decrease. At this point, there is literally no end in sight.

Ok, let’s look at the world as a whole:

Also not good. Things clearly improved as China got a handle on its outbreak, but the trends are now going the other way as the disease spreads out into the rest of the world. It’s clearly going to be a bumpy ride.

I’d like to point out that there is no good way to tell here what caused these trends to change. It could be treatments, or it could be susceptibility. Italy and Iran did not take the level of action that China did, yet if trends continue, they will be clear in about a month. We’ll know more as the restrictions loosen, and there is or isn’t a new upturn.

Ok, Back to work

10:00 – ASRC GOES

  • Getting InfluxDB to work in Docker
  • Use cases for John and whitepaper for Darren?
  • Noon research Meeting
  • 4:00 ML seminar?