Author Archives: pgfeldman

Phil 11.12.20

The Next Decade Could Be Even Worse

  • A historian believes he has discovered iron laws that predict the rise and fall of societies. He has bad news.

GPT-2 Agents

  • Tried Sim’s model, it’s very nice!
  • Created a base class for creating and parsing tweets
  • Found a regex that will find any text between two tokens. Thanks, stackoverflow!
  • Here’s an example. I need to look into how large the meta information should be before it starts affecting the trajectory
On July of 2020, @MikenzieCromwell (screen name "Mikenzie Cromwell", 838 followers) posted a tweet from Boston, MA. They were using Twitter Web App. The post had 0 replies, 0 quotes, 1 retweets, and 3 favorites. "An example of the importance of the #mentalhealth community's response to #COVID19 is being featured in the @WorldBank survey. Check out the latest #MentalHealthResponse survey data on the state of mental health services in the wake of the pandemic. https://t.co/9qrq4G4XJi" 
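The token-bounded regex can be sketched like this (a minimal version; the `[[tweet]]`-style delimiters are made up for illustration, not the actual tokens in my training data):

```python
import re

def between(text: str, start_tok: str, end_tok: str):
    """Return all non-greedy spans of text between two literal tokens."""
    pattern = re.compile(re.escape(start_tok) + r'(.*?)' + re.escape(end_tok), re.DOTALL)
    return pattern.findall(text)

s = "[[tweet]]hello world[[/tweet]] junk [[tweet]]second[[/tweet]]"
print(between(s, "[[tweet]]", "[[/tweet]]"))
```

The `re.escape()` calls matter because tokens often contain regex metacharacters like brackets.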

GOES

  • More Replayer
  • Got the vertex manipulation! It’s hard to get at it through the geometry, but if you just save the LineSegs object,
ls = LineSegs(name)
self.prim_dict[name] = ls
  • you can manipulate that directly
ls:LineSegs = self.lp.get_LineSeg("test_line")
ls.setVertex(1, x, y, z)
  • Meeting with Vadim at 10:00. We found some pretty bad code that sets the torques on the reaction wheels

Book

  • Write and send letters – done!
  • More Moby Dick

Phil 11.11.20

I did something bad with my data yesterday. This is the correct version (I hope)

https://public.flourish.studio/visualisation/4303726/

GPT-2 Agents

  • Generating some results for Friday
  • Splitting the results on the probes. It looks like the second tweet in a series is better formed. That kind of makes sense, because the second tweet is based on the first. That leads to an interesting idea. Maybe we should try building chains of text using the result from the previous generation as the prompt for the next
  • Generating text with 1000 chars and parsing it, throwing away the first and last element in the list. I can also parse out the tweet, location, and sentiment:
[1]: In December of 2019, @svsvzz (21046 followers, 21784 following) posted a retweet, mentioning @ArticleSpot. It was sent from Saudi Arabia. @svsvzz wrote: "RT @ArticleSpot New update # Comment_study is coming..". The retweet was categorized as "Neutral". 
     Location = Saudi Arabia
     Sentiment = Neutral
     Tweet = RT @ArticleSpot New update # Comment_study is coming..
 [2]: In December of 2019, @HussainALhamad (2340 followers, 29 following) posted a retweet, mentioning @ejazah_ksa. It was sent from Riyadh, Saudi Arabia. @HussainALhamad wrote: "RT @ejazah_ksa Poll: Do you support #Suspension of studying in #Riyadh tomorrow, Monday? If you support (Retweet) If you do not support (Like)". The retweet was categorized as "Positive". 
     Location = Riyadh, Saudi Arabia
     Sentiment = Positive
     Tweet = RT @ejazah_ksa Poll: Do you support #Suspension of studying in #Riyadh tomorrow, Monday? If you support (Retweet) If you do not support (Like)
 [3]: In December of 2019, @mahfouz_nour (11 followers, 57 following) posted a tweet. She wrote: "♪ And the rest of the news about a news that the study was suspended in the study ♪ ♪ And God bless you ♪ ♪ Now ♪". The tweet was categorized as "Negative". 
     Location = None
     Sentiment = Negative
     Tweet = ♪ And the rest of the news about a news that the study was suspended in the study ♪ ♪ And God bless you ♪ ♪ Now ♪
 [4]: In December of 2019, @tansh99huda99 (1211 followers, 519 following) posted a retweet, mentioning @HashKSA. @tansh99huda99 wrote: "RT @HashKSA # comments on Monday at all schools for students in #Dahan, and the decision does not include teachers' and teachers' levels.". The retweet was categorized as "Neutral". 
     Location = None
     Sentiment = Neutral
     Tweet = RT @HashKSA # comments on Monday at all schools for students in #Dahan, and the decision does not include teachers' and teachers' levels.
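A minimal sketch of pulling the fields out of each generated entry, assuming the `Key = value` lines shown above (this parser is illustrative, not the actual base class code):

```python
import re

# Hypothetical parser for the "Key = value" lines that follow each
# generated entry; the field names match the probe output above.
def parse_entry(entry: str) -> dict:
    fields = {}
    for m in re.finditer(r'^\s*(Location|Sentiment|Tweet)\s*=\s*(.*)$', entry, re.MULTILINE):
        fields[m.group(1)] = m.group(2).strip()
    return fields

sample = """Location = Riyadh, Saudi Arabia
Sentiment = Positive
Tweet = RT @ejazah_ksa Poll: Do you support #Suspension of studying in #Riyadh tomorrow, Monday?"""
print(parse_entry(sample))
```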

Created some slides. I think they look pretty good:

Phil 11.10.20

America Is a Lot Sicker Than We Wanted to Believe

  • Nearly half of the voters have seen Trump in all of his splendor—his infantile tirades, his disastrous and lethal policies, his contempt for democracy in all its forms—and they decided that they wanted more of it.

Added some code that makes it easier to compare countries and states and produced an animated GIF. I’m more concerned about Maryland now!

https://public.flourish.studio/visualisation/4302655/

Book

  • Letter to Stuart Kauffman
  • Letter to Frans de Waal

GPT-2 Agents

  • Created an animated GIF of English, Chinese, and Arabic countries for the Friday presentation
  • DB-based topic text extraction
  • 3:30 Meeting

GOES

  • Work on Replayer

Phil 11.9.20

Went down to DC yesterday. So weird to see the White House behind multiple sets of walls, like a US base in Afghanistan

Dentist at 3:00

GPT-2 Agents

  • Working on generating a new normalized data set. It needs to be much smaller to get results by the end of the week. Done. It takes a couple of passes through the data to get the totals needed for percentages, but it seems to be working well
  • Restarted training
  • Topic extraction from Tweet content

GOES

  • Started working on a 3D view of what’s going on with the two frames. I think I’m just going to have to start over with a smaller codebase though, if Vadim can’t find what’s going on in his code.
  • 1:30 Meeting with Vadim

Phil 11.8.20

I went on a long ride yesterday and the Trump signs that I’ve seen since 2015 are largely gone. The “Don’t tread on me” flags were still there. I wonder if it’s Trump fatigue?

GPT-2 Agents

  • I had a brief flicker of a power failure that killed the training. I need to figure out how to restart from a checkpoint
  • Need to normalize tweets by month and also subsample
  • Topic extraction directly from DB

Phil 11.6.20

Took off yesterday and had a great time enjoying the weather and riding up to Gettysburg. And not thinking too much

Via Washington Post (Captured 11.6.2020 0600 EST)

There is also this

And later, this

GOES

  • 10:00 Meeting with Vadim. If there is no progress, I’m going to build a display that plays back the recorded data so we can see what’s going on. He can’t make it, so postponed to Monday. I’ll start on the app after my other paperwork

NESDIS

  • More paperwork

GPT-2 agents

  • Respond to JF’s email and set up a meeting – done
  • Hopefully finish the current training run.
  • Adjust the code so that a query for the count of each month is made, and the results are normalized over each month. That means a query that gets the count and min/max row_ids
  • Work on NMF and LDA topics for MER tweets. Do topics for the whole dataset (just the tweets), then select the ones that seem to make sense, and use them on the model-generated text. Create wordclouds from the topic labels
    • Got the NMF and LDA code into separate files. Need to change the get_content() method to pull directly from the db
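The two-pass, per-month normalization can be sketched with an in-memory SQLite stand-in (table and column names here are assumptions, not the real schema):

```python
import sqlite3

# Sketch of the per-month normalization pass, using an in-memory SQLite
# table as a stand-in for the real DB (schema is assumed for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (row_id INTEGER PRIMARY KEY, month TEXT, content TEXT)")
rows = [("2019-12", "a"), ("2019-12", "b"), ("2019-12", "c"), ("2020-01", "d")]
con.executemany("INSERT INTO tweets (month, content) VALUES (?, ?)", rows)

# First pass: count and min/max row_id for each month
totals = {}
for month, n, lo, hi in con.execute(
        "SELECT month, COUNT(*), MIN(row_id), MAX(row_id) FROM tweets GROUP BY month"):
    totals[month] = (n, lo, hi)

# Second pass: normalize each month's count over the whole dataset
grand = sum(n for n, _, _ in totals.values())
percentages = {m: n / grand for m, (n, lo, hi) in totals.items()}
print(percentages)
```

The min/max row_ids from the first pass are what a subsampling query would then select between.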

Book

  • Meeting with Michelle

Phil 11.4.20

Did not want to wake up to this (Washington Post website as of 6:10 EST):

And the refrigerator was too narrow and too tall. Had to redo the cabinets. Sheesh.

GOES

  • GVSETS continues today

GPT-2 Agents

  • Working on the queries to build the test-train set. I’d like to start training against the medium model today
  • I am doing things like this today: re.compile(r'(.*?)(\<.*?>)|$')
  • Created the training sets

Phil 11.3.20

Election day! Cue ominous music

Listening to Drinking with Historians episode with Drew McKevitt, and this came up:

…when this came up with my students (they’re just an awesome group; this was two years ago), I would say of probably 18 of them, probably 15 identified as gun rights people. So they owned guns themselves, or they were all in favor of gun rights. And every time the AR-15 came up, they were just like, “this is just stupid culture war, that’s why people buy the gun.” These are students who are themselves gun rights people or gun owners, and even they say that’s why people buy it, it’s a culture war thing. I mean, look at that guy in St. Louis who pulled out his AR-15 and was aiming it at the protesters. What the hell is that guy doing with that gun? It’s total culture war stuff. It’s, as we’d say, virtue signaling or something like that, and that’s what it is. It’s voting by buying a gun.”

https://www.youtube.com/watch?v=iAcKv03GfdE&list=PLvTW8ARBmW029eOiTQA8GSVBKmSGHXpIp&index=12

GOES

Book

  • Adding some content from Chimpanzee Politics and working with the Moby Dick section. I think that’s coming together now. Need to talk about how Starbuck continues to have his own thoughts, but goes along with the consensus of the ship, even though he suspects that this will kill him

GPT-2 Agents

  • Meeting with Sim and Shimei. We’re going to focus on getting some preliminary results for the 13th
  • Looked at the results for the medium model and all agree that it looks much better. Going to train the

Phil 11.2.20

There appears to be some kind of imminent election for national office

My presentation is tomorrow, just before lunch, in the Modeling Simulation and Software Technical Session.

GOES

  • Status report
  • Over the weekend, I started to play with what happens when the AngleController isn’t getting any angles.
https://viztales.com/wp-content/uploads/2020/11/image.png
  • What seems to be happening is an underdamped condition where the reaction wheels start oscillating and take off after a while. The left side of the charts generally seems to be something that could be handled with thresholding, but the fact that the system doesn’t recover and continues to oscillate around its zero pitch/roll/yaw position makes me think that something funny is going on. I think the next thing to do is to plot these values in 3D to see if the reaction wheel axis contributions are being calculated correctly with respect to the normal
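A toy second-order system shows what an underdamped response looks like (illustrative parameters only, nothing to do with the real vehicle dynamics):

```python
import math

# Toy underdamped second-order system: x'' + 2*zeta*omega*x' + omega^2*x = 0
# with zeta < 1, integrated with semi-implicit Euler steps. The parameters
# are illustrative, not the actual reaction-wheel model.
def simulate(zeta=0.1, omega=2.0, x0=1.0, dt=0.001, steps=20000):
    x, v = x0, 0.0
    xs = []
    for _ in range(steps):
        a = -2 * zeta * omega * v - omega ** 2 * x
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs

xs = simulate()
# With zeta < 1 the position overshoots zero and keeps oscillating as it decays
sign_changes = sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)
print(sign_changes, abs(xs[-1]))
```

A healthy controller decays to zero; in the charts above the oscillation grows instead, which is why I suspect the axis contributions rather than the damping.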

#COVID

  • Was able to use the medium GPT2 model on the middle eastern and Trump rally data, but ran out of GPU memory if I tried to use the large model
  • Trying out some probes on the Trump medium model. I think it could be really interesting to combine the rally data with other Trump (and maybe other?) sources (like tweets) to build out the dataset
  • A nice topic modelling notebook with NMF and LDA

Book

  • Organizing the chimpanzee section, and working on the Moby Dick part

Phil 10.30.20

So yesterday the oven broke, and today it’s the fridge

GPT-2 Agents

  • Try finetuning the large model on the large dev machine. While trying out the gpt2-large model (because the gpt2-xl model didn’t work), I had an odd problem. When I tried to finetune the model using my local file (which I had saved earlier), the system choked on the lack of a pytorch_model.bin, which must not download when you just want to use the model itself. My guess is that if you don’t specify the file, it will probably download and work, but my default drive is a small SSD, and I don’t want to load it up.
  • To see what was going on, I used a script that I had used before to find where transformers stores the model:
from transformers.file_utils import hf_bucket_url, cached_path

pretrained_model_name = 'gpt2-large'
archive_file = hf_bucket_url(
    pretrained_model_name,
    filename='pytorch_model.bin',
    use_cdn=True,
)
resolved_archive_file = cached_path(archive_file)
print(resolved_archive_file)
  • That printed out the following:

C:\Users\Phil/.cache\torch\transformers\eeb916d81211b381b5ca53007b5cbbd2f5b12ff121e42e938751d1fee0e513f6.999a50942f8e31ea6fa89ec2580cb38fa40e3db5aa46102d0406bcfa77d9142d

  • After renaming to pytorch_model.bin and moving it to my model directory, the finetuning is now working!
  • At around 10:30 last night, the checkpoints had filled up my 1TB data drive! Tried a bunch of things to restart from a checkpoint. Pointing the model to the checkpoint seems to be the right answer, but it was missing the vocab and merges.txt files. Tried to pull those over from the original model, but now I get a:
Traceback (most recent call last):
   File "run_language_modeling.py", line 355, in <module>
     main()
   File "run_language_modeling.py", line 319, in main
     trainer.train(model_path=model_path)
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 621, in train
     train_dataloader = self.get_train_dataloader()
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 417, in get_train_dataloader
     train_sampler = self._get_train_sampler()
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 402, in _get_train_sampler
     if self.args.local_rank == -1
   File "D:\Program Files\Python37\lib\site-packages\torch\utils\data\sampler.py", line 104, in __init__
     "value, but got num_samples={}".format(self.num_samples))
 ValueError: num_samples should be a positive integer value, but got num_samples=0
  • Not sure what to do next. Going to try restarting and cleaning out the earlier checkpoints as the code runs
  • Another thing that I’m thinking about is the non-narrative nature of the tweets, due to the lack of threading, so I also pulled down the Kaggle repository for Trump rally speeches, and am going to see if I can use that. I think that they are particularly interesting because Trump is very attuned to the behavior of the crowd during a rally and will “try out” lines to see if they work and adjust what he is talking about. It should reflect what his base is thinking over the time period
  • Need to start thinking about a short presentation for Nov 13
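Cleaning out earlier checkpoints as the run progresses could look something like this (a sketch assuming the HuggingFace `checkpoint-<step>` directory naming; `keep_last` is an arbitrary choice):

```python
import os
import re
import shutil
import tempfile

# Sketch: prune older checkpoint-* directories so the data drive doesn't
# fill up during finetuning. Assumes the "checkpoint-<step>" naming
# convention; keep_last is an arbitrary tuning choice.
def prune_checkpoints(output_dir: str, keep_last: int = 2):
    ckpts = []
    for name in os.listdir(output_dir):
        m = re.fullmatch(r'checkpoint-(\d+)', name)
        if m:
            ckpts.append((int(m.group(1)), name))
    ckpts.sort()  # oldest (lowest step) first
    for _, name in ckpts[:-keep_last]:
        shutil.rmtree(os.path.join(output_dir, name))
    return [name for _, name in ckpts[-keep_last:]]

# Demo against a throwaway directory
out = tempfile.mkdtemp()
for step in (500, 1000, 1500):
    os.makedirs(os.path.join(out, f'checkpoint-{step}'))
print(prune_checkpoints(out))
```

Something like this could run on a timer alongside the training script, so the oldest checkpoints disappear before the drive fills again.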

GOES

  • Figure out how to taper the beginning and end of the reference frame rotation
  • Add method to adjust the RW contributions. Look at the original spreadsheet and see what the difference is
  • Added the tapering and fooled around a lot exploring how the system is behaving. I think the next step is to see why the vehicle doesn’t recover its pitch
https://viztales.com/wp-content/uploads/2020/10/image-20.png
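One possible way to taper the start and end of the reference-frame rotation is a raised-cosine ramp on the commanded rate (a sketch; `ramp_frac` is an assumed tuning parameter, not what the code actually uses):

```python
import math

# Raised-cosine taper: scales a commanded rate from 0 up to full and back
# down over a maneuver of the given duration. ramp_frac (fraction of the
# duration spent ramping at each end) is an assumed tuning parameter.
def taper(t: float, duration: float, ramp_frac: float = 0.2) -> float:
    ramp = duration * ramp_frac
    if t <= 0 or t >= duration:
        return 0.0
    if t < ramp:  # easing in
        return 0.5 * (1 - math.cos(math.pi * t / ramp))
    if t > duration - ramp:  # easing out
        return 0.5 * (1 - math.cos(math.pi * (duration - t) / ramp))
    return 1.0  # full rate in the middle

print([round(taper(t, 10.0), 3) for t in (0.0, 1.0, 2.0, 5.0, 9.0, 10.0)])
```

The cosine shape keeps the commanded acceleration continuous at the ends of the ramp, which a plain linear taper does not.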

Phil 10.29.20

Are we really at ‘Zeta’? We run out of Greek letters at omega… Hurricane Zeta batters a storm-weary Gulf Coast

It occurs to me that there could be a new military AI ethics paper about ML and hierarchical/dominance bias

GOES

  • Adjust the TopController so that the reference angle changes ramp up and down
  • Look at adjusting the contributions? Maybe square? Add a method for experimentation

GPT-2 Agents

  • Start the Chinese database using the csv file
  • Boy did I struggle with storing Chinese in a MySQL table. With the Arabic version, I had to change everything to UTF8, which in MySQL and MariaDB turns out to not quite be UTF-8. For Chinese, you have to use utf8mb4, as described in this awesome post: How to support full Unicode in MySQL databases
  • The short answer is to first, set the DB to the appropriate character set:
alter database <your database here> default character set utf8mb4 collate utf8mb4_general_ci;
  • Then make sure that the table is correct. IntelliJ defaults to the latin charset, so I had to do this manually by creating the table in Intellij, dumping it, and editing as follows:
CREATE TABLE table_posts (
   rowid int(11) NOT NULL AUTO_INCREMENT,
   id varchar(16) DEFAULT NULL,
   user_id varchar(32) DEFAULT NULL,
   created_at datetime DEFAULT NULL,
   crawl_time datetime DEFAULT NULL,
   like_num int(11) DEFAULT NULL,
   repost_num int(11) DEFAULT NULL,
   comment_num int(11) DEFAULT NULL,
   content text,
   translation text,
   origin_weibo varchar(16) DEFAULT NULL,
   geo_info text,
   PRIMARY KEY (rowid)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
  • Then I sourced the file back into the db, so this worked:
Insert into table_posts (`id`, user_id, created_at, crawl_time, like_num, repost_num, comment_num, content, origin_weibo, geo_info) values("IiF4ShXQZ", "d058566643d6657e", "2019-12-01 00:00:17", "2020-04-22 21:21:30", 1, 0, 0, "《药品管理法》《疫苗管理法》👏👏", NULL, NULL);
  • When selected:
1,IiF4ShXQZ,d058566643d6657e,2019-12-01 00:00:17,2020-04-22 21:21:30,1,0,0,《药品管理法》《疫苗管理法》👏👏,,,
  • Emojis and everything!

ML Group – Just statuses

JuryRoom – Alex and Tamahau are getting ready to submit their theses, so mostly helping with that.

Phil 10.28.20

GOES

  • Fix E-QiP paperwork – done?
  • Fix the RW clamp code. Also, think about how to reduce the velocity of the RW as it nears its goal. Some kind of linear function where the scalar is velocity/threshold_velocity, where velocity < threshold_velocity, or something like that
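The linear slow-down idea above could be sketched like this (the threshold value is arbitrary; this is not the actual clamp code):

```python
# Sketch of the linear slow-down idea: once the reaction-wheel velocity
# drops below a threshold, scale the commanded velocity by
# velocity/threshold_velocity so it eases toward the goal instead of
# overshooting. The threshold value is an arbitrary illustration.
def clamp_rw_velocity(velocity: float, threshold: float) -> float:
    v = abs(velocity)
    if v >= threshold:
        return velocity  # above threshold: pass through unchanged
    scalar = v / threshold  # linear taper in [0, 1)
    return velocity * scalar

print(clamp_rw_velocity(10.0, 5.0), clamp_rw_velocity(2.5, 5.0))
```

Because the scalar is the velocity itself divided by the threshold, the effective command falls off quadratically near the goal, which is one way to damp the overshoot.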

JuryRoom

  • Read Alex’s document and add comments
  • Put some guidance in for Tamahau’s discussion section

Book

  • Placed some thematic guidance at the top
  • Moved all the chimp stuff into its own section.
  • Work on Moby-Dick. Now at mobydick/section5 in the summary.

DHS PLANXS – Need to review the AI/ML section

Phil 10.27.20

Watching this, on Yannic Kilcher’s transformer channel :

GOES

  • Finish E-QiP paperwork – Nope. Gotta fix some things
  • 2:00 Meeting with Vadim.
    • I think we don’t need to determine the sign of the rotation, since there is a single axis of rotation. Going to see if that helps. Nope
    • Just discovered that the deg/rad for get_angle() were reversed! AND THAT FIXED IT
https://viztales.com/wp-content/uploads/2020/10/image-18.png
  • Did a little extra bug hunting and added a commanded rate limiter, which needs to be tweaked.

GPT-2 Agents

  • Sim has the Melville model; hopefully there will be some results as well. There are, and they look pretty good. I also had her do some longer texts to see what effect that has. It doesn’t appear to be that much?
  • 3:30 Meeting
    • Work through options for US model. Show the results from the Arabic full tweet probe. One of the neat things about that approach is that the process can be automated as long as there is access to the DB. It will require a way of breaking the results on the training boundary
    • This makes me want to revisit the MD model and see how taking sentences at random could show alternate(?) plot trajectories. Hmm. Not as clear as I thought it would be.
    • Need to start on the Chinese translator/parser
    • Did some interesting word clouds based on the Arabic tweets

Book

  • Move all the chimp stuff into its own section. Start to bring in other animal behavior?
  • Work on Moby-Dick. Found a nice summary here: sparknotes.com/lit/mobydick. Continued writing. Now at mobydick/section5 in the summary.

DHS PLANXS – Need to review the AI/ML section

Phil 10.26.20

I did a thing on Reddit

GOES

  • Time for E-QiP paperwork – Can’t log in. Yay! Fixed. Updating, rather than starting from scratch, which is nice
  • Looking at rotation code. Vadim didn’t update. Now he has
  • 2:00 Meeting with Vadim.
    • I think we don’t need to determine the sign of the rotation, since there is a single axis of rotation. Going to see if that helps.

GPT-2 Agents

Book

  • Move all the chimp stuff into its own section. Start to bring in other animal behavior?
  • Work on Moby-Dick. Found a nice summary here: sparknotes.com/lit/mobydick

#COVID

  • 3:00 Meeting to go over results and figure out what to do next
    • Went over a lot. I also tried a tweet as the prompt, and it generated very specific, diverging results. I think this is a good mechanism. It could even be that we search through the original corpora to find tweets that reflect what we are interested in, and use those as probes. Uploaded a lot to the shared folder
    • Next meeting is on Nov 3rd (cue ominous music) to discuss next steps

Phil 10.23.20

This never stops being horrible

https://public.flourish.studio/visualisation/3603910/

Book

  • Still spending a lot of time on chimpanzee behavior. Writing more about how alliance-building works and starting to set up (finally!) Moby-Dick
  • 4:00 Meeting with Michelle

GOES

  • Got my E-QIP number. Time for a lot of paperwork
  • 10:00 Meeting with Vadim. Good progress, but we’re not *quite* there yet. More on Monday
  • 1:30 Fingerprinting. Leave at 12:30? Done!