Phil 11.8.20

I went on a long ride yesterday and the Trump signs that I’ve seen since 2015 are largely gone. The “Don’t tread on me” flags were still there. I wonder if it’s Trump fatigue?

GPT-2 Agents

  • I had a brief flicker of a power failure that killed the training. I need to figure out how to restart from a checkpoint
  • Need to normalize tweets by month and also subsample
  • Topic extraction directly from DB

Phil 11.6.20

Took off yesterday and had a great time enjoying the weather and riding up to Gettysburg. And not thinking too much

Via Washington Post (Captured 11.6.2020 0600 EST)

There is also this

And later, this

GOES

  • 10:00 Meeting with Vadim. If there is no progress, I’m going to build a display that plays back the recorded data so we can see what’s going on. He can’t make it, so postponed to Monday. I’ll start on the app after my other paperwork

NESDIS

  • More paperwork

GPT-2 agents

  • Respond to JF’s email and set up a meeting – done
  • Hopefully finish the current training run.
  • Adjust the code so that a query for the count of each month is made, and the results are normalized over each month. That means a query that gets the count and min/max row_ids
  • Work on NMF and LDA topics for MER tweets. Do topics for the whole dataset (just the tweets), then select the ones that seem to make sense, and use them on the model-generated text. Create wordclouds from the topic labels
    • Got the NMF and LDA code into separate files. Need to change the get_content() method to pull directly from the db
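A minimal sketch of the per-month count and min/max row_id query described above, with equal-sized subsampling from each month. It uses an in-memory SQLite table as a stand-in for the real DB; the table name `tweets`, the columns `row_id` and `created_at`, and the helper names are all assumptions for illustration. In MySQL the month bucket would be `DATE_FORMAT(created_at, '%Y-%m')` rather than `strftime`.

```python
import random
import sqlite3


def demo_connection():
    """Build a tiny in-memory table: 5 tweets in Dec 2019, 3 in Jan 2020."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (row_id INTEGER PRIMARY KEY, created_at TEXT)")
    rows = [("2019-12-%02d 00:00:00" % d,) for d in range(1, 6)]
    rows += [("2020-01-%02d 00:00:00" % d,) for d in range(1, 4)]
    conn.executemany("INSERT INTO tweets (created_at) VALUES (?)", rows)
    return conn


def monthly_ranges(conn):
    """Return {month: (count, min_row_id, max_row_id)} for every month present."""
    sql = """
        SELECT strftime('%Y-%m', created_at) AS month,
               COUNT(*), MIN(row_id), MAX(row_id)
        FROM tweets
        GROUP BY month
        ORDER BY month
    """
    return {month: (count, lo, hi) for month, count, lo, hi in conn.execute(sql)}


def sample_per_month(conn, per_month, seed=0):
    """Pick the same number of row ids from each month, capped by the smallest month."""
    rng = random.Random(seed)
    ranges = monthly_ranges(conn)
    n = min(per_month, min(count for count, _, _ in ranges.values()))
    picked = {}
    for month, (_, lo, hi) in ranges.items():
        # The min/max row ids bound the scan; the month filter guards against interleaving.
        ids = [r[0] for r in conn.execute(
            "SELECT row_id FROM tweets WHERE row_id BETWEEN ? AND ? "
            "AND strftime('%Y-%m', created_at) = ?", (lo, hi, month))]
        picked[month] = sorted(rng.sample(ids, n))
    return picked


if __name__ == "__main__":
    conn = demo_connection()
    print(monthly_ranges(conn))
    print(sample_per_month(conn, per_month=4))
```

Capping at the smallest month is only one way to normalize; weighting by month size would be another.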

Book

  • Meeting with Michelle

Phil 11.4.20

Did not want to wake up to this (Washington Post website as of 6:10 EST):

And the refrigerator was too narrow and too tall. Had to redo the cabinets. Sheesh.

GOES

  • GVSETS continues today

GPT-2 Agents

  • Working on the queries to build the test-train set. I’d like to start training against the medium model today
  • I am doing things like this today: re.compile(r'(.?)(\<.?>)|$')
  • Created the training sets

Phil 11.3.20

Election day! Cue ominous music

Listening to Drinking with Historians episode with Drew McKevitt, and this came up:

..when when this came up with my students so my students they’re just an awesome group. I had this is two years ago i would say you know probably 18 of them probably 15 of them identified as like gun rights people right so they were they they owned guns themselves or they were all in favor of gun rights. And like every time the AR-15 came up and they’re just like “that’s this is just stupid culture war like that’s why people buy the gun.” These are like these are students who themselves are like gun rights people or gun owners and like that’s why people buy it it’s a culture war thing. I mean look at that guy in St Louis right who who pulled out his AR-15 and was aiming it at the protesters. Like what the hell is that guy doing with that gun? It’s total culture war stuff. It’s you know as we’d say like virtue signaling or something like that and that that’s what it is it’s saying. Like this it’s it’s it’s voting by buying a gun.”

https://www.youtube.com/watch?v=iAcKv03GfdE&list=PLvTW8ARBmW029eOiTQA8GSVBKmSGHXpIp&index=12

GOES

Book

  • Adding some content from Chimpanzee Politics and working with the Moby Dick section. I think that’s coming together now. Need to talk about how Starbuck continues to have his own thoughts, but goes along with the consensus of the ship, even though he suspects that this will kill him

GPT-2 Agents

  • Meeting with Sim and Shimei. We’re going to focus on getting some preliminary results for the 13th
  • Looked at the results for the medium model and all agree that it looks much better. Going to train the

Phil 11.2.20

There appears to be some kind of imminent election for national office

My presentation is tomorrow, just before lunch, in the Modeling Simulation and Software Technical Session.

GOES

  • Status report
  • Over the weekend, I started to play with what happens when the AngleController isn’t getting any angles.
https://viztales.com/wp-content/uploads/2020/11/image.png
  • What seems to be happening is an underdamped condition where the reaction wheels start oscillating and take off after a while. The left side of the charts generally seem to be something that could be handled with thresholding, but the fact that the system doesn’t recover and continue to oscillate around its zero pitch/roll/yaw position makes me think that something funny is going on. I think the next thing to do is to plot these values in 3D to see if the reaction wheel axis contributions are being calculated correctly with respect to the normal

#COVID

  • Was able to use the medium GPT2 model on the middle eastern and Trump rally data, but ran out of GPU memory if I tried to use the large model
  • Trying out some probes on the Trump medium model. I think it could be really interesting to combine the rally data with other Trump (and maybe other?) sources (like tweets) to build out the dataset
  • A nice topic modelling notebook with NMF and LDA

Book

  • Organizing the chimpanzee section, and working on the Moby Dick part

Phil 10.30.20

So yesterday the oven broke, and today it’s the fridge

GPT-2 Agents

  • Try finetuning the large model on the large dev machine. While trying out the gpt2-large model (because the gpt2-xl model didn’t work), I had an odd problem. When I tried to finetune the model using my local file (which I had saved earlier), the system choked on the lack of a pytorch_model.bin, which must not download when you just want to use the model itself. My guess is that if you don’t specify the file, it will probably download and work, but my default drive is a small SSD, and I don’t want to load it up.
  • To see what was going on, I used a script that I had used before to find where transformers stores the model:
from transformers.file_utils import hf_bucket_url, cached_path

pretrained_model_name = 'gpt2-large'
archive_file = hf_bucket_url(
    pretrained_model_name,
    filename='pytorch_model.bin',
    use_cdn=True,
)
resolved_archive_file = cached_path(archive_file)
print(resolved_archive_file)
  • That printed out the following:

C:\Users\Phil/.cache\torch\transformers\eeb916d81211b381b5ca53007b5cbbd2f5b12ff121e42e938751d1fee0e513f6.999a50942f8e31ea6fa89ec2580cb38fa40e3db5aa46102d0406bcfa77d9142d

  • After renaming to pytorch_model.bin and moving it to my model directory, the finetuning is now working!
  • At around 10:30 last night, the checkpoints had filled up my 1TB data drive! Tried a bunch of things to restart from a checkpoint. Pointing the model to the checkpoint seems to be the right answer, but it was missing the vocab and merges.txt files. Tried to pull those over from the original model, but now I get a:
Traceback (most recent call last):
   File "run_language_modeling.py", line 355, in <module>
     main()
   File "run_language_modeling.py", line 319, in main
     trainer.train(model_path=model_path)
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 621, in train
     train_dataloader = self.get_train_dataloader()
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 417, in get_train_dataloader
     train_sampler = self._get_train_sampler()
   File "D:\Program Files\Python37\lib\site-packages\transformers\trainer.py", line 402, in _get_train_sampler
     if self.args.local_rank == -1
   File "D:\Program Files\Python37\lib\site-packages\torch\utils\data\sampler.py", line 104, in __init__
     "value, but got num_samples={}".format(self.num_samples))
 ValueError: num_samples should be a positive integer value, but got num_samples=0
  • Not sure what to do next. Going to try restarting and cleaning out the earlier checkpoints as the code runs
  • Another thing that I’m thinking about is the non-narrative nature of the tweets, due to the lack of threading, so I also pulled down the Kaggle repository for Trump rally speeches, and am going to see if I can use that. I think that they are particularly interesting because Trump is very attuned to the behavior of the crowd during a rally and will “try out” lines to see if they work and adjust what he is talking about. It should reflect what his base is thinking over the time period
  • Need to start thinking about a short presentation for Nov 13
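A rough sketch of the "clean out the earlier checkpoints as the code runs" idea from above. It assumes the output directory holds `checkpoint-<step>` subdirectories, which is how the Trainer names them; `prune_checkpoints` and its `keep` parameter are my own hypothetical names. `TrainingArguments` also has a `save_total_limit` option that may do the same job, so that is worth checking against the installed version first.

```python
import re
import shutil
from pathlib import Path

# Matches directory names such as "checkpoint-1500" and captures the step number.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def prune_checkpoints(output_dir, keep=2):
    """Delete all but the `keep` highest-step checkpoint directories.

    Returns the names of the directories that were removed.
    """
    found = []
    for child in Path(output_dir).iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    found.sort()  # ascending by numeric step, so checkpoint-500 sorts before checkpoint-1000
    doomed = found[:-keep] if keep > 0 else found
    for _, path in doomed:
        shutil.rmtree(path)
    return [path.name for _, path in doomed]
```

Calling this from a loop or a scheduled task while training runs would keep the drive from filling up; sorting on the numeric step matters because a plain string sort would treat checkpoint-500 as newer than checkpoint-1500.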

GOES

  • Figure out how to taper the beginning and end of the reference frame rotation
  • Add method to adjust the RW contributions. Look at the original spreadsheet and see what the difference is
  • Added the tapering and fooled around a lot exploring how the system is behaving. I think the next step is to see why the vehicle doesn’t recover its pitch
https://viztales.com/wp-content/uploads/2020/10/image-20.png
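One way the tapering could look, as a sketch only: a raised-cosine weight that eases the commanded rotation rate in at the start and out at the end. This is not the actual controller code; `taper_weight`, `tapered_rate`, and the `ramp` parameter are hypothetical names, and the cosine shape is my assumption (a linear ramp would also work).

```python
import math


def taper_weight(t, duration, ramp):
    """Weight in [0, 1]: eases in over `ramp` seconds, holds at 1, eases out at the end."""
    if duration <= 0 or t <= 0 or t >= duration:
        return 0.0
    ramp = min(ramp, duration / 2)  # the two ramps cannot overlap
    if ramp <= 0:
        return 1.0
    edge = min(t, duration - t)  # time to the nearest end of the maneuver
    if edge >= ramp:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * edge / ramp))


def tapered_rate(t, duration, ramp, peak_rate):
    """Commanded rotation rate at time t for a maneuver lasting `duration` seconds."""
    return peak_rate * taper_weight(t, duration, ramp)


if __name__ == "__main__":
    for t in range(0, 11):
        print(t, round(tapered_rate(t, duration=10, ramp=2, peak_rate=1.0), 3))
```

The cosine has zero slope at both ends of each ramp, so the commanded rate has no step changes for the reaction wheels to chase.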

Phil 10.29.20

Are we really at ‘Zeta’? We run out of Greek letters at omega… Hurricane Zeta batters a storm-weary Gulf Coast

It occurs to me that there could be a new military AI ethics paper about ML and hierarchical/dominance bias

GOES

  • Adjust the TopController so that the reference angle changes ramp up and down
  • Look at adjusting the contributions? Maybe square? Add a method for experimentation

GPT-2 Agents

  • Start the Chinese database using the csv file
  • Boy did I struggle with storing Chinese in a MySQL table. With the Arabic version, I had to change everything to UTF8, which turns out, in MySQL and MariaDB, to not quite be UTF-8. For Chinese, you have to use utf8mb4, as described in this awesome post: How to support full Unicode in MySQL databases
  • The short answer is to first, set the DB to the appropriate character set:
alter database <your database here> default character set utf8mb4 collate utf8mb4_general_ci;
  • Then make sure that the table is correct. IntelliJ defaults to the latin charset, so I had to do this manually by creating the table in IntelliJ, dumping it, and editing as follows:
CREATE TABLE table_posts (
   rowid int(11) NOT NULL AUTO_INCREMENT,
   id varchar(16) DEFAULT NULL,
   user_id varchar(32) DEFAULT NULL,
   created_at datetime DEFAULT NULL,
   crawl_time datetime DEFAULT NULL,
   like_num int(11) DEFAULT NULL,
   repost_num int(11) DEFAULT NULL,
   comment_num int(11) DEFAULT NULL,
   content text,
   translation text,
   origin_weibo varchar(16) DEFAULT NULL,
   geo_info text,
   PRIMARY KEY (rowid)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
  • Then I sourced the file back into the db, so this worked:
Insert into table_posts (`id`, user_id, created_at, crawl_time, like_num, repost_num, comment_num, content, origin_weibo, geo_info) values("IiF4ShXQZ", "d058566643d6657e", "2019-12-01 00:00:17", "2020-04-22 21:21:30", 1, 0, 0, "《药品管理法》《疫苗管理法》👏👏", NULL, NULL);
  • When selected:
1,IiF4ShXQZ,d058566643d6657e,2019-12-01 00:00:17,2020-04-22 21:21:30,1,0,0,《药品管理法》《疫苗管理法》👏👏,,,
  • Emojis and everything!
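A quick check of why utf8mb4 matters here. MySQL's `utf8` is an alias for utf8mb3, which stores at most three bytes per character. In the sample row above, the Chinese characters each encode to three bytes, and it is the emoji that need four (some rarer CJK characters outside the Basic Multilingual Plane do too). The helper names are just for illustration.

```python
def utf8_width(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text):
    """True if any character needs four bytes, which MySQL's 3-byte utf8 cannot store."""
    return any(utf8_width(ch) > 3 for ch in text)


if __name__ == "__main__":
    sample = "《药品管理法》《疫苗管理法》👏👏"
    for ch in sorted(set(sample)):
        print(repr(ch), utf8_width(ch))
    print(needs_utf8mb4(sample))
```

Running a check like this over the content column before loading would flag the rows that a 3-byte column would reject or truncate.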

ML Group – Just statuses

JuryRoom – Alex and Tamahau are getting ready to submit their theses, so mostly helping with that.

Phil 10.28.20

GOES

  • Fix E-QiP paperwork – done?
  • Fix the RW clamp code. Also, think about how to reduce the velocity of the RW as it nears its goal. Some kind of linear function where the scalar is velocity/threshold_velocity when velocity < threshold_velocity, or something like that
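One reading of the linear scalar idea above, as a sketch rather than the real RW code: the scale is 1.0 at or above the threshold and falls linearly to 0.0 below it. Whether the input should be the wheel velocity or the remaining angle to the goal is still an open question; `linear_scale` and `scaled_command` are hypothetical names.

```python
def linear_scale(magnitude, threshold):
    """Return 1.0 at or above the threshold, falling linearly to 0.0 at zero."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return min(abs(magnitude) / threshold, 1.0)


def scaled_command(command, velocity, threshold_velocity):
    """Scale a reaction wheel command by velocity / threshold_velocity below the threshold."""
    return command * linear_scale(velocity, threshold_velocity)


if __name__ == "__main__":
    for v in (0.0, 2.5, 5.0, 10.0, 20.0):
        print(v, scaled_command(2.0, v, threshold_velocity=10.0))
```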

JuryRoom

  • Read Alex’s document and add comments
  • Put some guidance in for Tamahau’s discussion section

Book

  • Placed some thematic guidance at the top
  • Moved all the chimp stuff into its own section.
  • Work on Moby-Dick. Now at mobydick/section5 in the summary.

DHS PLANXS – Need to review the AI/ML section

Phil 10.27.20

Watching this, on Yannic Kilcher’s transformer channel:

GOES

  • Finish E-QiP paperwork – Nope. Gotta fix some things
  • 2:00 Meeting with Vadim.
    • I think we don’t need to determine the sign of the rotation, since there is a single axis of rotation. Going to see if that helps. Nope
    • Just discovered that the deg/rad for get_angle() were reversed! AND THAT FIXED IT
https://viztales.com/wp-content/uploads/2020/10/image-18.png
  • Did a little extra bug hunting and added a commanded rate limiter, which needs to be tweaked.
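Since the deg/rad mixup was the bug, here is a sketch of an angle helper that makes the unit explicit at the call site. This is not the project's get_angle(); `angle_between` and its `degrees` flag are hypothetical, and it defaults to radians because that is what the `math` functions use.

```python
import math


def angle_between(v1, v2, degrees=False):
    """Unsigned angle between two vectors, in radians unless degrees=True."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("angle is undefined for a zero-length vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    radians = math.acos(cos_theta)
    return math.degrees(radians) if degrees else radians


if __name__ == "__main__":
    x_axis, y_axis = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    print(angle_between(x_axis, y_axis))                # radians
    print(angle_between(x_axis, y_axis, degrees=True))  # degrees
```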

GPT-2 Agents

  • Sim has the Melville model, hopefully there will be some results as well. There are and it looks pretty good. I also had her do some longer texts to see what effect that has. It doesn’t appear to be that much?
  • 3:30 Meeting
    • Work through options for US model. Show the results from the Arabic full tweet probe. One of the neat things about that approach is that the process can be automated as long as there is access to the DB. It will require a way of breaking the results on the training boundary
    • This makes me want to revisit the MD model and see how taking sentences at random could show alternate(?) plot trajectories. Hmm. Not as clear as I thought it would be.
    • Need to start on the Chinese translator/parser
    • Did some interesting word clouds based on the Arabic tweets

Book

  • Move all the chimp stuff into its own section. Start to bring in other animal behavior?
  • Work on Moby-Dick. Found a nice summary here: sparknotes.com/lit/mobydick. Continued writing. Now at mobydick/section5 in the summary.

DHS PLANXS – Need to review the AI/ML section

Phil 10.26.20

I did a thing on Reddit

GOES

  • Time for an E-QiP of paperwork – Can’t log in. Yay! Fixed. Updating, rather than starting from scratch, which is nice
  • Looking at rotation code. Vadim didn’t update. Now he has
  • 2:00 Meeting with Vadim.
    • I think we don’t need to determine the sign of the rotation, since there is a single axis of rotation. Going to see if that helps.

GPT-2 Agents

Book

  • Move all the chimp stuff into its own section. Start to bring in other animal behavior?
  • Work on Moby-Dick. Found a nice summary here: sparknotes.com/lit/mobydick

#COVID

  • 3:00 Meeting to go over results and figure out what to do next
    • Went over a lot. I also tried a tweet as the prompt, and it generated very specific, diverging results. I think this is a good mechanism. It could even be that we search through the original corpora to find tweets that reflect what we are interested in, and use those as probes. Uploaded a lot to the shared folder
    • Next meeting is on Nov 3rd (cue ominous music) to discuss next steps

Phil 10.23.20

This never stops being horrible

https://public.flourish.studio/visualisation/3603910/

Book

  • Still spending a lot of time on chimpanzee behavior. Writing more about how alliance-building works and starting to set up (finally!) Moby-Dick
  • 4:00 Meeting with Michelle

GOES

  • Got my E-QIP number. Time for a lot of paperwork
  • 10:00 Meeting with Vadim. Good progress, but we’re not *quite* there yet. More on Monday
  • 1:30 Fingerprinting. Leave at 12:30? Done!

Phil 10.22.20

Identifying viral bots and cyborgs in social media

  • For this research, I have applied techniques from complexity theory, especially information entropy, as well as network graph analysis and community detection algorithms to identify clusters of viral bots and cyborgs (human users who use software to automate and amplify their social posts) that differ from typical human users on Twitter and Facebook. I briefly explain these approaches below, so deep prior knowledge of these areas is not necessary. In addition to commercial bots focused on promoting click traffic, I discovered competing armies of pro-Trump and anti-Trump political bots and cyborgs. During August 2017, I found that anti-Trump bots were more successful than pro-Trump bots in spreading their messages. In contrast, during the NFL protest debates in September 2017, anti-NFL (and pro-Trump) bots and cyborgs achieved greater successes and virality than pro-NFL bots.

Social Botnet Community Detection: A Novel Approach based on Behavioral Similarity in Twitter Network using Deep Learning

  • Detecting social bots and identifying social botnet communities are extremely important in online social networks (OSNs). In this paper, we first construct a weighted signed Twitter network graph based on the behavioral similarity and trust values between the participants (i.e., OSN accounts) as weighted edges. The behavioral similarity is analyzed from the viewpoints of tweet-content similarity, shared URL similarity, interest similarity, and social interaction similarity for identifying similar types of behavior (malicious or not) among the participants in the Twitter network; whereas the participant’s trust value is determined by a random walk model. Next, we design two algorithms – Social Botnet Community Detection (SBCD) and Deep Autoencoder based SBCD (called DA-SBCD) – where the former detects social botnet communities of social bots with malicious behavioral similarity, while the latter reconstructs and detects social botnet communities more accurately in presence of different types of malicious activities. Finally, we evaluate the performance of proposed algorithms with the help of two Twitter datasets. Experimental results demonstrate the efficacy of our algorithms with better performance than existing schemes in terms of normalized mutual information (NMI), precision, recall and F-measure. More precisely, the DA-SBCD algorithm achieves about 90% precision and exhibits up to 8% improvement on NMI.

#COVID

  • Need to finish installing all the bits for TF, PT, and HF and see how well the model inference works

GPT-2 Agents

  • Working on Chinese translation and topic extraction. Got most parts working on the Chinese translation, but need to figure out how to use split() on utf-8
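A sketch of the split() issue. Python 3 strings are already Unicode, so split() works on characters once the bytes are decoded; the catch is that with no arguments it splits on whitespace, which Chinese text mostly lacks. Splitting on CJK punctuation at least gives clause-sized chunks. The punctuation set and helper names are my assumptions, and real word-level splitting would need a dedicated segmenter (jieba is one option).

```python
import re

# Common full-width sentence and clause punctuation; extend as needed.
CJK_BREAKS = re.compile(r"[。！？；，、]+")


def split_clauses(text):
    """Split Chinese text into clauses on full-width punctuation, dropping empties."""
    return [part for part in CJK_BREAKS.split(text) if part.strip()]


def split_clauses_from_bytes(raw):
    """Decode UTF-8 bytes first, then split; splitting raw bytes would cut characters apart."""
    return split_clauses(raw.decode("utf-8"))


if __name__ == "__main__":
    text = "今天天气很好，我们去公园。你呢？"
    print(text.split())          # one chunk, since there is no whitespace
    print(split_clauses(text))   # clause-sized chunks
```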

GOES

  • 10:00 meeting with Vadim. Cleaned up a lot of the data dictionary and started to look for how we can possibly have a reverse yaw
  • 2:00 Meeting with Jason. Basically a blend of code walkthrough and demo. Seems pretty solid

Phil 10.21.20

Something on attention: How Lyft predicts a rider’s destination for better in-app experience

  • We tackle the destination recommendation problem using the rider’s historical rides. The main idea is to limit candidate recommendations to addresses where the rider has previously taken a Lyft ride to or from. Within this candidate set, we use an attention mechanism (discussed in more detail below), to determine which locations are most relevant to the current session.

A Game Designer’s Analysis Of QAnon

  • Even Q-Anon was only one of several “anons” including FBIanon and CIAanon, etc, etc. Q rose to the top, so it got its own YouTube channels. That tested, so it moved to Reddit. The theories that didn’t work, disappeared while others got up-voted. It’s ingenious. It’s AI with a group-think engine. The group, lead by the puppet masters, decide what is the most entertaining and gripping explanation, and that is amplified. It’s a Slenderman board gone amok.

Book

  • Still working on the relationship between communication, hierarchy, aggression and cult behavior

GOES

  • 10:00 and 1:30 Meeting with Vadim. Everything is almost working. There seems to be a sign problem, where the rotations are the opposite of what they should be. Need to clean up the ddict and then put in more useful stuff for figuring this out and plotting it

GPT-2 Agents

  • Installed Python 3.8 on Dreamhost. Looking to serve up the GPT-2 model if possible

Phil 10.20.20

Two types of aggression in human evolution

  • Two major types of aggression, proactive and reactive, are associated with contrasting expression, eliciting factors, neural pathways, development, and function. The distinction is useful for understanding the nature and evolution of human aggression. Compared with many primates, humans have a high propensity for proactive aggression, a trait shared with chimpanzees but not bonobos. By contrast, humans have a low propensity for reactive aggression compared with chimpanzees, and in this respect humans are more bonobo-like. The bimodal classification of human aggression helps solve two important puzzles. First, a long-standing debate about the significance of aggression in human nature is misconceived, because both positions are partly correct. The Hobbes–Huxley position rightly recognizes the high potential for proactive violence, while the Rousseau–Kropotkin position correctly notes the low frequency of reactive aggression. Second, the occurrence of two major types of human aggression solves the execution paradox, concerned with the hypothesized effects of capital punishment on self-domestication in the Pleistocene. The puzzle is that the propensity for aggressive behavior was supposedly reduced as a result of being selected against by capital punishment, but capital punishment is itself an aggressive behavior. Since the aggression used by executioners is proactive, the execution paradox is solved to the extent that the aggressive behavior of which victims were accused was frequently reactive, as has been reported. Both types of killing are important in humans, although proactive killing appears to be typically more frequent in war. The biology of proactive aggression is less well known and merits increased attention.

GPT-2 Agents

  • Look at topic extraction?
  • Moving models from the data directory to the model directory and sending them to svn
  • 3:30 Meeting
    • Action items:
      • Try translating a few Chinese tweets and comparing them against Google
      • Continue with topic extraction

GOES

  • 9:00 Meeting with Vadim – good progress. Found an important bug

Book

  • Still grappling with cults. While listening to the Drinking with Historians episode on patriotism, with Benjamin Railton, I learned about the Cornerstone Speech, which was Confederate Vice President Alexander H. Stephens’ justification for the South’s rebellion. This line particularly stands out:
    • Our new government is founded upon exactly the opposite idea; its foundations are laid, its corner-stone rests, upon the great truth that the negro is not equal to the white man; that slavery subordination to the superior race is his natural and normal condition. This, our new government, is the first, in the history of the world, based upon this great physical, philosophical, and moral truth.
  • Working out from the idea that a hierarchy depends on violence or the threat of violence, then all this is justified. You do slavery because you can. And the fact that you can, defines your place as superior in the hierarchy.
  • So, at the most primal (literally chimpanzee-level), violence is the most basic mechanism to determine hierarchy. Since a good place in the hierarchy has tremendous benefits, it is worth fighting for, and potentially dying for. The flip side of that is also worth killing for. And I think I can see how that can be a stable state for group activity. It could even be viewed as a behavior attractor that is wired into us as a hierarchical species. The balancing act for us is to work out how to not wind up in this state, even though it seems to be an attractive answer, particularly for those who feel as though they are losing their high place in the hierarchy.

Phil 10.19.20

GPT-2 Agents

  • Updating text to include
date, label, author, followers, following, influence_score, country, city, category, gender, post_type, translation
  • I think it’s better. A sample now reads
In December of 2019, @sparrow_e2 (7 followers, 250 following) posted a retweet, mentioning @b0_r9. It was sent from Saudi Arabia. @sparrow_e2 wrote: "RT @b0_r9 watch. ♪ Squirrels drowned in #Jeddah ♪ ♪ Now here shows the role ♪ ♪ Study suspension and drainage ♪ https://t.co/m3kIOZ0Gdk". The retweet was categorized as "Neutral".
  • Started training on the new dataset. Done! Here are some results:
  • “In December of 2019”
____MZN wrote "The United Arab Emirates today officially announces that the first case of the new Koruna virus has been confirmed, and the Ministry of Health and Social Protection confirms that the health status of the infected is stable and under medical observation.."
  • “In February of 2020”
ixoG5 wrote "RT @EremNews, a Yemeni student, crying out for not being evacuated and her colleagues from Wuhan, China. She addresses Yemeni officials: If you are unable to protect 180 students, how will you protect a country? #Term_News # Yemen # COVID19 #Coronaviruschina https://t.co/6wcevarxvQ."
  • “In March of 2020”
____2____2___2 wrote "If God wills, the Corona virus will remain invisible, because there is no way to detect it with the naked eye. God knows that because the veil of the Muslim nation remains the law of the polytheists, that even the hurricanes that hit the earth are not seen by God's wrath.."
  • “Is corona the will of Allah? “
In February of 2020, @Bader_Zayed wrote "RT @makkahnp Corona virus in China... the death toll rose to 80 and more than 11 thousand infections # Makkah Newspaper https://t.co/lxW8L9uE8X." In February of 2020, @Hn6YmR9cV4H1fv6 wrote "RT @jafarAbdeel A &quot;Chinese Muslim&quot; woman talks about how Muslims in China deal with # Corona disease https://t.co/vzW1dQy2Xm."
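A sketch of how a DB row could be turned into the narrative sentence format shown in the sample above. This is not the actual generator; the dict keys follow the field list (date, author, followers, following, country, category, post_type, translation), while `mention` and the function name are hypothetical additions.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def describe_tweet(row):
    """Render one tweet row (a dict) as a narrative training sentence."""
    when = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S")
    text = (f'In {MONTHS[when.month - 1]} of {when.year}, @{row["author"]} '
            f'({row["followers"]} followers, {row["following"]} following) '
            f'posted a {row["post_type"]}')
    if row.get("mention"):  # optional: only retweets and replies carry a mention
        text += f', mentioning @{row["mention"]}'
    text += "."
    if row.get("country"):
        text += f' It was sent from {row["country"]}.'
    text += f' @{row["author"]} wrote: "{row["translation"]}".'
    text += f' The {row["post_type"]} was categorized as "{row["category"]}".'
    return text


if __name__ == "__main__":
    print(describe_tweet({
        "date": "2019-12-01 00:00:17", "author": "sparrow_e2",
        "followers": 7, "following": 250, "post_type": "retweet",
        "mention": "b0_r9", "country": "Saudi Arabia",
        "translation": "RT @b0_r9 watch.", "category": "Neutral",
    }))
```

Keeping the date phrase at the very front is what lets a prompt like “In December of 2019” act as a probe.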

GOES

  • 2:00 Meeting with Vadim – nope
  • GFC Environmental training