Monthly Archives: March 2021

Phil 3.10.21

Zach found a cool article: The Genius Neuroscientist Who Might Hold the Key to True AI

  • Free energy is the difference between the states you expect to be in and the states your sensors tell you that you are in. Or, to put it another way, when you are minimizing free energy, you are minimizing surprise.
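As a rough numerical sketch of that idea (the Gaussian model, precision, and values below are purely illustrative, not from the article): for a Gaussian belief, the surprise term comes down to a precision-weighted prediction error between the state you expect and the state your sensors report, so driving it down is the same as driving down surprise.

```python
import numpy as np

def gaussian_surprise(sensed, expected, precision=1.0):
    """Surprise (-log probability) of a sensed value under a Gaussian belief:
    a precision-weighted squared prediction error plus a constant."""
    error = sensed - expected
    return 0.5 * precision * error ** 2 + 0.5 * np.log(2 * np.pi / precision)

# The sensors report 2.0; compare an accurate expectation with a wrong one.
print(gaussian_surprise(sensed=2.0, expected=2.0))  # low: no surprise
print(gaussian_surprise(sensed=2.0, expected=5.0))  # high: lots of surprise
```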

GPT-Agents

  • Running the training for the new models
  • Added the meta-summary spreadsheet:
https://viztales.files.wordpress.com/2021/03/image-10.png
  • Need to re-run these tests on the new models using more runs and no rank testing

SBIR

  • 9:30 Meeting – Looks like I need to get 50% coverage? Maybe in medical?
  • More Pytorch tutorial
  • Need to upgrade the ASRC box to 1.8 when it finishes training the current models
  • Found my svncopy.bat file. It’s in JavaUtils2

GOES

  • 3:00 Meeting

Phil 3.9.2021

Quotebank is a dataset of 178 million unique, speaker-attributed quotations that were extracted from 196 million English news articles crawled from over 377 thousand web domains between August 2008 and April 2020. The quotations were extracted and attributed using Quobert, a distantly and minimally supervised end-to-end, language-agnostic framework for quotation attribution.

Stanford Cable TV News Analyzer: The Stanford Cable TV Analyzer enables you to write queries that compute the amount of time people appear and the amount of time words are heard in cable TV news. In this tutorial we will go over the basics of how to use the tool to write simple queries.

GPT Agents

  • Finished experiments and generated spreadsheets.
  • Uploading everything to DropBox
  • 3:00 Meeting
    • Create datasets from tweets that have [‘%kung flu%’, ‘%kungflu%’, ‘%china virus%’, ‘%chinavirus%’, ‘%coronavirus%’, ‘%covid%’, ‘%sars-cov-2%’] and train models from these. The idea is to examine how this type of polarized training can influence the response of the model. Related work on Microsoft’s Tay
    • Create a meta-sheet for all the spreadsheet summaries
    • Rather than looking at rankings, go back to the cumulative stats on multiple runs with top k set to the range of ranks that we want to look at, then take a look at the first n words. This addresses the token problem
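A rough sketch of that cumulative-stats approach, assuming a Hugging Face GPT-2 checkpoint (the model name, prompt, and counts below are placeholders for the real fine-tuned models and probes):

```python
from collections import Counter
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Placeholders -- swap in the real fine-tuned model and probe.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The chess opening began with"
ids = tok(prompt, return_tensors="pt").input_ids

counts = Counter()
runs, top_k, first_n = 50, 10, 3   # top_k covers the rank range of interest
for _ in range(runs):
    out = model.generate(ids, do_sample=True, top_k=top_k,
                         max_length=ids.shape[1] + 20,
                         pad_token_id=tok.eos_token_id)
    continuation = tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
    counts.update(continuation.split()[:first_n])  # first n words, not tokens

print(counts.most_common(10))  # cumulative stats across all runs
```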

SBIR

  • Set up proxy (2:00)?
  • Write up curves embedding code
  • Start on the simplest possible autoregressive Transformer using curve data (sketched after this list)
  • Started on the PyTorch Quickstart. Everything is installed properly and CUDA is visible
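A minimal sketch of what that simplest autoregressive Transformer on curve data might look like (the dimensions, the single sine curve, and the training loop are illustrative assumptions, not the SBIR code):

```python
import math
import torch
import torch.nn as nn

class CurveTransformer(nn.Module):
    """Autoregressive Transformer that predicts the next point of a 1-D curve."""
    def __init__(self, d_model=32, nhead=4, num_layers=2, max_len=512):
        super().__init__()
        self.embed = nn.Linear(1, d_model)            # scalar sample -> vector
        self.pos = nn.Embedding(max_len, d_model)     # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)             # vector -> scalar sample

    def forward(self, x):                             # x: (seq, batch, 1)
        seq = x.shape[0]
        # Causal mask so position t only attends to positions <= t.
        mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        h = self.embed(x) + self.pos(torch.arange(seq)).unsqueeze(1)
        return self.head(self.encoder(h, mask=mask))

# Train on a sine curve: predict sample t+1 from samples 0..t.
t = torch.linspace(0, 4 * math.pi, 101)
curve = torch.sin(t).reshape(-1, 1, 1)                # (seq, batch=1, 1)
model = CurveTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    pred = model(curve[:-1])                          # inputs: all but last
    loss = nn.functional.mse_loss(pred, curve[1:])    # targets: all but first
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```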

Phil 3.8.21

GSAW today

  • The community is very much focused on the implementation side of ML. The Aerospace Corporation is doing some really nice work merging synthetic and actual data to detect threat anomalies, and Slingshot is doing really nice data fusion
  • I had an interesting idea come to me during the panel. It might be possible to train a large Transformer model on all mission telemetry from launch to sunset for all satellites. Then you could do zero-shot detection on new data, just like GPT-3 does.

GPT-Agents

  • Working on getting the meta information back to the summary tab – done
  • Run all models – done
  • I think I know how I want to try the mapping (see the sketch after this list).
    • Use a prompt that should produce a list of nouns in order
    • Set the temperature reasonably high and the repetition penalty so that repetition stays low
    • Look at the output text for an N-N-N… pattern. Select those nouns as nodes and stop when the pattern changes
    • Repeat and increment the edge weight for each redundant connection
    • Trim the leaf nodes with low counts
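A rough sketch of that loop, assuming networkx for the graph and NLTK for the noun tagging. generate_text() below is a hypothetical stand-in for the fine-tuned model’s sampling call, and the prompt, run count, and trim threshold are placeholders.

```python
import networkx as nx
import nltk  # assumes the punkt and perceptron-tagger data are downloaded

def generate_text(prompt, temperature, repetition_penalty):
    """Hypothetical stand-in: swap in the fine-tuned GPT-2 sampling call."""
    raise NotImplementedError

def first_noun_run(text):
    """Return the first run of consecutive nouns (the N-N-N... pattern),
    stopping as soon as the pattern changes."""
    run = []
    for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
        if tag.startswith("NN"):
            run.append(word.lower())
        elif run:
            break
    return run

G = nx.Graph()
prompt = "Here is a list of related things:"  # placeholder prompt
for _ in range(100):                          # repeat the generation many times
    nouns = first_noun_run(generate_text(prompt, temperature=1.2,
                                         repetition_penalty=1.5))
    for a, b in zip(nouns, nouns[1:]):        # consecutive nouns become an edge
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1            # redundant connection -> heavier edge
        else:
            G.add_edge(a, b, weight=1)

# Trim leaf nodes whose single edge has a low count.
low = [n for n, d in G.degree() if d == 1
       and next(iter(G.edges(n, data="weight")))[2] < 3]
G.remove_nodes_from(low)
```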

SBIR

  • Ping Clay about how much of my time I can bill based on current rates
  • Create generic multidimensional vectors for training
  • Yannic Kilcher’s walkthrough of Attention Is All You Need

Phil 3.6.21

https://twitter.com/noahtren/status/1368114923956535296

Arkipelago.space is a searchable map of interesting things on the Internet. The content is taken from a web crawl of 70,000 webpages originating from high-quality, human-curated links via Curius.app. A neural network uses the text content of each page to determine which pages should appear near each other on the map.

It seems to be a bunch of students playing around with cool things

Huggingface has lots of models to handle speech tagging!

Phil 3.5.21

This is a lot like self-attention in Transformers: How social learning amplifies moral outrage expression in online social networks

  • Moral outrage shapes fundamental aspects of human social life and is now widespread in online social networks. Here, we show how social learning processes amplify online moral outrage expressions over time. In two pre-registered observational studies of Twitter (7,331 users and 12.7 million total tweets) and two pre-registered behavioral experiments (N = 240), we find that positive social feedback for outrage expressions increases the likelihood of future outrage expressions, consistent with principles of reinforcement learning. We also find that outrage expressions are sensitive to expressive norms in users’ social networks, over and above users’ own preferences, suggesting that norm learning processes guide online outrage expressions. Moreover, expressive norms moderate social reinforcement of outrage: in ideologically extreme networks, where outrage expression is more common, users are less sensitive to social feedback when deciding whether to express outrage. Our findings highlight how platform design interacts with human learning mechanisms to impact moral discourse in digital public spaces.

Related: Democracy Is Weakening Right in Front of Us: Is technopessimism our new future?

Book

  • 2:00 Meeting with Michelle

GPT-Agents

  • Finish summary table – Mostly done. Needs tweaking
  • 3:30 Meeting

GOES

  • 11:00 Meeting
  • Continue working on data generation – generating faulty reaction wheel (RW) sims!

Phil 3.4.21

I wonder if any crazy things are going to happen today? Capitol Police say intelligence shows militia group may be plotting to breach the Capitol

GPT-Agents

  • In EccoToXlsx, add code to iterate over all the samples from a prompt and add the selected token ranks for the selected columns to a summary dict. Compute the mean and variance (95% intervals?), display the table, and plot a candlestick plot (see the sketch after this list).
  • Set up a mapping directory in GPT-2 Agents. Do some test pulls using the Python API. I think the goal should be to populate a database that is similar to the gpt2_chess db’s table_moves (from, to, probe, response),
  • combined with table_output from gpt_experiments (experiment_id, root_id, tag, before_regex, and after_regex).
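A rough sketch of the summary step, assuming the per-prompt token ranks have already been pulled into lists keyed by column name (the rank values below are made-up placeholders). The candlestick is approximated here as a bar from the low to the high end of the 95% interval, with the mean marked.

```python
import math
import matplotlib.pyplot as plt
import pandas as pd

# Assumed shape: {column_name: [rank of the selected token in each sample]}
ranks = {"e4": [3, 5, 2, 4, 6], "d4": [10, 8, 12, 9, 11], "Nf3": [20, 25, 18, 22, 30]}

rows = []
for col, vals in ranks.items():
    s = pd.Series(vals)
    mean, var = s.mean(), s.var()
    ci = 1.96 * s.std() / math.sqrt(len(s))        # approximate 95% interval
    rows.append({"column": col, "mean": mean, "variance": var,
                 "ci_low": mean - ci, "ci_high": mean + ci})

df = pd.DataFrame(rows)
print(df)                                          # the summary table

# Candlestick-style plot: a bar spanning the interval, mean as a dot.
plt.bar(df["column"], df["ci_high"] - df["ci_low"], bottom=df["ci_low"], width=0.4)
plt.scatter(df["column"], df["mean"], color="black", zorder=3)
plt.ylabel("token rank")
plt.show()
```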

Book

  • Work on chapters

GOES

  • Work on fast sim
    • Finish moving code from frame3d_test file to FastRCSGenerator. Keep the plots too, just to make sure everything’s working. Done
    • Realized that the pitch/roll/yaw calculations were being done by ODE, so I had to get them back from the quaternion. It turns out that pyquaternion has yaw_pitch_roll(), but I can’t get to it? Added it to the VecData code
      • Figured it out. The @property decorator means no parens; you treat the method like a variable (see the short example after this list)
    • I don’t think I’m setting the quaternion right when updating it incrementally.
    • Turns out I was rotating twice and storing the incremental steps as the rotations. Fixed!
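For reference, the @property point in a couple of lines (the rotation here is an arbitrary example, not the sim’s actual state):

```python
from pyquaternion import Quaternion

q = Quaternion(axis=[0, 0, 1], degrees=90)  # 90-degree yaw about z
yaw, pitch, roll = q.yaw_pitch_roll         # @property: no parentheses
print(yaw, pitch, roll)                     # roughly (1.5708, 0.0, 0.0)
```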

Phil 3.3.21

Panel Study Of The MAGA Movement

  • WaPo summary article: What explains MAGA supporters’ commitment to Trump and his conspiratorial and racist views? The answer is “status threat,” or the belief that one’s way of life or status is undermined by social and cultural change. As we’ve shown elsewhere, those who are attracted to reactionary movements like MAGA are often motivated by anxiety about possible cultural dispossession — seeing their social and cultural dominance eclipsed by other groups.

This is pretty cool! Not sure if it will work right, but…? Configure remote Python interpreters

Book

  • Work on chapters

GPT-Agents

  • Finished all the models!
  • Set up experiments that run through each model for each set of terms and set of probes. Batch size of 50
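A rough sketch of that experiment loop, assuming Hugging Face checkpoint directories; the paths, term/probe sets, and generation settings below are placeholders for the real experiment config.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Placeholders -- the real lists come from the experiment setup.
model_dirs = ["./models/feb2021", "./models/mar2021"]
probe_sets = {"covid": ["The virus is", "The vaccine will"],
              "chess": ["White moved", "Black moved"]}
BATCH_SIZE = 50

for model_dir in model_dirs:
    tok = GPT2Tokenizer.from_pretrained(model_dir)
    tok.pad_token = tok.eos_token
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    for tag, probes in probe_sets.items():
        for probe in probes:
            ids = tok(probe, return_tensors="pt").input_ids
            out = model.generate(ids, do_sample=True, max_length=40,
                                 num_return_sequences=BATCH_SIZE,
                                 pad_token_id=tok.eos_token_id)
            texts = [tok.decode(o, skip_special_tokens=True) for o in out]
            # ... write texts to the experiments database here ...
```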

SBIR

GOES

  • Sitting in on GSAW keynote
  • Vadim has made progress! 11:00 Meeting
  • 2:00 Meeting
  • Work on fast sim
    • Created data_generators project in PyBullet
    • Copied ScriptReaderScratch to FastRCSGenerator
    • Copied over the classes in least_squares_rotations (VecData, Rwheel, Rwheels, and Frame3D) and made them their own files
    • Wrote up a frame3d_test file to exercise the classes and make sure that I haven’t broken anything. Everything still works!
  • Get connected to repo?
  • More on setting up a BERT-style (autoencoding) transformer for time series. Vector of sin waves at different frequencies first
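A rough sketch of the “sin waves at different frequencies” starting point, plus the BERT-style corruption step (the sizes, frequency range, and mask fraction are arbitrary choices; the encoder itself isn’t shown here):

```python
import torch

def sine_batch(batch=32, seq_len=128, dt=0.1):
    """Vectors of sin waves at different (random) frequencies."""
    freqs = torch.rand(batch, 1) * 2.0 + 0.1          # 0.1 .. 2.1 rad/step
    t = torch.arange(seq_len).float() * dt
    return torch.sin(freqs * t)                       # (batch, seq_len)

def mask_for_autoencoding(x, mask_frac=0.15):
    """BERT-style corruption: zero a random 15% of positions; the model's
    job is to reconstruct the original values at the masked spots."""
    mask = torch.rand_like(x) < mask_frac
    return x.masked_fill(mask, 0.0), mask

clean = sine_batch()
corrupted, mask = mask_for_autoencoding(clean)
# loss = mse(model(corrupted)[mask], clean[mask])  # once the encoder exists
```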

JuryRoom

  • 5:00 Meeting? Or just online?

Phil 3.2.21

Respond to Alden’s email – done

AI Coffee Break with Letitia

Gotta check out Graph Neural Networks!

GOES

  • Status report! Done!
  • Create a new class based on utils/ScriptReaderScratch that uses the code from least_squares_rotations.py to create data for training
  • Attend the GSAW welcome and overview at 11:50 – missed it
  • Create a more generic generator based on timeseriesML2\generators that will create a numpy ndarray of n-dimensional time series data (sketched after this list). Could also use a DataFrame and have labels.
    • Randomized start, within a range
    • Adjustable noise
    • Adjustable time step
    • Different function for each row
    • Input file driven
    • Saves to CSV (with a header that describes the data?) or an Excel file for humans. Use the to_excel() code from EccoToXlsx for this
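A rough sketch of that generator, with a small in-code spec standing in for the input file. The functions, noise level, and file names are placeholders, and plain pandas to_csv()/to_excel() stands in for the EccoToXlsx helper (to_excel needs openpyxl installed).

```python
import numpy as np
import pandas as pd

# Placeholder spec -- in practice this would be read from an input file.
spec = {"steps": 200, "time_step": 0.1, "noise": 0.05,
        "start_range": (0.0, 2.0),
        "rows": {"sin_slow": np.sin, "sin_fast": lambda t: np.sin(3 * t),
                 "cos": np.cos}}

rng = np.random.default_rng()
t0 = rng.uniform(*spec["start_range"])                  # randomized start
t = t0 + np.arange(spec["steps"]) * spec["time_step"]   # adjustable time step

data = {name: fn(t) + rng.normal(0.0, spec["noise"], spec["steps"])
        for name, fn in spec["rows"].items()}           # one function per series
df = pd.DataFrame(data, index=t)                        # columns hold the series

df.to_csv("generated_series.csv", index_label="time")     # machine-readable
df.to_excel("generated_series.xlsx", index_label="time")  # human-readable
```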

GPT Agents

  • Run an Ecco experiment and create spreadsheets using the chess data – done
https://viztales.files.wordpress.com/2021/03/image-3.png
  • After that, back up the gpt_experiments and commit to svn – done
  • Make sure that the following are on the laptop for the 3:00 Meeting – done
    • updated gpt_experiments
    • small_feb2021
  • Uploading trained models to svn. When the last one is done, zip the whole batch and put it on DropBox
  • I think I know how to contribute to a project that I’m not a member of. I need to clone the project to my repo and work on that version. When it’s in a state that I like, I can do a pull request. That means there is going to be one version of the source project in External and my branch in Sandboxes

Phil 3.1.2021

I reran my monthly COVID-19 visualizations. Here’s my sample of countries. The UK is at the top of the ‘badly handled’ cluster, which includes the USA, Italy, Sweden, France and Switzerland. Germany is a bit better, and Canada really seems to be keeping things under control. The bottom cluster ranges from Finland to Senegal to China. Effective policy doesn’t seem to be related to government, wealth, population or location:

https://public.flourish.studio/visualisation/4504138/

And here’s all 50 states plus territories. I switch between Republican and Democratic governors at the end. You can see that there’s not much difference except for Georgia. Something has gone horribly wrong there:

https://public.flourish.studio/visualisation/4303726/

GPT Agents

  • Running Ecco trend analysis with the new model that Sim made
    • I think there is a multiple embedding problem that we’ll need to address.
    • It looks really good though…
https://viztales.files.wordpress.com/2021/03/image-1.png
  • Still training monthly models. At October 2020 now. It takes a bit under 10 hours to train most models