Free energy is the difference between the states you expect to be in and the states your sensors tell you that you are in. Or, to put it another way, when you are minimizing free energy, you are minimizing surprise.
Running the training for the new models
Added the meta-summary spreadsheet:
Need to re-run these tests on the new models using more runs and no rank testing
9:30 Meeting – Looks like I need to get 50% coverage? Maybe in medical?
More PyTorch tutorial
Need to upgrade the ASRC box to 1.8 when it finishes training the current models
Quotebank is a dataset of 178 million unique, speaker-attributed quotations that were extracted from 196 million English news articles crawled from over 377 thousand web domains between August 2008 and April 2020. The quotations were extracted and attributed using Quobert, a distantly and minimally supervised end-to-end, language-agnostic framework for quotation attribution.
Stanford Cable TV News Analyzer: The Stanford Cable TV Analyzer enables you to write queries that compute the amount of time people appear and the amount of time words are heard in cable TV news. In this tutorial we will go over the basics of how to use the tool to write simple queries.
Finished experiments and generated spreadsheets.
Uploading everything to DropBox
Create datasets from tweets that have [‘%kung flu%’, ‘%kungflu%’, ‘%china virus%’, ‘%chinavirus%’, ‘%coronavirus%’, ‘%covid%’, ‘%sars-cov-2%’] and train models from these. The idea is to examine how this type of polarized training can influence the response of the model. Related work on Microsoft’s Tay
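A minimal sketch of the dataset split, assuming the tweets are already available as plain strings and that each keyword gets its own training set (the note does not say whether the split is per keyword or combined). The `%...%` patterns are treated as case-insensitive substring matches, and `split_by_keyword` is a hypothetical helper name:

```python
# Keywords from the note, with the SQL-style % wildcards removed.
KEYWORDS = ["kung flu", "kungflu", "china virus", "chinavirus",
            "coronavirus", "covid", "sars-cov-2"]

def split_by_keyword(tweets, keywords=KEYWORDS):
    """Group tweets into one list per keyword (a tweet can land in several)."""
    datasets = {k: [] for k in keywords}
    for text in tweets:
        lowered = text.lower()
        for k in keywords:
            if k in lowered:  # substring match, like LIKE '%k%'
                datasets[k].append(text)
    return datasets

if __name__ == "__main__":
    sample = ["COVID numbers are up", "Nothing to see here"]
    for keyword, rows in split_by_keyword(sample).items():
        print(keyword, len(rows))
```

Each list could then be written out as the training corpus for one model.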
Create a meta-sheet for all the spreadsheet summaries
Rather than look at rankings, go back to the cumulative stats on multiple runs with top K set to the range of ranks that we want to look at, then take a look at the first n words. This addresses the token problem
Set up proxy (2:00)?
Write up curves embedding code
Start on simplest possible autoregressing Transformer using curve data
The community is very much focused on the implementation side of ML. The Aerospace Corporation is doing some really nice work merging synthetic and actual data to detect threat anomalies. Slingshot is doing really nice data fusion
I had an interesting idea come to me during the panel. It might be possible to train a large Transformer model on all mission telemetry from launch to sunset for all satellites. Then you could do zero-shot detection on new data, just like GPT-3 does.
Working on getting the meta information back to the summary tab – done
Run all models – done
I think I know how I want to try the mapping.
Use a prompt that should produce a list of nouns in order
Have the temp set reasonably high and for repetition to be low
Look at the output text and look for an N-N-N… pattern. Select those as nodes and stop when the pattern changes
Repeat and increment the edge weight for each redundant connection
Trim the leaf nodes with low counts
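The mapping steps above can be sketched roughly as follows. This assumes the generated text has already been tagged into (word, tag) pairs with "N" marking a noun; the tagger and the model call are out of scope here. The function names and the `min_count` threshold are hypothetical:

```python
from collections import Counter

def first_noun_run(tagged):
    """Return the first unbroken run of nouns; stop when the pattern changes."""
    run = []
    for word, tag in tagged:
        if tag == "N":
            run.append(word)
        elif run:
            break
    return run

def build_edges(samples):
    """Count directed edges between consecutive nouns across all samples."""
    edges = Counter()
    for tagged in samples:
        nodes = first_noun_run(tagged)
        for a, b in zip(nodes, nodes[1:]):
            edges[(a, b)] += 1  # repeated connections raise the weight
    return edges

def trim_leaves(edges, min_count=2):
    """Drop low-weight edges that touch a leaf (a node with only one edge)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {
        (a, b): w for (a, b), w in edges.items()
        if not (w < min_count and (degree[a] == 1 or degree[b] == 1))
    }

if __name__ == "__main__":
    samples = [
        [("Paris", "N"), ("Lyon", "N"), ("Nice", "N"), ("is", "V")],
        [("Paris", "N"), ("Lyon", "N"), ("Oslo", "N")],
    ]
    print(trim_leaves(build_edges(samples)))
```

Repeating the prompt many times at a higher temperature would supply the `samples` list; trimming only once at the end keeps the counts comparable across runs.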
Ping Clay about how much of my time I can bill based on current rates
Create generic multidimensional vectors for training
Arkipelago.space is a searchable map of interesting things on the Internet. The content is taken from a web crawl of 70,000 webpages originating from high-quality, human-curated links via Curius.app. A neural network uses the text content of each page to determine which pages should appear near each other on the map.
It seems to be a bunch of students playing around with cool things
Huggingface has lots of models to handle speech tagging!
Moral outrage shapes fundamental aspects of human social life and is now widespread in online social networks. Here, we show how social learning processes amplify online moral outrage expressions over time. In two pre-registered observational studies of Twitter (7,331 users and 12.7 million total tweets) and two pre-registered behavioral experiments (N = 240), we find that positive social feedback for outrage expressions increases the likelihood of future outrage expressions, consistent with principles of reinforcement learning. We also find that outrage expressions are sensitive to expressive norms in users’ social networks, over and above users’ own preferences, suggesting that norm learning processes guide online outrage expressions. Moreover, expressive norms moderate social reinforcement of outrage: in ideologically extreme networks, where outrage expression is more common, users are less sensitive to social feedback when deciding whether to express outrage. Our findings highlight how platform design interacts with human learning mechanisms to impact moral discourse in digital public spaces.
In EccoToXlsx, add code to iterate over all the samples from a prompt and add selected token ranks for the selected columns to a summary Dict. Compute mean and variance (95% intervals?), display the table and plot a candlestick plot.
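A rough sketch of the summary step, assuming each sample is a dict mapping a column (token) name to its rank. The 95% interval here uses a normal approximation (mean ± 1.96 standard errors), which is an assumption and may be a poor fit for small sample counts. `summarize_ranks` is a hypothetical name, not existing EccoToXlsx code, and the candlestick plot is left out:

```python
import math
import statistics

def summarize_ranks(samples, columns):
    """Build a summary dict of mean, variance and a 95% interval per column."""
    summary = {}
    for col in columns:
        ranks = [s[col] for s in samples if col in s]
        if not ranks:
            continue  # no data for this column
        n = len(ranks)
        mean = statistics.mean(ranks)
        # Sample variance needs at least two points.
        var = statistics.variance(ranks) if n > 1 else 0.0
        half_width = 1.96 * math.sqrt(var / n)  # normal approximation
        summary[col] = {
            "n": n,
            "mean": mean,
            "variance": var,
            "ci95_low": mean - half_width,
            "ci95_high": mean + half_width,
        }
    return summary

if __name__ == "__main__":
    runs = [{"e4": 1, "d4": 3}, {"e4": 3, "d4": 3}, {"e4": 5, "d4": 3}]
    for col, stats in summarize_ranks(runs, ["e4", "d4"]).items():
        print(col, stats)
```

The low, mean and high values per column are the numbers a candlestick-style plot would need.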
Set up a mapping directory in GPT-2 Agents. Do some test pulls using the Python API. I think the goal should be to populate a database that is similar to the gpt2_chess db table_moves (from, to, probe, response),
Combined with table_output from gpt_experiments (experiment_id, root_id, tag, before_regex, and after_regex):
Work on chapters
Work on fast sim
Finish moving code from frame3d_test file to FastRCSGenerator. Keep the plots too, just to make sure everything’s working. Done
Realized that the pitch/roll/yaw calculations were being done by ODE, so I had to get them back from the quaternion. It turns out that pyquaternion has yaw_pitch_roll(), but I can’t get to it? Added it to the VecData code
Figured it out. The @property decorator means no parens. You treat a method as a variable
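A small illustration of the @property behavior, using a stand-in class rather than pyquaternion itself. `Orientation` is hypothetical, and the angle formulas assume a unit quaternion and the common Z-Y-X (yaw, pitch, roll) convention:

```python
import math

class Orientation:
    """Toy unit quaternion (w, x, y, z) exposing Euler angles as a property."""

    def __init__(self, w, x, y, z):
        self.w, self.x, self.y, self.z = w, x, y, z

    @property
    def yaw_pitch_roll(self):
        w, x, y, z = self.w, self.x, self.y, self.z
        yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
        # Clamp to guard against rounding just outside [-1, 1].
        pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
        roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
        return yaw, pitch, roll

if __name__ == "__main__":
    # 90 degree rotation about the z axis.
    q = Orientation(math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(q.yaw_pitch_roll)  # accessed like an attribute, no parentheses
```

Writing `q.yaw_pitch_roll()` would try to call the returned tuple and raise a TypeError, which matches the "can't get to it" symptom.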
I don’t think I’m incrementally updating the quaternion correctly.
Turns out I was rotating twice and storing the incremental steps as the rotations. Fixed!
WaPo summary article: What explains MAGA supporters’ commitment to Trump and his conspiratorial and racist views? The answer is “status threat,” or the belief that one’s way of life or status is undermined by social and cultural change. As we’ve shown elsewhere, those who are attracted to reactionary movements like MAGA are often motivated by anxiety about possible cultural dispossession, seeing their social and cultural dominance eclipsed by other groups.
Create a new class based on utils/ScriptReaderScratch that uses the code from least_squares_rotations.py to create data for training
Attend the GSAW welcome and overview at 11:50 – missed it
Create a more generic generator based on timeseriesML2\generators that will create a numpy ndarray of n-dimensional time series data. Could also use a DataFrame and have labels.
Randomized start, within a range
Adjustable time step
Different function for each row
Input file driven
Saves to csv (with a header that describes the data?) or an excel file for humans. Use the to_excel() code from EccoToXlsx for this
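One possible shape for the generator, as a sketch only. It covers the randomized start, adjustable time step, per-row functions and a csv with a header row; the input-file-driven part and the Excel export are left out. `generate_series` and `to_csv_text` are hypothetical names, and the header layout (a label column followed by t0, t1, ...) is an assumption:

```python
import csv
import io
import math
import random

import numpy as np

def generate_series(funcs, num_steps, time_step=1.0,
                    start_range=(0.0, 1.0), seed=None):
    """Return an ndarray of shape (len(funcs), num_steps), one function per row."""
    rng = random.Random(seed)
    data = np.zeros((len(funcs), num_steps))
    for row, func in enumerate(funcs):
        start = rng.uniform(*start_range)  # randomized start within a range
        times = start + time_step * np.arange(num_steps)
        data[row] = [func(t) for t in times]
    return data

def to_csv_text(data, labels):
    """Serialize rows to csv text with a header describing the columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["label"] + [f"t{i}" for i in range(data.shape[1])])
    for label, row in zip(labels, data):
        writer.writerow([label] + [f"{v:.6f}" for v in row])
    return buf.getvalue()

if __name__ == "__main__":
    series = generate_series([math.sin, math.cos], num_steps=5,
                             time_step=0.1, seed=1)
    print(to_csv_text(series, ["sin", "cos"]))
```

Passing a list of callables keeps "different function for each row" simple; an input file could later map names to those callables.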
Run an Ecco experiment and create spreadsheets using the chess data – done
After that, back up the gpt_experiments and commit to svn – done
Make sure that the following are on the laptop for the 3:00 Meeting – done
Uploading trained models to svn. When the last one is done, zip the whole batch and put it on DropBox
I think I know how to contribute to a project that I am not a member of. I need to clone the project to my repo and work on that version. When I’m at a state that I like, then I can do a pull request. That means there is going to be one version of the source project in External and my branch in Sandboxes
I reran my monthly COVID-19 visualizations. Here’s my sample of countries. The UK is at the top of the ‘badly handled’ cluster, which includes the USA, Italy, Sweden, France and Switzerland. Germany is a bit better, and Canada really seems to be keeping things under control. The bottom cluster ranges from Finland to Senegal to China. Effective policy doesn’t seem to be related to government, wealth, population or location:
And here’s all 50 states plus territories. I switch between Republican and Democratic governors at the end. You can see that there’s not much difference except for Georgia. Something has gone horribly wrong there:
Running Ecco trend analysis with the new model that Sim made
I think there is a multiple embedding problem that we’ll need to address.
It looks really good though…
Still training monthly models. At October 2020 now. It takes a bit under 10 hours to train most models