Category Archives: Machine Learning

Phil 8.13.20

Ride through the park today and ask about pavilion rental – done

Māori Pronunciation

Iñupiaq (Inupiatun)

GPT-2 Agents

  • Rewrite intro, including the finding that these texts seem to be matched in some way. Done
  • Uploaded the new version to ArXiv. Should be live by tomorrow
  • Read Language Models as Knowledge Bases, and added to the lit review.
  • Discovered Antoine Bosselut, who was lead author on the following papers. Need to add them to the future work section
    • Dynamic Knowledge Graph Construction for Zero-shot Commonsense Question Answering
      • Understanding narratives requires dynamically reasoning about the implicit causes, effects, and states of the situations described in text, which in turn requires understanding rich background knowledge about how the social and physical world works. At the core of this challenge is how to access contextually relevant knowledge on demand and reason over it.
        In this paper, we present initial studies toward zero-shot commonsense QA by formulating the task as probabilistic inference over dynamically generated commonsense knowledge graphs. In contrast to previous studies for knowledge integration that rely on retrieval of existing knowledge from static knowledge graphs, our study requires commonsense knowledge integration where contextually relevant knowledge is often not present in existing knowledge bases. Therefore, we present a novel approach that generates contextually relevant knowledge on demand using generative neural commonsense knowledge models.
        Empirical results on the SocialIQa and StoryCommonsense datasets in a zero-shot setting demonstrate that using commonsense knowledge models to dynamically construct and reason over knowledge graphs achieves performance boosts over pre-trained language models and using knowledge models to directly evaluate answers.
    • COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
      • We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward automatic commonsense completion is the development of generative models of commonsense knowledge, and propose COMmonsEnse Transformers (COMET) that learn to generate rich and diverse commonsense descriptions in natural language. Despite the challenges of commonsense modeling, our investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs. Empirical results demonstrate that COMET is able to generate novel knowledge that humans rate as high quality, with up to 77.5% (ATOMIC) and 91.7% (ConceptNet) precision at top 1, which approaches human performance for these resources. Our findings suggest that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.


  • 10:00 sim status meeting – planning to fully evaluate off-axis rotation by Monday, then characterize Rwheel contribution, adjust the control system and start commanding vehicle rotations by the end of the week? Seems ambitious, but what the hell.
  • 2:00 status meeting
  • Anything about GVSETS? Yup: Meeting Wed 9/16/2020 9:00 AM – 10:00 AM


  • 5:30 meeting. Discuss proposal and additional meetings


  • Transfer more content

Phil 8.12.20

4:00 meeting with Marissa


  • Nearly 150k tweets translated
  • Meeting today at 3:00
    • For next week, create a spreadsheet with 150 random tweets for translation tests (Pearson’s correlation?)
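One way the Pearson’s correlation idea could be tried, as a minimal sketch: assume each sampled tweet gets a numeric quality score from two sources (say, a human rater and some automatic measure, both hypothetical here), and correlate the two lists. The function name and the scoring scheme are illustrative assumptions, not something already in the project.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores (made up for illustration): one entry per sampled tweet
human_scores = [4.0, 2.0, 5.0, 3.0, 1.0]
automatic_scores = [3.5, 2.5, 4.5, 3.0, 1.5]
r = pearson_r(human_scores, automatic_scores)
```

A value of r near 1 would suggest the two scorings rank the translations similarly. With 150 tweets the same call applies unchanged.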


  • Creating code that dumps probe results to files for better word clouds – done.
  • Boy, that produced some unexpected results. I changed the probe to “The man/woman walked into the room.”, and ran 100 probes for each, initializing with the same seed on the 117 million parameter GPT-2, which seems plenty large for these types of experiments.
  • The big surprise is how similar the start of the text is for each, and then how it diverges. Here’s the first result for each probe (all text examples are ordered man, then woman):
    • This was about the time that an old friend of mine made his move and was going to be a huge help for me. I thought to myself and began crying. The thought of an old friend crying seemed almost too much, but I didn’t want to hurt him as much. My mind was racing, but all I could think about was how much the boy was hurting right now. He still wasn’t strong,
    • This was about the time that an accident really occurred. She had just turned up a couple hours earlier at the home they owned and hadn’t seen her since. She had a bad haircut and she couldn’t sit still and look after herself. When the accident happened, she didn’t even have her hair cut. She was so exhausted, she started crying. She was so exhausted and crying. After
  • And here’s the last:
    • He spoke, “I can’t believe there’s this situation here. This is very, very weird.” I told him I felt scared, and he replied, “You’re probably right about that, but I don’t know what it’s about.” He didn’t respond, and I was left thinking, “I’m sure I’m not alone in this crazy situation.” The next day, I went back to meet him. “He’s
    • She spoke, “I can’t believe there’s this situation here. This is very, very painful.” I told her I felt scared and I was upset. She said “I’ll tell you what I saw when I saw him.” I told her “He’s so skinny but the black hair looks gorgeous, he looks beautiful. “She started crying and began giving me some food and asking if I were alright. “
  • Here’s another good pairing:
    • It wasn’t that it was a bad situation. He just wasn’t feeling it. He felt that he wasn’t going to get laid, and if anything, he didn’t think it would help him get off. “We’ll go, then,” the woman said. There was still an argument at the back, but now it wasn’t too much worse. The woman had been arguing with the man, but the man was not
    • It wasn’t that it was a bad situation. That just wasn’t the case. She was just a little shy and reserved and didn’t really need anything to do with it. I had been on the phone with Amy for the last week. When I found out that she wanted to join me in our recent trip I was pretty bummed out. That’s when Amy started to feel bad about herself. For
  • The pairs don’t always begin with the same words. This pair still seems to have a legal/criminal context:
    • He handed out a number of hand-written handbags, which he claimed could be used to carry out a search that began when he first moved into the room. The door was then locked and the man went into the bathroom, which he said was the safest place. It was the first time the intruder had ever been caught, and the man said it was the second time they’d been spotted outside the residence. The man was
    • The two officers were holding their guns. As the woman made her way to her seat, she saw two men on a motorcycle walking towards her. She asked the man why he was not in the car with her. The man explained that he was afraid of the two men driving. The officers explained that she had to have sex and to stay with the men. The woman was terrified of the officers as the men drove away with their cameras and other equipment
  • It’s like the model starts in similar places, but points in a slightly different direction. It seems to be important to run probes in identical sequences to gain more insight into the way the model is perceiving the probes.


  • 1:30 Meeting with Vadim. He’s done an on-axis mass test and will do an off-axis test next. I showed him the quaternion frame tracker
  • 2:00 Status meeting


  • Start moving chapters! Making progress!

Phil 8.11.20

Zero-Shot Learning in Modern NLP

  • In this post, I will present a few techniques, both from published research and our own experiments at Hugging Face, for using state-of-the-art NLP models for sequence classification without large annotated training sets.

Found a really good dashboard for US economic indicators:



  • I think I realize my problem about the second axis. It’s not rotating around the origin, so the vectors that I’m using to create the rotation vectors are not right.
  • Fixed! Here are some rotations (180 around Z, 90 around x, and 360 around z, 180 around x)
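The general version of this kind of fix is to shift the point so the pivot sits at the origin, rotate, then shift back. A minimal sketch, assuming a rotation around the Z axis and a hypothetical pivot point (the function names are illustrative, not from the tracker code):

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def rotate_z(p: Vec3, angle_deg: float) -> Vec3:
    """Rotate p around the Z axis through the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


def rotate_z_about_pivot(p: Vec3, pivot: Vec3, angle_deg: float) -> Vec3:
    """Rotate p around a Z-parallel axis that passes through pivot."""
    # 1) express the point relative to the pivot
    shifted = (p[0] - pivot[0], p[1] - pivot[1], p[2] - pivot[2])
    # 2) rotate around the origin
    r = rotate_z(shifted, angle_deg)
    # 3) move back into the original coordinates
    return (r[0] + pivot[0], r[1] + pivot[1], r[2] + pivot[2])
```

Skipping steps 1 and 3 rotates around the world origin instead of the pivot, so the point swings through a much larger arc than intended.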

GPT-2 Agents

  • I did 11 runs of S/He walked into the room and made word clouds:
  • I’m going to re-run this on my GPT-2 so I can have a larger N. Just need to do some things to the test code to output to a file


  • Finished the last review. The last paper was an ontological model with no computation in it
  • Uploaded and finished!

ML seminar

  • I have access to the Twitter data now. Need to download and store it in the db
  • Presentation next week

Phil 8.10.20

Really good weekend. I feel almost recharged compared to this time last week 🙂

Language Models as Knowledge Bases (Via Shimei)

  • Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as “fill-in-the-blank” cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at this https URL.


  • Currently at 103,951 tweets translated


  • Write reference section – done


  • I need to do an incremental rotation to track the reference points from last week
  • Still having problems with the secondary rotation. I’m clearly doing something basic wrong
  • Meeting with Vadim

GPT-2 Agents

  • Create a word cloud for multiple passes of “She came into the room”
  • Add something about the place for qualitative research in a language model sociology. Outliers are the places that the models learn to ignore. So traditional research will be the way that these marginalized populations are not forgotten.
  • Screwed up the ArXiv bibliography submission. Fixed


  • Started reading the last paper, which is on <shudder> ontologies

Phil 8.7.20


  • The Arabic translation program is chunking along. It’s translated over 27,000 tweets so far. I think I’m seeing the power and risks of AI/ML in this tiny example. See, I’ve been programming since the late 1970s, in many, many languages and environments, and the common thread in everything I’ve done was the idea of deterministic execution. That’s the idea that you can, if you have the time and skills, step through a program line by line in a debugger and figure out what’s going on. It wasn’t always true in practice, but the idea was conceptually sound.
  • This translation program is entirely different. To understand why, it helps to look at the code:


  • This is the core of the code. It looks a lot like code I’ve written over the years. I open a database, get some lines, manipulate them, and put them back. Rinse, lather, repeat.
  • That manipulation, though…
  • The six lines in yellow are the Huggingface API, which allow me to access Microsoft’s Marian Neural Machine Translation models, and have them use the pretrained models generated by the University of Helsinki. The one I’m using translates Arabic (src = ‘ar’) to English (trg = ‘en’). The lines that do the work are in the inner loop:
    batch = tok.prepare_translation_batch(src_texts=[d['contents']])
    gen = model.generate(**batch)  # for forward pass: model(**batch)
    words: List[str] = tok.batch_decode(gen, skip_special_tokens=True)
  • The first line is straightforward. It converts the Arabic words to tokens (numbers) that the language model works in. The last line does the reverse, converting result tokens to English.
  • The middle line is the new part. The input vector of tokens goes to the input layer of the model, where it gets sent through a 12-layer, 512-hidden, 8-heads, ~74M parameter model. Tokens that can be converted to English pop out the other side. I know (roughly) how it works at the neuron and layer level, but the idea of stepping through the execution of such a model to understand the translation process is meaningless.
  • In the time it took to write this, it’s translated about 1,000 more tweets. I can have my Arabic-speaking friends do a sanity check on a sample of these words, but we’re going to have to trust the overall behavior of the model to do our research, because some of these systems only work on English text.
  • So we’re trusting a system that we cannot verify to do research at a scale that would otherwise be impossible. If the model is good enough, the results should be valid. If the model behaves poorly, then we have bad science. The problem is that right now there is only one Arabic to English translation model available, so there is no way to statistically examine the results for validity.
  • And I guess that’s really how we’ll have to proceed in this new world where ML becomes just another API. Validity of results will depend on diversity of model architectures and training sets. That may occur naturally in some areas, but in others, there may only be one model, and we may never know the influences that it has on us.
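The deterministic shell around the model (open a database, get some lines, manipulate them, put them back) can be sketched as below. This is not the actual program: it uses an in-memory SQLite database, a hypothetical `tweets` table with a hypothetical `english` column, and a placeholder `fake_translate` standing in for the tokenize / generate / decode calls. Only the `contents` column name comes from the excerpt above.

```python
import sqlite3
from typing import Callable, List


def translate_pending(conn: sqlite3.Connection,
                      translate: Callable[[str], str],
                      batch_size: int = 100) -> int:
    """Translate up to batch_size untranslated rows; return how many were done."""
    rows = conn.execute(
        "SELECT id, contents FROM tweets WHERE english IS NULL LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, contents in rows:
        # The only non-deterministic, non-steppable part is inside translate()
        conn.execute("UPDATE tweets SET english = ? WHERE id = ?",
                     (translate(contents), row_id))
    conn.commit()
    return len(rows)


def fake_translate(text: str) -> str:
    """Placeholder for the model call; it just upper-cases the text."""
    return text.upper()


def demo() -> List[str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (id INTEGER PRIMARY KEY, contents TEXT, english TEXT)")
    conn.executemany("INSERT INTO tweets (contents) VALUES (?)",
                     [("abc",), ("def",)])
    # Rinse, lather, repeat until nothing is left to translate
    while translate_pending(conn, fake_translate, batch_size=1):
        pass
    return [r[0] for r in conn.execute("SELECT english FROM tweets ORDER BY id")]
```

Everything in this loop can be stepped through in a debugger. The opacity discussed above lives entirely behind the `translate` callable.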


  • More quaternions. Need to do multiple axis movement properly. Can you average two quaternions and have something meaningful?
  • Here’s the reference frame with two rotations based off of the origin, so no drift. Now I need to do an incremental rotation to track these points:
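On the averaging question: for two unit quaternions, a component-wise average followed by renormalization does give the halfway rotation, as long as the pair is first put in the same hemisphere (q and -q encode the same rotation) and the two are not exactly opposite. A minimal sketch under those assumptions, with (w, x, y, z) ordering and hypothetical helper names:

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z)
Vec3 = Tuple[float, float, float]


def from_axis_angle(axis: Vec3, angle_deg: float) -> Quat:
    """Unit quaternion for a rotation of angle_deg around axis."""
    length = math.sqrt(sum(a * a for a in axis))
    if length == 0:
        raise ValueError("axis must be non-zero")
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / length
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def normalize(q: Quat) -> Quat:
    n = math.sqrt(sum(c * c for c in q))
    if n < 1e-12:
        raise ValueError("cannot normalize a near-zero quaternion")
    return (q[0] / n, q[1] / n, q[2] / n, q[3] / n)


def average_two(q1: Quat, q2: Quat) -> Quat:
    """Halfway rotation between two unit quaternions (normalized sum)."""
    dot = sum(a * b for a, b in zip(q1, q2))
    if dot < 0:
        # Flip one input so both lie in the same hemisphere
        q2 = (-q2[0], -q2[1], -q2[2], -q2[3])
    return normalize((q1[0] + q2[0], q1[1] + q2[1],
                      q1[2] + q2[2], q1[3] + q2[3]))
```

For example, averaging a 0 degree and a 90 degree rotation around Z this way matches a 45 degree rotation around Z. Averaging more than two quaternions, or using unequal weights, needs more care than this.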


GPT-2 Agents

  • Start digging into knowledge graphs

Phil 8.6.20

Coronavirus: The viral rumours that were completely wrong (BBC)

An ocean of Books (Google Arts & Culture Experiments)


Hopfield Networks is All You Need

  • We show that the transformer attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially small retrieval errors. The number of stored patterns is traded off against convergence speed and retrieval error. The new Hopfield network has three types of energy minima (fixed points of the update): (1) global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points which store a single pattern. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they operate in higher layers in metastable states. The gradient in transformers is maximal for metastable states, is uniformly distributed for global averaging, and vanishes for a fixed point near a stored pattern. Using the Hopfield network interpretation, we analyzed learning of transformer and BERT models. Learning starts with attention heads that average and then most of them switch to metastable states. However, the majority of heads in the first layers still averages and can be replaced by averaging, e.g. our proposed Gaussian weighting. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information created in lower layers. These heads seem to be a promising target for improving transformers. Neural networks with Hopfield networks outperform other methods on immune repertoire classification, where the Hopfield net stores several hundreds of thousands of patterns. We provide a new PyTorch layer called “Hopfield”, which allows to equip deep learning architectures with modern Hopfield networks as a new powerful concept comprising pooling, memory, and attention. GitHub: this https URL

Can GPT-3 Make Analogies? by Melanie Mitchell (Medium, Aug 2020)


  • Going to try to get the translator working and inserting best effort into the DB. Then we can make queries for the good results. Done! Here’s a shot of it chunking away. About one translation a second:



  • Work on quaternion frame tracking
  • This might help with visualization:
  • Updating my work box. Had a weird experience upgrading pip. It hit a permissions issue and failed out without rolling back. I had to use to get it back
  • Looking good:



  • 5:30(?) meeting
  • Project grant application


  • Write review – done. One to go!


Phil 8.5.20

Wajanat’s defense at 10:00!

Train your TensorFlow model on Google Cloud using TensorFlow Cloud


How QAnon Creates a Dangerous Alternate Reality

  • Game designer Adrian Hon says the conspiracy theory parallels the immersive worlds of alternate reality games.

GPT-2 Agents

  • Finish the results section – done! Need to do Discussion (done!), Future Work (done!), and Conclusions (done!)
  • Looked on Scholar for “language model sociology GPT” and didn’t find anything, so I’m hopeful that this is still a pretty novel idea


  • Add in more content to the Overleaf project


  • 2:00 Meeting

#COVID group 4:30

  • Write translator code for tomorrow and get that running

Read paper 5 – done. Started great but no results section!

Phil 8.4.20

Vadim is on vacation, so I’m going to focus on my paper. When I get back to the angle interpolation, I need to make sure that I can rotate a point in any plane using the cross product vector + angle technique. I’m pretty sure that taking start vec X stop vec gives me a right-hand vector which should have the direction I want to rotate built in. Anyway, that’s your job to figure out, future self!
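A note for future self, as a sketch rather than a settled answer: the cross product of the start and stop vectors gives a right-handed rotation axis, the angle comes from atan2 of the cross product length and the dot product, and Rodrigues’ rotation formula then rotates any point around that axis. The helper names below are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def axis_and_angle(start: Vec3, stop: Vec3) -> Tuple[Vec3, float]:
    """Unit axis (start x stop) and the angle in radians from start to stop."""
    c = cross(start, stop)
    length = math.sqrt(dot(c, c))
    if length < 1e-12:
        raise ValueError("start and stop are parallel; the axis is undefined")
    axis = (c[0] / length, c[1] / length, c[2] / length)
    return axis, math.atan2(length, dot(start, stop))


def rotate(v: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rodrigues' formula: rotate v by angle (radians) around a unit axis."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return (v[0] * cos_a + k_cross_v[0] * sin_a + axis[0] * k_dot_v * (1 - cos_a),
            v[1] * cos_a + k_cross_v[1] * sin_a + axis[1] * k_dot_v * (1 - cos_a),
            v[2] * cos_a + k_cross_v[2] * sin_a + axis[2] * k_dot_v * (1 - cos_a))
```

Rotating the start vector by a fraction of the returned angle gives an interpolated direction in the plane of start and stop. The parallel case (including exactly opposite vectors) has no unique axis and has to be handled separately.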

Talking to Stacy about podcasts, and listened to her suggestion of Unladylike. For some reason, that made me think of the accessibility of the arguments and suggestions for how to make feminism work. There is this paper, Past, Present and Future of User Interface Software Tools, that talks about this idea of threshold (the amount of work to achieve basic competency) and ceiling (the maximum capability of the system). Political systems are a population-scale interface, and these concepts should apply?

GPT-2 Agents

  • Add something to graph creation that talks about how the network has a roughly topological relationship to the chessboard. The orientation can be rotated or flipped, and it resembles a rubber sheet, but adjacent parts are generally adjacent.
  • Write up navigation results section. Introduce what it means to navigate, then the algorithms, then the plot on the chessboard of the two legal routes. Note that the two moves are linear diagonals in the actual and reconstructed chessboard
  • In the discussion, emphasize how the chess language model is an embodiment of human bias that is encoded in the trajectories that are chosen, like the two-square first (rook)  move
  • Learning how to do pseudocode in LaTeX. Trying out algorithm2e. I think it actually looks pretty good.
  • Mostly finished the results section. Need to do Discussion, Future Work, and Conclusions tomorrow

ML Seminar

  • Good meeting. I might have access to Twitter COVID data!
  • I also realized that it is August and not September. Which means that instead of a week until submission, I have a MONTH and a week until submission

Write review for paper #4 – done! Two to go

Phil 8.3.20

I found Knuth’s version of “how to write a paper”!


GPT-2 Agents

  • Writing paper


  • Status report – done
  • More quaternions. Got the reference frame doing what I want:


  • Here it’s starting at -45 (rotated around the Y axis) and 0, rotated around the Z. The Z axis is rotated 10 degrees per step. When Z is between 90 and 180, Y is rotated to 0. When Z > 180, Y is set to 45
  • I’ve started to add the tracking, and it’s close-ish:


ICTAI 2020

  • Starting next paper – finished reading. It’s pretty bad…

Phil 7.29.20

Call bank – 1-800-399-5919 Opt.2

Mindfulness is the intentional use of attention – Stanford business school professor Dr. Laurie Weiss (maybe this?). From Commonwealth Club podcast, second half

I think the difference between intentional and unintentional attention is an important part of AI and collective thought. Machine learning is starting to exploit unintentional attention. It’s reflexive. A population which is not being intentional in their attention is more easily herded.

SimplE Embedding for Link Prediction in Knowledge Graphs

  • Knowledge graphs contain knowledge about the world and provide a structured representation of this knowledge. Current knowledge graphs contain only a small subset of what is true in the world. Link prediction approaches aim at predicting new links for a knowledge graph given the existing links among the entities. Tensor factorization approaches have proved promising for such link prediction problems. Proposed in 1927, Canonical Polyadic (CP) decomposition is among the first tensor factorization approaches. CP generally performs poorly for link prediction as it learns two independent embedding vectors for each entity, whereas they are really tied. We present a simple enhancement of CP (which we call SimplE) to allow the two embeddings of each entity to be learned dependently. The complexity of SimplE grows linearly with the size of embeddings. The embeddings learned through SimplE are interpretable, and certain types of background knowledge can be incorporated into these embeddings through weight tying. We prove SimplE is fully expressive and derive a bound on the size of its embeddings for full expressivity. We show empirically that, despite its simplicity, SimplE outperforms several state-of-the-art tensor factorization techniques. SimplE’s code is available on GitHub at

Knowledge base construction

  • Knowledge base construction (KBC) is the process of populating a knowledge base (KB) with facts (or assertions) extracted from data (e.g., text, audio, video, tables, diagrams, …). For example, one may want to build a medical knowledge base of interactions between drugs and diseases, a Paleobiology knowledge base to understand when and where did dinosaurs live, or a knowledge base of people’s relationships such as spouse, parents or sibling. DeepDive can be used to facilitate KBC.

GPT-2 Agents

  • More writing
  • Make a version of the chessboard with the coarse and granular trajectories


  • Adjust chapters


  • Continue working on mapping transitions between coordinate frames
  • Plotting rotations – it’s working, though not exactly in the way I was expecting:
  • Duh. That’s the overlap of the positive X and Z points, which are in the same plane and 90 degrees out of phase
  • 2:00 Meeting
    • Status and schedules


  • Finished reading the next paper. Time to write up

Phil 7.27.20

I had a good weekend. Got to ride in the mountains. Actually finished my chores, though I didn’t get to paying bills. Saw my sister – outside, 8′ apart, much more careful than last time. Went on a date.

Translating Embeddings for Modeling Multi-relational Data

  • We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. Hence, we propose, TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Besides, it can be successfully trained on a large scale data set with 1M entities, 25k relationships and more than 17M training samples.


  • Back to draft zero – grinding along


  • Check out Vadim’s rwheel results today.
  • Work on calculating the contributions from the rwheels to rotation around an arbitrary vector


  • Write up second review – done!
  • Started on third paper

Phil 7.24.20

I had home-grown tomatoes this morning!

And I hung up my shiny new diploma!

GPT-2 Agents

  • I think it’s time to start writing the paper. Something like Synthetic Agents in Language Models: Navigating belief
    • Using the IEEE(ACSOS) template
    • Set up the paper with authors and dummy text. Starting to fill in the pieces
  • Writing the methods section and needed to count the number of games (#draw + #resigns). The easiest way to do this was just to count all the word frequencies. Here are the top terms:
    to : 1474559
    from : 1472081
    moves : 1472071
    white : 1062561
    black : 1056840
    pawn : 392494
    in : 330044
    move : 307701
    takes : 307166
    rook : 258476
    knight : 250998
    bishop : 225442
    queen : 175254
    king : 173837
    pawn. : 145164
    check. : 91512


  • The list goes on a while. The most mentioned squares are d4 (56,224), d5 (53,986), and f6 (48,772)
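A count like this can be produced with a few lines of standard-library Python. This is a sketch, not the script that generated the numbers above: it assumes the move descriptions are plain text lines, lower-cased and split on whitespace (which would explain why “pawn.” and “check.” show up as their own terms). The sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_terms(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated, lower-cased terms across all lines."""
    counts: Counter = Counter()
    for line in lines:
        # No punctuation stripping, so "pawn" and "pawn." stay distinct
        counts.update(line.lower().split())
    return counts


# Hypothetical move descriptions, for illustration only
sample = [
    "White moves pawn from e2 to e4.",
    "Black moves pawn from e7 to e5.",
]
for term, n in count_terms(sample).most_common(5):
    print("{} : {}".format(term, n))
```

Counting games then reduces to looking up whichever terms mark a game’s end in the resulting Counter.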

God help me, I’m updating my IDE


  • Asked Vadim to exercise the satellite through +/- 90
  • Need to start working on the mapping of rwheels to inertial(?) frame. The thing is, the yaw axis rotates 360 degrees every day, so what frame do we use? My thinking is that the inertial frame (as defined by the star tracker) is unchanging, but we have a rotating frame inside that. The satellite’s moves are relative to that rotating frame plus the inertial frame. So the satellite’s first task is to keep its orientation relative to the rotating frame, then execute commands with respect to that frame. So a stacked matrix of inertial frame, Earth frame, vehicle matrix and then a matrix for each of the rwheels?
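The stacking idea can be sketched as a product of rotation matrices. This is a toy version under loud assumptions: the daily rotation and the vehicle yaw are both taken to be around Z, the rwheel matrices are left out, and all names are hypothetical. It only shows how a vehicle-frame vector ends up in the fixed inertial frame.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rot_z(angle_deg: float) -> Matrix:
    """3x3 rotation matrix around the Z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def vehicle_to_inertial(hours_elapsed: float, vehicle_yaw_deg: float) -> Matrix:
    """Stack the once-per-day rotating frame on top of the vehicle's own yaw."""
    rotating_frame = rot_z(360.0 * hours_elapsed / 24.0)  # assumed Z rotation
    vehicle = rot_z(vehicle_yaw_deg)  # command relative to the rotating frame
    # Rightmost matrix is applied first: vehicle, then the rotating frame
    return matmul(rotating_frame, vehicle)
```

With this ordering a commanded yaw of 0 still shows up as a steadily turning vector in the inertial frame, which matches the picture of the satellite holding its orientation relative to the rotating frame.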

Phil 7.23.20

Amid a tense meeting with protesters, Portland Mayor Ted Wheeler tear-gassed by federal agents

GPT-2 Agents

  • Good back-and-forth with Antonio about venues
  • It struck me that statistical tests about fair dice might give me a way of comparing the two populations. Pieces are roughly equivalent to dice sides. Looking at this post on the RPG Stackexchange. That led me to Pearson’s Chi-square test (which rang a bell as the sort of test I might need).
  • Success! Here’s the code:
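    from scipy.stats import chisquare, chi2_contingency, pearsonr
    import pandas as pd
    import numpy as np
    # Piece counts in column order: pawns, rooks, bishops, knights, queen, king
    # (values as echoed in the results below)
    gpt = [51394, 25962, 19242, 23334, 15928, 19953]
    twic = [49386, 31507, 28263, 31493, 22818, 23608]
    # Goodness of fit: gpt counts as observed, twic counts as expected.
    # Newer SciPy releases may reject this call because the two totals differ.
    z, p = chisquare(f_obs=gpt,f_exp=twic)
    print("z = {}, p = {}".format(z, p))
    # Same counts as a 2 x 6 contingency table
    ar = np.array([gpt, twic])
    df = pd.DataFrame(ar, columns=['pawns', 'rooks', 'bishops', 'knights', 'queen', 'king'], index=['gpt-2', 'twic'])
    print("\n", df)
    z,p,dof,expected=chi2_contingency(df, correction=False)
    print("\nNo correction: z = {}, p = {}, DOF = {}, expected = {}".format(z, p, dof, expected))
    # Yates' correction is only applied when DOF is 1, so this matches the call above
    z,p,dof,expected=chi2_contingency(df, correction=True)
    print("\nCorrected: z = {}, p = {}, DOF = {}, expected = {}".format(z, p, dof, expected))
    # Pearson correlation between the two count vectors
    cor = pearsonr(gpt, twic)
    print("\nCorrelation = {}".format(cor))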
  • Here’s the results:
    "C:\Program Files\Python\python.exe" C:/Development/Sandboxes/GPT-2_agents/gpt2agents/analytics/
    z = 8696.966788178523, p = 0.0
     [[51394 25962 19242 23334 15928 19953]
     [49386 31507 28263 31493 22818 23608]]
            pawns  rooks  bishops  knights  queen   king
    gpt-2  51394  25962    19242    23334  15928  19953
    twic   49386  31507    28263    31493  22818  23608
    No correction: z = 2202.2014776980245, p = 0.0, DOF = 5, expected = [[45795.81128532 26114.70012657 21586.92215826 24914.13916789 17606.71268169 19794.71458027]
     [54984.18871468 31354.29987343 25918.07784174 29912.86083211 21139.28731831 23766.28541973]]
    Corrected: z = 2202.2014776980245, p = 0.0, DOF = 5, expected = [[45795.81128532 26114.70012657 21586.92215826 24914.13916789 17606.71268169 19794.71458027]
     [54984.18871468 31354.29987343 25918.07784174 29912.86083211 21139.28731831 23766.28541973]]
    Correlation = (0.9779452546334226, 0.0007242538456558558)
    Process finished with exit code 0
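To see where the statistics and the “expected” table come from, the same numbers can be recomputed by hand. This is a sketch in plain Python that mirrors the textbook formulas rather than SciPy’s implementation. The helper names are illustrative, and the counts are the ones printed in the results.

```python
from typing import List, Sequence, Tuple


def goodness_of_fit(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def contingency(table: Sequence[Sequence[float]]) -> Tuple[float, int, List[List[float]]]:
    """Chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


gpt = [51394, 25962, 19242, 23334, 15928, 19953]
twic = [49386, 31507, 28263, 31493, 22818, 23608]
stat_fit = goodness_of_fit(gpt, twic)
stat_ind, dof, expected = contingency([gpt, twic])
```

The two statistics differ because the first treats the twic counts as the expected values directly, while the contingency version derives expected values from the row and column totals. The second is the better fit when the two runs have different numbers of moves.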


  • It might be time to start writing this up!


  • Found vehicle orientation mnemonics: GNC_AD_STA_FUSED_QRS#


  • 11:00 Meeting with Erik and Vadim about schedules. Erik will send an update. The meeting went well. Vadim’s going to exercise the model through a set of GOTO ANGLE 90 / GOTO ANGLE 0 for each of the rwheels, and we’ll see how they map to the primary axis of the GOES

Phil 7.21.20

Superstrata ebike

Review papers – finished reading the first, write review today. First review done!

Realized that I really need to update my online resumes to include Python and Machine Learning. Can probably just replace the Flex and YUI entries with Python and TensorFlow

Read this today: Proposal: A Market for Truth to Address False Ads on Social Media. It’s by Marshall Van Alstyne, a Questrom Chair Professor at Boston University where he teaches information economics. From the Wikipedia entry

  • Information has special characteristics: It is easy to create but hard to trust. It is easy to spread but hard to control. It influences many decisions. These special characteristics (as compared with other types of goods) complicate many standard economic theories. 
  • Information economics is formally related to game theory as two different types of games that may apply, including games with perfect information, complete information, and incomplete information. Experimental and game-theory methods have been developed to model and test theories of information economics.
  • This looks as close to the description of decisions in the presence of expensive information that I’ve seen so far

GPT-2 Agents

  • The run completed last night! I have 156,313 synthetic moves
  • Reworking the queries from the actual moves to reflect the probes for the synthetic
  • Created a view that combines the probe and the response into a description:
    create or replace view gpt_view as
        select tm.move_number, tm.color, tm.piece, tm.`from`, tm.`to`, concat(tm.probe, tm.response) as description
        FROM table_moves as tm;
  • Almost forgot to backup the db before doing something dumb
  • Created a “constraint string” that should make the game space searched somewhat more similar:
    and (move_number < 42 or description like "%White takes%" or description like "%Black takes%" or description like "%Check%")
  • Made the changes to the code and am running the analysis
  • My fancy queries are producing odd results. Pulling out the constraint string. That looks pretty good!


  • As an aside, the chess queries and extraction are based on an understanding of movement terms like ‘from’ and ‘to’. Thinking about Alex’ finding of consensus metaterms, I think it would be useful to look for movement/consensus/compromise terms and then weight the words that are nearby

ML meeting

  • Vacation pix!
  • Went over results shown above
  • Arpita found some good embedding results using Tensorboard, but not sure where to go from there?