Monthly Archives: August 2020

Phil 8.10.20

Really good weekend. I feel almost recharged compared to this time last week 🙂

Language Models as Knowledge Bases (Via Shimei)

  • Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as “fill-in-the-blank” cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of relations, are easy to extend to more data, and require no human supervision to train. We present an in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models. We find that (i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pretraining approaches. The surprisingly strong ability of these models to recall factual knowledge without any fine-tuning demonstrates their potential as unsupervised open-domain QA systems. The code to reproduce our analysis is available at this https URL.

#COVID

  • Currently at 103,951 tweets translated

JuryRoom

  • Write reference section – done

GOES

  • I need to do an incremental rotation to track the reference points from last week (a composition sketch follows this list)
  • Still having problems with the secondary rotation. I’m clearly getting something basic wrong
  • Meeting with Vadim
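A minimal sketch of what I mean by incremental rotation, with scipy standing in for my own quaternion code (names illustrative): compose a small per-step delta onto the current orientation, then re-apply it to the stored reference points.

    import numpy as np
    from scipy.spatial.transform import Rotation

    delta = Rotation.from_euler('z', 10, degrees=True)  # 10-degree step
    current = Rotation.identity()  # accumulated orientation
    ref_points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

    for step in range(9):
        current = delta * current  # accumulate the incremental rotation
        tracked = current.apply(ref_points)  # reference points in the new frame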

GPT-2 Agents

  • Create a word cloud for multiple passes of “She came into the room” (a sketch follows this list)
  • Add something about the place for qualitative research in language-model sociology. Outliers are the places that the models learn to ignore, so traditional qualitative research will be how these marginalized populations are not forgotten.
  • Screwed up the arXiv bibliography submission. Fixed.
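A minimal sketch of the word-cloud idea, assuming the transformers text-generation pipeline with the base gpt2 model and the wordcloud package (everything beyond the prompt is illustrative):

    from transformers import pipeline
    from wordcloud import WordCloud

    generator = pipeline('text-generation', model='gpt2')
    outs = generator("She came into the room", max_length=50,
                     num_return_sequences=20, do_sample=True)
    text = " ".join(o['generated_text'] for o in outs)  # pool all the passes

    wc = WordCloud(width=800, height=400).generate(text)
    wc.to_file("she_came_into_the_room.png")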

ICTAI

  • Started reading the last paper, which is on <shudder> ontologies

Phil 8.7.20

#COVID

  • The Arabic translation program is chunking along. It’s translated over 27,000 tweets so far. I think I’m seeing the power and risks of AI/ML in this tiny example. See, I’ve been programming since the late 1970s, in many, many languages and environments, and the common thread in everything I’ve done was the idea of deterministic execution. That’s the idea that you can, if you have the time and skills, step through a program line by line in a debugger and figure out what’s going on. It wasn’t always true in practice, but the idea was conceptually sound.
  • This translation program is entirely different. To understand why, it helps to look at the code:

[screenshot: translator code]

  • This is the core of the code. It looks a lot like code I’ve written over the years. I open a database, get some lines, manipulate them, and put them back. Rinse, lather, repeat.
  • That manipulation, though…
  • The six lines in yellow are the Huggingface API, which allow me to access Microsoft’s Marian Neural Machine Translation models and have them use the pretrained models generated by the University of Helsinki. The one I’m using translates Arabic (src = ‘ar’) to English (trg = ‘en’). The lines that do the work are in the inner loop (a self-contained sketch appears after this list):
    batch = tok.prepare_translation_batch(src_texts=[d['contents']])  # Arabic text -> token IDs
    gen = model.generate(**batch)  # generate translation tokens (for a raw forward pass: model(**batch))
    words: List[str] = tok.batch_decode(gen, skip_special_tokens=True)  # token IDs -> English text
  • The first line is straightforward. It converts the Arabic words to tokens (numbers) that the language model works in. The last line does the reverse, converting result tokens to English.
  • The middle line is the new part. The input vector of tokens goes to the input layer of the model, where it gets sent through a 12-layer, 512-hidden, 8-head, ~74M-parameter model. Tokens that can be converted to English pop out the other side. I know (roughly) how it works at the neuron and layer level, but the idea of stepping through the execution of such a model to understand the translation process is meaningless.
  • In the time it took to write this, it’s translated about 1,000 more tweets. I can have my Arabic-speaking friends do a sanity check on a sample of these translations, but we’re going to have to trust the overall behavior of the model to do our research, because some of the downstream systems only work on English text.
  • So we’re trusting a system that we cannot verify to do research at a scale that would otherwise be impossible. If the model is good enough, the results should be valid. If the model behaves poorly, then we have bad science. The problem is that right now there is only one Arabic-to-English translation model available, so there is no way to statistically examine the results for validity.
  • And I guess that’s really how we’ll have to proceed in this new world where ML becomes just another API. Validity of results will depend on diversity of model architectures and training sets. That may occur naturally in some areas, but in others there may only be one model, and we may never know the influences that it has on us.
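For reference, here’s a minimal, self-contained sketch of that inner loop, assuming the transformers 3.x API that prepare_translation_batch comes from; the database plumbing is omitted and names are illustrative:

    from typing import List
    from transformers import MarianMTModel, MarianTokenizer

    src, trg = 'ar', 'en'
    model_name = f'Helsinki-NLP/opus-mt-{src}-{trg}'  # University of Helsinki pretrained weights
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(text: str) -> str:
        batch = tok.prepare_translation_batch(src_texts=[text])  # words -> token IDs
        gen = model.generate(**batch)  # run the translation model
        words: List[str] = tok.batch_decode(gen, skip_special_tokens=True)  # token IDs -> words
        return words[0]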

GOES

  • More quaternions. Need to do multiple-axis movement properly. Can you average two quaternions and have something meaningful? (A slerp sketch follows below.)
  • Here’s the reference frame with two rotations based off of the origin, so no drift. Now I need to do an incremental rotation to track these points:

[image: reference_frame]
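On the averaging question above: a naive component-wise average of two quaternions isn’t generally meaningful, but slerp (spherical linear interpolation) gives a well-defined halfway rotation. A minimal sketch with scipy, purely illustrative:

    import numpy as np
    from scipy.spatial.transform import Rotation, Slerp

    # Two orientations: identity, and 90 degrees about Y.
    key_rots = Rotation.from_euler('y', [0, 90], degrees=True)
    slerp = Slerp([0.0, 1.0], key_rots)

    mid = slerp([0.5])  # halfway along the great circle between them
    print(mid.as_euler('xyz', degrees=True))  # ~[[0, 45, 0]]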

GPT-2 Agents

  • Start digging into knowledge graphs

Phil 8.6.20

Coronavirus: The viral rumours that were completely wrong (BBC)

An ocean of Books (Google Arts & Culture Experiments)

[image: bookocean]

Hopfield Networks is All You Need

  • We show that the transformer attention mechanism is the update rule of a modern Hopfield network with continuous states. This new Hopfield network can store exponentially (with the dimension) many patterns, converges with one update, and has exponentially small retrieval errors. The number of stored patterns is traded off against convergence speed and retrieval error. The new Hopfield network has three types of energy minima (fixed points of the update): (1) global fixed point averaging over all patterns, (2) metastable states averaging over a subset of patterns, and (3) fixed points which store a single pattern. Transformer and BERT models operate in their first layers preferably in the global averaging regime, while they operate in higher layers in metastable states. The gradient in transformers is maximal for metastable states, is uniformly distributed for global averaging, and vanishes for a fixed point near a stored pattern. Using the Hopfield network interpretation, we analyzed learning of transformer and BERT models. Learning starts with attention heads that average and then most of them switch to metastable states. However, the majority of heads in the first layers still averages and can be replaced by averaging, e.g. our proposed Gaussian weighting. In contrast, heads in the last layers steadily learn and seem to use metastable states to collect information created in lower layers. These heads seem to be a promising target for improving transformers. Neural networks with Hopfield networks outperform other methods on immune repertoire classification, where the Hopfield net stores several hundreds of thousands of patterns. We provide a new PyTorch layer called “Hopfield”, which allows to equip deep learning architectures with modern Hopfield networks as a new powerful concept comprising pooling, memory, and attention. GitHub: this https URL

Can GPT-3 Make Analogies? By Melanie Mitchell | Aug 2020 | Medium

#COVID

  • Going to try to get the translator working and inserting best-effort translations into the DB. Then we can make queries for the good results. Done! Here’s a shot of it chunking away. About one translation a second:

[screenshot: translation run]

GOES

  • Work on quaternion frame tracking
  • This might help with visualization: matplotlib.org/3.1.1/api/animation_api (a FuncAnimation sketch follows the image below)
  • Updating my work box. Had a weird experience upgrading pip. It hit a permissions issue and failed out without rolling back. I had to use get-pip.py to get it back
  • Looking good:

[image: rotate_to_point]
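The FuncAnimation pattern I have in mind, as a minimal sketch (matplotlib 3.1-era 3D setup; this is illustrative, not the actual GOES code):

    import numpy as np
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection
    from matplotlib.animation import FuncAnimation

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    line, = ax.plot([0, 1], [0, 0], [0, 0])
    ax.set_xlim(-1, 1); ax.set_ylim(-1, 1); ax.set_zlim(-1, 1)

    def update(frame):
        a = np.radians(frame * 10)  # 10 degrees per step
        line.set_data([0, np.cos(a)], [0, np.sin(a)])  # sweep around Z
        line.set_3d_properties([0, 0])
        return line,

    anim = FuncAnimation(fig, update, frames=36, interval=100)
    plt.show()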

JuryRoom

  • 5:30(?) meeting
  • Project grant application

ICTAI

  • Write review – done. One to go!


Phil 8.5.20

Wajanat’s defense at 10:00!

Train your TensorFlow model on Google Cloud using TensorFlow Cloud

[screenshot: import code]

How QAnon Creates a Dangerous Alternate Reality

  • Game designer Adrian Hon says the conspiracy theory parallels the immersive worlds of alternate reality games.

GPT-2 Agents

  • Finish the results section – done! Need to do Discussion (done!), Future Work (done!), and Conclusions (done!)
  • Looked on Scholar for “language model sociology GPT” and didn’t find anything, so I’m hopeful that this is still a pretty novel idea

Book

  • Add in more content to the Overleaf project

GOES

  • 2:00 Meeting

#COVID group 4:30

  • Write translator code for tomorrow and get that running

Read paper 5 – done. Started great but no results section!

Phil 8.4.20

Vadim is on vacation, so I’m going to focus on my paper. When I get back to the angle interpolation, I need to make sure that I can rotate a point in a plane using the cross-product vector + angle technique. I’m pretty sure that start vec × stop vec gives me a right-hand vector which should have the direction I want to rotate built in. Anyway, that’s your job to figure out, future self!
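For future self, a minimal sketch of that technique (vectors and names here are illustrative):

    import numpy as np
    from scipy.spatial.transform import Rotation

    start = np.array([1.0, 0.0, 0.0])
    stop = np.array([0.0, 1.0, 0.0])

    axis = np.cross(start, stop)  # right-hand rule gives the rotation direction
    axis /= np.linalg.norm(axis)
    angle = np.arccos(np.clip(np.dot(start, stop), -1.0, 1.0))

    r = Rotation.from_rotvec(axis * angle)  # axis-angle rotation
    print(r.apply(start))  # ~[0, 1, 0]: start rotates onto stop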

Talking to Stacy about podcasts, and listened to her suggestion of Unladylike. For some reason, that made me think of the accessibility of the arguments and suggestions for how to make feminism work. There is this paper, Past, Present and Future of User Interface Software Tools, that talks about the idea of threshold (the amount of work to achieve basic competency) and ceiling (the maximum capability of the system). Political systems are a population-scale interface, so these concepts should apply?

GPT-2 Agents

  • Add something to graph creation that talks about how the network has a roughly topological relationship to the chessboard. The orientation can be rotated or flipped, and it resembles a rubber sheet, but adjacent parts are generally adjacent.
  • Write up the navigation results section. Introduce what it means to navigate, then the algorithms, then the plot on the chessboard of the two legal routes. Note that the two moves are linear diagonals in both the actual and reconstructed chessboards.
  • In the discussion, emphasize how the chess language model is an embodiment of human bias that is encoded in the trajectories that are chosen, like the two-square first (rook) move.
  • Learning how to do pseudocode in LaTeX. Trying out algorithm2e. I think it actually looks pretty good. (A skeleton follows this list.)
  • Mostly finished the results section. Need to do Discussion, Future Work, and Conclusions tomorrow.
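The kind of algorithm2e skeleton I’m trying out (the contents are illustrative, not the paper’s actual pseudocode):

    \usepackage[ruled,vlined]{algorithm2e}
    % ...
    \begin{algorithm}
    \KwIn{source square $s$, target square $t$}
    \KwOut{a route from $s$ to $t$ over the move network}
    initialize the frontier with $s$\;
    \While{the frontier is not empty}{
      pop node $n$\;
      \If{$n = t$}{\Return the path to $n$\;}
      push the unvisited neighbors of $n$\;
    }
    \caption{Navigating the chessboard graph}
    \end{algorithm}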

ML Seminar

  • Good meeting. I might have access to Twitter COVID data!
  • I also realized that it is August and not September, which means that instead of a week until submission, I have a MONTH and a week until submission

Write review for paper #4 – done! Two to go

Phil 8.3.20

I found Knuth’s version of “how to write a paper”!

[image: knuth]

GPT-2 Agents

  • Writing paper

GOES

  • Status report – done
  • More quaternions. Got the reference frame doing what I want:

[image: ref_rotation]

  • Here it’s starting at -45 degrees (rotated around the Y axis) and 0 around the Z. The Z axis is rotated 10 degrees per step. When Z is between 90 and 180, Y is rotated to 0. When Z > 180, Y is set to 45. (A sketch of this schedule follows the next image.)
  • I’ve started to add the tracking, and it’s close-ish:

[image: ref_vehicle_rotation_bug]
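As a sketch of that stepped schedule (composed with scipy here rather than my quaternion code; names and the Euler order are illustrative):

    import numpy as np
    from scipy.spatial.transform import Rotation

    y_deg = -45.0
    for step in range(36):
        z_deg = step * 10.0  # Z advances 10 degrees per step
        if 90.0 <= z_deg <= 180.0:
            y_deg = 0.0
        elif z_deg > 180.0:
            y_deg = 45.0
        r = Rotation.from_euler('zy', [z_deg, y_deg], degrees=True)
        frame_x = r.apply([1.0, 0.0, 0.0])  # the rotated reference axis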

ICTAI 2020

  • Starting next paper – finished reading. It’s pretty bad…