Phil 12.1.20

Language Through a Prism: A Spectral Approach for Multiscale Language Representations (Twitter summary)

  • Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more about particular scales. Concretely, we apply spectral filters to the activations of a neuron across an input, producing filtered embeddings that perform well on part-of-speech tagging (word-level), dialog speech act classification (utterance-level), or topic classification (document-level), while performing poorly on the other tasks. We also present a prism layer for training models, which uses spectral filters to constrain different neurons to model structure at different scales. Our proposed BERT + Prism model can better predict masked tokens using long-range context and produces multiscale representations that perform better at utterance- and document-level tasks. Our methods are general and readily applicable to other domains besides language, such as images, audio, and video.
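The core trick above — band-pass filtering a neuron's activation sequence so that only word-, utterance-, or document-scale variation survives — can be sketched with an FFT. This is a minimal NumPy illustration, not the paper's implementation; the cutoff parameterization (normalized frequencies) and the toy two-sine input are my assumptions.

```python
import numpy as np

def spectral_filter(activations, low, high):
    """Band-pass filter each neuron's activations over the sequence.

    activations: (seq_len, n_neurons) array.
    low, high: cutoffs as normalized frequencies in [0, 0.5]
    (a hypothetical parameterization for illustration).
    """
    seq_len = activations.shape[0]
    freqs = np.fft.rfftfreq(seq_len)            # normalized frequency of each bin
    spectrum = np.fft.rfft(activations, axis=0)
    mask = (freqs >= low) & (freqs < high)      # keep only the chosen band
    spectrum[~mask] = 0.0
    return np.fft.irfft(spectrum, n=seq_len, axis=0)

# Toy "neuron": a slow (document-scale) plus a fast (word-scale) component.
t = np.arange(128)
acts = (np.sin(2 * np.pi * t / 128) + np.sin(2 * np.pi * t / 4))[:, None]

low_band = spectral_filter(acts, 0.0, 0.05)    # retains the slow component
high_band = spectral_filter(acts, 0.2, 0.5)    # retains the fast component
```

Because the two sines sit at exact FFT bins, the low- and high-band outputs cleanly recover the slow and fast components respectively.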

A Visual Guide to Regular Expression

https://twitter.com/emollick/status/1333571781727318019

This could be something for diversity injection?

Corporate Reporting in the Era of Artificial Intelligence

  • The researchers find that companies expecting higher levels of machine readership prepare their disclosures in ways that are more readable by this audience. “Machine readability” is measured in terms of how easily the information can be processed and parsed, with a one standard deviation increase in expected machine downloads corresponding to a 0.24 standard deviation increase in machine readability. For example, a table in a disclosure document might receive a low readability score because its formatting makes it difficult for a machine to recognize it as a table. A table in a disclosure document would receive a high readability score if it made effective use of tagging so that a machine could easily identify and analyze the content.

GPT-2 Agents

  • I want to create a database for generated output. There are two tables:
    • table_experiment – done!
      • Contains the experiment details:
        • id (key)
        • Date
        • Probe list
        • all hyperparameters
    • table_output – done!
      • id
      • experiment_id
      • root_id
      • tag (e.g. “raw”, “date”, “location”, “tweet”)
      • depth (the nesting index of each piece of content: raw is 0, and each parsed-out section increases depth by 1)
      • content
      • regexes
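The two tables above can be sketched in sqlite3 roughly as follows. Column names come from the notes; the column types (and using SQLite at all) are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for the experiments database described above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_experiment (
        id INTEGER PRIMARY KEY,
        date TEXT,
        probe_list TEXT,
        hyperparameters TEXT
    );
    CREATE TABLE table_output (
        id INTEGER PRIMARY KEY,
        experiment_id INTEGER REFERENCES table_experiment(id),
        root_id INTEGER,
        tag TEXT,       -- e.g. "raw", "date", "location", "tweet"
        depth INTEGER,  -- raw is 0; each parsed-out section adds 1
        content TEXT,
        regexes TEXT
    );
""")
```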
  • Created a gpt_experiments database. I need to make sure that I can read from one db and write to another
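Reading from one database and writing to another can be done on a single SQLite connection with ATTACH. A minimal sketch (in-memory databases stand in for the real files, and the single-column table is a simplification):

```python
import sqlite3

# Connection opens the "read" database; ATTACH adds a "write" database.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS dest")

con.execute("CREATE TABLE table_output (id INTEGER PRIMARY KEY, content TEXT)")
con.execute("CREATE TABLE dest.table_output (id INTEGER PRIMARY KEY, content TEXT)")
con.execute("INSERT INTO table_output (content) VALUES ('raw text')")

# Copy rows across databases in one statement.
con.execute("INSERT INTO dest.table_output SELECT * FROM table_output")
rows = con.execute("SELECT content FROM dest.table_output").fetchall()
```

With file-backed databases the same pattern applies: connect to the source file and ATTACH the destination file under an alias.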
  • Good results on the test. Need to try something at a larger scale to test the embeddings:
https://viztales.files.wordpress.com/2020/12/image-1.png
  • 3:30 Meeting. Get script for Antonio
    • Getting small models for the long and short training sets
    • Look into embedding visualizer
    • Send Antonio info on the COVID Twitter stream while Sim assembles the scripts

GOES

  • Register for MORS
  • Status report for November