Author Archives: pgfeldman

Phil 7.9.20

NVAE: A Deep Hierarchical Variational Autoencoder

  • Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, and CelebA HQ datasets and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels.

VAEsNotGANs

Like Two Pis in a Pod: Author Similarity Across Time in the Ancient Greek Corpus

  • One commonly recognized feature of the Ancient Greek corpus is that later texts frequently imitate and allude to model texts from earlier time periods, but analysis of this phenomenon is mostly done for specific author pairs based on close reading and highly visible instances of imitation. In this work, we use computational techniques to examine the similarity of a wide range of Ancient Greek authors, with a focus on similarity between authors writing many centuries apart. We represent texts and authors based on their usage of high-frequency words to capture author signatures rather than document topics and measure similarity using Jensen-Shannon Divergence. We then analyze author similarity across centuries, finding high similarity between specific authors and across the corpus that is not common to all languages.
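As a rough illustration of the measure named in the abstract, here is a minimal sketch of Jensen-Shannon divergence between two word-frequency distributions. The helper names, the tiny vocabulary, and the token lists are made up for illustration; the paper's actual preprocessing is not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence


def word_distribution(tokens: List[str], vocab: List[str]) -> List[float]:
    """Relative frequency of each vocab word among the tokens (hypothetical helper)."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values())
    return [counts[w] / total for w in vocab]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric divergence: 0 for identical distributions, 1 for disjoint ones (log base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy "high-frequency word" vocabulary and two toy authors (illustrative only)
vocab = ["and", "the", "of"]
author_a = ["and", "the", "and", "of"]
author_b = ["the", "the", "of", "of"]
jsd = jensen_shannon(word_distribution(author_a, vocab),
                     word_distribution(author_b, vocab))
print("JSD = {:.4f}".format(jsd))
```

Because only function words are counted, the score reflects style rather than topic, which is the point the abstract makes.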

GPT-2 Agents

  • Setting up some experiments, for real and synthetic, black and white. All values should have raw numbers and percentages:
    • Moves from each square by piece+color / total number of moves from square
    • Moves to each square by piece+color / total number of moves to square
    • Squares by piece+color / total number of pieces
    • Sequences? I’d have to add back in castling and re-run. Maybe later
    • Squares used over time (first 10 moves, second 10, etc)
    • Pieces used over time
  • Create new directory called results that will contain the spreadsheets
  • Running the first queries. It’s going to take about an hour by my estimation, but nothing is exploding as far as the queries go
  • Add a spreadsheet for illegal moves. Done! Here are the results. The GPT agents make 3 illegal moves out of 1,565:
    illegal bishop move: {'from': 'e7', 'to': 'c6'}
    illegal knight move: {'from': 'c5', 'to': 'a8'}
    illegal queen move: {'from': 'f8', 'to': 'h4'}
    Dataframe: ../results/legal_1.xlsx/legal-table_moves
             illegal  legal
    pawns          0    446
    rooks          0    270
    bishops        1    193
    knights        1    266
    queen          1    175
    king           0    212
    totals         3   1562
    Dataframe: ../results/legal_1.xlsx/legal-table_actual
             illegal   legal
    pawns          0   49386
    rooks          0   31507
    bishops        0   28263
    knights        0   31493
    queen          0   22818
    king           0   23608
    totals         0  188324
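Since all values are supposed to carry raw numbers and percentages, here is a small sketch of how the percentages could be derived from the synthetic counts above. The counts are copied from the legal-table_moves output; the function names are hypothetical and this is not the code that produced the spreadsheet.

```python
from typing import Dict, Tuple

# (illegal, legal) counts copied from the legal-table_moves output above
synthetic_counts: Dict[str, Tuple[int, int]] = {
    "pawns": (0, 446),
    "rooks": (0, 270),
    "bishops": (1, 193),
    "knights": (1, 266),
    "queen": (1, 175),
    "king": (0, 212),
}


def with_percentages(counts: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Add a per-piece total and the percentage of illegal moves."""
    table = {}
    for piece, (illegal, legal) in counts.items():
        total = illegal + legal
        pct = 100.0 * illegal / total if total else 0.0
        table[piece] = {"illegal": illegal, "legal": legal,
                        "total": total, "pct_illegal": pct}
    return table


def overall_pct_illegal(counts: Dict[str, Tuple[int, int]]) -> float:
    """Percentage of illegal moves across all pieces."""
    illegal = sum(i for i, _ in counts.values())
    total = sum(i + l for i, l in counts.values())
    return 100.0 * illegal / total


table = with_percentages(synthetic_counts)
for piece, row in table.items():
    print("{:8s} illegal = {:d}, legal = {:d}, pct_illegal = {:.3f}%".format(
        piece, row["illegal"], row["legal"], row["pct_illegal"]))
overall = overall_pct_illegal(synthetic_counts)
print("overall pct_illegal = {:.3f}%".format(overall))
```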

     

move_percentage

GOES

  • Waiting on Vadim
  • 2:00 AIMS-Core v3.0 Overview
  • Ping MARCOM

Waikato

  • 6:00 Meeting

Phil 7.8.20

A brief history of high-speed trading (via the Museum of American Finance)

  • In the late 1830s, Philadelphia broker William C. Bridges operated a private signal station between New York and Philadelphia which disseminated stock market news to him and his backers (and to no one else). The signals were transmitted through an “optical telegraph,” which consisted of a series of boards on a pole, mounted on hills that could be seen by a telescope.

DtZ

  • The IHME site has improved to the point that we should pull down our site

GPT-2 Agents

  • Need to think about how to show that interrogating a language model is sufficiently similar to interrogating actual data.
    • At this point, I know that the language model comes up with legal moves
    • I need to compare the statistics of actual moves to synthetic moves to see if the populations are sufficiently similar. This means that I need to get the training and evaluation data into the database. Once that’s done, I can compare the frequency of move types (e.g. “At move 10, White moves pawn from a2 to a4”), and the moves from a particular location (e.g. “e2” can have moves to “e3” and “e4” with the pawn, or diagonals with the “f1” bishop or the white queen).
    • The level of similarity should indicate if the biases of the players are represented in the language model.
      • There should be a way of determining a lower bound of data?
      • Once this is shown, then the idea of generalizing to other human interactions can be justified.
  • Started PGNtoDB, which will populate table_actual
    • Ignoring castling for now
    • Chunking into the database! And by chunking, I refer to the sound of the drive 🙂
    • And now I have a catalog of 188,324 human chess moves
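One way the comparison between actual and synthetic move populations might be sketched: turn each set of moves into relative frequencies, then measure how far apart the two distributions are. Total variation distance is my stand-in here, not a metric the notes commit to, and the move lists are toy data rather than rows from the database.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (color, piece, from_square, to_square)
Move = Tuple[str, str, str, str]


def move_frequencies(moves: List[Move]) -> Dict[Move, float]:
    """Relative frequency of each distinct move."""
    counts = Counter(moves)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def total_variation(p: Dict[Move, float], q: Dict[Move, float]) -> float:
    """0.0 when the distributions match, 1.0 when they share no moves."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data standing in for table_actual and the synthetic moves table
actual = [("white", "pawn", "e2", "e4")] * 3 + [("white", "pawn", "e2", "e3")]
synthetic = [("white", "pawn", "e2", "e4")] * 2 + [("white", "pawn", "e2", "e3")] * 2

distance = total_variation(move_frequencies(actual), move_frequencies(synthetic))
print("total variation distance = {}".format(distance))
```

A low distance would suggest the model reproduces the players' biases; what counts as "sufficiently similar" is still the open question.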

chess_moves_db

GOES

  • 10:00 Meeting with Vadim
  • 2:00 Status
  • Last training for a while!

Phil 7.7.20

The opportunity cost of this is going to be so steep. I wonder what country will set up an effective, open, online university?

f1

GPT-2 Agents

  • Working through the texthero examples. Spent a lot of time figuring out how to print elements from a row in a Dataframe, which was ridiculously hard. Instead, I just turned it into a dict and worked with that
    import pandas as pd
    from typing import Dict, List

    # Print the first num_rows rows of a DataFrame using the specified columns. Use -1 to print all rows
    def print_df(df:pd.DataFrame, headers:List, num_rows:int = 4, max_chars:int = 80):
        rows = 0

        d:Dict = df.to_dict('index')
        rd:Dict
        for index, rd in d.items():
            # stop before printing more than num_rows rows
            if num_rows != -1 and rows >= num_rows:
                break
            st = ""
            keys = rd.keys()
            for key in headers:
                if key in keys:
                    # convert to str so non-string values can be truncated too
                    val = str(rd[key])
                    st += "{}: {}, ".format(key, val[:max_chars])
            print(st)
            rows += 1
  • The scatterplot appears to use plotly, since it’s presented in the browser. That’s kind of cool, since it implies that the plotting functions of plotly are free somehow? After going to the plotly.com website, I see that “Plotly.py is free and open source and you can view the source, report issues or contribute on GitHub.” That would be worth digging into some more then. Here’s the PCA plot:

pca

  • You can make word clouds easily, too

WordCloud

GOES

  • Finish training? Ooops, forgot
  • Some discussion with Vadim about the structure of the control

ML Seminar

  • Good discussion on topic extraction over time. Basically, create k topics from the entire corpus. Each topic is a ranking of all the words in the corpus. Behavior over time is the amount of the top words from each topic k in each time sample t.
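A minimal sketch of the bookkeeping described in the seminar, assuming the k topics have already been produced by some topic model and are given as ranked word lists. The topic names, word lists, and time samples below are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def topic_strength_over_time(topics: Dict[str, List[str]],
                             samples: List[List[str]],
                             top_n: int = 3) -> List[Dict[str, float]]:
    """For each time sample t, the fraction of tokens that are top-n words of each topic k."""
    result = []
    for tokens in samples:
        counts = Counter(tokens)
        total = len(tokens)
        row = {}
        for name, ranked_words in topics.items():
            hits = sum(counts[w] for w in ranked_words[:top_n])
            row[name] = hits / total if total else 0.0
        result.append(row)
    return result


# Hypothetical ranked topics (most important word first) and two time samples
topics = {
    "health": ["virus", "mask", "vaccine"],
    "sports": ["goal", "team", "match"],
}
samples = [
    ["virus", "mask", "team", "the"],    # t = 0
    ["goal", "team", "match", "virus"],  # t = 1
]
strengths = topic_strength_over_time(topics, samples)
for t, row in enumerate(strengths):
    print("t = {}: {}".format(t, row))
```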

Phil 7.6.20

GPT-2 Agents

  • Search the db for the appropriate “from to” text snippet (e.g. “Black moves pawn from e2 to e3”), with a count of the number of times this move was done using that piece. Done!
  • Add a “fewest hops” (A* – traditional network approach), closest (each step finds the closest node to the target) in addition to the line following algorithm. There will have to be some user testing to see what makes the most sense, if any
  • Played around a bit with Summarization, but it didn’t work that well
  • TextHero came across my Twitter feed. It might be good for topic extraction? Trying it out, but the documentation is… sparse. Checking out
    • Installing many things:
      • unidecode
      • spacy (which installs many other things)
      • plotly (I thought you had to pay to use this?)
      • wordcloud
    • Working through the example, which is broken. Trying to fix based on the Getting Started

GOES

  • 11:00 Meeting with Vadim
  • Got the DataDictionary streaming to InfluxDB

influx_ddict

  • More Satern – one more course down

Phil 7.4.20

Starting to think about topic modeling. Here are some resources:

I also want to search the db for the appropriate “from to” text snippet (e.g. “Black moves pawn from e2 to e3”), maybe with a count of the number of times this move was done using that piece
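The lookup could be sketched like this, using an in-memory SQLite database so it runs on its own. The column names (color, piece, src, dst) are assumptions, since the real schema isn't shown here, and the inserted rows are toy data.

```python
import sqlite3
from typing import List, Tuple


def count_moves(conn: sqlite3.Connection, color: str, src: str, dst: str) -> List[Tuple[str, int]]:
    """Return (piece, count) for every piece that made this move, most frequent first."""
    cur = conn.execute(
        "SELECT piece, COUNT(*) AS n FROM table_moves "
        "WHERE color = ? AND src = ? AND dst = ? "
        "GROUP BY piece ORDER BY n DESC, piece",
        (color, src, dst))
    return cur.fetchall()


def describe(color: str, piece: str, src: str, dst: str, n: int) -> str:
    """Build the 'from to' text snippet with its count."""
    return "{} moves {} from {} to {} ({} times)".format(color.capitalize(), piece, src, dst, n)


# Hypothetical schema and toy rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_moves (color TEXT, piece TEXT, src TEXT, dst TEXT)")
conn.executemany("INSERT INTO table_moves VALUES (?, ?, ?, ?)", [
    ("white", "pawn", "e2", "e3"),
    ("white", "pawn", "e2", "e3"),
    ("white", "queen", "e2", "e3"),
    ("black", "pawn", "e7", "e5"),
])

results = count_moves(conn, "white", "e2", "e3")
for piece, n in results:
    print(describe("white", piece, "e2", "e3", n))
```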

Also, I think it makes sense to have a “fewest hops” (A* – traditional network approach), closest (each step finds the closest node to the target) in addition to the line following algorithm. There will have to be some user testing to see what makes the most sense, if any
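To make the difference between the two options concrete, here is a sketch on a toy graph. Breadth-first search stands in for the fewest-hops approach (on an unweighted graph it finds the same hop count A* would), and the greedy version simply steps to the unvisited neighbor nearest the target. The node names and positions are invented.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Adjacency = Dict[str, List[str]]
Positions = Dict[str, Tuple[float, float]]


def fewest_hops(adj: Adjacency, source: str, target: str) -> List[str]:
    """Breadth-first search: the path with the fewest edges, or [] if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return []


def greedy_closest(adj: Adjacency, pos: Positions, source: str, target: str) -> List[str]:
    """Each step moves to the unvisited neighbor closest to the target (no backtracking)."""
    path, visited, node = [source], {source}, source
    while node != target:
        candidates = [n for n in adj[node] if n not in visited]
        if not candidates:
            return []  # dead end in this simple sketch
        node = min(candidates, key=lambda n: math.dist(pos[n], pos[target]))
        visited.add(node)
        path.append(node)
    return path


# Toy map: a chain of short steps a-b-c-d, plus one long detour through "far"
pos = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0), "far": (1.5, 4)}
adj = {"a": ["b", "far"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "far"], "far": ["a", "d"]}

hops_path = fewest_hops(adj, "a", "d")
greedy_path = greedy_closest(adj, pos, "a", "d")
print("fewest hops:", hops_path)
print("closest:    ", greedy_path)
```

On this map the two answers differ: fewest hops takes the two big jumps through "far", while the closest-node walk takes three small steps along the chain.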

The map is based on the single jumps, and shows the big jumps as arcs

Phil 7.3.20

Today is a federal holiday, so no rocket science

Huggingface has a pipeline interface now that is pretty abstract. This works:

from transformers import pipeline

translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
  • [{'translation_text': 'Hugging Face est une entreprise technologique basée à New York et à Paris.'}]

Wow: GPT-3 writes code!

DtZ is back up! Too many countries have the disease and the histories had to be cropped to stay under the data cap for the free service

GPT-2 Agents

  • Work on more granular path finding
    • Going to try the hypotenuse of distance to source and line first – nope
    • Trying looking for the distances of each and doing a nested sort
    • I had a problem where I was checking to see whether a point was between the current node and the target node using the original line between the source and target nodes. Except that I was checking on a line from the current node to the target, and failing the test. Oops! Fixed
    • I went back to the hypotenuse version now that the in_between test isn’t broken and look at that!

granular

    • Added the option for coarse or granular paths
  • Start thinking about topic extraction for a given corpus

#COVID

  • Evaluate Arabic to English translation. Got it working!
    from transformers import MarianTokenizer, MarianMTModel
    from typing import List
    src = 'ar'  # source language
    trg = 'en'  # target language
    sample_text = "لم يسافر أبي إلى الخارج من قبل"
    sample_text2 = "الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة "
    mname = f'Helsinki-NLP/opus-mt-{src}-{trg}'
    
    model = MarianMTModel.from_pretrained(mname)
    tok = MarianTokenizer.from_pretrained(mname)
    batch = tok([sample_text2], return_tensors="pt", padding=True)  # don't need tgt_text for inference
    gen = model.generate(**batch)  # for forward pass: model(**batch)
    words: List[str] = tok.batch_decode(gen, skip_special_tokens=True) 
    print(words)
  • It took a few tries to find the right model. The naming here is very haphazard.
  • Asked for a sanity check from the group
    • This:
      الصحة_السعودية تعلن إصابة أربعيني بفيروس كورونا بالمدينة المنورة حيث صنفت عدواه بحالة أولية مخالطة الإبل مشيرة إلى أن حماية الفرد من(كورونا)تكون باتباع الإرشادات الوقائية والمحافظة على النظافة والتعامل مع #الإبل والمواشي بحرص شديد من خلال ارتداء الكمامة
    • Translates to this:
      Saudi health announces a 40-year-old corona virus in the city of Manora, where his enemy was classified as a primary camel conglomerate, indicating that the protection of the individual from Corona would be through preventive guidance, hygiene, and careful handling of the Apple and the cattle by wearing the gag.

       

  • Write script that takes a batch of rows and adds translations until all the rows in the table are complete
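A possible shape for that script, sketched against an in-memory SQLite table so it runs standalone. The table and column names (tweets, id, text, translation) are hypothetical, and the translate argument is a placeholder for whatever wraps the MarianMT model; it must return one non-null string per input or the loop will never finish.

```python
import sqlite3
from typing import Callable, List


def translate_pending(conn: sqlite3.Connection,
                      translate: Callable[[List[str]], List[str]],
                      batch_size: int = 2) -> int:
    """Translate untranslated rows batch by batch until none remain; returns rows updated."""
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, text FROM tweets WHERE translation IS NULL LIMIT ?",
            (batch_size,)).fetchall()
        if not rows:
            break
        translations = translate([text for _, text in rows])
        conn.executemany(
            "UPDATE tweets SET translation = ? WHERE id = ?",
            [(t, row_id) for t, (row_id, _) in zip(translations, rows)])
        conn.commit()
        total += len(rows)
    return total


def fake_translate(texts: List[str]) -> List[str]:
    """Stand-in for the real model call (illustrative only)."""
    return [t.upper() for t in texts]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT, translation TEXT)")
conn.executemany("INSERT INTO tweets (text) VALUES (?)",
                 [("one",), ("two",), ("three",), ("four",), ("five",)])

updated = translate_pending(conn, fake_translate, batch_size=2)
print("translated {} rows".format(updated))
```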

Book chat

Phil 7.2.20

Emergence of polarized ideological opinions in multidimensional topic spaces

  • Opinion polarization is on the rise, causing concerns for the openness of public debates. Additionally, extreme opinions on different topics often show significant correlations. The dynamics leading to these polarized ideological opinions pose a challenge: How can such correlations emerge, without assuming them a priori in the individual preferences or in a preexisting social structure? Here we propose a simple model that reproduces ideological opinion states found in survey data, even between rather unrelated, but sufficiently controversial, topics. Inspired by skew coordinate systems recently proposed in natural language processing models, we solidify these intuitions in a formalism where opinions evolve in a multidimensional space where topics form a non-orthogonal basis. The model features a phase transition between consensus, opinion polarization, and ideological states, which we analytically characterize as a function of the controversialness and overlap of the topics. Our findings shed light upon the mechanisms driving the emergence of ideology in the formation of opinions.

DtZ has broken

dtzfail

GPT2-Agents

  • Continue working on the trajectory. I think that a plot that works entirely on distance to target can result in spirals, so there needs to be some kind of system that looks at the distance to the center line first, and if there is a fail, move the last node from the trajectory list to a dirty list. Then the search restores the cur node to the previous one and continues with the trajectory and dirty list nodes ignored?
  • Found an example to fix: A6 – H7
    • get_closest_node() line = [337.0, 44.0, 581.0, 499.0], cur_node = h1, node_list = ['a6', 'b6', 'c7', 'd7', 'e6', 'c5', 'b7', 'g7', 'h6', 'g6', 'c6', 'e7', 'f7', 'g8', 'f6', 'd8', 'a8', 'e8', 'd6', 'b4', 'b8', 'c8', 'c4', 'e5', 'd5', 'd4', 'b5', 'c3', 'e4', 'f5', 'f8', 'f4', 'g5', 'g4', 'h5', 'h4', 'f3', 'd3', 'c2', 'e3', 'd2', 'e2', 'b2', 'b1', 'c1', 'e1', 'd1', 'a1', 'f1', 'g3', 'h3', 'g2', 'f2', 'g1', 'h2', 'h1']
    • It does fine until it gets to E6, where it chooses c5
    • Adding a target distance-based search if the distance to line search fails seems to have fixed it:
      nlist = list(nx.all_neighbors(self.gml_model, cur_node))
      print("\tneighbors = {}".format(nlist))
      dist_dict = {}
      sx, sy = self.get_center(cur_node)
      
      for n in nlist:
          if n not in node_list:
              newx, newy = self.get_center(n)
              newa = [newx, newy]
              print("\tline dist checking {} at {}".format(n, newa))
              x, y = self.point_to_line([l[0], l[1]], [l[2], l[3]], newa)
              ca = [x, y]
              ib = self.is_between([sx, sy], [l[2], l[3]], [x, y])
              if ib:
                  # option 1: Find the closest to the line
                  dist = np.linalg.norm(np.array(newa)-np.array(ca))
                  dist_dict[n] = dist
                  print("\tis BETWEEN = {}, dist = {}".format(ib, dist))
      if len(dist_dict) == 0:
          ta = list(self.get_center(self.target_node))
          for n in nlist:
              if n not in node_list:
                  newx, newy = self.get_center(n)
                  newa = [newx, newy]
                  print("\ttarget dist checking {} at {}".format(n, newa))
                  # option 2: Find the closest to the target node
                  dist = np.linalg.norm(np.array(newa)-np.array(ta))
                  dist_dict[n] = dist
                  print("\tis CLOSEST: dist = {}".format(dist))
  • Got legal trajectories working. Below is a set of jumps that are legal (rook to c1, bishop to e3 and then h6, then rook the rest of the way). I think I want to also sort based on closest distance to the current node.

legal_moves

GOES

  • Add InfluxDB streaming to DD
  • 10:00 Sim meeting
  • 2:00 Status meeting

Phil 7.1.20

I should be riding across southern Spain right now

#COVID19

  • Huggingface has fixed the Marian Model/Tokenizer! Need to try Arabic, and if it works, translate the tweets in the db

GPT-2 Agents

  • Shimei had a good point last night that belief maps may be directed. For example, it may be easier for a person to go from smoking to harder drugs than to go back to just smoking. The path from hard drugs might involve 12-step programs, which would be unlikely to be reached from smoking. Other beliefs can swing back and forth, as we see with, for example, the desirability of deficit spending when in power.
  • Finishing up direct selection of source and target nodes as a way of procrastinating on calc_direct, which is going to be harder. I’m always nervous with recursion!
  • Added a test to see that one of the neighbors is the target node first
  • Broke out the angle calculations
  • Need to sort by distance to line and distance to target. It may be necessary to step away from the target occasionally. For now, it’s an option as I try to figure out what’s best:
    # option 1: Find the closest to the line
    # dist = np.linalg.norm(np.array(newa)-np.array(ca))
    # option 2: Find the closest to the target node
    dist = np.linalg.norm(np.array(newa)-np.array(ta))
    dist_dict[n] = dist

GOES

  • Status report for June
  • 1:30 Sim progress meeting – things are working! Need to hook up the data dictionary to influx for monitoring and debugging. Add Erik to the invites
  • 2:00 Status meeting
  • I did a NASA/GSFC training module early!

Phil 6.30.20

(Re)Discovering Protein Structure and Function Through Language Modeling (ArXiv)(Code)

  • In our study, we show how a language model, trained simply to predict a masked (hidden) amino acid in a protein sequence, recovers high-level structural and functional properties of proteins. In particular, we show how the Transformer language model uses attention (1) to capture the folding structure of proteins, connecting regions that are apart in the underlying sequence but spatially close in the protein structure, and (2) targets binding sites, a key functional component of proteins. We also introduce a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with biological processes and provide a tool to aid scientific discovery. The code for the visualization tool and experiments is available at https://github.com/salesforce/provis.
  • TL;DR: Trained solely on language modeling, the Transformer’s attention mechanism recovers high-level structural and functional properties of proteins.
  • We explored the degree to which attention captures these contact relationships by analyzing the attention patterns of 5,000 protein sequences and comparing them to ground-truth contact maps. Our analysis revealed that one particular head — the 12th layer’s 4th head, denoted as head 12-4 — aligned remarkably well with the contact map. For “high confidence” attention (> .9 ), 76% of this head’s total attention connected amino acids that were in contact. In contrast, the background frequency of contacts among all amino acid pairs in the dataset is just 1.3%.

GPT-2 Agents

  • Add a menu that writes node spatial information to the DB
  • Add a “Graph from DB” menu that assembles the edge information from the move table and the node information from the new table, above.
  • Continue on path finding
    • Distance between a point and a line using numpy (stackoverflow). Not exactly what I need, which is the point of intersection and the distance. There is a stackoverflow post that is close, but here’s a version that tests the results and plots it:
      import numpy as np
      import math
      import matplotlib.pyplot as plt
      
      p1 = np.array([1.0, 1.0])
      l1 = np.array([0.0, 1.0])
      l2 = np.array([1.0, 0.0])
      
      lvec = l2 - l1
      lvec /= np.linalg.norm(lvec, 2)
      
      p2 = l1 + lvec * np.dot(p1 - l1, lvec)
      print("intersection = {}".format(p2))  # [0.5 0.5]
      
      pvec = p2 - p1
      dist = np.linalg.norm(pvec, 2)
      pvec /= dist
      det = np.linalg.det([lvec, pvec])
      dot = np.dot(lvec, pvec)
      rads = math.atan2(det, dot)
      print("distance = {}, angle = {}".format(dist, math.degrees(rads)))
      
      plt.plot([l1[0], l2[0]],[l1[1], l2[1]])
      plt.plot([p1[0], p2[0]],[p1[1], p2[1]])
      plt.show()
  • Here’s the test for seeing if a point is on a line. Again, loosely based on a stackoverflow post:
    def is_between(self, l1:[int, int], l2:[int, int], p1:[int, int], epsilon:float = .1) -> bool:
        # np.float was removed from NumPy; the builtin float gives the same float64 dtype
        p1 = np.array(p1).astype(float)
        l1 = np.array(l1).astype(float)
        l2 = np.array(l2).astype(float)
        
        s1 = np.linalg.norm(l1-p1)
        s2 = np.linalg.norm(l2-p1)
        d = np.linalg.norm(l2-l1)
        # print("d = {}, s1 + s2 = {}".format(d, s1+s2))
        if abs(d - (s1+s2)) < epsilon:
            return True
        return False
  • Got graphical node selection working. Need to tie that back into the menus for start and stop

Proposal

  • Looks like no writing today. Done, maybe?

GOES

  • 10:00 CASSIE demo – really good
  • 12:00 All Hands – need to catch up on my training. Something for the afternoons?

ML Seminar

  • Status report
  • Participated in some triage on Arpita’s and Fatima’s paper

Phil 6.29.20

ACM IUI 2021 is the 26th annual premier international forum for reporting
outstanding research and development on intelligent user interfaces.

  • ACM IUI is where the Human-Computer Interaction (HCI) and Artificial 
    Intelligence (AI) communities meet, with contributions from related fields 
    such as psychology, behavioral science, cognitive science, computer 
    graphics, design, the arts, and more. Our focus is on improving the
    interaction between humans and digital technology, by leveraging both HCI
    approaches and state-of-the-art AI techniques from machine learning,
    natural language processing, data mining, knowledge representation and
    reasoning.

GOES:

  • Ping Erik about collaborative VR coding environments. Done
  • 2:00 Meeting with Vadim
    • Walked through the deep hierarchy example
    • He’s now running 4 wheels and starting to get close, though the RW speed plots are not close to the actuals. It makes me think that there is more feedback control in the satellite implementation than there is implied in the documentation.

Proposal

  • After digging into the existing text, we realized that a lot of the technical sections were flat wrong, and depended on a kind of “magical ML thinking” that should have been in our phase III. So, lots of writing.

GPT-2 Agents

  • Working on trajectory plotting
  • Fix the listbox select. I was using the wrong event. It should be like this.

ListBoxSelect

  • Aaaand then there were a bunch of weird errors. For some reason, the call to a new ListBox also calls the previous ListBox with no args(?) so I get an error. Chased down and fixed.
  • Plot main line. Done!

NodeLine

  • Plot legal connections of closest lines
    • I think this can be done by looking at the nodes that are connected to the start (current) node, then looking at the coordinates of all the children. The one that is closest to the line and between the current and the target gets added to the list
    • Plotting all the node connections so there can be a sanity check:

NodeLine

  • Use the weight of the lines to choose the lines
  • Build a narrative rutter that describes the route (Here be there stampedes!)
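The "closest to the line and between the current and the target" selection might look something like the sketch below. It is pure Python rather than the numpy used in the app, the function names are mine, and the toy coordinates are invented; it illustrates the idea rather than the app's code.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]


def project(l1: Point, l2: Point, p: Point) -> Point:
    """Foot of the perpendicular from p onto the infinite line through l1 and l2."""
    dx, dy = l2[0] - l1[0], l2[1] - l1[1]
    t = ((p[0] - l1[0]) * dx + (p[1] - l1[1]) * dy) / (dx * dx + dy * dy)
    return (l1[0] + t * dx, l1[1] + t * dy)


def is_between(a: Point, b: Point, p: Point, epsilon: float = 1e-6) -> bool:
    """True if p lies on the segment a-b, within epsilon."""
    return abs(math.dist(a, b) - (math.dist(a, p) + math.dist(p, b))) < epsilon


def closest_to_line(adj: Dict[str, List[str]], pos: Dict[str, Point],
                    cur: str, source: str, target: str,
                    visited: Set[str]) -> Optional[str]:
    """Pick the unvisited neighbor of cur nearest the source-target line and ahead of cur."""
    line_start, line_end = pos[source], pos[target]
    cur_foot = project(line_start, line_end, pos[cur])
    best, best_dist = None, math.inf
    for n in adj[cur]:
        if n in visited:
            continue
        foot = project(line_start, line_end, pos[n])
        # skip neighbors whose projection falls behind the current node or past the target
        if not is_between(cur_foot, line_end, foot):
            continue
        d = math.dist(pos[n], foot)
        if d < best_dist:
            best, best_dist = n, d
    return best


# Toy layout: the line runs from a to d; e sits behind a, c hugs the line
pos = {"a": (0, 0), "b": (1, 1), "c": (2, 0.5), "e": (-1, 0), "d": (4, 0)}
adj = {"a": ["b", "c", "e"]}
choice = closest_to_line(adj, pos, "a", "a", "d", {"a"})
print("next node =", choice)
```

When this returns None, a fallback such as "closest to the target" is needed, otherwise the walk stalls.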

Phil 6.26.20

Let’s not forget that things are not going well here:

DtZ

python_data_libs

Many useful links in the replies (like Stumpy for time series)

GPT2-Agents

  • Working on plotting nodes correctly, being able to select them, then plotting closest legal moves that reach a destination
  • Got node selection working!

2020-06-26

With loaded nodes as well. I have an issue where the callback from the mouse is happening before the selection in a list, so I need to fix that:

2020-06-26 (1)

GOES

  • 10:00 Meeting with Vadim
  • Realized that I had been too fancy to remember how to deal with commands to individual controllers. Figuring that out now – done!

Book

  • Meeting with Michelle to discuss editing – went very well. Sent her a copy of the “book” version of the dissertation

Phil 6.25.20

Latent Embeddings of Point Process Excitations

  • When specific events seem to spur others in their wake, marked Hawkes processes enable us to reckon with their statistics. The underdetermined empirical nature of these event-triggering mechanisms hinders estimation in the multivariate setting. Spatiotemporal applications alleviate this obstacle by allowing relationships to depend only on relative distances in real Euclidean space; we employ the framework as a vessel for embedding arbitrary event types in a new latent space. By performing synthetic experiments on short records as well as an investigation into options markets and pathogens, we demonstrate that learning the embedding alongside a point process model uncovers the coherent, rather than spurious, interactions.

Misinformation, Crisis, and Public Health—Reviewing the Literature

  • The Covid-19 pandemic has been accompanied by a parallel “infodemic” (Rothkopf 2003; WHO 2020a), a term used by the World Health Organization (WHO) to describe the widespread sharing of false and misleading information about the novel coronavirus. Misleading information about the disease has been a problem in diverse societies around the globe. It has been blamed for fatal poisonings in Iran (Forrest 2020), racial hatred and violence against people of Asian descent (Kozlowska 2020), and the use of unproven and potentially dangerous drugs (Rogers et al. 2020). A video promoting a range of false claims and conspiracy theories about the disease, including an antivaccine message, spread widely (Alba 2020) across social media platforms and around the world. Those spreading misinformation include friends and relatives with the best intentions, opportunists with books and nutritional supplements to sell, and world leaders trying to consolidate political power.

GPT-2 Agents

  • Well, networkx can write a gexf file that Gephi can read, but not the other way around.
  • Networkx CAN read and write gml files, though. Switching to that.
  • That seems to be working well:

gml_read_write

  • Now let’s see if we can draw it in the app
  • Things are starting to get very specific. Creating a subclass
  • Pulling attributes is not obvious. Here’s how you do it for the nodes read in from gml:
    attrs = nx.get_node_attributes(self.gml_model, 'graphics')
    for key, val in attrs.items():
        print("{} = {}".format(key, val))

 

  • Loading and displaying the nodes! Next, I need to get piece data from the database. Also, since the graphics attribute can be a dictionary, it may be possible to add attributes like that to the edge data? Then I won’t need to re-access the db. Conversely, another way to do this might be to update the table in the db with positions, etc. Hmmmm

GraphNavigator

GOES

  • 10:00 Meeting with Vadim. Nope, he broke his code. Rescheduled for tomorrow

Proposal

  • Work on technical section with Aaron?

Phil 6.24.20

GPT-2 Agents

  • Starting work on the navigator app
  • Today’s progress:

TkCanvasBase

  • I think that this can be the core of the initial navigation capability for any corpus. You should be able to identify a topic on the map or in the list, and the system will figure out the most direct route (linear distance).
  • I think there also needs to be an ability to see the directly connected neighbors as well, since they might be farther away due to mapping constraints. For example, we can see that d2 is linked directly to d7, which is almost completely across the board. This is the result of the white queen making a pretty aggressive move. It’s not common, but it does happen. It might be interesting for someone working their way from arithmetic to calculus to see, for example, how Johann Carl Friedrich Gauss did it:

nearest

GOES

  • 10:00 Meeting with Vadim
    • We’re going to try to get a single RW to move the vehicle through two successive 90-degree maneuvers, then verify that everything is working correctly on the other RWs, then go to RW sets
  • 2:00 Status meeting

Proposal

Phil 6.23.20

Oh, look, we’re not going to let smart, motivated people into the country and sabotage our future because, I dunno, being xenophobic trumps everything?

Collective Intelligence 2020

  • You can watch all the keynotes on our YouTube channel.
  • Conference proceedings (papers & presentations) are online here.

GPT-2 Agents

  • Working on getting Gephi installed and running everywhere.
  • Next is to export graphs from networkx. Done! A little tricky. I’m using a dictionary attached to each node to store the pieces that traversed that particular edge, but the exporter chokes on that. So I have to create a new graph without the dict and export that. It looks pretty good too!

gephi_first_map

  • I’m going to import that into Illustrator and see if I can build a (distorted) chessboard. Here’s the result:

chess_nearest_neighbors_6_23_20

  • To get a sense of how this relates back to the ground truth of the chessboard, the red lines are the columns of the board (a – h) and the green lines are the rows (1 – 8). Here’s the comparison with the actual board:

chessboard

  • It’s clearly a grid. The opposite corners are far away from each other. The left (queen) side of the board is more complicated, which may be because of the queen?
  • I had a chat with Aaron about all of this and I think the next step is to show that this map can be used for meaningful navigation. Consider the following two trajectories from opposite sides of the map:

chessboard_trajectories

  • These are the kind of trajectories that you’d like to be able to plot on a map. Let’s say you’re on square A1, and you’re on a rook. For you, only row 1 and column A are directly accessible. But maybe you could ride a bishop from A3 to F8, then take a king the rest of the way. Now, the shortest number of moves could be to take the rook from A1 to A8 to H8, but the journey would cover a greater distance. In terms of belief space, you would not be making incremental shifts to your understanding, you would be making two, equally large jumps that combined are roughly 1.4 times farther than the more direct route. That’s the difference between navigating in space vs navigating in a network.
  • I think the next step is to write an app that reads in the GEXF files, which contain location information, and links them back to the database, so it’s possible to plot a beginning and an ending, and have the app figure out the legal moves that move you near that line towards your destination.
  • After that, it’s time to finetune the NN on the antibubbles corpora and see if the same thing can be done.
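The "roughly 1.4 times farther" figure for A1 to H8 can be checked with a few lines of arithmetic, assuming an undistorted board where each square is one unit wide (the Gephi map is distorted, so on the map itself the ratio will differ somewhat).

```python
import math

# Squares as (file, rank) coordinates on an ideal, undistorted board
a1, a8, h8 = (0, 0), (0, 7), (7, 7)

rook_route = math.dist(a1, a8) + math.dist(a8, h8)  # two equal jumps of 7 squares
direct_route = math.dist(a1, h8)                     # straight diagonal, 7 * sqrt(2)
ratio = rook_route / direct_route

print("rook route = {}, direct = {:.3f}, ratio = {:.3f}".format(
    rook_route, direct_route, ratio))
```

The ratio is exactly the square root of 2, which is where the 1.4 comes from.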

GOES

  • Need to record a video of my talk for GVSETS
  • Sent a copy to Aaron for SBIR
  • Started looking at the SBIR materials

ML Seminar