Phil 10.30.18

7:00 – 3:30 ASRC PhD

  • Search as embodied in the “Ten Blue Links” meets the requirements of a Perrow “Normal Accident”
    • The search results are densely connected. That’s how PageRank works. Even latent connections matter.
    • A change in a page’s popularity rapidly affects its rank, so the connections are stiff.
    • The relationships of the returned links, both to each other and to the broader information landscape in general, are hidden.
    • An additional density and stiffness issue is that everyone uses Google, so there is a dense, stiff connection between the search engine and the population of users.
  • Write up something about how
    • ML can make maps, which decrease the likelihood of IR contributing to normal accidents
    • AI can use these maps to understand the shape of human belief space, and where the positive regions and dangerous sinks are.
  • Two measures for maps are the concepts of range and length. Range is the distance over which a trajectory can be placed on the map and remain contiguous. Length is the total distance that a trajectory travels, independent of the map it’s placed on.
  • Write up the basic algorithm of ML to map production
    • Take a set of trajectories that are known to be in the same belief region (why JuryRoom is needed) as the input
    • Generate an N-dimensional coordinate frame that best preserves length over the greatest range.
    • What is used as the basis for the trajectory may matter. The range, at a minimum, can go from letters to high-level topics. I think any map reconstruction based on letters would be a tangle, with clumps around TH, ER, ON, and AN. At the other end, an all-encompassing meta-topic like WORDS would be a single point: accurate, but useless. So the map reconstruction will become possible somewhere between these two extremes.
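A minimal sketch of the length measure, assuming a trajectory is a list of points in whatever coordinate frame a candidate map supplies, and that distance is Euclidean. Both are my assumptions for illustration, as are the names `trajectory_length` and `length_distortion`. Range is left out, since it depends on how contiguity is defined on the map.

```python
import math


def trajectory_length(points):
    """Total distance travelled: sum of straight-line hops between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def length_distortion(reference_length, mapped_points):
    """Relative error between a known length and the length measured on a candidate map.

    Hypothetical helper: a frame that "preserves length" would keep this near zero.
    """
    mapped_length = trajectory_length(mapped_points)
    return abs(mapped_length - reference_length) / reference_length


# A three-point trajectory in a 2D frame: hops of 5.0 and 4.0
path = [(0, 0), (3, 4), (3, 0)]
print(trajectory_length(path))
print(length_distortion(9.0, path))
```

Comparing candidate coordinate frames would then come down to measuring this distortion over the whole set of input trajectories.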
  • The Nietzsche text is pretty good. In particular, check out the way the sentences form based on the seed “s when one is being cursed.”
    • the fact that the spirit of the spirit of the body and still the stands of the world
    • the fact that the last is a prostion of the conceal the investion, there is our grust
    • the fact them strongests! it is incoke when it is liuderan of human particiay
    • the fact that she could as eudop bkems to overcore and dogmofuld
    • In this case, the first 2-3 words are the same, followed by random, semi-structured text. That’s promising, since the comparison would be on the seed plus the generated text.
  • Today, see how fast a “Shining” (All work and no play makes Jack a dull boy.) text can be learned and then try each keyword as a start. As we move through the sentence, the probability of the next words should change.
    • Generate the text set
    • Train the Nietzsche model on the new text. Done. Here are examples with one epoch and a batch size of 32, with a temperature of 1.0:
      ----- diversity: 0.2
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
      
      ----- diversity: 0.5
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
      
      ----- diversity: 1.0
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy anl wory and no play makes jand no play makes jack a dull boy all work and no play makes jack a 
      
      ----- diversity: 1.2
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a pull boy all work and no play makes jack andull boy all work and no play makes jack a dull work and no play makes jack andull

      Note that the errors start with a temperature of 1.0 or greater.
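A minimal sketch of how temperature (the "diversity" value above) typically reshapes a next-character distribution before sampling. This is a plain-Python illustration, not the code that produced the output above; the function names and the example probabilities are assumptions.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: p_i ** (1 / T), then renormalize."""
    # Work in log space; clamp zeros so log() is defined
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, temperature, rng):
    """Draw one index from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


# Hypothetical prediction over three candidate characters
preds = [0.7, 0.2, 0.1]
rng = random.Random(0)
for temperature in (0.2, 0.5, 1.0, 1.2):
    print(temperature, apply_temperature(preds, temperature), sample_index(preds, temperature, rng))
```

Low temperatures push nearly all of the probability onto the most likely character, while values above 1.0 flatten the distribution, which would be consistent with the misspellings showing up only at the higher settings.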

    • Rewrite the last part of the code to generate text based on each word in the sentence.
      • So I tried that and got gobbledygook. The issue is that the prediction only works on waveform-sized chunks. To verify this, I created a seed from the input text, truncating it to maxlen (20 in this case):
        sentence = "all work and no play makes jack a dull boy"[:maxlen]

        That worked, but it means that the character-based approach isn’t going to work.

        ----- temperature: 0.2
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
        
        ----- temperature: 0.5
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
        
        ----- temperature: 1.0
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy pllwwork wnd no play makes 
        
        ----- temperature: 1.2
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes
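A minimal sketch of why a single keyword may not be a usable seed, assuming the model is trained on fixed-length character windows of maxlen characters, each paired with the character that follows. The helper names, the repeat count, and the use of single spaces between repeats are assumptions for illustration.

```python
LINE = "all work and no play makes jack a dull boy"
MAXLEN = 20  # window size used in the notes above


def make_corpus(line, repeats):
    """Repeat the sentence to form the training text (joined by spaces, an assumption)."""
    return " ".join([line] * repeats)


def make_windows(text, maxlen, step=1):
    """Cut the text into maxlen-sized windows, each paired with the next character."""
    windows, next_chars = [], []
    for start in range(0, len(text) - maxlen, step):
        windows.append(text[start:start + maxlen])
        next_chars.append(text[start + maxlen])
    return windows, next_chars


def make_seed(text, maxlen):
    """Truncate to exactly maxlen characters; shorter text cannot fill the window."""
    if len(text) < maxlen:
        raise ValueError("seed text is shorter than maxlen")
    return text[:maxlen]


corpus = make_corpus(LINE, repeats=100)
windows, next_chars = make_windows(corpus, MAXLEN)
seed = make_seed(LINE, MAXLEN)
print(repr(seed), len(windows))
```

Under these assumptions every training input is exactly 20 characters long, so a one-word seed such as "jack" never looks like anything the model has seen.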

         

    • Based on this result and the ensuing chat with Aaron, we’re going to revisit the whole LSTM with numbers and build out a process that will support words instead of characters.
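One possible starting point for the word-based version, sketched under the assumption that each distinct word gets an integer index and the model sees a short window of word indices with the next word as its target. The window size of 3 and all function names are hypothetical.

```python
LINE = "all work and no play makes jack a dull boy"


def build_vocab(words):
    """Map each distinct word to an integer index (alphabetical order)."""
    return {word: index for index, word in enumerate(sorted(set(words)))}


def encode(words, vocab):
    """Turn a list of words into a list of integer indices."""
    return [vocab[word] for word in words]


def make_word_windows(ids, window):
    """Pair each run of `window` word indices with the index of the word that follows."""
    return [(ids[i:i + window], ids[i + window]) for i in range(len(ids) - window)]


words = (LINE + " " + LINE).split()
vocab = build_vocab(words)
ids = encode(words, vocab)
pairs = make_word_windows(ids, window=3)
print(vocab)
print(pairs[0])
```

With words as the unit, a single keyword is one full input step, so starting generation from each word in the sentence becomes a better-posed experiment than it was with characters.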
  • Looking for CMAC models, I found Self Organizing Feature Maps at NeuPy.com.
  • Here’s How Much Bots Drive Conversation During News Events
    • Late last week, about 60 percent of the conversation was driven by likely bots. Over the weekend, even as the conversation about the caravan was overshadowed by more recent tragedies, bots were still driving nearly 40 percent of the caravan conversation on Twitter. That’s according to an assessment by Robhat Labs, a startup founded by two UC Berkeley students that builds tools to detect bots online. The team’s first product, a Chrome extension called BotCheck.me, allows users to see which accounts in their Twitter timelines are most likely bots. Now it’s launching a new tool aimed at news organizations called FactCheck.me, which allows journalists to see how much bot activity there is across an entire topic or hashtag
