Category Archives: Machine Learning

Phil 12.21.18

7:00 – 4:30 ASRC PhD/NASA/NOAA

  • Spatial Representations in the Human Brain
    • While extensive research on the neurophysiology of spatial memory has been carried out in rodents, memory research in humans had traditionally focused on more abstract, language-based tasks. Recent studies have begun to address this gap using virtual navigation tasks in combination with electrophysiological recordings in humans. These studies suggest that the human medial temporal lobe (MTL) is equipped with a population of place and grid cells similar to that previously observed in the rodent brain. Furthermore, theta oscillations have been linked to spatial navigation and, more specifically, to the encoding and retrieval of spatial information. While some studies suggest a single navigational theta rhythm which is of lower frequency in humans than rodents, other studies advocate for the existence of two functionally distinct delta–theta frequency bands involved in both spatial and episodic memory. Despite the general consensus between rodent and human electrophysiology, behavioral work in humans does not unequivocally support the use of a metric Euclidean map for navigation. Formal models of navigational behavior, which specifically consider the spatial scale of the environment and complementary learning mechanisms, may help to better understand different navigational strategies and their neurophysiological mechanisms. Finally, the functional overlap of spatial and declarative memory in the MTL calls for a unified theory of MTL function. Such a theory will critically rely upon linking task-related phenomena at multiple temporal and spatial scales. Understanding how single cell responses relate to ongoing theta oscillations during both the encoding and retrieval of spatial and non-spatial associations appears to be key toward developing a more mechanistic understanding of memory processes in the MTL.
  • Three Kinds of Spatial Cognition
    • Nora S. Newcombe (Scholar)
    • Spatial cognition is often (but wrongly) conceptualized as a single domain of cognition. However, humans function in more than one way in the spatial world. We navigate, as do all mobile animals, but we also manipulate objects using distinctive hands with opposable thumbs, unlike other species. In fact, an important characteristic of human adaptation is the ability to invent tools. Of course, another central asset is human symbolic ability, which includes the ability to spatialize thought in abstractions such as maps, graphs, and analogies. Thus, there are at least three kinds of spatial cognition with three separable functions. Navigation involves moving around the environment to find food and shelter, and to avoid danger. It draws on several interconnected neural subsystems that track movement and encode the location of external entities with respect to each other and the moving self (i.e., extrinsic coding), and it integrates these inputs to achieve best‐possible estimates. Human navigation is characterized by a great deal of individual variation. Tool use and invention involves the mental representation and transformation of the shapes of objects (i.e., intrinsic coding). It relies on substantially different neural subsystems than navigation. Like navigation, it shows marked individual differences, which are related to variations in learning in science, technology, engineering, and mathematics (STEM). Spatialization is an aspect of human symbolic skill that cuts across multiple cognitive domains and involves many kinds of spatial symbol systems, including language, metaphor, analogy, gesture, sketches, diagrams, graphs, maps, and mental images. These spatial symbol systems are vital to many kinds of learning, including in STEM. Future research on human spatial cognition needs to further delineate the origins, development, neural substrates, variability, and malleability of navigation, tool use, and abstract spatial thinking, as well as their interconnections to each other and to other cognitive skills.
  • Added a little bit to the Normal Accidents notes
  • Working on saving out to history and item table
  • Turns out that if you want NUMERIC/DECIMAL columns from a Postgres table to come back as Python floats (psycopg2 hands them back as Decimal objects by default), you have to register a custom type caster:
    # cast Postgres DECIMAL/NUMERIC values to Python floats instead of Decimal
    DEC2FLOAT = psycopg2.extensions.new_type(
        psycopg2.extensions.DECIMAL.values,
        'DEC2FLOAT',
        lambda value, curs: float(value) if value is not None else None)
    psycopg2.extensions.register_type(DEC2FLOAT)
  • Learned about dollar-quoting as per https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
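  • For reference, a minimal sketch of what dollar-quoting looks like in practice (the table and column names are made up for illustration): a literal wrapped in $tag$ ... $tag$ needs no escaping of embedded single quotes, which is handy when building INSERT statements by hand.
    # assuming `cursor` came from a psycopg2 connection; the apostrophes inside the
    # dollar-quoted literal don't need to be doubled
    sql = "INSERT INTO history (entry) VALUES ($body$it's the dungeon's first room$body$)"
    cursor.execute(sql)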
  • To execute big inserts with psycopg2 you need to either commit() explicitly or set autocommit = True
    self.conn = psycopg2.connect(config_str)
    self.conn.autocommit = True
  • And you should wrap the execute() call in a try/except:
    def query_no_result(self, sql: str) -> bool:
        try:
            self.cursor.execute(sql)
            desc = self.cursor.statusmessage  # e.g. "INSERT 0 1"
            return True
        except psycopg2.Error as e:
            print("unable to execute: {}\n{}".format(sql, e))
            return False

 

Phil 12.19.18

7:00 – 4:30 ASRC PhD/NASA

  • I think the IEEE paper with Antonio should be something on the math behind diversity/chaos <-> ensembles <-> hierarchies/stampedes
  • Continuing with Normal Accidents
  • Thinking about costly signalling (in economics, and more generally) with respect to stampede theory. It’s a form of friction. Could proof-of-work be adaptively added to a system to inhibit stampedes?
    • In contract theory, signalling is the idea that one party (termed the agent) credibly conveys some information about itself to another party (the principal). For example, in Michael Spence’s job-market signalling model, (potential) employees send a signal about their ability level to the employer by acquiring education credentials. The informational value of the credential comes from the fact that the employer believes the credential is positively correlated with having greater ability and difficult for low ability employees to obtain. Thus the credential enables the employer to reliably distinguish low ability workers from high ability workers.
  • Tweak optimizer. Change callback method to replace_with_your_method(), and add some documentation on how to use the classes
  • Move clustering paper to LaTex folder – done!
  • Pinged Antonio about the new paper ideas
  • More predictive analytics? Kind of – setting up to read and write out time series to db
  • Upgrading my PostgreSQL install to read/write from a table
    • Installed drivers and python packages
    • Reading and writing in the IDE
      • Created and populated a dummy data table
    • Reading in python. Writing is next
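    • The read path is just a SELECT plus a fetch; a minimal sketch (the connection string and table name are placeholders, not the actual dummy table):
      import psycopg2

      conn = psycopg2.connect("dbname=timeseries user=phil")  # hypothetical connection string
      cursor = conn.cursor()
      cursor.execute("SELECT ts, value FROM dummy_data ORDER BY ts")
      for ts, value in cursor.fetchall():
          print(ts, value)
      cursor.close()
      conn.close()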
  • I was thinking that if we run out of D&D data for other map configurations, that I can use the current data with some repurposing (replace orc with xxx) to create conversations about other environments using human speech. Kind of like PCA for DNA.

Phil 12.17.18

7:00 – 4:30 ASRC NASA/PhD

  • Ted Radio Hour interview with Margaret Heffernan, who spoke about her book, Willful Blindness:
    • “Companies that have been studied for willful blindness can be asked questions like, are there issues at work that people are afraid to raise? And when academics have done studies like this of corporations in the United States, what they find is 85 percent of people say yes. Eighty-five percent of people know there’s a problem, but they won’t say anything. And when I duplicated the research in Europe, asking all the same questions, I found exactly the same number. And what’s really interesting is that when I go to companies in Switzerland, they tell me this is a uniquely Swiss problem. And when I go to Germany, they say, oh yes, this is the German disease. And when I go to companies in England they say, oh yeah, the British are really bad at this. And the truth is, this is a human problem. We’re all, under certain circumstances, willfully blind.”
    • I’ve been thinking about this a lot because when I say, well, why don’t people speak up? What I get is, oh, it’s the culture. And I think, well, what is the culture? The culture is the accumulation of everybody’s actions. And in many of the organizations I work with, change starts in very unexpected places because people just decide, I want to do this or I want to try this. And then they discover they don’t get shot. And then they discover that, actually, now, they’ve got a really exciting project. You know, I think the most dangerous thing in organizations is silence. It’s all those brains whizzing around full of observations and insight and ideas that are not being articulated.
    • I think that the 15% who do speak out are Nomads. They are misaligned with the culture, which makes it 1) easier for them to see problems and solutions, and 2) nearly impossible for them not to behave independently.
  • Bayesian Layers: A Module for Neural Network Uncertainty
    • We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with layers capturing uncertainty over weights (Bayesian neural nets), pre-activation units (dropout), activations (“stochastic output layers”), and the function itself (Gaussian processes). With reversible layers, one can also propagate uncertainty from input to output such as for flow-based distributions and constant-memory backpropagation. Bayesian Layers are a drop-in replacement for other layers, maintaining core features that one typically desires for experimentation. As demonstration, we fit a 10-billion parameter “Bayesian Transformer” on 512 TPUv2 cores, which replaces attention layers with their Bayesian counterpart.
  • Continuing with Normal Accidents
  • Nice interactive on disinformation on Twitter
  • The universal decay of collective memory and attention
    • Collective memory and attention are sustained by two channels: oral communication (communicative memory) and the physical recording of information (cultural memory). Here, we use data on the citation of academic articles and patents, and on the online attention received by songs, movies and biographies, to describe the temporal decay of the attention received by cultural products. We show that, once we isolate the temporal dimension of the decay, the attention received by cultural products decays following a universal biexponential function. We explain this universality by proposing a mathematical model based on communicative and cultural memory, which fits the data better than previously proposed log-normal and exponential models. Our results reveal that biographies remain in our communicative memory the longest (20–30 years) and music the shortest (about 5.6 years). These findings show that the average attention received by cultural products decays following a universal biexponential function.
  • Zach walkthrough
    • Yarn Workspaces
    • NextJS – Tools for developing React Apps – check the github repo to see, for example, how to roll your own web server
    • REACT hooks api
  • Got the basic recursion piece of the optimizer working right. Works for ints, floats, and strings:
    def cascading_step(self):
        self.cur_val = self.range_array[self.index]
        print("{} cur_val = {}".format(self.name, self.cur_val))
    
        child_complete = True
        if self.child:
            child_complete = self.child.cascading_step()
    
        if child_complete:
            self.index += 1
            if self.index >= len(self.range_array):
                self.index = 0
                return True
        return False
  • And here’s the first working test:
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_0
    step 0 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_1
    step 1 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_2
    step 2 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_3
    step 3 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_1
    v1 cur_val = v1_0
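  • For reference, here is a self-contained sketch that reproduces the test output above. Only cascading_step() is taken from the actual optimizer code; the CascadingVar class name, constructor, and driver loop are assumptions added for illustration.
    class CascadingVar:
        """One variable in the chain; stepping it may cascade up when the child wraps around."""
        def __init__(self, name, range_array, child=None):
            self.name = name
            self.range_array = range_array  # the values this variable steps through
            self.child = child              # the faster-changing variable below this one
            self.index = 0
            self.cur_val = None

        def cascading_step(self) -> bool:
            self.cur_val = self.range_array[self.index]
            print("{} cur_val = {}".format(self.name, self.cur_val))

            child_complete = True
            if self.child:
                child_complete = self.child.cascading_step()

            # only advance this variable once the child has wrapped around
            if child_complete:
                self.index += 1
                if self.index >= len(self.range_array):
                    self.index = 0
                    return True
            return False

    # v1 is the innermost (fastest-changing) variable, as in the test output
    v1 = CascadingVar("v1", ["v1_0", "v1_1", "v1_2", "v1_3"])
    v2 = CascadingVar("v2", ["v2_0", "v2_1"], child=v1)
    v3 = CascadingVar("v3", ["v3_0", "v3_1"], child=v2)

    for step in range(5):
        v3.cascading_step()
        print("step {} -----------".format(step))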

     

Phil 12.14.18

7:00 – 4:30 ASRC PhD/NASA

  • Sent Greg a couple of quick notes on using CNNs to match bacteria to phages.
  • Continuing with Normal Accidents
  • A Digital Test of the News: Checking the Web for Public Facts – Workshop report, December 2018
    • The Digital Test of the News workshop brought together digital sociologists, data visualisation and new media researchers at the Centre for Interdisciplinary Methodologies at the University of Warwick on 8 and 9 May 2018. The workshop is part of a broader research collaboration between the Centre for Interdisciplinary Methodologies and the Public Data Lab which investigates the changing nature of public knowledge formation in digital societies and develops inventive methods to capture and visualise knowledge dynamics online. Below we outline the workshop’s aims and outcomes.
  • Added plots to the NN code. Everything seems to look right. Looking at the individual weights in a layer is very informative. Need to add this kind of plotting to our Keras code somehow.
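  • A minimal sketch of what that might look like in Keras: pull each layer's kernel out with get_weights() and hand it to matplotlib (the model here is a stand-in, not our actual network).
    import matplotlib.pyplot as plt
    from tensorflow import keras

    # stand-in model; substitute the real one
    model = keras.Sequential([
        keras.layers.Dense(4, activation="relu", input_shape=(3,)),
        keras.layers.Dense(1)
    ])

    for i, layer in enumerate(model.layers):
        weights, biases = layer.get_weights()  # kernel and bias arrays for a Dense layer
        plt.figure(i)
        plt.plot(weights)  # one line per output unit
        plt.title("layer {} weights".format(i))
    plt.show()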
  • Changing the coherence code so that the row values are zero or one (a binary bag-of-words rather than raw counts). Actually, as the amount of data grows, the raw BOW is getting more useful. This spreadsheet shows all posts, including the DM. Note that the word frequency follows a power law (R squared = 0.9352): All_posts
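  • The zero/one conversion itself is a one-liner in numpy; a sketch, assuming count_mat holds the posts-by-terms count matrix:
    binary_mat = (count_mat > 0).astype(int)  # 1 if the term appears in the post at all, 0 otherwise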
  • Started the optimizer and excel utils classes
  • NOAA meeting
  • NOAA meeting 2
  • Some thoughts from Aaron on our initial ML approach
    • I think for the first pass we do 1-2 models based on contract type focused on the $500k+ award contracts (120ish total).
    • We construct the inputs like sentences in the word generation LSTM with padded lengths equal to the longest running contract of that type, sorted by length. The model can be tested against current contracts held out in a test set using point by point prediction so we can show accuracy of the model against existing data and use that to set our accuracy threshold.
    • My guess is this will be at least an ORF/PAC model (two different primary contract types) which we can work on tuning with the learning_optimizer.py to get as accurate as possible in the timeframe we have.
    • One of the things we advertise as “next steps” is a detailed analysis of contracts based on similarity measures to identify a series of more accurate models. We can pair this with additional models such as Exponential Smoothing and ARIMA which use fundamentally the exact same pipeline.
    • The GUI will be plumbed up to show these analytic outputs on a per contract basis and we can show by the end of January a simple linear model and the LSTM model to demonstrate how Exponential Smoothing / ARIMA or model averages could be displayed. Once we have these outputs we can take the top 5 highest predicted UDO and display in a summary page so they can use those as a launching off point.
    • If we do it this way it means we only have to focus on the completion of the TimeSeriesML LSTM and its data pipeline with a maximum of 2 initial models (contract type). I think that is a far more reasonable thing to complete in the timeframe and should still be really exciting to show off.
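    • A sketch of the padding step described above, assuming each contract is already a list of per-period values (the contract data here is made up; pad_sequences is the stock Keras helper):
      from tensorflow.keras.preprocessing.sequence import pad_sequences

      # one list of monthly values per contract, with different lengths
      contracts = [
          [1.0, 0.8, 0.7, 0.5, 0.2],
          [1.0, 0.9, 0.4],
          [1.0, 0.6, 0.5, 0.3],
      ]
      # pad to the longest-running contract so the LSTM sees fixed-length inputs
      padded = pad_sequences(contracts, padding="post", dtype="float32")
      print(padded.shape)  # (3, 5)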

Phil 12.12.18

7:00 – 4:30 ASRC NASA/PhD

  • Do a dungeon analytic with new posts and DM for Aaron – done!
  • Send email to Shimei for registration and meeting after grading is finished
  • Start review of Normal Accidents – started!
  • Debug NN code – in process. Very tricky figuring out the relationships between the layers in backpropagation
  • Sprint planning
  • NASA meeting
  • Talked to Zach about the tagging project. Looks good, but I wonder how much time we’ll have. Got a name though – TaggerML

Phil 12.11.18

7:00 – 4:30 ASRC PhD/NASA

(Image: mercator_projection)

Somehow, this needs to get into a discussion of the trustworthiness of maps

  • I realized that we can hand-code these initial dungeons, learn a lot and make this a baseline part of the study. This means that we can compare human and machine data extraction for map making. My initial thoughts as to the sequence are:
    • Step 1: Finish running the initial dungeon
    • Step 2: researchers determine a set of common questions that would be appropriate for each room. Something like:
      • Who is the character?
      • Where is the character?
      • What is the character doing?
      • Why is the character doing this?
    • Each answer should also include a section of the text that the reader thinks answers that question. Once this has been worked out on paper, a simple survey website can be built that automates this process and supports data collection at moderate scales.
    • Use answers to populate a “Trajectories” sheet in an xml file and build a map!
    • Step 3: Partially automate the extraction to give users a generated survey that lets them select the most likely answer/text for the who/where/what/why questions. Generate more maps!
    • Step 4: Full automation
  • Added these thoughts to the analysis section of the google doc
  • The 11th International Natural Language Generation Conference
    • The INLG conference is the main international forum for the presentation and discussion of all aspects of Natural Language Generation (NLG), including data-to-text, concept-to-text, text-to-text and vision-to-text approaches. Special topics of interest for the 2018 edition included:
      • Generating Text with Affect, Style and Personality,
      • Conversational Interfaces, Chatbots and NLG, and
      • Data-driven NLG (including the E2E Generation Challenge)
  • Back to grokking DNNs
    • Still building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
    • array formatting issues. Tricky
    • I think I’m done enough to start debugging. Tomorrow
  • Sprint review

Phil 12.10.18

7:00 – 5:30 ASRC NASA/PhD

  • For my morning academic work, I am cooking delicious things.
  • There is text in the dungeon! Here’s what happened when I ran the analytics against 3 posts and held back the dungeon master. Rather than put up a bunch of screenshots, here’s the spreadsheet: Day_1_Dungeon_1
  • Russell Richie (Twitter) (Scholar): “One of my favorite results in the paper is that you can compress the embeddings 10x or more while preserving prediction performance, suggesting that the type of knowledge used to make these kind of judgments may only vary along a relative handful of latent dimensions.”
  • Ok, back to grokking DNNs
    • Building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
  • Fika and meeting with Wayne
    • Ade might be interested in doing some coding work!
    • Went over the initial results spreadsheet with Wayne. Overall, progress seems on track. He had an additional thought for venues that I didn’t note.
    • Ping Shimei about 899

Phil 12.7.18

7:00 – 4:30 ASRC NASA/PhD

Phil 12.5.18

7:00 – 4:30 ASRC PhD/NASA

Phil 11.30.18

7:00 – 3:00 ASRC NASA

  • Started Second Person, and learned about GURPS
  • Added a section on navigating belief places and spaces to the dissertation
  • It looks like I’m doing Computational Discourse Analysis, which has more to do with how the words in a discussion shift over time. Requested this chapter through ILL
  • Looking at Cornell Conversational Analysis Toolkit
  • More Grokking today so I don’t lose too much focus on understanding NNs
        • Important numpy rules:
          import numpy as np
          
          val = np.array([[0.6]])
          row = np.array([[-0.59, 0.75, -0.94,0.34 ]])
          col = np.array([[-0.59], [ 0.75], [-0.94], [ 0.34]])
          
          print ("np.dot({}, {}) = {}".format(val, row, np.dot(val, row)))
          print ("np.dot({}, {}) = {}".format(col, val, np.dot(col, val)))
          
          '''
          note the very different results:
          np.dot([[0.6]], [[-0.59  0.75 -0.94  0.34]]) = [[-0.354  0.45  -0.564  0.204]]
          np.dot([[-0.59], [ 0.75], [-0.94], [ 0.34]], [[0.6]]) = [[-0.354], [ 0.45 ], [-0.564], [ 0.204]]
          '''
        • So here’s the tricky bit that I don’t get yet
          # Multiply the values of the relu'd layer [[0, 0.517, 0, 0]] by the goal-output_layer [.61]
          weight_mat = np.dot(layer_1_col_array, layer_1_to_output_delta) # e.g. [[0], [0.31], [0], [0]]
          weights_layer_1_to_output_col_array += alpha * weight_mat # add the scaled deltas in
          
          # Multiply the streetlights [[1], [0], [1] times the relu2deriv'd input_to_layer_1_delta [[0, 0.45, 0, 0]]
          weight_mat = np.dot(input_layer_col_array, input_to_layer_1_delta) # e.g. [[0, 0.45, 0, 0], [0, 0, 0, 0], [0, 0.45, 0, 0]]
          weights_input_to_layer_1_array += alpha * weight_mat # add the scaled deltas in
        • It looks to me that as we work back from the output layer, we multiply our layer’s weights by the manipulated (relu in this case) for the last layer, and the derivative in the next layer forward?  I know that we are working out how to distribute the adjustment of the weights via something like the chain rule…
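        • Working it through with the chain rule (reconstructing from the code, so take with a grain of salt): for error = (goal - output)**2, the gradient with respect to the layer-1-to-output weights is, up to a constant factor, layer_1.T.dot(goal - output), and the gradient with respect to the input-to-layer-1 weights is input.T.dot((goal - output).dot(weights_1_to_output.T) * relu2deriv(layer_1)). The two np.dot() calls above are exactly those outer products; the transposes just reshape the 1xN row vectors into Nx1 columns so that each product comes out with the same shape as the weight matrix it updates.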

       

Phil 11.29.18

7:00 – 4:30 ASRC PhD/NASA

    • Listening to repeat of America Abroad Sowing Chaos: Russia’s Disinformation Wars. My original notes are here
    • Finished World without End: The Delta Green Open Campaign Setting, by A. Scott Glancey
      • Overall, this describes the creation of the canon of the Delta Green playspace. The goal as described was to root the work in existing fiction (Lovecraft’s Cthulhu) and historical fact. This provides the core of the space that players can move out from or fill in. Play does not produce more canon, so it produces a trajectory that may have high influence for the actual players, but may not move beyond that. The article discusses Agent Angela as an example of a thumbnail sketch that has become a mythical character, independent of the work of the authors with respect to canon. My guess is that as the Agent Angela space became “stiffer,” it could also be shared more.
      • As a role-playing game, Delta Green’s narrative differs from the traditional narratives of literature, theater, and film because it offers only plot without characters to drive the story forward. It’s up to the role-players to provide the characters. Role-playing game settings are narratives not built around any specific protagonist, yet capable of accommodating multiple protagonists. Thus, role-playing games, particularly the classic paper-and-dice ones, are by their very nature vast narratives. (page 77)
      • During the designing of the Delta Green vast narrative it was decided that we would publish more open-ended source material than scenarios. Source material is usually built around an enemy of Delta Green with a particular agenda or set of goals, much like a traditional role-playing game scenario is set up, only without the framework of scenes and set pieces designed to channel the players through to a resolution of the scenario. The reason for emphasizing open ended source material over scenarios is that we were trying to encourage Keepers to design their own scenarios without pinning them down with too much canon. That is always a danger with creating a role-playing game background. You want to create a rich environment, but you don’t want to fill in so many details that there is nothing new for the players and Keepers to create with their own games. (Page 81)
      • If the players in a role-playing game campaign start to think that their characters are more disposable than the villain, they are going to feel marginalized. After all, whose story is this: theirs or a non-player character’s? The fastest way to alienate a group of players is to give them the impression that they are not the center of the story. If they are not the ones driving the action forward, then what’s the point in playing a role-playing game? They might as well be watching a movie if they cannot affect the pacing, action, and outcome of a story. (Page 83)
    • Going to create a bag of words collection for post subjects and posts that are not from the DM, and then plot the use of the words over time (by sequential post). I think that once stop words are removed, patterns might be visible.
      • Pulling out the words
      • Have the overall counts
      • Building the count mats
      • Stop words worked, needed to drop punctuation and caps
    • Yoast has an array that looks immediately usable:
      [ "a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did", "do", "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "it", "it's", "its", "itself", "let's", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "she", "she'd", "she'll", "she's", "should", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "we", "we'd", "we'll", "we're", "we've", "were", "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why", "why's", "with", "would", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves" ]
    • Good progress. I’m using TF-IDF to determine the importance of each term in the timeline. That’s OK, but not great. Here’s a plot: room_terms
    • You can see the three rooms, but they don’t stand out all that well. Maybe a low-pass filter on top of this? Anyway, done for the day.
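    • For reference, a minimal sketch of the per-post TF-IDF step using a stop word list like the one above (post_list and the vectorizer settings are placeholders, not the actual coherence code):
      from sklearn.feature_extraction.text import TfidfVectorizer

      stop_words = ["a", "about", "above", "after"]  # ...the full Yoast list from above
      post_list = ["the orc waits in the first room", "the party searches the room"]  # made-up posts

      vectorizer = TfidfVectorizer(stop_words=stop_words, lowercase=True)
      tfidf_mat = vectorizer.fit_transform(post_list)  # rows = posts in order, columns = terms
      print(vectorizer.get_feature_names())
      print(tfidf_mat.toarray())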

 

Phil 11.21.18

7:00 – 4:00 ASRC PhD/NASA

  • More adversarial herding: Bots increase exposure to negative and inflammatory content in online social systems
    • Social media can deeply influence reality perception, affecting millions of people’s voting behavior. Hence, maneuvering opinion dynamics by disseminating forged content over online ecosystems is an effective pathway for social hacking. We propose a framework for discovering such a potentially dangerous behavior promoted by automatic users, also called “bots,” in online social networks. We provide evidence that social bots target mainly human influencers but generate semantic content depending on the polarized stance of their targets. During the 2017 Catalan referendum, used as a case study, social bots generated and promoted violent content aimed at Independentists, ultimately exacerbating social conflict online. Our results open challenges for detecting and controlling the influence of such content on society.
    • Bot detection appendix
      • It occurs to me that if bots can be detected, then they can be mapped in aggregate on the belief map. This could show what types of beliefs are being artificially enhanced or otherwise influenced
  • Migrating Characterizing Online Public Discussions through Patterns of Participant Interactions to Phlog. Done!
  • Working my way through Grokking. Today’s progress:
    # based on https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter6%20-%20Intro%20to%20Backpropagation%20-%20Building%20Your%20First%20DEEP%20Neural%20Network.ipynb
    import numpy as np
    import matplotlib.pyplot as plt
    import typing
    # methods --------------------------------------------
    
    
    # sets all negative numbers to zero
    def relu(x: np.array) -> np.array :
        return (x > 0) * x
    
    
    def relu2deriv(output: float) -> float:
        return output > 0 # returns 1 for input > 0
        # return 0 otherwise
    
    
    def nparray_to_list(vals: np.array) -> typing.List[float]:
        data = []
        for x in np.nditer(vals):
            data.append(float(x))
        return data
    
    
    def plot_mat(title: str, var_name: str, fig_num: int, mat: typing.List[float], transpose: bool = False):
        f = plt.figure(fig_num)
        np_mat = np.array(mat)
        if transpose:
            np_mat = np_mat.T
        plt.plot(np_mat)
        names = []
        for i in range(len(np_mat)):
            names.append("{}[{}]".format(var_name, i))
        plt.legend(names)
        plt.title(title)
    
    # variables ------------------------------------------
    np.random.seed(1)
    hidden_size= 4
    alpha = 0.2
    
    weights_input_to_1_array = 2 * np.random.random((3, hidden_size)) - 1
    weights_1_to_output_array = 2 * np.random.random((hidden_size, 1)) - 1
    # the samples. Columns are the things we're sampling
    streetlights_array = np.array( [[ 1, 0, 1 ],
                                    [ 0, 1, 1 ],
                                    [ 0, 0, 1 ],
                                    [ 1, 1, 1 ] ] )
    
    # The data set we want to map to. Each entry in the array matches the corresponding streetlights_array row
    walk_vs_stop_array = np.array([1, 1, 0, 0]).T # and why are we using the transpose here?
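    # (note: .T on a 1-D numpy array is a no-op, so the transpose above doesn't actually change anything)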
    
    error_plot_mat = [] # for drawing plots
    weights_l1_to_output_plot_mat = [] # for drawing plots
    weights_input_to_l1_plot_mat = [] # for drawing plots
    
    iter = 0
    max_iter = 1000
    epsilon = 0.001
    layer_2_error = 2 * epsilon
    
    while layer_2_error > epsilon:
        layer_2_error = 0
        for row_index in range(len(streetlights_array)):
            # input holds one instance of the data set at a time
            input_layer_array = streetlights_array[row_index:row_index + 1]
            # layer one holds the results of the NONLINEAR transformation of the input layer's values (multiply by weights and relu)
            layer_1_array = relu(np.dot(input_layer_array, weights_input_to_1_array))
            # output layer takes the LINEAR transformation of the values in layer one and sums them (mult)
            output_layer = np.dot(layer_1_array, weights_1_to_output_array)
    
            # the error is the difference of the output layer and the goal squared
            goal = walk_vs_stop_array[row_index:row_index + 1]
            layer_2_error += np.sum((output_layer - goal) ** 2)
    
            # compute the amount to adjust the transformation weights for layer one to output
            layer_1_to_output_delta = (goal - output_layer)
            # compute the amount to adjust the transformation weights for input to layer one
            input_to_layer_1_delta= layer_1_to_output_delta.dot(weights_1_to_output_array.T) * relu2deriv(layer_1_array)
    
            #Still need to figure out why the transpose, but this is where we incrementally adjust the weights
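            # (the transposes turn the 1xN row vectors into Nx1 columns, so the dot products below
            # are outer products with the same shapes as the weight matrices they update:
            # (4,1)x(1,1) -> (4,1) and (3,1)x(1,4) -> (3,4). This is the chain-rule step that spreads
            # the output error back across each weight in proportion to the input that fed it.)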
            l1t_array = layer_1_array.T
            ilt_array = input_layer_array.T
            weights_1_to_output_array += alpha * l1t_array.dot(layer_1_to_output_delta)
            weights_input_to_1_array += alpha * ilt_array.dot(input_to_layer_1_delta)
    
            print("[{}] Error: {:.3f}, L0: {}, L1: {}, L2: {}".format(iter, layer_2_error, input_layer_array, layer_1_array, output_layer))
    
            #print("[{}] Error: {}, Weights: {}".format(iter, total_error, weight_array))
            error_plot_mat.append([layer_2_error])
    
            weights_input_to_l1_plot_mat.append(nparray_to_list(weights_input_to_1_array))
            weights_l1_to_output_plot_mat.append(nparray_to_list(weights_1_to_output_array))
    
            iter += 1
            # stop even if we don't converge
            if iter > max_iter:
                break
    
    print("\n--------------evaluation")
    for row_index in range(len(streetlights_array)):
        input_layer_array = streetlights_array[row_index:row_index + 1]
        layer_1_array = relu(np.dot(input_layer_array, weights_input_to_1_array))
        output_layer = np.dot(layer_1_array, weights_1_to_output_array)
    
        print("{} = {:.3f} vs. {}".format(input_layer_array, float(output_layer), walk_vs_stop_array[row_index]))
    
    # plots ----------------------------------------------
    
    f1 = plt.figure(1)
    plt.plot(error_plot_mat)
    plt.title("error")
    plt.legend(["layer_2_error"])
    
    plot_mat("input to layer 1 weights", "weight", 2, weights_input_to_l1_plot_mat)
    plot_mat("layer 1 to output weights", "weight", 3, weights_l1_to_output_plot_mat)
    
    
    
    plt.show()
    
    

Phil 11.20.18

7:00 – 3:30 ASRC PhD/NASA

  • Disrupting the Coming Robot Stampedes: Designing Resilient Information Ecologies got accepted to the iConference! Time to start thinking about the slide deck…
    • Workshop: Online nonsense: tools and teaching to combat fake news on the Web
      • How can we raise the quality of what we find on the Web? What software might we build, what education might we try to provide, and what procedures (either manual or mechanical) might be introduced? What are the technical and legal issues that limit our responses? The speakers will suggest responses to problems, and we’ll ask the audience what they would do in specific circumstances. Examples might include anti-vaccination pages, nonstandard cancer treatments, or climate change denial. We will compare with past history, such as the way CB radio became useless as a result of too much obscenity and abuse, or the way the Hearst newspapers created the Spanish-American War. We’ll report out the suggestions and evaluations of the audience.
  • SocialOcean: Visual Analysis and Characterization of Social Media Bubbles
    • Social media allows citizens, corporations, and authorities to create, post, and exchange information. The study of its dynamics will enable analysts to understand user activities and social group characteristics such as connectedness, geospatial distribution, and temporal behavior. In this context, social media bubbles can be defined as social groups that exhibit certain biases in social media. These biases strongly depend on the dimensions selected in the analysis, for example, topic affinity, credibility, sentiment, and geographic distribution. In this paper, we present SocialOcean, a visual analytics system that allows for the investigation of social media bubbles. There exists a large body of research in social sciences which identifies important dimensions of social media bubbles (SMBs). While such dimensions have been studied separately, and also some of them in combination, it is still an open question which dimensions play the most important role in defining SMBs. Since the concept of SMBs is fairly recent, there are many unknowns regarding their characterization. We investigate the thematic and spatiotemporal characteristics of SMBs and present a visual analytics system to address questions such as: What are the most important dimensions that characterize SMBs? and How SMBs embody in the presence of specific events that resonate with them? We illustrate our approach using three different real scenarios related to the single event of Boston Marathon Bombing, and political news about Global Warming. We perform an expert evaluation, analyze the experts’ feedback, and present the lessons learned.
  • More Grokking. We’re at backpropagation, and I’m not seeing it yet. The pix are cool, though.
  • Continuing Characterizing Online Public Discussions through Patterns of Participant Interactions.
    • This paper introduces a computational framework to characterize public discussions, relying on a representation that captures a broad set of social patterns which emerge from the interactions between interlocutors, comments and audience reactions. (Page 198:1)
    • we use it to predict the eventual trajectory of individual discussions, anticipating future antisocial actions (such as participants blocking each other) and forecasting a discussion’s growth (Page 198:1)
    • platform maintainers may wish to identify salient properties of a discussion that signal particular outcomes such as sustained participation [9] or future antisocial actions [16], or that reflect particular dynamics such as controversy [24] or deliberation [29]. (Page 198:1)
    • Systems supporting online public discussions have affordances that distinguish them from other forms of online communication. Anybody can start a new discussion in response to a piece of content, or join an existing discussion at any time and at any depth. Beyond textual replies, interactions can also occur via reactions such as likes or votes, engaging a much broader audience beyond the interlocutors actively writing comments. (Page 198:2)
      • This is why JuryRoom would be distinctly different. It’s unique affordances should create unique, hopefully clearer results.
    • This multivalent action space gives rise to salient patterns of interactional structure: they reflect important social attributes of a discussion, and define axes along which discussions vary in interpretable and consequential ways. (Page 198:2)
    • Our approach is to construct a representation of discussion structure that explicitly captures the connections fostered among interlocutors, their comments and their reactions in a public discussion setting. We devise a computational method to extract a diverse range of salient interactional patterns from this representation—including but not limited to the ones explored in previous work—without the need to predefine them. We use this general framework to structure the variation of public discussions, and to address two consequential tasks predicting a discussion’s future trajectory: (a) a new task aiming to determine if a discussion will be followed by antisocial events, such as the participants blocking each other, and (b) an existing task aiming to forecast the growth of a discussion [9]. (Page 198:2)
    • We find that the features our framework derives are more informative in forecasting future events in a discussion than those based on the discussion’s volume, on its reply structure and on the text of its comments (Page 198:2)
    • we find that mainstream print media (e.g., The New York Times, The Guardian, Le Monde, La Repubblica) is separable from cable news channels (e.g., CNN, Fox News) and overtly partisan outlets (e.g., Breitbart, Sean Hannity, Robert Reich) on the sole basis of the structure of the discussions they trigger (Figure 4). (Page 198:2)
    • These studies collectively suggest that across the broader online landscape, discussions take on multiple types and occupy a space parameterized by a diversity of axes—an intuition reinforced by the wide range of ways in which people engage with social media platforms such as Facebook [25]. With this in mind, our work considers the complementary objective of exploring and understanding the different types of discussions that arise in an online public space, without predefining the axes of variation. (Page 198:3)
    • Many previous studies have sought to predict a discussion’s eventual volume of comments with features derived from their content and structure, as well as exogenous information [8, 9, 30, 69, inter alia]. (Page 198:3)
    • Many such studies operate on the reply-tree structure induced by how successive comments reply to earlier ones in a discussion rooted in some initial content. Starting from the reply-tree view, these studies seek to identify and analyze salient features that parameterize discussions on platforms like Reddit and Twitter, including comment popularity [72], temporal novelty [39], root-bias [28], reply-depth [41, 50] and reciprocity [6]. Other work has taken a linear view of discussions as chronologically ordered comment sequences, examining properties such as the arrival sequence of successive commenters [9] or the extent to which commenters quote previous contributions [58]. The representation we introduce extends the reply-tree view of comment-to-comment. (Page 198:3)
    • Our present approach focuses on representing a discussion on the basis of its structural rather than linguistic attributes; as such, we offer a coarser view of the actions taken by discussion participants that more broadly captures the nature of their contributions across contexts which potentially exhibit large linguistic variation.(Page 198:4)
    • This representation extends previous computational approaches that model the relationships between individual comments, and more thoroughly accounts for aspects of the interaction that arise from the specific affordances offered in public discussion venues, such as the ability to react to content without commenting. Next, we develop a method to systematically derive features from this representation, hence producing an encoding of the discussion that reflects the interaction patterns encapsulated within the representation, and that can be used in further analyses.(Page 198:4)
    • In this way, discussions are modelled as collections of comments that are connected by the replies occurring amongst them. Interpretable properties of the discussion can then be systematically derived by quantifying structural properties of the underlying graph: for instance, the indegree of a node signifies the propensity of a comment to draw replies. (Page 198:5)
      • Quick responses that reflect a high degree of correlation would be tight. A long-delayed “like” could be slack?
    • For instance, different interlocutors may exhibit varying levels of engagement or reciprocity. Activity could be skewed towards one particularly talkative participant or balanced across several equally-prolific contributors, as can the volume of responses each participant receives across the many comments they may author.(Page 198: 5)
    • We model this actor-focused view of discussions with a graph-based representation that augments the reply-tree model with an additional superstructure. To aid our following explanation, we depict the representation of an example discussion thread in Figure 1 (Page 198: 6)
    • Relationships between actors are modeled as the collection of individual responses they exchange. Our representation reflects this by organizing edges into hyperedges: a hyperedge between a hypernode C and a node c ‘ contains all responses an actor directed at a specific comment, while a hyperedge between two hypernodes C and C’ contains the responses that actor C directed at any comment made by C’ over the entire discussion. (Page 198: 6)
      • I think that this  can be represented as a tensor (hyperdimensional or flattened) with each node having a value if there is an intersection. There may be an overall scalar that allows each type of interaction to be adjusted as a whole
    • The mixture of roles within one discussion varies across different discussions in intuitively meaningful ways. For instance, some discussions are skewed by one particularly active participant, while others may be balanced between two similarly-active participants who are perhaps equally invested in the discussion. We quantify these dynamics by taking several summary statistics of each in/outdegree distribution in the hypergraph representation, such as their maximum, mean and entropy, producing aggregate characterizations of these properties over an entire discussion. We list all statistics computed in the appendices (Table 4). (Page 198: 6, 7)
    • To interpret the structure our model offers and address potentially correlated or spurious features, we can perform dimensionality reduction on the feature set our framework yields. In particular, let X be a N×k matrix whose N rows each correspond to a thread represented by k features. We perform a singular value decomposition on X to obtain a d-dimensional representation X ≈ X̂ = U S Vᵀ, where rows of U are embeddings of threads in the induced latent space and rows of V represent the hypergraph-derived features. (Page 198: 9)
      • This lets us find the hyperplane of the map we want to build
    • Community-level embeddings. We can naturally extend our method to characterize online discussion communities—interchangeably, discussion venues—such as Facebook Pages. To this end, we aggregate representations of the collection of discussions taking place in a community, hence providing a representation of communities in terms of the discussions they foster. This higher level of aggregation lends further interpretability to the hypergraph features we derive. In particular, we define the embedding U¯C of a community C containing threads {t1, t2, . . . tn } as the average of the corresponding thread embeddings Ut1 ,Ut2 , . . .Utn , scaled to unit l2 norm. Two communities C1 and C2 that foster structurally similar discussions then have embeddings U¯C1 and U¯C2 that are close in the latent space.(Page 198: 9)
      • And this may let us place small maps in a larger map. Not sure if the dimensions will line up though
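    • A quick numpy sketch of those two steps as I read them (a guess at the mechanics, not the authors’ code): a truncated SVD of the thread-by-feature matrix, then community embeddings as l2-normalized averages of their threads’ rows of U.
      import numpy as np

      N, k, d = 100, 20, 7
      X = np.random.random((N, k))    # N threads x k hypergraph-derived features (made-up data)

      U, S, Vt = np.linalg.svd(X, full_matrices=False)
      U_d = U[:, :d]                  # d-dimensional thread embeddings

      community_threads = [0, 5, 17]  # indices of the threads belonging to one community
      U_C = U_d[community_threads].mean(axis=0)
      U_C /= np.linalg.norm(U_C)      # scale to unit l2 norm, as described in the paper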
    • The set of threads to a post may be algorithmically re-ordered based on factors like quality [13]. However, subsequent replies within a thread are always listed chronologically.We address elements of such algorithmic ranking effects in our prediction tasks (§5). (Page 198: 10)
    • Taken together, these filtering criteria yield a dataset of 929,041 discussion threads.(Page 198: 10)
    • We now apply our framework to forecast a discussion’s trajectory—can interactional patterns signal future thread growth or predict future antisocial actions? We address this question by using the features our method extracts from the 10-comment prefix to predict two sets of outcomes that occur temporally after this prefix. (Pg 198:10)
      • These are behavioral trajectories, though not belief trajectories. Maps of these behaviors could probably be built, too.
    • For instance, news articles on controversial issues may be especially susceptible to contentious discussions, but this should not translate to barring discussions about controversial topics outright. Additionally, in large-scale social media settings such as Facebook, the content spurring discussions can vary substantially across different sub-communities, motivating the need to seek adaptable indicators that do not hinge on content specific to a particular context. (Page 198: 11)
    • Classification protocol. For each task, we train logistic regression classifiers that use our full set of hypergraph-derived features, grid-searching over hyperparameters with 5-fold cross-validation and enforcing that no Page spans multiple folds.13 We evaluate our models on a (completely fresh) heldout set of thread pairs drawn from the subsequent week of data (Nov. 8-14, 2017), addressing a model’s potential dependence on various evolving interface features that may have been deployed by Facebook during the time spanned by the training data. (Page 198: 11)
      • We use logistic regression classifiers from scikit-learn with l2 loss, standardizing features and grid-searching over C = {0.001, 0.01, 1}. In the bag-of-words models, we tf-idf transform features, set a vocabulary size of 5,000 words and additionally grid-search over the maximum document frequency in {0.25, 0.5, 1}. (Page 198: 11, footnote 13)
    • We test a model using the temporal rate of commenting, which was shown to be a much stronger signal of thread growth than the structural properties considered in prior work [9] (Page 198: 12)
    • Table 3 shows Page-macroaveraged heldout accuracies for our prediction tasks. The feature set we extract from our hypergraph significantly outperforms all of the baselines in each task. This shows that interactional patterns occurring within a thread’s early activity can signal later events, and that our framework can extract socially and structurally-meaningful patterns that are informative beyond coarse counts of activity volume, the reply-tree alone and the order in which commenters contribute, along with a shallow representation of the linguistic content discussed. (Page 198: 12)
      • So triangulation from a variety of data sources produces more accurate results in this context, and probably others. Not a surprising finding, but important to show
    • We find that in almost all cases, our full model significantly outperforms each subcomponent considered, suggesting that different parts of the hypergraph framework add complementary information across these tasks. (Page 198: 13)
    • Having shown that our approach can extract interaction patterns of practical importance from individual threads, we now apply our framework to explore the space of public discussions occurring on Facebook. In particular, we identify salient axes along which discussions vary by qualitatively examining the latent space induced from the embedding procedure described in §3, with d = 7 dimensions. Using our methodology, we recover intuitive types of discussions, which additionally reflect our priors about the venues which foster them. This analysis provides one possible view of the rich landscape of public discussions and shows that our thread representation can structure this diverse space of discussions in meaningful ways. This procedure could serve as a starting point for developing taxonomies of discussions that address the wealth of structural interaction patterns they contain, and could enrich characterizations of communities to systematically account for the types of discussions they foster. (Page 198: 14) 
      • ^^^Show this to Wayne!^^^
    • The emergence of these groupings is especially striking since our framework considers just discussion structure without explicitly encoding for linguistic, topical or demographic data. In fact, the groupings produced often span multiple languages—the cluster of mainstream news sites at the top includes French (Le Monde), Italian (La Repubblica) and German (SPIEGEL ONLINE) outlets; the “sports” region includes French (L’EQUIPE) as well as English outlets. This suggests that different types of content and different discussion venues exhibit distinctive interactional signatures, beyond lexical traits. Indeed, an interesting avenue of future work could further study the relation between these factors and the structural patterns addressed in our approach, or augment our thread representation with additional contextual information. (Page 198: 15)
    • Taken together, we can use the features, threads and Pages which are relatively salient in a dimension to characterize a type of discussion. (Page 198: 15)
    • To underline this finer granularity, for each examined dimension we refer to example discussion threads drawn from a single Page, The New York Times (https://www.facebook.com/nytimes), which are listed in the footnotes. (Page 198: 15)
      • Common starting point. Do they find consensus, or how the dimensions reduce?
    • Focused threads tend to contain a small number of active participants replying to a large proportion of preceding comments; expansionary threads are characterized by many less-active participants concentrating their responses on a single comment, likely the initial one. We see that (somewhat counterintuitively) meme-sharing discussion venues tend to have relatively focused discussions. (Page 198: 15)
      • These are two sides of the same dimension-reduction coin. A focused thread should be using the dimension-reduction tool of open discussion that requires the participants to agree on what they are discussing. As such it refines ideas and would produce more meme-compatible content. Expansive threads are dimension reducing to the initial post. The subsequent responses go in too many directions to become a discussion.
    • Threads at one end (blue) have highly reciprocal dyadic relationships in which both reactions and replies are exchanged. Since reactions on Facebook are largely positive, this suggests an actively supportive dynamic between actors sharing a viewpoint, and tend to occur in lifestyle-themed content aggregation sub-communities as well as in highly partisan sites which may embody a cohesive ideology. In threads at the other end (red), later commenters tend to receive more reactions than the initiator and also contribute more responses. Inspecting representative threads suggests this bottom-heavy structure may signal a correctional dynamic where late arrivals who refute an unpopular initiator are comparatively well-received. (Page 198: 17)
    • This contrast reflects an intuitive dichotomy of one- versus multi-sided discussions; interestingly, the imbalanced one-sided discussions tend to occur in relatively partisan venues, while multi-sided discussions often occur in sports sites (perhaps reflecting the diversity of teams endorsed in these sub-communities). (Page 198: 17)
      • This means that we can identify one-sided behavior and then use that to look at the underlying information. No need to look in diverse areas; they are taking care of themselves. This is ecosystem management 101, where things like algae blooms and invasive species need to be recognized and then managed
    • We now seek to contrast the relative salience of these factors after controlling for community: given a particular discussion venue, is the content or the commenter more responsible for the nature of the ensuing discussions? (Page 198: 17)
    • This suggests that, perhaps somewhat surprisingly, the commenter is a stronger driver of discussion type. (Page 198: 18)
      • I can see that. The initial commenter is kind of a gate-keeper to the discussion. A low-dimension, incendiary comment that is already aligned with one group (“lock her up”), will create one kind of discussion, while a high-dimensional, nuanced post will create another.
    • We provide a preliminary example of how signals derived from discussion structure could be applied to forecast blocking actions, which are potential symptoms of low-quality interactions (Page 198: 18)
    • Important references