Phil 12.14.18

7:00 – 4:30 ASRC PhD/NASA

  • Sent Greg a couple of quick notes on using CNNs to match bacteria to phages.
  • Continuing with Normal Accidents
  • A Digital Test of the News: Checking the Web for Public Facts – Workshop report, December 2018
    • The Digital Test of the News workshop brought together digital sociologists, data visualisation and new media researchers at the Centre for Interdisciplinary Methodologies at the University of Warwick on 8 and 9 May 2018. The workshop is part of a broader research collaboration between the Centre for Interdisciplinary Methodologies and the Public Data Lab which investigates the changing nature of public knowledge formation in digital societies and develops inventive methods to capture and visualise knowledge dynamics online. Below we outline the workshop’s aims and outcomes.
  • Added plots to the NN code. Everything seems to look right. Looking at the individual weights in a layer is very informative. Need to add this kind of plotting to our keras code somehow:
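Something along these lines might work for the Keras side (a sketch; `model`, the layer loop, and `n_layers` are hypothetical, and `get_weights()` interleaves kernel and bias arrays):

```python
import numpy as np

def normalized_weight_grid(weights):
    """Scale one layer's weight matrix to [0, 1] so it can go straight into imshow()."""
    w = np.asarray(weights, dtype=float)
    span = w.max() - w.min()
    return (w - w.min()) / span if span > 0 else np.zeros_like(w)

# Hypothetical Keras usage -- model.get_weights() alternates kernel and bias arrays,
# so [::2] keeps just the weight matrices:
# import matplotlib.pyplot as plt
# for i, w in enumerate(model.get_weights()[::2]):
#     plt.subplot(1, n_layers, i + 1)
#     plt.imshow(normalized_weight_grid(w), cmap="viridis")
# plt.show()
```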
  • Changing the coherence code so that the row values are zero or one. Actually, as the amount of data grows, raw BOW is getting more useful. This spreadsheet shows all posts, including the DM. Note that the word frequency follows a power law (R squared = .9352): All_posts
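A quick way to sanity-check that power-law claim is a log-log fit of frequency against rank (a numpy sketch; the synthetic frequencies in the test below are illustrative):

```python
import numpy as np

def zipf_fit(freqs):
    """Fit log(frequency) against log(rank); returns (slope, R^2).
    A slope near -1 with high R^2 is the classic power-law signature."""
    f = np.sort(np.asarray(freqs, dtype=float))[::-1]  # rank-order the counts
    x = np.log(np.arange(1, len(f) + 1))
    y = np.log(f)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return slope, r2
```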
  • Started the optimizer and excel utils classes
  • NOAA meeting
  • NOAA meeting 2
  • Some thoughts from Aaron on our initial ML approach
    • I think for the first pass we do 1-2 models based on contract type focused on the $500k+ award contracts (120ish total).
    • We construct the inputs like sentences in the word generation LSTM with padded lengths equal to the longest running contract of that type, sorted by length. The model can be tested against current contracts held out in a test set using point by point prediction so we can show accuracy of the model against existing data and use that to set our accuracy threshold.
    • My guess is this will be at least an ORF/PAC model (two different primary contract types), which we can work on tuning to get as accurate as possible in the timeframe we have.
    • One of the things we advertise as “next steps” is a detailed analysis of contracts based on similarity measures to identify a series of more accurate models. We can pair this with additional models such as Exponential Smoothing and ARIMA which use fundamentally the exact same pipeline.
    • The GUI will be plumbed up to show these analytic outputs on a per contract basis and we can show by the end of January a simple linear model and the LSTM model to demonstrate how Exponential Smoothing / ARIMA or model averages could be displayed. Once we have these outputs we can take the top 5 highest predicted UDO and display in a summary page so they can use those as a launching off point.
    • If we do it this way it means we only have to focus on the completion of the TimeSeriesML LSTM and its data pipeline with a maximum of 2 initial models (contract type). I think that is a far more reasonable thing to complete in the timeframe and should still be really exciting to show off.
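The padding step Aaron describes could be sketched like this (the monthly-spend numbers are made up; the real pipeline would read from the contract data):

```python
import numpy as np

def pad_spend_series(series_list, value=0.0):
    """Left-pad variable-length contract spend series to the longest one,
    so a single LSTM batch can hold contracts of different running lengths."""
    max_len = max(len(s) for s in series_list)
    out = np.full((len(series_list), max_len), value)
    for i, s in enumerate(series_list):
        out[i, max_len - len(s):] = s
    return out

# e.g. three hypothetical contracts, each a list of monthly spend values
batch = pad_spend_series([[120., 80.], [200., 150., 90.], [60.]])
```

Sorting by length before batching (as described above) then just means sorting `series_list` by `len` first.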

Phil 12.13.18

7:00 – 4:00 ASRC PhD/NASA

  • BBC Business Daily on making decisions under uncertainty. In particular, David Tuckett (Scholar), professor and director of the Centre for the Study of Decision-Making Uncertainty at University College London talks about how we reduce our sense of uncertainty by telling ourselves stories that we can then align with. This reminds me of how conspiracy theories develop, in particular the remarkable storyline of QAnon.
  • More Normal Accident review
  • NYTimes on frictionless design being a problem
  • Dungeon processing – broke out three workbooks for queries with all players, no DM, and just the DM. Also need to write some code that generates the story as HTML.
  • Backprop debugging. I think it works? class_error
  • Here’s the core of the forward (train) and backpropagation (learn) code:
    def train(self):
        if self.source != None:
            src = self.source
            self.neuron_row_array = np.dot(src.neuron_row_array, src.weight_row_mat)
            if(self.target != None): # No activation function to output layer
                self.neuron_row_array = relu(self.neuron_row_array) # TODO: use passed-in activation function
            self.neuron_col_array = self.neuron_row_array.T
    def learn(self, alpha):
        if self.source != None:
            src = self.source
            delta_scalar = np.dot(self.delta_row_array, src.weight_col_mat)
            delta_threshold = relu2deriv(src.neuron_row_array) # TODO: use passed in derivative function
            src.delta_row_array = delta_scalar * delta_threshold
            mat = np.dot(src.neuron_col_array, self.delta_row_array)
            src.weight_row_mat += alpha * mat
            src.weight_col_mat = src.weight_row_mat.T
  • And here’s the evaluation:
  • --------------evaluation
    input: [[1. 0. 1.]] = pred: 0.983 vs. actual:[1]
    input: [[0. 1. 1.]] = pred: 0.967 vs. actual:[1]
    input: [[0. 0. 1.]] = pred: -0.020 vs. actual:[0]
    input: [[1. 1. 1.]] = pred: 0.000 vs. actual:[0]
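For reference, a compact self-contained version of the same forward/backprop pass (patterned on Grokking Deep Learning's streetlight example; the seed, alpha, and hidden size are arbitrary) converges to similar predictions:

```python
import numpy as np

np.random.seed(1)
relu = lambda x: (x > 0) * x
relu2deriv = lambda x: (x > 0).astype(float)

inputs = np.array([[1., 0., 1.], [0., 1., 1.], [0., 0., 1.], [1., 1., 1.]])
goals = np.array([[1.], [1.], [0.], [0.]])

alpha, hidden = 0.2, 4
w_in_hid = 2 * np.random.random((3, hidden)) - 1
w_hid_out = 2 * np.random.random((hidden, 1)) - 1

for _ in range(200):
    for row, goal in zip(inputs, goals):
        layer_0 = row.reshape(1, -1)
        layer_1 = relu(np.dot(layer_0, w_in_hid))
        layer_2 = np.dot(layer_1, w_hid_out)          # no activation on the output layer
        delta_2 = goal - layer_2                      # output error
        delta_1 = np.dot(delta_2, w_hid_out.T) * relu2deriv(layer_1)  # push error back one layer
        w_hid_out += alpha * np.dot(layer_1.T, delta_2)
        w_in_hid += alpha * np.dot(layer_0.T, delta_1)

preds = relu(np.dot(inputs, w_in_hid)).dot(w_hid_out)
```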

Phil 12.12.18

7:00 – 4:30 ASRC NASA/PhD

  • Do a dungeon analytic with new posts and DM for Aaron – done!
  • Send email to Shimei for registration and meeting after grading is finished
  • Start review of Normal Accidents – started!
  • Debug NN code – in process. Very tricky figuring out the relationships between the layers in backpropagation
  • Sprint planning
  • NASA meeting
  • Talked to Zach about the tagging project. Looks good, but I wonder how much time we’ll have. Got a name though – TaggerML

Phil 12.11.18

7:00 – 4:30 ASRC PhD/NASA


Somehow, this needs to get into a discussion of the trustworthiness of maps

  • I realized that we can hand-code these initial dungeons, learn a lot and make this a baseline part of the study. This means that we can compare human and machine data extraction for map making. My initial thoughts as to the sequence are:
    • Step 1: Finish running the initial dungeon
    • Step 2: researchers determine a set of common questions that would be appropriate for each room. Something like:
      • Who is the character?
      • Where is the character?
      • What is the character doing?
      • Why is the character doing this?
    • Each answer should also include a section of the text that the reader thinks answers that question. Once this has been worked out on paper, a simple survey website can be built that automates this process and supports data collection at moderate scales.
    • Use answers to populate a “Trajectories” sheet in an xml file and build a map!
    • Step 3: Partially automate the extraction to give users a generated survey that lets them select the most likely answer/text for the who/where/what/why questions. Generate more maps!
    • Step 4: Full automation
  • Added these thoughts to the analysis section of the google doc
  • The 11th International Natural Language Generation Conference
    • The INLG conference is the main international forum for the presentation and discussion of all aspects of Natural Language Generation (NLG), including data-to-text, concept-to-text, text-to-text and vision-to-text approaches. Special topics of interest for the 2018 edition included:
      • Generating Text with Affect, Style and Personality,
      • Conversational Interfaces, Chatbots and NLG, and
      • Data-driven NLG (including the E2E Generation Challenge)
  • Back to grokking DNNs
    • Still building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
    • array formatting issues. Tricky
    • I think I’m done enough to start debugging. Tomorrow
  • Sprint review

Phil 12.10.18

7:00 – 5:30 ASRC NASA/PhD

  • For my morning academic work, I am cooking delicious things.
  • There is text in the dungeon! Here’s what happened when I ran the analytics against 3 posts and held back the dungeon master. Rather than put up a bunch of screenshots, here’s the spreadsheet: Day_1_Dungeon_1
  • Russell Richie (twitter) (Scholar) One of my favorite results in the paper is that you can compress the embeddings 10x or more while preserving prediction performance, suggesting that the type of knowledge used to make these kind of judgments may only vary along a relative handful of latent dimensions.
  • Ok, back to grokking DNNs
    • Building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
  • Fika and meeting with Wayne
    • Ade might be interested in doing some coding work!
    • Went over the initial results spreadsheet with Wayne. Overall, progress seems on track. He had an additional thought for venues that I didn’t note.
    • Ping Shimei about 899

Phil 12.7.18

7:00 – 4:30 ASRC NASA/PhD

Analyzing Discourse and Text Complexity for Learning and Collaborating

Analyzing Discourse and Text Complexity for Learning and Collaborating

Author: Mihai Dascalu


  • …informational level, coherence is most frequently accounted by: lexical chains (Morris and Hirst 1991; Barzilay and Elhadad 1997; Lapata and Barzilay 2005) (see 4.3.1 Semantic Distances and Lexical Chains), centering theory (Miltsakaki and Kukich 2000; Grosz et al. 1995) (see 4.2 Discourse Analysis and the Polyphonic Model) in which coherence is established via center continuation, or Latent Semantic Analysis (Foltz et al. 1993, 1998) (see 4.3.2 Semantic Similarity through Tagged LSA) used for measuring the cosine similarity between adjacent phrases
  • Among chat voices there are sequential and transversal relations, highlighting a specific point of view in a counterpointal way, as mentioned in previous work (Trausan-Matu and Rebedea 2009).
  • From a computational perspective, until recently, the goals of discourse analysis in existing approaches oriented towards conversation analysis were to detect topics and links (Adams and Martell 2008), dialog acts (Kontostathis et al. 2009), lexical chains (Dong 2006) or other complex relations (Rose et al. 2008) (see 3.1.3 CSCL Computational Approaches). The polyphonic model takes full advantage of term frequency – inverse document frequency Tf-Idf (Adams and Martell 2008; Schmidt and Stone), Latent Semantic Analysis (Schmidt and Stone; Dong 2006), Social Network Analysis (Dong 2006), Machine Learning (e.g., Naïve Bayes (Kontostathis et al. 2009), Support Vector Machines and Collins’ perceptron (Joshi and Rose 2007)), the TagHelper environment (Rose et al. 2008) and the semantic distances from the lexicalized ontology WordNet (Adams and Martell 2008; Dong 2006). The model starts from identifying words and patterns in utterances that are indicators of cohesion among them and, afterwards, performs an analysis based on the graph, similar to some extent to a social network, and on threads and their interactions.
  • Semantic Distances and Lexical Chains: an ontology consists of a set of concepts specific to a domain and of the relations between pairs of concepts. Starting from the representation of a domain, we can define various distance metrics between concepts based on the defined relationships among them and later on extract lexical chains, specific to a given text that consist of related/cohesive concepts spanning throughout a text fragment or the entire document.
    • Lexicalized Ontologies and Semantic Distances: One of the most commonly used resources for English sense relations in terms of lexicalized ontologies is the WordNet lexical database (Fellbaum 1998; Miller 1995, 2010) that consists of three separate databases, one for nouns, a different one for verbs, and a third one for adjectives and adverbs. WordNet groups words into sets of cognitively related words (synsets), thus describing a network of meaningfully inter-linked words and concepts.
    • Nevertheless, we must also present the limitations of WordNet and of semantic distances, with impact on the development of subsequent systems (see 6 PolyCAFe – Polyphonic Conversation Analysis and Feedback and 7 ReaderBench (I) – Cohesion-based Discourse Analysis and Dialogism): 1/ the focus only on common words, without covering any special domain vocabularies; 2/ reduced extensibility as the serialized model makes difficult the addition of new domain-specific concepts or relationships
    • Building the Disambiguation Graph: Lexical chaining derives from textual cohesion (Halliday and Hasan 1976) and involves the selection of related lexical items in a given text (e.g., starting from Figure 8, the following lexical chain could be generated if all words occur in the initial text fragment: “cheater, person, cause, cheat, deceiver, …”). In other words, the lexical cohesive structure of a text can be represented as lexical chaining that consists of sequences of words tied together by semantic relationships and that can span across the entire text or a subsection of it. (Ontology-based chaining formulas on page 63)
    • The types of semantic relations taken into consideration when linking two words are hypernymy, hyponymy, synonymy, antonymy, or whether the words are siblings by sharing a common hypernym. The weights associated with each relation vary according to the strength of the relation and the proximity of the two words in the text analyzed.
  • Semantic Similarity through Tagged LSA: Latent Semantic Analysis (LSA) (Deerwester et al. 1989; Deerwester et al. 1990; Dumais 2004; Landauer and Dumais 1997) is a natural language processing technique starting from a vector-space representation of semantics highlighting the co-occurrence relations between terms and containing documents, after that projecting the terms in sets of concepts (semantic spaces) related to the initial texts. LSA builds the vector-space model, later on used also for evaluating similarity between terms and documents, now indirectly linked through concepts (Landauer et al. 1998a; Manning and Schütze 1999). Moreover, LSA can be considered a mathematical method for representing words’ and passages’ meaning by analyzing in an unsupervised manner a representative corpus of natural language texts.
    • In terms of documents size, semantically and topically coherent passages of approximately 50 to 100 words are the optimal units to be taken into consideration while building the initial matrix (Landauer and Dumais 2011).
      • This fits nicely to post size. Also a good design consideration for JuryRoom
    • Therefore, as compromise of all previous NLP specific treatments, the latest version of the implemented tagged LSA model (Dascalu et al. 2013a; Dascalu et al. 2013b) uses lemmas plus their corresponding part-of-speech, after initial input cleaning and stop words elimination.
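Stripped of the tagging and lemmatization, LSA boils down to a truncated SVD over a document-term matrix; a toy numpy-only sketch (the documents and k are invented):

```python
import numpy as np

docs = ["the knight enters the dark room",
        "a knight walks into the dungeon room",
        "the wizard casts a spell"]

vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for w in vocab] for d in docs], dtype=float)

# LSA: truncated SVD of the document-term matrix, keeping k latent "concepts"
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
concept = U[:, :k] * S[:k]  # each document as a point in concept space

def cos(a, b):
    """Cosine similarity, the comparison LSA uses between adjacent passages."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The first two documents share vocabulary and land close together in concept space; the third does not.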
  • Topic Relatedness through Latent Dirichlet Allocation
    • Starting from the presumption that documents integrate multiple topics, each document can now be considered a random mixture of corpus-wide topics. In order to avoid confusion, an important aspect needs to be addressed: topics within LDA are latent classes, in which every word has a given probability, whereas topics that are identified within subsequently developed systems (A.S.A.P., Ch.A.M.P., PolyCAFe and ReaderBench) are key concepts from the text. Additionally, similar to LSA, LDA also uses the implicit assumption of the bag of words approach that the order of words doesn’t matter when extracting key concepts and similarities of concepts through co-occurrences within a large corpus.
    • Every topic contains a probability for every word, but after the inference phase a remarkable demarcation can be observed between salient or dominant concepts of a topic and all other vocabulary words. In other words, the goal of LDA is to reflect the thematic structure of a document or of a collection through hidden variables and to infer this hidden structure by using a posterior inference model (Blei et al. 2003)
    • there are inevitably estimation errors, more notable when addressing smaller documents or texts with a wider spread of concepts, as the mixture of topics becomes more uncertain
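A toy collapsed-Gibbs LDA run (numpy only; the corpus, K, and hyperparameters are invented) makes the "random mixture of corpus-wide topics" idea concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 1, 2], [0, 0, 1, 2], [3, 4, 4, 5], [3, 3, 4, 5]]  # word ids, toy corpus
V, K, alpha, beta = 6, 2, 0.1, 0.01

# random initial topic assignment for every token, plus the count tables
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
ndk = np.zeros((len(docs), K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):  # collapsed Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]; ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k; ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)  # per-doc topic mixture
```

With only four 4-token documents the estimation-error caveat above applies directly: the mixtures stay noisy because the documents are tiny.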

Phil 12.6.18

7:00 – 4:00 ASRC PhD/NASA

  • Looks like Aaron has added two users
  • Create a “coherence” matrix, where the threshold is based on an average of one or more previous cells. The version shown below uses the tf-idf matrix as a source and checks to see if there are any non-zero values within an arbitrary span. If there are, then the target matrix (initialized with zeroes) is incremented by one on that span. This process iterates from a step of one (the default), to the specified step size. As a result, the more contiguous nonzero values are, the larger and more bell-curved the row sequences will be: spreadsheet3
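One plausible reading of that windowed scheme in numpy (the span size and source row below are made up):

```python
import numpy as np

def coherence_matrix(src, max_step=3):
    """For each window size 1..max_step, add 1 across any row-window of the
    source that contains a nonzero value; contiguous runs stack into bumps."""
    out = np.zeros(src.shape, dtype=float)
    rows, cols = src.shape
    for step in range(1, max_step + 1):
        for r in range(rows):
            for c in range(cols - step + 1):
                if np.any(src[r, c:c + step]):
                    out[r, c:c + step] += 1
    return out

tfidf_row = np.array([[1., 1., 1., 0., 0., 1.]])  # hypothetical nonzero-presence row
scores = coherence_matrix(tfidf_row, max_step=2)
```

The three contiguous values score higher in the middle than the isolated one at the end, which is the bell-curve effect described above.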
  • Create a “details” sheet that has information about the database, query, parameters, etc. Done.
  • Set up a redirect so that users have to go through the IRB page if they come from outside the antibubbles site
  • It’s the End of News As We Know It (and Facebook Is Feeling Fine)
    • And as the platforms pumped headlines into your feed, they didn’t care whether the “news” was real. They didn’t want that responsibility or expense. Instead, they honed in on engagement—did you click or share, increasing value to advertisers?
      • Diversity (responsibility, expense), Stampede (engagement, share)
  • Finished Analyzing Discourse and Text Complexity for Learning and Collaborating, and created this entry for the notes.
  • Was looking at John Du Bois’ paper Towards a dialogic syntax, which looks really interesting, but seems like it might be more appropriate for spoken dialog. Instead, I think I’ll go to Claire Cardie’s presentation on chat argument analysis at UMD tomorrow and see if that has better alignment.
    • Argument Mining with Structured SVMs and RNNs
      • We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.

Phil 12.5.18

7:00 – 4:30 ASRC PhD/NASA

Phil 12.4.18

7:00 – 8:00 (13 hrs) ASRC NASA/PhD

  • Put my discourse analysis finds here, so they don’t get lost.
  • Adding a bit more to my post that talks about inertial network behavior
  • Added xmlwriter, since Pandas can’t handle writing out dictionaries, though it can plot them just fine…
  • The test dungeon discourse sequence as a matrix. You can clearly see the three rooms in the top rows. Aaron and I agree that this is a cross-correlation signal processing problem. Next is to gather some real-world data: spreadsheet
  • Discussion with Aaron about next steps with Antonio. Basically say that we’re booked through April, but can review and comment.
  • IEEE Talk by Hai “Helen” Li of Duke University:



Phil 12.3.18

7:00 – 6:00 ASRC PhD

  • Reading Analyzing Discourse and Text Complexity for Learning and Collaborating, basically to find methods that show important word frequency varying over time.
  • Just in searching around, I also found a bunch of potentially useful resources. I’m emphasizing Python at the moment, because that’s the language I’m using at work right now.
    • 5agado has a bunch of nice articles on Medium, linked to code. In particular, there’s Conversation Analyzer – An Introduction, with associated code.
    • High frequency word entrainment in spoken dialogue
      • Cognitive theories of dialogue hold that entrainment, the automatic alignment between dialogue partners at many levels of linguistic representation, is key to facilitating both production and comprehension in dialogue. In this paper we examine novel types of entrainment in two corpora—Switchboard and the Columbia Games corpus. We examine entrainment in use of high-frequency words (the most common words in the corpus), and its association with dialogue naturalness and flow, as well as with task success. Our results show that such entrainment is predictive of the perceived naturalness of dialogues and is significantly correlated with task success; in overall interaction flow, higher degrees of entrainment are associated with more overlaps and fewer interruptions.
    • Looked some more at the Cornell Toolkit, but it seems focused on other conversation attributes, with more lexical analysis coming later
    • There is a github topic on discourse-analysis, of which John W. DuBois’ rezonator project looks particularly interesting. Need to ask Wayne about how to reach out to someone like that.
      • Recently I’ve been interested in what happens when participants in conversation build off each other, reusing words, structures and other linguistic resources just used by a prior speaker. In dialogic syntax, as I call it, parallelism of structure across utterances foregrounds similarities in function, but also brings out differences. Participants notice even the subtlest contrasts in stance–epistemic, affective, illocutionary, and so on–generated by the resonance between juxtaposed utterances. The theories of dialogic syntax and stance are closely related, and I’m currently working on exploring this linkage–one more example of figuring out how language works on multiple levels simultaneously, uniting structure, meaning, cognition, and social interaction.
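That re-use idea can be crudely quantified as word overlap between adjacent turns (a toy stand-in, nothing like the structural mapping rezonator actually does):

```python
def resonance(turn_a, turn_b):
    """Fraction of words in turn_b re-used from turn_a -- a rough proxy for
    lexical resonance between juxtaposed utterances."""
    prior = set(turn_a.lower().split())
    words = turn_b.lower().split()
    return sum(w in prior for w in words) / len(words)
```

Running this over successive post pairs in a dungeon thread would give a cheap first look at how much participants build off each other.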
  • From Computational Propaganda: If You Make It Trend, You Make It True
    • As an example, searching for “Vitamin K shot” (a routine health intervention for newborns) returns almost entirely anti-vaccine propaganda; anti-vaccine conspiracists write prolific quantities of content about that keyword, actively selling the myth that the shot is harmful, causes cancer, causes SIDS. Searches for the phrase are sparse because medical authorities are not producing counter-content or fighting the SEO battle in response.
    • This is literally a use case where a mapping interface would show that something funny was going on in this belief space
  • Yuanyuan’s proposal defense
    • Surgical telementoring, trainee performing the operation is monitored remotely by expert.
    • These are physical models!
    • Manual coding
    • Tracks communication intention, not lexical content
    • Linear Mixed Model
      • Linear mixed models are an extension of simple linear models to allow both fixed and random effects, and are particularly used when there is non independence in the data, such as arises from a hierarchical structure. For example, students could be sampled from within classrooms, or patients from within doctors.
    • DiCoT: a methodology for applying Distributed Cognition to the design of team working systems <– might be worth looking at for dungeon teams
    • Note, a wireless headset mic is nice if there are remote participants and you need to move around the room
    • GLIMMPSE power analysis
  • Add list of publications to the dissertation?
  • Good meeting with Wayne. Brought him up to speed. We discussed chiplay 2019 as a good next venue. We also went over what the iConference presentation might be. More as this develops, since it’s not all that clear. Certainly a larger emphasis on video. Also, it will be in the first batch of presentations.

Phil 12.2.18

This is a story about information at rest and information in motion. Actually, it’s really just a story about information in motion, mediated by computers. Information at rest is pretty predictable. Go pick up an actual, physical book. Alone, it’s not going to do much. But it is full of information. It’s your job to put it in motion. The right kind of motion can change the world. The thing is, that change, be it the creation of a political movement or the discovery of a new field of study, is oddly physical. Our terms that describe it (field, movement) are physical. They have weight, and inertia. But that is a property of us — human beings — evolved meat machines that interact with information using mechanisms evolved over millennia to deal with physical interactions. Information in motion isn’t physical. But we aren’t equipped to deal with that intuitively. The machines that we have built to manipulate information are. And though they are stunningly effective in this, they do not share our physics-based biases about how to interpret the world.

And that may lead to some ugly surprises.

The laws of physics don’t apply in information space.

Actually, we rarely deal with information. We deal in belief, which is the subset of information that we have opinions about. We don’t care how flat a table is as long as it’s flat enough. But we care a lot about the design of the dining room table that we’re putting in our dining room.

In this belief space, we interpret information using a brain that is evolved based on the behavior of the physical world. That’s a possible reason that we have so many movement terms for describing belief behavior. It is unlikely that we could develop any other intuition, given the brief time that there has even been a concept of information.

There are also some benefits to treating belief as if it has physical properties. It affords group coordination. Beliefs that change gradually can be aligned more easily (dimension reduction), allowing groups to reach consensus and compromise. This, combined with our need for novelty, creates somewhat manageable trajectories. Much of the way that we communicate depends on this linearity. Spoken and written language are linear constructs. Sequential structures like stories contain both information and the order of delivery. Only the sequence differs in music. The notes are the same.

But belief merely creates the illusion that information has qualities like weight. Although the neurons in our brain are slower than solid-state circuits, the electrochemical network that they build is capable of behaving in far less linear and inertial ways. Mental illness can be regarded as a state where the brain network is misbehaving. It can be underdamped, leading to racing thoughts, or overdamped, manifesting as depression. It can have runaway resonances, as with seizures. In these cases, the functioning of the brain no longer maps successfully to the external, physical environment. There seems to be an evolutionary sweet spot where enough intelligence to model and predict possible outcomes is useful. Functional intelligence appears to be a saddle point, surrounded by regions of instability and stasis.

Computers, which have not evolved under these rules, treat information very differently. They are stable in their function as sets of transistors. But the instructions that those transistors execute, the network of action and information, is not so well defined. For example, computers can access all information simultaneously. This is one of the qualities of computers that makes them so effective at search. But this capability leads to deeply connected systems with complicated implications that we tend to mask with interfaces that we, the users, find intuitive.

For example, it is possible to add the illusion of physical properties to information. In simulation we model masses, springs and dampers to create sophisticated representations of real-world behaviors. But simply because these systems mimic their real-world counterparts doesn’t mean that they have these intrinsic properties. Consider the simulation of a set of masses and springs below:


Depending on the solver (physics algorithm), damping, and stiffness, the system will behave in a believable way. Choose the Euler solver, turn down the damping and wind up the stiffness and the system becomes unstable, or explodes:


The computer, of course, doesn’t know the difference. It can detect instability only if we program or train it specifically to do so. This is not just true in simulations of physical systems, but also training-based systems like neural networks (gradient descent) and genetic algorithms (mutation rate). In all these cases, systems can converge or explode based on the algorithms used and the hyperparameters that configure them.
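That explode-or-converge behavior is easy to reproduce with a single mass-spring-damper under explicit Euler (the constants here are arbitrary):

```python
def euler_spring_peak(k, c, dt=0.05, steps=400):
    """Explicit Euler on x'' = -k*x - c*x'; returns the largest |x| seen.
    A stable run stays near the starting amplitude; an unstable one blows up."""
    x, v = 1.0, 0.0
    peak = abs(x)
    for _ in range(steps):
        a = -k * x - c * v
        x, v = x + dt * v, v + dt * a
        peak = max(peak, abs(x))
        if peak > 1e6:  # bail out once it has clearly exploded
            break
    return peak
```

With moderate stiffness and some damping the oscillation decays; crank the stiffness and remove the damping and the same solver diverges within a handful of steps, exactly the "explosion" described above.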

This is the core of an implicit user interface problem. The more we make our intelligent machines so that they appear to be navigating in belief spaces, the more we will be able to work productively with them in intuitive ways. Rather than describing carefully what we want them to do (either with programming or massive training sets), we will be able to negotiate with them, and arrive at consensus or compromise. The fundamental problem is that this is a facade that does not reflect the underlying hardware. Because no design is perfect, and accidents are inevitable, I think that it is impossible to design systems that will not “explode” in reaction to unpredictable combinations of inputs.

But I think that we can reduce the risks. If we legislate an environment that requires a minimum level of diversity in these systems, from software design through training data, to hardware platform, we can increase the likelihood that when a nonlinear accident happens it will only happen in one of several loosely coupled systems. The question of design is a socio-cultural one that consists of several elements:

  1. How will these systems communicate that guarantees loose coupling?
  2. What is the minimum number of systems that should be permitted, and under what contexts?
  3. What is the maximum “size” of a single system?

By addressing these issues early, in technical and legislative venues, we have an opportunity to create a resilient socio-technical ecosystem, where novel interactions of humans and machines can create new capabilities and opportunities, but also a resilient environment that is fundamentally resistant to catastrophe.

Phil 12.1.18

Trying to think about how intelligence at an individual level is different from intelligence at a population level. At an individual level, the question is how much computation to spend in the presence of imperfect / incomplete information. Does it make sense to be an unquestioning acolyte? Follow fashion? Go your own way? These define a spectrum from the lowest to the highest amount of computation. A population that is evolving over time works with different demands. There is little or no sense of social coordination at a population’s genetic level (though there is coevolution). It seems to me that it is more a question of how to allocate traits in the population in such a way that maximizes how long the genetic pattern that defines the population persists. The whole population needs a level of diversity. Clone populations (aspens, etc.) fail quickly. Gene exchange increases the likelihood of survival, even though it is costly. Similarly, explore/exploit and other social traits may be distributed unevenly so that there are always nomadic individuals that move through the larger ecosystem, producing a diaspora that a population can use to recover from a disaster that decimates the main population centers. Genes probably don’t “care” whether these nomads are outcasts or willing explorers, but the explorers will probably be better equipped to survive and create a new population, perpetuating the “explorer” genes at some level in the larger genome, at least within some time horizon where there is a genetic, adaptive “memory” of catastrophe.

Phil 11.30.18

7:00 – 3:00 ASRC NASA

  • Started Second Person, and learned about GURPS
  • Added a section on navigating belief places and spaces to the dissertation
  • It looks like I’m doing Computational Discourse Analysis, which has more to do with how the words in a discussion shift over time. Requested this chapter through ILL
  • Looking at Cornell Conversational Analysis Toolkit
  • More Grokking today so I don’t lose too much focus on understanding NNs
        • Important numpy rules:
          import numpy as np
          val = np.array([[0.6]])
          row = np.array([[-0.59, 0.75, -0.94,0.34 ]])
          col = np.array([[-0.59], [ 0.75], [-0.94], [ 0.34]])
          print ("{}, {}) = {}".format(val, row,, row)))
          print ("{}, {}) = {}".format(col, val,, val)))
          note the very different results:
[[0.6]], [[-0.59  0.75 -0.94  0.34]]) = [[-0.354  0.45  -0.564  0.204]]
[[-0.59], [ 0.75], [-0.94], [ 0.34]], [[0.6]]) = [[-0.354], [ 0.45 ], [-0.564], [ 0.204]]
        • So here’s the tricky bit that I don’t get yet
          # Multiply the values of the relu'd layer [[0, 0.517, 0, 0]] by the goal-output_layer [.61]
          weight_mat =, layer_1_to_output_delta) # e.g. [[0], [0.31], [0], [0]]
          weights_layer_1_to_output_col_array += alpha * weight_mat # add the scaled deltas in
          # Multiply the streetlights [[1], [0], [1] times the relu2deriv'd input_to_layer_1_delta [[0, 0.45, 0, 0]]
          weight_mat =, input_to_layer_1_delta) # e.g. [[0, 0.45, 0, 0], [0, 0, 0, 0], [0, 0.45, 0, 0]]
          weights_input_to_layer_1_array += alpha * weight_mat # add the scaled deltas in
        • It looks to me that as we work back from the output layer, we multiply our layer’s weights by the manipulated (relu’d, in this case) values of the last layer, and by the derivative in the next layer forward? I know that we are working out how to distribute the adjustment of the weights via something like the chain rule…
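One way to check that chain-rule intuition is a numeric gradient test on a tiny two-layer net (a verification sketch, not the book's code):

```python
import numpy as np

np.random.seed(0)
relu = lambda x: (x > 0) * x
relu2deriv = lambda x: (x > 0).astype(float)

x = np.array([[1., 0., 1.]]); goal = np.array([[1.]])
w1 = np.random.random((3, 4)) - 0.5   # input -> hidden
w2 = np.random.random((4, 1)) - 0.5   # hidden -> output

def loss(w1, w2):
    return float(((np.dot(relu(np.dot(x, w1)), w2) - goal) ** 2).sum())

# backprop, written out as the chain rule
layer_1 = relu(np.dot(x, w1))
delta_2 = 2 * (np.dot(layer_1, w2) - goal)              # dLoss/d(output)
grad_w2 = np.dot(layer_1.T, delta_2)                    # dLoss/dw2
delta_1 = np.dot(delta_2, w2.T) * relu2deriv(np.dot(x, w1))
grad_w1 = np.dot(x.T, delta_1)                          # dLoss/dw1

# numeric check on one weight: nudge it and watch the loss move
eps = 1e-6
w1_bumped = w1.copy(); w1_bumped[0, 1] += eps
numeric = (loss(w1_bumped, w2) - loss(w1, w2)) / eps
```

If the analytic gradient really is the chain rule applied layer by layer, the nudged-loss slope has to agree with it, which is a handy sanity check while debugging the layer class.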