# Phil 12.11.18

7:00 – 4:30 ASRC PhD/NASA

Somehow, this needs to get into a discussion of the trustworthiness of maps

• I realized that we can hand-code these initial dungeons, learn a lot and make this a baseline part of the study. This means that we can compare human and machine data extraction for map making. My initial thoughts as to the sequence are:
• Step 1: Finish running the initial dungeon
• Step 2: researchers determine a set of common questions that would be appropriate for each room. Something like:
• Who is the character?
• Where is the character?
• What is the character doing?
• Why is the character doing this?
• Each answer should also include the section of the text that the reader thinks answers that question. Once this has been worked out on paper, a simple survey website can be built that automates the process and supports data collection at moderate scales.
• Use the answers to populate a “Trajectories” sheet in an Excel file and build a map!
• Step 3: Partially automate the extraction to give users a generated survey that lets them select the most likely answer/text for the who/where/what/why questions. Generate more maps!
• Step 4: Full automation
• Added these thoughts to the analysis section of the google doc
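As a sketch of what the Step 2/3 pipeline might produce, here is a toy version of the answers-to-trajectory step. All of the field names, rooms, and sample records below are hypothetical placeholders, not the real study data:

```python
# Sketch: turn hand-coded who/where/what/why survey answers into an ordered
# room trajectory, as in Step 2. Fields, rooms, and records are placeholders.

def build_trajectory(answers):
    """answers: list of dicts with 'step', 'room', 'who', 'what', 'why',
    and the supporting 'text' span the reader selected."""
    ordered = sorted(answers, key=lambda a: a["step"])  # visit order
    return [(a["room"], a["who"], a["what"]) for a in ordered]

survey = [
    {"step": 2, "room": "armory", "who": "the rogue",
     "what": "searching for traps", "why": "caution", "text": "..."},
    {"step": 1, "room": "entrance", "who": "the party",
     "what": "entering the dungeon", "why": "the quest", "text": "..."},
]

trajectory = build_trajectory(survey)
# trajectory is now the ordered room sequence a map can be drawn from
```

The same record format should work whether the answers come from hand coding (Step 2), a generated survey (Step 3), or full automation (Step 4).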
• The 11th International Natural Language Generation Conference
• The INLG conference is the main international forum for the presentation and discussion of all aspects of Natural Language Generation (NLG), including data-to-text, concept-to-text, text-to-text and vision-to-text approaches. Special topics of interest for the 2018 edition included:
• Generating Text with Affect, Style and Personality,
• Conversational Interfaces, Chatbots and NLG, and
• Data-driven NLG (including the E2E Generation Challenge)
• Back to grokking DNNs
• Still building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
• array formatting issues. Tricky
• I think I’m done enough to start debugging. Tomorrow
• Sprint review

# Phil 12.3.18

7:00 – 6:00 ASRC PhD

• Reading Analyzing Discourse and Text Complexity for Learning and Collaborating, basically to find methods that show important word frequency varying over time.
• Just in searching around, I also found a bunch of potentially useful resources. I’m emphasizing Python at the moment, because that’s the language I’m using at work right now.
• 5agado has a bunch of nice articles on Medium, linked to code. In particular, there’s Conversation Analyzer – An Introduction, with associated code.
• High frequency word entrainment in spoken dialogue
• Cognitive theories of dialogue hold that entrainment, the automatic alignment between dialogue partners at many levels of linguistic representation, is key to facilitating both production and comprehension in dialogue. In this paper we examine novel types of entrainment in two corpora—Switchboard and the Columbia Games corpus. We examine entrainment in use of high-frequency words (the most common words in the corpus), and its association with dialogue naturalness and flow, as well as with task success. Our results show that such entrainment is predictive of the perceived naturalness of dialogues and is significantly correlated with task success; in overall interaction flow, higher degrees of entrainment are associated with more overlaps and fewer interruptions.
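A minimal sketch of the kind of measurement the abstract describes: compare two speakers’ usage rates of the corpus’s top-k words. The negated absolute-difference score here is my stand-in, not the authors’ actual formula:

```python
# Sketch of high-frequency word entrainment: score how closely two speakers'
# usage rates of the corpus's most common words align. The scoring function
# is an assumed stand-in for the paper's measure.
from collections import Counter

def usage_rates(tokens, vocab):
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] / total for w in vocab}

def entrainment(tokens_a, tokens_b, corpus_tokens, k=25):
    top_k = [w for w, _ in Counter(corpus_tokens).most_common(k)]
    ra, rb = usage_rates(tokens_a, top_k), usage_rates(tokens_b, top_k)
    # closer to 0 means more aligned use of the high-frequency words
    return -sum(abs(ra[w] - rb[w]) for w in top_k)

speaker_a = "well i think the the plan is good you know".split()
speaker_b = "i think the plan is you know pretty good".split()
corpus = speaker_a + speaker_b
score_same = entrainment(speaker_a, speaker_a, corpus)
score_diff = entrainment(speaker_a, ["totally", "different", "words"], corpus)
```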
• Looked some more at the Cornell Toolkit, but it seems focused on other conversation attributes, with more lexical analysis coming later
• There is a GitHub topic on discourse-analysis, in which John W. DuBois’ rezonator project looks particularly interesting. Need to ask Wayne about how to reach out to someone like that.
• Recently I’ve been interested in what happens when participants in conversation build off each other, reusing words, structures and other linguistic resources just used by a prior speaker. In dialogic syntax, as I call it, parallelism of structure across utterances foregrounds similarities in function, but also brings out differences. Participants notice even the subtlest contrasts in stance–epistemic, affective, illocutionary, and so on–generated by the resonance between juxtaposed utterances. The theories of dialogic syntax and stance are closely related, and I’m currently working on exploring this linkage–one more example of figuring out how language works on multiple levels simultaneously, uniting structure, meaning, cognition, and social interaction.
• From Computational Propaganda: If You Make It Trend, You Make It True
• As an example, searching for “Vitamin K shot” (a routine health intervention for newborns) returns almost entirely anti-vaccine propaganda; anti-vaccine conspiracists write prolific quantities of content about that keyword, actively selling the myth that the shot is harmful, causes cancer, causes SIDS. Searches for the phrase are sparse because medical authorities are not producing counter-content or fighting the SEO battle in response.
• This is literally a use case where a mapping interface would show that something funny was going on in this belief space
• Yuanyuan’s proposal defense
• Surgical telementoring, where the trainee performing the operation is monitored remotely by an expert.
• These are physical models!
• Manual coding
• Tracks communication intention, not lexical content
• Linear Mixed Model
• Linear mixed models are an extension of simple linear models that allow both fixed and random effects, and are particularly useful when there is non-independence in the data, such as arises from a hierarchical structure. For example, students could be sampled from within classrooms, or patients from within doctors.
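That classroom example can be sketched with statsmodels’ `mixedlm` (assuming statsmodels is available). The data are synthetic, with a fixed slope of 3 for study hours and a random intercept per classroom:

```python
# Sketch of a linear mixed model: students nested in classrooms, a fixed
# effect for hours studied and a random intercept per classroom. Synthetic
# data; statsmodels assumed available.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_class, n_per = 10, 20
classroom = np.repeat(np.arange(n_class), n_per)
class_effect = rng.normal(0, 2.0, n_class)[classroom]  # random intercepts
hours = rng.uniform(0, 10, n_class * n_per)
score = 50 + 3.0 * hours + class_effect + rng.normal(0, 1.0, len(hours))

df = pd.DataFrame({"score": score, "hours": hours, "classroom": classroom})
fit = smf.mixedlm("score ~ hours", df, groups=df["classroom"]).fit()
print(fit.params["hours"])  # should recover a slope near 3.0
```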
• DiCoT: a methodology for applying Distributed Cognition to the design of team working systems <– might be worth looking at for dungeon teams
• Note, a wireless headset mic is nice if there are remote participants and you need to move around the room
• GLIMMPSE power analysis
• Add list of publications to the dissertation?
• Good meeting with Wayne. Brought him up to speed on antibubbles.com. We discussed CHI PLAY 2019 as a good next venue. We also went over what the iConference presentation might be. More as this develops, since it’s not all that clear. Certainly a larger emphasis on video. Also, it will be in the first batch of presentations.

# Phil 11.15.18

ASRC PhD, NASA 7:00 – 5:00

• Incorporate T’s changes – done!
• Topic Modeling with LSA, PLSA, LDA & lda2Vec
• More Grokking. Here’s the work for the day:
```python
# based on https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter5%20-%20Generalizing%20Gradient%20Descent%20-%20Learning%20Multiple%20Weights%20at%20a%20Time.ipynb
import numpy as np
import matplotlib.pyplot as plt

# methods ----------------------------------------------------------------
def neural_network(input, weights):
    out = input @ weights
    return out

def error_gt_epsilon(epsilon: float, error_array: np.array) -> bool:
    for i in range(len(error_array)):
        if error_array[i] > epsilon:
            return True
    return False

# setup vars --------------------------------------------------------------
# inputs
toes_array  = np.array([8.5, 9.5, 9.9, 9.0])
wlrec_array = np.array([0.65, 0.8, 0.8, 0.9])
nfans_array = np.array([1.2, 1.3, 0.5, 1.0])

# output goals
hurt_array      = np.array([0.2, 0.0, 0.0, 0.1])
wl_binary_array = np.array([  1,   1,   0,   1])
sad_array       = np.array([0.3, 0.0, 0.1, 0.2])

weights_array = np.random.rand(3, 3)  # initialize with random weights
'''
# initialized with fixed weights to compare with the book
weights_array = np.array([[0.1, 0.1, -0.3],  # hurt?
                          [0.1, 0.2,  0.0],  # win?
'''
alpha = 0.01  # convergence scalar

# just use the first element from each array for training (for now?)
input_array = np.array([toes_array[0], wlrec_array[0], nfans_array[0]])
# first element of each output goal, matching input_array
goal_array = np.array([hurt_array[0], wl_binary_array[0], sad_array[0]])

line_mat = []   # for drawing plots
epsilon = 0.01  # how close do we have to be before stopping
# create and fill an error array that is big enough to enter the loop
error_array = np.empty(len(input_array))
error_array.fill(epsilon * 2)

# loop counters
iter = 0
max_iter = 100

while error_gt_epsilon(epsilon, error_array):  # if any error in the array is big, keep going

    # the dot product of the (3,) input vector and the (3x3) weight matrix returns a (3,) vector
    pred_array = neural_network(input_array, weights_array)

    # how far away are we linearly (3,)
    delta_array = pred_array - goal_array
    # error is distance squared to keep positive and weight the system toward fixing bigger errors (3,)
    error_array = delta_array ** 2

    # compute how far and in what direction: the (3x3) outer product gives one delta per weight
    weights_d_array = np.outer(input_array, delta_array)

    print("\niteration [{}]\nGoal = {}\nPred = {}\nError = {}\nDelta = {}\nWeight Deltas = \n{}\nWeights: \n{}".format(
        iter, goal_array, pred_array, error_array, delta_array, weights_d_array, weights_array))

    # subtract the scaled (3x3) weight delta array from the weights array
    weights_array -= (alpha * weights_d_array)

    # build the data for the plot
    line_mat.append(np.copy(error_array))
    iter += 1
    if iter > max_iter:
        break

plt.plot(line_mat)
plt.title("error")
plt.legend(("toes", "win/loss", "fans"))
plt.show()
```
• Here’s a chart!
• Continuing Characterizing Online Public Discussions through Patterns of Participant Interactions

# Phil 8.30.18

7:00 – 5:00  ASRC MKT

• Target Blue Sky paper for iSchool/iConference 2019: The chairs are particularly looking for “Blue Sky Ideas” that are open-ended, possibly even “outrageous” or “wacky,” and present new problems, new application domains, or new methodologies that are likely to stimulate significant new research.
• I’m thinking of a paper that works through the ramifications of this diagram as it relates to people and machines. With humans, who are slow to respond across spongy, switched networks, the flocking area is large. With a monolithic, densely connected system, it’s going to be a straight line from nomadic to stampede.
• Length: Up to 4 pages (excluding references)
• Submission deadline: October 1, 2018
• Final versions due: December 14, 2018
• First versions will be submitted using .pdf. Final versions must be submitted in .doc, .docx or LaTeX.
• More good stuff on BBC Business Daily Trolling for Cash
• Anger and animosity is prevalent online, with some people even seeking it out. It’s present on social media of course as well as many online forums. But now outrage has spread to mainstream media outlets and even the advertising industry. So why is it so lucrative? Bonny Brooks, a writer and researcher at Newcastle University explains who is making money from outrage. Neuroscientist Dr Dean Burnett describes what happens to our brains when we see a comment designed to provoke us. And Curtis Silver, a tech writer for KnowTechie and ForbesTech, gives his thoughts on what we need to do to defend ourselves from this onslaught of outrage.
• Exposure to Opposing Views can Increase Political Polarization: Evidence from a Large-Scale Field Experiment on Social Media
• Christopher Bail (Scholar)
• There is mounting concern that social media sites contribute to political polarization by creating “echo chambers” that insulate people from opposing views about current events. We surveyed a large sample of Democrats and Republicans who visit Twitter at least three times each week about a range of social policy issues. One week later, we randomly assigned respondents to a treatment condition in which they were offered financial incentives to follow a Twitter bot for one month that exposed them to messages produced by elected officials, organizations, and other opinion leaders with opposing political ideologies. Respondents were re-surveyed at the end of the month to measure the effect of this treatment, and at regular intervals throughout the study period to monitor treatment compliance. We find that Republicans who followed a liberal Twitter bot became substantially more conservative post-treatment, and Democrats who followed a conservative Twitter bot became slightly more liberal post-treatment. These findings have important implications for the interdisciplinary literature on political polarization as well as the emerging field of computational social science.
• Setup gcloud tools on laptop – done
• Setup Tensorflow on laptop. Gave up on using CUDA 9.1, but got tf doing ‘hello, tensorflow’
• Marcom meeting – 2:00
• Get the concept of behaviors being a more scalable, dependable way of vetting information.
• Eg Watching the DISI of outrage as manifested in trolling
• “Uh. . . . not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug.”: An overview of trolling strategies
• Dr Claire Hardaker (Scholar) (Blog)
• I primarily research aggression, deception, and manipulation in computer-mediated communication (CMC), including phenomena such as flaming, trolling, cyberbullying, and online grooming. I tend to take a forensic linguistic approach, based on a corpus linguistic methodology, but due to the multidisciplinary nature of my research, I also inevitably branch out into areas such as psychology, law, and computer science.
• This paper investigates the phenomenon known as trolling — the behaviour of being deliberately antagonistic or offensive via computer-mediated communication (CMC), typically for amusement’s sake. Having previously started to answer the question, what is trolling? (Hardaker 2010), this paper seeks to answer the next question, how is trolling carried out? To do this, I use software to extract 3,727 examples of user discussions and accusations of trolling from an eighty-six million word Usenet corpus. Initial findings suggest that trolling is perceived to broadly fall across a cline with covert strategies and overt strategies at each pole. I create a working taxonomy of perceived strategies that occur at different points along this cline, and conclude by refining my trolling definition.
• Citing papers
• FireAnt (Filter, Identify, Report, and Export Analysis Toolkit) is a freeware social media and data analysis toolkit with built-in visualization tools including time-series, geo-position (map), and network (graph) plotting.
• Fix marquee – done
• Export to ppt – done!
• include videos – done
• Center title in ppt:
• model considerations – done
• diversity injection – done
• Got the laptop running Python and Tensorflow. Had a stupid problem where I accidentally made a virtual environment and keras wouldn’t work. Removed, re-connected and restarted IntelliJ and everything is working!

# Phil 8.17.18

7:00 – 4:30 ASRC MKT

• Alex Steffen –  how economies must adapt to cope with climate effects, such as areas (Miami) and industries (Fossil Fuels) that are overvalued because of costs that are not being factored in.
• Going to start writing up (some) of my slides for SASO as a set of essays on Phlog to clarify my thinking
• Add cross-referencing to poster – done!
• More on Foundations of Temporal Text Networks – done!
• More on Graph Laplacians, since this is coming up a lot
• Need to spend some time looking into
• Ok, here we go…. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
• Github repo (belongs to lead author, Jiezhong Qiu,)
• Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network’s normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices’ context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks’ Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
• So far, my basic insight is that matrix factorization is a form of (lossy) dimension reduction into an embedding space. Not sure yet how to use the factor matrices as coordinates, though. For example, a 2D matrix would be size L by M. For a 2D embedding, do you create an Lx2 and a 2xM factor matrix? Need to read more.
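A quick check of that shape question with scikit-learn’s NMF (assuming sklearn is available, and toy data): an LxM matrix does factor into an Lxk and a kxM piece, and the rows of W can be read as k-dimensional coordinates for the L items:

```python
# Sketch: non-negative matrix factorization of an L x M matrix into
# L x k and k x M factors, with k = 2 as the embedding dimension.
import numpy as np
from sklearn.decomposition import NMF

L, M, k = 6, 8, 2
rng = np.random.default_rng(1)
X = rng.random((L, M))  # NMF needs non-negative entries

model = NMF(n_components=k, init="random", random_state=1, max_iter=500)
W = model.fit_transform(X)  # L x k: one k-dim coordinate per row item
H = model.components_       # k x M: one k-dim coordinate per column item

print(W.shape, H.shape)  # (6, 2) (2, 8)
```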
• …learning latent representations for networks, a.k.a., network embedding, has been extensively studied in order to automatically discover and map a network’s structural properties into a latent space.
• ZOMG!

# Phil 8.16.18

7:00 – 4:30 ASRC MKT

• R2D3 is an experiment in expressing statistical thinking with interactive design. Find us at @r2d3us
• Foundations of Temporal Text Networks
• Davide Vega (Scholar)
• Matteo Magnani (Scholar)
• Three fundamental elements to understand human information networks are the individuals (actors) in the network, the information they exchange, that is often observable online as text content (emails, social media posts, etc.), and the time when these exchanges happen. An extremely large amount of research has addressed some of these aspects either in isolation or as combinations of two of them. There are also more and more works studying systems where all three elements are present, but typically using ad hoc models and algorithms that cannot be easily transferred to other contexts. To address this heterogeneity, in this article we present a simple, expressive and extensible model for temporal text networks, that we claim can be used as a common ground across different types of networks and analysis tasks, and we show how simple procedures to produce views of the model allow the direct application of analysis methods already developed in other domains, from traditional data mining to multilayer network mining.
• Ok, I’ve been reading the paper and if I understand it correctly, it’s pretty straightforward and also clever. It relates a lot to the way that I do term-document matrices, and then extends the concept to include time, agents, and implicitly anything you want to. To illustrate, here’s a picture of a tensor-as-matrix. The important thing to notice is that there are multiple dimensions represented in a square matrix. We have:
• agents
• documents
• terms
• steps
• This picture in particular is of an undirected adjacency matrix, but I think there are ways to handle in-degree and out-degree, though that’s probably better handled by having one matrix for in-degree and one for out-degree.
• Because it’s a square matrix, we can calculate the steps between any nodes in the matrix, and the centrality, simply by squaring the matrix and keeping track of the steps until the eigenvector settles. We can also weight a node by multiplying that node’s row and column by a scalar. That changes the centrality, but not the connectivity. We can also drop out components (steps, for example) to see how that changes the underlying network properties.
• If we want to see how time affects the development of the network, we can start with all the step nodes set to a zero weight, then add them in sequentially. This means, for example, that clustering could be performed on the nonzero nodes.
• Some or all of the elements could be factorized using NMF, resulting in smaller, faster matrices.
• Network embedding could be useful too. We get distances between nodes. And this looks really important: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
• I think I can use any and all of the above methods on the network tensor I’m describing. This is very close to a mapping solution.
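The squaring/centrality idea above can be sketched on a toy undirected adjacency matrix with plain numpy; the four-node graph here is just an illustration:

```python
# Sketch: powers of an adjacency matrix count walks of that length, and
# repeated multiplication (power iteration) settles on the eigenvector
# centrality. Toy four-node undirected graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

walks2 = A @ A  # entry [i, j] = number of 2-step walks from i to j

v = np.ones(len(A))
for _ in range(100):        # power iteration
    v = A @ v
    v /= np.linalg.norm(v)  # renormalize each step until it settles

print(v)  # node 2, the best-connected node, gets the largest score
```

Zeroing out a node’s row and column before iterating is the drop-out experiment described above.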
• The Shifting Discourse of the European Central Bank: Exploring Structural Space in Semantic Networks (cited by the above paper)
• Convenient access to vast and untapped collections of documents generated by organizations is a valuable resource for research. These documents (e.g., Press releases, reports, speech transcriptions, etc.) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of such large document corpora does not come without challenges. Two of these challenges are 1) the need for appropriate automated methods for text mining and analysis and 2) the redundant and predictable nature of the formalized discourse contained in these collections of texts. Our article proposes an approach that performs well in overcoming these particular challenges for the analysis of documents related to the recent financial crisis. Using semantic network analysis and a combination of structural measures, we provide an approach that proves valuable for a more comprehensive analysis of large and complex semantic networks of formal discourse, such as the one of the European Central Bank (ECB). We find that identifying structural roles in the semantic network using centrality measures jointly reveals important discursive shifts in the goals of the ECB which would not be discovered under traditional text analysis approaches.
• Comparative Document Analysis for Large Text Corpora
• This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains—scientific papers and news—demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
• Social and semantic coevolution in knowledge networks
• Socio-semantic networks involve agents creating and processing information: communities of scientists, software developers, wiki contributors and webloggers are, among others, examples of such knowledge networks. We aim at demonstrating that the dynamics of these communities can be adequately described as the coevolution of a social and a socio-semantic network. More precisely, we will first introduce a theoretical framework based on a social network and a socio-semantic network, i.e. an epistemic network featuring agents, concepts and links between agents and between agents and concepts. Adopting a relevant empirical protocol, we will then describe the joint dynamics of social and socio-semantic structures, at both macroscopic and microscopic scales, emphasizing the remarkable stability of these macroscopic properties in spite of a vivid local, agent-based network dynamics.
• Tensorflow 2.0 feedback request
• Shortly, we will hold a series of public design reviews covering the planned changes. This process will clarify the features that will be part of TensorFlow 2.0, and allow the community to propose changes and voice concerns. Please join developers@tensorflow.org if you would like to see announcements of reviews and updates on process. We hope to gather user feedback on the planned changes once we release a preview version later this year.

# Phil 8.8.18

7:00 – 4:00 ASRC MKT

• Oh, look, a new Tensorflow (1.10). Time to break things. I like the BigTable integration though.
• Learning Meaning in Natural Language Processing — A Discussion
• Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread. Twitter is a great medium for having such a discussion; replying to any comment allows one to revive the debate from the most promising point when it’s stuck in a dead-end. Unfortunately Twitter also makes the discussion very hard to read afterwards, so I made three entry points to explore this fascinating mega-thread:

1. a summary of the discussion that you will find below,
2. an interactive view to explore the trees of tweets, and
3. a commented map to get an overview of the main points discussed:
• The Current Best of Universal Word Embeddings and Sentence Embeddings
• This post is thus a brief primer on the current state-of-the-art in Universal Word and Sentence Embeddings, detailing a few

• strong/fast baselines: FastText, Bag-of-Words
• state-of-the-art models: ELMo, Skip-Thoughts, Quick-Thoughts, InferSent, MILA/MSR’s General Purpose Sentence Representations & Google’s Universal Sentence Encoder.

If you want some background on what happened before 2017 😀, I recommend the nice post on word embeddings that Sebastian wrote last year and his intro posts.

• Treeverse is a browser extension for navigating burgeoning Twitter conversations.
• Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices
• With the development of online data collection and instruments such as Amazon’s Mechanical Turk (MTurk), the appearance of malicious software that generates responses to surveys in order to earn money represents a major issue, for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has been undertaken to test the extent to which they actually detect nonhuman response sets. Thus, we proposed to conduct an empirical comparison of these indices. Assuming that most botnet programs are based on random uniform distributions of responses, we present and compare seven indices in this study to detect nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are calculated easily, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
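The two easily computed indices the abstract singles out can be sketched with plain numpy. The simulated “human” and random responders below are illustrative only, not the paper’s data or exact formulas:

```python
# Sketch: Mahalanobis distance from the sample mean, and person-total
# correlation (each respondent vs. the item means), for flagging random
# response sets. Simulated 10-item questionnaire.
import numpy as np

def mahalanobis_distances(X):
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d = X - mu
    return np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", d, cov_inv, d), 0))

def person_total_correlations(X):
    item_means = X.mean(axis=0)
    return np.array([np.corrcoef(row, item_means)[0, 1] for row in X])

rng = np.random.default_rng(42)
item_base = np.linspace(1, 5, 10)                    # 10-item questionnaire
humans = item_base + rng.normal(0, 0.15, (50, 10))   # trait-driven responders
bot = rng.uniform(1, 5, (1, 10))                     # uniform random responder
X = np.vstack([humans, bot])

corrs = person_total_correlations(X)
dists = mahalanobis_distances(X)
# the random row (last) should stand out on both indices
```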
• Continuing to work on SASO slides – close to done. Got a lot of adversarial herding FB examples from the House Permanent Committee on Intelligence. Need to add them to the slide. Sobering.
• And this looks like a FANTASTIC ride out of Trento: ridewithgps.com/routes/27552411
• Fixed the border menu so that it’s a toggle group

# Phil 7.19.18

7:00 – 3:00 ASRC MKT

• More on augmented athletics: Pinarello Nytro electric road bike review
• WhatsApp Research Awards for Social Science and Misinformation ($50k – Applications are due by August 12, 2018, 11:59pm PST)
• Setting up meeting with Don for 3:30 Tuesday the 24th. He also gave me some nice leads on potential people for Dance my PhD:
• Dr. Linda Dusman
• Linda Dusman’s compositions and sonic art explore the richness of contemporary life, from the personal to the political. Her work has been awarded by the International Alliance for Women in Music, Meet the Composer, the Swiss Women’s Music Forum, the American Composers Forum, the International Electroacoustic Music Festival of Sao Paulo, Brazil, the Ucross Foundation, and the State of Maryland in 2004, 2006, and 2011 (in both the Music: Composition and the Visual Arts: Media categories). In 2009 she was honored as a Mid- Atlantic Arts Foundation Fellow for a residency at the Virginia Center for the Creative Arts. She was invited to serve as composer in residence at the New England Conservatory’s Summer Institute for Contemporary Piano in 2003. In the fall of 2006 Dr. Dusman was a Visiting Professor at the Conservatorio di musica “G. Nicolini” in Piacenza, Italy, and while there also lectured at the Conservatorio di musica “G. Verdi” in Milano. She recently received a Maryland Innovation Initiative grant for her development of Octava, a real-time program note system (octavaonline.com).
• Doug Hamby
• A choreographer who specializes in works created in collaboration with dancers, composers, visual artists and engineers. Before coming to UMBC he performed in several New York dance companies including the Martha Graham Dance Company and Doug Hamby Dance. He is the co-artistic director of Baltimore Dance Project, a professional dance company in residence at UMBC. Hamby’s work has been presented in New York City at Lincoln Center Out-of-Doors, Riverside Dance Festival, New York International Fringe Festival and in Brooklyn’s Prospect Park. His work has also been seen at Fringe Festivals in Philadelphia, Edinburgh, Scotland and Vancouver, British Columbia, as well as in Alaska. He has received choreography awards from the National Endowment for the Arts, Maryland State Arts Council, New York State Council for the Arts, Arts Council of Montgomery County, and the Baltimore Mayor’s Advisory Committee on Arts and Culture. He has appeared on national television as a giant slice of American Cheese.
• Sent out a note with dates and agenda to the committee for the PhD review thing. Thom can open up August 6th
• Continuing extraction of seed terms for the sentence generation. And it looks like my tasking for next sprint will be to put together a nice framework for plugging in predictive patterns systems like LSTM and multi-layer perceptrons.
• This seems to be working:
```
agentRelationships GreenFlockSh_1
sampleData 0.0
cell cell_[4, 6]
influences AGENT
influence GreenFlockSh_0 val =  0.8778825396520958
influence GreenFlockSh_2 val =  0.8859173062045552
influence GreenFlockSh_3 val =  0.9390368569108515
influence GreenFlockSh_4 val =  0.9774328763377834
influences SOURCE
influence UL_point val =  0.032906293611796644
```
• Sprint planning
• VP-613: Develop general TensorFlow/Keras NN format
• LSTM
• MLP
• CNN
• VP-616: SASO Preparation
• Slides
• Poster
• Demo

# Phil 7.1.18

On vacation, but oddly enough, I’m back on my morning schedule, so here I am in Bormio, Italy at 4:30 am.

I forgot my HDMI adaptor for the laptop. Need to order one and have it delivered to Zurich – Hmmm. Can’t seem to get it delivered from Amazon to a hotel. Will have to buy in Zurich

Need to add Gamerfate to the lit review timeline to show where I started to get interested in the problem – tried it but didn’t like it. I’d have to redo the timeline, and I’m not sure I have the Excel file

Add vacation pictures to slides – done!

Some random thoughts

• When using the belief-space example of the table, note that if we sum up all the discussions about tables, we would be able to build a pretty good map of what matters to people with regard to tables
• Manifold learning is what intelligent systems do as a way of determining relationships between things (see curse of dimensionality). As groups of individuals, we need to coordinate our manifold learning activities so that we can use the power of group cognition. When looking at how manifold learning schemes like t-SNE and particularly embedding systems such as word2vec create their own unique embeddings, it becomes clear that our machines are not yet engaged in group cognition, except in the simplest way of re-using trained networks and copied hyperparameters. This is very prone to stampedes
• In conversation at dinner, Mike M mentioned that he’d like a language app that is able to indicate the centrality of a term and order that list, so that it’s possible to learn a language in a “prioritized” way that can be context-dependent. I think that LMN with a few tweaks could do that.
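On the point above about word2vec runs producing their own unique embeddings: a standard way to bring two independently trained embeddings of the same vocabulary into a shared frame is orthogonal Procrustes alignment. A minimal sketch, with synthetic stand-in data (the “second training run” here is just a random rotation of the first plus noise):

```python
import numpy as np

def align_embeddings(A, B):
    """Find the rotation R minimizing ||A @ R - B||_F (orthogonal
    Procrustes), so two independently trained embeddings of the same
    word list can be compared coordinate-by-coordinate.
    A, B: (n_words, dim) arrays, rows in the same vocabulary order."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 8))               # "embedding" from run 1
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
B = A @ Q + 0.01 * rng.normal(size=(100, 8))  # "embedding" from run 2

R = align_embeddings(A, B)
err_before = np.linalg.norm(A - B)
err_after = np.linalg.norm(A @ R - B)
print(err_before, err_after)  # alignment should shrink the disagreement
```

This is the simplest version of “coordinating manifold learning”: the two runs still learned privately, but at least their outputs can be reconciled after the fact.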

Continuing The Evolution of Cooperation. A thing that strikes me is that once TIT FOR TAT successfully takes over, it becomes computationally cheaper to ALWAYS COOPERATE. That strategy could evolve to become dominant and be completely vulnerable to ALWAYS DEFECT.
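The worry above can be made concrete with a toy iterated prisoner’s dilemma, using Axelrod’s standard payoffs (T=5, R=3, P=1, S=0). Once defection disappears, ALWAYS COOPERATE plays identically to TIT FOR TAT but is defenseless when a defector reappears:

```python
# Payoff table: (my_score, their_score) for (my_move, their_move).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):      # cooperate, then mirror them
    return opponent_history[-1] if opponent_history else "C"

def always_cooperate(opponent_history):
    return "C"

def always_defect(opponent_history):
    return "D"

def play(strat_a, strat_b, rounds=20):
    """Run an iterated game, returning each side's total score."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

# Against TIT FOR TAT, defection is punished after the first round;
# against ALWAYS COOPERATE, defection is pure profit.
print(play(always_defect, tit_for_tat))
print(play(always_defect, always_cooperate))
```

Over 20 rounds the defector exploits TIT FOR TAT only once, but exploits ALWAYS COOPERATE every round — which is exactly why a drift from TIT FOR TAT to ALWAYS COOPERATE leaves the population open to invasion.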

# Phil 6.18.18

ASRC MKT 7:00 – 8:00

• Nice ride on Saturday on Skyline Drive
• Using Social Network Information in Bayesian Truth Discovery
• We investigate the problem of truth discovery based on opinions from multiple agents who may be unreliable or biased. We consider the case where agents’ reliabilities or biases are correlated if they belong to the same community, which defines a group of agents with similar opinions regarding a particular event. An agent can belong to different communities for different events, and these communities are unknown a priori. We incorporate knowledge of the agents’ social network in our truth discovery framework and develop Laplace variational inference methods to estimate agents’ reliabilities, communities, and the event states. We also develop a stochastic variational inference method to scale our model to large social networks. Simulations and experiments on real data suggest that when observations are sparse, our proposed methods perform better than several other inference methods, including majority voting, the popular Bayesian Classifier Combination (BCC) method, and the Community BCC method.
• Scale-free correlations in starling flocks
• From bird flocks to fish schools, animal groups often seem to react to environmental perturbations as if of one mind. Most studies in collective animal behavior have aimed to understand how a globally ordered state may emerge from simple behavioral rules. Less effort has been devoted to understanding the origin of collective response, namely the way the group as a whole reacts to its environment. Yet, in the presence of strong predatory pressure on the group, collective response may yield a significant adaptive advantage. Here we suggest that collective response in animal groups may be achieved through scale-free behavioral correlations. By reconstructing the 3D position and velocity of individual birds in large flocks of starlings, we measured to what extent the velocity fluctuations of different birds are correlated to each other. We found that the range of such spatial correlation does not have a constant value, but it scales with the linear size of the flock. This result indicates that behavioral correlations are scale free: The change in the behavioral state of one animal affects and is affected by that of all other animals in the group, no matter how large the group is. Scale-free correlations provide each animal with an effective perception range much larger than the direct inter-individual interaction range, thus enhancing global response to perturbations. Our results suggest that flocks behave as critical systems, poised to respond maximally to environmental perturbations.
• Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study
• By reconstructing the three-dimensional positions of individual birds in airborne flocks of a few thousand members, we show that the interaction does not depend on the metric distance, as most current models and theories assume, but rather on the topological distance. In fact, we discovered that each bird interacts on average with a fixed number of neighbors (six to seven), rather than with all neighbors within a fixed metric distance. We argue that a topological interaction is indispensable to maintain a flock’s cohesion against the large density changes caused by external perturbations, typically predation. …
• Thread on the failure to replicate the Stanford Prison Experiment by Alex Haslam (scholar) (home page). Paper coming soon
• The Stanford Prison Experiment—as it is presented in textbooks—presents human nature as naturally conforming to oppressive systems. This is a lesson that extends well beyond prison systems and the field of criminology—but it’s wrong. Alex and his colleagues (especially Steve Reicher) have been arguing for years that conformity often emerges when leaders cultivate a sense of shared identity. This is an active, engaged process—very different from automatic and mindless conformity.
• Started Irrational Exuberance, by Robert Shiller
• Send note to Don, Aaron and Shimei
• Read Ego-motion in Self-Aware Deep Learning on Medium. It’s about reflective learning of navigation in physical spaces, though I wonder if there is an equivalent process in belief spaces. Looked through scholar and
• Slide prep and Fika walkthrough
• Went well. Ravi suggested adding another slide that discusses the methods in detail, while Sy pretty much demanded that I get rid of “Questions” and put the title of the paper in its place
• When adding the detail for Ravi, I discovered that the simulator and map reconstruction did not handle single, high dimensional agents well, so I spent a few hours fixing bugs to get the screen captures to build the slides.

# Phil 6.11.18

7:00 – 6:00 ASRC MKT

• More Bit by Bit. Reading the section on ethics. It strikes me that simulation could be a way to cut the PII Gordian Knot in some conditions. If a simulation can be developed that generates statistically similar data to the desired population, then the simulated data and the simulation code can be released to the research community. The dataset becomes infinite and adjustable, while the PII data can be held back. Machine learning systems trained on the simulated data can then be evaluated on the confidential data. The differences in the classification by the ML systems between real data and simulated data can also provide insight into the gaps in fidelity of the simulated data, which would provide an ongoing improvement to the simulation, which could in turn be released to the community.
• Continuing with the cleanup of the SASO paper. Mostly done, but some trimming of redundant bits and the “One Simple Trick” paragraph remain.
• Monday priorities:
• Fika
• Come up with 3-5 options for a finished state for the dissertation. It probably ranges from “pure theory” through “instance based on theory” to “a map generated by the system that matches the theory”
• Once the SASO paper is in, set up a “wine and cheese” get together for the committee to go over the current work and discuss changes to the next phase
• Start on a new IRB. Emphasize how everyone will have the same system to interact with, though their interactions will be different. Emphasize that the system has to allow open interaction to provide the best chance to realize theoretical results.
• Will and I are on the hook for a Fika about LaTeX
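The train-on-simulation, evaluate-on-confidential-data loop from the Bit by Bit note above can be sketched end to end. Everything here is synthetic and the “simulator” deliberately gets the population statistics slightly wrong, so the accuracy gap between the two evaluations plays the role of the fidelity signal:

```python
import numpy as np

rng = np.random.default_rng(1)

def population(n, shift):
    """Two-class 2-D Gaussian data; `shift` separates the class means."""
    X0 = rng.normal(0.0, 1.0, size=(n, 2))
    X1 = rng.normal(shift, 1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

# "Real" confidential data stays private; the released simulator tries
# to match its statistics but misses the class separation slightly.
X_real, y_real = population(200, shift=2.0)
X_sim, y_sim = population(200, shift=1.8)

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean per class."""
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(centroids, X, y):
    dists = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=-1)
    return (np.argmin(dists, axis=1) == y).mean()

centroids = fit_centroids(X_sim, y_sim)            # trained on simulation only
acc_on_sim = accuracy(centroids, X_sim, y_sim)
acc_on_real = accuracy(centroids, X_real, y_real)  # scored on held-back data
print(acc_on_sim, acc_on_real)  # the gap is the fidelity signal
```

In the real workflow the classifier would be whatever ML system the community builds, and the gap would feed back into improving the simulator.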

# Phil 6.7.18

7:00 – 4:30 ASRC MKT

• Che Dorval
• The SLT meeting went well, apparently. Need to determine next steps
• Back to Bit by Bit. Reading about mass collaboration. eBird looks very interesting. All kinds of social systems involved here.
• Research
• Deep Multi-Species Embedding
• Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.
• Start fixing “This One Simple Trick”
• Highlighted all the specified changes. There are a lot of them!
• Started working on figure 2, and realized (after about an hour of Illustrator work) that the figure is correct. I need to verify each comment before fixing it!
• Researched NN anomaly detection. That work seems to have had its heyday in the ’90s, with more conventional (but computationally intensive) methods being preferred these days.
• I also thought that Dr. Li’s model had a time-orthogonal component for prediction, but I don’t think that’s true. The NN is finding the frequency and bounds on its own.
• Wrote up a paragraph expressing my concerns and sent to Aaron.

# Phil 6.5.18

7:00 – 6:00 ASRC

• Read the SASO comments. My reviewer #2 was #3 this time. There is some rework that’s needed, but most of the comments are good, even the angry ones from #3, which are mostly “where is particle swarm optimization???”
• Got an example quad chart from Helena that I’m going to base mine on
• Neat thing from Brian F:
• Lots. Of. White. Paper.

# Phil 6.1.18

7:00 – 6:00 ASRC MKT

• Bot stampede reaction to “evolution” in a thread about UNIX. In this case, the bots are posting sentiment against the wrong thing. There are layers here, though. It can also be advertising. Sort of the dark side of diversity injection.
• Seems like an explore/exploit morning
• Autism on “The Leap”: Neurotypical and Neurodivergent (Neurodiversity)
• From a BBC Business Daily show on Elon Musk
• Thomas Astebro (Decision Science): The return to independent invention: evidence of unrealistic optimism, risk seeking or skewness loving?
• Examining a sample of 1,091 inventions I investigate the magnitude and distribution of the pre‐tax internal rate of return (IRR) to inventive activity. The average IRR on a portfolio investment in these inventions is 11.4%. This is higher than the risk‐free rate but lower than the long‐run return on high‐risk securities and the long‐run return on early‐stage venture capital funds. The portfolio IRR is significantly higher for some ex ante identifiable classes of inventions. The distribution of return is skew: only between 7‐9% reach the market. Of the 75 inventions that did, six realised returns above 1400%, 60% obtained negative returns and the median was negative.
• Myth of first mover advantage
• Conventional wisdom would have us believe that it is always beneficial to be first – first in, first to market, first in class. The popular business literature is full of support for being first and legions of would-be business leaders, steeped in the Jack Welch school of business strategy, will argue this to be the case. The advantages accorded to those who are first to market defines the concept of First Mover Advantage (FMA). We outline why this is not the case, and in fact, that there are conditions of applicability in order for FMA to hold (and these conditions often do not hold). We also show that while there can be advantages to being first, from an economic perspective, the costs generally exceed the benefits, and the full economics of FMA are usually a losing proposition. Finally, we show that increasingly, we live in a world where FMA is eclipsed by innovation and format change, rendering the FMA concept obsolete (i.e. strategic obsolescence).
• More Bit by Bit
• Investigating the Effects of Google’s Search Engine Result Page in Evaluating the Credibility of Online News Sources
• Recent research has suggested that young users are not particularly skilled in assessing the credibility of online content. A follow up study comparing students to fact checkers noticed that students spend too much time on the page itself, while fact checkers performed “lateral reading”, searching other sources. We have taken this line of research one step further and designed a study in which participants were instructed to do lateral reading for credibility assessment by inspecting Google’s search engine result page (SERP) of unfamiliar news sources. In this paper, we summarize findings from interviews with 30 participants. A component of the SERP noticed regularly by the participants is the so-called Knowledge Panel, which provides contextual information about the news source being searched. While this is expected, there are other parts of the SERP that participants use to assess the credibility of the source, for example, the freshness of top stories, the panel of recent tweets, or a verified Twitter account. Given the importance attached to the presence of the Knowledge Panel, we discuss how variability in its content affected participants’ opinions. Additionally, we perform data collection of the SERP page for a large number of online news sources and compare them. Our results indicate that there are widespread inconsistencies in the coverage and quality of information included in Knowledge Panels.
• White paper
• Note that belief maps are cultural artifacts, so comparing someone from one belief space to others in a shared physical belief environment can be roughly equivalent to taking the dot product of the belief space vectors that you need to compare. This could produce a global “alignment map” that can suggest how aligned, opposed, or indifferent a population might be with respect to an intervention, ranging from medical (Ebola teams) to military (special forces operations).
• Similar maps related to wealth in Rwanda based on phone metadata: Blumenstock, Joshua E., Gabriel Cadamuro, and Robert On. 2015. “Predicting Poverty and Wealth from Mobile Phone Metadata.” Science 350 (6264): 1073–6. https://doi.org/10.1126/science.aac4420
• Added a section about how mapping belief maps would afford prediction about local belief, since overall state, orientation and velocity could be found for some individuals who are geolocated to that area and then extrapolated over the region.
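The “alignment map” idea in the white paper notes above reduces to cosine similarity between belief vectors. A minimal sketch with made-up illustration vectors (the three population members and the intervention vector are pure assumptions, just to show the scoring):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: +1 aligned, ~0 indifferent, -1 opposed."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical belief vector for an intervention (e.g. an Ebola-team
# message), in some shared belief coordinate system.
intervention = np.array([1.0, 0.5, 0.0])

# Hypothetical belief vectors for individuals in the target population.
population = {
    "aligned":     np.array([0.9, 0.6, 0.1]),
    "indifferent": np.array([0.0, 0.1, 1.0]),
    "opposed":     np.array([-1.0, -0.4, 0.0]),
}

alignment_map = {name: cosine(v, intervention)
                 for name, v in population.items()}
for name, score in alignment_map.items():
    print(f"{name}: {score:+.2f}")
```

Aggregating these scores over a geolocated population is what would turn individual comparisons into the global alignment map, with the extrapolation-over-region step layered on top.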

# Phil 5.31.18

7:00 – ASRC MKT

• Via BBC Business Daily, found this interesting post on diversity injection through lunch table size:
• KQED is playing America Abroad – today on Russian disinfo ops:
• Sowing Chaos: Russia’s Disinformation Wars
• Revelations of Russian meddling in the 2016 US presidential election were a shock to Americans. But it wasn’t quite as surprising to people in former Soviet states and the EU. For years they’ve been exposed to Russian disinformation and slanted state media; before that Soviet propaganda filtered into the mainstream. We don’t know how effective Russian information warfare was in swaying the US election. But we do know these tactics have roots going back decades and will most likely be used for years to come. This hour, we’ll hear stories of Russian disinformation and attempts to sow chaos in Europe and the United States. We’ll learn how Russia uses its state-run media to give a platform to conspiracy theorists and how it invites viewers to doubt the accuracy of other news outlets. And we’ll look at the evolution of internet trolling from individuals to large troll farms. And — finally — what can be done to counter all this?
• Some interesting papers on the “Naming Game”, a form of coordination where individuals have to agree on a name for something. This means that there is some kind of dimension reduction involved from all the naming possibilities to the agreed-on name.
• The Grounded Colour Naming Game
• Colour naming games are idealised communicative interactions within a population of artificial agents in which a speaker uses a single colour term to draw the attention of a hearer to a particular object in a shared context. Through a series of such games, a colour lexicon can be developed that is sufficiently shared to allow for successful communication, even when the agents start out without any predefined categories. In previous models of colour naming games, the shared context was typically artificially generated from a set of colour stimuli and both agents in the interaction perceive this environment in an identical way. In this paper, we investigate the dynamics of the colour naming game in a robotic setup in which humanoid robots perceive a set of colourful objects from their own perspective. We compare the resulting colour ontologies to those found in human languages and show how these ontologies reflect the environment in which they were developed.
• Group-size Regulation in Self-Organised Aggregation through the Naming Game
• In this paper, we study the interaction effect between the naming game and one of the simplest, yet most important collective behaviour studied in swarm robotics: self-organised aggregation. This collective behaviour can be seen as the building blocks for many others, as it is required in order to gather robots, unable to sense their global position, at a single location. Achieving this collective behaviour is particularly challenging, especially in environments without landmarks. Here, we augment a classical aggregation algorithm with a naming game model. Experiments reveal that this combination extends the capabilities of the naming game as well as of aggregation: It allows the emergence of more than one word, and allows aggregation to form a controllable number of groups. These results are very promising in the context of collective exploration, as it allows robots to divide the environment in different portions and at the same time give a name to each portion, which can be used for more advanced subsequent collective behaviours.
• More Bit by Bit. Could use some worked examples. Also a login so I’m not nagged to buy a book I own.
• Descriptive and injunctive norms – The transsituational influence of social norms.
• Three studies examined the behavioral implications of a conceptual distinction between 2 types of social norms: descriptive norms, which specify what is typically done in a given setting, and injunctive norms, which specify what is typically approved in society. Using the social norm against littering, injunctive norm salience procedures were more robust in their behavioral impact across situations than were descriptive norm salience procedures. Focusing Ss on the injunctive norm suppressed littering regardless of whether the environment was clean or littered (Study 1) and regardless of whether the environment in which Ss could litter was the same as or different from that in which the norm was evoked (Studies 2 and 3). The impact of focusing Ss on the descriptive norm was much less general. Conceptual implications for a focus theory of normative conduct are discussed along with practical implications for increasing socially desirable behavior.
• Construct validity centers around the match between the data and the theoretical constructs. As discussed in chapter 2, constructs are abstract concepts that social scientists reason about. Unfortunately, these abstract concepts don’t always have clear definitions and measurements.
• Simulation is a way of implementing theoretical constructs that are measurable and testable.
• Hyperparameter Optimization with Keras
• Recognizing images from parts Kaggle winner
• White paper
• Storyboard meeting
• The advanced analytics division(?) needs a modeling and simulation department that builds models that feed ML systems.
• Meeting with Steve Specht – adding geospatial to white paper
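Going back to the Naming Game papers above: the minimal version of the game is small enough to sketch directly. This is my own stripped-down reading of the standard model, not the grounded or aggregation variants from the papers: on a failed exchange the hearer adopts the speaker’s word; on a success both prune their vocabularies to the winning word, and the population’s many candidate names collapse toward one — the dimension reduction noted above.

```python
import random

random.seed(0)

def naming_game(n_agents=20, n_rounds=2000):
    """Minimal naming game over a single object.
    Returns each agent's final vocabulary (a set of words)."""
    vocab = [set() for _ in range(n_agents)]
    next_word = 0
    for _ in range(n_rounds):
        s, h = random.sample(range(n_agents), 2)   # speaker, hearer
        if not vocab[s]:                           # speaker invents a word
            vocab[s].add(f"w{next_word}")
            next_word += 1
        word = random.choice(sorted(vocab[s]))
        if word in vocab[h]:                       # success: both align
            vocab[s] = {word}
            vocab[h] = {word}
        else:                                      # failure: hearer learns it
            vocab[h].add(word)
    return vocab

final = naming_game()
distinct = set().union(*final)
print(distinct)  # typically collapses toward a single shared word
```

The group-size-regulation paper above is essentially this dynamic coupled to spatial aggregation, so that separate clusters can stabilize on separate words.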