Category Archives: Mapping

Phil 2.12.20

7:00 – 8:00pm ASRC PhD, GOES

  • Create figures that show an agent version of the dungeon
  • Replicate the methods and detailed methods of the cartography slides
  • Text for each group, by room, can be compared by the rank difference between each group and the overall. Put that in a spreadsheet, plot it, and maybe compute the DTW value?
    • Add the sim version of the dungeon and the rank comparison to the dissertation
  • Put all ethics on one slide – done
  • Swapped out power supply, but now the box won’t start. Dropped off to get repaired
  • Corporate happy hour
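The rank-difference/DTW comparison above could look something like this. A minimal sketch; the rank_of/rank_difference/dtw names and the example sequences are mine, not from the slides:

```python
# Rank terms by frequency per group, compare each group's ranking to the
# overall ranking, then run plain DTW over per-room rank-difference series.
from collections import Counter

def rank_of(terms):
    """Map each term to its frequency rank (0 = most frequent)."""
    counts = Counter(terms)
    return {t: i for i, (t, _) in enumerate(counts.most_common())}

def rank_difference(group_terms, overall_terms):
    """Mean absolute rank difference between a group and the overall corpus."""
    g, o = rank_of(group_terms), rank_of(overall_terms)
    shared = set(g) & set(o)
    return sum(abs(g[t] - o[t]) for t in shared) / max(len(shared), 1)

def dtw(a, b):
    """Plain O(n*m) dynamic time warping distance between two sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Each group would produce one rank-difference value per room; DTW then gives a single distance between two groups' room-by-room series.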

Phil 1.17.20

An ant colony has memories that its individual members don’t have

  • Like a brain, an ant colony operates without central control. Each is a set of interacting individuals, either neurons or ants, using simple chemical interactions that in the aggregate generate their behaviour. People use their brains to remember. Can ant colonies do that? 

7:00 – ASRC

  •  Dissertation
    • More edits
    • Changed all the overviews so that they also reference the section by name. It reads better now, I think
    • Meeting with Thom
  • GPT-2 Agents
  • GSAW Slide deck

Phil 12.26.19

ASRC PhD 7:00 – 4:00

  • Dissertation
    • Limitations
  • GPT-2 agents setup – set up the project, but in the process of getting the huggingface transformers, I wound up setting up that project as well
    • Following directions for
      • pip install transformers
      • git clone https://github.com/huggingface/transformers
        • cd transformers
        • pip install .
      • pip install -e .[testing]
        • make test – oops. My GNU Make wasn’t on the path – fixed it
        • running tests
          • Some passed, some failed. Errors like: tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model Fatal Python error: Aborted
          • Sure is keeping the processor busy… Like bringing the machine to its knees busy….
          • Finished – 14 failed, 10 passed, 196 skipped, 20 warnings in 1925.12s (0:32:05)
  • Fixed the coffee maker
  • Dealt with stupid credit card nonsense

Phil 12.7.19

You can now have an AI DM. AI Dungeon 2. Here’s an article about it: You can do nearly anything you want in this incredible AI-powered game. It looks like a GPT-2 model fine-tuned on choose-your-own-adventure stories. Here’s the “how we did it”. Wow

The Toxins We Carry (Whitney Phillips)

  • My proposal is that we begin thinking ecologically, an approach I explore with Ryan Milner, a communication scholar, in our forthcoming book You Are Here: A Field Guide for Navigating Polluted Information. From an ecological perspective, Wardle’s term “information pollution” makes perfect sense. Building on Wardle’s definition, we use the inverted form “polluted information” to emphasize the state of being polluted and to underscore connections between online and offline toxicity. One of the most important of these connections is just how little motives matter to outcomes. Online and off, pollution still spreads, and still has consequences downstream, whether it’s introduced to the environment willfully, carelessly, or as the result of sincere efforts to help. The impact of industrial-scale polluters online—the bigots, abusers, and chaos agents, along with the social platforms that enable them—should not be minimized. But less obvious suspects can do just as much damage. The truth is one of them.
  • Taking an ecological approach to misinformation

Phil 10.22.19

7:00 – 4:00 ASRC

  • Dissertation – starting the maps section
  • Need to finish the financial OODA loop section
  • Spending the day at a Navy-sponsored miniconference on AI, ethics and the military (no wifi at Annapolis, so I’ll put up notes later). This was an odd mix of higher-level execs in suits, retirees, and midshipmen, with a few technical folks sprinkled in. It is clear that for these people, the technology(?) is viewed as AI/ml. The idea that AI is a thing that we don’t do yet does not emerge at this level. Rather, AI is being implemented using machine learning, and in particular deep learning.

Phil 9.1.19

Still tired from yesterday’s adventure:

Strava

Thinking about @scottbot’s thread on TalkToATransformer.com. In a way, the GPT-2 and other models are an arrangement of all the text that the system has incorporated. I think I’d like to go through the dictionary, one word at a time, and use each word as a seed for building a low-dimensional threaded term model that we can visually examine, in the same way that I used text from Tymora to show the “belief space” around each room. For example, here are some responses to silence:

  • Lecture, Thursday 17 June 2014, 10:30am
    In the course of our study we will observe the behaviour of animals at two distinct locations – one to an animal’s immediate (neonatal) environment, the other to two distinct environments in its home territory – the home environment in which the animal lives. We will also observe the behaviour of animals in an adjacent environment to each animal we observe in one of these new environments; one for the social or ‘prey’ species and the other for the ‘host’ species.
  • In addition to a few other notable games, like the two I mentioned earlier, the team was able to pick up several top players to form one of the premier teams in the region of Korea. At this time, the team was ranked 11th in the league of legends scene.
  • Sensing my heart beating at a frantic pace, I raised my head to look at the place where the girl’s mouth had been. A large, blue, round hole had appeared there.
  •  キャラクシー 土方の親愛神・レイラン Aamir AB Black Bird Bomb AB Bluebird Blues AB Bomb AB Chuck AB Dashing Wingman AB Green Bird Hal AB Hal AB Matilda AB Red AB Red Bird Red AB The Blues AB White Bird Matilda AB Wingman AB Yellow Bird Chuck ABE Archer Blues ABE Capt’n Bomb ABE Cleric Matilda ABE Fencer Red ABE Guard Captain Pig ABE Knight Red ABE Mage Chuck ABE Pirate Bomb ABE Priest Matilda ABE Royal Guard Pig ABE Trickster Blues ABE Wizard Chuck Abiding Sniper Dragonbound, Myr Absolute Zero Hrungnir Abyssal CyberDragon, Sadalmelik Abyssal 
  • is about what you do, not what you say. And the answer is: it’s not something you should say. In fact, I want you to make an effort to talk to others.
  • I was so excited for my wedding day, and I had to be there. My mom had an idea to hold my wedding in a different room and I made it happen. I had my brother, sister, bridesmaid, and a couple of guys wait in the front. It was so beautiful and relaxing. Thank you, thank you, THANK YOU!

My intuition is that these word sequences can be treated like sequential threads, since they are sentences/paragraphs and have an implicit sense of time. And, like the room conversations in the temple, there will be denser and sparser areas on the network/landscape that we might be able to map. If we can build these small pieces, we can then experiment with ways to assemble the larger quilt.
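A minimal sketch of that threading idea, assuming we just split responses into terms and accumulate directed transition counts; dense (high-weight) edges would be the denser areas to map. All names and the sample threads here are invented:

```python
# Treat each generated response as a sequential thread of terms and build
# a weighted, directed term-transition network across all threads.
from collections import defaultdict

def build_transition_network(threads):
    """Accumulate directed term->term edge weights across threads."""
    edges = defaultdict(int)
    for thread in threads:
        terms = thread.lower().split()
        for a, b in zip(terms, terms[1:]):
            edges[(a, b)] += 1
    return edges

threads = [
    "silence in the empty room",
    "silence in the dark hall",
]
net = build_transition_network(threads)
```

Here ("silence", "in") and ("in", "the") each get weight 2 because both threads share that opening, while the endings stay sparse; that is exactly the dense-vs-sparse structure the map would show.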

Just to reinforce this idea, I thought about using more specific terms or phrases. Here’s what we get with bird sanctuary. It’s a much more constrained landscape:

  • is open only 24 hours a day and is open on the following holidays:
  • Tower of the Winds – Cave of Wonders – Rune Isle
  • The idea of an animal sanctuary for a big-cat sanctuary is one of the most amazing things that a lot of people will ever come up with that they can’t see in the current environment of wildlife protection. 
  • an annual four-day event that promotes conservation efforts.
  • (2) Pescado Bay Nature Preserve (2) Pacific Coast Aquarium (11) Pacific Grove (1) Pacifica Harbor (1) Philadelphia Zoo (1) Philadelphia Museum of Art (1) Philadelphia World’s Fair (2) Piebald Beach (1) Pinnacle Beach (1) Placid Bay (1) Point Park and Wildlife Management area

Based on David Massad’s tweet, I think the phrases to use are news headlines, which can be compared to some sort of ground truth contained in the story.


Phil 12.20.18

7:00 – 4:00 ASRC NASA/PhD

  • Goal-directed navigation based on path integration and decoding of grid cells in an artificial neural network
    • As neuroscience gradually uncovers how the brain represents and computes with high-level spatial information, the endeavor of constructing biologically-inspired robot controllers using these spatial representations has become viable. Grid cells are particularly interesting in this regard, as they are thought to provide a general coordinate system of space. Artificial neural network models of grid cells show the ability to perform path integration, but important for a robot is also the ability to calculate the direction from the current location, as indicated by the path integrator, to a remembered goal. This paper presents a neural system that integrates networks of path integrating grid cells with a grid cell decoding mechanism. The decoding mechanism detects differences between multi-scale grid cell representations of the present location and the goal, in order to calculate a goal-direction signal for the robot. The model successfully guides a simulated agent to its goal, showing promise for implementing the system on a real robot in the future.
  • Path integration and the neural basis of the ‘cognitive map’
    • Accumulating evidence indicates that the foundation of mammalian spatial orientation and learning is based on an internal network that can keep track of relative position and orientation (from an arbitrary starting point) on the basis of integration of self-motion cues derived from locomotion, vestibular activation and optic flow (path integration).
    • Place cells in the hippocampal formation exhibit elevated activity at discrete spots in a given environment, and this spatial representation is determined primarily on the basis of which cells were active at the starting point and how far and in what direction the animal has moved since then. Environmental features become associatively bound to this intrinsic spatial framework and can serve to correct for cumulative error in the path integration process.
    • Theoretical studies suggested that a path integration system could involve cooperative interactions (attractor dynamics) among a population of place coding neurons, the synaptic coupling of which defines a two-dimensional attractor map. These cells would communicate with an additional group of neurons, the activity of which depends on the conjunction of movement speed, location and orientation (head direction) information, allowing position on the attractor map to be updated by self-motion information.
    • The attractor map hypothesis contains an inherent boundary problem: what happens when the animal’s movements carry it beyond the boundary of the map? One solution to this problem is to make the boundaries of the map periodic by coupling neurons at each edge to those on the opposite edge, resulting in a toroidal synaptic matrix. This solution predicts that, in a sufficiently large space, place cells would exhibit a regularly spaced grid of place fields, something that has never been observed in the hippocampus proper.
    • Recent discoveries in layer II of the medial entorhinal cortex (MEC), the main source of hippocampal afferents, indicate that these cells do have regularly spaced place fields (grid cells). In addition, cells in the deeper layers of this structure exhibit grid fields that are conjunctive for head orientation and movement speed. Pure head direction neurons are also found there. Therefore, all of the components of previous theoretical models for path integration appear in the MEC, suggesting that this network is the core of the path integration system.
    • The scale of MEC spatial firing grids increases systematically from the dorsal to the ventral poles of this structure, in much the same way as is observed for hippocampal place cells, and we show how non-periodic hippocampal place fields could arise from the combination of inputs from entorhinal grid cells, if the inputs cover a range of spatial scales rather than a single scale. This phenomenon, in the spatial domain, is analogous to the low frequency ‘beats’ heard when two pure tones of slightly different frequencies are combined.
    • The problem of how a two-dimensional synaptic matrix with periodic boundary conditions, postulated to underlie grid cell behaviour, could be self-organized in early development is addressed. Based on principles derived from Alan Turing’s theory of spontaneous symmetry breaking in chemical systems, we suggest that topographically organized, grid-like patterns of neural activity might be present in the immature cortex, and that these activity patterns guide the development of the proposed periodic synaptic matrix through a mechanism involving competitive synaptic plasticity.
  • Wormholes in virtual space: From cognitive maps to cognitive graphs
    • Cognitive maps are thought to have a metric Euclidean geometry.
    • Participants learned a non-Euclidean virtual environment with two ‘wormholes’.
    • Shortcuts reveal that spatial knowledge violates metric geometry.
    • Participants were completely unaware of the wormholes and geometric inconsistencies.
    • Results contradict a metric Euclidean map, but support a labelled ‘cognitive graph’.
  • Back to TimeSeriesML
    • Encryption class – done
      • Create a key and save it to file
      • Read a key in from a file into global variable
      • Encrypt a string if there is a key
      • Decrypt a string if there is a key
    • Postgres class – reading part is done
      • Open a global connection and cursor based on a config string
      • Run queries and return success
      • Fetch results of queries as lists of JSON objects
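The reading side of the Postgres class could be sketched like this, with stdlib sqlite3 standing in for a psycopg2 connection so the shape is runnable anywhere; the class and method names are my guesses, not the actual TimeSeriesML code:

```python
# Sketch of the "reading part": open a connection/cursor from a config
# string, run queries returning success, fetch results as JSON objects.
import json
import sqlite3

class DbReader:
    def __init__(self, config_str: str):
        # assumption: the config string is just a connection target here
        self.conn = sqlite3.connect(config_str)
        self.conn.row_factory = sqlite3.Row  # rows become name-addressable
        self.cursor = self.conn.cursor()

    def run_query(self, sql: str) -> bool:
        """Run a query and return success."""
        try:
            self.cursor.execute(sql)
            return True
        except sqlite3.Error:
            return False

    def fetch_as_json(self):
        """Fetch results of the last query as a list of JSON objects."""
        return [json.dumps(dict(row)) for row in self.cursor.fetchall()]
```

The real class would build the connection from the (decrypted) config string via psycopg2; everything downstream of the cursor stays the same.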

Phil 12.11.18

7:00 – 4:30 ASRC PhD/NASA

(image: Mercator projection)

Somehow, this needs to get into a discussion of the trustworthiness of maps

  • I realized that we can hand-code these initial dungeons, learn a lot and make this a baseline part of the study. This means that we can compare human and machine data extraction for map making. My initial thoughts as to the sequence are:
    • Step 1: Finish running the initial dungeon
    • Step 2: researchers determine a set of common questions that would be appropriate for each room. Something like:
      • Who is the character?
      • Where is the character?
      • What is the character doing?
      • Why is the character doing this?
    • Each answer should also include a section of the text that the reader thinks answers that question. Once this has been worked out on paper, a simple survey website can be built that automates this process and supports data collection at moderate scales.
    • Use answers to populate a “Trajectories” sheet in an xml file and build a map!
    • Step 3: Partially automate the extraction to give users a generated survey that lets them select the most likely answer/text for the who/where/what/why questions. Generate more maps!
    • Step 4: Full automation
  • Added these thoughts to the analysis section of the google doc
  • The 11th International Natural Language Generation Conference
    • The INLG conference is the main international forum for the presentation and discussion of all aspects of Natural Language Generation (NLG), including data-to-text, concept-to-text, text-to-text and vision to-text approaches. Special topics of interest for the 2018 edition included:
      • Generating Text with Affect, Style and Personality,
      • Conversational Interfaces, Chatbots and NLG, and
      • Data-driven NLG (including the E2E Generation Challenge)
  • Back to grokking DNNs
    • Still building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
    • array formatting issues. Tricky
    • I think I’m done enough to start debugging. Tomorrow
  • Sprint review
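The “Trajectories” bookkeeping from Step 2 could be sketched like this, with CSV standing in for the spreadsheet; the helper name and the room/answer fields are illustrative, not the real pipeline:

```python
# Collect who/where/what/why answers (plus the supporting text the reader
# selected) per room and emit rows a map-building step could consume.
import csv
import io

QUESTIONS = ["who", "where", "what", "why"]

def trajectories_csv(answers_by_room):
    """answers_by_room: {room: {question: (answer, supporting_text)}}"""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["room"] + QUESTIONS + [q + "_text" for q in QUESTIONS])
    for room, answers in answers_by_room.items():
        row = [room]
        row += [answers[q][0] for q in QUESTIONS]
        row += [answers[q][1] for q in QUESTIONS]
        writer.writerow(row)
    return out.getvalue()
```

One row per room-visit is the trajectory; ordering rows by visit sequence gives the path the map is built from.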

Phil 12.3.18

7:00 – 6:00 ASRC PhD

  • Reading Analyzing Discourse and Text Complexity for Learning and Collaborating, basically to find methods that show important word frequency varying over time.
  • Just in searching around, I also found a bunch of potentially useful resources. I’m emphasizing Python at the moment, because that’s the language I’m using at work right now.
    • 5agado has a bunch of nice articles on Medium, linked to code. In particular, there’s Conversation Analyzer – An Introduction, with associated code.
    • High frequency word entrainment in spoken dialogue
      • Cognitive theories of dialogue hold that entrainment, the automatic alignment between dialogue partners at many levels of linguistic representation, is key to facilitating both production and comprehension in dialogue. In this paper we examine novel types of entrainment in two corpora—Switchboard and the Columbia Games corpus. We examine entrainment in use of high-frequency words (the most common words in the corpus), and its association with dialogue naturalness and flow, as well as with task success. Our results show that such entrainment is predictive of the perceived naturalness of dialogues and is significantly correlated with task success; in overall interaction flow, higher degrees of entrainment are associated with more overlaps and fewer interruptions.
    • Looked some more at the Cornell toolkit, but it seems focused on other conversation attributes, with more lexical analysis coming later
    • There is a github topic on discourse-analysis, of which John W. DuBois’ rezonator project looks particularly interesting. Need to ask Wayne about how to reach out to someone like that.
      • Recently I’ve been interested in what happens when participants in conversation build off each other, reusing words, structures and other linguistic resources just used by a prior speaker. In dialogic syntax, as I call it, parallelism of structure across utterances foregrounds similarities in function, but also brings out differences. Participants notice even the subtlest contrasts in stance–epistemic, affective, illocutionary, and so on–generated by the resonance between juxtaposed utterances. The theories of dialogic syntax and stance are closely related, and I’m currently working on exploring this linkage–one more example of figuring out how language works on multiple levels simultaneously, uniting structure, meaning, cognition, and social interaction.
  • From Computational Propaganda: If You Make It Trend, You Make It True
    • As an example, searching for “Vitamin K shot” (a routine health intervention for newborns) returns almost entirely anti-vaccine propaganda; anti-vaccine conspiracists write prolific quantities of content about that keyword, actively selling the myth that the shot is harmful, causes cancer, causes SIDS. Searches for the phrase are sparse because medical authorities are not producing counter-content or fighting the SEO battle in response.
    • This is literally a use case where a mapping interface would show that something funny was going on in this belief space
  • Yuanyuan’s proposal defense
    • Surgical telementoring, trainee performing the operation is monitored remotely by expert.
    • These are physical models!
    • Manual coding
    • Tracks communication intention, not lexical content
    • Linear Mixed Model
      • Linear mixed models are an extension of simple linear models to allow both fixed and random effects, and are particularly used when there is non independence in the data, such as arises from a hierarchical structure. For example, students could be sampled from within classrooms, or patients from within doctors.
    • DiCoT: a methodology for applying Distributed Cognition to the design of team working systems <– might be worth looking at for dungeon teams
    • Note, a wireless headset mic is nice if there are remote participants and you need to move around the room
    • GLIMMPSE power analysis
  • Add list of publications to the dissertation?
  • Good meeting with Wayne. Brought him up to speed on antibubbles.com. We discussed chiplay 2019 as a good next venue. We also went over what the iConference presentation might be. More as this develops, since it’s not all that clear. Certainly a larger emphasis on video. Also, it will be in the first batch of presentations.
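The high-frequency-word entrainment measure from the Switchboard/Columbia Games paper, reduced to its simplest shape. This is my approximation of the idea, not the authors’ exact formula:

```python
# Take the corpus's most frequent words, compute each speaker's usage
# proportion of those words, and score entrainment as the negated
# absolute difference (0.0 = perfectly entrained, more negative = less).
from collections import Counter

def top_words(corpus_tokens, k=25):
    return {w for w, _ in Counter(corpus_tokens).most_common(k)}

def usage_proportion(speaker_tokens, frequent):
    if not speaker_tokens:
        return 0.0
    return sum(1 for t in speaker_tokens if t in frequent) / len(speaker_tokens)

def entrainment(speaker_a, speaker_b, corpus_tokens, k=25):
    frequent = top_words(corpus_tokens, k)
    return -abs(usage_proportion(speaker_a, frequent)
                - usage_proportion(speaker_b, frequent))
```

Run per conversation, this is the kind of score the paper correlates with perceived naturalness and task success.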

Phil 11.15.18

ASRC PhD, NASA 7:00 – 5:00

  • Incorporate T’s changes – done!
  • Topic Modeling with LSA, PLSA, LDA & lda2Vec
    • This article is a comprehensive overview of Topic Modeling and its associated techniques.
  • More Grokking. Here’s the work for the day:
    # based on https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter5%20-%20Generalizing%20Gradient%20Descent%20-%20Learning%20Multiple%20Weights%20at%20a%20Time.ipynb
    import numpy as np
    import matplotlib.pyplot as plt
    
    # methods ----------------------------------------------------------------
    def neural_network(input, weights):
        out = input @ weights
        return out
    
    def error_gt_epsilon(epsilon: float, error_array: np.ndarray) -> bool:
        for i in range(len(error_array)):
            if error_array[i] > epsilon:
                return True
        return False
    
    # setup vars --------------------------------------------------------------
    #inputs
    toes_array =  np.array([8.5, 9.5, 9.9, 9.0])
    wlrec_array = np.array([0.65, 0.8, 0.8, 0.9])
    nfans_array = np.array([1.2, 1.3, 0.5, 1.0])
    
    #output goals
    hurt_array  = np.array([0.2, 0.0, 0.0, 0.1])
    wl_binary_array   = np.array([  1,   1,   0,   1])
    sad_array   = np.array([0.3, 0.0, 0.1, 0.2])
    
    weights_array = np.random.rand(3, 3) # initialise with random weights
    '''
    #initialized with fixed weights to compare with the book
    weights_array = np.array([ [0.1, 0.1, -0.3], #hurt?
                             [0.1, 0.2,  0.0], #win?
                             [0.0, 1.3,  0.1] ]) #sad?
    '''
    alpha = 0.01 # convergence scalar
    
    # just use the first element from each array for training (for now?)
    input_array = np.array([toes_array[0], wlrec_array[0], nfans_array[0]])
    goal_array = np.array([hurt_array[0], wl_binary_array[0], sad_array[0]])
    
    line_mat = [] # for drawing plots
    epsilon = 0.01 # how close do we have to be before stopping
    #create and fill an error array that is big enough to enter the loop
    error_array = np.empty(len(input_array))
    error_array.fill(epsilon * 2)
    
    # loop counters
    iter = 0
    max_iter = 100
    
    while error_gt_epsilon(epsilon, error_array): # if any error in the array is big, keep going
    
        # right now, the dot product of the (3x1) input vector and the (3x3) weight matrix returns a (3x1) vector
        pred_array = neural_network(input_array, weights_array)
    
        # how far away are we linearly (3x1)
        delta_array = pred_array - goal_array
        # error is distance squared to keep positive and weight the system to fixing bigger errors (3x1)
        error_array = delta_array ** 2
    
        # Compute how far and in what direction (3x1)
        weights_d_array = delta_array * input_array
    
        print("\niteration [{}]\nGoal = {}\nPred = {}\nError = {}\nDelta = {}\nWeight Deltas = {}\nWeights: \n{}".format(iter, goal_array, pred_array, error_array, delta_array, weights_d_array, weights_array))
    
        #subtract the scaled (3x1) weight delta array from the weights array
        weights_array -= (alpha * weights_d_array)
    
        #build the data for the plot
        line_mat.append(np.copy(error_array))
        iter += 1
        if iter > max_iter:
            break
    
    plt.plot(line_mat)
    plt.title("error")
    plt.legend(("toes", "win/loss", "fans"))
    plt.show()
  • Here’s a chart! (image: the three error curves converging)
  • Continuing Characterizing Online Public Discussions through Patterns of Participant Interactions

Phil 8.30.18

7:00 – 5:00  ASRC MKT

  • Target Blue Sky paper for iSchool/iConference 2019: The chairs are particularly looking for “Blue Sky Ideas” that are open-ended, possibly even “outrageous” or “wacky,” and present new problems, new application domains, or new methodologies that are likely to stimulate significant new research. 
  • I’m thinking of a paper that works through the ramifications of this diagram as it relates to people and machines. With humans, who respond slowly over spongy, switched networks, the flocking area is large. With a monolithic, densely connected system it’s going to be a straight line from nomadic to stampede. (diagram: Nomad-Flocking-Stampede)
    • Length: Up to 4 pages (excluding references)
    • Submission deadline: October 1, 2018
    • Notification date: mid-November, 2018
    • Final versions due: December 14, 2018
    • First versions will be submitted using .pdf. Final versions must be submitted in .doc, .docx or LaTeX.
  • More good stuff on BBC Business Daily Trolling for Cash
    • Anger and animosity is prevalent online, with some people even seeking it out. It’s present on social media of course as well as many online forums. But now outrage has spread to mainstream media outlets and even the advertising industry. So why is it so lucrative? Bonny Brooks, a writer and researcher at Newcastle University explains who is making money from outrage. Neuroscientist Dr Dean Burnett describes what happens to our brains when we see a comment designed to provoke us. And Curtis Silver, a tech writer for KnowTechie and ForbesTech, gives his thoughts on what we need to do to defend ourselves from this onslaught of outrage.
  • Exposure to Opposing Views can Increase Political Polarization: Evidence from a Large-Scale Field Experiment on Social Media
    • Christopher Bail (Scholar)
    • There is mounting concern that social media sites contribute to political polarization by creating “echo chambers” that insulate people from opposing views about current events. We surveyed a large sample of Democrats and Republicans who visit Twitter at least three times each week about a range of social policy issues. One week later, we randomly assigned respondents to a treatment condition in which they were offered financial incentives to follow a Twitter bot for one month that exposed them to messages produced by elected officials, organizations, and other opinion leaders with opposing political ideologies. Respondents were re-surveyed at the end of the month to measure the effect of this treatment, and at regular intervals throughout the study period to monitor treatment compliance. We find that Republicans who followed a liberal Twitter bot became substantially more conservative post-treatment, and Democrats who followed a conservative Twitter bot became slightly more liberal post-treatment. These findings have important implications for the interdisciplinary literature on political polarization as well as the emerging field of computational social science.
  • Setup gcloud tools on laptop – done
  • Setup Tensorflow on laptop. Gave up on using CUDA 9.1, but got tf doing ‘hello, tensorflow’
  • Marcom meeting – 2:00
  • Get the concept of behaviors being a more scalable, dependable way of vetting information.
    • Eg Watching the DISI of outrage as manifested in trolling
      • “Uh. . . . not to be nitpicky,,,,,but…the past tense of drag is dragged, not drug.”: An overview of trolling strategies
        • Dr Claire Hardaker (Scholar) (Blog)
          • I primarily research aggression, deception, and manipulation in computer-mediated communication (CMC), including phenomena such as flaming, trolling, cyberbullying, and online grooming. I tend to take a forensic linguistic approach, based on a corpus linguistic methodology, but due to the multidisciplinary nature of my research, I also inevitably branch out into areas such as psychology, law, and computer science.
        • This paper investigates the phenomenon known as trolling — the behaviour of being deliberately antagonistic or offensive via computer-mediated communication (CMC), typically for amusement’s sake. Having previously started to answer the question, what is trolling? (Hardaker 2010), this paper seeks to answer the next question, how is trolling carried out? To do this, I use software to extract 3,727 examples of user discussions and accusations of trolling from an eighty-six million word Usenet corpus. Initial findings suggest that trolling is perceived to broadly fall across a cline with covert strategies and overt strategies at each pole. I create a working taxonomy of perceived strategies that occur at different points along this cline, and conclude by refining my trolling definition.
        • Citing papers
  • FireAnt (Filter, Identify, Report, and Export Analysis Toolkit) is a freeware social media and data analysis toolkit with built-in visualization tools including time-series, geo-position (map), and network (graph) plotting.
  • Fix marquee – done
  • Export to ppt – done!
    • include videos – done
    • Center title in ppt:
      • model considerations – done
      • diversity injection – done
  • Got the laptop running Python and Tensorflow. Had a stupid problem where I accidentally made a virtual environment and keras wouldn’t work. Removed, re-connected and restarted IntelliJ and everything is working!

Phil 8.17.18

7:00 – 4:30 ASRC MKT

Phil 8.16.18

7:00 – 4:30 ASRC MKT

  • R2D3 is an experiment in expressing statistical thinking with interactive design. Find us at @r2d3us
  • Foundations of Temporal Text Networks
    • Davide Vega (Scholar)
    • Matteo Magnani (Scholar)
    • Three fundamental elements to understand human information networks are the individuals (actors) in the network, the information they exchange, that is often observable online as text content (emails, social media posts, etc.), and the time when these exchanges happen. An extremely large amount of research has addressed some of these aspects either in isolation or as combinations of two of them. There are also more and more works studying systems where all three elements are present, but typically using ad hoc models and algorithms that cannot be easily transferred to other contexts. To address this heterogeneity, in this article we present a simple, expressive and extensible model for temporal text networks, that we claim can be used as a common ground across different types of networks and analysis tasks, and we show how simple procedures to produce views of the model allow the direct application of analysis methods already developed in other domains, from traditional data mining to multilayer network mining.
      • Ok, I’ve been reading the paper and if I understand it correctly, it’s pretty straightforward and also clever. It relates a lot to the way that I do term document matrices, and then extends the concept to include time, agents, and implicitly anything you want to. To illustrate, here’s a picture of a tensor-as-matrix (image: tensorIn2D). The important thing to notice is that there are multiple dimensions represented in a square matrix. We have:
        • agents
        • documents
        • terms
        • steps
      • This picture in particular is of an undirected adjacency matrix, but I think there are ways to handle in-degree and out-degree, though I think that’s probably better handled by having one matrix for indegree and one for out.
      • Because it’s a square matrix, we can calculate the steps between any node that’s on the matrix, and the centrality, simply by squaring the matrix and keeping track of the steps until the eigenvector settles. We can also weight nodes by multiplying that node’s row and column by a scalar. That changes the centrality, but not the connectivity. We can also drop out components (steps for example) to see how that changes the underlying network properties.
      • If we want to see how time affects the development of the network, we can start with all the step nodes set to a zero weight, then add them in sequentially. This means, for example, that clustering could be performed on the nonzero nodes.
      • Some or all of the elements could be factorized using NMF, resulting in smaller, faster matrices.
      • Network embedding could be useful too. We get distances between nodes. And this looks really important: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
      • I think I can use any and all of the above methods on the network tensor I’m describing. This is very close to a mapping solution.
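The matrix tricks above can be sketched in a few lines of plain Python. This is my own toy illustration, not code from the paper: the five-node network, the L1 normalization, and the iteration count are all assumptions.

```python
# Toy sketch: one square adjacency matrix holding every node type
# (agent, document, terms, step), power iteration for centrality,
# and row/column scaling to weight or drop a node.

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def normalize(v):
    s = sum(abs(x) for x in v) or 1.0
    return [x / s for x in v]

def centrality(adj, iters=50):
    """Repeatedly apply the adjacency matrix until the (L1-normalized)
    eigenvector settles, giving a centrality score per node."""
    v = normalize([1.0] * len(adj))
    for _ in range(iters):
        v = normalize(matvec(adj, v))
    return v

# agent a wrote document d at step s; d contains terms t1 and t2
nodes = ["a", "d", "t1", "t2", "s"]
adj = [[0, 1, 0, 0, 1],   # a
       [1, 0, 1, 1, 1],   # d
       [0, 1, 0, 0, 0],   # t1
       [0, 1, 0, 0, 0],   # t2
       [1, 1, 0, 0, 0]]   # s

c = centrality(adj)

def weight_node(adj, i, w):
    """Scale node i's row and column by w. Shifts centrality toward or
    away from that node without changing which nodes are connected."""
    out = [row[:] for row in adj]
    for j in range(len(out)):
        out[i][j] *= w
        out[j][i] *= w
    return out

# Weight 0 drops the step node entirely -- the "start with all step
# nodes at zero, then add them back in" time-sliced view.
no_step = weight_node(adj, nodes.index("s"), 0.0)
```

The document node touches everything in this toy network, so it comes out most central; re-running `centrality` on `no_step` shows how the network looks before that time step is added back in.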
  • The Shifting Discourse of the European Central Bank: Exploring Structural Space in Semantic Networks (cited by the above paper)
    • Convenient access to vast and untapped collections of documents generated by organizations is a valuable resource for research. These documents (e.g., Press releases, reports, speech transcriptions, etc.) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of such large document corpora does not come without challenges. Two of these challenges are 1) the need for appropriate automated methods for text mining and analysis and 2) the redundant and predictable nature of the formalized discourse contained in these collections of texts. Our article proposes an approach that performs well in overcoming these particular challenges for the analysis of documents related to the recent financial crisis. Using semantic network analysis and a combination of structural measures, we provide an approach that proves valuable for a more comprehensive analysis of large and complex semantic networks of formal discourse, such as the one of the European Central Bank (ECB). We find that identifying structural roles in the semantic network using centrality measures jointly reveals important discursive shifts in the goals of the ECB which would not be discovered under traditional text analysis approaches.
  • Comparative Document Analysis for Large Text Corpora
    • This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains—scientific papers and news—demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
  • Social and semantic coevolution in knowledge networks
    • Socio-semantic networks involve agents creating and processing information: communities of scientists, software developers, wiki contributors and webloggers are, among others, examples of such knowledge networks. We aim at demonstrating that the dynamics of these communities can be adequately described as the coevolution of a social and a socio-semantic network. More precisely, we will first introduce a theoretical framework based on a social network and a socio-semantic network, i.e. an epistemic network featuring agents, concepts and links between agents and between agents and concepts. Adopting a relevant empirical protocol, we will then describe the joint dynamics of social and socio-semantic structures, at both macroscopic and microscopic scales, emphasizing the remarkable stability of these macroscopic properties in spite of a vivid local, agent-based network dynamics.
  • Tensorflow 2.0 feedback request
    • Shortly, we will hold a series of public design reviews covering the planned changes. This process will clarify the features that will be part of TensorFlow 2.0, and allow the community to propose changes and voice concerns. Please join developers@tensorflow.org if you would like to see announcements of reviews and updates on process. We hope to gather user feedback on the planned changes once we release a preview version later this year.

Phil 8.8.18

7:00 – 4:00 ASRC MKT

  • Oh, look, a new Tensorflow (1.10). Time to break things. I like the BigTable integration though.
  • Learning Meaning in Natural Language Processing — A Discussion
    • Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread. Twitter is a great medium for having such a discussion: replying to any comment lets you revive the debate from the most promising point when it’s stuck in a dead end. Unfortunately Twitter also makes the discussion very hard to read afterwards, so I made three entry points to explore this fascinating mega-thread:

      1. a summary of the discussion that you will find below,
      2. an interactive view to explore the trees of tweets, and
      3. a commented map to get an overview of the main points discussed.
  • The Current Best of Universal Word Embeddings and Sentence Embeddings
    • This post is thus a brief primer on the current state-of-the-art in Universal Word and Sentence Embeddings, detailing a few

      • strong/fast baselines: FastText, Bag-of-Words
      • state-of-the-art models: ELMo, Skip-Thoughts, Quick-Thoughts, InferSent, MILA/MSR’s General Purpose Sentence Representations & Google’s Universal Sentence Encoder.

      If you want some background on what happened before 2017 😀, I recommend the nice post on word embeddings that Sebastian wrote last year and his intro posts.
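As an aside, the bag-of-words baseline from that list is almost trivial to sketch: a sentence embedding is just the average of its word vectors. The tiny two-dimensional vectors below are stand-ins for real FastText/word2vec vectors, and the whole example is my own illustration, not code from the post.

```python
# Bag-of-words sentence embedding: average the word vectors, then
# compare sentences with cosine similarity.
import math

# Hand-made toy vectors; a real system would load pretrained embeddings.
vecs = {
    "cat":    [0.9, 0.1],
    "dog":    [0.8, 0.2],
    "stock":  [0.1, 0.9],
    "market": [0.2, 0.8],
}

def sentence_embedding(sentence):
    words = [w for w in sentence.lower().split() if w in vecs]
    dim = len(next(iter(vecs.values())))
    return [sum(vecs[w][i] for w in words) / len(words) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

pets = sentence_embedding("cat dog")
finance = sentence_embedding("stock market")
```

Sentences built from related words land close together under cosine similarity, which is why this is such a strong, fast baseline despite ignoring word order.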

  • Treeverse is a browser extension for navigating burgeoning Twitter conversations.
  • Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices
    • With the development of online data collection and instruments such as Amazon’s Mechanical Turk (MTurk), the appearance of malicious software that generates responses to surveys in order to earn money represents a major issue, for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has been undertaken to test the extent to which they actually detect nonhuman response sets. Thus, we proposed to conduct an empirical comparison of these indices. Assuming that most botnet programs are based on random uniform distributions of responses, we present and compare seven indices in this study to detect nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are calculated easily, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
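The person–total correlation index the abstract calls easy really is easy: correlate each respondent's item scores with the per-item means across the sample, and uniform-random responders land near zero. Here's a sketch on simulated data; the scales, noise levels, and sample sizes are my own assumptions, not from the paper.

```python
# Person-total correlation as a screen for random responding.
import random, math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

random.seed(1)
n_items = 40

# "Human-like" respondents share a common item profile plus noise,
# clipped to a 1-5 Likert range.
profile = [random.uniform(1, 5) for _ in range(n_items)]
humans = [[min(5, max(1, p + random.gauss(0, 0.7))) for p in profile]
          for _ in range(50)]

# Bots answer uniformly at random across the same range.
bots = [[random.uniform(1, 5) for _ in range(n_items)] for _ in range(10)]

everyone = humans + bots
item_means = [sum(r[i] for r in everyone) / len(everyone)
              for i in range(n_items)]
scores = [pearson(r, item_means) for r in everyone]

human_avg = sum(scores[:50]) / 50
bot_avg = sum(scores[50:]) / 10
```

Humans track the item means while random responders don't, so a simple threshold on the correlation separates the two groups cleanly.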
  • Continuing to work on SASO slides – close to done. Got a lot of adversarial herding FB examples from the House Permanent Committee on Intelligence. Need to add them to the slide. Sobering.
  • And this looks like a FANTASTIC ride out of Trento: ridewithgps.com/routes/27552411
  • Fixed the border menu so that it’s a toggle group