Author Archives: pgfeldman

Phil 8.29.18

7:00 – 4:30 ASRC MKT

Editing videos
Need to think about short CHI paper about designing for culture/robot interactions. The trolley problem at scale? How would the sim be set up? The amount of randomness at the initial condition? Stiffness vs. connectivity? Belief space is still important and is actually used as a concept in path planning.
Visual Exploration and Comparison of Word Embeddings
- Word embeddings are distributed representations for natural language words, and have been wildly used in many natural language processing tasks. The word embedding space contains local clusters with semantically similar words and meaningful directions, such as the analogy. However, there are different training algorithms and text corpora, which both have a different impact on the generated word embeddings. In this paper, we propose a visual analytics system to visually explore and compare word embeddings trained by different algorithms and corpora. The word embedding spaces are compared from three aspects, i.e., local clusters, semantic directions and diachronic changes, to understand the similarity and differences between word embeddings.
Much work on slides

Can’t get Google to recognise my account?

curl.exe -H "Content-Type: application/json" -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) https://speech.googleapis.com/v1/speech:recognize -d @sync-request.json
curl: (6) Could not resolve host: ya29.c.EloHBu32-0nBAqimi1Zumlot6rjGtGpUk27qTTESRLW4vtd1LY4ihxBIesU3ga-kmwCaM7YZS-JRo_KNjaC_bj13dWazBcKr4YtAEQYFzSpSBx3DwdS46DTt0bg
{
  "error": {
    "code": 403,
    "message": "The request is missing a valid API key.",
    "status": "PERMISSION_DENIED"
  }
}

No idea what host: ya29.c.EloHBu32-0nBAqimi1Zumlot6rjGtGpUk27qTTESRLW4vtd1LY4ihxBIesU3ga-kmwCaM7YZS-JRo_KNjaC_bj13dWazBcKr4YtAEQYFzSpSBx3DwdS46DTt0bg is
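
A possible reading of that error, offered as a guess rather than a diagnosis: `ya29.` is the usual prefix of a Google OAuth access token, so the "host" looks like the output of `gcloud auth application-default print-access-token`. If the shell handed the token to curl as its own argument instead of appending it to the `Authorization: Bearer ` header, curl would try to resolve it as a second URL, and the real request would go out with an empty bearer token, which would fit the 403. The sketch below sidesteps shell quoting by building the same request in Java 11+ with `java.net.http`. The class and method names are hypothetical, and the token is assumed to have been fetched separately.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) a speech:recognize request.
class SpeechRequestSketch {
    static final String ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize";

    static HttpRequest build(String accessToken, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                // The token is concatenated into the header value, so it
                // cannot be split off into a separate argument.
                .header("Authorization", "Bearer " + accessToken.trim())
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder token and body, for illustration only.
        HttpRequest request = build("PLACEHOLDER_TOKEN", "{}");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending the request would go through `java.net.http.HttpClient`; that part is left out so the sketch runs without credentials.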

Found a problem with the poster. There are two herding DTW charts. Must be reprinted

Phil 8.27.18

7:00 – 5:00 ASRC MKT

Good chat with Barbara yesterday. She suggests horse racing podcasts, since the question is always the same “who’s going to win today” and the information to discuss is much more constrained. Additionally, there is the wagering information that could be used to determine the level of consensus?
Found an idiom translator! “Swing of the pendulum” occurs at least in French, German and Italian
Downloaded the new videos. Need to put them in the ppt when the slides stabilize
Pinged Wayne about getting together today
Changed the questions page to have English, Italian, French and German terms for belief space
Another example of diversity injection (twitter)

Working on podcast text handling

Created the MapsFromPodcasts project in Development
Created a new key and downloaded the key json file

Installed Google Cloud Tools (213.0.0), following the directions of this page. Wow. Lots of stuff!

Output folder: D:\Programs\GoogleCloudAPI
Downloading Google Cloud SDK core.
Extracting Google Cloud SDK core.
Create Google Cloud SDK bat file: D:\Programs\GoogleCloudAPI\cloud_env.bat
Installing components.
Welcome to the Google Cloud SDK!
Your current Cloud SDK version is: 213.0.0
Installing components from version: 213.0.0
+-----------------------------------------------------------------------------+
| These components will be installed. |
+-----------------------------------------------------+------------+----------+
| Name | Version | Size |
+-----------------------------------------------------+------------+----------+
| BigQuery Command Line Tool | 2.0.34 | < 1 MiB |
| BigQuery Command Line Tool (Platform Specific) | 2.0.34 | < 1 MiB |
| Cloud SDK Core Libraries (Platform Specific) | 2018.06.18 | < 1 MiB |
| Cloud Storage Command Line Tool | 4.33 | 3.6 MiB |
| Cloud Storage Command Line Tool (Platform Specific) | 4.32 | < 1 MiB |
| Cloud Tools for PowerShell | | |
| Cloud Tools for PowerShell | 1.0.1.8 | 17.9 MiB |
| Default set of gcloud commands | | |
| Windows command line ssh tools | | |
| Windows command line ssh tools | 2017.09.15 | 1.8 MiB |
| gcloud cli dependencies | 2018.08.03 | 1.3 MiB |
+-----------------------------------------------------+------------+----------+
For the latest full release notes, please visit:
https://cloud.google.com/sdk/release_notes
#============================================================#
#= Creating update staging area =#
#============================================================#
#= Installing: BigQuery Command Line Tool =#
#============================================================#
#= Installing: BigQuery Command Line Tool (Platform Spec... =#
#============================================================#
#= Installing: Cloud SDK Core Libraries (Platform Specific) =#
#============================================================#
#= Installing: Cloud Storage Command Line Tool =#
#============================================================#
#= Installing: Cloud Storage Command Line Tool (Platform... =#
#============================================================#
#= Installing: Cloud Tools for PowerShell =#
#============================================================#
#= Installing: Cloud Tools for PowerShell =#
#============================================================#
#= Installing: Default set of gcloud commands =#
#============================================================#
#= Installing: Windows command line ssh tools =#
#============================================================#
#= Installing: Windows command line ssh tools =#
#============================================================#
#= Installing: gcloud cli dependencies =#
#============================================================#
#= Creating backup and activating new installation =#
#============================================================#
Performing post processing steps...
..............................................................................................................................................................done.
Update done!
This will install all the core command line tools necessary for working with
the Google Cloud Platform.
For more information on how to get started, please visit:
https://cloud.google.com/sdk/docs/quickstarts
Google Cloud SDK has been installed!

Google is sooooooooooooooooooooo Unix/Linux

Meeting with Wayne
- Fix slides some more
- Email about demo and poster – done

Phil 8.26.18

Listening to On Being with guest Mahzarin Banaji (Scholar)

The other thing that I do is to actually create inputs into my mind of my own making. I do think that in some ways our brains are simple and that they will believe that things are real even if they’re not. So, that’s what movies do. That’s what novels do for us. So what if I have a series of 1,000 pictures that rotate through on my screen saver of people who come from many parts of the world that I will never, ever see or even think about. Look, just take an example close by. I have no idea what life for a farmer in Iowa is. I bet it’s hard. I bet I have no idea what they have to deal with. I don’t think I will ever truly understand. But, right now, they are a distant group in my mind. I live in Cambridge, Massachusetts, and I don’t think about farming and farmers. If my screensaver literally just points out the existence of such people and what their issues might be, I believe that my brain is going to begin to care at some level. And if I show myself possibilities that don’t exist easily, that’s even better.
A nice example of diversity injection

Phil 8.24.18

7:00 – 4:00 ASRC MKT

Make more obvious the Inadvertent Social Information and Digital ISI
- ISI
  - Trails
  - Visual clustering
  - Behavior around the commons (waterholes)
  - Presence of young
  - Mating behavior
  - etc.
- DISI
  - Words and their overall source (Social media, website content, contributor content, auto-generated, etc)
  - Votes (likes, kudos, karma points)
  - Money (site income, blockchain ledger)
  - Linking (href, retweet, share)
  - Images & videos
Work more on behavior patterns of humans and animals
- Highly organized (soccer match singing, marching, mass dancing events)
- Wildebeest feeding, defending, migrating and stampeding
AutoKeras is a GitHub project that uses the ENAS algorithm. It can be installed using pip. Since it’s written in Keras it’s quite easy to control and play with, so you can even dive into the ENAS algorithm and try making some modifications. If you prefer TensorFlow or Pytorch, there’s also public code projects for those here and here!
From Zeynep’s twitter
- So, Russian trolls amplified divisive content and helped spread vaccine misinformation. Look, the challenge before us is to redefine *critical thinking* to include figuring out what to believe, not just how to be skeptical. Personal and institutional.
- Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate
  - Whereas bots that spread malware and unsolicited content disseminated antivaccine messages, Russian trolls promoted discord. Accounts masquerading as legitimate users create false equivalency, eroding public consensus on vaccination.
Trying to decode podcasts. Here’s my test: https://viztales.com/wp-content/uploads/2018/08/oneminute.mp3, and here are the results from Google speech-to-text:
- We were talking about the choices of who’s you can keep two of these three, I guess Adonis Alexander is along for the ride, huh? I thought I was about to I didn’t know I haven’t I haven’t sent it to him. Well, has he been out there? They might missing some guys got to hand. I kept thinking like if to say, they weren’t having these injuries. Like if they have like us to say, okay, but they have these reason iron Marshall and maybe he maybe he’s not available week one, but they don’t want to put them on IR prn’s things up. So maybe they have to add another running back like you so you have to create a roster spot I could imagine this is just speculation Alexander. Somehow gets the mysterious injury to put them on I are clearly my keys ready, right and they they would have five cornerbacks otherwise and you know, yeah, if you’re not going to be ready to go, but you may have to you know, go get okay. Yeah. I mean the he’s he’s a guy that I think is on based on these the way the wrong.
- It’s pretty good as long as people aren’t stepping over each other verbally.
- Good enough to try, I guess. Noisy data is life, right? Look for the bigger signal.
Here’s my current plan. It’s a half-assed first approach, but it should provide some insight.
1. Download a season of a sports podcast and put each podcast into its own document. Here’s the tutorial for Speech-to-text with REST
2. Use Corpus Manager to convert using BOW, and create an ignore list for common words like “the”
3. Then read all the docs into LMN
4. Then set the weight of each successive document (in time) so that its top
5. Take the top ten words and save them to a file
6. Try building a map
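
Steps 2 and 5 of the plan can be sketched in a few lines. This is not Corpus Manager or LMN, just a stand-in under stated assumptions: the class name, the tiny ignore list, and the tokenizing rule (lower-case, split on anything that is not a letter or apostrophe) are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TopTermsSketch {
    // Hypothetical ignore list; a real one would be much longer.
    static final Set<String> IGNORE = Set.of("the", "a", "and", "to", "of", "i", "that", "they");

    // Bag of words: lower-case, split on non-letters, drop ignored words, count the rest.
    static Map<String, Integer> bagOfWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z']+")) {
            if (token.isEmpty() || IGNORE.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // The n most frequent terms; ties are broken alphabetically so the result is deterministic.
    static List<String> topTerms(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String transcript = "they would have five cornerbacks otherwise and the running back";
        System.out.println(topTerms(bagOfWords(transcript), 10));
    }
}
```

Running one transcript per podcast through `bagOfWords` and writing `topTerms(counts, 10)` to a file would cover the "top ten words" step; the per-document weighting in step 4 is not attempted here.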

Phil 8.23.18

7:00 – 5:30 ASRC MKT

Slides
- Groups/tribes stay the same, but the topics change
- Past polarizing topics:
  - Confederate statues
  - Kneeling for the national anthem
  - #blacklivesmatter
  - Hoodies
  - Crack cocaine
  - 1968 Olympics Black Power salute
  - Alabama bus boycott
- Stiffening a group creates a stampede (In-group high SIH)
- Adding group-invisible diversity disrupts the velocity and direction of a stampede
- Arendt/Moscovici slide “So we’re doomed, right! Except…”
- See what velocity of the disrupted stampede looks like
Why Trump Supporters Believe He Is Not Corrupt
- The answer may lie in how Trump and his supporters define corruption. In a forthcoming book titled How Fascism Works, the Yale philosophy professor Jason Stanley makes an intriguing claim. “Corruption, to the fascist politician,” he suggests, “is really about the corruption of purity rather than of the law. Officially, the fascist politician’s denunciations of corruption sound like a denunciation of political corruption. But such talk is intended to evoke corruption in the sense of the usurpation of the traditional order.”
Climate science proposals are being reviewed by Ryan Zinke’s old football buddy. Seriously.
- But what if the corruption isn’t hidden at all, but right out in the open? What if, when it’s identified, the perpetrator doesn’t apologize, or demonstrate any remorse or shame, and there’s no punishment? What then? We don’t really have good narratives around what happens in that situation, which is why the Trump administration so often leaves us sputtering and gawking. It can’t just be a motley collection of incompetent grifters, each misruling their own little fiefdom, trying to stay in their boss’s good graces, succeeding less through wits than a congenital lack of shame and the unstinting institutional support of GOP donors. Can it?

Phil 8.22.18

7:00 – 4:00 ASRC MKT

Reza Aslan in conversation with Ryan Bauer (no podcast!)
- Reza Aslan is an internationally acclaimed writer and religious scholar. His books include “No god but God: The Origins, Evolution, and Future of Islam,” “Zealot: The Life and Times of Jesus of Nazareth,” “How to Win a Cosmic War: God, Globalization and the End of the War on Terror” (published in paperback as “Beyond Fundamentalism”), and most recently “God: A Human History.” Aslan’s degrees include a Bachelor of Arts in Religious Studies, a Master of Theological Studies, a PhD in the Sociology of Religions, and a Master of Fine Arts from the University of Iowa, where he was named the Truman Capote Fellow in Fiction. Born in Iran, he now lives in Los Angeles, where he is an associate professor of creative writing at the University of California, Riverside, and a cooperative faculty member in the department of religion.
- One could argue that the clash of monotheisms is the inevitable result of monotheism itself. Whereas a religion of many gods posits many myths to describe the human condition, a religion of one god tends to be monomythic; it not only rejects all other gods, it rejects all other explanations for God. If there is only one God, then there may be only one truth, and that can easily lead to bloody conflicts of irreconcilable absolutisms. Missionary activity, while commendable for providing health and education to the impoverished throughout the world, is nonetheless predicated on the belief that there is but one path to God, and that all other paths lead toward sin and damnation. (source)
COBBS – Collective Behaviour in Biological Systems – lots of good papers!
- Finite-Size Scaling as a Way to Probe Near-Criticality in Natural Swarms.
- Information transfer and behavioral inertia in starling flocks.
- Starling Flock Networks Manage Uncertainty in Consensus at Low Cost
- Emergence of collective changes in travel direction of starling flocks from individual birds’ fluctuations
- Asja Jelic – theory, collective decision making
- Silvio Duarte Queiros – theory, diffusion
Back to slides – progress!
While riding at lunch, I realized I should record agent velocity. I should be able to see a difference between average velocity by phase, which would bear Arendt out. I made changes that save out the velocity by agent, but realized that it wouldn’t give me the info. Instead I used the distance to origin data and synthesized from there. Turns out that Arendt and Moscovici were right…
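
One way to read "synthesized from there", as a guess: if each agent's distance to the origin is logged once per step, the change in that distance between steps gives a radial speed that can be averaged per phase. It only captures motion toward or away from the origin, so it is a lower bound on true speed. The class name, method names, and sample values below are hypothetical.

```java
class RadialSpeedSketch {
    // Absolute change in distance-to-origin between consecutive samples, per unit time.
    static double[] radialSpeeds(double[] distToOrigin, double dt) {
        double[] speeds = new double[Math.max(0, distToOrigin.length - 1)];
        for (int i = 1; i < distToOrigin.length; i++) {
            speeds[i - 1] = Math.abs(distToOrigin[i] - distToOrigin[i - 1]) / dt;
        }
        return speeds;
    }

    // Mean of values[from..to), e.g. the samples belonging to one phase.
    static double mean(double[] values, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        return (to > from) ? sum / (to - from) : 0.0;
    }

    public static void main(String[] args) {
        double[] distances = {0.0, 1.0, 3.0, 3.0}; // made-up samples, one per step
        double[] speeds = radialSpeeds(distances, 1.0);
        System.out.println("mean speed, first two steps: " + mean(speeds, 0, 2));
    }
}
```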

Phil 8.21.18

7:00 – 3:00 ASRC MKT

Rework the slides
- Explicit introduction, lit review, methods, results, conclusion and discussion slides
- Slide for the difference between opinion dynamics & consensus formation as a static end and part of a dynamic process. (Tribe membership may be static, belief of the tribe is highly dynamic. It’s the story for the group)
- Revisit stampede/flock/nomad slide in the conclusions
- Lose the following slides:
  - Belief space
  - Theory slide: replace with a slide that breaks out the two knobs of dimension reduction and social influence horizons. The slide is called “the simple trick”. Explain how herding affects these knobs by presenting simple issues and making the network stiffer through weight and connection
- Get rid of optical polarization
Fanning the Flames of Hate: Social Media and Hate Crime
- This paper investigates the link between social media and hate crime using Facebook data. We study the case of Germany, where the recently emerged right-wing party Alternative für Deutschland (AfD) has developed a major social media presence. We show that right-wing anti-refugee sentiment on Facebook predicts violent crimes against refugees in otherwise similar municipalities with higher social media usage. To further establish causality, we exploit exogenous variation in major internet and Facebook outages, which fully undo the correlation between social media and hate crime. We further find that the effect decreases with distracting news events; increases with user network interactions; and does not hold for posts unrelated to refugees. Our results suggest that social media can act as a propagation mechanism between online hate speech and real-life violent crime.
Facebook is rating the trustworthiness of its users on a scale from zero to 1
- Facebook has begun to assign its users a reputation score, predicting their trustworthiness on a scale from zero to 1.
- Tessa Lyons, product manager who is in charge of fighting misinformation (video)
Social Science One
- implements a new type of partnership between academic researchers and private industry to advance the goals of social science in understanding and solving society’s greatest challenges. The partnership enables academics to analyze the increasingly rich troves of information amassed by private industry in responsible and socially beneficial ways. It ensures the public maintains privacy while gaining societal value from scholarly research. And it enables firms to enlist the scientific community to help them produce social good, while protecting their competitive positions.
Lost Causes – Is this fashion in economic theory (found via Twitter)?
Poster printing – UMBC Commonvision

Phil 8.19.18

7:00 – 5:30 ASRC MKT

Had a thought that the incomprehension that comes from misalignment that Stephens shows resembles polarizing light. I need to add a slider that enables influence as a function of alignment. Done
- Getting the direction cosine between the source and target belief
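```
double interAgentDotProduct = unitOrientVector.dotProduct(otherUnitOrientVector);
// Clamp to [-1, 1]: rounding can push the dot product of unit vectors just outside acos's domain
double cosTheta = Math.max(-1.0, Math.min(1.0, interAgentDotProduct));
// Angle between the two beliefs in degrees (0 = aligned, 180 = opposed)
double beliefAlignment = Math.toDegrees(Math.acos(cosTheta));
// Rescale to 1.0 (aligned) .. 0.0 (opposed)
double interAgentAlignment = (1.0 - beliefAlignment/180.0);
```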
- Adding a global variable that sets how much influence (0% – 100%) an opposing agent has. Just setting it to on/off, because the effects are actually pretty subtle
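
A self-contained sketch of the alignment calculation, plus one possible way to apply the filter. The snippet above relies on the simulator's own vector class; here plain arrays stand in for it and are assumed to be nonzero. `filteredInfluence` and its `opposingInfluence` parameter (0.0 to 1.0) are hypothetical stand-ins for the global variable, and the linear blend is an assumption, not the simulator's actual code.

```java
class BeliefAlignmentSketch {
    // Alignment of two belief headings: 1.0 = same direction, 0.0 = opposite.
    static double alignment(double[] a, double[] b) {
        double dot = 0.0, magA = 0.0, magB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        double cosTheta = dot / Math.sqrt(magA * magB);
        cosTheta = Math.max(-1.0, Math.min(1.0, cosTheta)); // keep acos in its domain
        return 1.0 - Math.toDegrees(Math.acos(cosTheta)) / 180.0;
    }

    // Assumed blend: with opposingInfluence = 1.0 the filter is off and every agent
    // has full influence; with 0.0, influence scales directly with alignment.
    static double filteredInfluence(double baseInfluence, double alignment, double opposingInfluence) {
        return baseInfluence * (opposingInfluence + (1.0 - opposingInfluence) * alignment);
    }

    public static void main(String[] args) {
        double[] source = {1.0, 0.0};
        double[] target = {0.0, 1.0}; // perpendicular beliefs
        double align = alignment(source, target);
        System.out.println("alignment = " + align);
        System.out.println("influence with filter fully on = " + filteredInfluence(1.0, align, 0.0));
    }
}
```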
Add David’s contributions to slide one writeup – done
Start slide 2 writeup
Find casters for Dad’s walker
- mcmaster.com/#9795t34/=1e8kh9a
Submit forms for DME repair
- Drat – I need the ECU number
Practice talk!
- Need to reduce complexity and add clearly labeled sections, in particular methods
I need to start paying attention to attention
Also, keeping this on the list: How social media took us from Tahrir Square to Donald Trump by Zeynep Tufekci
Social Identity Threat Motivates Science-Discrediting Online Comments
- Experiencing social identity threat from scientific findings can lead people to cognitively devalue the respective findings. Three studies examined whether potentially threatening scientific findings motivate group members to take action against the respective findings by publicly discrediting them on the Web. Results show that strongly (vs. weakly) identified group members (i.e., people who identified as “gamers”) were particularly likely to discredit social identity threatening findings publicly (i.e., studies that found an effect of playing violent video games on aggression). A content analytical evaluation of online comments revealed that social identification specifically predicted critiques of the methodology employed in potentially threatening, but not in non-threatening research (Study 2). Furthermore, when participants were collectively (vs. self-) affirmed, identification did no longer predict discrediting posting behavior (Study 3). These findings contribute to the understanding of the formation of online collective action and add to the burgeoning literature on the question why certain scientific findings sometimes face a broad public opposition.

Phil 8.18.18

This looks good:

Created almost 25 years ago, when the web was in its infancy, Propaganda Critic is dedicated to promoting techniques of propaganda analysis among critically minded citizens.
In 2018, realizing that traditional approaches to propaganda analysis were not well-suited for making sense out of our contemporary political crisis, we completely overhauled Propaganda Critic to take into account the rise of ‘computational propaganda.’ In addition to updating all of the original content, we added nearly two dozen new articles exploring the rise of computational propaganda, explaining recent research on cognitive biases that influence how we interpret and retain information, and presenting recent case studies of how propaganda techniques have been used to disrupt democracy around the world.

Continuing to work on the SASO writeup – it’s coming along. Slower than I’d like…

This is just too good:

Data Organization in Spreadsheets
- Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.

Phil 8.17.18

7:00 – 4:30 ASRC MKT

Alex Steffen – how economies must adapt to cope with climate effects, such as areas (Miami) and industries (Fossil Fuels) that are overvalued because of costs that are not being factored in.
Going to start writing up (some) of my slides for SASO as a set of essays on Phlog to clarify my thinking
Add cross-referencing to poster – done!
More on Foundations of Temporal Text Networks – done!
More on Graph Laplacians, since this is coming up a lot
- Spectral Partitioning, Part 1 The Graph Laplacian
- Spectral Partitioning Part 2 Springs Fling <- harmonic intuition
- Spectral Partitioning Part 3 Algebraic Connectivity
- Spectral Partitioning, Part 4 Putting It All Together
- Unit 6 7b Spectral Clustering Algorithm <- nice look at PCA
- Spring Systems (Wikipedia)
- Equations of motion for undamped linear systems with many degrees of freedom
Need to spend some time looking into
Ok, here we go…. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
- Github repo (belongs to lead author, Jiezhong Qiu)
- Since the invention of word2vec, the skip-gram model has significantly advanced the research of network embedding, such as the recent emergence of the DeepWalk, LINE, PTE, and node2vec approaches. In this work, we show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Our analysis and proofs reveal that: (1) DeepWalk empirically produces a low-rank transformation of a network’s normalized Laplacian matrix; (2) LINE, in theory, is a special case of DeepWalk when the size of vertices’ context is set to one; (3) As an extension of LINE, PTE can be viewed as the joint factorization of multiple networks’ Laplacians; (4) node2vec is factorizing a matrix related to the stationary distribution and transition probability tensor of a 2nd-order random walk. We further provide the theoretical connections between skip-gram based network embedding algorithms and the theory of graph Laplacian. Finally, we present the NetMF method as well as its approximation algorithm for computing network embedding. Our method offers significant improvements over DeepWalk and LINE for conventional network mining tasks. This work lays the theoretical foundation for skip-gram based network embedding methods, leading to a better understanding of latent network representation learning.
  - So far, my basic insight is that matrix factorization is a form of (lossy) dimension reduction into an embedding space. Not sure yet how to use the factoring matrices as coordinates though. For example, a 2D matrix would be size L by M. For a 2D embedding, do you create an Lx2 and a 2xM factor matrices? Need to read more.
- …learning latent representations for networks, a.k.a., network embedding, has been extensively studied in order to automatically discover and map a network’s structural properties into a latent space.
  - ZOMG!
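
On the Lx2 and 2xM question above, a hedged reading from general linear algebra rather than from the paper: a rank-2 factorization does take that shape, X ≈ W·H with W of size L×2 and H of size 2×M, and each row of W (or each column of H) can be read as a 2D coordinate for the corresponding row (or column) item. The sketch below only checks the shapes by multiplying two made-up factors; it does not compute a factorization, and all names and values are illustrative.

```java
class FactorShapesSketch {
    // Made-up factors: W is L x 2 (L = 3), H is 2 x M (M = 4).
    static final double[][] W = {{1, 0}, {0, 1}, {1, 1}};
    static final double[][] H = {{1, 2, 0, 1}, {0, 1, 3, 1}};

    // Plain matrix product: (L x k) times (k x M) gives (L x M).
    static double[][] multiply(double[][] w, double[][] h) {
        int rows = w.length, inner = h.length, cols = h[0].length;
        double[][] x = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < inner; k++) {
                    x[i][j] += w[i][k] * h[k][j];
                }
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] x = multiply(W, H); // rank-2 reconstruction of an L x M matrix
        System.out.println("reconstruction is " + x.length + " x " + x[0].length);
        System.out.println("2D coordinate of row item 2: (" + W[2][0] + ", " + W[2][1] + ")");
    }
}
```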

Phil 8.16.18

7:00 – 4:30 ASRC MKT

R2D3 is an experiment in expressing statistical thinking with interactive design. Find us at @r2d3us.
Foundations of Temporal Text Networks
- Davide Vega (Scholar)
- Matteo Magnani (Scholar)
- Three fundamental elements to understand human information networks are the individuals (actors) in the network, the information they exchange, that is often observable online as text content (emails, social media posts, etc.), and the time when these exchanges happen. An extremely large amount of research has addressed some of these aspects either in isolation or as combinations of two of them. There are also more and more works studying systems where all three elements are present, but typically using ad hoc models and algorithms that cannot be easily transferred to other contexts. To address this heterogeneity, in this article we present a simple, expressive and extensible model for temporal text networks, that we claim can be used as a common ground across different types of networks and analysis tasks, and we show how simple procedures to produce views of the model allow the direct application of analysis methods already developed in other domains, from traditional data mining to multilayer network mining.
  - Ok, I’ve been reading the paper and if I understand it correctly, it’s pretty straightforward and also clever. It relates a lot to the way that I do term document matrices, and then extends the concept to include time, agents, and implicitly anything you want to. To illustrate, here’s a picture of a tensor-as-matrix: The important thing to notice is that there are multiple dimensions represented in a square matrix. We have:
    - agents
    - documents
    - terms
    - steps
  - This picture in particular is of an undirected adjacency matrix, but I think there are ways to handle in-degree and out-degree, though I think that’s probably better handled by having one matrix for in-degree and one for out-degree.
  - Because it’s a square matrix, we can calculate the steps between any node that’s on the matrix, and the centrality, simply by squaring the matrix and keeping track of the steps until the eigenvector settles. We can also weight nodes by multiplying that node’s row and column by the scalar. That changes the centrality, but not the connectivity. We can also drop out components (steps for example) to see how that changes the underlying network properties.
  - If we want to see how time affects the development of the network, we can start with all the step nodes set to a zero weight, then add them in sequentially. This means, for example, that clustering could be performed on the nonzero nodes.
  - Some or all of the elements could be factorized using NMF, resulting in smaller, faster matrices.
  - Network embedding could be useful too. We get distances between nodes. And this looks really important: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
  - I think I can use any and all of the above methods on the network tensor I’m describing. This is very close to a mapping solution.
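
The squaring and weighting ideas in the notes above can be tried on a toy graph. This is a sketch under assumptions: a four-node undirected path graph stands in for the tensor-as-matrix, and centrality is estimated by power iteration on A + I (adding the identity leaves the eigenvectors unchanged and avoids the oscillation that plain power iteration shows on bipartite graphs). All names are hypothetical.

```java
import java.util.Arrays;

class NetworkMatrixSketch {
    // Undirected path graph 0 - 1 - 2 - 3.
    static final double[][] PATH = {
        {0, 1, 0, 0},
        {1, 0, 1, 0},
        {0, 1, 0, 1},
        {0, 0, 1, 0}
    };

    // Entry [i][j] of the k-th power of A counts the walks of length k from i to j.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // Eigenvector centrality by power iteration on (A + I), normalized to unit length.
    static double[] centrality(double[][] a, int iterations) {
        int n = a.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                next[i] = v[i]; // the "+ I" term
                for (int j = 0; j < n; j++) next[i] += a[i][j] * v[j];
                norm += next[i] * next[i];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) next[i] /= norm;
            v = next;
        }
        return v;
    }

    // Copy of A with one node's row and column multiplied by a scalar.
    static double[][] weightNode(double[][] a, int node, double scalar) {
        int n = a.length;
        double[][] w = new double[n][];
        for (int i = 0; i < n; i++) w[i] = a[i].clone();
        for (int k = 0; k < n; k++) {
            w[node][k] *= scalar;
            w[k][node] *= scalar;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println("centrality:          " + Arrays.toString(centrality(PATH, 200)));
        System.out.println("node 0 weighted x3:  " + Arrays.toString(centrality(weightNode(PATH, 0, 3.0), 200)));
    }
}
```

With a nonzero scalar the zero/nonzero pattern of the matrix is untouched, so connectivity stays the same while centrality shifts. A scalar of 0.0 switches a node off, which is one way to start with all step nodes at zero weight and add them back one at a time.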
The Shifting Discourse of the European Central Bank: Exploring Structural Space in Semantic Networks (cited by the above paper)
- Convenient access to vast and untapped collections of documents generated by organizations is a valuable resource for research. These documents (e.g., Press releases, reports, speech transcriptions, etc.) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of such large document corpora does not come without challenges. Two of these challenges are 1) the need for appropriate automated methods for text mining and analysis and 2) the redundant and predictable nature of the formalized discourse contained in these collections of texts. Our article proposes an approach that performs well in overcoming these particular challenges for the analysis of documents related to the recent financial crisis. Using semantic network analysis and a combination of structural measures, we provide an approach that proves valuable for a more comprehensive analysis of large and complex semantic networks of formal discourse, such as the one of the European Central Bank (ECB). We find that identifying structural roles in the semantic network using centrality measures jointly reveals important discursive shifts in the goals of the ECB which would not be discovered under traditional text analysis approaches.
Comparative Document Analysis for Large Text Corpora
- This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains—scientific papers and news—demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
Social and semantic coevolution in knowledge networks
- Socio-semantic networks involve agents creating and processing information: communities of scientists, software developers, wiki contributors and webloggers are, among others, examples of such knowledge networks. We aim at demonstrating that the dynamics of these communities can be adequately described as the coevolution of a social and a socio-semantic network. More precisely, we will first introduce a theoretical framework based on a social network and a socio-semantic network, i.e. an epistemic network featuring agents, concepts and links between agents and between agents and concepts. Adopting a relevant empirical protocol, we will then describe the joint dynamics of social and socio-semantic structures, at both macroscopic and microscopic scales, emphasizing the remarkable stability of these macroscopic properties in spite of a vivid local, agent-based network dynamics.
Tensorflow 2.0 feedback request
- Shortly, we will hold a series of public design reviews covering the planned changes. This process will clarify the features that will be part of TensorFlow 2.0, and allow the community to propose changes and voice concerns. Please join developers@tensorflow.org if you would like to see announcements of reviews and updates on process. We hope to gather user feedback on the planned changes once we release a preview version later this year.

Phil 8.12.18

7:00 – 4:00 ASRC MKT

Having an interesting chat on recommenders with Robin Berjon on Twitter
Long, but looks really good: Neural Processes as distributions over functions
- Neural Processes (NPs) caught my attention as they essentially are a neural network (NN) based probabilistic model which can represent a distribution over stochastic processes. So NPs combine elements from two worlds:
  - Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
  - Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions
  Both have their advantages and drawbacks. In the limited data regime, GPs are preferable due to their probabilistic nature and ability to capture uncertainty. This differs from (non-Bayesian) neural networks which represent a single function rather than a distribution over functions. However the latter might be preferable in the presence of large amounts of data as training NNs is computationally much more scalable than inference for GPs. Neural Processes aim to combine the best of these two worlds.
How The Internet Talks (Well, the mostly young and mostly male users of Reddit, anyway)
- To get a sense of the language used on Reddit, we parsed every comment since late 2007 and built the tool above, which enables you to search for a word or phrase to see how its popularity has changed over time. We’ve updated the tool to include all comments through the end of July 2017.
Add breadcrumbs to slides
Download videos – done! Put these in the ppt backup
Fix the DTW emergent population chart on the poster and in the slides. Print!
Set up the LaTeX Army BAA framework
Slide walkthrough. Good timing. Working on the poster some more

Phil 8.14.18

7:00 – 4:30 ASRC MKT

Presented LaTeX talk/workshop. I think it needs to be a more focused SIGCHI workshop that steps through the transition from a template document to a document with all the needed parts
- Will’s document then becomes a resource for how to do a particular task.
Promoted The Radio in Fascist Italy as a Phlog post. Need to add a takeaway section
Georgetown Law Technology Review (Vol 2, Issue 2)
More poster
BAA work? Lots, actually. Dug through the Army’s and found many good leads
Add to the list of things to read: How social media took us from Tahrir Square to Donald Trump
- To understand how digital technologies went from instruments for spreading democracy to weapons for attacking it, you have to look beyond the technologies themselves.

Phil 8.13.18

7:00 – 4:30 ASRC MKT

Naomi Oreskes (Webpage) (Wikipedia) (Scholar) (Amazon)
- (born November 25, 1958)^[1] is an American historian of science. She became Professor of the History of Science and Affiliated Professor of Earth and Planetary Sciences at Harvard University in 2013, after 15 years as Professor of History and Science Studies at the University of California, San Diego.^[2] She has worked on studies of geophysics, environmental issues such as global warming, and the history of science. In 2010, Oreskes co-authored Merchants of Doubt which identified some parallels between the climate change debate and earlier public controversies.^[3]
  Why Believe a Computer? Models, Measures, and Meaning in the Natural World
I think that this may be important and offers insight into how science (high-dimensional, physei-focused) reaches consensus: Consistently Eventual
- Eventual consistency occurs when the value for something is replicated in more than one place, and there is a protocol for these replicas converging. Changes to one or more of the replicas can be done independently, and they will propagate and converge.
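  A minimal sketch of what "a protocol for these replicas converging" can look like, assuming a grow-only counter (a G-Counter CRDT). That choice is mine, not something taken from the linked post, and the class and method names are hypothetical. Each replica only increments its own slot, and merging takes the element-wise maximum, so replicas that have exchanged state agree regardless of merge order.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        # A replica only ever changes its own slot, so local updates never conflict.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # which is what lets the replicas converge.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


# Two replicas change independently, then propagate their state to each other.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

  After the two merges both replicas hold the same counts, and merging again changes nothing. The analogy to consensus would be that agreement comes from the merge rule, not from a central coordinator.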
Alabama Sea Turtles
Made some good progress on the poster
Good discussion with Aaron about the trajectory from the theory/simulation to building maps from data in the wild. To a degree, it’s a hail-Mary pass. If it works, great – go back in to fill in the gaps. If it doesn’t, progress more steadily from the theory through LSTM to JuryRoom.