Category Archives: research

Phil 11.14.17

7:00 – 4:00 ASRC MKT

Reinforcement Learning: An Introduction (2nd Edition)
- Richard S. Sutton (Scholar): I am seeking to identify general computational principles underlying what we mean by intelligence and goal-directed behavior. I start with the interaction between the intelligent agent and its environment. Goals, choices, and sources of information are all defined in terms of this interaction. In some sense it is the only thing that is real, and from it all our sense of the world is created. How is this done? How can interaction lead to better behavior, better perception, better models of the world? What are the computational issues in doing this efficiently and in realtime? These are the sort of questions that I ask in trying to understand what it means to be intelligent, to predict and influence the world, to learn, perceive, act, and think. In practice, I work primarily in reinforcement learning as an approach to artificial intelligence. I am exploring ways to represent a broad range of human knowledge in an empirical form–that is, in a form directly in terms of experience–and in ways of reducing the dependence on manual encoding of world state and knowledge.
- Andrew G. Barto : Most of my recent work has been about extending reinforcement learning methods so that they can work in real-time with real experience, rather than solely with simulated experience as in many of the most impressive applications to date. Of particular interest to me at present is what psychologists call intrinsically motivated behavior, meaning behavior that is done for its own sake rather than as a step toward solving a specific problem of clear practical value. What we learn during intrinsically motivated behavior is essential for our development as competent autonomous entities able to efficiently solve a wide range of practical problems as they arise. Recent work by my colleagues and me on what we call intrinsically motivated reinforcement learning is aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that form the building blocks for open-ended learning. Visit the Autonomous Learning Laboratory page for some more details.
There was a piece on BBC Business Daily on social network moderators. Aside from it being a horrible job, the show touched on how international criminal cases often rest on video uploaded to services like Twitter and Facebook. This process worked as long as the moderators were human and could tell the difference between criminal activity and the documentation of criminal activity, but now with ML solutions being implemented, these videos are being deleted. First, this shows how ad-hoc the usage of these networks is as a place for legal and journalistic activity. Second, it shows the need for a mechanism that is built to support these activities, where there is a more expansive role of reporter/researcher and editor. This is near the center of gravity for the TACJOUR project.
Flying home yesterday, I was thinking about how the maps need to get built. One way of thinking about it is that you are given a set of directions that run through a geographic area and have to build a map from that. We know the adjacencies by the sequence of the directions. It follows that we should be able to build a map by overlaying all the routes in an n-dimensional space. I was then reading Technical Perspective: Exploring a Kingdom by Geodesic Measures, and at least some of the concepts appear related. In the case of the game at least, we have the center ‘post’, which is the discussion starting point. The discussion is (can be) a random walk towards the poles created in that iteration. Multiple walks create multiple paths over this unknown manifold. I’m thinking that this should be enough information to build a self-organizing map. This might help: Visual analysis of self-organizing maps.
- Had some discussions with Aaron about this. It should be pretty straightforward to build a map, grid or hex, that trajectories can be recorded from. Then the trajectories can be used to reconstruct the map. Success is evaluated by the similarity between the source map and the reconstructed one.
- I could also add recorded trajectories to the generated spreadsheet. It could be a list of cells that the agent traverses. Comparing explore, flocking and stampede behaviors in their reconstructed maps?
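A minimal sketch of the grid version of this idea, under some loud assumptions: the source map is a small 4-connected grid, the agents are plain random walkers (not the explore/flocking/stampede behaviors), and similarity is a Jaccard score over adjacency edges. The function names are hypothetical and not taken from the simulator.

```python
import random

def grid_adjacency(width, height):
    """Return the undirected edges between 4-connected cells of the source grid."""
    edges = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                edges.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                edges.add(frozenset({(x, y), (x, y + 1)}))
    return edges

def random_walk(width, height, steps, rng):
    """Record the list of cells one agent traverses (assumed random-walk agent)."""
    cell = (rng.randrange(width), rng.randrange(height))
    path = [cell]
    for _ in range(steps):
        x, y = cell
        moves = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < width and 0 <= y + dy < height]
        cell = rng.choice(moves)
        path.append(cell)
    return path

def reconstruct(trajectories):
    """Infer adjacency only from which cells follow each other in a trajectory."""
    edges = set()
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            if a != b:
                edges.add(frozenset({a, b}))
    return edges

def jaccard(a, b):
    """Similarity of two edge sets; 1.0 means the maps are identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

rng = random.Random(0)
source = grid_adjacency(5, 5)
few = reconstruct([random_walk(5, 5, 20, rng) for _ in range(2)])
many = reconstruct([random_walk(5, 5, 20, rng) for _ in range(200)])
print(f"2 walks: {jaccard(source, few):.2f}, 200 walks: {jaccard(source, many):.2f}")
```

The same comparison could be run per behavior by swapping the walker for recorded agent trajectories, which is where differences between the reconstructed maps would show up.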
Continuing with From Keyword Search to Exploration
- The mSpace Browser is a multi faceted column based client for exploring large data sets in the way that makes sense to you. You decide the columns and the order that best suits your browsing needs.
- Yippy search
- Exalead search
- pg 62, animation
Continuing along with Angular
Multiple discussions with Aaron about next steps, particularly for anomaly detection

Phil 11.9.17

Instagram, Meme Seeding, and the Truth about Facebook Manipulation, Pt. 1

Jonathan Albright is the Research Director at the Tow Center for Digital Journalism. Previously an assistant professor of media analytics in the school of communication at Elon University, Dr. Albright’s work focuses on the analysis of socially-mediated news events, misinformation/propaganda, and trending topics, applying a mixed-methods, investigative data-driven storytelling approach.
The last couple of weeks have brought us the first new major revelations about the reach and scope of the IRA media influence campaign. Yet the most important development about the ongoing Facebook investigation isn’t the tenfold increase in the company’s updated estimate of the organic reach of “ads” on its platform.

While the estimate increasing the reach of IRA content from 10 million people to 126 million people is surely a leap, after last week’s testimony, the real question we should be asking is: how did we suddenly arrive at 150 million?

The answer is Instagram.

Reading The Group Polarization Phenomenon while working on the PolarizationGame. Some thoughts:

There needs to be a way for each player to state their support/oppose state on a slider before the debate begins. We could even color code the threads using that information, though maybe only when viewing after the debate is complete.
What about teams?

The Emergence of a Fovea while Learning to Attend

Everything is about how we deal as individuals and groups with imperfect information. Which is why an attention-based economy is crazy.

Identifying Dogmatism in Social Media: Signals and Models

We explore linguistic and behavioral features of dogmatism in social media and construct statistical models that can identify dogmatic comments. Our model is based on a corpus of Reddit posts, collected across a diverse set of conversational topics and annotated via paid crowdsourcing. We operationalize key aspects of dogmatism described by existing psychology theories (such as over-confidence), finding they have predictive power. We also find evidence for new signals of dogmatism, such as the tendency of dogmatic posts to refrain from signaling cognitive processes. When we use our predictive model to analyze millions of other Reddit posts, we find evidence that suggests dogmatism is a deeper personality trait, present for dogmatic users across many different domains, and that users who engage on dogmatic comments tend to show increases in dogmatic posts themselves.

Phil 11.7.17

7:00 – 6:00 ASRC MKT

Renting a spec Miata at Summit Point
This is really good: The Human Strategy A Conversation With Alex “Sandy” Pentland [10.30.17]
- Human behavior is determined as much by the patterns of our culture as by rational, individual thinking. These patterns can be described mathematically, and used to make accurate predictions. We’ve taken this new science of “social physics” and expanded upon it, making it accessible and actionable by developing a predictive platform that uses big data to build a predictive, computational theory of human behavior.
Rerunning the DTW with the selected agent weight set to the specified weight, rather than scaled by the distance from the angle, so that it better matches the RANDOM_AGENT and RANDOM_AGENTS settings.
Ok, here are the results. The relationships between the populations appear more consistent, but that could be normal variability. Time for some true statistics to see if these are actually distinct populations. I can also increase power by doing more runs, and possibly by increasing the population size, though there might be confounding effects.
Pandas can read in a specific Excel sheet and numpy can run bootstrap on DataFrames, so I can automate the analysis. Going to talk to Aaron first, since he might be the one to go down this road.
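A rough sketch of what the automated check could look like: a percentile bootstrap on the difference of means between two settings. It uses only the standard library so it runs anywhere; in practice the two lists would be columns pulled from the spreadsheet with pandas.read_excel(..., sheet_name=...). The function name and the numbers are made up for illustration and are not results from the sim.

```python
import random
import statistics

def bootstrap_mean_diff(a, b, n_resamples=2000, seed=0):
    """Percentile-bootstrap 95% interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each population with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples) - 1]
    return lo, hi

# Hypothetical per-run DTW distances for two herding settings (illustrative only).
setting_one = [1.0, 1.1, 0.9, 1.2, 1.0]
setting_two = [2.0, 2.1, 1.9, 2.2, 2.0]
lo, hi = bootstrap_mean_diff(setting_one, setting_two)
print("distinct populations" if lo > 0 or hi < 0 else "cannot tell them apart")
```

If the interval excludes zero, the two settings look like distinct populations at roughly the 95% level; more runs tighten the interval, which is the power argument above.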
I think the next step is to start on the UI for the polarization game. Angular?
- Installing NodeJS
  - npm install -g @angular/cli -> added 968 packages in 56.599s. That is a lot of packages. The IntelliJ plugin seems to be working, the @angular/cli package is visible:
  - Creating a new project is reasonable
  - Once the project is running, the way to compile and run seems to be to run ng serve --open in the IntelliJ terminal (Note: When running as non-admin, do this in a terminal with admin privileges). It then does a whole bunch of things when a code change is made:
```
** NG Live Development Server is listening on localhost:4200, open your browser on http://localhost:4200/ **
 10% building modules 8/10 modules 2 active ...\PolarizationGameOneUI\src\styles.csswebpack: wait until bundle finished: /                                                              Date: 2017-11-07T15:50:25.164Z
Hash: b3174f5198d14bdc05ac
Time: 4708ms
chunk {inline} inline.bundle.js (inline) 5.79 kB [entry] [rendered]
chunk {main} main.bundle.js (main) 20.8 kB [initial] [rendered]
chunk {polyfills} polyfills.bundle.js (polyfills) 553 kB [initial] [rendered]
chunk {styles} styles.bundle.js (styles) 33.8 kB [initial] [rendered]
chunk {vendor} vendor.bundle.js (vendor) 7.02 MB [initial] [rendered]

webpack: Compiled successfully.
webpack: Compiling...
Date: 2017-11-07T15:51:07.132Z
Hash: 7b89b5a301e4a411e92d
Time: 703ms
```
  - Everything is then sent to localhost:4200/, so all the browser debuggers are available
  - And you can change the picture in the app.component.html file, and it re-renders on the fly. Pretty nifty. Yep, verified: The ng serve command builds the app, starts the development server, watches the source files, and rebuilds the app as you make changes to those files. The --open flag opens a browser to http://localhost:4200/.
  - Pleasantly, if the install fails, ng serve --open will complete the install and then start the server.
  - Added the ‘heroes’ component:
  - Then I got this error message:
```
ERROR in src/app/heroes/heroes.component.ts(7,18): error TS2304: Cannot find name 'ViewEncapsulation'.
```
  - Turns out that I had to add ViewEncapsulation to the imports in heroes.component.ts:
```
import {Component, OnInit, ViewEncapsulation} from '@angular/core';

@Component({
  selector: 'app-heroes',
  templateUrl: './heroes.component.html',
  styleUrls: ['./heroes.component.css'],
  encapsulation: ViewEncapsulation.None
})
export class HeroesComponent implements OnInit {
  constructor() { }
  ngOnInit() {
  }
}
```
    Once added in, the rebuild happened and everything functioned normally. Correct error message in the IDE and everything!
Talked to Aaron about next steps with the herding data. We need to do something with NNs, and this could be a good fit
And now I have a nice little certificate of candidacy!

Phil 11.6.17

7:00 – 4:00 ASRC MKT

Going to try a batch job that runs the sim on a single population with a .2 radius and see if I can see a difference between the behaviors using DTW.
I had created a few bugs with changing the names of the flocks to Red and Green. Also, I had never run in batch mode with StorageAndRetreival. And calculations for an average center don’t work when there are no members of your flock. So fixing bugs.
First set of outputs from the batch jobs. Here’s the headings:
And here’s the DTW for the same settings (smaller stage though for proportionally greater differences):
The first really obvious thing is that NoHerding is distinct from the other settings, which are more like echo chambers. Groupings tighten up as the radius increases, and the average heading approach may be statistically better than the random agents, but not by much. Lastly, RANDOM_AGENTS and RANDOM_AGENT lie on a continuum: the longer the interval between agent switches, the more AGENTS will start to look like AGENT.
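For reference, a minimal sketch of the textbook dynamic time warping distance between two 1-D sequences, assuming an absolute-difference step cost. This is the standard dynamic-programming form, not necessarily the DTW variant or the signal (heading, position) used in the batch runs.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# A sequence and a time-stretched copy of it align perfectly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because warping absorbs differences in timing, two runs that trace the same pattern at different speeds come out close, which is why it is a reasonable first pass for comparing behaviors.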

Phil 11.3.17

7:00 – ASRC MKT

Good comments from Cindy on yesterday’s work
Facebook’s 2016 Election Team Gave Advertisers A Blueprint To A Divided US
Some flocking activity?
I realized that I had not added the herding variables to the Excel output. Fixed.
DINH Q. LÊ: South China Sea Pishkun
- In his new work, South China Sea Pishkun, Dinh Q. Lê references the horrifying events that occurred on April 30th 1975 (the day Saigon fell) as hundreds of thousands of people tried to flee Saigon from the encroaching North Vietnamese Army and Viet Cong. The mass exodus was a “Pishkun” a term used to describe the way in which the Blackfoot American Indians would drive roaming buffalo off cliffs in what is known as a buffalo jump.
Back to writing – got some done, mostly editing.
Stochastic gradient descent with momentum
Referred to in this: There’s No Fire Alarm for Artificial General Intelligence
- AlphaGo did look like a product of relatively general insights and techniques being turned on the special case of Go, in a way that Deep Blue wasn’t. I also updated significantly on “The general learning capabilities of the human cortical algorithm are less impressive, less difficult to capture with a ton of gradient descent and a zillion GPUs, than I thought,” because if there were anywhere we expected an impressive hard-to-match highly-natural-selected but-still-general cortical algorithm to come into play, it would be in humans playing Go.
In another article: The AI Alignment Problem: Why It’s Hard, and Where to Start
- This is where we are on most of the AI alignment problems, like if I ask you, “How do you build a friendly AI?” What stops you is not that you don’t have enough computing power. What stops you is that even if I handed you a hypercomputer, you still couldn’t write the Python program that if we just gave it enough memory would be a nice AI.
- I think this is where models of flocking and “healthy group behaviors” matters. Explore in small numbers is healthy – it defines the bounds of the problem space. Flocking is a good way to balance bounded trust and balanced awareness. Runaway echo chambers are very bad. These patterns are recognizable, regardless of whether they come from human, machine, or bison.
Added contacts and invites. I think the DB is ready:
While out riding, I realized what I can do to show results in the herding paper. There are at least three ways to herd:
1. No herding
2. Take the average of the herd
3. Weight a random agent
4. Weight random agents (randomly select an agent and leave it that way for a few cycles, then switch)
Look at the times it takes for these to converge and see which one is best. Also look at the DTW to see if they would be different populations.
Then re-do the above for the two populations inverted case (max polarization)
Started to put in the code changes for the above. There is now a combobox for herding with the above options.
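The four options above can be sketched roughly as follows. This is an illustration in Python, not the simulator's code: the class and enum names are hypothetical (only RANDOM_AGENT, RANDOM_AGENTS and NoHerding appear in these notes), headings are assumed to be angles in radians, and the switch interval of 5 cycles is an arbitrary placeholder.

```python
import math
import random
from enum import Enum

class Herding(Enum):
    NO_HERDING = 0
    AVERAGE_HEADING = 1   # hypothetical name for "take the average of the herd"
    RANDOM_AGENT = 2
    RANDOM_AGENTS = 3

class Herder:
    def __init__(self, mode, switch_cycles=5, seed=0):
        self.mode = mode
        self.switch_cycles = switch_cycles  # assumed interval for RANDOM_AGENTS
        self.rng = random.Random(seed)
        self.cycle = 0
        self.chosen = None

    def target_heading(self, headings):
        """Return the heading (radians) to amplify this cycle, or None."""
        self.cycle += 1
        if self.mode is Herding.NO_HERDING or not headings:
            return None  # nothing to herd, or an empty flock
        if self.mode is Herding.AVERAGE_HEADING:
            # Circular mean: average the unit vectors, then take the angle.
            x = sum(math.cos(h) for h in headings)
            y = sum(math.sin(h) for h in headings)
            return math.atan2(y, x)
        # RANDOM_AGENT picks once; RANDOM_AGENTS re-picks every switch_cycles.
        switch = (self.mode is Herding.RANDOM_AGENTS
                  and (self.cycle - 1) % self.switch_cycles == 0)
        if self.chosen is None or switch:
            self.chosen = self.rng.randrange(len(headings))
        return headings[self.chosen % len(headings)]

herder = Herder(Herding.AVERAGE_HEADING)
print(herder.target_heading([0.0, math.pi / 2]))
```

As switch_cycles grows, RANDOM_AGENTS degenerates into RANDOM_AGENT, so the two sit on one continuum controlled by a single parameter.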

Phil 11.2.17

ASRC MKT 7:00 – 4:30

Add a switch to the GPM that makes the adversarial herders point in opposite directions, based on this: Russia organized 2 sides of a Texas protest and encouraged ‘both sides to battle in the streets’
It’s in and running. Here’s a screenshot: There are some interesting things to note. First, the vector is derived from the average heading of the largest group (green in this case). This explains why the green agents are more tightly clustered than the red ones. In the green case, the alignment is intrinsic. In the red case, it’s extrinsic. What this says to me is that although adversarial herding works well when amplifying the heading already present, it is not as effective when enforcing a heading that does not already predominate. That being said, when we have groups existing in opposition to each other, that is a tragically easy thing to enhance.
Hierarchical Representations for Efficient Architecture Search
- We explore efficient neural architecture search methods and present a simple yet powerful evolutionary algorithm that can discover new architectures achieving state of the art results. Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algorithm efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches and represents the new state of the art for evolutionary strategies on this task. We also present results using random search, achieving 0.3% less top-1 accuracy on CIFAR-10 and 0.1% less on ImageNet whilst reducing the architecture search time from 36 hours down to 1 hour.
Continuing with the schema. Here’s where we are today:

Phil 10.27.17

7:00 – 5:00 ASRC MKT

Nicely written paper on GANs:
- Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CELEBA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CELEBA dataset.
- With cool video
- And code
Working on adding UI and batch interaction for the adversarial herding
- Enable/disable switch – Done
- Field for power – don’t know what the scale should be so no slider yet – Done
- Map<String, Map<FlockingShape, weight>> If this doesn’t work, make shape comparable by name. Done!
```
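// Look up the shape -> weight map for this flock, creating it the first time a non-empty flock is seen.
HashMap<FlockingShape, Double> alignedShapeMap;
if(flock.size() > 0 && !alignedFlockMap.containsKey(flockName)){
    alignedShapeMap = new HashMap<>();
    alignedFlockMap.put(flockName, alignedShapeMap);
}else{
    // Note: this is null when the flock is empty and has no entry yet, so later code needs a null check.
    alignedShapeMap = alignedFlockMap.get(flockName);
}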
```
- Do I want to delay the triggering of the herding on a separate timer? Waiting on this.
- It’s done, and the results are kind of scary. If I set the weight of the herder to 15, I can change the flocking behavior of the default to echo chamber.
- Normal:
- Herding weight set to 15, other options the same:
Did some additional tweaking to see if having highly-weighted herders ignore each other (they would be coordinated through C&C) would have any effect. It doesn’t. There is enough interaction through the regular populations to keep the alignment space reduced.
It looks like there is a ‘sick echo chamber’ pattern. If the borders are reflective, and the herding weight + influence radius is great enough, then a wall-hugging pattern will emerge.
- The influence weight is sort of a credibility score. An agent that has a lot of followers, or says a lot of the things that I agree with, has a lot of influence weight. The range weight is reach.
- Since a troll farm or botnet can be regarded as a single organization, interacting with any one of the agents is really interacting with the root entity. So a herding agent has high influence and high reach. The high reach explains the border hugging behavior.
- It’s like there’s someone at the back of the stampede yelling YOU’RE GOING THE RIGHT WAY! KEEP AT IT! And they never go off the cliff because they are a swarm. Or, it never goes off the cliff, because it manifests as a swarm.
- A loud, distributed voice pointing in a bad direction means wall hugging. Note that there is some kind of floating point error that lets wall huggers creep off the edge.
- With a respawn border, we get the situation where the overall heading of the flock doesn’t change even as it gets destroyed as it goes over the border. Again, since the herding algorithm is looking at the overall population, it never crosses the border but influences all the respawned agents to head towards the same edge:
Paper thoughts:
- Armies have different patterns from emergent groups. They are imposed formations and reflect a commander’s will
- From a distance, they look different, but close up, they may look the same. One of the reasons for the success of the Roman Legion was the use of formations against the less sophisticated structures of their adversaries [ref]

Phil 10.10.17

6:30 – 5:30 ASRC MKT

Spent about an hour going over Aaron’s presentation for tomorrow
DC submission is tomorrow at 3:00. No word back from Wayne about an AM meeting, so I guess it will be this afternoon?
Read Cindy’s comments. Interesting and perceptive.
More followup on yesterday’s discussions. Here are some strawman screen mockups for the game:
- Roughly, the idea is to turn a chat room into a “polarization game”. For phase 1,
  - Players are randomly chosen from the pool of available players. If we have cross-platform texting, we could handle this in a cross-platform way. Some of the controls from the browser version would have to be implemented in some compatible way. Maybe emoji characters? (Arrows, etc)
  - There is some scenario that the users discuss.
  - The game ends when all players agree on an outcome.
  - Something to evaluate is how much of the discussion should be visible.
    - Should it “fade out” (as shown), or should there be a searchable history? Parallel Version with History
    - Should all threads be shown simultaneously?
  - Points are given to participants of a game that unanimously agree
  - Double points are given to the person who comes up with the agreed-upon outcome
  - Points are retained across games. Honor, glory, and prizes are awarded the winners.
    - This means leaderboards and other associated social promotion mechanisms.
    - Registration page, icon choice, etc
  - Might as well build in biometrics and ip address tracking so that we can flag suspicious games (E.g. where one person plays all roles)
- The initial runs will be in a controlled setting (at UMBC), so we can evaluate more aspects of the player’s experience.
  - Semi-structured interviews
  - Surveys (which could be an add-on to the game that pays in points)
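The scoring rules in the mockup notes above could be sketched like this. The function name, the data shapes and the base of one point are assumptions for illustration; nothing here is a settled design.

```python
from collections import Counter

def score_game(votes, proposer_of, base_points=1):
    """Score one finished game.

    votes: player -> outcome that player agreed to.
    proposer_of: outcome -> player who first proposed it.
    """
    outcomes = set(votes.values())
    if len(outcomes) != 1:
        return Counter()  # no unanimous agreement, so nobody scores
    outcome = outcomes.pop()
    scores = Counter({player: base_points for player in votes})
    proposer = proposer_of.get(outcome)
    if proposer in scores:
        scores[proposer] = 2 * base_points  # double points for the winning proposal
    return scores

# Points are retained across games, so the leaderboard just accumulates.
leaderboard = Counter()
leaderboard.update(score_game({"ann": "X", "bob": "X", "cy": "X"}, {"X": "bob"}))
leaderboard.update(score_game({"ann": "X", "bob": "Y"}, {"X": "ann", "Y": "bob"}))
print(leaderboard.most_common())
```

Keeping scoring as a pure function of the votes makes it easy to re-score games later if the point values change, or to exclude games flagged as suspicious.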
Starting to do a deep dive into the Twilio API. Starting with a chat app.
Discussion with the interns about ways they would like to use the system, just to see if there was a strong need to support chat. Here’s the whiteboard:
- Some discussion about how long the game would last. If it were quick/real-time-ish, then it could live on a browser. Long term needs push notifications.
- Although the user has a login, create anonymous discussants so that a history doesn’t build up that other users can react against
- How do the posts get displayed? Time? Score?
- Is there feedback on whose arguments are getting the most votes?
- To keep things playable, there may need to be a character cap. More than 140, less than xxx.
- Cut scenes of the resolution of the dilemma would be cool.
Looking at the setup of the umbc server
- Got the vpn (https://vpn.umbc.edu) set up and running
- As configured, the box is PHP/mysql. I can live with that. I can’t remember the mySQL password though. Doh!
Meeting with Wayne
- Got the edits back for the CHI DC. Leaning towards the CHIIR DC though. Amy agrees – says that the CHI DC is a ‘cattle call’
Some discussion about my review. Discovered that the article process for a journal is much more relaxed. There is time for multiple interactions with the authors.

Phil 9.21.17

6:00 – 10:30, 1:00 – 6:00 ASRC MKT

I think there is a difference between exploring, a deliberate exposure to things unknown, and serendipity, an accidental encounter with the unknown. In the first case, the mind is prepared for the situation. In the second, the mind needs to be receptive to the serendipity. I think that design may matter a lot here. A serendipitous result low on a list may not have the same impact as a point on a map or a line in a story.
Oxford English Dictionary’s definitions of:
- serendipity: “the faculty of making happy and unexpected discoveries by accident”.
- explore: An act of exploring an unfamiliar place; an exploration, an excursion.
- discover: To disclose, reveal, etc., to others or (later) oneself; to find out.
- sagacity: Acuteness of mental discernment; aptitude for investigation or discovery; keenness and soundness of judgement in the estimation of persons and conditions, and in the adaptation of means to ends; penetration, shrewdness.
- synchronicity: the phenomenon of events which coincide in time and appear meaningfully related but have no discoverable causal connection.
Skimming these
- The bohemian bookshelf: supporting serendipitous book discoveries through information visualization
  - A Thudt, U Hinrichs, S Carpendale
  - Serendipity, a trigger of exciting discoveries when we least expect it, is currently being discussed as an often neglected but still important factor in information seeking processes, research, and ideation. In this paper we explore serendipity as an information visualization goal. In particular, we introduce the Bohemian Bookshelf visualization that aims to support serendipitous exploration of digital book collections. The Bohemian Bookshelf consists of five interlinked visualizations, each representing a unique (over)view of the collection. It facilitates serendipitous discoveries by (1) offering multiple access points by providing visualizations of different perspectives on the book collection, (2) enticing curiosity through abstract, metaphorical, and visually distinct representations of the collection, (3) highlighting alternate adjacencies between books, (4) providing multiple pathways for exploring the data collection in a flexible way, (5) supporting immediate previews of books, and (6) enabling a playful approach to information exploration. Our design goals and their exploration through the Bohemian Bookshelf visualization opens up a discussion on how to promote serendipity through information visualization.
  - six design goals that we have derived for promoting serendipitous discoveries through information visualization.
  - Austin coined the term altamirage that describes serendipitous discoveries as a result of chance paired with individual traits of the exploring person [2, 29].
  - This is closely related to the notion of synchronicity where related ideas may manifest as simultaneous occurrences that seem acausal but still meaningful [29].
  - The prevalence of these ideas of chance, fortuity, and coincidence in the discussion around serendipity has led to a tendency to trivialize this complex concept by assuming that serendipity can be supported simply through the introduction of randomness.
  - The design of the Bohemian Bookshelf offers multiple pathways through the book collection by (1) providing multiple interactive overviews of the book collection that can guide the information seeker into different and interesting directions, (2) the presentation of adjacent data that can act as visual signposts providing alternatives for the viewer to move through the dataset by following up on related books, and (3) emphasizing cross visualization attributes by mutual highlighting as in coordinated views [3, 7]
  - multiple pathways through the book collection that can provide guidance in a serendipitous way. The visual overviews can provide one way of exploring books. For instance, visitors can systematically browse through all books of their favourite colour and, in this way, possibly encounter books that are of interest to them but that they did not think of to search for directly. Furthermore, emphasizing adjacent books can be considered as visual signposts. For instance, following up on highlighted books in the Book Pile is likely to rapidly guide people serendipitously to different topical areas of the book collection. As a third approach to multiple pathways, all visualizations of the Bohemian Bookshelf are interlinked with each other. Therefore, every selection of a book in one visualization can be considered a cross road to the other visualizations that highlight this selection as well in their particular context.
  - We deliberately designed the Bohemian Bookshelf to provide multiple overviews of the entire book collection to provide opportunities to discover unexpected trends and relations within the collection.
- Discovery is never by chance: designing for (un)serendipity – finished. Good paper!
  - P André, J Teevan, ST Dumais
  - Serendipity has a long tradition in the history of science as having played a key role in many significant discoveries. Computer scientists, valuing the role of serendipity in discovery, have attempted to design systems that encourage serendipity. However, that research has focused primarily on only one aspect of serendipity: that of chance encounters. In reality, for serendipity to be valuable chance encounters must be synthesized into insight. In this paper we show, through a formal consideration of serendipity and analysis of how various systems have seized on attributes of interpreting serendipity, that there is a richer space for design to support serendipitous creativity, innovation and discovery than has been tapped to date. We discuss how ideas might be encoded to be shared or discovered by “association-hunting” agents. We propose considering not only the inventor’s role in perceiving serendipity, but also how that inventor’s perception may be enhanced to increase the opportunity for serendipity. We explore the role of environment and how we can better enable serendipitous discoveries to find a home more readily and immediately.
    - there is “no discovery of a thing you are looking for”
    - However, most systems designed to induce or facilitate serendipity have focused on the first aspect, subtly encouraging chance encounters, while ignoring the second part, making use of those encounters in a productive way.
    - Especially, however, we want to offer approaches to get at the desired effect of serendipity: insight
    - For us, serendipity is:
      1. the finding of unexpected information (relevant to the goal or not) while engaged in any information activity,
      2. the making of an intellectual leap of understanding with that information to arrive at an insight
    - In our study, a number of participants remarked that they thought of themselves as ‘serendipitous’, and were surprised to find no instances of it in their search behaviour.
      - This is because exploring is not serendipity. See first point above
    - Click entropy, a direct measure of how varied the result clicks are for the query, was found to be significant. That is, a positive correlation between entropy and the number of potentially serendipitous results suggests that people may have clicked varied results not just because they could not find what they wanted, but because they considered more things interesting, or were more willing to go off at a tangent.
    - Arguably however, almost all visualization systems are designed to support such a goal: identifying interesting, but unknown, trends or patterns in data that would not have been visible otherwise.
    - Erdelez’s [12] so-called ‘super-encounterers’, encountering unexpected information on a regular basis, even counting on it as an important element in information acquisition.
    - Instead of treating serendipity as arcane, mysterious and accidental, we embrace the ability of computers to help us perceive connections and opportunities in various pieces of information
    - presenting such information to users has the potential to increase the overall information the user must interact with. This can lead to two problems: distraction or overload, and the negative consequences of incorrect or problematic recommendations or assumptions
    - It is widely acknowledged that serendipitous discoveries are preceded by a period of preparation and incubation [7]. They are, in that respect, not as ‘serendipitous’ as we might expect, being the product of mental preparation as well as of an open and questioning mind
    - The challenge from a design perspective may not necessarily be discovering domain literature opportunities, but defining mechanisms for presenting these suggestions in ways that are effective for the investigator. Further to creating a reading list is defining the space to deliver them opportunistically
    - This idea again supposes a form of common language model, a way to express interest or expertise in particular areas, and a way to search for results.
    - In this spectrum, we have also demonstrated that computer science has spent most of its design effort perhaps overly focused on trying to create insight (effect of serendipity), by recreating the cause (chance), rather than on, for instance, increasing the rate and accuracy of proposed candidates for serendipitous insight, or developing domain expertise
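
The click-entropy note above can be made concrete with a small sketch. The notes do not give the paper's exact formula, so this assumes plain Shannon entropy over the distribution of clicked result URLs for one query; the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClickEntropy {
    /**
     * Shannon entropy (in bits) of the clicks recorded for one query.
     * Assumption: each list entry is the URL of one clicked result.
     */
    static double entropy(List<String> clickedUrls) {
        if (clickedUrls.isEmpty()) {
            return 0.0;
        }
        // Count how often each result was clicked.
        Map<String, Integer> counts = new HashMap<>();
        for (String url : clickedUrls) {
            counts.merge(url, 1, Integer::sum);
        }
        double total = clickedUrls.size();
        double h = 0.0;
        for (int c : counts.values()) {
            double p = c / total;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        // Every click lands on the same result: no variety at all.
        System.out.println(entropy(List.of("a", "a", "a", "a")));
        // Clicks spread evenly over four results: the most varied case for four URLs.
        System.out.println(entropy(List.of("a", "b", "c", "d")));
    }
}
```

Low values mean everyone clicks the same result; higher values mean clicks are spread over many results, which is the "varied" behaviour the note associates with potentially serendipitous results.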

Ordered this, too: Information Visualization: Beyond the Horizon. Has quite a bit on maps that’s going to be needed in the implications for design section
What is a Diagram?
- This paper responds to renewed interest in the centuries old question of what is a diagram. Existing status of our understanding of diagrams is seen as unsatisfactory and confusing. This paper responds to this by proposing a framework for understanding diagrams based on symbolic and spatial mapping. The framework deals with some complex problems any useful definition of diagrams has to deal with. These problems are the variety of diagrams, meaningful dynamics of diagramming, handling change in diagrams in a well formed way, and all of this in the context of semantically mixed diagrams. A brief description of the framework is given discussing how it addresses the problems.
Supporting serendipity: Using ambient intelligence to augment user exploration for data mining and web browsing.
- Has some very Research-Browser-ish bits in it
- an agent-based system to support internet browsing. It models the user’s behaviour to look ahead at linked web pages and their word frequencies, using a Bayesian approach to determine relevance. It then colours links on the page depending on their relevance. In evaluation, the colouring was seen as successful, with people tending to follow the strongly advised links most of the time.
Retroactive answering of search queries
- Major search engines currently use the history of a user’s actions (e.g., queries, clicks) to personalize search results. In this paper, we present a new personalized service, query-specific web recommendations (QSRs), that retroactively answers queries from a user’s history as new results arise. The QSR system addresses two important subproblems with applications beyond the system itself: (1) Automatic identification of queries in a user’s history that represent standing interests and unfulfilled needs. (2) Effective detection of interesting new results to these queries. We develop a variety of heuristics and algorithms to address these problems, and evaluate them through a study of Google history users. Our results strongly motivate the need for automatic detection of standing interests from a user’s history, and identifies the algorithms that are most useful in doing so. Our results also identify the algorithms, some which are counter-intuitive, that are most useful in identifying interesting new results for past queries, allowing us to achieve very high precision over our data set.

Phil 9.12.17

7:00 – 5:00 ASRC MKT

Meeting with Wayne yesterday after Fika. Get him a draft by the end of the week to discuss Monday?
More writing
Herding in humans (Ramsey M. Raafat, Nick Chater, and Chris Frith)
- Herding is a form of convergent social behaviour that can be broadly defined as the alignment of the thoughts or behaviours of individuals in a group (herd) through local interaction and without centralized coordination. We suggest that herding has a broad application, from intellectual fashion to mob violence; and that understanding herding is particularly pertinent in an increasingly interconnected world. An integrated approach to herding is proposed, describing two key issues: mechanisms of transmission of thoughts or behaviour between agents, and patterns of connections between agents. We show how bringing together the diverse, often disconnected, theoretical and methodological approaches illuminates the applicability of herding to many domains of cognition and suggest that cognitive neuroscience offers a novel approach to its study.
Alignment in social interactions (M.Gallotti, M.T.Fairhurst, C.D.Frith)
- According to the prevailing paradigm in social-cognitive neuroscience, the mental states of individuals become shared when they adapt to each other in the pursuit of a shared goal. We challenge this view by proposing an alternative approach to the cognitive foundations of social interactions. The central claim of this paper is that social cognition concerns the graded and dynamic process of alignment of individual minds, even in the absence of a shared goal. When individuals reciprocally exchange information about each other’s minds processes of alignment unfold over time and across space, creating a social interaction. Not all cases of joint action involve such reciprocal exchange of information. To understand the nature of social interactions, then, we propose that attention should be focused on the manner in which people align words and thoughts, bodily postures and movements, in order to take one another into account and to make full use of socially relevant information.
Herding and escaping responses of juvenile roundfish to square mesh window in a trawl cod end (This is the only case I can find of 3-D stampeding. Note the [required?] dimension reduction)
- The movements of juvenile roundfish, mainly haddock Melanogrammus aeglefinus and whiting Merlangius merlangus, reacting to a square mesh window in the cod end of a bottom trawl were observed during fishing experiments in the North Sea. Two typical behavioral responses of roundfish are described as the herding response and the escaping response, which were analyzed from video recordings by time sequences of the movement parameters. It was found that most of the actively escaping fish approached the square mesh window at right angles by swimming straight ahead with very little change in direction, while most of the herded fish approached the net at obtuse angles and retreated by sharp turning. The herding and escaping responses showed significant difference when characterized by frequency distributions of swimming speed and angular velocity, and both responses showed large and irregular variations in swimming movement parameters like the panic erratic responses. It is concluded that an escaping or herding response to the square mesh window could be decided by an interaction between the predictable parameters that describe the stimuli of net and angular changes of fish response, such as approaching angle, turning angle and angular velocity.
Assessing the Effect of “Disputed” Warnings and Source Salience on Perceptions of Fake News Accuracy
- What are effective techniques for combating belief in fake news? Tagging fake articles with “Disputed by 3rd party fact-checkers” warnings and making articles’ sources more salient by adding publisher logos are two approaches that have received large-scale rollouts on social media in recent months. Here we assess the effect of these interventions on perceptions of accuracy across seven experiments (total N=7,534). With respect to disputed warnings, we find that tagging articles as disputed did significantly reduce their perceived accuracy relative to a control without tags, but only modestly (d=.20, 3.7 percentage point decrease in headlines judged as accurate). Furthermore, we find a backfire effect – particularly among Trump supporters and those under 26 years of age – whereby untagged fake news stories are seen as more accurate than in the control. We also find a similar spillover effect for real news, whose perceived accuracy is increased by the presence of disputed tags on other headlines. With respect to source salience, we find no evidence that adding a banner with the logo of the headline’s publisher had any impact on accuracy judgments whatsoever. Together, these results suggest that the currently deployed approaches are not nearly enough to effectively undermine belief in fake news, and new (empirically supported) strategies are needed.
Some meetings on marketing. Looks like we’re trying to get on this panel. Wrote bioblurbs!
More writing. Reasonable progress.

Phil 9.5.17

7:00 – 4:00 ASRC IRAD

Read some more Understanding Ignorance. He hasn’t talked about it, but it makes me look at game theory in a different way. GT is about making decisions with incomplete information. Ignorance results in decisions made using no or incorrect information. This is a modellable condition, and should result in observable results. Maybe something about output behaviors not mapping (at all? statistically equal to chance or worse?) to input information.
Heat maps!!!!
Playing around with the drawing so we’re working off of a white background. Not sure if it’s better?
Adding a decay factor so new patterns don’t get overwhelmed by old ones. 0.999 seems to be pretty good.
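A minimal sketch of what a decay factor like this could look like, assuming the heat map is a 2D grid of accumulated visit counts that gets faded once per simulation step. The class and method names are placeholders, not the actual sim code.

```java
class DecayingHeatMap {
    private final double[][] cells;
    private final double decay; // e.g. 0.999, the value mentioned above

    DecayingHeatMap(int rows, int cols, double decay) {
        this.cells = new double[rows][cols];
        this.decay = decay;
    }

    /** Fade every cell by the decay factor, then record the new visit. */
    void step(int row, int col) {
        for (double[] r : cells) {
            for (int c = 0; c < r.length; ++c) {
                r[c] *= decay;
            }
        }
        cells[row][col] += 1.0;
    }

    double get(int row, int col) {
        return cells[row][col];
    }

    public static void main(String[] args) {
        DecayingHeatMap map = new DecayingHeatMap(4, 4, 0.999);
        for (int i = 0; i < 5000; ++i) {
            map.step(0, 0); // old pattern
        }
        for (int i = 0; i < 5000; ++i) {
            map.step(3, 3); // new pattern gradually takes over
        }
        System.out.println("old cell = " + map.get(0, 0) + ", new cell = " + map.get(3, 3));
    }
}
```

In this sketch a cell that is hit on every step levels off near 1/(1 − 0.999) = 1000, and an untouched cell loses half its value roughly every 693 steps, so old patterns fade without vanishing immediately.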
Need to export to excel – Done!
Advanced Analytic Status meeting.
NOAA meeting. Looks like they want VISIBILITY. Need to write up scenarios from spreadsheet generation to complete integration from allocation to contract to deliverable. With dashboards.
Latest version of the heatmaps. This produced the Excel sheets above (dbTest_09_06_17-07_01_51). Going to leave it like this while I write the paper.

Phil 8.23.17

Research

Started the ball rolling on 899 approval and getting together with Wayne for a chat
899 is set up, Wayne is going to ECSCW, so sometime after that

8:30 – 5:30 ASRC

BRI suspended payment on the contract, so much churn. Lots of discussions with many people. Looks like an interview on Friday?
Had to reinstall Office. Getting coffee…
Connected to my SVN and got the LabeledTensor code to work on. Need to add the support for labels and more than 32k entries.

Phil 8.18.17

7:00 – 8:00 Research

Got indexFromLocation() working. It took some fooling around with Excel. Here’s the method:

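// Maps a continuous location to the nearest grid index on each axis.
public int[] indexFromLocation(double[] loc){
    int[] index = new int[loc.length];
    for(int i = 0; i < loc.length; ++i){
        // Scale the coordinate into (fractional) index space
        double findex = loc[i]/mappingStep;
        double roundDown = Math.floor(findex);
        double roundUp = Math.ceil(findex);
        // Distance to the grid line below and above
        double lowdiff = findex - roundDown;
        double highdiff = roundUp - findex;
        // Pick the nearer grid line; an exact tie goes up
        if(lowdiff < highdiff){
            index[i] = (int)roundDown;
        }else{
            index[i] = (int)roundUp;
        }
    }
    return index;
}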

And here are the much cleaner results:
- [0.00, 0.00] = [0, 0]
  [0.00, 0.10] = [0, 0]
  [0.00, 0.20] = [0, 1]
  [0.00, 0.30] = [0, 1]
  [0.00, 0.40] = [0, 2]
  [0.00, 0.50] = [0, 2]
  [0.00, 0.60] = [0, 2]
  [0.00, 0.70] = [0, 3]
  [0.00, 0.80] = [0, 3]
  [0.00, 0.90] = [0, 4]
  [0.00, 1.00] = [0, 4]
  …
  [1.00, 0.00] = [4, 0]
  [1.00, 0.10] = [4, 0]
  [1.00, 0.20] = [4, 1]
  [1.00, 0.30] = [4, 1]
  [1.00, 0.40] = [4, 2]
  [1.00, 0.50] = [4, 2]
  [1.00, 0.60] = [4, 2]
  [1.00, 0.70] = [4, 3]
  [1.00, 0.80] = [4, 3]
  [1.00, 0.90] = [4, 4]
  [1.00, 1.00] = [4, 4]
Another thought that struck me as far as the (int) constraint is that I can have a number of ArrayLists that are embedded in an object that has the first and last index in it. These would be linked together to provide unconstrained (MAX_VALUE or 2,147,483,647 lists) storage
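A rough sketch of that idea, assuming fixed-size segments addressed with a long index. The names (SegmentedList, Segment) are invented for illustration and this is not the LabeledTensor code.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentedList {
    /** One block of storage covering global indices first..last (inclusive). */
    private static final class Segment {
        final long first;
        final long last;
        final ArrayList<Double> values;

        Segment(long first, int capacity) {
            this.first = first;
            this.last = first + capacity - 1;
            this.values = new ArrayList<>(capacity);
            for (int i = 0; i < capacity; ++i) {
                values.add(0.0);
            }
        }
    }

    private final int segmentSize;
    private final long totalSize;
    private final List<Segment> segments = new ArrayList<>();

    SegmentedList(long totalSize, int segmentSize) {
        this.totalSize = totalSize;
        this.segmentSize = segmentSize;
        for (long first = 0; first < totalSize; first += segmentSize) {
            int capacity = (int) Math.min(segmentSize, totalSize - first);
            segments.add(new Segment(first, capacity));
        }
    }

    /** Locate the segment holding a global index, checking its first/last bounds. */
    private Segment find(long index) {
        Segment s = segments.get((int) (index / segmentSize));
        if (index < s.first || index > s.last) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return s;
    }

    double get(long index) {
        Segment s = find(index);
        return s.values.get((int) (index - s.first));
    }

    void set(long index, double value) {
        Segment s = find(index);
        s.values.set((int) (index - s.first), value);
    }

    long size() {
        return totalSize;
    }

    public static void main(String[] args) {
        SegmentedList list = new SegmentedList(10, 4); // segments 0-3, 4-7, 8-9
        list.set(9, 3.3);
        System.out.println("[9] = " + list.get(9));
    }
}
```

The outer list of segments is still int-indexed, so the ceiling becomes MAX_VALUE segments rather than MAX_VALUE elements, which matches the note above.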

8:30 – 4:30 BRI

I realized yesterday that the Ingest and Query microservices need to access the same GeoMesa Spring service. That keeps all the general store/query GeoMesa access code in one place, simplifies testing and allows for DI to provide the correct (hbase, accumulo, etc) implementation through a facade interface.
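The shape of that facade might look something like the sketch below. It is plain Java with hand-wired constructor injection and an in-memory stand-in, so nothing here is real GeoMesa or Spring API; every name (FeatureStore, IngestService, QueryService) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class StoreFacadeSketch {
    /** Hypothetical facade: the only store API the two services get to see. */
    interface FeatureStore {
        void write(String feature);
        List<String> query(String prefix);
    }

    /** Stand-in backend for tests; an HBase or Accumulo version would implement the same interface. */
    static class InMemoryFeatureStore implements FeatureStore {
        private final List<String> features = new ArrayList<>();

        @Override
        public void write(String feature) {
            features.add(feature);
        }

        @Override
        public List<String> query(String prefix) {
            List<String> hits = new ArrayList<>();
            for (String f : features) {
                if (f.startsWith(prefix)) {
                    hits.add(f);
                }
            }
            return hits;
        }
    }

    /** Ingest side: depends only on the facade, which is injected. */
    static class IngestService {
        private final FeatureStore store;

        IngestService(FeatureStore store) {
            this.store = store;
        }

        void ingest(String feature) {
            store.write(feature);
        }
    }

    /** Query side: shares the same injected facade instance. */
    static class QueryService {
        private final FeatureStore store;

        QueryService(FeatureStore store) {
            this.store = store;
        }

        List<String> find(String prefix) {
            return store.query(prefix);
        }
    }

    public static void main(String[] args) {
        FeatureStore store = new InMemoryFeatureStore(); // DI would pick the implementation
        new IngestService(store).ingest("track-1");
        System.out.println(new QueryService(store).find("track"));
    }
}
```

Swapping the backend then only means changing which FeatureStore implementation gets injected; neither service changes.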
Got tangled up with getting classpaths right and importing the proper libraries
Got the maven files behaving, or at least not complaining on mvn clean and mvn compile!
Well that’s a new error: Error: Could not create the Java Virtual Machine. I get that running the new installation with the geomesa-quickstart-hbase
- Ah, that’s what will happen when you paste your command-line arguments into the VM arguments space just above where it should go…
- Wednesday’s goal will be to verify that HBaseQuickStart is running correctly in its new home and start to turn it into a service.

Phil 8.17.17

BRI – one hour chasing down research hours from Jan – May

7:00 – 6:00 Research

Found this on negative flocking influences: The rise of negative partisanship and the nationalization of US elections in the 21st century. Paper saved to Lit Review
- One of the most important developments affecting electoral competition in the United States has been the increasingly partisan behavior of the American electorate. Yet more voters than ever claim to be independents. We argue that the explanation for these seemingly contradictory trends is the rise of negative partisanship. Using data from the American National Election Studies, we show that as partisan identities have become more closely aligned with social, cultural and ideological divisions in American society, party supporters including leaning independents have developed increasingly negative feelings about the opposing party and its candidates. This has led to dramatic increases in party loyalty and straight-ticket voting, a steep decline in the advantage of incumbency and growing consistency between the results of presidential elections and the results of House, Senate and even state legislative elections. The rise of negative partisanship has had profound consequences for electoral competition, democratic representation and governance.
Working on putting together an indexable high-dimension matrix that can contain objects. Generally, I’d expect it to be doubles, but I can see Strings and Objects as well.
Starting off by seeing what’s in the newest Apache Commons Math (v 3.6.1)
- Well, there’s a cool self organizing map, but the space partitioning has been deprecated.
Found SimpleTensor, which uses the Efficient Java Matrix Library (EJML) and creates a 3D block of rows, columns and slices. Thought it was what I wanted, but nope
Looks like there isn’t a class that would do what I need to do, or that I can even modify. I’m thinking that the best option is to use org.apache.commons.math3.linear.AbstractRealMatrix as a template.
Nope, couldn’t figure out how to do things as nested lists. So I’m doing it C-Style, where you really only have one array that you index into. Here’s a 4x4x4x4 Tensor filled with zeroes:
Total elements = 256
0.0:[0, 0, 0, 0], 0.0:[1, 0, 0, 0], 0.0:[2, 0, 0, 0], 0.0:[3, 0, 0, 0],
0.0:[0, 1, 0, 0], 0.0:[1, 1, 0, 0], 0.0:[2, 1, 0, 0], 0.0:[3, 1, 0, 0],
0.0:[0, 2, 0, 0], 0.0:[1, 2, 0, 0], 0.0:[2, 2, 0, 0], 0.0:[3, 2, 0, 0],
0.0:[0, 3, 0, 0], 0.0:[1, 3, 0, 0], 0.0:[2, 3, 0, 0], 0.0:[3, 3, 0, 0],
0.0:[0, 0, 1, 0], 0.0:[1, 0, 1, 0], 0.0:[2, 0, 1, 0], 0.0:[3, 0, 1, 0],
….
0.0:[0, 2, 3, 3], 0.0:[1, 2, 3, 3], 0.0:[2, 2, 3, 3], 0.0:[3, 2, 3, 3],
0.0:[0, 3, 3, 3], 0.0:[1, 3, 3, 3], 0.0:[2, 3, 3, 3], 0.0:[3, 3, 3, 3]
The only issue that I currently have is that ArrayLists are indexed by int, so the total size is limited to Integer.MAX_VALUE (2,147,483,647) elements. That should be good enough for now, but it will need to be fixed.
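
For reference, the C-style indexing described above can be sketched as follows. This is an illustration rather than the actual tensor class: it uses a plain double[] instead of an ArrayList, the names are made up, and it assumes the first axis varies fastest, which is the order the listing above prints in.

```java
class FlatTensor {
    private final int[] dims;
    private final double[] data;

    FlatTensor(int... dims) {
        this.dims = dims.clone();
        int total = 1;
        for (int d : dims) {
            total = Math.multiplyExact(total, d); // fails loudly past the int limit
        }
        this.data = new double[total];
    }

    /** Collapse an n-dimensional index into one array offset; the first axis varies fastest. */
    int offset(int[] index) {
        int off = 0;
        for (int axis = dims.length - 1; axis >= 0; --axis) {
            if (index[axis] < 0 || index[axis] >= dims[axis]) {
                throw new IndexOutOfBoundsException("axis " + axis + ": " + index[axis]);
            }
            off = off * dims[axis] + index[axis];
        }
        return off;
    }

    double get(int[] index) {
        return data[offset(index)];
    }

    void set(int[] index, double value) {
        data[offset(index)] = value;
    }

    int size() {
        return data.length;
    }

    public static void main(String[] args) {
        FlatTensor t = new FlatTensor(4, 4, 4, 4);
        System.out.println("Total elements = " + t.size());
        System.out.println("[0, 1, 0, 0] -> offset " + t.offset(new int[]{0, 1, 0, 0}));
        System.out.println("[3, 3, 3, 3] -> offset " + t.offset(new int[]{3, 3, 3, 3}));
    }
}
```

With 4x4x4x4 dimensions, [1, 0, 0, 0] lands at offset 1, [0, 1, 0, 0] at offset 4, and [3, 3, 3, 3] at offset 255, the last of the 256 elements.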

set() and get() work nicely:

lt.set(new int[]{0, 1, 0, 0}, 9.9);
lt.set(new int[]{3, 3, 3, 3}, 3.3);

System.out.println("[0, 1, 0, 0] = " + lt.get(new int[]{0, 1, 0, 0}));
System.out.println("[3, 3, 3, 3] = " + lt.get(new int[]{3, 3, 3, 3}));

[0, 1, 0, 0] = 9.9
[3, 3, 3, 3] = 3.3

Started the indexFromLocation method, but this is too sloppy:
```
index[i] = (int)Math.floor(Math.round(loc[i]/mappingStep));
```

Phil 8.16.17

7:00 – 8:00 Research

Added takeaway thoughts to my C&C writeup.
Working out how to add capability to the sim for P&RCH paper. My thoughts from vacation:
- The agent’s contribution is the heading and speed
- The UI is what the agents can ‘see’
- The IR is what is available to be seen
- An additional part might be to add the ability to store data in the space. Then the behavior of the IR (e.g. empty areas) would be more apparent, as would the effects of UI (only certain data is visible, or maybe only nearby data is visible). Data could be a vector field in Hilbert space, and visualized as color.
Updated IntelliJ
Working out how to have a voxel space for the agents to move through that can also be drawn. It’s any number of dimensions, but it has to project to 2D. In the case of the agents, I just choose the first two axes. Each agent has an array of statements that are assembled into a belief vector. The space can be an array of beliefs. Are these just constructed so that they fill a space according to a set of rules? Then the xDimensionName and yDimensionName axes would go from (0, 1), which would scale to stage size? IR would still be a matter of comparing the space to the agent’s vector. Hmm.
This looks really good from an information horizon perspective: The Role of the Information Environment in Partisan Voting
- Voters are often highly dependent on partisanship to structure their preferences toward political candidates and policy proposals. What conditions enable partisan cues to “dominate” public opinion? Here I theorize that variation in voters’ reliance on partisanship results, in part, from the opportunities their environment provides to learn about politics. A conjoint experiment and an observational study of voting in congressional elections both support the expectation that more detailed information environments reduce the role of partisanship in candidate choice.

9:00 – 5:00 BRI

Good lord, the BoA corporate card comes with SIX separate documents to read.
Onward to Chapter Three and Spring database interaction

Well that’s pretty clean. I do like the JdbcTemplate behaviors. Not sure I like the way you specify the values passed to the query, but I can’t think of anything better if you have more than one argument:

@Repository
public class EmployeeDaoImpl implements EmployeeDao {
    @Autowired
    private DataSource dataSource;

    @Autowired
    private JdbcTemplate jdbcTemplate;

    private RowMapper<Employee> employeeRowMapper = new RowMapper<Employee>() {
        @Override
        public Employee mapRow(ResultSet rs, int i) throws SQLException {
            Employee employee = new EmployeeImpl();
            employee.setEmployeeAge(rs.getInt("Age"));
            employee.setEmployeeId(rs.getInt("ID"));
            employee.setEmployeeName(rs.getString("FirstName") + " " + rs.getString("LastName"));
            return employee;
        }
    };

    @Override
    public Employee getEmployeeById(int id) {
        Employee employee = null;

        employee = jdbcTemplate.queryForObject(
                "select * from Employee where id = ?",
                new Object[]{id},
                employeeRowMapper
        );
        return employee;
    }

    public List<Employee> getAllEmployees() {
        List<Employee> eList = jdbcTemplate.query(
                "select * from Employee",
                employeeRowMapper
        );
        return eList;
    }
}

Here’s the xml to wire the thing up:

<context:component-scan base-package="org.springframework.chapter3.dao"/>
<bean id="employeeDao" class="org.springframework.chapter3.dao.EmployeeDaoImpl"/>

<bean id="dataSource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="${jdbc.driverClassName}" />
    <property name="url" value="${jdbc.url}" />
    <property name="username" value="xxx"/>
    <property name="password" value="yyy"/>
</bean>

<bean id="jdbcTemplate" class="org.springframework.jdbc.core.JdbcTemplate">
    <property name="dataSource" ref="dataSource" />
</bean>

<context:property-placeholder location="jdbc.properties" />

And here’s the properties. Note that I had to disable SSL:

jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/sandbox?autoReconnect=true&useSSL=false

viztales

Dimension reduction, State, Orientation, and Speed