Category Archives: Conferences

Phil 8.19.18

7:00 – 5:30 ASRC MKT

  • Had a thought: the incomprehension that Stephens shows arising from misalignment resembles polarized light. I need to add a slider that scales influence as a function of alignment. Done
    • Getting the direction cosine between the source and target belief
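      double interAgentDotProduct = unitOrientVector.dotProduct(otherUnitOrientVector);
      // Clamp to [-1, 1] on both sides so floating-point drift can't push acos() out of its domain (NaN)
      double cosTheta = Math.max(-1.0, Math.min(1.0, interAgentDotProduct));
      // Angle between the two beliefs in degrees: 0 = identical, 180 = opposite
      double beliefAlignment = Math.toDegrees(Math.acos(cosTheta));
      // Normalize to [0, 1]: 1.0 = fully aligned, 0.0 = fully opposed
      double interAgentAlignment = (1.0 - beliefAlignment/180.0);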
    • Adding a global variable that sets how much influence (0% – 100%) an opposing agent has. Just setting it to on/off for now, because the effects are actually pretty subtle
  • Add David’s contributions to slide one writeup – done
  • Start slide 2 writeup
  • Find casters for Dad’s walker
  • Submit forms for DME repair
    • Drat – I need the ECU number
  • Practice talk!
    • Need to reduce complexity and add clearly labeled sections, in particular methods
  • I need to start paying attention to attention
  • Also keeping this on the list: How social media took us from Tahrir Square to Donald Trump, by Zeynep Tufekci
  • Social Identity Threat Motivates Science – Discrediting Online Comments
    • Experiencing social identity threat from scientific findings can lead people to cognitively devalue the respective findings. Three studies examined whether potentially threatening scientific findings motivate group members to take action against the respective findings by publicly discrediting them on the Web. Results show that strongly (vs. weakly) identified group members (i.e., people who identified as “gamers”) were particularly likely to discredit social identity threatening findings publicly (i.e., studies that found an effect of playing violent video games on aggression). A content analytical evaluation of online comments revealed that social identification specifically predicted critiques of the methodology employed in potentially threatening, but not in non-threatening research (Study 2). Furthermore, when participants were collectively (vs. self-) affirmed, identification did no longer predict discrediting posting behavior (Study 3). These findings contribute to the understanding of the formation of online collective action and add to the burgeoning literature on the question why certain scientific findings sometimes face a broad public opposition.
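The belief-alignment snippet in the 8.19.18 entry and the opposing-agent setting can be folded into one weighting function. This is a minimal sketch, not the sim's actual code: beliefs are assumed to be plain double[] unit vectors, and `AlignmentInfluence`, `influenceWeight`, and `opposingInfluence` are hypothetical names. The global knob interpolates between "weight equals alignment" and "everyone counts fully".

```java
// Sketch: weight another agent's influence by how well its belief heading
// lines up with ours. Hypothetical names; beliefs are plain double[] unit
// vectors here rather than the sim's own vector class.
final class AlignmentInfluence {

    // Dot product of two equal-length vectors.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Alignment in [0, 1]: 1.0 = same heading, 0.5 = orthogonal, 0.0 = opposite.
    static double alignment(double[] unitA, double[] unitB) {
        // Clamp so rounding error can't push acos() outside its domain (NaN).
        double cosTheta = Math.max(-1.0, Math.min(1.0, dot(unitA, unitB)));
        double degrees = Math.toDegrees(Math.acos(cosTheta));
        return 1.0 - degrees / 180.0;
    }

    // opposingInfluence is the hypothetical global knob in [0, 1]:
    // 1.0 -> every agent counts fully, regardless of alignment;
    // 0.0 -> weight equals alignment, so a fully opposed agent counts for nothing.
    static double influenceWeight(double[] self, double[] other, double opposingInfluence) {
        double a = alignment(self, other);
        return opposingInfluence + (1.0 - opposingInfluence) * a;
    }
}
```

With the on/off setting described in the entry, `opposingInfluence` would simply toggle between 0.0 and 1.0.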

Phil 8.17.18

7:00 – 4:30 ASRC MKT

Phil 8.12.18

7:00 – 4:00 ASRC MKT

  • Having an interesting chat on recommenders with Robin Berjon on Twitter
  • Long, but looks really good: Neural Processes as distributions over functions
    • Neural Processes (NPs) caught my attention as they essentially are a neural network (NN) based probabilistic model which can represent a distribution over stochastic processes. So NPs combine elements from two worlds:
      • Deep Learning – neural networks are flexible non-linear functions which are straightforward to train
      • Gaussian Processes – GPs offer a probabilistic framework for learning a distribution over a wide class of non-linear functions

      Both have their advantages and drawbacks. In the limited data regime, GPs are preferable due to their probabilistic nature and ability to capture uncertainty. This differs from (non-Bayesian) neural networks which represent a single function rather than a distribution over functions. However the latter might be preferable in the presence of large amounts of data as training NNs is computationally much more scalable than inference for GPs. Neural Processes aim to combine the best of these two worlds.

  • How The Internet Talks (Well, the mostly young and mostly male users of Reddit, anyway)
    • To get a sense of the language used on Reddit, we parsed every comment since late 2007 and built the tool above, which enables you to search for a word or phrase to see how its popularity has changed over time. We’ve updated the tool to include all comments through the end of July 2017.
  • Add breadcrumbs to slides
  • Download videos – done! Put these in the ppt backup
  • Fix the DTW emergent population chart on the poster and in the slides. Print!
  • Set up the LaTeX Army BAA framework
  • Olsson
  • Slide walkthrough. Good timing. Working on the poster some more

Phil 8.14.18

7:00 – 4:30 ASRC MKT

  • Presented the LaTeX talk/workshop. I think it needs to be a more focused SIGCHI workshop that steps through the transition from a template document to a document with all the needed parts
    • Will’s document then becomes a resource for how to do a particular task.
  • Promoted The Radio in Fascist Italy as a Phlog post. Need to add a takeaway section
  • Georgetown Law Technology Review (Vol 2, Issue 2)
  • More poster work
  • BAA work? Lots, actually. Dug through the Army’s BAA and found many good leads
  • Add to the list of things to read: How social media took us from Tahrir Square to Donald Trump
    • To understand how digital technologies went from instruments for spreading democracy to weapons for attacking it, you have to look beyond the technologies themselves.

Phil 8.10.18

7:00 – ASRC MKT

  • Finished the first pass through the SASO slides. Need to start working on timing (25 min + 5 min questions)
  • Start on poster (A0 size)
  • Sent Wayne a note to get permission for 899
  • Started setting up the laptop. I hate this part. Google Drive took hours to synchronize
    • Java
    • Python/NVIDIA/TensorFlow
    • IntelliJ
    • Visual Studio
    • MiKTeX
    • TeXstudio
    • XAMPP
    • Vim
    • TortoiseSVN
    • WinSCP
    • 7-zip
    • Creative Cloud
      • Acrobat
      • Reader
      • Illustrator
      • Photoshop
    • Microsoft suite
    • ExpressVPN

Phil 8.9.18

7:00 – 3:00 ASRC MKT

  • Working on the herding slide
  • Animals Teach Robots to Find Their Way
    • Michael Milford – “I always regard spatial intelligence as a gateway to understanding higher-level intelligence. It’s the mechanism by which we can build on our understanding of how the brain works.”
  • Direct recordings of grid-like neuronal activity in human spatial navigation
    • Grid cells in the entorhinal cortex appear to represent spatial location via a triangular coordinate system. Such cells, which have been identified in rats, bats, and monkeys, are believed to support a wide range of spatial behaviors. By recording neuronal activity from neurosurgical patients performing a virtual-navigation task we identified cells exhibiting grid-like spiking patterns in the human brain, suggesting that humans and simpler animals rely on homologous spatial-coding schemes.
  • The cognitive map in humans: spatial navigation and beyond
    • The ‘cognitive map’ hypothesis proposes that brain builds a unified representation of the spatial environment to support memory and guide future action. Forty years of electrophysiological research in rodents suggest that cognitive maps are neurally instantiated by place, grid, border and head direction cells in the hippocampal formation and related structures. Here we review recent work that suggests a similar functional organization in the human brain and yields insights into how cognitive maps are used during spatial navigation. Specifically, these studies indicate that (i) the human hippocampus and entorhinal cortex support map-like spatial codes, (ii) posterior brain regions such as parahippocampal and retrosplenial cortices provide critical inputs that allow cognitive maps to be anchored to fixed environmental landmarks, and (iii) hippocampal and entorhinal spatial codes are used in conjunction with frontal lobe mechanisms to plan routes during navigation. We also discuss how these three basic elements of cognitive map based navigation—spatial coding, landmark anchoring and route planning—might be applied to nonspatial domains to provide the building blocks for many core elements of human thought.
  • Spatial scaffold effects in event memory and imagination
    • Jessica Robin
    • Spatial context is a defining feature of episodic memories, which are often characterized as being events occurring in specific spatiotemporal contexts. In this review, I summarize research suggesting a common neural basis for episodic and spatial memory and relate this to the role of spatial context in episodic memory. I review evidence that spatial context serves as a scaffold for episodic memory and imagination, in terms of both behavioral and neural effects demonstrating a dependence of episodic memory on spatial representations. These effects are mediated by a posterior-medial set of neocortical regions, including the parahippocampal cortex, retrosplenial cortex, posterior cingulate cortex, precuneus, and angular gyrus, which interact with the hippocampus to represent spatial context in remembered and imagined events. I highlight questions and areas that require further research, including differentiation of hippocampal function along its long axis and subfields, and how these areas interact with the posterior-medial network.
  • Identifying the cognitive processes underpinning hippocampal-dependent tasks (preprint, not peer-reviewed)
    • Autobiographical memory, future thinking and spatial navigation are critical cognitive functions that are thought to be related, and are known to depend upon a brain structure called the hippocampus. Surprisingly, direct evidence for their interrelatedness is lacking, as is an understanding of why they might be related. There is debate about whether they are linked by an underlying memory-related process or, as has more recently been suggested, because they each require the endogenous construction of scene imagery. Here, using a large sample of participants and multiple cognitive tests with a wide spread of individual differences in performance, we found that these functions are indeed related. Mediation analyses further showed that scene construction, and not memory, mediated (explained) the relationships between the functions. These findings offer a fresh perspective on autobiographical memory, future thinking, navigation, and also on the hippocampus, where scene imagery appears to play a highly influential role.
  • Home early to wait for FedEx.

Phil 8.8.18

7:00 – 4:00 ASRC MKT

  • Oh, look, a new TensorFlow (1.10). Time to break things. I like the Bigtable integration, though.
  • Learning Meaning in Natural Language Processing — A Discussion
    • Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread. Twitter is a great medium for having such a discussion, replying to any comment allows to revive the debate from the most promising point when it’s stuck in a dead-end. Unfortunately Twitter also makes the discussion very hard to read afterwards so I made three entry points to explore this fascinating mega-thread:

      1. a summary of the discussion that you will find below,
      2. an interactive view to explore the trees of tweets, and
      3. commented map to get an overview of the main points discussed:
  • The Current Best of Universal Word Embeddings and Sentence Embeddings
    • This post is thus a brief primer on the current state-of-the-art in Universal Word and Sentence Embeddings, detailing a few

      • strong/fast baselines: FastText, Bag-of-Words
      • state-of-the-art models: ELMo, Skip-Thoughts, Quick-Thoughts, InferSent, MILA/MSR’s General Purpose Sentence Representations & Google’s Universal Sentence Encoder.

      If you want some background on what happened before 2017 😀, I recommend the nice post on word embeddings that Sebastian wrote last year and his intro posts.

  • Treeverse is a browser extension for navigating burgeoning Twitter conversations.
  • Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices
    • With the development of online data collection and instruments such as Amazon’s Mechanical Turk (MTurk), the appearance of malicious software that generates responses to surveys in order to earn money represents a major issue, for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has been undertaken to test the extent to which they actually detect nonhuman response sets. Thus, we proposed to conduct an empirical comparison of these indices. Assuming that most botnet programs are based on random uniform distributions of responses, we present and compare seven indices in this study to detect nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are calculated easily, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
  • Continuing to work on the SASO slides – close to done. Got a lot of adversarial herding FB examples from the House Permanent Select Committee on Intelligence. Need to add them to the slide. Sobering.
  • And this looks like a FANTASTIC ride out of Trento:
  • Fixed the border menu so that it’s a toggle group
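The random-responding abstract above notes that person–total correlation is easy to calculate. Here is a minimal sketch of one common variant, assuming numeric answers in a respondents-by-items array and correlating each respondent's answers with the mean answers of everyone else; `PersonTotal` is a hypothetical name, and the paper's exact procedure and cutoffs may differ. Respondents with low or negative values are the ones worth a closer look.

```java
// Sketch of a person-total correlation screen for random responding.
// Assumption: responses[p][i] is respondent p's numeric answer to item i,
// and the "total" is the mean item profile of all OTHER respondents.
final class PersonTotal {

    // Pearson correlation; returns NaN if either vector has zero variance.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Correlation of each respondent's answers with the others' item means.
    // Needs at least two respondents.
    static double[] personTotal(double[][] responses) {
        int people = responses.length;
        int items = responses[0].length;
        double[] itemSums = new double[items];
        for (double[] row : responses) {
            for (int i = 0; i < items; i++) {
                itemSums[i] += row[i];
            }
        }
        double[] result = new double[people];
        for (int p = 0; p < people; p++) {
            // Leave respondent p out of the reference profile.
            double[] othersMean = new double[items];
            for (int i = 0; i < items; i++) {
                othersMean[i] = (itemSums[i] - responses[p][i]) / (people - 1);
            }
            result[p] = pearson(responses[p], othersMean);
        }
        return result;
    }
}
```

For example, if most respondents answer 1, 2, 3, 4, 5 across five items, a respondent who answers 5, 4, 3, 2, 1 gets a correlation near -1.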

Phil 8.2.18

7:00 – 5:00 ASRC MKT

  • Joshua Stevens (Scholar)
    • At Penn State I researched cartography and geovisual analytics with an emphasis on human-computer interaction, interactive affordances, and big data. My work focused on new forms of map interaction made possible by well constructed visual cues.
  • A Computational Analysis of Cognitive Effort
    • Cognitive effort is a concept of unquestionable utility in understanding human behaviour. However, cognitive effort has been defined in several ways in literature and in everyday life, suffering from a partial understanding. It is common to say “Pay more attention in studying that subject” or “How much effort did you spend in resolving that task?”, but what does it really mean? This contribution tries to clarify the concept of cognitive effort, by introducing its main influencing factors and by presenting a formalism which provides us with a tool for precise discussion. The formalism is implementable as a computational concept and can therefore be embedded in an artificial agent and tested experimentally. Its applicability in the domain of AI is raised and the formalism provides a step towards a proper understanding and definition of human cognitive effort.
  • Efficient Neural Architecture Search with Network Morphism
    • While neural architecture search (NAS) has drawn increasing attention for automatically tuning deep neural networks, existing search algorithms usually suffer from expensive computational cost. Network morphism, which keeps the functionality of a neural network while changing its neural architecture, could be helpful for NAS by enabling a more efficient training during the search. However, network morphism based NAS is still computationally expensive due to the inefficient process of selecting the proper morph operation for existing architectures. As we know, Bayesian optimization has been widely used to optimize functions based on a limited number of observations, motivating us to explore the possibility of making use of Bayesian optimization to accelerate the morph operation selection process. In this paper, we propose a novel framework enabling Bayesian optimization to guide the network morphism for efficient neural architecture search by introducing a neural network kernel and a tree-structured acquisition function optimization algorithm. With Bayesian optimization to select the network morphism operations, the exploration of the search space is more efficient. Moreover, we carefully wrapped our method into an open-source software, namely Auto-Keras for people without rich machine learning background to use. Intensive experiments on real-world datasets have been done to demonstrate the superior performance of the developed framework over the state-of-the-art baseline methods.
  • I think I finished the Dissertation Review slides. Walkthrough tomorrow!

Phil 7.31.18

7:00 – 6:00 ASRC MKT

  • Thinking that I need to bring the opinion dynamics part of the work forward: how heading differs from position, and why that matters
  • Found a nice adversarial herding chart from The Economist (Brexit)
  • Why Do People Share Fake News? A Sociotechnical Model of Media Effects
    • Fact-checking sites reflect fundamental misunderstandings about how information circulates online, what function political information plays in social contexts, and how and why people change their political opinions. Fact-checking is in many ways a response to the rapidly changing norms and practices of journalism, news gathering, and public debate. In other words, fact-checking best resembles a movement for reform within journalism, particularly in a moment when many journalists and members of the public believe that news coverage of the 2016 election contributed to the loss of Hillary Clinton. However, fact-checking (and another frequently-proposed solution, media literacy) is ineffectual in many cases and, in other cases, may cause people to “double-down” on their incorrect beliefs, producing a backlash effect.
  • Epistemology in the Era of Fake News: An Exploration of Information Verification Behaviors among Social Networking Site Users
    • Fake news has recently garnered increased attention across the world. Digital collaboration technologies now enable individuals to share information at unprecedented rates to advance their own ideologies. Much of this sharing occurs via social networking sites (SNSs), whose members may choose to share information without consideration for its authenticity. This research advances our understanding of information verification behaviors among SNS users in the context of fake news. Grounded in literature on the epistemology of testimony and theoretical perspectives on trust, we develop a news verification behavior research model and test six hypotheses with a survey of active SNS users. The empirical results confirm the significance of all proposed hypotheses. Perceptions of news sharers’ network (perceived cognitive homogeneity, social tie variety, and trust), perceptions of news authors (fake news awareness and perceived media credibility), and innate intentions to share all influence information verification behaviors among SNS members. Theoretical implications, as well as implications for SNS users and designers, are presented in the light of these findings.
  • Working on plan diagram – done
  • Organizing PhD slides. I think I’m getting near finished
  • Walked through slides with Aaron. Need to practice the demo. A lot.

Phil 7.27.18

Ted Underwood

  • my research is as much about information science as literary criticism. I’m especially interested in applying machine learning to large digital collections
  • Git repo with code for upcoming book: Distant Horizons: Digital Evidence and Literary Change
  • Do topic models warp time?
    • The key observation I wanted to share is just that topic models produce a kind of curved space when applied to long timelines; if you’re measuring distances between individual topic distributions, it may not be safe to assume that your yardstick means the same thing at every point in time. This is not a reason for despair: there are lots of good ways to address the distortion. The mathematics of cosine distance tend to work better if you average the documents first, and then measure the cosine between the averages (or “centroids”).
  • The Historical Significance of Textual Distances
    • Measuring similarity is a basic task in information retrieval, and now often a building-block for more complex arguments about cultural change. But do measures of textual similarity and distance really correspond to evidence about cultural proximity and differentiation? To explore that question empirically, this paper compares textual and social measures of the similarities between genres of English-language fiction. Existing measures of textual similarity (cosine similarity on tf-idf vectors or topic vectors) are also compared to new strategies that use supervised learning to anchor textual measurement in a social context.
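Underwood's suggestion in "Do topic models warp time?" is to average the documents first and then measure the cosine between the averages (centroids). A minimal sketch of the difference, assuming topic or tf-idf vectors as plain double[] arrays; `CentroidCosine` and its method names are hypothetical.

```java
// Sketch: centroid cosine vs. mean pairwise cosine between two groups of
// document vectors (topic distributions or tf-idf weights).
final class CentroidCosine {

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / Math.sqrt(na * nb);
    }

    // Element-wise mean of a group of equal-length document vectors.
    static double[] centroid(double[][] docs) {
        double[] c = new double[docs[0].length];
        for (double[] d : docs) {
            for (int i = 0; i < c.length; i++) {
                c[i] += d[i] / docs.length;
            }
        }
        return c;
    }

    // Average first, then measure.
    static double centroidSimilarity(double[][] groupA, double[][] groupB) {
        return cosine(centroid(groupA), centroid(groupB));
    }

    // The alternative: mean of all document-to-document cosines.
    static double meanPairwiseSimilarity(double[][] groupA, double[][] groupB) {
        double sum = 0.0;
        for (double[] a : groupA) {
            for (double[] b : groupB) {
                sum += cosine(a, b);
            }
        }
        return sum / (groupA.length * groupB.length);
    }
}
```

For two identical groups that each hold one document per topic ({1, 0} and {0, 1}), `centroidSimilarity` returns 1.0 while `meanPairwiseSimilarity` returns 0.5, one illustration of why the order of averaging matters.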

7:00 – 8:00 ASRC MKT

  • Continued on slides. I think I have the basics. Need to start looking for pictures
  • Sent response to the SASO folks about who’s presenting what.

9:00 – ASRC IRAD

Phil 7.23.18

7:00 – ASRC MKT

  • Starting on the SASO slides. Found my diversity injection slide story:
    • Max Hawkins
      • (From NPR’s Invisibilia) “I just started thinking about these loops that we get into,” he says. “And about how the structure of your life … completely determines what happens in it.” Max’s once beautiful routine suddenly seemed unfulfilling. He felt like he was growing closer to people in his own bubble and becoming isolated from those outside of it. “There was something … that just made me feel trapped,” he says. “Like I was reading a story that I’d read before or I was playing out someone else’s script.” As any computer developer would do, Max turned to technology to craft his way out — a series of randomization applications.
    • Reading Review: Totalitarianism: The Revised Standard Version
      • …they have chosen to identify totalitarianism in terms of a set of six interrelated traits or characteristics: Friedrich’s oft-referred-to “totalitarian syndrome” (9-10). The syndrome includes an official ideology (orientation), a single party typically led by one man (dimension reduction), a terroristic police (herding), a communications monopoly (social influence horizon), a weapons monopoly (??) and a centrally directed economy (dimension reduction)
  • Continued to spin up on LSTM effort. Got my dev environment COMPLETELY up to date. Continued with Deep learning & Keras

3:00 – 5:00 Fika & meeting with Wayne

  • Worked on the slides for PhD status. I realize that this is actually a good time to have demos with conclusions.
  • Talked about options if IRAD falls through
  • Need to think about what are the best ways for the work to have impact

Phil 7.18.18


“There was no collusion” … “Anyone involved in that meddling to justice.”

Premises for Data Science Magical Realism

  • What follows are some premises for data science magical realism stories based (very, very loosely) on experiences I’ve had or heard about — premises, that is, for stories about impossible, absurd, magical things happening to data scientists in ordinary data science situations. Enjoy!
  • More from David Masad

Program Synthesis in 2017-18

  • A high-level overview of the recent ideas and representative papers in program synthesis as of mid-2018.
  • Alex (Oleksandr) Polozov, a researcher in the Deep Procedural Intelligence group at Microsoft Research AI, Redmond. I work on neural program synthesis from input-output examples and natural language, intersections of machine learning and software engineering, and neuro-symbolic architectures. I am particularly interested in combining neural and symbolic techniques to tackle the next generation of AI problems, including program synthesis, planning, and reasoning.

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction | SciPy 2018 | (video) (paper)

  • UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP as described has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
  • This could be nice for building maps

7:00 – 5:00 ASRC MKT

  • Progress on getting my keys back!
  • Got everyone’s response on the Doodle, but only 4 of the 5 line up…
  • Finish first pass through PhD review slides
  • Start SASO slides and poster?
  • Continue with exporting terms from the sim and importing them into Python. One of the things that will matter is the tagging of the data with the seed terms from the sim as well as the cell name so that reconstructions can be compared for accuracy.
  • Added the cell location to each <sampleData> so that there can be some kind of tagging/ground truth about the maps we’re inferring.
  • Working on iterating through the etree hierarchy. I can now read in the file, parse it and get elements that I’m looking for.
  • Tomorrow will be pulling the seed words out of the code in an ordered list. Generated sentences will need to be timestamped so that conversations can be reconstructed. That being said, it could be interesting to take seed words out of a generated sentence and add them to the embedding seed words. Something to think about.
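The reader above is being written with Python's etree; as a rough analogue in Java (presumably the sim's language, going by the alignment snippet in the 8.19.18 entry), here is a sketch using the JDK's built-in DOM parser. The `cell` and `time` attribute names, the sample layout, and `SampleReader` itself are hypothetical stand-ins for whatever the sim actually exports. It pulls each <sampleData> element's cell tag, timestamp, and text, then sorts by time so the generated sentences can be replayed as a conversation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: collect <sampleData> elements tagged with a cell location and a
// timestamp, ordered by time. The "cell" and "time" attribute names are
// hypothetical; the sim's real export format may differ.
final class SampleReader {

    // One generated sentence plus the ground-truth tags needed to check a reconstruction.
    static final class Sample {
        final String cell;
        final long time;
        final String text;

        Sample(String cell, long time, String text) {
            this.cell = cell;
            this.time = time;
            this.text = text;
        }
    }

    static List<Sample> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("sampleData");
            List<Sample> samples = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                samples.add(new Sample(
                        e.getAttribute("cell"),
                        Long.parseLong(e.getAttribute("time")),
                        e.getTextContent().trim()));
            }
            // Sort by timestamp so the sentences read back in conversational order.
            samples.sort((a, b) -> Long.compare(a.time, b.time));
            return samples;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse sample XML", ex);
        }
    }
}
```

Keeping the cell tag on every sample is what lets an inferred map be checked against ground truth later.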

Phil 7.17.18

I wrote up some thoughts about Trump’s press conference with Putin.

7:00 – 4:30 ASRC MKT

  • Still can’t connect to the Service center (Betriebsdienst Zentrum) at Zurich U. Tried pinging the conference organizer, who appears to be based on the campus – done. And some progress!
  • Travel report for SASO – done
  • Hotel in Trento – wait till tomorrow.
  • Ping Aaron M. about Doodle – Done
  • Set up meeting with Don – done
  • Start on slides – started

Phil 7.16.18

Vacation is over. Here are some pix

7:00 – 3:00 ASRC MKT

  • No problem logging into timesheet or email from the US. Odd.
  • Expense Report. Bring Receipts!
  • Call Zurich about keys – called. No one there today, call tomorrow before 9:00 +41 44 634 03 09
  • Get hotel in Trento

3:00 – 6:00 Fika, then meeting with Wayne

  • Schedule a meeting with Don to discuss LSTM agent text, and composer/choreographer for Dance Your PhD
  • Put together a proposal for the mid-PhD that includes
    • Current work
    • LSTM next step
    • The Wayne Problem
      • Keep the committee as is (defend summer of 2019)
      • Adjust committee (who becomes co-chair?)
    • What to do about JuryRoom
      • Make it post-PhD work
      • Build an instantiation of the theory, but don’t do anything with it (unpublishable, but next steps would be)
      • Build a low-fi version of the website for lab testing
      • Build a 1,000 – 10,000 user version (MySQL, PHP, Angular)
      • Build a 10,000 – 1,000,000 user version
      • Build a fully scaled version

Phil 7.12.18

Stampede thinking:

  • Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning
    • Gordon Pennycook
    • David Rand
    • Why do people believe blatantly inaccurate news headlines (“fake news”)? Do we use our reasoning abilities to convince ourselves that statements that align with our ideology are true, or does reasoning allow us to effectively differentiate fake from real regardless of political ideology? Here we test these competing accounts in two studies (total N = 3446 Mechanical Turk workers) by using the Cognitive Reflection Test (CRT) as a measure of the propensity to engage in analytical reasoning. We find that CRT performance is negatively correlated with the perceived accuracy of fake news, and positively correlated with the ability to discern fake news from real news – even for headlines that align with individuals’ political ideology. Moreover, overall discernment was actually better for ideologically aligned headlines than for misaligned headlines. Finally, a headline-level analysis finds that CRT is negatively correlated with perceived accuracy of relatively implausible (primarily fake) headlines, and positively correlated with perceived accuracy of relatively plausible (primarily real) headlines. In contrast, the correlation between CRT and perceived accuracy is unrelated to how closely the headline aligns with the participant’s ideology. Thus, we conclude that analytic thinking is used to assess the plausibility of headlines, regardless of whether the stories are consistent or inconsistent with one’s political ideology. Our findings therefore suggest that susceptibility to fake news is driven more by lazy thinking than it is by partisan bias per se – a finding that opens potential avenues for fighting fake news.

From Alessandro Bozzon (Scholar):

  • I am Assistant Professor with the Web Information Systems group, at the Delft University of Technology. I am Research Fellow at the AMS Amsterdam Institute for Advanced Metropolitan Solutions, and a Faculty Fellow with the IBM Benelux Center of Advanced Studies.

    My research lies at the intersection of crowdsourcing, user modeling, and web information retrieval. I study and build novel Social Data science methods and tools that combine the cognitive and reasoning abilities of individuals and crowds, with the computational powers of machines, and the value of big amounts of heterogeneous data.

    I am currently active in three investigation lines related to Social Data Science: Intelligent Cities (SocialGlass); Crowdsourced Knowledge Creation in Online Social Communities (SEALINCMedia COMMIT/StackOverflow); and Enterprise Crowdsourcing (with IBM Benelux CAS).

  • Modeling CrowdSourcing Scenarios in Socially-Enabled Human Computation Applications
    • User models have been defined since the 1980s, mainly for the purpose of building context-based, user-adaptive applications. However, the advent of social networked media, serious games, and crowdsourcing/human computation platforms calls for a more pervasive notion of user model, capable of representing the multiple facets of social users and performers, including their social ties, interests, capabilities, activity history, and topical affinities. In this paper, we define a comprehensive model able to cater for all the aspects relevant for applications involving social networks and human computation; we capitalize on existing social user models and content description models, enhancing them with novel models for human computation and gaming activities representation. Finally, we report on our experiences in adopting the proposed model in the design and implementation of three socially enabled human computation platforms.
  • Sparrows and Owls: Characterisation of Expert Behaviour in StackOverflow
    • Question Answering platforms are becoming an important repository of crowd-generated knowledge. In these systems a relatively small subset of users is responsible for the majority of the contributions, and ultimately, for the success of the Q/A system itself. However, due to built-in incentivization mechanisms, standard expert identification methods often misclassify very active users for knowledgable ones, and misjudge activeness for expertise. This paper contributes a novel metric for expert identification, which provides a better characterisation of users’ expertise by focusing on the quality of their contributions. We identify two classes of relevant users, namely sparrows and owls, and we describe several behavioural properties in the context of the StackOverflow Q/A system. Our results contribute new insights to the study of expert behaviour in Q/A platforms, that are relevant to a variety of contexts and applications.