Phil 3.23.16

7:00 – 4:00 VTX

Continuing The Law of Group Polarization. Slow going. Mostly because there is so much good stuff.
- Overall, I’m arguing that viewing Group Polarization through the lens of Connectivism, we can see how networked communities are often driven into bubbles and that property can be used to evaluate the trustworthiness of an information source. This has implications for design at different levels of abstraction.At the UI level, it implies that giving a user more interactive control over the makeup of their news feed can inform them about the range of diversity in views about a particular topic and where their feed falls on that spectrum. Because this implies the presence of a larger group, it it is possible to provide the user with the means (through direct manipulation) to interactively adjust the makeup of their news feeds and expose them to more trustworthy sourcesAt the document level, it imples that a mix of lexical and link analysis should be sufficient to allow for indexing a document on a trustworthiness scale.At the network level, it implies that the relationships of documents within a network should be sufficient to place documents on a trustworthiness scale.
- Page 182 – And when one or more people in a group know the right answer to a factual question, the group is likely to shift in the direction of accuracy.
  - This is the effect of the Star Pattern. So how does someone find the right answer?
- Looking around for automated ways of doing Delphi Method
  - Consensus measurement in Delphi studies Review and implications for future quality assurance
  - Constructing Semantically Scalable Cognitive Spaces – probably useful for methods section. Add in link analysis and it’s certainly worth looking at
- Page 184: Group polarization has particular implications for insulated “outgroups” and (in the extreme case) for the treatment of conspiracies. Recall that polarization increases when group members identify themselves along some salient dimension, and especially when the group is able to define itself by contrast to another group. Outgroups are in this position-of self-contrast to others-by definition. Excluded by choice or coercion from discussion with others, such groups may become polarized in quite extreme directions, often in part because of group polarization. It is for this reason that outgroup members can sometimes be led, or lead themselves, to violent acts
- Stopped at pg 186 – III. DELIBERATIVE TROUBLE.
Looking at IBM Bluemix briefly in case we have to go down that route
- Registered.
- Chrome, or at least the way I set up Chrome and bluemix do not get along. trying Firefox. Still not great, but better.
- Since it looks like we’re not going to do wacky mash-ups, back to work on the rating app.
Hit the MySql max_packet limit. Changed to 4M. Other follow-on changes:

## of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 64M
innodb_additional_mem_pool_size = 8M
## Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 20M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

That seems to fixed things. Will have to remember for deployment though.
PageReader.useJsoupPattern IOException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
- http://stackoverflow.com/questions/7744075/how-to-connect-via-https-using-jsoup
- http://stackoverflow.com/questions/6755180/java-ssl-connect-add-server-cert-to-keystore-programatically
Just found http://boardsanctions.com/. I think that this is a site we should be crawling regularly…
Sprint review – note that we need to add version control to our blacklists.
Got inbound links working. The human-readable version is

link:http://dotearth.blogs.nytimes.com
In the search bar, this becomes

https://www.google.com/search?safe=off&q=link%3Ahttp%3A%2F%2Fdotearth.blogs.nytimes.com
And to make it into a CSE query that produces JSON

https://www.googleapis.com/customsearch/v1?key=AIzaSyAHnwXn3E3yqDeN0GVD9FeXpycRjd61s6g&cx=006834724223295726872:lxad5jdnfkq&q=link%3Ahttp%3A%2F%2Fdotearth.blogs.nytimes.com
Next, put that in the rating app

Phil 3.22.16

7:00 – 7:30

I think I want to install this??? https://github.com/dthree/cash
Still thinking about social trust and system trust. Today, Brussels was attacked by ISIS or ISIS sympathisers. An official when interviewed said that Belgium had been ‘prepared’ and was ready. No one was surprised that one group of people would try to kill another group of people. In other news, the iPhone from another set of killers was unflaggingly resisting attempts to unlock it. In many ways, every day (ironically because of the news) we are informed how horrible and untrustworthy people can be. And at the same time, every day, our machines generally do what they are supposed to do, and when looked at over time, get better at it. Is it any wonder that we have high system trust and low social trust (or high cynicism?).
This isn’t really new. Music can be pure. Musicians can be awful.
Continuing The Law of Group Polarization.
- Page 181: Thus when the context emphasizes each person’s membership in the social group engaging in deliberation, polarization increases. This finding is in line with more general evidence that social ties among deliberating group members tend to suppress dissent and in that way to lead to inferior decisions.
  - So a website with a strong point of view (Breitbart or Moveon or PETA for example) should have less variance among commenters, while more balanced should have more variance? Data may be here: http://www.journalism.org/2014/10/21/political-polarization-media-habits/. I would think that these could be compared against edit histories on Wikipedia for a more Star-like pattern?
- Persuasive Arguments Theory (PAT)?
- Interaction with others increases decision confidence but not decision quality: evidence against information collection views of interactive decision making.
  - So in this case, the paper was scanned and protected, so I couldn’t do OCR on it. The workaround was to export as jpg, then open the first jpg in Acrobat DC, select Tools->organize pages then Inset->from file, shift-click all the pages, select ‘insert after’ and read them in. Once that’s done go to ‘Enhance scans’ and run OCR on the file.
  - Anyway, the paper looks interesting, with quantitative support. I wonder why all this research seems to be focussed in the 1990s through early 2000s? The Wikipedia page on Group Polarization has a wider date range.
Working on the rating app. Worried that jsoup doesn’t seem to be pulling down pages that well
- Got a 403 on https://stackoverflow.com/questions/10716828/joptionpane-showconfirmdialog using URL.openStream, but it works on Google.
- Going to try a more web-scapey pattern. Checking out Jaunt.
Changing the selection lists
Adding a check to see what ratings have changed as a user check – Done
Need to start on the backlinks.
Meeting with Aaron about next steps based on the

Phil 3.21.16

7:30 – 4:30 VTX

Class today
- Two things – First, I wonder if we as researchers need to use the GSA standards for storing PII:
  - Encryption. Encrypt, using only NIST certified cryptographic modules, all data on mobile computers/devices carrying agency data unless the data is determined not to be sensitive, in writing, by your Deputy Secretary25 or a senior-level individual he/she may designate in writing;
  - Control Remote Access. Allow remote access only with two-factor authentication where one of the factors is provided by a device separate from the computer gaining access;
  - Time-Out Function. Use a “time-out” function for remote access and mobile devices requiring user re-authentication after thirty minutes of inactivity;
  - Log and Verify. Log all computer-readable data extracts from databases holding sensitive information and verify each extract, including whether sensitive data has been erased within 90 days or its use is still required; and
  - Ensure Understanding of Responsibilities. Ensure all individuals with authorized access to personally identifiable information and their supervisors sign at least annually a document clearly describing their responsibilities.
- Second, basically every security measure we take in a closed network provides a value judgement to the owner of the network. But our high system trust prevents us from seeing that when we untag a picture of us doing something embarrasing, we’re essentially saying to Facebook ‘this is a guilty pleasure‘.
Taxes this evening
In Emergencies, Should You Trust a Robot?
Starting The Law of Group Polarization. And in a semi-related thought, I wonder if flocking behavior can be used to describe this kind of behavior along dimensions of belief???
- Cass R. Sunstein
- Wacky. The text was unrecognizable so the quotation manager wouldn’t work. Wound up exporting the PDF to jpg, then using the ‘combine files’ tool to import all the pages, combining them into one document again then running OCR on that. And this was the official file from the Journal of Political Philosophy, so go figure.
Did some shepherding of the Crawl configuration. Gregg was sending 4 CSEs.
Finished up the CSEkiller. Wrote up documentation and added it to the CommonComponents.
Back to getting the rating app working.
Changing Provider to PersonOfInterest
Need to add ‘Personal’, ‘Educational’ and ‘Other’ to sources

Phil 3.18.16

7:30 – 4:00 VTX

Continuing Presenting Diverse Political Opinions: How and How Much – Finished. Wow.
- Some subjects wrote that they specifically did not want a list of solely supportive items and that they want opinion aggregators to represent a fuller spectrum of items, even if that includes challenge.
  - So here I’m wondering if interactivity in presenting the contents of the stories could be used as a proxy for these kinds of answers. Consistently setting values one way could mean more bubbly, while more change could imply star.
- In a plot of the percent agreeable items and satisfaction (Figure 5, top), the slope of the fit lines for the two list lengths follow each other quite closely, suggesting that count does not matter. When we plot the number of agreeable items (Figure 5, bottom), we can see a clear divergence. Furthermore, 2 agreeable items out of a total of 8 is superior to 2 agreeable items out of a total of 16(t(7.373) = 3.3471, p<0.05). Clearly, the presence of challenging items, not just the count of agreeable items,drives satisfaction. We conclude that the remaining subjects as a group are challenge-averse, though a few individuals may be support-seeking
News aggregator API list: http://www.programmableweb.com/category/News%20Services/apis?category=20250. I’m wondering if a study of slider ranking hooked up to a news aggregator feed might be useful.
Still working on the test harness to exercise the GoogleCSE.
Added command line args
Fixed stupid threading errors.
Checked in.

Phil 3.17.16

7:30 – 6:00 VTX

Ok, that was a great conference. Might be fun to go to Oslo next year.
Wrote up reflections on the HCC Comps while flying home.
Continuing Presenting Diverse Political Opinions: How and How Much
- Here’s the citation on Mechanical Turk study design: Crowdsourcing User Studies With Mechanical Turk
- Got through the Methods section. All kinds of good study design and rationale. Starting with Results tomorrow.
Need to submit expense report – done
Working on a test harness to exercise the GoogleCSE. Got the thread that calls the search built and running, and got most of the manager built. Heath wants it as a sanity check to evaluate Google local and in the deployment.

Phil 3.16.16

7:30 – 5:00 VTX

Write up some thoughts on Comps. What synthesis is, how study design doesn’t stop with the design, matrix of themes and papers, nuggets more than notes, etc.
Andy Field has a new stats book out. It looks nice: Discovering Statistics – The Reality Enigma
Starting on Presenting Diverse Political Opinions: How and How Much
- Sean A. Munson
- Paul Resnick – has worked on some very interesting things
  - BALANCE: Enhancing Diversity in News and Opinion Aggregators
  - RumorLens
- New aggregator site for political news Memeorandum
- New papers to add to the list….
  - Echo chambers online?: Politically motivated selective exposure among Internet news users
  - The Law of Group Polarization
- The Methods section is particularly good. I should probably use something similar.
Conference
Keynote – Pia Borlund Reflections on interactive IR evaluation issues.
- What data is required to answer the research question
- Which methods can provide the required data
- Types of users to test
- Data collection to use
- Number of participants
- Tasks to use in testing
- Time constraints
- Tailoring the work task situations according to requirements and inclusion of genuine information need that they bring with them.
- Rotation and counterbalancing of search tasks and systems (remove confounding variables) e.g. neutralize
  - System knowledge
  - topical knowledge
  - fatigue
- Protocols
- Tutorials,
- Pilot testing
- Test Design
  - Define the purpose of the study
  - formulate the questions
  - Design the study accordingly
    - Pre search questionnaire
    - Simulated work task
    - Transaction logging
    - Post-search interview
    - Observation
- A protocol (can be 50 steps including things like turning on the test computer, cleaning the keyboard, etc)
  - Is a step-by step description of the overall study procedure
  - Is a checklist to the investigator
  - Ensures consistency in the conduction of the study
  - Ensures that all participants get the same information
  - and….
  - Before testing section
  - Testing section
  - After testing section
- Tutorials
  - Introduction
  - Tasks expectation
  - Have the user explain what they are supposed to do
- Pilot testing
  - What questions the participants ask
  - Does the protocol work? Missing steps, wrong order
  - How long the study takes
  - Technical problems
  - that the required data is collected
  - To practice the control of pleasing effect and human nature
  - Continue doing the pilot test until the the protocol is sufficient for the study. This means being consistent in explaining the procedure. Maybe use a video instead?
  - Don’t do the study too close to the pilot, since lessons learned need to be incorporated.
- Everyday life information seeking: Approaching information seeking in the context of “way of life”
- Discussion
  - The amount of dedication in the search makes a difference. How does this manifest in the data?
  - Are we studying Great White Sharks in aquariums?
  - Does there need to be a research-based ISP/telecom? Spyware? Securityware? Studyware? How does the user get to review/edit set anonymization levels?
A Usefulness-based Approach for Measuring the Local and Global Effect of IIR Services
- Good paper on logging WRT new IIR tools to determine usefulness.
Assessing Learning Outcomes in Web Search: A Comparison of Tasks and Query Strategies
- The search system was hosted on Amazon EC2 and used an architecture derived from uFindIt [1] that logs user events such as queries and clicks to a MySQL database. The baseline ranked document lists for the single and multiple query conditions were provided by the Google Custom Search API.
- Intrinsically diverse presentation – rolls subtopics into ’10 blue link’ list.
- Coding by looking at the presence or absence of a predetermined list of items, this allows for sorting (n/N). Questions require examples of different levels of learning.
- Two groups – lots of clicking and exploring vs. fewer results longer reading.
- Need to do follow up studies to see the difference between learning and robust learning, where the knowledge is retained.
- There is something here. Have to ponder the kind if inferences that can be made.
A Comparison of Primary and Secondary Relevance Judgments for Real-Life Topics
- Shows how to make a corpus of variable relevance with respect to an information need. Very nice. Could be used for the person task but inverted? In other words we know what query returned a piece of text, but we give the reviewer the goal (bad doctor) and they judge relevance on those grounds, which lets us determine the quality of the query.
- Relevance – binary
- Confidence – 1 – 7
- Time, etc
- Experts have scanning behavior and can look for synonyms. Secondary assessors had to read in depth
  - Closed topics – what date did x happen? More agreement
  - Open topics – what caused the 2008 recession? Less agreement
- There is an altruistic(?) component in looking for “information that might be useful”.
Interactive Topic Modeling for aiding Qualitative Content Analysis
- Peter Bruza <- check out more. Semantic spaces and such
- Identification and interpretation of themes within text
- Types of content analysis
  - Summative – keywords
  - Conventional (inductive) Observation – NMF, LDA
  - Directed (Deductive)
  - Logic-LDA – Steerable using rules, and open source NMF – Linear Algebra underpinnings. LDA is probabilistic.
  - Semantic Validity – important for analyst confidence.Also known as topic coherence
  - Email for github repo.- done
- The Information Network: Exploiting Causal Dependencies in Online Information Seeking
  - Network of informational elements
  - Granger causality modeling? (prediction of causality model) not used for prediction in information before
  - They use wikipedia page views as a marker topic newsworthiness!!!
  - And that means that you can look at one page becoming popular that it’s popularity in time exists with in the context described by a Granger Causality
  - Auto regressive model is good for time series modeling
  - jaspreet singh singh@l3s.de <– Call about Wikipedia!!!

Phil 3.15.16

7:00 -4:00

Algorithm of Discovery – New Ideas From Computers
- Jantzen, B. “Discovery without a ‘logic’ would be a miracle,” Synthese (forthcoming). (preprint)
- From Jantzen’s current work:
  - My current work is focused on a suite of interrelated questions about natural kinds and the logic of discovery. I am attempting to test ideas about the logic of discovery by building algorithms that carry out automated scientific discovery. In particular, I’m interested in algorithms capable of generating novel ontologies or, put less grandly, novel sets of variables that may cross-cut those provided as input. Working drafts are available for some components and derivatives of this work (I’ll make more available soon). My paper arguing, contrary to received wisdom, that there must exist a logic of discovery can be found here. With regard to natural kinds, a paper in which I apply what I call the ‘dynamical kinds’ approach to the problem of levels of selection can be found here. A talk I gave on the same material at the International Conference on Evolutionary Patterns in Lisbon can be found here.
The Google UX Van came by yesterday and they asked what would I like from Google and I realized that an Ad-Free subscription would be worth some price. Interestingly, that’s kind of available through Custom Search. It would probably run 100 – 250/year doing it this way. And I’ve already got my keys, and you can point the omnibox to anything. Have to think about this…
Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.
- Presenting diverse political opinions: how and how much. From the abstract: We find individual differences: some people are diversity-seeking while others are challenge-averse. That sounds a lot like Star-and-Bubble to me!
- Finished! Long but good. Something like 65 quotes.
Starting Presenting diverse political opinions: how and how much.
I’m beginning to wonder if there is a way to have an interface that drives users to the confirming or exploratory camps. Basically see it that can be used to present cleaner data. (see highlighted below)

Conference

Knowledge Graphs versus Hierarchies: An Analysis of User Behaviours and Perspectives in Information Seeking
- Lookup, learn and investigate, last two are exploratory
- Knowledge Graph built from semantic relationships
- Watch behaviors based on these two representation(hierarchy vs graph)
- Context on the left, display on the right.
- Users interacted with graph structure more, and hierarchies sent users to the underlying documents.
Exploring the Use of Query Auto Completion: Search Behavior and Query Entry Proﬁles
- Does pulling from the QAC list imply looking for confirmation?
- Is this the kind of key that I was looking for above?
- As an aside, there needs to be a ‘benchmark query + SERP’ that allows for monitoring google for changes.Watching the watchers
  - What would the queries be?
  - How are the results evaluated?
  - What about topicality WRT queries? Should some be topical and others ‘classic’
  - SERP vs QAC storage and evaluation?
  - Other search engines to watch (Google, Bing, DuckDuckGo…)
What Affects Word Changes in Query Reformulation During a Task-based Search Session?
- Query vs. SERP using text analytics
- Reusing a word in a search is almost always a return from a subtopic to a main theme.
  - So if the subject of the query is a more specific version of the subject in the previous search, then we can get some interesting insights into the way that they are looking at the problem they are trying to solve.
- Also via one of the authors Cathy Smith: Helen Nissenbaum http://www.nyu.edu/projects/nissenbaum. Legal definitions of trust, etc.
Playing Your Cards Right: The Effect of Entity Cards on Search Behaviour and Workload
- Non-linear results page???
- HIT to find information on Axle Rose. Turk & CrowdFlower
- Bing API
- Marked Relevant or not
- Used the Wikipedia disambiguation pages to find ambiguous terms. Nice…
- Arbitrary insertion of non-relevant topics that were lexically similar.
- Attention paying questions to mark credible responses.
- MANOVA, ANOVA, with Bonferroni
- Query reformulation happens quicker when the card is off-topic
- No significance for card coherence. Habituation?
- Diverse cards have photos, which are credibility cues. If the photos are right, that’s reinforcing. If the photos are wrong, then that’s a very visible warning that the results aren’t credible. Which also makes me wonder how photos affect the psychology of the users. And system trust.
Impacts of Time Constraints and System Delays on User Experience
- Leif Azzopardi – good work on economics of search from an IR perspective.
- Multilevel Modelling?
- AQUAINT corpus
- Delay injection and task completion limits.
- This also relates to GPS navigation as well. Possibly interesting paper. And speaking of which:
  Come drive with me: an ethnographic study of driver-passenger pairs to inform future in-car assistance

Phil 3.14.15(+1)

7:00 – 6:00 VTX

Happy PI day!
At CHIIR 16 – User Modelling Tutorial yesterday, presentations today.
Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.
- Found Beyond the filter bubble: interactive effects of perceived threat and topic involvement on selective exposure to information. It shows that confirmation bias will affect users when presented with differing accounts on the same page. This seems to give strength to the idea of trustworthy/distrustworthy inference networks that can trace to authoritative material in a positive or negative way.
- Whooly.net – MSR – A mobile web app that makes latent, hyperlocal neighborhood communities more visible, to help neighbors connect. This project leverages intelligent filters and event detection algorithms to help users find relevant, spiking topics about what is happening here and now.
- Finding and assessing social media information sources in the context of journalism From the abstract: [It is a]…challenge to finding interesting and trustworthy sources in the din of the stream. In this paper we develop and investigate new methods for filtering and assessing the verity of sources found through social media by journalists. We take a human centered design approach to developing a system, SRSR (“Seriously Rapid Source Review”), informed by journalistic practices and knowledge of information production in events.
  - They build classifiers to discriminate sources and value!
- Reflect – Reflect makes a simple change to comment boards. Next to every comment, Reflect invites readers to succinctly restate the commenter’s points. These restatements are shown in a bulleted list.
- Microsoft Academic Knowledge API looks useful for tying back to experts maybe?
Keynote
- Mark Ackerman
- Information reuse and context
- Computer supported cooperative work –
- You have build things before you understand the world.
- Understand the world before you can build successful things
- CSCW is now social computing???
- CSCW has a pint of view
  - Use of collaborative contributions – i.e Google Docs.
  - Social navigation – ant trails This is diferent from wayfinding.
  - Iterative refinement
  - Reward systems dictate how people interact with social systems
  - Finding the background of the information for trust issues. With Lutters – aircraft engineers would throw out calculations of people they don’t know
  - The ‘cold start problem’ for recommender systems
  - Expert locators good programmers hang out in stackoverflow
  - Ifrastructuring == habituation?
  - The postmodern turn
  - FIT – How do you measure the information distance?
  - People placing themselves on their connectivist structure?? Can we know that this is true?
  - Activity traces
    - Mental Illness severity in online Pro-Eating disorder – CSCW 16
    - The livhoods project ICSWM 2012
    - Generalizing activity instantiations based on context
Papers
- Active and passive Utility of Search Interface Features
  - Hugo Huudeman
  - Interactive Search User INterfaces proven in micro-studies.
  - SearchAssist – it would be interesting to use a ranker as the backend
- The Forgotten Needle in My Collections: Task-Aware Ranking of Documents in Semantic Information Space
  - Ranking of items that are being used in a task recently.
  - PIMO Personal Information Model – Semantic layer over workspace
  - Users create the annotations
  - Look at task aware ranking. Connections, Beagle+++, SURPA, T-Fresh
  - Uses machine learning to determine the weights.
  - http://pimo.opendfki.de/
- Behaviour Mining for Automatic Task-Keeping and Visualisations for Task-Reﬁnding
  - Using the interaction between documents as a way to connect tasks as a network Didn’t work at first so they added types of interactions and weighted those
  - Labeled weak nodes rather than discarding- Polynomial decay function for flags???
  - Are FF plugins easier to write?
- Collaborative Information Retrieval
  - Rerank based on search context (chat, click through, previous queries from other team members)
  - We apply the following procedure to re-rank relevant documents for each to-be-supported query. For each candidate document, we estimate its document language model using Dirchilet smoothing [31], where we set the smoothing parameter µ = 100. The similarity between each candidate document and the contextual model is measured by the KL divergence between their estimated language models [30]. The matching between a candidate document and the query is determined by Google rank position of the given candidate document. This is because our experimental system uses Google results as the default. Instead of using linear interpolation proposed by Shen et al. [23], we employ LambaMART in RankLib2 to build a pairwise learning-to-rank approach for combining diﬀerent features.
- (The Lack of) Privacy Concerns with Sharing Web Activity at Work and the Implications for Collaborative Search.
  - Loosely coupled collaboration. Tools must have minimal effort to use.
  - Coupling – the amount of work that people can do individualy before they have to interact explicitly. It’s opportunistic.
  - Webwear???
- An AID for Avoiding Inadvertent Disclosure: Supporting Interactive Review for Privilege in E-Discovery
  - Automated annotators – this is exactly what we need for flags. Don’t see exactly how the annotator is built other than ML.
    - Used Enron Corpus
    - Entity Linking – people and companies
    - Type
    - Propensity that the person is involved with the confidential communication
    - Unigram language models
      - Privileged communication
      - Non-privileged communication
      - Compared the entropy for the top-n words and used that.
  - Relevance Review
  - Privilege review – by senior lawyers because of the error cost.
  - Lawyers have no System Trust? Why? What do they trust?
    - They trust stories with provenance, not a number generated by machine learning system
  - Increase in recall, decrease in precision? Huh.
  - People highlighting was useful,
  - Term highlighting was useful.
  - Preference for sentence and paragraphs.
Posters
- DP-means clustering?

Phil 3.13.16

9:00 – 5:00

Data journalism is IR with better affordances?

Still thinking about getting lost. In low information environments, credibility cues and entertainment value can lead to habituation. Habituation can help maintain this process beyond what someone who’s unfamiliar with the situation might draw the line at. Which means that the sense betrayal is higher?

ACM CHIIR Conference Day 1 – Tutorials User modelling on information retrieval

Quantifying performance
Practically significant?
Statistical significance?
User-centered evaluation
- Measure Users in the wild
  - A/B Testing, etc.
- User in the lab
User performance prediction
- Record user
- Create model
- Calibrate
- validate
- Use model to predict performance
Cranfield Paradigm – Cyril Cleverdon
- TREC – paid assessors
- User satisfaction for retrieval evaluation metrics
- Discounted Cumulative Gain (probability of document visited WRT rank) can also be normalized WRT an optimal return
- Expected Reciprocal Rank – pertinence calculation??? Based on the idea that there is one perfect document that is who’s utility is based on the position of the document
- Average precision <– search for this
Diversity, novelty (novelty), tractability
Underspecified vs. ambiguous queries
Specifications have aspects
Ambiguities have interpretations
Inferring query intent from reformulations and clicks
Ian Soboroff – Mr. TREC
Randomization – check animation in slides
Bootstrap –
Sign test – just for one side or the other of a value. A binomial distribution

Afternoon session

Evaluating whole systems
Metrics Derived from Query Logs
- Use the logs o understand user behavior, then…
- Learn the parameter of the user model from the query logs
Incorporating UI
User Variance
Time
- Costs in time spent searching
- Benefits in time well spent
- Initial Assessment – quickly scan the document first. So what if we could make that more amenable to measuring that effort.
  - Findability
  - Readablity
  - Understandability
  - If the judge has to use tools to find the relevant part of the document and mark it, those biometrics might be usable…
- Utility Extraction
- A real user goes through both stages, an Assessor only does step 1, Initial Assessment. But learning can be a third step? It’s certainly the step that would take the most time and require interdocument relationships
- What about learning how to disambigulating your query?
- Conceptual leaps???? Is that an information distance issue???
Session
- Time spent on the last clicked document.
- A session is just based on time (e.g. 30 minutes). TREC is leaving session and going to Task-Based
Task
- What is a Gold-Standard task??
- Which metrics to use??

Phil 3.11.16

8:00 – VTX

Created new versions of the Friday crawl scheduler, one for GOV, one for ORG.
The gap between inaccurate viral news stories and the truth is 13 hours, based on this paper: Hoaxy – A Platform for Tracking Online Misinformation
Here’s a rough list on why UGC stored in a graph might be the best way to handle the BestPracticesService.
- Self generating, self correcting information using incentivized contributions (every time a page you contributed to is used, you get money/medals/other…)
- Graph database, maybe document elements rather than documents
  BPS has its own network, but it connects to doctors and possibly patients (anonymized?) and their symptoms.
- Would support Results-driven medicine from a variety of interesting dimensions. For example we could calculate the best ‘route’ from symptoms to treatment using A*. Conversely, we could see how far from the optimal some providers are.
- Because it’s UGC, there can be a robust mechanism for keeping information current (think Wikipedia) as well as handling disputes
- Could be opened up as its own diagnostic/RDM tool.
- A graph model allows for easy determination of provenience.
- A good paper to look at: http://www.mdpi.com/1660-4601/6/2/492/htm. One of the social sites it looked at was Medscape, which seems to be UGC
Got the new Rating App mostly done. Still need to look into inbound links
Updated the blacklists on everything

Phil 3.10.16

7:00 – 3:30 VTX

Today’s thought. Trustworthiness is a state that allows for betrayal.
Since it’s pledge week on WAMU, I was listening to KQED this morning, starting around 4:45 am. Somewhere around 5:30(?) they ran an environment section that talked about computer-generated hypotheses. Trying to run that down with no luck.
Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.
- End-user–based framework approaches use different methods to allow for the differences between individual end-users for adaptive, interactive, or personalized assessment and ranking of UGC. They utilize computational methods to personalize the ranking and assessment process or give an individual end-user the opportunity to interact with the system, explore content, personally deﬁne the expected value, and rank content in accordance with individual user requirements. These approaches can also be categorized in two main groups: human centered approaches, also referred to as interactive and adaptive approaches, and machine-centered approaches, also referred to as personalized approaches. The main difference between interactive and adaptive systems compared to personalized systems is that they do not explicitly or implicitly use users’ previous common actions and activities to assess and rank the content. However, they give users opportunities to interact with the system and explore the content space to ﬁnd content suited to their requirements.
- Looks like section 3.1 is the prior research part for the Pertinence Slider Concept.
- Evaluating the algorithm reveals that enrichment of text (by calling out to
  search engines) outperforms other approaches by using simple syntactic conversion
  - This seems to work, although the dependency on a Google black box is kind of scary. It really makes me wonder what would happen if we analyzed the links created by a search of each sentence (where the subject is contained in the sentence?) would look like ant what we could learn…I took the On The Media retweet of a Google Trends tweet [“Basta” just spiked 2,550% on Google search as @hillaryclinton said #basta during #DemDebate][https://twitter.com/GoogleTrends/status/707756376072843268] and fed that into Google which returned:
```
4 results (0.51 seconds)
```
```
Search Results
```
```
Hillary Clinton said 'basta' and America went nuts | Sun ...
national.suntimes.com/.../7/.../hillary-clinton-basta-cnn-univision-debate/
9 hours ago - America couldn't get enough of a line Hillary Clinton dropped during Wednesday night's CNN/Univision debate after she ... "Basta" just spiked 2,550% on Google search as @hillaryclinton said #basta during #DemDebate.
```
```
Hillary is Asked If Trump is 'Racist' at Debate, But It Gets ...
https://www.ijreview.com/.../556789-hillary-was-asked-if-trump-was-raci...
"Basta" just spiked 2,550% on Google search as @hillaryclinton said #basta during #DemDebate. — GoogleTrends (@GoogleTrends) March 10, 2016.
```
```
Election 2016 | Reuters.com
live.reuters.com/Event/Election_2016?Page=93
Reuters
Happening during tonight's #DemDebate, below are the first three tracks: ... "Basta" just spiked 2,550% on Google search as @hillaryclinton said #basta during # ...
```
```
Maysoon Zayid (@maysoonzayid) | Twitter
https://twitter.com/maysoonzayid?lang=en
Maysoon Zayid added,. GoogleTrends @GoogleTrends. "Basta" just spiked 2,550% on Google search as @hillaryclinton said #basta during #DemDebate.
```
- Found Facilitating Diverse Political Engagement with the Living Voters Guide, which I think is another study of the Seattle system presented at CSCW in Baltimore. The survey indicates that it has a good focus on bubbles.
- Encouraging Reading of Diverse Political Viewpoints with a Browser Widget. Possibly more interesting are the papers that cite this…
- Can you hear me now?: mitigating the echo chamber effect by source position indicators Does offline political segregation affect the filter bubble? An empirical analysis of information diversity for Dutch and Turkish Twitter users Events and controversies: Influences of a shocking news event on information seeking
Finished and committed the CrawlService changes. Jenkens wasn’t working for some reason, so we spun on that for a while. Tested and validated on the Integration sysytem.
Worked some more on the Rating App. It compiles all the new persisted types in the new DB. Realized that the full website text should be in the result, not the rating.
Modified Margarita’s test file to use Theresa’s list of doctors.
Wrote up some notes on why a graph DB and UGC might be a really nice way to handle the best practices part of the task

Phil 3.9.16

7:00 – 2:30 VTX

Good discussion with Wayne yesterday about getting lost in a car with a passenger.
- The equivalent of a trapper situated in an environment who may not know where he is but is not lost is analogous to people exchanging information where the context is well understood, but new information is being created in that context. Think of sports enthusiasts or researchers. More discussion will happen about the actions in the game than the stadium it was played in. Similarly, the focus of a research paper is the results as opposed to where the authors appear in the document. Events can transpire to change that discussion (The power failure at the 2013 Superbowl, for example) but even then most of the discussion involves how the blackout affected gameplay.
- Trustworthy does not mean infallible. GPS gets things wrong, but we still depend on it. It has very high system trust. Interestingly, a Google Search of ‘GPS Conspiracy’ returns no hits about how GPS is being manipulated, while ‘Google Search Conspiracy’ returns quite a few appropriate hits.
- GPS can also be considered a potential analogy to how our information gathering behaviors will evolve. Where current search engines index and rank existing content, a GPS synthesises a dynamic route based on an ever-increasing set of constraints (road type, tolls, traffic, weather, etc). Similarly, computational content generation (of which computational journalism is just one of the early trailblazers) will also generate content that is appropriate for the current situation (in 500 feet turn right). Imagine a system that can take a goal “I want to go to the moon” and creates an assistant that constantly evaluates the information landscape to create a near optimal path to that goal with turn-by-turn directions.
- Studying how to create Trustworthy Anonymous Citizen Journalism is important then for:
  - Recognising individuals for who they are rather than who they say they are
  - Synthesizing trustworthy (quality?) content from the patterns of information as much as the content (Sweden = boring commute, Egypt = one lost, 2016 Republican Primaries = lost and frustrated direction asking, etc). The dog that doesn’t bark is important.
  - Determining the kind of user interfaces that create useful trustworthy information on the part of the citizen reporters and the interfaces and processes that organize, synthesise, curate and rank the content to the news consumer.
  - Providing a framework and perspective to provide insight into how computational content generation potentially reshapes Information Retrieval as it transitions to Information Goal Setting and Navigation.
Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.
- Reasonable progress. Found an interesting paper on how people on a data visualization blog spend a lot of time talking about the content, not the visualization.
- Got to End-User-Based Framework section
Finish tests – Done. Found a bug!
Submit paperwork for Wall trip in Feb. Done
Get back to JPA
- Set up new DB.
- Did the initial populate. Now I need to add in all the new data bits.
Margarita sent over a test json file. Verified that it worked and gave her kudos.

Phil 3.8.16

7:00 – 3:00 VTX

Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web. Dense paper, slow going.
- Ok, Figure 3 is terrible. Blue and slightly darker blue in an area chart? Sheesh.
- Here’s a nice nugget though regarding detecting fake reviews using machine learning: For assessing spam product reviews, three types of features are used [Jindal and Liu 2008]: (1) review-centric features, which include rating- and text-based features; (2) reviewer-centric features, which include author based features; and (3) product-centric features. The highest accuracy is achieved by using all features. However, it performs as efﬁciently without using rating-based features. Rating-based features are not effective factors for distinguishing spam and nonspam because ratings (feedback) can also be spammed [Jindal and Liu 2008]. With regard to deceptive product reviews, deceptive and truthful reviews vary concerning the complexity of vocabulary, personal and impersonal use of language, trademarks, and personal feelings. Nevertheless, linguistic features of a text are simply not enough to distinguish between false and truthful reviews. (Comparison of deceptive and truthful travel reviews). Here’s a later paper that cites the previous. Looks like some progress has been made: Using Supervised Learning to Classify Authentic and Fake Online Reviews
- And here’s a good nugget on calculating credibility. Correlating with expert sources has been very important: Examining approaches for assessing credibility or reliability more closely indicates that most of the available approaches use supervised learning and are mainly based on external sources of ground truth [Castillo et al. 2011; Canini et al. 2011]—features such as author activities and history (e.g., a bio ofan author), author network and structure, propagation (e.g., a resharing tree of a post and who shares), and topical-based affect source credibility [Castillo et al. 2011; Morris et al. 2012]. Castillo et al. [2011] and Morris et al. [2012] show that text- and content-based features are themselves not enough for this task. In addition, Castillo et al. [2011] indicate that authors’ features are by themselves inadequate. Moreover, conducting a study on explicit and implicit credibility judgments, Canini et al. [2011] ﬁnd that the expertise factor has a strong impact on judging credibility, whereas social status has less impact. Based on these ﬁndings, it is suggested that to better convey credibility, improving the way in which social search results are displayed is required [Canini et al. 2011]. Morris et al. [2012] also suggest that information regarding credentials related to the author should be readily accessible (“accessible at a glance”) due to the fact that it is time consuming for a user to search for them. Such information includes factors related to consistency (e.g., the number of posts on a topic), ratings by other users (or resharing or number of mentions), and information related to an author’s personal characteristics (bio, location, number of connections).
- On centrality in finding representative posts, from Beyond trending topics: Real-world event identification on twitter: The problem is approached in two concrete steps: ﬁrst by identifying each event and its associated tweets using a clustering technique that clusters together topically similar posts, and second, for each cluster of event, posts are selected that best represent the event. Centrality-based techniques are used to identify relevant posts with high textual quality and are useful for people looking for information about the event. Quality refers to the textual quality of the messages—how well the text can be understood by any person. From three centrality-based approaches (Centroid, LexRank [Radev 2004], and Degree), Centroid is found to be the preferred way to select tweets given a cluster of messages related to an event [Becker et al. 2012]. Furthermore, Becker et al. [2011a] investigate approaches for analyzing the stream of tweets to distinguish between relevant posts about real-world events and nonevent messages. First, they identify each event and its related tweets by using a clustering technique that clusters together topically similar tweets. Then, they compute a set of features for each cluster to help determine which clusters correspond to events and use these features to train a classiﬁer to recognizing between event and nonevent clusters.
  - And here’s a subsearch of the citations that mention ‘news’. Some good looking stuff.
Meeting with Wayne at 4:15
Crawl Service
- had the ‘&q=’ part at the wrong place
- Was setting the key = to the CSE in the payload, which caused much errors. And it’s working now! Here’s the full payload:
```
{
 "query": "phil+feldman+typescript+angular+oop",
 "engineId": "cx=017379340413921634422:swl1wknfxia",
 "keyId": "key=AIzaSyBCNVJb3v-FvfRbLDNcPX9hkF0TyMfhGNU",
 "searchUrl": "https://www.googleapis.com/customsearch/v1?",
 "requestId": "0101016604"
}
```
- Only the “query” field is required. There are hard-coded defaults for engineId, keyId and searchUrlPrefix
- Ok, time for tests, but before I try them in the Crawl Service, I’m going to try out Mockito in a sandbox
- Added mockito-core to the GoogleCSE2 sandbox. Starting on the documentation. Ok – that makes sense
- Added SearchRequestTest to CrawlService

Phil 3.7.16

VTX 8:00 – 5:00

Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web. Also, Wayne found a very interesting paper at CSCW: On the Wisdom of Experts vs. Crowds : Discovering Trustworthy Topical News in Microblogs. Looks like they are able to match against known experts. System trust plus expert trust? Will read in detail later.
I’ve been trying to simplify the concept of information bubbles and star patterns, particularly based on the Sweden and Gezi/Egypt papers, and I started thinking about getting lost in a car as a simple model. A car or other shared vehicle is interesting because it’s not possible to leave when it’s moving, and it’s rarely practical to leave anywhere other than the start and destination.
For case 1, imagine that you are in a car with a passenger driving somewhere that you both know, like a commute to work. There is no discussion of the route unless something unusual happens.Both participants look out over the road and see things that they recognise so the confidence that they are where they should be is high. The external world reinforces their internal model.
For case 2, imagine that two people are driving to a location where there is some knowledge of the area, but one person believes that they are lost and one person believes that they are not. My intuition is that this can lead to polarizing arguments, where each party points to things that they think that they know and use it to support their point.
In case 3, both people are lost. At this point, external information has to be trusted and used. They could ask for directions, get a map, etc. These sources have to be trusted, but they may not be trustworthy. Credibility cues help determine who gets asked. As a cyclist, I get asked for directions all the time, because people assume me to be local. I have also been the second or third person asked by someone who is lost. They are generally frustrated and anxious. And if I am in an area I know, and speak with authority, the relief I see is palpable.
Case 4 is a bit different and assumes the presence of an expert. It could be a GPS or a navigator, such as is used in motorsports like the WRC. Here, trust in the expert is very high. So much so that misplaced trust in GPS has lead to death. In this case, the view out the window is less important than the expert. The tendency to follow and ignore the evidence is so high that the evidence has to pile up in some distinctive way to be acknowledged.
Case 5 is kind of the inverse of case four. Imagine that there are two people in a vehicle who are trained in navigation as opposed to knowing a route. I’m thinking of people who live in the wilderness, but there are also navigation games like rallyes. In this case, the people are very grounded in their environment and never really lost, so I would expect their behavior to be different
These five cases to me seem to contain the essence of the difference between information bubbles and and star patterns. In a cursory look through Google Scholar, I haven’t seen much research into this. What I have found seems to be related to the field of Organizational Science. This is the best I’ve found so far:
Anyway, it seems possible to make some kind of simple multiplayer game that explores some of these concepts and would produce a very clean data set. Generalizations could definitely carry over to News, Politics, Strategy, etc.
Need to think about bias.
Starting on Crawl Service
- Running the first gradle build clean in the command line. I’m going to see if this works without intellij first
- Balaji said to set <serviceRegistry>none</serviceRegistry> in the srs/main/resources crawlservice-config.xml, but it was already set.
- Found the blacklist there too. Might keep it anyway. Or is it obsolete?
- To execute is java -jar build/libs/crawlservice.war
- Trying to talk to CrawlService. Working in Postman on http://localhost:8710/crawlservice/search
- changed the SearchRequest.java and CrawlRequest.java to be able to read in and store arguments
- Had to drill into SearchQuery until I saw that SearchRequest is buried in there.
- Trying to put together the uri in GoogleWebSearch.getUri to handle the SearchRequest.
- A little worried about there not being a CrawlQuery
- It builds but I’m afraid to run it until tomorrow.
Still hanging fire on updating the JPA on the new curation app..

Phil 3.4.16

VTX 7:00 – 5:00

Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web
- Adding N. Diakopoulos and M. Naaman. Topicality, Time, and Sentiment in Online News Comments. Conference on Human Factors in Computing Systems (CHI) Works in Progress. May, 2011. [PDF] Short! Yay!
- Added Adaptive Faceted Ranking for Social Media Comments. I think it may touch on my idea of Pertinence ranking using Markov Chains.

Scanned Exploiting Social Context for Review Quality Prediction and realized that it’s got some very good hints for markers that can be used to use for machine learning on the doctor records

Feature Name 	Type 		Feature Description
NumToken 	Text-Stat 	Total number of tokens.
NumSent 	Text-Stat 	Total number of sentences.
UniqWordRatio 	Text-Stat 	Ratio of unique words
SentLen 	Text-Stat 	Average sentence length.
CapRatio 	Text-Stat 	Ratio of capitalized sentences.
POS:NN 		Syntactic 	Ratio of nouns.
POS:ADJ 	Syntactic 	Ratio of adjectives.
POS:COMP 	Syntactic 	Ratio of comparatives.
POS:V: 		Syntactic 	Ratio of verbs.
POS:RB 		Syntactic 	Ratio of adverbs.
POS:FW 		Syntactic 	Ratio of foreign words.
POS:SYM 	Syntactic 	Ratio of symbols.
POS:CD 		Syntactic 	Ratio of numbers.
POS:PP 		Syntactic 	Ratio of punctuation symbols.
KLall 		Conformity 	KL div DKL(Tr||Ti)
PosSEN 		Sentiment 	Ratio of positive sentiment words.
NegSEN 		Sentiment 	Ratio of negative sentiment words.

This means I need to store the whole page in the rating app so that I can evaluate machine ratings after getting human ratings.
Finished the UI part of the display, now to change the DB back end. I’m going to start the DB over again since there is so much new stuff.
Cleaning up classes. Moved LoginDialog and CheckboxGroup to utils.
Meeting about the relative merits of StanfordNLP and Rosette. We’ll stick with Stanford for now. I have some questions about how Webhose.io will be handled, but Aaron thinks that it can be filtered in the TAS, with a query string preprocessor.

viztales

Dimension reduction, State, Orientation, and Speed