Phil 4.12.16

7:00 – 6:00 VTX

At the poster session yesterday, I had a nice chat with Yuanyuan about her poster on Supporting Common Ground Development in the Operation Room through Information Display Systems. It turns out that she is looking at information exchange patterns in groups independent of content, which is similar to what I’m looking at. We had a good discussion on group polarization and what might happen if misinformation was introduced into the OR. It turns out that this does happen – if the Attending Physician becomes convinced that, for example, all the instruments have been removed from the patient, the rest of the team can become convinced of this as well and self-reinforce the opinion.
Scanned through Deindividuation Effects on Group Polarization in Computer-Mediated Communication: The Role of Group Identification, Public-Self-Awareness, and Perceived Argument Quality. The upshot appears to be that individuation of participants acts as a drag on group polarization. So the more the information is personalized (and the more the reader retains self-awareness), the less the overall group polarization will move.
I’ve often said that humans innately communicate using stories and maps (Maps are comprehended at 3-4.5 years, Stories from when?). The above would support the idea that stories are a more effective way of promoting ‘star’ information patterns. This is all starting to feel very fractal and self-similar at differing scales…
Looking for children’s development of story comprehension led to this MIT PhD Thesis: TOWARD A MODEL OF CHILDREN’S STORY COMPREHENSION. Good lord – What a Committee: Marvin Minsky (thesis supervisor), Professors Joel Moses and Seymour Papert (thesis committee), Jeff Hill, Gerry Sussman, and Terry Winograd.
———————
While reading Deep or Shallow, NLP is Breaking Out, I learned about word2vec. Googling led to Deeplearning4j.org, which has its own word2vec page, among a *lot* of other things. From their home page:
- Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments, rather than as a research tool. Skymind is its commercial support arm.
- Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors.
- By following the instructions on our Quick Start page, you can run your first examples of trained neural nets in minutes.
The word vector alternative is from the Stanford NLP folks: GloVe: Global Vectors for Word Representation. The link also has trained (extracted?) word vectors.
Testing the behavior of query construction and search results. Fixing stupid bugs. Testing more. Lathering, rinsing and repeating.
Some good discussions with Aaron on inferencing and toxicity profiles. Basically taking the outputs and determining correlations with the inputs. Which led to a very long day.

Phil 4.11.16

7:00 – 3:00 VTX

Make MOLST appointment today
Working on the outline
Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology. Done! Meaty.
- Page 907: While the relationship between human-like trust and outcomes is stronger in most cases, system-like trust still matters (see Figure 2). This may be because, in part, Facebook is a tool that helps one do social networking. I think this is important. When a tool has high system trust in a social context, it disappears, while the social aspect comes to the fore. This is true even if the tool is performing hidden tasks that influence the social interaction. This is related to relevance and pertinence, I think. As long as the social cues are presented in a way that feels pertinent (and is reliable?), it’s trusted explicitly as a system and implicitly as a player in the social interaction.
- Clifford Nass, Byron Reeves
  - Computers are Social Actors (and citations)
  - The Media Equation: How people treat computers, television, and new media like real people and places
  - I need to touch on this, but I’m not looking at computers as social actors, I’m looking at how the actions of trusted systems can both monitor and influence individuals (human and otherwise) within connectivist systems.
——————————-
TODOs in GoogleCSE2 buildQueryObjects() and buildNewQueryObjects():
- Modify to take SmartTerm Object
  If there is a valid term, then create the query with po.getNamePermutations()
Finished. Building a new (small!) set of people to test with
Discussed how order affects search results with Andy. Need to think about that
Had an idea about running overspecified queries that return nothing, then backing off term by term until a hit. Running through the permutations that have very small numbers of hits looking for common hits might be a good way of getting good results?
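A rough sketch of the back-off idea, mostly to pin down what "backing off term by term" would mean in code. Everything here is an assumption: the search parameter is a hypothetical stand-in for the real CSE call, and the terms are assumed to be ordered most important first so that the last one is the one to drop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class QueryBackoff {

    /** The terms that finally produced hits, plus the hits themselves. */
    static final class Result {
        final List<String> terms;
        final List<String> hits;

        Result(List<String> terms, List<String> hits) {
            this.terms = terms;
            this.hits = hits;
        }
    }

    /**
     * Starts with every term (the overspecified query) and drops the last,
     * least important term until the search returns a hit or no terms remain.
     * 'search' is a hypothetical stand-in for the real search call.
     */
    static Result backOff(List<String> terms,
                          Function<List<String>, List<String>> search) {
        List<String> current = new ArrayList<>(terms);
        while (!current.isEmpty()) {
            List<String> hits = search.apply(Collections.unmodifiableList(current));
            if (!hits.isEmpty()) {
                return new Result(new ArrayList<>(current), hits);
            }
            current.remove(current.size() - 1); // back off by one term
        }
        // Nothing matched, even with a single term
        return new Result(new ArrayList<String>(), new ArrayList<String>());
    }

    public static void main(String[] args) {
        // Toy search: pretends that only queries of two terms or fewer match
        Function<List<String>, List<String>> fakeSearch = q ->
                q.size() <= 2
                        ? Collections.singletonList("hit for " + String.join(" ", q))
                        : Collections.<String>emptyList();
        Result r = backOff(
                Arrays.asList("Jane Doe", "physician", "Albany", "license revoked"),
                fakeSearch);
        System.out.println(r.terms + " -> " + r.hits);
    }
}
```

Looking for hits that are common across the low-count permutations would be a second pass over several of these results; that part isn't sketched here.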

Phil 4.8.16

7:00 – 4:30 VTX

Here’s the new link for Microsoft Cognitive Services
Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
- Page 906: Among the five factors, social presence correlated the highest with humanness for both Facebook (0.48) and Access (0.56). Also noteworthy is that for Access, the correlation between humanness and animation was high (0.51), whereas for Facebook it was not (0.31). Further, dynamism correlated somewhat higher with humanness for Access (0.39) than for Facebook (0.23). These interesting differences show that each technology likely has a general humanness that finds its basis in different factors
  - This leads me to believe that ‘humanness’ is not exactly what they are testing here. Responsiveness can be used to discriminate between different types of tires (WRT cornering), and I don’t think anyone would call one tire more or less human than another. I think this also applies to the animation test. Social presence though makes a lot of sense.
  - It did just strike me that partial least squares structural equation modeling (PLS-SEM – XLstat’s definition and tutorial) would be a *great* way of evaluating trustworthiness and credibility cues. This should be part of the research part of the study?
- Page 906: Our study 2 findings raise a related question: instead of considering humanness a general construct measured with three items like we did, could one theorize humanness as a second-order construct that is reflected by specific first-order factors that are components of social presence, social affordances, and affordances for sociality? Researchers exploring such a second-order construct could integrate it into a nomological humanness network.
  - Hah! See my second comment for the previous quotation.
- Page 907: Researchers should try to determine differences between these respondents and those who ranked one or both technologies at or above the midpoint. Researchers could also perform a cluster analysis to identify groups with common responses to the humanness items of which a group with low humanness scores might emerge. It could be that the humanness factors identified in study 2 are more or less important in indicating humanness based on cluster membership. It could also be that results from study 1 about the importance of trust type might differ by humanness cluster.
  - This could also be a component of trust/credibility analysis.
- Paper structure thoughts
—————————
Set the persistence.xml to point to the Talend DB
Created the DB
Added users
- Phil
- Aaron
- Margarita
- Andy
- John
Need to figure out how to come up with a list of names/terms/CSEs to start evaluating
Need to test fully functional app, then package and deploy
Need to have VTX get a SemRush account
Conference Call with John, Margarita and Andy about setting up the Crawl for this weekend. John will get back to me with some known bad actors
Need to associate search terms with an optionalString element
Monday’s TODOs in GoogleCSE2 buildQueryObjects() and buildNewQueryObjects():
- Modify to take SmartTerm Object
  If there is a valid term, then create the query with po.getNamePermutations()

Phil 4.7.16

From Communications of the ACM’s Kode Vicious column: To understand the first downside, you should find a friend who works on compilers and ask if he or she has ever looked inside gcc (GNU C compiler), and, after the crying stops and you have bolstered your friend’s spirits, ask if he or she has ever tried to extend the compiler. If you are still friends at that point, your final question should be about submitting patches upstream into this supposedly open source project.

Yup.

7:00 – 4:30 VTX

Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
- Page 894: trust is most often treated as a psychological construct (i.e., trusting beliefs). As a psychological construct, trusting beliefs exists apart from any attempt to measure it (Schwab, 1980). Yet knowing what the construct means helps one to measure it properly. Hence, the trusting beliefs construct will influence its components. Third, we used reflective first-order factors because we did not seek to explain variance in trusting beliefs.
  - So trust can be measured using inferential models? As an influence system maybe???
- At 6.2. Study 2: Methodology, page 903. The second study is more related to the credibility cues that people use to determine the humanness of an interface. Not sure if it’s relevant to what I’m working on, but it is interesting to see how they include the second study which follows up on the open questions from the first.
In the paper above, they use something called partial least squares structural equation modeling (PLS-SEM). SmartPLS is a system that uses this, and there’s a presentation on YouTube that shows how it’s used to predict shadow banking. Need to look into this some more as a way of predicting outcomes based on behavior.
———————–
Sent an email to John and Bob about using the new CSEs
Set up the rating app so that Andy and Margarita can use it to create the json characterization. Had a hell of a time getting the executable jar built. The artifact builder in IntelliJ doesn’t synchronize with the dev process. I was not including jars that were required and was getting an “Error: A JNI error has occurred, please check your installation and try again” error on execution. I wound up having to delete the artifact, commit, create a new artifact and then create the jar and executable.
Sent the Rating app as a zip. Not sure if the filters are letting it through. Hey! It works!
Sent Aaron a rant on what I’d like in order to get the db up and running. Done! Yay!
Finalized REST discussions with Jeremy

Phil 4.6.16

7:00 – 3:30 VTX

Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
- Really nice layout of hypothesis
- Really nice layout of methods
  - They even have the questionnaire!
- At section 4.2. Measurement Items, page 893.
———————-
Mercer Marketplace wants more documentation….
Conference call with Andy and Margarita about flags and rating. Theresa joined in at the end.
Rediscovering all my postgres notes
added a role for a non-super-user who can create databases
created a new googlecse2 database
added postgres jdbc driver
Aaaaand JPA works! Db created and users added. Password checking behaves!
Set up my postgres to accept external access by following these directions
Waiting for Gregg on DB access
Chatted with Jeremy about a RESTful interface to extract flag data. More tomorrow

Phil 4.5.16

7:00 – 4:30 VTX

Had a good discussion with Patrick yesterday. He’s approaching his wheelchair work from a Heideggerian framework, where the controls may be present-at-hand or ready-to-hand. I think those might be frameworks that apply to non-social systems (Hammers, Excel, Search), while social systems more align with being-with. The evaluation of trustworthiness is different. True in a non-social sense is a property of exactness; a straightedge may be true or out-of-true. In a social sense, true is associated with a statement that is in accordance with reality.
While reading Search Engine Agendas in Communications of the ACM, I came upon a mention of Frank Pasquale, who wrote an article on the regulation of Search, given its impact (Federal Search Commission? Access, Fairness, and Accountability in the Law of Search). The point of Search Engine Agendas is that the ranking of political candidates affects people’s perception of them (higher is better). This ties into my thoughts from March 29th: there are situations where the idea of ordering among pertinent documents may be problematic, and how users might interact with the ordering process might be instructive.
Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
————————
Added the sites Andy and Margarita found to the blacklist and updated the repo
Theresa has some sites too – in process.
Finished my refactoring party – more debugging than I was expecting
Converted the Excella spreadsheet to JSON and read the whole thing in. Need to do that just for a subsample now.
Added a request from Andy about creating a JSON object for the comments in the flag dismissal field.
Worked with Gregg about setting up the postgres db.

Phil 4.4.16

7:00 – 2:30 VTX

Happy perfect square day.
Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
- Page 833: Ability/competence is the belief that a person has the skills, competencies, and characteristics that enable them to have influence in some specific domain. Benevolence is the belief that a person will want to do good to the trustor aside from an egocentric profit motive. Integrity is the belief that a person adheres to an acceptable set of principles.
- Page 833: It is not as clear, however, whether technologies have volition or can make ethical decisions without being pre-programmed to do so. Because of this issue, some researchers have developed alternative trust belief constructs that do not assume technologies have volition or ethical decision making capability. For example, Lippert and Swiercz (2005) use utility, reliability, and predictiveness, and Söellner, Hoffman, Hoffman, Wacker, and Leimester (2012) use performance, process, and purpose to represent technology-trusting beliefs.
- Page 833: We adopt McKnight et al.’s (2011) conceptualization of system-like trust in a technology’s reliability, functionality, and helpfulness to measure trust in technology because these three attributes were directly derived from, and are corollaries to, the human-like trust attributes of integrity, competence, and benevolence
The discussion on affordances started me thinking about SERPs again. This is kind of related but almost more basic – how users search within documents using find: The Myth of Find: User Behaviour and Attitudes Towards the Basic Search Feature. The documents that cite it (WRT document triage, etc.) are also pretty interesting looking.
———————————
Starting up the computers after the weekend at work today, and Skype For Business doesn’t let me log in. Says my email address is bad. And it’s not.
Got the PoiOptionalStrings object integrated and running.
Realized that I need to have a generalized ‘OptionalContent’ class, generalizing from the above.

Need to see how JPQL works with all this new stuff now.

Fancy JPQL query of the day:

@NamedQuery(name = "PoiObject.getFromOptionalStrings", query = "SELECT p from poi_object p, IN (p.optStringSet) os WHERE os.name = :name AND os.value = :value"),

Should I be doing this as a template? If so, what does the table get named?

Phil 4.1.16

7:15 – 4:15 VTX

Had a bunch of paperwork to do for my folks. All handled now?
Continuing What is Trust? A Conceptual Analysis and An Interdisciplinary Model. Done
- Disposition to Trust. This construct means the extent to which one displays a consistent tendency to be willing to depend on general others across a broad spectrum of situations and persons
  - a general propensity to be willing to depend on others.
  - does not necessarily imply that one believes others to be trustworthy
  - only has a major effect on one’s trust-related behavior when novel situations arise, in which the person and situation are unfamiliar
  - Disposition to Trust has two subconstructs, Faith in Humanity and Trusting Stance
    - Faith in Humanity means one assumes others are usually upright, well-meaning, and dependable.
    - Trusting Stance means that, regardless of what one assumes about other people generally, one assumes that one will achieve better outcomes by dealing with people as though they are well-meaning and reliable
  - Because Faith in Humanity relates to assumptions about peoples’ attributes, it is more likely to be an antecedent to Trusting Beliefs (in people) than is Trusting Stance. Trusting Stance may relate more to Trusting Intention, which, depending on the situation, is probably not based wholly on beliefs about the other person.
- Institution-based Trust means one believes the needed conditions are in place to enable one to anticipate a successful outcome in an endeavor or aspect of one’s life
  - This construct comes from the sociology tradition that people can rely on others because of structures, situations, or roles that provide assurances (Affordances???) that things will go well
  - Institution-based Trust has two subconstructs, Structural Assurance and Situational Normality.
    - Structural Assurance means one believes that success is likely because guarantees, contracts, regulations, promises, legal recourse, processes, or procedures are in place that assure success
    - Situational Normality means one believes that success is likely because the situation is normal or favorable. (I think that this comes from very primitive parts of our brains. It can be observed in many animals and may be one of those things that separates infant and adult behavior. If you trust too much, you are likely to get eaten..?)
      - Situational Normality means that a properly ordered setting is likely to facilitate a successful venture. When one believes one’s role and others’ roles in the situation are appropriate and conducive to success, then one has a basis for trusting the people in the situation.
      - likely related to Trusting Beliefs and Trusting Intention. A system developer who feels good about the roles and setting in which they work is likely to have Trusting Beliefs about the people in that setting.
- Trusting Beliefs means one believes (and feels confident in believing) that the other person has one or more traits desirable to one in a situation in which negative consequences are possible.
  - We distinguish four main trusting belief subconstructs, while recognizing that others exist.
    - Trusting Belief-Competence means one believes the other person has the ability or power to do for one what one needs done.
    - Trusting Belief-Benevolence means one believes the other person cares about one and is motivated to act in one’s interest. A benevolent person does not act opportunistically.
    - Trusting Belief-Integrity means one believes the other person makes good faith agreements, tells the truth, and fulfills promises
    - Trusting Belief-Predictability means one believes the other person’s actions (good or bad) are consistent enough that one can forecast them in a given situation
- Trusting Intention means one is willing to depend on, or intends to depend on, the other person in a given task or situation with a feeling of relative security, even though negative consequences are possible
  - Trusting intention subconstructs include Willingness to Depend and Subjective Probability of Depending.
    - Willingness to Depend means one is volitionally prepared to make oneself vulnerable to the other person in a situation by relying on them.
    - Subjective Probability of Depending means the extent to which one forecasts or predicts that one will depend on the other person.
  - Trusting Intention definitions embody five elements synthesized from the trust literature.
    1. The possibility of negative consequences or risk is what makes trust important but problematic.
    2. A readiness to depend or rely on another is central to trusting intention.
    3. A feeling of security means one feels safe, assured, and comfortable (not anxious or fearful) about the prospect of depending on another. Feelings of security reflect the affective side of trusting intention.
    4. Trusting intention is situation-specific. (???? why? Examples?)
    5. Trusting intention involves willingness that is not based on having control or power over the other party. Note that Trusting Intention relates well to the system development power literature because we define it in terms of dependence and control.
- Another limitation relates to Whetten’s (1989) recommendation that Who and Where conditions should be placed around models. Whereas we have assumed that the model applies to any kind of relationship between two people (Who) in any situation (Where), this may not be the case. Empirical research is needed to better define the boundary conditions of the model.
Starting Technology, Humanness, and Trust: Rethinking Trust in Technology, also by D. Harrison McKnight
- Page 881 (Basic?) Social Trust: human-like trust constructs of integrity, ability/competence, and benevolence that researchers have traditionally used to measure interpersonal trust.
- Page 881 (Basic?) System Trust: system-like trust constructs such as reliability, functionality, and helpfulness
- Page 881. First, we hypothesize that technologies can differ in humanness. Second, we predict that users will develop trust in the technology differently depending on whether they perceive it as more or less human-like, which will result in human-like trust having a stronger influence on outcomes for more human-like technologies and system-like trust having a stronger influence on outcomes for more system-like technologies. (Cite Kate Bush Deeper Understanding 1989)
- Here’s the beginning of a thought: What is self-trust? Just thinking about it, it seems to be a sense of the reliability of my future self to do what my present self desires. That’s different from Social Trust, which in the literature is more about integrity, competence and benevolence. It seems closer to system trust in that reliability and functionality are more significant. There are things that I trust that I will do tomorrow: Get up, go to work, exercise if the weather is good enough. But there are also things that I can’t trust myself to do. My future self will almost certainly eat more calories than my current self desires. My grocery shopping behaviors are based around this lack of trust. There are items that I do not bring into my house because I know that they will get eaten (I was going to write that I know that my will is weak around chocolate, but that’s not really it. Or at least, that’s not all of it, or maybe even most of it..). Because (interactive?) information technology is more like a self-amplifier, I wonder if what we think of as system trust can be thought of as the trust in ourselves, but the part of ourselves that is more reliable and trustworthy. A search tomorrow will work as well as a search today. Maybe better. And the effectiveness of that search somehow reflects my ability to interact effectively with the external world? This is starting to sound a lot like my point of view that living a life in prolonged contact with a compiler changes you in profound ways.
- So what would that mean? I think it’s a reasonable hypothesis to change search results from focusing on pertinence to revelation. This does not mean that the ‘Ten Blue Links’ need to go away. But it does imply that peripheral information could be just as important, so that a less casually polarized worldview might be developed.
Finishing up the CSE version control setup – need to write up the process for confluence – done.
Since I need to be able to now read in the Excella data, I was going to look to Gregg’s ontology as a way to determine the table structure. But it’s way too big and nested. A Person’s description includes a reference to a complete organization, activities, charges, arrests, and it doesn’t even have room for nice things yet (will we have co-authors?). Anyway, to avoid this, I’m going to have basic person characteristics with associated StringMaps, NumMaps and DateMaps. Anything that’s not recognized as a column gets added to those. Need to see how persistence will work with that in some testing first.
Got the code working. JPA 2 says you should be able to build a map entirely without annotations, but I couldn’t get it to work. Modified JsonLoadable so that it goes through the Json Object and anything that is not a member of the current class is added to HashMaps of PoiOptionalStrings. It should be very straightforward to extend to number and date types. Probably worth doing?
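For future reference, the catch-all loading described above looks roughly like this. It is a sketch, not the actual JsonLoadable code: PersonSketch and its fields are made-up names, a plain Map stands in for the parsed JSON object, and a HashMap of strings stands in for the PoiOptionalStrings.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a person record; the field names are illustrative. */
class PersonSketch {
    String firstName;
    String lastName;

    /** Catch-all for keys that do not match a String field of this class. */
    final Map<String, String> optionalStrings = new HashMap<>();

    /**
     * Copies each entry into the String field of the same name when one exists;
     * everything else is stored as text in optionalStrings.
     */
    void load(Map<String, Object> json) {
        for (Map.Entry<String, Object> e : json.entrySet()) {
            Field f = findStringField(e.getKey());
            if (f != null && e.getValue() instanceof String) {
                try {
                    f.set(this, e.getValue());
                } catch (IllegalAccessException ex) {
                    throw new IllegalStateException(ex);
                }
            } else {
                optionalStrings.put(e.getKey(), String.valueOf(e.getValue()));
            }
        }
    }

    /** Returns the declared String field with this name, or null if there is none. */
    private Field findStringField(String name) {
        try {
            Field f = PersonSketch.class.getDeclaredField(name);
            return f.getType() == String.class ? f : null;
        } catch (NoSuchFieldException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> json = new HashMap<>();
        json.put("firstName", "Ada");
        json.put("licenseState", "NY"); // not a field, so it lands in optionalStrings
        PersonSketch p = new PersonSketch();
        p.load(json);
        System.out.println(p.firstName + " " + p.optionalStrings);
    }
}
```

Extending this to number and date types would presumably mean branching on the value's type in the else clause and writing to separate maps.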

Phil 3.31.16

7:00 – 4:00 VTX

Starting on What is Trust? A Conceptual Analysis and An Interdisciplinary Model.
- D. Harrison McKnight
  - This looks good too – Technology, Humanness, and Trust: Rethinking Trust in Technology
- Norman L. Chervany
- Table 1 on page 829 is interesting. In my previous readings, trustworthiness is based primarily on COMPETENCE. Journalism also uses INTEGRITY. I think that trust in your GPS is Reliable and Dependable?
- I think they are talking about this from an OO inheritance perspective?
Starting to set up the key and sitelist repo
It turns out that you can export xml configuration of the CSE and the annotations for that CSE. From webapps.stackexchange.com:
- Go to your custom search engine
- Click on Advanced tab under Control Panel. The url looks something like http://www.google.com/cse/panel/advanced?…..
- Look for Download Annotations in the main pane and download in your preferred format
We can only have a total of 5k annotations. That’s not a problem – yet.

All the files are set up and transferred. New search engines are

ONLY_COM = "cx=006834724223295726872:k0pebqyqa8m"
ONLY_EDU = "cx=006834724223295726872:gded1dvdt94"
ONLY_GOV = "cx=006834724223295726872:ydjrxqpedqq"
ONLY_ORG = "cx=006834724223295726872:lsgxnigrfme"
ONLY_US = "cx=006834724223295726872:dw0n0_hai6s"

Found a more credible source than boardactions.com (possibly just for New York state? But it has VA records..). Anyway, not only does it have a nice listing, it also has a pdf of the relevant board order. Which means we can build a good legal language model. Very nice: http://w3.nyhealth.gov/opmc/factions.nsf/physiciansearch?openform
Need to rethink the PoiObject class to be more general.

Phil 3.30.16

7:00 – 3:30 VTX

So I was starting The spreading of misinformation online, but it was discussing more of the same. This feels a lot like saturation. My thoughts are coalescing around the idea of the difference between trusted and trustworthy interactions in computer-mediated systems. The anonymous citizen journalism concept becomes a unifying thought experiment that can be used to show the potential strengths and weaknesses of particular concepts.
The last piece I think I need is what is trust from a developmental perspective. The initial google scholar search of “trust development” didn’t bring up exactly what I want (object permanence maybe?), but it did provide this: Effects of four computer-mediated communications channels on trust development. The citations provided this: The mechanics of trust: A framework for research and design In International Journal of Human – Computer Studies 2005 62(3):381-422. This one seems different enough to look through carefully.
Ok, I think I found what I’m looking for: The ‘like me’ framework for recognizing and becoming an intentional agent. I think I’ll read The Mechanics of Trust first, then ‘like me’ second.
Starting The mechanics of trust: A framework for research and design.
- It does seem to be focused on how effectively a system transmits(?) cues that support well-placed trust. I think that we tend to confuse the trust we place in the channel vs the trust we place in the entity at the other end of the channel. And these lines are not clearly drawn:
  - In IR, we trust that the search engine is providing us with the relevant documents we seek. People trust Google more than Bing because the results are more pertinent. Does this trust carry over into the documents retrieved? Probably, though I can’t find a study that does this. (It would be pretty easy to do with the Google Custom Search Engine API + noise)
  - In GPS the trust in the system is very high, even though it is synthesizing information from retrieved and processed sources (maps, DTED, etc) that could in turn be wrong. Here though, the entity we are interacting with is clearly the GPS, not the mapmakers.
  - Skype, on the other hand, is essentially transparent when it’s working right. And that ‘working right’ is a kind of conditional trust in the system that has no effect on our evaluation of the trustworthiness of the person that we are interacting with at the other end of the channel.
  - So what does that mean in the context of our imaginary citizen journalists?
    - They are anonymized. We have no names. We probably don’t even have the exact words as written. These are the same issues that newspapers face when dealing with anonymous sources. And in this case, it’s reasonable to assume that the newspaper is the entity that is attempting to get us to place our trust in it.
      - Reporters as proxies
      - Additional perspectives – images, videos etc.
      - Stories that match reader’s experiences, so that trust can be evaluated.
      - What else?
- One of the cited papers is What is Trust? A Conceptual Analysis and An Interdisciplinary Model. Quickly scanning through it, I found this on page 830-831: Garfinkel found in natural experiments that people don’t trust others when things “go weird,” that is, when they face inexplicable, abnormal situations. For example, one subject told the experimenter he had a flat tire on the way to work. The experimenter responded, “What do you mean, you had a flat tire?” The subject replied, in a hostile way, “What do you mean? What do you mean? A flat tire is a flat tire. That is what I meant. Nothing special. What a crazy question!” At this point, trust between them broke down because the illogical question produced an abnormal situation.
  - I think that this is core. Trust is tied to normalcy, and probably builds out from there.
Prepping for the sprint planning session.
As far as the OMG work, I think the following
- Set up a version-controlled system for Google CSE keys and URL exclude lists, including a way to submit a URL for inclusion in an exclusion list.
- Add PDF parsing and storing to Crawl Service
- Add MSWord parsing and storing to Crawl Service
- Add MSExcel parsing and storing to CrawlService
- Add backlink calculation and storing to CrawlService – this is looking like a good way to increase pertinence within a return, particularly with respect to the matched-name wrong-person condition.
For the machine learning work
- Get DB up, accessible and on a backup schedule
- Set up deployment infrastructure for Rating App.
- Small scale test of Rating App, with refinement and development of manual
- Accumulate corpus
- Test corpus in WEKA
  - Translator from DB to WEKA format
  - Construction of training data sets
  - Tests and evaluations
  - Report
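To get a feel for the size of the DB-to-WEKA translator, here is a minimal sketch of writing WEKA's ARFF text format. It assumes the rows have already been read out of the database, that every feature is numeric with a single nominal class attribute at the end, and that names contain no spaces or characters needing quotes. The relation and attribute names are invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ArffSketch {

    /**
     * Builds ARFF text with one numeric attribute per feature name and a
     * trailing nominal attribute called "class".
     * features.get(i) and labels.get(i) together describe row i.
     */
    static String toArff(String relation,
                         List<String> numericAttrs,
                         List<String> classLabels,
                         List<double[]> features,
                         List<String> labels) {
        if (features.size() != labels.size()) {
            throw new IllegalArgumentException("features and labels differ in length");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String attr : numericAttrs) {
            sb.append("@attribute ").append(attr).append(" numeric\n");
        }
        sb.append("@attribute class {")
          .append(String.join(",", classLabels))
          .append("}\n\n");
        sb.append("@data\n");
        for (int i = 0; i < features.size(); i++) {
            double[] row = features.get(i);
            if (row.length != numericAttrs.size()) {
                throw new IllegalArgumentException("row " + i + " has the wrong width");
            }
            for (double value : row) {
                sb.append(value).append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical rating data: two numeric features and a relevance label
        System.out.print(toArff(
                "ratings",
                Arrays.asList("flagCount", "score"),
                Arrays.asList("relevant", "irrelevant"),
                Collections.singletonList(new double[] {3, 0.5}),
                Collections.singletonList("relevant")));
    }
}
```

String attributes (page text, for instance) would need ARFF's string type and quoting, which this sketch leaves out.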
As far as my research, it’s more vague, so I’m just going to free-associate a bit here.

First, I just need to write up the proposal, and since that’s where my head is at right now, it’s hard to come up with specifics. One of the overall goals is to build a search result interface that ‘nudges’ users from bubble patterns into star patterns.

Secondly, my current belief is that this interface could be along the lines of the word cloud plus slider display interface I’ve discussed with you before. On the back end, there’s a topic extraction/document classification system that builds a graph database that is used for:
- In my case, placing the search results in a context of discussion vs information (DvI) along the axis’ defined by the topics in the search results. The user can select a topic (which then shows the DvI graphs and where the current search falls on those spectrums). Once a topic has been selected, the user can adjust the weights on subsequent topics, causing the result list to reorder and the position on the DvI graph to move.
- In EIT’s case (1) predictions and alerts and (2) for the user interface [and I think this can be pitched as the gamified display]. For example, I think there are many cases where conditions for making a judgment (medical best practices or behavior related) may be ambiguous. Using such an interface could allow a user to explore and resolve such ambiguity. The nice thing is that in the EIT case, the data is (potentially) more structured and granular, allowing a more fluid analysis (e.g. a bad manager indirectly affecting performance or combined conditions such as opiate addiction + newborn).

Phil 3.29.16

7:00 – 4:00 VTX

Continuing The Law of Group Polarization – done!
- Group polarization: A critical review and meta-analysis. Looks like a more rigorous version of TLoGP. It’s available in the library as a PDF if needed.
- Page 194: In short, the external materials and expert panels shift the argument pool available to the deliberators and are also likely to have effects on social influence.
  - The way I read this, external trusted sources can shift the poles if they are incorporated into the discussion. Think about how a GPS affects wayfinding arguments. If search interfaces are modified such that they show the range of opinion and the position of the ‘Ten Blue Links’ within that range then, given its high system trust, we might expect individuals to adjust their belief trajectories based on their understanding of the pole’s position given the larger information landscape.
- Page 195: There are large lessons here about appropriate institutional design for
  deliberating bodies. Group polarization can be heightened, diminished, and
  possibly even eliminated with seemingly small alterations in institutional
  arrangements.
  - Now substitute system for institutional. Although I would contend that search is an institution, given its reach. Also, presenting a better mechanism for placing the returned information in a context allows for ‘nudging‘ cues, which seem to work better than more ‘authoritarian’ systems.
Starting The spreading of misinformation online
Before continuing on backlinks, I spent some time looking at the Microsoft Oxford system. LUIS is interesting, though I’m not sure exactly how to take advantage of it yet. I think this can be a chatbot construction kit? The WebLM system looks more immediately useful, kinda like AlchemyNLP. Maybe cheaper? You need a key, which you get here. And this is different from the Academic Knowledge API, which is also an Oxford project, but not listed on the Oxford site.
Got the SrBacklinkObject persisting
Adding backlinks to the ResultItemObject2 class. Whoops! Forgot that you have to set both relationships in a many-to-one:

curResult.addBacklink(bo);
bo.setResultObj(curResult);
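
One way to avoid forgetting the second call is to let the owning side set both ends. This is a minimal sketch only: ResultSketch and BacklinkSketch are hypothetical stand-ins for ResultItemObject2 and SrBacklinkObject, and the persistence mapping itself is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for ResultItemObject2 (the "one" side).
class ResultSketch {
    private final Set<BacklinkSketch> backlinks = new LinkedHashSet<>();

    // Sets both sides of the relationship in a single call,
    // so the caller cannot update one side and forget the other.
    void addBacklink(BacklinkSketch bo) {
        backlinks.add(bo);
        bo.setResultObj(this);
    }

    Set<BacklinkSketch> getBacklinks() {
        return backlinks;
    }
}

// Hypothetical stand-in for SrBacklinkObject (the "many" side).
class BacklinkSketch {
    private ResultSketch resultObj;

    void setResultObj(ResultSketch resultObj) {
        this.resultObj = resultObj;
    }

    ResultSketch getResultObj() {
        return resultObj;
    }
}
```

With this shape, a single curResult.addBacklink(bo) would be enough at the call site.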
Needed to split off the protocol from curResult.link and add it back to curResult.displayLink to get backlinks.
Done and working. Kinda like the fallback strategy.
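
The protocol juggling can be sketched roughly as below. This assumes link holds the full result URL and displayLink holds just the host, which is my reading of the two fields; RootUrlSketch and rootUrl are hypothetical names, not the actual code.

```java
import java.net.URI;

class RootUrlSketch {
    // Takes the scheme ("http" or "https") from the full link and
    // prepends it to the bare host in displayLink.
    // Note: URI.create throws IllegalArgumentException on malformed URLs.
    static String rootUrl(String link, String displayLink) {
        String scheme = URI.create(link).getScheme();
        if (scheme == null) {
            scheme = "http"; // assumption: default when the link has no scheme
        }
        return scheme + "://" + displayLink;
    }
}
```

The rebuilt root is what the fallback query would use when the specific page has no backlinks of its own.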

Phil 3.28.16

7:00 – 2:30 VTX

Took some notes on the MS Tay fiasco yesterday. Need to ping Peter Lee and see if I can get anywhere talking about Group Polarization Theory. Done
Microsoft Research Open source for academics
Microsoft Language Understanding Intelligent Service (beta) LUIS
Veracity Roadmap:Is Big Data Objective, Truthful and Credible?
Continuing The Law of Group Polarization
- Page 193: The constraints of time and attention call for limits to heterogeneity; and-a separate point-for good deliberation to take place, some views are properly placed off the table, simply because time is limited and they are so invidious, implausible, or both. This point might seem to create a final conundrum: To know what points of view should be represented in any group deliberation, it is important to have a good sense of the substantive issues involved, indeed a sufficiently good sense as to generate judgments about what points of view must be included and excluded. But if we already know that, why should we not proceed directly to the merits? If we already know that, before deliberation occurs, does deliberation have any point at all?
- The answer is that we often do know enough to know which views count as reasonable, without knowing which view counts as right, and this point is sufficient to allow people to construct deliberative processes that should correct for the most serious problems potentially created by group deliberation. What is necessary is not to allow every view to be heard, but to ensure that no single view is so widely heard, and reinforced, that people are unable to engage in critical evaluation of the reasonable competitors.
- At E. THE DELIBERATIVE OPINION POLL: A CONTRAST
Now that I’ve gotten the queries behaving, working on the SemRushIO and BacklinkObject
- Added configuration file
- Nice to know: if SemRush finds nothing, it returns ERROR 50 :: NOTHING FOUND, so we can do two passes. If the specific result returns nothing, we can go to the root.
- Built up the SemRush base class based on the JsonLoadable
- Built the SrBacklinkObject
- Loading the object successfully.
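
The two-pass idea could look something like this. It is a sketch under assumptions: the fetch function is a hypothetical stand-in for the real SemRush call (target in, raw response body out), and the only thing taken from the service is the error string noted above.

```java
import java.util.function.Function;

class TwoPassSketch {
    static final String NOTHING_FOUND = "ERROR 50 :: NOTHING FOUND";

    // First pass queries the specific page. If the response is missing or
    // starts with the "nothing found" error, a second pass queries the root.
    static String fetchWithFallback(String pageTarget, String rootTarget,
                                    Function<String, String> fetch) {
        String body = fetch.apply(pageTarget);
        if (body != null && !body.trim().startsWith(NOTHING_FOUND)) {
            return body;
        }
        return fetch.apply(rootTarget);
    }
}
```

The second pass can still come back empty, so the caller has to be ready for the error string either way.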
Fika

Phil 3.26.16

Peter Lee – Corporate Vice President, Microsoft Research

Learning from Tay’s introduction

Phil 3.25.16

7:30 – 3:30 VTX

Saw The Who last night and got into bed after 1:00am. Sleeeeeeeeeepy.
Still browsing the team sensemaking paper over breakfast. There are some very similar goals. In group polarization, awareness of where the boundaries of the discussion lie helps to determine how the average viewpoint moves. Current search returns no context on where the results lie on those axes. Translucency in search could allow users to see ‘meta information’ about the search results that they have and where the results lie in that information space, while also providing a means to adjust the position in that space in a way that is not intrusive. Or something like that.
Group polarization works on chatbots. There is something really interesting here about measuring polarization. Not quite sure what exactly yet.
Continuing The Law of Group Polarization
- Phrase of the day ‘Skewed Argument Pools‘
- Page 187: And shifts toward more in the way of enclave deliberation will increase society’s aggregate “argument pool,” and hence enrich the marketplace of ideas, while also increasing extremism, fragmentation, hostility, and even violence.
- First, it’s a neat thought to think of an interwoven pattern of Bubbles and Stars. Second, I think the continuum I’m most interested in is the one from most bubble-ish to most star-ish for a given topic. Now that, in and of itself, is a big document classification/topic extraction problem, but I would submit that being able to visualize what that search result could look like could help to produce useful work in that direction. And there are proxies that can be used intact, such as papers. Bubbles are papers and topics that point at each other a lot, for example.
- Page 187: It is important to ensure social spaces for deliberation by like minded persons, but it is equally important to ensure that members of the relevant groups are not isolated from conversation with people having quite different views.
- ^^^Translucency^^^
- Page 187: The most important point here is that those who emphasize the ideals associated with deliberative democracy tend to emphasize its preconditions, which include political equality, an absence of strategic behavior, full information, and the goal of “reaching understanding (pp. 52-94).”
- At Page 189: B. THE VIRTUES OF HETEROGENEITY
Scrum – some big changes coming?
11:00 all hands

Working on backlink object

Started query generator, SemRushIO.java. After some hiccups in getting the format of the query results right, the generation part is working. The reader should be pretty straightforward, though a little more complex/brittle than reading JSON. Here’s an example return:

page_score;source_title;source_url;target_url;anchor;external_num;internal_num;first_seen;last_seen
1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;http://visitarizona.com/places-to-visit/northern-arizona/monument-valley;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;29;116;1452309348;1452309348
1;"""New Concertina Wire"" Fencing Around Closed Nevada Prison And Guard In Tower - Are Closed Prisons Going To Be Used As ""Fema Camps""? - Veteran Who Took Photos Followed By White Van";http://allnewspipeline.com/Veteran_Notes_New_Wire_Fencing_Nevada.php;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;14;23;1443887704;1452368321
1;Evening Meeting. - Kirkham & Rural Fylde;http://rotary-ribi.org/clubs/page.php?ClubID=1161&PgID=514041;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;170;59;1444332718;1457694435
1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;http://arizodiac.com/places-to-visit/northern-arizona/monument-valley;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;29;115;1447807529;1457622986
1;Visit AZ – Vacation Information for Arizona, the Grand Canyon State | Arizona Office of Tourism;http://www.arizodiac.com/places-to-visit/northern-arizona/monument-valley;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;29;113;1454861531;1457744933
1;UNIST - Sajun.org;http://sajun.org/index.php?diff=prev&oldid=2901321&printable=yes&title=UNIST;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;60;48;1448012851;1454851002
1;1 عدد تمبر جان مون نت - دیپلمات - جمهوری فدرال آلمان 1977;http://tambrestan.com/-/5689-7-1983.html;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;27;1308;1452387237;1452387237
1;About Puslinch Lake - Calmwaters Cottage & Fly Fishing;http://calmwaterscottage.ca/1337-2/;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;42;12;1454519315;1457440491
1;PEABODY 100 - demetrioskritikos.com;http://demetrioskritikos.com/peabody100/;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;32;16;1454518621;1454518621
1;PEABODY SPORTS - demetrioskritikos.com;http://demetrioskritikos.com/peabody-sports/;https://en.wikipedia.org/wiki/Geographic_coordinate_system;Coordinates;21;16;1454518773;1454518773
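
A rough sketch of the reader for one line of this format. Judging only from the sample above, fields are separated by semicolons and a title containing quotes is wrapped in double quotes with the inner quotes doubled (CSV style); that quoting rule is my inference, not documented behavior. SemRushLineSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class SemRushLineSketch {
    // Splits one response line on ';', honoring CSV-style quoted fields
    // in which a literal quote is written as two quotes.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote becomes one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // includes any ';' inside the quotes
                }
            } else if (c == '"' && cur.length() == 0) {
                inQuotes = true; // opening quote at the start of a field
            } else if (c == ';') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field has no trailing ';'
        return fields;
    }
}
```

This is where the brittleness shows up: an unquoted title that happens to contain a semicolon would yield more than nine fields, so the caller should check the count before trusting a row.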

SemRushIO will create the backlink object
- Calls the service, using a default or read-in key
- Is fired the same time the page source is loaded (in GuiVars.loadNextPage)
- Creates a BackLinkObject. The data from SEMRush includes:
  - page_score
  - source_title
  - source_url
  - target_url
  - anchor (the text in the source)
  - external_num
  - internal_num
  - first_seen
  - last_seen
ResultItemObject2 changes
- Set of BackLinkObjects
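
The field list above maps onto a small value class. This is a sketch only: BacklinkRow is a hypothetical name, the column order is taken from the header line of the sample return, and treating first_seen and last_seen as Unix epoch seconds is an assumption based on the magnitude of the sample values.

```java
import java.time.Instant;
import java.util.List;

class BacklinkRow {
    final int pageScore;
    final String sourceTitle;
    final String sourceUrl;
    final String targetUrl;
    final String anchor;      // the link text in the source page
    final int externalNum;
    final int internalNum;
    final Instant firstSeen;  // assumption: epoch seconds
    final Instant lastSeen;   // assumption: epoch seconds

    // Builds a row from the nine already-split fields, in header order.
    BacklinkRow(List<String> f) {
        if (f.size() != 9) {
            throw new IllegalArgumentException("expected 9 fields, got " + f.size());
        }
        pageScore = Integer.parseInt(f.get(0));
        sourceTitle = f.get(1);
        sourceUrl = f.get(2);
        targetUrl = f.get(3);
        anchor = f.get(4);
        externalNum = Integer.parseInt(f.get(5));
        internalNum = Integer.parseInt(f.get(6));
        firstSeen = Instant.ofEpochSecond(Long.parseLong(f.get(7)));
        lastSeen = Instant.ofEpochSecond(Long.parseLong(f.get(8)));
    }
}
```

Rejecting rows with the wrong field count keeps a badly split line from silently shifting values into the wrong columns.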

Phil 3.24.16

7:00 – 10:00, 11:00 – 3:00 VTX

Was going to continue The Law of Group Polarization, but got sucked into the following. On a related note, I peeked at the group sensemaking paper from CSCW and realized that they are dealing with group polarization issues.
Soooooooooo, I went back to check the links that the google search “link:http://dotearth.blogs.nytimes.com” brings up. In looking at the pages (mostly other blog-like sites), the link to dotearth is almost always in the blogroll list that’s off to the side on many of these sites. For example look at the lower right on climatecentral.org, and you’ll see the link.
I think this makes sense. These are the generic pages that point to other generic pages. So I went back to Google and searched for ‘Paul Krugman blog‘ and then looked for the oldest post that I could find in the result, which was this one from January 16. Top ratings means that it has to be linked to a lot, so I tried “link:krugman.blogs.nytimes.com/2016/01/23/how-to-make-donald-trump-president/“. Alas, that doesn’t return anything, though “link:krugman.blogs.nytimes.com” does.
So I went to the Wikipedia most referenced pages page. Top ranked was Geographic coordinate system, which has over 600k inbound links. But –
- link:en.wikipedia.org/wiki/Geographic_coordinate_system doesn’t return anything
- And even more disturbing, link:wikipedia.org doesn’t return anything either. Clearly I’m doing something wrong.
Apparently, this is Google being coy. Searching for backlinks can be expensive. Moz has plans that start at $500/month. Bing also seems to have something with an API. Starting to check that out.
- Added philfeldman.com to my Bing webmaster profile. Had to add BingSiteAuth.xml to the site.
- Nope, looks like it’s just the verified pages

Looking at SEMrush. Pretty straightforward and $15 buys you 7,500 lines of results.

Here’s the REST-ish API

Here’s the first format I’ve tried:

http://api.semrush.com/analytics/v1/?key=xxxxxxxxxxxxxxxxxxxxxx&target=boardsanctions.com/&type=backlinks&target_type=root_domain&display_sort=page_score_desc&display_limit=10
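
For the query generator, a sketch of assembling that URL in code. It uses only the parameters visible in the URL above; SemRushQuerySketch and backlinksUrl are hypothetical names. One difference from the hand-written URL: the target is percent-encoded here, so the trailing slash goes out as %2F, on the assumption that the service decodes query values in the standard way.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SemRushQuerySketch {
    static final String BASE = "http://api.semrush.com/analytics/v1/";

    // Builds a backlinks query sorted by descending page score.
    static String backlinksUrl(String key, String target, String targetType, int limit) {
        return BASE
                + "?key=" + enc(key)
                + "&target=" + enc(target)
                + "&type=backlinks"
                + "&target_type=" + enc(targetType)
                + "&display_sort=page_score_desc"
                + "&display_limit=" + limit;
    }

    // Percent-encodes a single query value.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

Keeping display_limit as an explicit argument matters here, since each returned line counts against the purchased quota.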

The first thing I tried out was on my angular blog entry, and this is what comes back:

page_score;source_title;source_url;target_url;anchor;external_num;internal_num;first_seen;last_seen
1;Philip Feldman;http://philfeldman.com/resume.html;https://phifel.wordpress.com/;blog;7;2;1435698192;1452178691
1;Phil Feldman Resume (WebGL);http://philfeldman.com/;https://phifel.wordpress.com/;My Primary Blog;15;4;1424207638;1452178080
1;Phil Feldman Resume (WebGL);http://www.philfeldman.com/;https://phifel.wordpress.com/;My Primary Blog;15;4;1435689880;1452178091

Pretty good! Very clean. Then I tried boardsanctions.com:

page_score;source_title;source_url;target_url;anchor;external_num;internal_num;first_seen;last_seen
0;Plastic Surgery - Avoiding The Nightmare Case - Social Gaming Wiki FR;http://fr.socialgamingwiki.com/index.php/Plastic_Surgery_-_Avoiding_The_Nightmare_Case;http://boardsanctions.com/;Georgia Medical Board Actions;4;32;1454582397;1454582397
0;Plastic Surgeon - Advice To Allow You Choose – TFC;http://www.tvfc.de/index.php?printable=yes&title=Plastic_Surgeon_-_Advice_To_Allow_You_Choose;http://boardsanctions.com/;Doctors to avoid;2;28;1452634501;1452634501
0;Finding A Plastic Surgeon In Your Area – TheorieWiki;http://theoriewiki.org/index.php?oldid=8721&title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Ohio Medical Board Actions;4;40;1451297137;1451297137
0;How To Prepare For Your Breast Augmentation – TheorieWiki;http://theoriewiki.org/index.php?title=How_To_Prepare_For_Your_Breast_Augmentation;http://boardsanctions.com/;Doctor Complaints;4;33;1444916428;1453210146
0;Finding A Plastic Surgeon In Your Area: Unterschied zwischen den Versionen – TheorieWiki;http://theoriewiki.org/index.php?diff=8723&oldid=8721&title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Florida Medical Board Sanctions;4;39;1457400844;1457400844
0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;NC Medical Board Actions;5;35;1448297485;1458043290
0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;http://boardsanctions.com/;5;35;1448297485;1458043290
0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?printable=yes&title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;NC Medical Board Actions;5;30;1456257160;1457931212
0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?printable=yes&title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;http://boardsanctions.com/;5;30;1456257160;1457931212
0;Finding A Plastic Surgeon In Your Area – TheorieWiki;http://theoriewiki.org/index.php?title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Florida Medical Board Sanctions;4;33;1443858328;1457622408

Note that it’s a good thing I’m limiting the results to 10! The second thing to notice is every one of these links is SEO garbage. This one is my favorite. Now, this is ordered according to rank (however that’s calculated) and maybe there are better ways to order the results, but this does make me nervous about using backlinks without some checking. Maybe cosine similarity?
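
As a rough illustration of the cosine similarity idea, the sketch below compares two pieces of text as bags of words. What to compare (for example the backlink's source title or anchor against text from the target page) is an open assumption, and CosineSketch is a hypothetical name. A low score would only be a hint that a backlink is off-topic, not proof that it is spam.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class CosineSketch {
    // Lowercases the text and counts each run of letters or digits as a term.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine of the angle between two term-count vectors; 0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0 || nb == 0) ? 0 : dot / (na * nb);
    }

    private static double norm(Map<String, Integer> m) {
        double sum = 0;
        for (int v : m.values()) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }
}
```

Raw term counts are the simplest weighting. Something like TF-IDF would probably discriminate better, but it needs a corpus to compute document frequencies against.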
So the last option, if we want to spend some money, is to use the Common Crawl for backlinks. Not sure if it would make any difference, but there would be more insight. As an example, there’s wikireverse, which did exactly that.
