Category Archives: Ranking

Phil 10.30.18

7:00 – 3:30 ASRC PhD

  • Search as embodied in the “Ten Blue Links” meets the requirements of a Perrow “Normal Accident”
    • The search results are densely connected. That’s how PageRank works. Even latent connections matter.
    • A change in a page’s popularity rapidly affects its rank, so the connections are stiff.
    • The relationships of the returned links, both to each other and to the broader information landscape, are hidden.
    • An additional density and stiffness issue is that everyone uses Google, so there is a dense, stiff connection between the search engine and the population of users.
  • Write up something about how
    • ML can make maps, which decrease the likelihood of IR contributing to normal accidents
    • AI can use these maps to understand the shape of human belief space, and where the positive regions and dangerous sinks are.
  • Two measures for maps are the concepts of Range and Length. Range is the distance over which a trajectory can be placed on the map and remain contiguous. Length is the total distance that a trajectory travels, independent of the map it’s placed on.
  • Write up the basic algorithm for ML-based map production
    • Take as input a set of trajectories that are known to be in the same belief region (this is why JuryRoom is needed)
    • Generate an N-dimensional coordinate frame that best preserves length over the greatest range.
    • What is used as the basis for the trajectory may matter. The range, at a minimum, can go from letters to high-level topics. I think any map reconstruction based on letters would be a tangle, with clumps around TH, ER, ON, and AN. At the other end, an all-encompassing meta-topic like WORDS would be a single point: accurate, but useless. So map reconstruction will become possible somewhere between these two extremes.
  • The Nietzsche text is pretty good. In particular, check out the way the sentences form based on the seed “s when one is being cursed.”
    • the fact that the spirit of the spirit of the body and still the stands of the world
    • the fact that the last is a prostion of the conceal the investion, there is our grust
    • the fact them strongests! it is incoke when it is liuderan of human particiay
    • the fact that she could as eudop bkems to overcore and dogmofuld
    • In this case, the first 2-3 words are the same, followed by random, semi-structured text. That’s promising, since the comparison would be on the seed plus the generated text.
  • Today, see how fast a “Shining” (All work and no play makes Jack a dull boy.) text can be learned and then try each keyword as a start. As we move through the sentence, the probability of the next words should change.
    • Generate the text set
    • Train the Nietzsche model on the new text. Done. Here are examples with one epoch and a batch size of 32, with a temperature of 1.0:
      ----- diversity: 0.2
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
      ----- diversity: 0.5
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
      ----- diversity: 1.0
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a dull boy anl wory and no play makes jand no play makes jack a dull boy all work and no play makes jack a 
      ----- diversity: 1.2
      ----- Generating with seed: "es jack a 
      dull boy all work and no play"
      es jack a 
      dull boy all work and no play makes jack a pull boy all work and no play makes jack andull boy all work and no play makes jack a dull work and no play makes jack andull

      Note that the errors start with a temperature of 1.0 or greater

    • Rewrite the last part of the code to generate text based on each word in the sentence.
      • So I tried that and got gobbledygook. The issue is that the prediction only works on window-sized (maxlen) chunks. To verify this, I created a seed from the input text, truncating it to maxlen (20 in this case):
        sentence = "all work and no play makes jack a dull boy"[:maxlen]

        That worked, but it means that the character-based approach isn’t going to work

        ----- temperature: 0.2
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
        ----- temperature: 0.5
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes 
        ----- temperature: 1.0
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy pllwwork wnd no play makes 
        ----- temperature: 1.2
        ----- Generating with seed: [all work and no play]
        all work and no play makes jack a dull boy all work and no play makes jack a dull boy all work and no play makes


    • Based on this result and the ensuing chat with Aaron, we’re going to revisit the whole LSTM with numbers and build out a process that will support words instead of characters.
  • Looking for CMAC models, I found Self Organizing Feature Maps at
  • Here’s How Much Bots Drive Conversation During News Events
    • Late last week, about 60 percent of the conversation was driven by likely bots. Over the weekend, even as the conversation about the caravan was overshadowed by more recent tragedies, bots were still driving nearly 40 percent of the caravan conversation on Twitter. That’s according to an assessment by Robhat Labs, a startup founded by two UC Berkeley students that builds tools to detect bots online. The team’s first product, a Chrome extension called, allows users to see which accounts in their Twitter timelines are most likely bots. Now it’s launching a new tool aimed at news organizations called, which allows journalists to see how much bot activity there is across an entire topic or hashtag
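The “diversity” values in the runs above act as a sampling temperature on the model’s next-character probabilities. A minimal sketch of temperature-scaled sampling, assuming the model emits a softmax probability vector; the function names `reweight` and `sample_index` are hypothetical, not taken from the Nietzsche script:

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Rescale a probability vector by a sampling temperature.

    Dividing log-probabilities by the temperature and renormalizing
    sharpens the distribution when temperature < 1 and flattens it
    when temperature > 1 (more chances to pick unlikely characters).
    """
    preds = np.asarray(preds, dtype=np.float64)
    logits = np.log(preds) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def sample_index(preds, temperature=1.0, rng=None):
    """Draw one character index from the temperature-adjusted distribution."""
    rng = np.random.default_rng() if rng is None else rng
    probs = reweight(preds, temperature)
    return int(rng.choice(len(probs), p=probs))

if __name__ == "__main__":
    preds = [0.6, 0.3, 0.1]  # hypothetical next-character probabilities
    for t in (0.2, 0.5, 1.0, 1.2):
        print(t, np.round(reweight(preds, t), 3))
```

At 0.2 nearly all the mass lands on the top character, which is why the low-temperature runs repeat the training sentence exactly; at 1.0 and above the tail gets enough probability for misspellings like “wory” to appear.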

Phil 2.21.18

7:00 – 6:00 ASRC MKT

  • Wow – I’m going to the TensorFlow Summit! Need to get a hotel.
  • Dimension reduction + velocity in this thread
  • Global Pose Estimation with an Attention-based Recurrent Network
    • The ability for an agent to localize itself within an environment is crucial for many real-world applications. For unknown environments, Simultaneous Localization and Mapping (SLAM) enables incremental and concurrent building of and localizing within a map. We present a new, differentiable architecture, Neural Graph Optimizer, progressing towards a complete neural network solution for SLAM by designing a system composed of a local pose estimation model, a novel pose selection module, and a novel graph optimization process. The entire architecture is trained in an end-to-end fashion, enabling the network to automatically learn domain-specific features relevant to the visual odometry and avoid the involved process of feature engineering. We demonstrate the effectiveness of our system on a simulated 2D maze and the 3D ViZ-Doom environment.
  •  Slides
    • Location
    • Orientation
    • Velocity
    • IR context -> Sociocultural context
  • Writing Fika. Make a few printouts of the abstract
    • It kinda happened. W
  • Write up LMN4A2P thoughts. Took the following and put them in a LMN4A2P roadmap document in Google Docs
    • Storing corpora (raw text, BoW, TF-IDF, Matrix)
      • Uploading from file
      • Uploading from link/crawl
      • Corpora labeling and exploring
    • Index with ElasticSearch
    • Production of word vectors or ‘effigy documents’
    • Effigy search using Google CSE for public documents that are similar
      • General
      • Site-specific
      • Semantic (Academic, etc)
    • Search page
      • Lists (reweightable) of terms and documents
      • Cluster-based map (pan/zoom/search)
  • I’m as enthusiastic about the future of AI as (almost) anyone, but I would estimate I’ve created 1000X more value from careful manual analysis of a few high quality data sets than I have from all the fancy ML models I’ve trained combined. (Thread by Sean Taylor on Twitter, 8:33 Feb 19, 2018)
  • Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.
  • Done with Angular fundamentals. redirectTo isn’t working, though…
    • zone.js:405 Unhandled Promise rejection: Invalid configuration of route '': redirectTo and component cannot be used together ; Zone: <root> ; Task: Promise.then ; Value: Error: Invalid configuration of route '': redirectTo and component cannot be used together
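The corpus representations in the roadmap (raw text, BoW, TF-IDF, matrix) build on one another. A minimal pure-Python sketch of going from raw text to bag-of-words counts and then to TF-IDF weights, assuming whitespace tokenization and the plain tf × log(N/df) weighting (the roadmap doesn’t say which TF-IDF variant is planned):

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Turn raw text into per-document term counts (lowercased, whitespace-split)."""
    return [Counter(doc.lower().split()) for doc in docs]

def tf_idf(bows):
    """Weight each raw count by log(N / df), df = number of documents containing the term."""
    n_docs = len(bows)
    # Iterating a Counter yields each term once, so this counts documents, not occurrences
    df = Counter(term for bow in bows for term in bow)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bow.items()}
        for bow in bows
    ]

if __name__ == "__main__":
    docs = ["all work and no play", "no play makes jack a dull boy", "all work"]
    for weights in tf_idf(bag_of_words(docs)):
        print({t: round(w, 3) for t, w in sorted(weights.items())})
```

Terms that appear in every document get a weight of zero, which is what makes the resulting matrix useful for the effigy-document step.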

Phil 1.14.18

Pondering what a good HI-LO game would be for a presentation:

  • Ask the audience to choose A or B, based on what they think the most likely answer is. Show of hands, B, then A.
  • Describe the H/L chart and cooperative game theory, and how traditional game theory can’t account for why LL makes less sense to us than HH
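The point of the Hi-Lo game is that standard equilibrium analysis can’t separate HH from LL. A small sketch that enumerates the pure-strategy Nash equilibria, assuming the conventional illustrative payoffs of 2 each for HH, 1 each for LL and 0 for a mismatch (the chart in the presentation may use different numbers):

```python
from itertools import product

# Assumed symmetric Hi-Lo payoffs: (row player, column player)
PAYOFFS = {
    ("H", "H"): (2, 2),
    ("L", "L"): (1, 1),
    ("H", "L"): (0, 0),
    ("L", "H"): (0, 0),
}
STRATEGIES = ("H", "L")

def pure_nash_equilibria(payoffs, strategies):
    """Return the profiles where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        row_best = all(payoffs[(r, col)][0] <= row_pay for r in strategies)
        col_best = all(payoffs[(row, c)][1] <= col_pay for c in strategies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria

if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
```

Both HH and LL come out as equilibria, so the analysis by itself gives no reason to prefer HH, even though almost everyone in the audience will.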

Phil 9.21.17

6:00 – 10:30, 1:00 – 6:00 ASRC MKT

  • I think there is a difference between exploring (a deliberate exposure to things unknown) and serendipity (an accidental encounter with the unknown). In the first case, the mind is prepared for the situation. In the second, the mind needs to be receptive to the serendipity. I think that design may matter a lot here. A serendipitous result low on a list may not have the same impact as a point on a map or a line in a story.
  • Oxford English Dictionary definitions of:
    • serendipity: “the faculty of making happy and unexpected discoveries by accident”.  
    • explore:  An act of exploring an unfamiliar place; an exploration, an excursion. 
    • discover: To disclose, reveal, etc., to others or (later) oneself; to find out.
    • sagacity: Acuteness of mental discernment; aptitude for investigation or discovery; keenness and soundness of judgement in the estimation of persons and conditions, and in the adaptation of means to ends; penetration, shrewdness.
    • synchronicity: the phenomenon of events which coincide in time and appear meaningfully related but have no discoverable causal connection.
  • Skimming these
    • The bohemian bookshelf: supporting serendipitous book discoveries through information visualization
      • A Thudt, U Hinrichs, S Carpendale
      • Serendipity, a trigger of exciting discoveries when we least expect it, is currently being discussed as an often neglected but still important factor in information seeking processes, research, and ideation. In this paper we explore serendipity as an information visualization goal. In particular, we introduce the Bohemian Bookshelf visualization that aims to support serendipitous exploration of digital book collections. The Bohemian Bookshelf consists of five interlinked visualizations, each representing a unique (over)view of the collection. It facilitates serendipitous discoveries by (1) offering multiple access points by providing visualizations of different perspectives on the book collection, (2) enticing curiosity through abstract, metaphorical, and visually distinct representations of the collection, (3) highlighting alternate adjacencies between books, (4) providing multiple pathways for exploring the data collection in a flexible way, (5) supporting immediate previews of books, and (6) enabling a playful approach to information exploration. Our design goals and their exploration through the Bohemian Bookshelf visualization opens up a discussion on how to promote serendipity through information visualization.
      • six design goals that we have derived for promoting serendipitous discoveries through information visualization.
      • Austin coined the term altamirage that describes serendipitous discoveries as a result of chance paired with individual traits of the exploring person [2, 29].
      • This is closely related to the notion of synchronicity where related ideas may manifest as simultaneous occurrences that seem acausal but still meaningful [29].
      • The prevalence of these ideas of chance, fortuity, and coincidence in the discussion around serendipity has led to a tendency to trivialize this complex concept by assuming that serendipity can be supported simply through the introduction of randomness.
      • The design of the Bohemian Bookshelf offers multiple pathways through the book collection by (1) providing multiple interactive overviews of the book collection that can guide the information seeker into different and interesting directions, (2) the presentation of adjacent data that can act as visual signposts providing alternatives for the viewer to move through the dataset by following up on related books, and (3) emphasizing cross visualization attributes by mutual highlighting as in coordinated views [3, 7]
      • multiple pathways through the book collection that can provide guidance in a serendipitous way. The visual overviews can provide one way of exploring books. For instance, visitors can systematically browse through all books of their favourite colour and, in this way, possibly encounter books that are of interest to them but that they did not think of to search for directly. Furthermore, emphasizing adjacent books can be considered as visual signposts. For instance, following up on highlighted books in the Book Pile is likely to rapidly guide people serendipitously to different topical areas of the book collection. As a third approach to multiple pathways, all visualizations of the Bohemian Bookshelf are interlinked with each other. Therefore, every selection of a book in one visualization can be considered a cross road to the other visualizations that highlight this selection as well in their particular context.
      • We deliberately designed the Bohemian Bookshelf to provide multiple overviews of the entire book collection to provide opportunities to discover unexpected trends and relations within the collection.
    • Discovery is never by chance: designing for (un)serendipity – finished. Good paper!
      • P André, J Teevan, ST Dumais
        • Serendipity has a long tradition in the history of science as having played a key role in many significant discoveries. Computer scientists, valuing the role of serendipity in discovery, have attempted to design systems that encourage serendipity. However, that research has focused primarily on only one aspect of serendipity: that of chance encounters. In reality, for serendipity to be valuable chance encounters must be synthesized into insight. In this paper we show, through a formal consideration of serendipity and analysis of how various systems have seized on attributes of interpreting serendipity, that there is a richer space for design to support serendipitous creativity, innovation and discovery than has been tapped to date. We discuss how ideas might be encoded to be shared or discovered by “association-hunting” agents. We propose considering not only the inventor’s role in perceiving serendipity, but also how that inventor’s perception may be enhanced to increase the opportunity for serendipity. We explore the role of environment and how we can better enable serendipitous discoveries to find a home more readily and immediately.
        • there is “no discovery of a thing you are looking for”
        • However, most systems designed to induce or facilitate serendipity have focused on the first aspect, subtly encouraging chance encounters, while ignoring the second part, making use of those encounters in a productive way.
        • Especially, however, we want to offer approaches to get at the desired effect of serendipity: insight
        • For us, serendipity is:
          1. the finding of unexpected information (relevant to the goal or not) while engaged in any information activity,
          2. the making of an intellectual leap of understanding with that information to arrive at an insight
        • In our study, a number of participants remarked that they thought of themselves as ‘serendipitous’, and were surprised to find no instances of it in their search behaviour.
          • This is because exploring is not serendipity. See first point above
        • Click entropy, a direct measure of how varied the result clicks are for the query, was found to be significant. That is, a positive correlation between entropy and the number of potentially serendipitous results suggests that people may have clicked varied results not just because they could not find what they wanted, but because they considered more things interesting, or were more willing to go off at a tangent.
        • Arguably however, almost all visualization systems are designed to support such a goal: identifying interesting, but unknown, trends or patterns in data that would not have been visible otherwise.
        • Erdelez’s [12] so-called ‘super-encounterers’, encountering unexpected information on a regular basis, even counting on it as an important element in information acquisition.
        • Instead of treating serendipity as arcane, mysterious and accidental, we embrace the ability of computers to help us perceive connections and opportunities in various pieces of information
        • presenting such information to users has the potential to increase the overall information the user must interact with. This can lead to two problems: distraction or overload, and the negative consequences of incorrect or problematic recommendations or assumptions
        • It is widely acknowledged that serendipitous discoveries are preceded by a period of preparation and incubation [7]. They are, in that respect, not as ‘serendipitous’ as we might expect, being the product of mental preparation as well as of an open and questioning mind
        • The challenge from a design perspective may not necessarily be discovering domain literature opportunities, but defining mechanisms for presenting these suggestions in ways that are effective for the investigator. Further to creating a reading list is defining the space to deliver them opportunistically
        • This idea again supposes a form of common language model, a way to express interest or expertise in particular areas, and a way to search for results.
        • In this spectrum, we have also demonstrated that computer science has spent most of its design effort perhaps overly focused on trying to create insight (effect of serendipity), by recreating the cause (chance), rather than on, for instance, increasing the rate and accuracy of proposed candidates for serendipitous insight, or developing domain expertise
  • Ordered this, too: Information Visualization: Beyond the Horizon. Has quite a bit on maps that’s going to be needed in the implications for design section
  • What is a Diagram?
    • This paper responds to renewed interest in the centuries old question of what is a diagram. Existing status of our understanding of diagrams is seen as unsatisfactory and confusing. This paper responds to this by proposing a framework for understanding diagrams based on symbolic and spatial mapping. The framework deals with some complex problems any useful definition of diagrams has to deal with. These problems are the variety of diagrams, meaningful dynamics of diagramming, handling change in diagrams in a well formed way, and all of this in the context of semantically mixed diagrams. A brief description of the framework is given discussing how it addresses the problems.
  • Supporting serendipity: Using ambient intelligence to augment user exploration for data mining and web browsing.
    • Has some very Research-Browser-ish bits in it
    • an agent-based system to support internet browsing. It models the user’s behaviour to look ahead at linked web pages and their word frequencies, using a Bayesian approach to determine relevance. It then colours links on the page depending on their relevance. In evaluation, the colouring was seen as successful, with people tending to follow the strongly advised links most of the time.
  • Retroactive answering of search queries
    • Major search engines currently use the history of a user’s actions (e.g., queries, clicks) to personalize search results. In this paper, we present a new personalized service, query-specific web recommendations (QSRs), that retroactively answers queries from a user’s history as new results arise. The QSR system addresses two important subproblems with applications beyond the system itself: (1) Automatic identification of queries in a user’s history that represent standing interests and unfulfilled needs. (2) Effective detection of interesting new results to these queries. We develop a variety of heuristics and algorithms to address these problems, and evaluate them through a study of Google history users. Our results strongly motivate the need for automatic detection of standing interests from a user’s history, and identifies the algorithms that are most useful in doing so. Our results also identify the algorithms, some which are counter-intuitive, that are most useful in identifying interesting new results for past queries, allowing us to achieve very high precision over our data set.
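André et al. use click entropy as a measure of how varied the clicked results are for a query. A minimal sketch, assuming the usual Shannon-entropy form over the distribution of clicks across result URLs (the paper’s exact formulation may differ, and the URLs below are placeholders):

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Shannon entropy, in bits, of the click distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    # sum of p * log2(1/p); higher means clicks are spread over more distinct results
    return sum((c / total) * math.log2(total / c) for c in counts.values())

if __name__ == "__main__":
    focused = ["a.example"] * 4
    varied = ["a.example", "b.example", "c.example", "d.example"]
    print(click_entropy(focused), click_entropy(varied))
```

A query where everyone clicks the same result scores 0; four equally popular results score 2 bits.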

Phil 6.2.16

7:00 – 5:00 VTX

  • Writing
  • Write up sprint story – done
    • Develop a ‘training’ corpus of known bad actors (KBA) for each domain.

      • KBAs will be pulled from, which provides a large list.
      • List of KBAs will be added to the content rating DB for human curation
      • HTML and PDF data will be used to populate a list of documents that will then be scanned and analyzed to prepare TF-IDF and LSI term-document tables.
      • The resulting table will in turn be analyzed using term centrality, with the output being an ordered list of terms to be evaluated for each domain.

  • Building view to get person, rating and link from the db – done, or at least V1
    CREATE VIEW view_ratings AS
      select, qo.search_type, po.first_name, po.last_name, po.pp_state, ro.person_characterization from item_object io
        INNER JOIN query_object qo ON io.query_id =
        INNER JOIN rating_object ro on = ro.result_id
        INNER JOIN poi_object po on qo.provider_id =;
  • Took results from and ran them through the whole system. The full results are in the Corpus file under and The results seem to make incredibly specific searches. Here are the first two examples. Note that there are very few .com sites:

Phil 5.17.16

7:00 -7:00

  • Great discussion with Greg yesterday. Very encouraging.
  • Some thoughts that came up during Fahad’s (Successful!) defense
    • It should be possible to determine the ‘deletable’ codes at the bottom of the ranking by setting the allowable difference between the initial ranking and the trimmed rank.
    • The ‘filter’ box should also be set by clicking on one of the items in the list of associations for the selected items. This way, selection is a two-step process in this context.
    • Suggesting grouping of terms based on connectivity? Maybe second degree? Allows for domain independence?
    • Using a 3D display to show the shared second, third and nth degree as different layers
    • NLP tagged words for TF-IDF to produce a more characterized matrix?
    • 50 samples per iteration, 2,000 iterations? Check! And add info to spreadsheet! Done, and it’s 1,000 iterations
  • Writing
  • Parsing Jeremy’s JSON file
    • Moving the OptionalContent and JsonLoadable over to JavaUtils2
    • Adding javax.persistence-2.1.0
    • Adding json-simple-1.1.1
    • It worked, but it’s junk. It looks like these are un-curated pages
  • Long discussion with Aaron about calculating flag rollups.

Phil 5.11.16

7:00 – 4:30 VTX

  • Continuing paper – working on the ‘motivations’ section
  • Need to set the mode to interactive after a successful load
  • Need to find out where the JSON ratings are in the medicalpractitioner db? Or just rely on Jeremy’s interface? I guess it depends on what gets blown away. But it doesn’t seem like the JSON is in the db.
  • Added a stanfordNLP package to JavaUtils
    • NLPtoken stores all the extracted information about a token (word, lemma, index, POS, etc)
    • DocumentStatistics holds token data across one or more documents
    • StringAnnotator parses strings into NLPtokens.
  • Fixed a bunch of math issues (in Excel, too), and here are the two versions:
    am = 1.969
    be = 2.523
    da = 0.984
    do = 1.892
    i = 1.761
    is = 1.130
    it = 1.130
    let = 1.130
    not = 1.380
    or = 3.523
    thfor = 1.380
    think = 1.380
    to = 1.469
    what = 1.380

    And Excel:

    da	is	it	let	not	thfor	think	what	to	i	do	am	be	or
    0.984	1.130	1.130	1.130	1.380	1.380	1.380	1.380	1.469	1.761	1.892	1.969	2.523	3.523
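The two listings above should hold the same values, just in a different order. A quick check that sorting the program’s output by value reproduces the Excel column order (Python’s sort is stable, so tied values keep the alphabetical order they were entered in):

```python
# Values copied from the program output above
program_output = {
    "am": 1.969, "be": 2.523, "da": 0.984, "do": 1.892, "i": 1.761,
    "is": 1.130, "it": 1.130, "let": 1.130, "not": 1.380, "or": 3.523,
    "thfor": 1.380, "think": 1.380, "to": 1.469, "what": 1.380,
}

# Column order from the Excel sheet above
excel_order = ["da", "is", "it", "let", "not", "thfor", "think",
               "what", "to", "i", "do", "am", "be", "or"]

# Ascending by value; ties stay in insertion (alphabetical) order
program_order = [term for term, _ in sorted(program_output.items(), key=lambda kv: kv[1])]

print(program_order == excel_order)
```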

Phil 5.5.16

7:00 – 5:30 VTX

  • Continuing An Introduction to the Bootstrap.
  • This helped a lot. I hope it’s right…
  • Had a thought about how to build the Bootstrap class. Build it using RealVector and then use the RealVectorPreservingVisitor interface to do whatever calculation is desired. Default methods for Mean, Median, Variance and StdDev. It will probably need arguments for max iteration and epsilon.
  • Didn’t do that at all. Wound up using ArrayRealVector for the population and Percentile to hold the mean and variance values. I can add something else later
  • To capture how the centrality affects the makeup of the data in a matrix, I think it makes sense to use the normalized eigenvector to multiply the counts in the initial matrix and submit that population (the whole matrix) to the Bootstrap
  • Meeting with Wayne? Need to finish tool updates though.
  • Got bogged down in understanding the Percentile class and how binomial distributions work.
  • Built and then fixed a copy ctor for Labled2DMatrix.
  • Testing. It looks ok, but I want to try multiplying the counts by the eigenVec. Tomorrow.
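The bootstrap resamples a population with replacement and looks at the spread of a statistic across the resamples. A minimal NumPy sketch of a percentile bootstrap for the mean, standing in for the Java `Bootstrap` class; the function name, resample count and confidence level here are assumptions, not the tool’s settings:

```python
import numpy as np

def bootstrap_mean(population, n_resamples=1000, confidence=0.95, seed=None):
    """Percentile bootstrap for the mean of a 1-D population.

    Returns the mean of the bootstrap means, their variance, and the
    (low, high) percentile confidence interval.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(population, dtype=np.float64)
    # Each row is one resample, same size as the data, drawn with replacement
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    means = resamples.mean(axis=1)
    alpha = (1.0 - confidence) / 2.0
    low, high = np.percentile(means, [100 * alpha, 100 * (1 - alpha)])
    return means.mean(), means.var(ddof=1), (low, high)

if __name__ == "__main__":
    counts = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # hypothetical matrix counts
    mean, var, (low, high) = bootstrap_mean(counts, seed=42)
    print(round(mean, 2), round(var, 3), (round(low, 2), round(high, 2)))
```

To bootstrap a whole eigenvector-weighted matrix as described above, the matrix could be flattened (e.g. `matrix.ravel()`) before being passed in.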

Phil 5.3.16

7:00 – 3:30 VTX

  • Out riding, I realized that I could have a column called ‘counts’ that would add up the total number of ‘terms per document’ and ‘documents per term’. Unitizing the values would then show the number of unique terms per document. That’s useful, I think.
  • Helena pointed to an interesting CHI 2016 site. This is sort of the other side of extracting pertinence from relevant data. I wonder where they got their data from?
    • Found it! It’s in a public set of Google docs, in XML and JSON formats. I found it by looking at the GitHub home page. In the example code there was this structure:
      source: {
          gdocId: '0Ai6LdDWgaqgNdG1WX29BanYzRHU4VHpDUTNPX3JLaUE',
          tables: "Presidents"
      }

      That gave me a hint of what to look for in the document source of the demo, where I found this:

      var urlBase = '';

      And that’s the link from above.

    • There appear to be other useful data sets as well. For example, there is an extensive CHI paper database sitting behind this demo.
    • So this makes generalizing the PageRank approach much simpler, since it looks like I can pull the data down pretty easily. In my case I think the best thing would be to write small apps that pull down the data and build Excel spreadsheets that are read in by the tool for now.
  • Exporting a new data set from Atlas. Done and committed. I need to do runs before meeting with Wayne.
  • Added Counts in and refactored a bit.
  • I think I want a list of what a doc or term is directly linked to and the number of references. Added the basics. Wiring up next. Done! But now I want to click on an item in the counts list and have it be selected? Or at least highlighted?
  • Stored the new version on dropbox:
  • Meeting with Wayne
    • There’s some bug with counts. Add it to the WeightedItem.toString() and test.
    • Add a ‘move to top’ button near the weight slider that adds just enough weight to move the item to the top of the list. This could be iterative?
    • Add code that compares the population of ranks with the population of scaled ranks. Maybe bootstrapping? Apache Commons Math has KolmogorovSmirnovTest, which has public double kolmogorovSmirnovTest(double[] x, double[] y, boolean strict), which looks promising.
  • Added ability to log out of the rating app.
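The ‘counts’ column idea can be sketched with a small term-document matrix. A minimal NumPy example, assuming rows are documents and columns are terms (the matrix is made up for illustration): row and column sums give the totals, and unitizing (setting every nonzero cell to 1) turns those sums into unique terms per document and the number of documents containing each term.

```python
import numpy as np

# Hypothetical counts: rows are documents, columns are terms
counts = np.array([
    [2, 0, 1, 0],
    [0, 3, 1, 1],
    [1, 1, 0, 0],
])

terms_per_doc = counts.sum(axis=1)   # total term occurrences in each document
docs_per_term = counts.sum(axis=0)   # total occurrences of each term across documents

unitized = (counts > 0).astype(int)  # 1 wherever a term appears at all
unique_terms_per_doc = unitized.sum(axis=1)
docs_containing_term = unitized.sum(axis=0)

print(terms_per_doc, unique_terms_per_doc)
print(docs_per_term, docs_containing_term)
```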

Phil 4.29.16

7:00 – 5:00 VTX

  • Expense reports and timesheets! Done.
  • Continuing Informed Citizenship in a Media-Centric Way of Life
    • The pertinence interface may be an example of a UI affording the concept of monitorial citizenship.
      • Page 219: The monitorial citizen, in Schudson’s (1998) view, does environmental surveillance rather than gathering in-depth information. By implication, citizens have social awareness that spans vast territory without having in-depth understanding of specific topics. Related to the idea of monitorial instead of informed citizenship, Pew Center (2008) data identified an emerging group of young (18–34) mobile media users called news grazers. These grazers find what they need by switching across media platforms rather than waiting for content to be served.
    • Page 222: Risk as Feelings. The abstract is below. There is an emotional hacking aspect here that traditional journalism has used (heuristically?) for most(?) of its history.
      • Virtually all current theories of choice under risk or uncertainty are cognitive and consequentialist. They assume that people assess the desirability and likelihood of possible outcomes of choice alternatives and integrate this information through some type of expectation-based calculus to arrive at a decision. The authors propose an alternative theoretical perspective, the risk-as-feelings hypothesis, that highlights the role of affect experienced at the moment of decision making. Drawing on research from clinical, physiological, and other subfields of psychology, they show that emotional reactions to risky situations often diverge from cognitive assessments of those risks. When such divergence occurs, emotional reactions often drive behavior. The risk-as-feelings hypothesis is shown to explain a wide range of phenomena that have resisted interpretation in cognitive–consequentialist terms.
    • At page 223 – Elections as the canon of participation

  • Working on getting tables to sort – Done

  • Loading excel file -done
  • Calculating – done
  • Using weights -done
  • Reset weights – done
  • Saving (don’t forget to add sheet with variables!) – done
  • Wrapped in executable – done
  • Uploading to dropbox. Wow – the files with JavaFX are *much* bigger than Swing.

Phil 4.28.16

7:00 – 5:00 VTX

  • Reading Informed Citizenship in a Media-Centric Way of Life
    • Jessica Gall Myrick
    • This is a bit out of the concentration of the thesis, but it addresses several themes that relate to system and social trust. And I’m thinking that behind these themes of social vs. system is the Designer’s Social Trust of the user. Think of it this way: If the designer has a high Social Trust intention with respect to the benevolence of the users, then a more ‘human’ interactive site may result with more opportunities for the user to see more deeply into the system and contribute more meaningfully. There is risks in this, such as hellish comment sections, but also rewards (see the YouTube comments section for The Idea Channel episodes). If the designer has a System Trust intention with respect to say, the reliability of the user watching ads, then different systems get designed that learns to generate click-bait using neural networks such as clickotron). Or, closer to home, Instagram might decide to curate a feed for you without affordances to support changing of feed options. The truism goes ‘If you’re not paying, then you’re the product’. And products aren’t people. Products are systems.
    • Page 218: Graber (2001) argues that researchers often treat the information value of images as a subsidiary to verbal information, rather than having value themselves. Slowly, studies employing visual measures and examining how images facilitate knowledge gain are emerging (Grabe, Bas, & van Driel, 2015; Graber, 2001; Prior, 2014). In a burgeoning media age with citizens who overwhelmingly favor (audio)visually distributed information, research momentum on the role of visual modalities in shaping informed citizenship is needed. Paired with it, reconsideration of the written word as the preeminent conduit of information and rational thought are necessary.
      • The rise of infographics makes me think that what matters is not images and video per se, but clear information with low cognitive load.
  • ————————–
  • Bob had a little trouble with inappropriate and unclear identity, as well as education, info and other
  • Got tables working for terms and docs.
  • Got callbacks working from table clicks
  • Couldn’t get the table to display. Had to use this ugly hack.
  • Realized that I need name, weight and eigenval. Sorting is by eigenval. Weight is the multiplier of the weights in a row or column associated with a term or document. Mostly done.
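The name/weight/eigenval row model just described can be sketched in plain Java 8 (the JDK in use at the time). `RankedItem`, `sortByEigenval`, `scaleRow` and `scaleColumn` are hypothetical names, and reading "weight is the multiplier" as scaling one row's or one column's entries of the term/document matrix is an assumption about how the app applies it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical table row: a term or document, its user-set weight, and its ranking eigenvalue. */
final class RankedItem {
    final String name;
    final double weight;   // multiplier applied to this item's row or column before ranking
    final double eigenval; // this item's component of the ranking eigenvector

    RankedItem(String name, double weight, double eigenval) {
        this.name = name;
        this.weight = weight;
        this.eigenval = eigenval;
    }

    /** Returns a new list sorted from highest to lowest eigenvalue; the input is left untouched. */
    static List<RankedItem> sortByEigenval(List<RankedItem> items) {
        List<RankedItem> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((RankedItem r) -> r.eigenval).reversed());
        return sorted;
    }

    /** Returns a copy of the matrix with every entry in one row multiplied by weight. */
    static double[][] scaleRow(double[][] m, int row, double weight) {
        double[][] out = copy(m);
        for (int j = 0; j < out[row].length; j++) {
            out[row][j] *= weight;
        }
        return out;
    }

    /** Returns a copy of the matrix with every entry in one column multiplied by weight. */
    static double[][] scaleColumn(double[][] m, int col, double weight) {
        double[][] out = copy(m);
        for (double[] r : out) {
            r[col] *= weight;
        }
        return out;
    }

    /** Deep copy so the spreadsheet's original values survive for 'reset'. */
    private static double[][] copy(double[][] m) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            out[i] = m[i].clone();
        }
        return out;
    }
}
```

Returning copies rather than scaling in place keeps the original matrix available, which a reset button needs.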

Phil 4.22.16

7:00 – 4:30 VTX

  • Had a thought while going to sleep last night that it would be interesting to see the difference between a ‘naive’ ranking based on the number of quotes and PageRank. Pretty much as soon as I got up, I pulled down the spreadsheet and got the lists. They’re in the previous post, but I’ll post them here too:
    • Sorted from most to least quotes
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      P 5: Saracevic_relevance_75.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      P77: The Law of Group Polarization.pdf
      P43: On the Accuracy of Media-based Conflict Event Data.pdf
      System Trust
      P37: Security-control methods for statistical databases – a comparative study.pdf
    • Sorted on Page Rank eigenvector
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      System Trust
      Social Trust
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      Credibility Cues
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • To me it’s really interesting how much better the codes are mixed into the results. I actually thought it could be the other way, since the codes are common across many papers. Also, the concepts of System Trust, Social Trust and Credibility Cues very much became a central point in my mind as I worked through the papers.
  • A second thought, which is the next step in the research, is to see how weighting affects relationships. Right now, the papers and codes are weighted by the number of quotes. What happens when all the weights are normalized (set to 1.0)? And then there is the setup of the interactivity. With zero optimizations, this took 4.2 seconds to calculate on a modern laptop. Not fast enough for slider-bar interaction, but fine for changing one (or some?) values and clicking a ‘run’ button.
  • So, moving forward, the next steps are to create the Swing App that will:
    • read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets, including a page containing the run information:
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
  • I’d like to do something with the connectivity graph. Not sure what yet.
  • And I think I’ll do this in JavaFX rather than Swing this time.
  • Huh. JavaFX Scene Builder is no longer supported by Oracle. Now it’s a Gluon project.
  • Documentation still seems to be at Oracle though
  • Spent most of the day seeing what’s going on with the Crawl. Turns out it was bad formatting on the terms?
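The value-manipulation items in the app plan above (set a value while showing the original, reset one, reset all, normalize all to 1.0) are bookkeeping that can live outside the JavaFX UI. A minimal sketch, assuming weights are keyed by row/column name and start out as the quote counts; `WeightTable` and its methods are hypothetical names, not the app's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model behind the value spinner and the 'Reset all' / 'Normalize all' buttons. */
final class WeightTable {
    private final Map<String, Double> original = new LinkedHashMap<>();
    private final Map<String, Double> current = new LinkedHashMap<>();

    /** Registers a row or column with its starting weight (e.g. its quote count). */
    void put(String name, double weight) {
        original.put(name, weight);
        current.put(name, weight);
    }

    /** Changes one weight; the original stays available for the label and for reset. */
    void set(String name, double weight) {
        requireKnown(name);
        current.put(name, weight);
    }

    double get(String name) {
        requireKnown(name);
        return current.get(name);
    }

    double getOriginal(String name) {
        requireKnown(name);
        return original.get(name);
    }

    void reset(String name) {
        requireKnown(name);
        current.put(name, original.get(name));
    }

    void resetAll() {
        current.putAll(original);
    }

    /** Sets every weight to 1.0, so the ranking ignores the quote counts. */
    void normalizeAll() {
        for (Map.Entry<String, Double> e : current.entrySet()) {
            e.setValue(1.0);
        }
    }

    private void requireKnown(String name) {
        if (!original.containsKey(name)) {
            throw new IllegalArgumentException("unknown row/column: " + name);
        }
    }
}
```

Keeping the originals in a separate map is what makes the "original value in label" and the reset options cheap.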

Phil 4.21.16

7:00 – VTX

  • A little more bitcoin
  • Installed *another* new Java 1.8.0_92
  • Discovered the arXiv API page. This might be very helpful. I need to dig into it a bit.
  • Testing ranking code. I hate to say this, but if it works I think I’m going to write *another* Swing app to check interactivity rates. Which means I need to instrument the matrix calculations for timing.
  • Ok, the rank table is consistent across all columns. In my test code, the eigenvector stabilizes after 5 iterations:
     , col1, col2, col3, col4,
    row1, 11, 21, 31, 41,
    row2, 12, 22, 32, 42,
    row3, 13, 23, 33, 43,
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 1, 0, 0, 0.26, 0.49, 0.72, 0.95,
    row2, 0, 1, 0, 0.28, 0.51, 0.74, 0.98,
    row3, 0, 0, 1, 0.3, 0.53, 0.77, 1,
    col1, 0.26, 0.28, 0.3, 1, 0, 0, 0,
    col2, 0.49, 0.51, 0.53, 0, 1, 0, 0,
    col3, 0.72, 0.74, 0.77, 0, 0, 1, 0,
    col4, 0.95, 0.98, 1, 0, 0, 0, 1,
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 0.61, 0.62, 0.64, 0.22, 0.41, 0.59, 0.78,
    row2, 0.62, 0.65, 0.67, 0.23, 0.42, 0.61, 0.8,
    row3, 0.64, 0.67, 0.69, 0.24, 0.43, 0.63, 0.83,
    col1, 0.22, 0.23, 0.24, 0.08, 0.15, 0.22, 0.29,
    col2, 0.41, 0.42, 0.43, 0.15, 0.27, 0.4, 0.52,
    col3, 0.59, 0.61, 0.63, 0.22, 0.4, 0.58, 0.76,
    col4, 0.78, 0.8, 0.83, 0.29, 0.52, 0.76, 1,
    row1, 1, 0.71, 0.62, 0.61, 0.61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row2, 0, 0.46, 0.61, 0.62, 0.62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row3, 0, 0.48, 0.63, 0.64, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col1, 0.26, 0.13, 0.21, 0.22, 0.22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col2, 0.49, 0.25, 0.38, 0.41, 0.41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col3, 0.72, 0.37, 0.55, 0.59, 0.59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col4, 0.95, 0.49, 0.73, 0.78, 0.78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  • And after a lot of banging away, here’s my lit review in PageRank: pageRank
  • And here’s the difference between PageRank and sorting based on number of quotes:
  • Page Rank
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    System Trust
    Social Trust
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    Credibility Cues
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • Sorted from most to least quotes
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    P 5: Saracevic_relevance_75.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    P77: The Law of Group Polarization.pdf
    P43: On the Accuracy of Media-based Conflict Event Data.pdf
    System Trust
    P37: Security-control methods for statistical databases – a comparative study.pdf
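Comparing the two lists above by eye is tedious; the "+/- change in rank" column planned for the app amounts to looking up each item's position in both orderings. A minimal sketch, assuming each ordering is a list of unique names; `RankDelta.deltas` is a hypothetical helper, and items missing from the baseline list are simply skipped.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for the '+/- change in rank' indicator. */
final class RankDelta {
    /**
     * For each item in 'ranked', returns how many places it moved relative to 'baseline'.
     * Positive means it moved up (closer to the top) in 'ranked'.
     * Items absent from 'baseline' are skipped.
     */
    static Map<String, Integer> deltas(List<String> baseline, List<String> ranked) {
        Map<String, Integer> basePos = new HashMap<>();
        for (int i = 0; i < baseline.size(); i++) {
            basePos.put(baseline.get(i), i);
        }
        Map<String, Integer> result = new LinkedHashMap<>(); // keeps 'ranked' order
        for (int i = 0; i < ranked.size(); i++) {
            Integer old = basePos.get(ranked.get(i));
            if (old != null) {
                result.put(ranked.get(i), old - i);
            }
        }
        return result;
    }
}
```

With the quote-count list as the baseline and the PageRank list as the new ordering, the shared items come out as P85 +3, System Trust +7, P61 -3, P 1 0, P13 -6 and P10 -6. Social Trust, P84, Credibility Cues and P82 are not in the quote-count top ten, so a full comparison would need the complete lists.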

Phil 4.20.16

7:00 – 4:00 VTX

  • Read a little more of the Bitcoin article. Nice description of the blockchain.
  • Generated a fresh Excel matrix of codes and papers. I excluded the ‘meta’ codes (Definitions, Methods, etc.) and all the papers that have zero quotes.
  • Duke Ellington & His Orchestra make great coding music.
  • Installed new Java
  • Well drat. I can’t increase the size of an existing matrix. Will have to return a new one.
  • Yay!
    FileUtils.getInputFileName() opening: mat1.xls
     , col1, col2, col3, col4, 
    row1, 11.0, 21.0, 31.0, 41.0, 
    row2, 12.0, 22.0, 32.0, 42.0, 
    row3, 13.0, 23.0, 33.0, 43.0, 
    done calculating
    done creating
     , row1, row2, row3, col1, col2, col3, col4, 
    row1, 0.0, 0.0, 0.0, 11.0, 21.0, 31.0, 41.0, 
    row2, 0.0, 0.0, 0.0, 12.0, 22.0, 32.0, 42.0, 
    row3, 0.0, 0.0, 0.0, 13.0, 23.0, 33.0, 43.0, 
    col1, 11.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 
    col2, 21.0, 22.0, 23.0, 0.0, 0.0, 0.0, 0.0, 
    col3, 31.0, 32.0, 33.0, 0.0, 0.0, 0.0, 0.0, 
    col4, 41.0, 42.0, 43.0, 0.0, 0.0, 0.0, 0.0,
  • Need to set Identity and other housekeeping.
  • Added ‘normalizeByMatrix’, which sets the entire matrix on a unit scale.
  • Need to have a calcRank function that squares and normalizes until the difference between successive output eigenvectors is below a certain threshold or an iteration limit is reached. Done?
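The steps logged here (build a new, larger square matrix from the rectangular row/column data, set the identity, normalize the whole matrix to a unit scale, then square and normalize until the eigenvector stops changing) can be sketched as follows. This is a reconstruction, not the actual code: `RankSketch`, `toBipartite`, `normalizeByMax`, `rankVector` and the threshold/iteration parameters are hypothetical; "unit scale" is read as dividing by the largest entry before setting the diagonal to 1, which matches the 0.26 to 1.0 values in the 4.21 output above; and the eigenvector is read off the row sums rather than a single column. The `System.nanoTime()` timing stands in for the instrumentation mentioned on 4.21.

```java
/** Hypothetical reconstruction of the rank calculation described in the log. */
final class RankSketch {
    /** Builds a new (rows+cols) square matrix with the rectangular data in the off-diagonal blocks. */
    static double[][] toBipartite(double[][] a) {
        int r = a.length, c = a[0].length, n = r + c;
        double[][] m = new double[n][n];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                m[i][r + j] = a[i][j]; // row -> column link
                m[r + j][i] = a[i][j]; // column -> row link (symmetric)
            }
        }
        return m;
    }

    /** Divides every entry by the largest absolute entry, in place. */
    static void normalizeByMax(double[][] m) {
        double max = 0;
        for (double[] row : m) {
            for (double v : row) max = Math.max(max, Math.abs(v));
        }
        if (max == 0) return;
        for (double[] row : m) {
            for (int j = 0; j < row.length; j++) row[j] /= max;
        }
    }

    static void setIdentity(double[][] m) {
        for (int i = 0; i < m.length; i++) m[i][i] = 1.0;
    }

    /** Plain O(n^3) matrix product. */
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double xik = x[i][k];
                for (int j = 0; j < n; j++) out[i][j] += xik * y[k][j];
            }
        }
        return out;
    }

    /** Row sums scaled so the largest is 1.0; for a nearly rank-one matrix this is the dominant eigenvector. */
    static double[] rankVector(double[][] m) {
        double[] v = new double[m.length];
        double max = 0;
        for (int i = 0; i < m.length; i++) {
            for (double x : m[i]) v[i] += x;
            max = Math.max(max, Math.abs(v[i]));
        }
        if (max > 0) {
            for (int i = 0; i < v.length; i++) v[i] /= max;
        }
        return v;
    }

    /** Squares and normalizes until successive rank vectors differ by less than threshold, or maxIter is hit. */
    static double[] calcRank(double[][] rectangular, double threshold, int maxIter) {
        long start = System.nanoTime();
        double[][] m = toBipartite(rectangular);
        normalizeByMax(m);
        setIdentity(m);
        double[] prev = rankVector(m);
        int iter = 0;
        while (iter < maxIter) {
            m = multiply(m, m);
            normalizeByMax(m);
            double[] next = rankVector(m);
            double diff = 0;
            for (int i = 0; i < next.length; i++) diff = Math.max(diff, Math.abs(next[i] - prev[i]));
            prev = next;
            iter++;
            if (diff < threshold) break;
        }
        System.out.printf("calcRank: %d iterations, %.1f ms%n", iter, (System.nanoTime() - start) / 1e6);
        return prev;
    }
}
```

On the 3×4 test matrix from the log above, the result should be close to the converged column in the 4.21 output (roughly 0.78, 0.8, 0.83 for the rows and 0.29, 0.52, 0.76, 1 for the columns). Each squaring is a dense O(n³) multiply, so this step probably dominates the run time as the number of papers and codes grows.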

Phil 4.12.16

7:00 – 6:00 VTX

  • At the poster session yesterday, I had a nice chat with Yuanyuan about her poster on Supporting Common Ground Development in the Operation Room through Information Display Systems. It turns out that she is looking at information exchange patterns in groups independent of content, which is similar to what I’m looking at. We had a good discussion on group polarization and what might happen if misinformation was introduced into the OR. It turns out that this does happen – if the Attending Physician becomes convinced that, for example, all the instruments have been removed from the patient, the rest of the team can become convinced of this as well and self-reinforce the opinion.
  • Scanned through Deindividuation Effects on Group Polarization in Computer-Mediated Communication: The Role of Group Identification, Public-Self-Awareness, and Perceived Argument Quality. The upshot appears to be that individuation of participants acts as a drag on group polarization. So the more personalized the information is (and the more the reader retains self-awareness), the less the overall group polarization will move.
  • I’ve often said that humans innately communicate using stories and maps (maps are comprehended at 3-4.5 years; stories from when?). The above would support the idea that stories are more effective ways of promoting ‘star’ information patterns. This is all starting to feel very fractal and self-similar at differing scales…
  • Looking for research on children’s development of story comprehension led to this MIT PhD thesis: TOWARD A MODEL OF CHILDREN’S STORY COMPREHENSION. Good lord, what a committee: Marvin Minsky (thesis supervisor), Professors Joel Moses and Seymour Papert (thesis committee), Jeff Hill, Gerry Sussman, and Terry Winograd.
  • ———————
  • While reading Deep or Shallow, NLP is Breaking Out, I learned about word2vec. Googling led to Deeplearning4j, which has its own word2vec page, among a *lot* of other things. From their home page:
    • Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments, rather than as a research tool. Skymind is its commercial support arm.
    •  Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors.
    •  By following the instructions on our Quick Start page, you can run your first examples of trained neural nets in minutes.
  • The word vector alternative is from the Stanford NLP folks: GloVe: Global Vectors for Word Representation. The link also has trained (extracted?) word vectors.
  • Testing the behavior of query construction and search results. Fixing stupid bugs. Testing more. Lathering, rinsing and repeating.
  • Some good discussions with Aaron on inferencing and toxicity profiles. Basically taking the outputs and determining correlations with the inputs. Which led to a very long day.
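The GloVe download mentioned above ships plain-text files in which each line is a word followed by its vector components, separated by spaces. A minimal sketch of loading such lines and comparing two words by cosine similarity, the usual way word vectors are compared; `WordVectors` is a hypothetical class, it reads from any `Reader` rather than a specific GloVe file, and it keeps every vector in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store for GloVe-style text vectors ("word v1 v2 ... vn" per line). */
final class WordVectors {
    private final Map<String, double[]> vectors = new HashMap<>();

    /** Reads every line from source (and closes it); blank or malformed lines are skipped. */
    static WordVectors load(Reader source) throws IOException {
        WordVectors wv = new WordVectors();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) continue;
                double[] v = new double[parts.length - 1];
                for (int i = 1; i < parts.length; i++) v[i - 1] = Double.parseDouble(parts[i]);
                wv.vectors.put(parts[0], v);
            }
        }
        return wv;
    }

    double[] get(String word) {
        return vectors.get(word);
    }

    /** Cosine of the angle between two words' vectors: 1 = same direction, 0 = orthogonal. */
    double cosine(String a, String b) {
        double[] x = vectors.get(a), y = vectors.get(b);
        if (x == null || y == null) throw new IllegalArgumentException("word not loaded");
        if (x.length != y.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }
}
```

Cosine ignores vector length, so two words pointing the same way in the embedding space score 1.0 regardless of magnitude.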