
Phil 5.5.16

7:00 – 5:30 VTX

  • Continuing An Introduction to the Bootstrap.
  • This helped a lot. I hope it’s right…
  • Had a thought about how to build the Bootstrap class. Build it using RealVector and then use Interface RealVectorPreservingVisitor to do whatever calculation is desired. Default methods for Mean, Median, Variance and StdDev. It will probably need arguments for max iteration and epsilon.
  • Didn’t do that at all. Wound up using ArrayRealVector for the population and Percentile to hold the mean and variance values. I can add something else later
  • To capture how the centrality affects the makeup of the data in a matrix, I think it makes sense to use the normalized eigenvector to multiply the counts in the initial matrix and submit that population (the whole matrix) to the Bootstrap.
  • Meeting with Wayne? Need to finish tool updates though.
  • Got bogged down in understanding the Percentile class and how binomial distributions work.
  • Built and then fixed a copy ctor for Labled2DMatrix.
  • Testing. It looks ok, but I want to try multiplying the counts by the eigenVec. Tomorrow.
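A minimal sketch of the resampling idea behind the Bootstrap notes above, in plain Java. It does not use ArrayRealVector or Percentile; the class name, method names, seed and the sample counts are all assumptions made for illustration. It resamples the population with replacement, records the mean of each resample, and reads a percentile interval off the sorted means.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: names are illustrative, not the log's Bootstrap class.
class BootstrapSketch {

    /** Draws one resample (with replacement) the size of the population and returns its mean. */
    static double resampleMean(double[] population, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < population.length; i++) {
            sum += population[rng.nextInt(population.length)];
        }
        return sum / population.length;
    }

    /** Returns {lower, upper} percentile bounds of the bootstrapped means. */
    static double[] meanInterval(double[] population, int resamples, double confidence, long seed) {
        Random rng = new Random(seed);
        double[] means = new double[resamples];
        for (int i = 0; i < resamples; i++) {
            means[i] = resampleMean(population, rng);
        }
        Arrays.sort(means);
        double tail = (1.0 - confidence) / 2.0;
        int lo = (int) Math.floor(tail * (resamples - 1));
        int hi = (int) Math.ceil((1.0 - tail) * (resamples - 1));
        return new double[] { means[lo], means[hi] };
    }

    public static void main(String[] args) {
        // Assumed population: the counts of a small test matrix, flattened.
        double[] counts = { 11, 21, 31, 41, 12, 22, 32, 42, 13, 23, 33, 43 };
        double[] ci = meanInterval(counts, 2000, 0.95, 42L);
        System.out.printf("95%% interval for the mean: [%.2f, %.2f]%n", ci[0], ci[1]);
    }
}
```

Multiplying the counts by the eigenvector before resampling would only change how the population array is filled; the resampling loop stays the same.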

Phil 5.3.16

7:00 – 3:30 VTX

  • Out riding, I realized that I could have a column called ‘counts’ that would add up the total number of ‘terms per document’ and ‘documents per term’. Unitizing the values would then show the number of unique terms per document. That’s useful, I think.
  • Helena pointed to an interesting CHI 2016 site. This is sort of the other side of extracting pertinence from relevant data. I wonder where they got their data from?
    • Found it! It’s in a public set of Google docs, in XML and JSON formats. I found it by looking at the GitHub home page. In the example code there was this structure:
      source: {
          gdocId: '0Ai6LdDWgaqgNdG1WX29BanYzRHU4VHpDUTNPX3JLaUE',
          tables: "Presidents"
        }

      That gave me a hint of what to look for in the document source of the demo, where I found this:

      var urlBase = 'https://ca480fa8cd553f048c65766cc0d0f07f93f6fe2f.googledrive.com/host/0By6LdDWgaqgNfmpDajZMdHMtU3FWTEkzZW9LTndWdFg0Qk9MNzd0ZW9mcjA4aUJlV0p1Zk0/CHI2016/';
      

      And that’s the link from above.

    • There appear to be other useful data sets as well. For example, there is an extensive CHI paper database sitting behind this demo.
    • So this makes generalizing the PageRank approach much simpler, since it looks like I can pull the data down pretty easily. In my case I think the best thing would be to write small apps that pull down the data and build Excel spreadsheets that are read in by the tool for now.
  • Exporting a new data set from Atlas. Done and committed. I need to do runs before meeting with Wayne.
  • Added Counts in and refactored a bit.
  • I think I want a list of what a doc or term is directly linked to and the number of references. Added the basics. Wiring up next. Done! But now I want to click on an item in the counts list and have it be selected? Or at least highlighted?
  • Stored the new version on dropbox: https://www.dropbox.com/s/92err4z2posuaa1/LMN.zip?dl=0
  • Meeting with Wayne
    • There’s some bug with counts. Add it to the WeightedItem.toString() and test.
    • Add a ‘move to top’ button near the weight slider that adds just enough weight to move the item to the top of the list. This could be iterative?
    • Add code that compares the population of ranks with the population of scaled ranks. Maybe bootstrapping? Apache Commons Math has KolmogorovSmirnovTest, which has public double kolmogorovSmirnovTest(double[] x, double[] y, boolean strict), which looks promising.
  • Added ability to log out of the rating app.
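For comparing the population of ranks with the population of scaled ranks, here is a rough sketch of the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions). This is a hand-rolled illustration in plain Java, not the Commons Math implementation, and it computes only the D statistic, not a p-value. The class name and the sample values in main are made up.

```java
import java.util.Arrays;

// Hypothetical sketch: computes only the KS distance D, not a p-value.
class KsSketch {

    /** Largest absolute difference between the empirical CDFs of x and y. */
    static double ksStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            // Step to the next distinct value and move both samples past it (handles ties).
            double v = Math.min(a[i], b[j]);
            while (i < a.length && a[i] <= v) {
                i++;
            }
            while (j < b.length && b[j] <= v) {
                j++;
            }
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        // Assumed example values: unweighted ranks versus ranks after a weight change.
        double[] ranks = { 0.22, 0.41, 0.59, 0.61, 0.62, 0.64, 0.78 };
        double[] scaledRanks = { 0.25, 0.45, 0.60, 0.70, 0.75, 0.80, 0.95 };
        System.out.println("D = " + ksStatistic(ranks, scaledRanks));
    }
}
```

A D near 0 means the two populations are distributed almost identically; a D near 1 means they barely overlap.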

Phil 4.29.16

7:00 – 5:00 VTX

  • Expense reports and timesheets! Done.
  • Continuing Informed Citizenship in a Media-Centric Way of Life
    • The pertinence interface may be an example of a UI affording the concept of monitorial citizenship.
      • Page 219: The monitorial citizen, in Schudson’s (1998) view, does environmental surveillance rather than gathering in-depth information. By implication, citizens have social awareness that spans vast territory without having in-depth understanding of specific topics. Related to the idea of monitorial instead of informed citizenship, Pew Center (2008) data identified an emerging group of young (18–34) mobile media users called news grazers. These grazers find what they need by switching across media platforms rather than waiting for content to be served.
    • Page 222: Risk as Feelings. The abstract is below. There is an emotional hacking aspect here that traditional journalism has used (heuristically?) for most(?) of its history.
      • Virtually all current theories of choice under risk or uncertainty are cognitive and consequentialist. They assume that people assess the desirability and likelihood of possible outcomes of choice alternatives and integrate this information through some type of expectation-based calculus to arrive at a decision. The authors propose an alternative theoretical perspective, the risk-as-feelings hypothesis, that highlights the role of affect experienced at the moment of decision making. Drawing on research from clinical, physiological, and other subfields of psychology, they show that emotional reactions to risky situations often diverge from cognitive assessments of those risks. When such divergence occurs, emotional reactions often drive behavior. The risk-as-feelings hypothesis is shown to explain a wide range of phenomena that have resisted interpretation in cognitive–consequentialist terms.
    • At page 223 – Elections as the canon of participation

  • Working on getting tables to sort – Done

  • Loading excel file – done
  • Calculating – done
  • Using weights – done
  • Reset weights – done
  • Saving (don’t forget to add sheet with variables!) – done
  • Wrapped in executable – done
  • Uploading to dropbox. Wow – the files with JavaFX are *much* bigger than Swing.

Phil 4.28.16

7:00 – 5:00 VTX

  • Reading Informed Citizenship in a Media-Centric Way of Life
    • Jessica Gall Myrick
    • This is a bit out of the concentration of the thesis, but it addresses several themes that relate to system and social trust. And I’m thinking that behind these themes of social vs. system is the Designer’s Social Trust of the user. Think of it this way: If the designer has a high Social Trust intention with respect to the benevolence of the users, then a more ‘human’ interactive site may result with more opportunities for the user to see more deeply into the system and contribute more meaningfully. There are risks in this, such as hellish comment sections, but also rewards (see the YouTube comments section for The Idea Channel episodes). If the designer has a System Trust intention with respect to, say, the reliability of the user watching ads, then different systems get designed, ones that learn to generate click-bait using neural networks (such as clickotron). Or, closer to home, Instagram might decide to curate a feed for you without affordances to support changing of feed options. The truism goes ‘If you’re not paying, then you’re the product’. And products aren’t people. Products are systems.
    • Page 218: Graber (2001) argues that researchers often treat the information value of images as a subsidiary to verbal information, rather than having value themselves. Slowly, studies employing visual measures and examining how images facilitate knowledge gain are emerging (Grabe, Bas, & van Driel, 2015; Graber, 2001; Prior, 2014). In a burgeoning media age with citizens who overwhelmingly favor (audio)visually distributed information, research momentum on the role of visual modalities in shaping informed citizenship is needed. Paired with it, reconsideration of the written word as the preeminent conduit of information and rational thought are necessary.
      • The rise of infographics  makes me believe that it’s not image and video per se, but clear information with low cognitive load.
  • ————————–
  • Bob had a little trouble with inappropriate and unclear identity, as well as education, info and other
  • Got tables working for terms and docs.
  • Got callbacks working from table clicks
  • Couldn’t get the table to display. Had to use this ugly hack.
  • Realized that I need name, weight and eigenval. Sorting is by eigenval. Weight is the multiplier of the weights in a row or column associated with a term or document. Mostly done.
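One way to read "weight is the multiplier of the weights in a row or column" is sketched below. It assumes the square derived matrix, where each term or document owns both a row and a column with the same index, and it assumes both are scaled. The class and method names are hypothetical.

```java
// Hypothetical sketch: assumes a square matrix where item i owns row i and column i.
class WeightSketch {

    /** Returns a copy of the matrix with row and column `index` multiplied by `weight`. */
    static double[][] applyWeight(double[][] matrix, int index, double weight) {
        int n = matrix.length;
        double[][] out = new double[n][];
        for (int r = 0; r < n; r++) {
            out[r] = matrix[r].clone();
        }
        for (int k = 0; k < n; k++) {
            out[index][k] *= weight;
            if (k != index) {
                // Skip the diagonal cell here so it is not scaled twice.
                out[k][index] *= weight;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = { { 1, 2 }, { 3, 4 } };
        double[][] weighted = applyWeight(m, 0, 2.0);
        System.out.println(java.util.Arrays.deepToString(weighted));
    }
}
```

Returning a copy keeps the unweighted matrix around, which makes a "reset weights" action a matter of recalculating from the original.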

Phil 4.21.16

7:00 – VTX

  • A little more bitcoin
  • Installed *another* new Java 1.8.0_92
  • Discovered the arXiv API page. This might be very helpful. I need to dig into it a bit.
  • Testing ranking code. I hate to say this, but if it works I think I’m going to write *another* Swing app to check interactivity rates. Which means I need to instrument the matrix calculations for timing.
  • Ok, the rank table is consistent across all columns. In my test code, the eigenvector stabilizes after 5 iterations:
    initial
     , col1, col2, col3, col4,
    row1, 11, 21, 31, 41,
    row2, 12, 22, 32, 42,
    row3, 13, 23, 33, 43,
    
    derived
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 1, 0, 0, 0.26, 0.49, 0.72, 0.95,
    row2, 0, 1, 0, 0.28, 0.51, 0.74, 0.98,
    row3, 0, 0, 1, 0.3, 0.53, 0.77, 1,
    col1, 0.26, 0.28, 0.3, 1, 0, 0, 0,
    col2, 0.49, 0.51, 0.53, 0, 1, 0, 0,
    col3, 0.72, 0.74, 0.77, 0, 0, 1, 0,
    col4, 0.95, 0.98, 1, 0, 0, 0, 1,
    
    rank
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 0.61, 0.62, 0.64, 0.22, 0.41, 0.59, 0.78,
    row2, 0.62, 0.65, 0.67, 0.23, 0.42, 0.61, 0.8,
    row3, 0.64, 0.67, 0.69, 0.24, 0.43, 0.63, 0.83,
    col1, 0.22, 0.23, 0.24, 0.08, 0.15, 0.22, 0.29,
    col2, 0.41, 0.42, 0.43, 0.15, 0.27, 0.4, 0.52,
    col3, 0.59, 0.61, 0.63, 0.22, 0.4, 0.58, 0.76,
    col4, 0.78, 0.8, 0.83, 0.29, 0.52, 0.76, 1,
    
    EigenVec
    row1, 1, 0.71, 0.62, 0.61, 0.61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row2, 0, 0.46, 0.61, 0.62, 0.62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row3, 0, 0.48, 0.63, 0.64, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col1, 0.26, 0.13, 0.21, 0.22, 0.22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col2, 0.49, 0.25, 0.38, 0.41, 0.41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col3, 0.72, 0.37, 0.55, 0.59, 0.59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col4, 0.95, 0.49, 0.73, 0.78, 0.78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  • And after a lot of banging away, here’s my lit review in PageRank: pageRank
  • And here’s the difference between PageRank and sorting based on number of quotes:
  • Page Rank
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    System Trust
    Social Trust
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    Credibility Cues
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • Sorted from most to least quotes
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    P 5: Saracevic_relevance_75.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    P77: The Law of Group Polarization.pdf
    P43: On the Accuracy of Media-based Conflict Event Data.pdf
    System Trust
    P37: Security-control methods for statistical databases – a comparative study.pdf

Phil 4.20.16

7:00 – 4:00 VTX

  • Read a little more of the BitCoin article. Nice description of the blockchain.
  • Generated a fresh Excel matrix of codes and papers. I excluded the ‘meta’ codes (Definitions, Methods, etc) and all the papers that have zero quotes
  • Duke Ellington & His Orchestra make great coding music.
  • Installed new Java
  • Well drat. I can’t increase the size of an existing matrix. Will have to return a new one.
  • Yay!
    FileUtils.getInputFileName() opening: mat1.xls
    initial
     , col1, col2, col3, col4, 
    row1, 11.0, 21.0, 31.0, 41.0, 
    row2, 12.0, 22.0, 32.0, 42.0, 
    row3, 13.0, 23.0, 33.0, 43.0, 
    done calculating
    done creating
    derived
     , row1, row2, row3, col1, col2, col3, col4, 
    row1, 0.0, 0.0, 0.0, 11.0, 21.0, 31.0, 41.0, 
    row2, 0.0, 0.0, 0.0, 12.0, 22.0, 32.0, 42.0, 
    row3, 0.0, 0.0, 0.0, 13.0, 23.0, 33.0, 43.0, 
    col1, 11.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 
    col2, 21.0, 22.0, 23.0, 0.0, 0.0, 0.0, 0.0, 
    col3, 31.0, 32.0, 33.0, 0.0, 0.0, 0.0, 0.0, 
    col4, 41.0, 42.0, 43.0, 0.0, 0.0, 0.0, 0.0,
  • Need to set Identity and other housekeeping.
  • Added ‘normalizeByMatrix’, which sets the entire matrix on a unit scale
  • Need to have a calcRank function that squares and normalizes until the difference between successive output eigenvectors is below a certain threshold or an iteration limit is reached. Done?
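The derived matrix in the output above puts rows and columns on one shared axis and mirrors the counts across the diagonal. A small sketch of that construction follows, along with the two housekeeping steps mentioned (unit scaling and setting the identity). Plain arrays stand in for the matrix class, the method bodies are my guess at the behavior, and setIdentity is a hypothetical name.

```java
// Hypothetical sketch: plain arrays stand in for the tool's matrix class.
class DerivedMatrixSketch {

    /** Returns a new (rows+cols) x (rows+cols) matrix with the counts mirrored across the diagonal. */
    static double[][] derive(double[][] initial) {
        int rows = initial.length;
        int cols = initial[0].length;
        double[][] derived = new double[rows + cols][rows + cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                derived[r][rows + c] = initial[r][c]; // row item -> column item
                derived[rows + c][r] = initial[r][c]; // column item -> row item
            }
        }
        return derived;
    }

    /** Divides every cell by the largest absolute value, so the whole matrix lies in [-1, 1]. */
    static void normalizeByMatrix(double[][] m) {
        double max = 0.0;
        for (double[] row : m) {
            for (double v : row) {
                max = Math.max(max, Math.abs(v));
            }
        }
        if (max == 0.0) {
            return; // nothing to scale
        }
        for (double[] row : m) {
            for (int c = 0; c < row.length; c++) {
                row[c] /= max;
            }
        }
    }

    /** Sets the diagonal to 1 so each item is fully linked to itself. */
    static void setIdentity(double[][] m) {
        for (int i = 0; i < m.length; i++) {
            m[i][i] = 1.0;
        }
    }

    public static void main(String[] args) {
        double[][] initial = { { 11, 21, 31, 41 }, { 12, 22, 32, 42 }, { 13, 23, 33, 43 } };
        double[][] derived = derive(initial);
        normalizeByMatrix(derived);
        setIdentity(derived);
        System.out.println(java.util.Arrays.deepToString(derived));
    }
}
```

Since an existing matrix cannot be grown in place, derive returns a new array rather than resizing its input.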

Phil 4.14.16

7:00 – 3:30 VTX

  • Continuing The ‘like me’ framework for recognizing and becoming an intentional agent
  • Page 2: Perception influences production, and production influences perception, with substantial implications for social cognition.
    • This must be a foundational element of Social Trust. I see you do a thing. I imitate the thing. I feel (not think!) that it is the same thing. I do a thing. You imitate the thing. Think peekaboo. We establish a rapport. This is different from System Trust, where I put something somewhere and it’s still there. System trust may be derived fundamentally from Object Permanence, while Social Trust comes from imitation?
    • This is(?) tied to motor neurons. From Mirror neurons: Enigma of the metaphysical modular brain: Essentially, mirror neurons respond to actions that we observe in others. The interesting part is that mirror neurons fire in the same way when we actually recreate that action ourselves.
      • Implications for design? Journalism is definitely built around the ‘like me’ concept, in that it is built around stories. IR is much less so, and is more data focused.
    • At section 3 – Experiment 1: learning tool-use by observing others
      • We have Social Trust first. Then we learn to use tools. Tools are different from, though related to the environment. They are not ‘like me’, but they extend me (Heidegger again). More later.
  • Page 3: For example, there is an intimate relation between striving to achieve a goal and a concomitant facial expression and effortful bodily acts.
    • This is like the boot loader or initial dictionary entry. Hard-wired common vocabulary.
  • Page 3: Humans, including preverbal infants, imbue the acts of others with felt meaning not solely (or at first) through a formal process of step-by-step reasoning, but because the other is processed as ‘like me.’ This is underwritten by the way humans represent action—the supramodal action code—and self experience
    • So is there a ‘more like me’ and ‘less like me’?
  • Meeting with Wayne this evening
    • Go over notes
    • Coding session
  • ——————
  • Check to see that reports are being made correctly
    • Fix “Get all rated”: numerous issues, including strings with commas
    • Fix “Get Match Counts”: all zeros
    • Fix “Get No Match Counts”: redundant
    • Change “Get Blacklist (CSV)” to “Black/White list (CSV)”
    • Add “Get Whitelist (Google CSE)”
    • Change the Sets in getBlack/Whitelist to use maps rather than sets so blacklist culling can be used with more informative rows.
  • Update remote DB and test a few pages. Ran into a problem with LONGTEXT and Postgres. Went back to TEXT.
  • Went over Aaron’s ASB slides a couple of times. Introduced him to Partial Least Squares Structural Equation Modeling (PLS-SEM).
  • Present new system to Andy, Margarita and John. Tomorrow…
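The "strings with commas" problem in the CSV reports is usually handled by quoting fields. Here is a minimal sketch of that escaping, following the common CSV convention (wrap the field in double quotes and double any embedded quotes). It is not the report code itself, and the class and method names are hypothetical.

```java
// Hypothetical sketch of CSV field escaping; not the report code itself.
class CsvSketch {

    /** Quotes a field if it contains a comma, a double quote or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins already-unescaped fields into one CSV row. */
    static String toRow(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRow("Smith, John", "3", "plain text"));
    }
}
```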

Phil 4.1.16

7:15 – 4:15 VTX

  • Had a bunch of paperwork to do for my folks. All handled now?
  • Continuing What is Trust? A Conceptual Analysis and An Interdisciplinary Model. Done
    • Disposition to Trust. This construct means the extent to which one displays a consistent tendency to be willing to depend on general others across a broad spectrum of situations and persons
      • a general propensity to be willing to depend on others.
      • does not necessarily imply that one believes others to be trustworthy
      • only has a major effect on one’s trust-related behavior when novel
        situations arise, in which the person and situation are unfamiliar
      • Disposition to Trust has two subconstructs, Faith in Humanity and Trusting Stance
        • Faith in Humanity means one assumes others are usually upright, well-meaning, and dependable.
        • Trusting Stance means that, regardless of what one assumes about other people generally, one assumes that one will achieve better outcomes by dealing with people as though they are well-meaning and reliable
      • Because Faith in Humanity relates to assumptions about peoples’ attributes, it is more likely to be an antecedent to Trusting Beliefs (in people) than is Trusting Stance. Trusting Stance may relate more to Trusting Intention, which, depending on the situation, is probably not based wholly on beliefs about the other person.
    • Institution-based Trust means one believes the needed conditions are in place to enable one to anticipate a successful outcome in an endeavor or aspect of one’s life
      • This construct comes from the sociology tradition that people can rely on others because of structures, situations, or roles  that provide assurances (Affordances???) that things will go well
      • Institution-based Trust has two subconstructs, Structural Assurance and Situational Normality.
        • Structural Assurance means one believes that success is likely because guarantees, contracts, regulations, promises, legal recourse, processes, or procedures are in place that assure success
        • Situational Normality means one believes that success is likely because the situation is normal or favorable. (I think that this comes from very primitive parts of our brains. It can be observed in many animals and may be one of those things that separates infant and adult behavior. If you trust too much, you are likely to get eaten..?)
          • Situational Normality means that a properly ordered setting is likely to facilitate a successful venture. When one believes one’s role and others’ roles in the situation are appropriate and conducive to success, then one has a basis for trusting the people in the situation.
          • likely related to Trusting Beliefs and Trusting Intention. A system developer who feels good about the roles and setting in which they work is likely to have Trusting Beliefs about the people in that setting.
    • Trusting Beliefs means one believes (and feels confident in believing) that the other person has one or more traits desirable to one in a situation in which negative consequences are possible.
      • We distinguish four main trusting belief subconstructs, while recognizing that others exist.
        • Trusting Belief-Competence means one believes the other person has the ability or power to do for one what one needs done.
        • Trusting Belief-Benevolence means one believes the other person cares about one and is motivated to act in one’s interest.  A benevolent person does not act opportunistically.
        • Trusting Belief-Integrity means one believes the other person makes good faith agreements, tells the truth, and fulfills promises
        • Trusting Belief-Predictability means one believes the other person’s actions (good or bad) are consistent enough that one can forecast them in a given situation
    • Trusting Intention means one is willing to depend on, or intends to depend on, the other person in a given task or situation  with a feeling of relative security, even though negative consequences are possible
      • Trusting intention subconstructs include Willingness to Depend and Subjective Probability of Depending.
        • Willingness to Depend means one is volitionally prepared to make oneself vulnerable to the other person in a situation by relying on them.
        • Subjective Probability of Depending means the extent to which one forecasts or predicts that one will depend on the other person.
      • Trusting Intention definitions embody five elements synthesized from the trust literature.
        1. The possibility of negative consequences or risk is what makes trust important but problematic.
        2. A readiness to depend or rely on another is central to trusting intention.
        3. A feeling of security means one feels safe, assured, and comfortable (not anxious or fearful) about the prospect of depending on another. Feelings of security reflect the affective side of trusting intention.
        4. Trusting intention is situation-specific.(???? why? Examples?)
        5. Trusting intention involves willingness that is not based on having control or power over the other party. Note that Trusting Intention relates well to the system development power literature because we define it in terms of dependence and control.
    • Another limitation relates to Whetten’s (1989) recommendation that Who and Where conditions should be placed around models.  Whereas we have assumed that the model applies to any kind of relationship between two people (Who) in any situation (Where), this may not be the case. Empirical research is needed to better define the boundary conditions of the model.
  • Starting Technology, Humanness, and Trust: Rethinking Trust in Technology, also by D. Harrison McKnight
    • Page 881 (Basic?) Social Trust: human-like trust constructs of integrity, ability/competence, and benevolence that researchers have traditionally used to measure interpersonal trust.
    • Page 881 (Basic?) System Trust: system-like trust constructs such as reliability,
      functionality, and helpfulness
    • Page 881. First, we hypothesize that technologies can differ in humanness. Second, we predict that users will develop trust in the technology differently depending on whether they perceive it as more or less human-like, which will result in human-like trust having a stronger influence on outcomes for more human-like technologies and system-like trust having a stronger influence on outcomes for more system-like technologies. (Cite Kate Bush Deeper Understanding 1989)
    • Here’s the beginning of a thought: What is self-trust? Just thinking about it, it seems to be a sense of the reliability of my future self to do what my present self desires. That’s different from Social Trust, which in the literature is more about integrity, competence and benevolence. It seems closer to system trust in that reliability and functionality are more significant. There are things that I trust that I will do tomorrow: Get up, go to work, exercise if the weather is good enough. But there are also things that I can’t trust myself to do. My future self will almost certainly eat more calories than my current self desires. My grocery shopping behaviors are based around this lack of trust. There are items that I do not bring into my house because I know that they will get eaten (I was going to write that I know that my will is weak around chocolate, but that’s not really it. Or at least, that’s not all of it, or maybe even most of it..). Because (interactive?) information technology is more like a self-amplifier, I wonder if what we think of as system trust can be thought of as the trust in ourselves, but the part of ourselves that is more reliable and trustworthy. A search tomorrow will work as well as a search today. Maybe better. And the effectiveness of that search somehow reflects my ability to interact effectively with the external world? This is starting to sound a lot like my point of view that living a life in prolonged contact with a compiler changes you in profound ways.
    • So what would that mean? I think it’s a reasonable hypothesis to change search results from focusing on pertinence to revelation. This does not mean that the ‘Ten Blue Links’ need to go away. But it does imply that peripheral information could be just as important, so that a less casually polarized worldview might be developed.
  • Finishing up the CSE version control setup – need to write up the process for confluence – done.
  • Since I now need to be able to read in the Excella data, I was going to look to Gregg’s ontology as a way to determine the table structure. But it’s way too big and nested. A Person’s description includes a reference to a complete organization, activities, charges, arrests, and it doesn’t even have room for nice things yet (will we have co-authors?). Anyway, to avoid this, I’m going to have basic person characteristics with associated StringMaps, NumMaps and DateMaps. Anything that’s not recognized as a column gets added to those maps. Need to see how persistence will work with that in some testing first.
  • Got the code working. JPA 2 says you should be able to build a map entirely without annotations, but I couldn’t get it to work. Modified JsonLoadable so that it goes through the Json Object and anything that is not a member of the current class is added to HashMaps of PoiOptionalStrings. It should be very straightforward to extend to number and date types. Probably worth doing?
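The "anything that is not a member of the current class goes into a map" idea can be sketched with plain reflection. Everything here is an assumption made for illustration: a Map<String, String> stands in for the Json Object, PersonSketch stands in for the real entity, and no JPA annotations or PoiOptionalStrings type are involved.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Json Object, and there is no persistence here.
class PersonSketch {
    String name;
    String city;
    final Map<String, String> optionalStrings = new HashMap<>();

    /** Copies known String fields from the input; everything else lands in optionalStrings. */
    void load(Map<String, String> json) {
        for (Map.Entry<String, String> entry : json.entrySet()) {
            try {
                // getDeclaredField only sees fields declared on this class, not inherited ones.
                Field field = PersonSketch.class.getDeclaredField(entry.getKey());
                if (field.getType() == String.class) {
                    field.set(this, entry.getValue());
                    continue;
                }
            } catch (NoSuchFieldException | IllegalAccessException e) {
                // Not a settable member of this class: fall through to the optional map.
            }
            optionalStrings.put(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> json = new HashMap<>();
        json.put("name", "Ram Singh");
        json.put("charges", "none");
        PersonSketch p = new PersonSketch();
        p.load(json);
        System.out.println(p.name + " " + p.optionalStrings);
    }
}
```

Extending this to number and date types would mean adding one map per type and branching on the parsed value, which is why it looks straightforward.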

Phil 2.5.16

6:45 – 4:15 VTX

  • Change the JsonLoaded class to only look at declared fields – done
  • Register for Periscope Charts – done. Callback on Monday?
  • Working on parsing the query result.
    • Had to set the charset to UTF-8. Huh.
    • Can we pull back items by cacheId? Then we don’t need to load the primary store with internet info.
    • Had a STUPID mistake in getting JPA set up. Had all the annotations pointing at each other, but forgot when creating the result objects that I had to pass the ‘parent’ query object in to get the mapping. Sigh.
    • Adding a dirt-simple rating scheme
      • Java app iterates over all the urls returned and the user can pick from:
        1 - not appropriate at all
        2 - medical and or legal
        3 - Correct person
        4 - Correct person with flaggable

        The Java app then either opens the page or downloads and opens the file with the default application.

      • The user picks the value, the result object persists with the rating and we move on to the next item. Right now the DB is on my local machine, but if we made it networkable everyone could rate a few pages. Most of the results should only take a few seconds to evaluate.
  • I have the Google/db code running in one sandbox and the user eval running in another. Monday I’ll integrate them.
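The four-value rating scheme above maps naturally onto an enum. This is only a sketch: the constant names are mine, and how the value is persisted with the result object is not shown.

```java
// Hypothetical sketch: constant names are illustrative; codes and labels come from the list above.
class RatingSketch {

    enum Rating {
        NOT_APPROPRIATE(1, "not appropriate at all"),
        MEDICAL_OR_LEGAL(2, "medical and or legal"),
        CORRECT_PERSON(3, "Correct person"),
        CORRECT_PERSON_FLAGGABLE(4, "Correct person with flaggable");

        final int code;
        final String label;

        Rating(int code, String label) {
            this.code = code;
            this.label = label;
        }

        /** Looks up a rating by the number the user picked. */
        static Rating fromCode(int code) {
            for (Rating r : values()) {
                if (r.code == code) {
                    return r;
                }
            }
            throw new IllegalArgumentException("Unknown rating code: " + code);
        }
    }

    public static void main(String[] args) {
        for (Rating r : Rating.values()) {
            System.out.println(r.code + " - " + r.label);
        }
    }
}
```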

Phil 2.3.16

7:00 – 3:00 VTX

  • Just discovered Publius –  a Web publishing system that is highly resistant to censorship and provides publishers with a high degree of anonymity. No longer active, but produced a paper.
  • Continuing On the Accuracy of Media-based Conflict Event Data. Currently starting Matching Media-based Conflict Reports with Military Records
  • Back to Googlehacking
    • Since I’ve got the provider JSON, setting up objects that I can use for more in-depth parsing. Thinking that this could be an example of ‘code’ in the dictionary. A word can be an object that knows how to look through a section of text to see if it can find itself.
    • I think running several dictionaries over a document could be interesting. For example, using a medical and a legal dictionary on a document would let the system infer malpractice as opposed to a document on foreign aid.
    • Generating the right queries and they work in the browser:
      "Ram Singh"
      	ALL_GOV(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_GOV(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_GOV(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_GOV(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
      	ALL_US(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+sanctions
      	ALL_US(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+criminal
      	ALL_US(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+malpractice
      	ALL_US(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+board+actions
      	ALL_ORG(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+sanctions
      	ALL_ORG(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+criminal
      	ALL_ORG(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+malpractice
      	ALL_ORG(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+board+actions
      	RESTRICTED_COM(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+sanctions
      	RESTRICTED_COM(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+criminal
      	RESTRICTED_COM(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+malpractice
      	RESTRICTED_COM(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+board+actions
      	ALL_EDU(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_EDU(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_EDU(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_EDU(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
  • So the next thing is to start running these queries and looking at the results to see if there are patterns. And I would be further along, but IntelliJ choked when I tried to add JPA. After flailing for a while I just gave up, created a new project, copied all the lib src and persistence directories over, updated the structure, and it all works. Grumble grumble.
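The query URLs above all follow one pattern: a quoted name, a state and a topic, URL-encoded into the q parameter. A sketch of that assembly is below. The key and engine id are placeholders, and the class and method names are hypothetical; only the base URL and the parameter names come from the listing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: apiKey and engineId are placeholders supplied by the caller.
class QuerySketch {
    static final String BASE = "https://www.googleapis.com/customsearch/v1";

    /** Builds one search URL; the name is wrapped in quotes so it is matched as a phrase. */
    static String buildUrl(String apiKey, String engineId, String name, String state, String topic) {
        String query = "\"" + name + "\" " + state + " " + topic;
        try {
            // URLEncoder turns the quotes into %22 and the spaces into '+'.
            return BASE + "?key=" + apiKey + "&cx=" + engineId + "&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String[] topics = { "sanctions", "criminal", "malpractice", "board actions" };
        for (String topic : topics) {
            System.out.println(buildUrl("YOUR_KEY", "YOUR_ENGINE_ID", "Ram Singh", "VA", topic));
        }
    }
}
```

Looping over a list of engine ids as well as topics would generate the whole grid of queries shown above.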

Phil 1.26.16

7:00 – 3:00 VTX

  • Finished the Crowdseeding paper. I was checking out the authors, and went to Macartan Humphreys’ website. He’s been doing interesting work, and he’s up in NYC at Columbia, so it would be possible to visit. Anyway, there is one paper that looks very interesting: Mixing Methods: A Bayesian Approach. It’s about inferring information from quantitative and qualitative sources. It sounds related, both to how I’m putting together my proposal and how the overall system should(?) work.
  • Reviewing a paper. Don’t forget to mention other analytic systems like Palantir Gotham
  • On to Theme-based Retrieval of Web News. And in looking at papers that cite this, found The Hybrid Representation Model for Web Document Classification. Not too impressed with the former. The latter looks like it contains some good overview in the previous works section. One of the authors: Mark Last (lots of data discovery in large data sets)
  • Downloading new IntelliJ. Ok, back to normal and the tutorial.
    • Huh. Tried loading the (compact) “N-TRIPLES” format, which barfed, even though Jena wrote out the file. The (pretty) “RDF/XML-ABBREV” works for read and write though. Maybe I’m using the wrong read() method? Pretty is good for now anyway. The goal is to have a human-readable / RDF format anyway.
    • Can do some primitive search and navigation-like behavior, but not getting where I want to go. For example, it’s possible to list all the resources:
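      // needs: import org.apache.jena.rdf.model.*; (com.hp.hpl.jena.rdf.model.* on Jena 2.x)
      // model is a populated Model; prop is the Property to match
      ResIterator resIter = model.listResourcesWithProperty(prop);
      while (resIter.hasNext()) {
          Resource r = resIter.nextResource();
          // the inner iterator needs its own name, and has to come from r
          StmtIterator stmtIter = r.listProperties(prop);
          while (stmtIter.hasNext()) {
              System.out.println("\t" + stmtIter.nextStatement().getObject().toString());
          }
      }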
    • But getting the parent of any of those resources is not supported. It looks like this requires using the Jena Ontology API, so on to the next tutorial…
    • Got Gregg’s simpleCredentials.owl file and was able to parse. Now I need to unpack it and create a dictionary.
    • Finished with the Jena Ontology API . No useful navigation, so very disappointing. Going to take the model.listStatements and see if I can assemble a tree (with relationships?) for the dictionary taxonomy conversion tomorrow.
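For the plan of assembling a tree from the statement list, here is a rough sketch that works on plain {subject, predicate, object} string triples instead of Jena Statement objects. The predicate name "subClassOf" and the sample class names are assumptions for illustration; with Jena, the triples would come from walking model.listStatements().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: string triples stand in for Jena statements.
class TaxonomySketch {

    /** Maps each parent to its children, using only triples whose predicate names the parent link. */
    static Map<String, List<String>> childrenByParent(String[][] triples, String parentPredicate) {
        Map<String, List<String>> tree = new TreeMap<>();
        for (String[] t : triples) {
            if (t[1].equals(parentPredicate)) {
                tree.computeIfAbsent(t[2], k -> new ArrayList<>()).add(t[0]);
            }
        }
        return tree;
    }

    /** Appends an indented outline of the subtree under node (assumes the links contain no cycles). */
    static void render(Map<String, List<String>> tree, String node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append('\t');
        }
        out.append(node).append('\n');
        for (String child : tree.getOrDefault(node, Collections.emptyList())) {
            render(tree, child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        String[][] triples = {
            { "Surgeon", "subClassOf", "Physician" },
            { "Physician", "subClassOf", "Person" },
            { "Surgeon", "label", "surgeon" },
        };
        StringBuilder out = new StringBuilder();
        render(childrenByParent(triples, "subClassOf"), "Person", 0, out);
        System.out.print(out);
    }
}
```

Inverting the child-to-parent statements into a parent-to-children map is what makes top-down navigation possible, which is the part the resource listing alone does not give.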