Category Archives: Java

Phil 4.1.16

7:15 – 4:15 VTX

  • Had a bunch of paperwork to do for my folks. All handled now?
  • Continuing What is Trust? A Conceptual Analysis and An Interdisciplinary Model. Done
    • Disposition to Trust. This construct means the extent to which one displays a consistent tendency to be willing to depend on general others across a broad spectrum of situations and persons
      • a general propensity to be willing to depend on others.
      • does not necessarily imply that one believes others to be trustworthy
      • only has a major effect on one’s trust-related behavior when novel situations arise, in which the person and situation are unfamiliar
      • Disposition to Trust has two subconstructs, Faith in Humanity and Trusting Stance
        • Faith in Humanity means one assumes others are usually upright, well-meaning, and dependable.
        • Trusting Stance means that, regardless of what one assumes about other people generally, one assumes that one will achieve better outcomes by dealing with people as though they are well-meaning and reliable
      • Because Faith in Humanity relates to assumptions about peoples’ attributes, it is more likely to be an antecedent to Trusting Beliefs (in people) than is Trusting Stance. Trusting Stance may relate more to Trusting Intention, which, depending on the situation, is probably not based wholly on beliefs about the other person.
    • Institution-based Trust means one believes the needed conditions are in place to enable one to anticipate a successful outcome in an endeavor or aspect of one’s life
      • This construct comes from the sociology tradition that people can rely on others because of structures, situations, or roles  that provide assurances (Affordances???) that things will go well
      • Institution-based Trust has two subconstructs, Structural Assurance and Situational Normality.
        • Structural Assurance means one believes that success is likely because guarantees, contracts, regulations, promises, legal recourse, processes, or procedures are in place that assure success
        • Situational Normality means one believes that success is likely because the situation is normal or favorable. (I think that this comes from very primitive parts of our brains. It can be observed in many animals and may be one of those things that separates infant and adult behavior. If you trust too much, you are likely to get eaten..?)
          • Situational Normality means that a properly ordered setting is likely to facilitate a successful venture. When one believes one’s role and others’ roles in the situation are appropriate and conducive to success, then one has a basis for trusting the people in the situation.
          • likely related to Trusting Beliefs and Trusting Intention. A system developer who feels good about the roles and setting in which they work is likely to have Trusting Beliefs about the people in that setting.
    • Trusting Beliefs means one believes (and feels confident in believing) that the other person has one or more traits desirable to one in a situation in which negative consequences are possible.
      • We distinguish four main trusting belief subconstructs, while recognizing that others exist.
        • Trusting Belief-Competence means one believes the other person has the ability or power to do for one what one needs done.
        • Trusting Belief-Benevolence means one believes the other person cares about one and is motivated to act in one’s interest.  A benevolent person does not act opportunistically.
        • Trusting Belief-Integrity means one believes the other person makes good faith agreements, tells the truth, and fulfills promises
        • Trusting Belief-Predictability means one believes the other person’s actions (good or bad) are consistent enough that one can forecast them in a given situation
    • Trusting Intention means one is willing to depend on, or intends to depend on, the other person in a given task or situation  with a feeling of relative security, even though negative consequences are possible
      • Trusting intention subconstructs include Willingness to Depend and Subjective Probability of Depending.
        • Willingness to Depend means one is volitionally prepared to make oneself vulnerable to the other person in a situation by relying on them.
        • Subjective Probability of Depending means the extent to which one forecasts or predicts that one will depend on the other person.
      • Trusting Intention definitions embody five elements synthesized from the trust literature.
        1. The possibility of negative consequences or risk is what makes trust important but problematic.
        2. A readiness to depend or rely on another is central to trusting intention.
        3. A feeling of security means one feels safe, assured, and comfortable (not anxious or fearful) about the prospect of depending on another. Feelings of security reflect the affective side of trusting intention.
        4. Trusting intention is situation-specific.(???? why? Examples?)
        5. Trusting intention involves willingness that is not based on having control or power over the other party. Note that Trusting Intention relates well to the system development power literature because we define it in terms of dependence and control.
    • Another limitation relates to Whetten’s (1989) recommendation that Who and Where conditions should be placed around models.  Whereas we have assumed that the model applies to any kind of relationship between two people (Who) in any situation (Where), this may not be the case. Empirical research is needed to better define the boundary conditions of the model.
  • Starting Technology, Humanness, and Trust: Rethinking Trust in Technology, also by D. Harrison McKnight
    • Page 881 (Basic?) Social Trust: human-like trust constructs of integrity, ability/competence, and benevolence that researchers have traditionally used to measure interpersonal trust.
    • Page 881 (Basic?) System Trust: system-like trust constructs such as reliability, functionality, and helpfulness
    • Page 881. First, we hypothesize that technologies can differ in humanness. Second, we predict that users will develop trust in the technology differently depending on whether they perceive it as more or less human-like, which will result in human-like trust having a stronger influence on outcomes for more human-like technologies and system-like trust having a stronger influence on outcomes for more system-like technologies. (Cite Kate Bush Deeper Understanding 1989)
    • Here’s the beginning of a thought: What is self-trust? Just thinking about it, it seems to be a sense of the reliability of my future self to do what my present self desires. That’s different from Social Trust, which in the literature is more about integrity, competence and benevolence. It seems closer to system trust in that reliability and functionality are more significant. There are things that I trust that I will do tomorrow: Get up, go to work, exercise if the weather is good enough. But there are also things that I can’t trust myself to do. My future self will almost certainly eat more calories than my current self desires. My grocery shopping behaviors are based around this lack of trust. There are items that I do not bring into my house because I know that they will get eaten (I was going to write that I know that my will is weak around chocolate, but that’s not really it. Or at least, that’s not all of it, or maybe even most of it..). Because (interactive?) information technology is more like a self-amplifier, I wonder if what we think of system trust can be thought of as the trust in ourselves, but the part of ourselves that is more reliable and trustworthy. A search tomorrow will work as well as a search today. Maybe better. And the effectiveness of that search reflects somehow my ability to interact effectively with the external world? This is starting to sound a lot like my point of view that living a life in prolonged contact with a compiler changes you in profound ways.
    • So what would that mean? I think it’s a reasonable hypothesis to change search results from focusing on pertinence to revelation. This does not mean that the ‘Ten Blue Links’ need to go away. But it does imply that peripheral information could be just as important, so that a less casually polarized worldview might be developed.
  • Finishing up the CSE version control setup – need to write up the process for confluence – done.
  • Since I now need to be able to read in the Excella data, I was going to look to Gregg’s ontology as a way to determine the table structure. But it’s way too big and nested. A Person’s description includes a reference to a complete organization, activities, charges, and arrests, and it doesn’t even have room for nice things yet (will we have co-authors?). Anyway, to avoid this, I’m going to have basic person characteristics with associated StringMaps, NumMaps and DateMaps. Anything that’s not recognized as a column gets added to those. Need to see how persistence will work with that in some testing first.
  • Got the code working. JPA 2 says you should be able to build a map entirely without annotations, but I couldn’t get it to work. Modified JsonLoadable so that it goes through the Json Object and anything that is not a member of the current class is added to HashMaps of PoiOptionalStrings. It should be very straightforward to extend to number and date types. Probably worth doing?
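
A minimal sketch of the catch-all loading idea described above, assuming the parsed JSON arrives as a plain key/value map. PersonSketch, its field names, and the optionalStrings map are hypothetical stand-ins for JsonLoadable and the HashMaps of PoiOptionalStrings, not the real classes. Only String fields are handled; number and date maps would follow the same pattern.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real person class; field names are assumptions.
class PersonSketch {
    String firstName;
    String lastName;
    // Catch-all for values whose keys do not match a declared String field.
    final Map<String, String> optionalStrings = new HashMap<>();

    // 'json' stands in for a parsed JSON object (key -> value).
    void load(Map<String, Object> json) {
        // Index only the fields declared on this class, not inherited ones.
        Map<String, Field> declared = new HashMap<>();
        for (Field f : PersonSketch.class.getDeclaredFields()) {
            declared.put(f.getName(), f);
        }
        for (Map.Entry<String, Object> e : json.entrySet()) {
            Field f = declared.get(e.getKey());
            String value = String.valueOf(e.getValue());
            if (f != null && f.getType() == String.class) {
                try {
                    f.set(this, value); // recognized column
                } catch (IllegalAccessException ex) {
                    throw new IllegalStateException(ex);
                }
            } else {
                optionalStrings.put(e.getKey(), value); // everything else
            }
        }
    }
}
```

Keeping the unrecognized keys in a separate map means the table structure stays small while nothing from the input is silently dropped.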

Phil 3.11.16

8:00 – VTX

  • Created new versions of the Friday crawl scheduler, one for GOV, one for ORG.
  • The gap between inaccurate viral news stories and the truth is 13 hours, based on this paper: Hoaxy – A Platform for Tracking Online Misinformation
  • Here’s a rough list on why UGC stored in a graph might be the best way to handle the BestPracticesService.
    • Self generating, self correcting information using incentivized contributions (every time a page you contributed to is used, you get money/medals/other…)
    • Graph database, maybe document elements rather than documents
      BPS has its own network, but it connects to doctors and possibly patients (anonymized?) and their symptoms.
    • Would support Results-driven medicine from a variety of interesting dimensions. For example we could calculate the best ‘route’ from symptoms to treatment using A*. Conversely, we could see how far from the optimal some providers are.
    • Because it’s UGC, there can be a robust mechanism for keeping information current (think Wikipedia) as well as handling disputes
    • Could be opened up as its own diagnostic/RDM tool.
    • A graph model allows for easy determination of provenance.
    • A good paper to look at: http://www.mdpi.com/1660-4601/6/2/492/htm. One of the social sites it looked at was Medscape, which seems to be UGC
  • Got the new Rating App mostly done. Still need to look into inbound links
  • Updated the blacklists on everything
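
To make the symptoms-to-treatment 'route' idea from the list above concrete, here is a rough A* sketch over a small weighted graph. Everything in it is an assumption for illustration: the RouteSketch class, the string node names, and the idea that edge costs and a heuristic estimate would be available. The heuristic must never overestimate the remaining cost; an empty heuristic map reduces this to Dijkstra's algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class RouteSketch {
    private static final class Step {
        final String node;
        final double priority; // cost so far + heuristic estimate
        Step(String node, double priority) { this.node = node; this.priority = priority; }
    }

    // Directed weighted graph: node -> (neighbor -> edge cost, assumed non-negative).
    private final Map<String, Map<String, Double>> edges = new HashMap<>();

    void addEdge(String from, String to, double cost) {
        edges.computeIfAbsent(from, k -> new HashMap<>()).put(to, cost);
    }

    // Returns the cheapest path from start to goal, or an empty list if none exists.
    List<String> bestRoute(String start, String goal, Map<String, Double> heuristic) {
        Map<String, Double> costSoFar = new HashMap<>();
        Map<String, String> cameFrom = new HashMap<>();
        PriorityQueue<Step> open =
                new PriorityQueue<Step>((a, b) -> Double.compare(a.priority, b.priority));
        costSoFar.put(start, 0.0);
        open.add(new Step(start, heuristic.getOrDefault(start, 0.0)));
        while (!open.isEmpty()) {
            String current = open.poll().node;
            if (current.equals(goal)) {
                // Walk the predecessor links back to the start.
                List<String> path = new ArrayList<>();
                for (String n = goal; n != null; n = cameFrom.get(n)) path.add(n);
                Collections.reverse(path);
                return path;
            }
            Map<String, Double> out = edges.getOrDefault(current, Collections.emptyMap());
            for (Map.Entry<String, Double> e : out.entrySet()) {
                double cost = costSoFar.get(current) + e.getValue();
                if (cost < costSoFar.getOrDefault(e.getKey(), Double.POSITIVE_INFINITY)) {
                    costSoFar.put(e.getKey(), cost);
                    cameFrom.put(e.getKey(), current);
                    open.add(new Step(e.getKey(), cost + heuristic.getOrDefault(e.getKey(), 0.0)));
                }
            }
        }
        return Collections.emptyList();
    }
}
```

Comparing a provider's actual path cost against the cost of the returned route would give the 'how far from optimal' measure mentioned above.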

Phil 2.25.16

7:00 – 5:00 VTX

  • Thinking more about the economics of contributing trustworthy information. Recently, I’ve discovered the PBS Idea Channel, which is a show that explores pop culture with a philosophical bent (LA Times review here). For example, Deadpool is explored from a phenomenology perspective. But what’s really interesting and seems unique to me is the relationship of the show with its commenters. For each show, there is a follow-on show where the most interesting comments are discussed by the host, Mike Rugnetta. And the comments are surprisingly cogent and good. I think that this is because Rugnetta is acting like the anchor of an interactive news program where the commenters are the reporters. He sets up the topic, gets the ball rolling, and then incorporates the best comments (stories) to wrap up the story. Interestingly, in a recent comment section on aesthetic (which I can’t find now?), he brings up a comment about science and philosophy and invites the commenter into a deeper discussion and also discusses the potential of an episode about that.
  • To get a flavor, here’s one of the longer comments (with 25 replies on its own) from the Deadpool show:
    I could actually buy that DeadPool’s ability to understand the medium he’s in if it weren’t for one thing he does very often: references to our world. If his fourth wall breaks were limited to interacting with the panels, making quips and nods about the idea of “readers”, and joking about general comic book (or video game or movie) tropes, then I’d be on board with the idea that he is hyper-aware due to his constant physical torment and knowledge of his own perceptions. however, he somehow has knowledge of things that do not seem to exist in the world he inhabits, such as memes, pop culture references, and things like “Leeroy Jenkins”. His hypersensitivity can explain his knowledge of the medium he’s in (an integral part of the reality he inhabits), but I don’t see a way that it could explain him knowing about things that, as far as I’m aware, do not exist in his reality.
  • Compare that to the comments for the MIT opencourseware intro to MIT 6.034, which I ‘took’ and found well presented and deeply interesting, though not as flashy. Here’s a rough equivalent (with 21 replies):
    wow ..it’s such an overwhelming feeling for a guy like me ..who had no chance in hell of ever getting into MIT or any other ivy’s to be able to listen and learn from this lectures online and that too free. :’)
  • To me, it seems like the Deadpool post is deeply involved with the subject matter of the episode, while the MIT comment is more typical of a YouTube comment in that it is more about the commenter and less about the content. This does imply that working on providing value to good commenting through inclusion in the content of the show can improve the quality and relevance of the comments.
  • To continue the ‘News Anchor’ thought from above, it might be possible to structure a news entity of some kind where different areas (sports, entertainment, local/regional, etc) could have their own anchors that produce interactive content with their commenters. Some additional capability to handle multimedia uploads from commenters should probably be supported and better navigation, but this sounds more to me like a 21st century news product than many other things that I’ve seen. It’s certainly the opposite of the Sweden paper.
  • And speaking of papers, here’s one on YouTube comments: Commenting on YouTube Videos: From Guatemalan Rock to El Big Bang
  • Starting on Incentivizing High-quality User-Generated Content.
    • References look really good. Only 8? For a WWW paper?
    • This is starting to look like what I was trying to find. Nash Equilibrium. Huh. The model predicts, as observed in practice, that if exposure is independent of quality, there will be a flood of low quality contributions in equilibrium. An ideal mechanism in this context would elicit both high quality and high participation in equilibrium.
  • Need to add ‘change password’ option. Done. And now that I know my way around JPA, I like it a lot
  • Added role-based enabling of menu choices
  • The code base could really use a cleanup. We have the classic research->production problem…
  • Adding match/nomatch and blacklist queries. Note that blacklist needs to be by search engine
    • Finished match
    • Finished nomatch
    • Working on Blacklist
    • Create a loop that changes all the QueryObjects so that qo.getUnquotedName() is used and persist.
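
A rough sketch of the clean-up loop from the last item above. Only getUnquotedName() is named in the notes; QueryObjectSketch, the assumed quote-stripping behaviour, and the persist callback are hypothetical. In the real app the callback would be whatever JPA call saves the entity (for example EntityManager.merge inside a transaction).

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the real QueryObject.
class QueryObjectSketch {
    private String name;
    QueryObjectSketch(String name) { this.name = name; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }

    // Assumed behaviour: strip one pair of surrounding double quotes.
    String getUnquotedName() {
        String n = name.trim();
        if (n.length() >= 2 && n.startsWith("\"") && n.endsWith("\"")) {
            return n.substring(1, n.length() - 1);
        }
        return n;
    }
}

final class UnquoteMigration {
    // Rewrites every quoted name and hands changed objects to 'persist'.
    // Returns the number of objects that were changed.
    static int run(List<QueryObjectSketch> all, Consumer<QueryObjectSketch> persist) {
        int changed = 0;
        for (QueryObjectSketch qo : all) {
            String unquoted = qo.getUnquotedName();
            if (!unquoted.equals(qo.getName())) {
                qo.setName(unquoted);
                persist.accept(qo);
                changed++;
            }
        }
        return changed;
    }
}
```

Skipping objects whose name is already unquoted keeps the loop safe to run more than once.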

Phil 2.15.16

7:30 – 1:30 VTX

Phil 2.12.16

6:30 – 4:30 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Create xml configuration file
  • Integrate Flyway?
  • Meeting on rating tool. Thoughts:
    • Add an ‘I goofed’ button to the GUI (or maybe a ‘back’ button that lets you change the rating?)
    • Add more info that pops up about the medical provider.
    • Add an analytics app that looks for ratings that disagree, either as outliers (watch out for that reviewer) or there is disagreement (are we having problems with terms, fuzzy matching, or what?)
    • Add a second app that tags the ontology onto the ‘Flaggable Match’
    • Write up a guidance manual for edge conditions. Comes up when you click ‘help’
    • When a url comes up that has already been reviewed more than N times and the reviews match substantially (a majority? – means odd numbers of reviews) for the same provider, don’t run that result item; just add a copy of the rating object with the name of (‘computed’)
  • Return from NJ
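
The 'computed' rating rule from the meeting notes above could look roughly like this. It is a sketch under stated assumptions: ratings are plain strings, 'match substantially' is read as a strict majority, and ConsensusSketch and its method name are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class ConsensusSketch {
    // Returns the majority rating when there are more than minReviews reviews
    // and one rating holds a strict majority; otherwise returns empty,
    // meaning the item should still be shown to a human rater.
    static Optional<String> computedRating(List<String> ratings, int minReviews) {
        if (ratings.size() <= minReviews) {
            return Optional.empty();
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String r : ratings) {
            counts.merge(r, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > ratings.size()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }
}
```

A strict majority test works for even review counts too: a two-against-two split simply yields no computed rating.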

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <dependency>
      <groupId>org.hibernate.javax.persistence</groupId>
      <artifactId>hibernate-jpa-2.0-api</artifactId>
      <version>1.0.1.Final</version>
    </dependency>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, labels, etc for generation.
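
For the 'write the whole works out in a .csv file' step above, a small sketch of the one part that tends to go wrong: quoting fields so Excel reads them back correctly. CsvSketch is a hypothetical helper, not code from the project, and it follows the usual RFC 4180 quoting convention.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvSketch {
    // Quote a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins one row of values into a single CSV line (no trailing newline).
    static String toLine(List<String> fields) {
        return fields.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }
}
```

Each line could then be written with any Writer; search type strings such as those containing colons or parentheses pass through unchanged.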

Data Architecture Meeting 2.11.15

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 2.10.16

Phil 8:00 – 6:00 VTX

  • Finished Anonymity Loves Company – Anonymous Web Transactions with Crowds
  • Figured out how to use code families. Not obvious at all from the documentation (too many types of families!), but obvious once you see it. Just select one or more codes in the code manager, right-click in the ‘family’ pane and select ‘New from Selected Items’
  • Enough with the cryptography and back to people! Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Up to NJ with Aaron for the rest of the week.
  • Start adding capability to rate existing query results. Done
  • Some output!
    MariaDB [googlecse1]> select search_type, display_link, rating, date_rated, user_name from view_rated_items order by rating;
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | search_type                         | display_link                 | rating          | date_rated          | user_name |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | ALL_ORG(Ram Singh: malpractice)     | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:43:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | caselaw.findlaw.com          | flaggable match | 2016-02-10 15:37:25 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:26:19 | Phil      |
    | ALL_US(Ram Singh: criminal)         | w3.health.state.ny.us        | flaggable match | 2016-02-10 15:17:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:33:06 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | law.resource.org             | flaggable match | 2016-02-10 15:27:10 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.courtlistener.com        | flaggable match | 2016-02-10 15:39:12 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.ncmedboard.org           | flaggable match | 2016-02-10 15:31:59 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | law.resource.org             | flaggable match | 2016-02-10 15:32:12 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.rfhha.org                | flaggable match | 2016-02-10 15:43:25 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.ncmedboard.org           | flaggable match | 2016-02-10 15:44:43 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | legal           | 2016-02-10 15:36:26 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | imageserver.library.yale.edu | legal           | 2016-02-10 15:36:28 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | legal           | 2016-02-10 15:35:44 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.co.jefferson.tx.us       | legal           | 2016-02-10 15:16:41 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:25:51 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | legal           | 2016-02-10 15:15:23 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | archive.org                  | legal           | 2016-02-10 15:45:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:26:00 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:32:34 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.legalindia.com           | legal           | 2016-02-09 14:57:59 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.norcobar.org             | legal           | 2016-02-10 15:40:44 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.indianbarassociation.org | legal           | 2016-02-10 15:34:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:30:54 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.indiankanoon.com         | legal           | 2016-02-10 15:38:38 | Phil      |
    | ALL_US(Ram Singh: board actions)    | docslide.us                  | legal           | 2016-02-09 14:59:35 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | legal           | 2016-02-10 15:43:52 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | ww3.lawschool.cornell.edu    | legal           | 2016-02-10 15:36:20 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.clarkcountymedical.org   | match           | 2016-02-10 15:41:51 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.healthgrades.com         | match           | 2016-02-09 14:57:29 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.intelius.com             | match           | 2016-02-10 15:38:22 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | jmidlifehealth.org           | medical         | 2016-02-10 15:44:17 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | mic.com                      | Not appropriate | 2016-02-10 15:37:09 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | Not appropriate | 2016-02-10 15:42:24 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:33:18 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.pbs.org                  | Not appropriate | 2016-02-10 15:45:57 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wtkr.com                     | Not appropriate | 2016-02-10 15:39:23 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.law.fsu.edu              | Not appropriate | 2016-02-10 15:34:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | modelminority.com            | Not appropriate | 2016-02-10 15:38:56 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | Not appropriate | 2016-02-10 15:34:42 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wiki.verkata.com             | Not appropriate | 2016-02-10 15:38:30 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.facebook.com             | Not appropriate | 2016-02-10 15:37:55 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | search.ancestry.com          | Not appropriate | 2016-02-10 15:37:40 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | Not appropriate | 2016-02-10 15:35:18 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:36:36 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:35:53 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.utexas.edu               | Not appropriate | 2016-02-10 15:34:55 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | netsecu.org                  | Not appropriate | 2016-02-10 15:32:47 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.gutenberg.us             | Not appropriate | 2016-02-09 14:59:57 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.leg.state.mn.us          | Not appropriate | 2016-02-09 14:59:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:59:02 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.acoe.k12.ca.us           | Not appropriate | 2016-02-09 14:58:59 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:58:30 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 14:58:16 | Phil      |
    | ALL_US(Ram Singh: board actions)    | newweb.altoona.k12.wi.us     | Not appropriate | 2016-02-09 14:58:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.linkedin.com             | Not appropriate | 2016-02-09 14:57:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | en.wikipedia.org             | Not appropriate | 2016-02-09 14:57:06 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.dailymail.co.uk          | Not appropriate | 2016-02-09 14:57:02 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.ndtv.com                 | Not appropriate | 2016-02-09 14:56:56 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.india.com                | Not appropriate | 2016-02-09 14:56:52 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:52:41 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.youtube.com              | Not appropriate | 2016-02-09 14:48:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.curatedobject.us         | Not appropriate | 2016-02-09 15:00:04 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 15:00:10 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.curatedobject.us         | Not appropriate | 2016-02-10 15:14:14 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.acoe.org                 | Not appropriate | 2016-02-10 15:31:06 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | en.wikipedia.org             | Not appropriate | 2016-02-10 15:30:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | fr.wikipedia.org             | Not appropriate | 2016-02-10 15:28:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikipedia.org             | Not appropriate | 2016-02-10 15:26:40 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:26:35 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:25:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikisource.org            | Not appropriate | 2016-02-10 15:24:59 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:24:43 | Phil      |
    | ALL_US(Ram Singh: criminal)         | hodges-directory.us          | Not appropriate | 2016-02-10 15:18:46 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | Not appropriate | 2016-02-10 15:15:52 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:37 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:34 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.acoe.k12.ca.us           | Not appropriate | 2016-02-10 15:15:31 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.gutenberg.us             | Not appropriate | 2016-02-10 15:14:33 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:46:59 | Phil      |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    80 rows in set (0.02 sec)

Phil 2.9.16

7:00 – 4:00 VTX

  • Finished Publius: A robust, tamper-evident, censorship-resistant web publishing system
  • Starting Anonymity Loves Company – Anonymous Web Transactions with Crowds by Mike Reiter and Aviel Rubin, who was one of the co-authors on the Publius paper.
    • Crowds could probably be built with PeerJS. The ISP would still know traffic, but that’s it.
  • Found this nice article in Communications of the ACM: Schema.org: Evolution of Structured Data on the Web. Nice overview. Very current.
  • The Big List of Naughty Strings
  • Time to combine everything
    • Optional generation of Providers and queries – default is to load them from the DB
    • Run queries from the DB
      • Show the number available and allow a request – done
      • Iterating over the queries and pages. Need to create, append and persist a rating Done
      • Named queries for
        • Queries that have the lowest number of results.ratings – done-ish. Currently it looks for -1 as a flag. Should also look for queries that have unrated results.
        • Queries associated with ‘bad’ providers
        • Queries associated with ‘good’ providers
      • Connect to DB remotely
    • Wrap the app (done, with Launch4j. Very nice!) and test it on the other laptop. Note, it doesn’t have enough disk to install java on. That will have to wait.
    • Packing up the laptop. Debating bringing multi monitor support. I’ll have the other laptop…
    • Gratuitous screenshot: SwingFlashback

Phil 2.8.16

7:00 – 5:00 VTX

  • My 401k still isn’t being done right. Sheesh.
  • More Publius: A robust, tamper-evident, censorship-resistant web publishing system
    • Very good introduction, then it dives into the weeds of how the system was implemented and the cryptologic challenges. Good stuff, and should be addressed. It does imply that the information stored in my system could be encrypted and sharded as an additional layer of protection against malicious editing, since in this case text can have annotations pointing to it but the source should be archival.
    • I think I also need to set up a new doc db of news items that I can use to make the story more readable.
      • Stories of people fooled by misinformation
      • Stories of people damaged by lack of anonymity
      • Stories about citizen journalism
      • Stories about computational journalism
      • Something about CSCW, Wikipedia maybe?
    • Anderson’s Eternity Service?
  • Need to make the ProviderObject persistent. Done
  • Need a rating object – date , who, the rating, anything else? Done-ish
  • Need to make a quick & dirty swing app for people to use – started. Once that’s working, then build the rating object that it will create
  • Need to connect to a remote DB
    • Will also need summary statistics and charts to see how queries do.
    • Will also need to store the good (“match” and “flaggable”) pages for later training.
  • Should make the app stand-alone-ish. JSmooth?
  • Discussion with Mike G., Heath, Bob H., and Theresa on how to integrate current NLP/NER

Phil 2.5.16

6:45 – 4:15 VTX

  • Change the JsonLoaded class to only look at declared fields – done
  • Register for Periscope Charts – done. Callback on Monday?
  • Working on parsing the query result.
    • Had to set the charset to UTF-8. Huh.
    • Can we pull back items by cacheId? Then we don’t need to load the primary store with internet info.
    • Had a STUPID mistake in getting JPA set up. Had all the annotations pointing at each other, but forgot when creating the result objects that I had to pass the ‘parent’ query object in to get the mapping. Sigh.
    • Adding a dirt-simple rating scheme
      • Java app iterates over all the urls returned and the user can pick from:
        1 - not appropriate at all
        2 - medical and or legal
        3 - Correct person
        4 - Correct person with flaggable

        The Java app then either opens the page or downloads and opens the file with the default application.

      • The user picks the value, the result object persists with the rating and we move on to the next item. Right now the DB is on my local machine, but if we made it networkable everyone could rate a few pages. Most of the results should only take a few seconds to evaluate.
  • I have the Google/db code running in one sandbox and the user eval running in another. Monday I’ll integrate them.
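
The four-value rating scheme above could be modeled roughly like this. This is only a sketch: the names `RatingSketch`, `Rating`, `RatedResult`, and `unrated` are made up for illustration, the example URLs are placeholders, and the JPA persistence and the page-opening step are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class RatingSketch {
    /** The four choices from the rating scheme; enum and field names are hypothetical. */
    enum Rating {
        NOT_APPROPRIATE(1, "not appropriate at all"),
        MEDICAL_OR_LEGAL(2, "medical and or legal"),
        CORRECT_PERSON(3, "Correct person"),
        CORRECT_PERSON_FLAGGABLE(4, "Correct person with flaggable");

        final int value;
        final String label;

        Rating(int value, String label) {
            this.value = value;
            this.label = label;
        }

        /** Maps the number the user picked back to a Rating. */
        static Rating fromValue(int value) {
            for (Rating r : values()) {
                if (r.value == value) {
                    return r;
                }
            }
            throw new IllegalArgumentException("Unknown rating: " + value);
        }
    }

    /** Hypothetical stand-in for the persisted result object. */
    static class RatedResult {
        final String url;
        Rating rating; // null until a user rates the page

        RatedResult(String url) {
            this.url = url;
        }
    }

    /** Returns the results nobody has rated yet, in their original order. */
    static List<RatedResult> unrated(List<RatedResult> results) {
        List<RatedResult> out = new ArrayList<>();
        for (RatedResult r : results) {
            if (r.rating == null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<RatedResult> results = new ArrayList<>();
        results.add(new RatedResult("http://example.com/a"));
        results.add(new RatedResult("http://example.com/b"));
        results.get(0).rating = Rating.fromValue(3);
        System.out.println(unrated(results).size() + " result(s) still unrated");
    }
}
```

Keeping the numeric value on the enum means the stored rating stays a small integer, while the UI can show the label.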

Phil 2.4.16

7:00 – 4:00 VTX

  • The way to handle multidimensional (human) ranking of documents (i.e. web pages) is to take the dimensions and webpages and put them on a matrix? Each page has a greater or lesser score on each dimension. Then apply PageRank. Tweak weights until the pages order the way we think they should.
  • Does “authority” mean quality? predicting expert quality ratings of Web documents
  • LandScan (Oak Ridge Labs)
  • Uppsala Conflict Data Program Geo-referenced Event Dataset
  • Nils Weidmann Dataverse (University of Konstanz)
  • Continuing On the Accuracy of Media-based Conflict Event Data. Done. Wow. And look at all the databases ^^^ !
  • Microsoft bot API
  • Back to GoogleHacking
    • Added ‘CredEngine1’ as BASELINE search engine
    • Looks like we blew through our limits. Using my key. Verified that the BASELINE search runs. That does mean that the current 4 queries factor out to 24 searches (6 search engines * 4 queries)
    • Building search persistent object
    • Building result item object. Actually, building a JsonLoadable base class since this trick is going to be used for the query items and info object
    • Need a result info object that stores the meta information.
    • Just stumbled across a GCS twitter search. Neat.
    • Hitting the CSE and getting results. Tomorrow I’ll finish off the classes that will persist the search results. I’ve got a buffered search result to use instead of hitting Google, although it will still need to pull down the document referenced in the result. I wonder how Jsoup handles PDF and Word documents?

Phil 2.3.16

7:00 – 3:00 VTX

  • Just discovered Publius –  a Web publishing system that is highly resistant to censorship and provides publishers with a high degree of anonymity. No longer active, but produced a paper.
  • Continuing On the Accuracy of Media-based Conflict Event Data. Currently starting Matching Media-based Conflict Reports with Military Records
  • Back to Googlehacking
    • Since I’ve got the provider JSON, setting up objects that I can use for more in-depth parsing. Thinking that this could be an example of ‘code’ in the dictionary. A word can be an object that knows how to look through a section of text to see if it can find itself.
    • I think running several dictionaries over a document could be interesting. For example, using a medical and a legal dictionary on a document would let the system infer malpractice as opposed to a document on foreign aid.
    • Generating the right queries and they work in the browser:
      "Ram Singh"
      	ALL_GOV(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_GOV(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_GOV(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_GOV(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
      	ALL_US(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+sanctions
      	ALL_US(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+criminal
      	ALL_US(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+malpractice
      	ALL_US(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+board+actions
      	ALL_ORG(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+sanctions
      	ALL_ORG(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+criminal
      	ALL_ORG(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+malpractice
      	ALL_ORG(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+board+actions
      	RESTRICTED_COM(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+sanctions
      	RESTRICTED_COM(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+criminal
      	RESTRICTED_COM(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+malpractice
      	RESTRICTED_COM(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+board+actions
      	ALL_EDU(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_EDU(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_EDU(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_EDU(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
  • So the next thing is to start running these queries and looking at the results to see if there are patterns. And I would be further along, but IntelliJ choked when I tried to add JPA. After flailing for a while I just gave up, created a new project, copied all the lib, src, and persistence directories over, updated the structure, and it all works. Grumble grumble.
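
For reference, the query URLs listed above are just the cross product of search engines and topics. A minimal sketch of how they could be generated follows. The class and method names (`CseQuerySketch`, `buildUrl`, `buildAll`) are made up here, and the key and engine ids are placeholders rather than real values. Only the base URL and the `key`, `cx`, and `q` parameters come from the listing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CseQuerySketch {
    static final String BASE = "https://www.googleapis.com/customsearch/v1";

    /** Builds one URL: quoted name, then state, then topic, form-encoded as in the listing. */
    static String buildUrl(String apiKey, String cx, String name, String state, String topic) {
        String q = "\"" + name + "\" " + state + " " + topic;
        try {
            // URLEncoder turns quotes into %22 and spaces into '+'
            return BASE + "?key=" + apiKey + "&cx=" + cx + "&q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    /** One labeled line per (engine, topic) pair, in insertion order. */
    static List<String> buildAll(String apiKey, Map<String, String> engines,
                                 String name, String state, List<String> topics) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> engine : engines.entrySet()) {
            for (String topic : topics) {
                out.add(engine.getKey() + "(" + topic + "): "
                        + buildUrl(apiKey, engine.getValue(), name, state, topic));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder engine ids; the real cx values are not reproduced here.
        Map<String, String> engines = new LinkedHashMap<>();
        engines.put("ALL_GOV", "ENGINE_ID_GOV");
        engines.put("ALL_US", "ENGINE_ID_US");
        List<String> topics = Arrays.asList("sanctions", "criminal", "malpractice", "board actions");
        for (String line : buildAll("YOUR_API_KEY", engines, "Ram Singh", "VA", topics)) {
            System.out.println(line);
        }
    }
}
```

Every (engine, topic) pair is a separate API call, so the number of engines times the number of topics is what counts against the daily quota.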

Phil 1.27.16

7:00 – 4:00 VTX

Phil 1.26.16

7:00 – 3:00 VTX

  • Finished the Crowdseeding paper. I was checking out the authors, and went to Macartan Humphreys’ website. He’s been doing interesting work, and he’s up in NYC at Columbia, so it would be possible to visit. There is one paper that looks very interesting: Mixing Methods: A Bayesian Approach. It’s about inferring information from quantitative and qualitative sources. It sounds related, both to how I’m putting together my proposal and how the overall system should(?) work.
  • Reviewing a paper. Don’t forget to mention other analytic systems like Palantir Gotham
  • On to Theme-based Retrieval of Web News. And in looking at papers that cite this, found The Hybrid Representation Model for Web Document Classification. Not too impressed with the former. The latter looks like it contains some good overview in the previous works section. One of the authors: Mark Last (lots of data discovery in large data sets)
  • Downloading new IntelliJ. Ok, back to normal and the tutorial.
    • Huh. Tried loading the (compact) “N-TRIPLES” format, which barfed, even though Jena wrote out the file. The (pretty) “RDF/XML-ABBREV” works for read and write though. Maybe I’m using the wrong read() method? Pretty is good for now anyway. The goal is to have a human-readable / RDF format anyway.
    • Can do some primitive search and navigation-like behavior, but not getting where I want to go. For example, it’s possible to list all the resources:
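      // ResIterator, Resource and StmtIterator come from Jena's rdf.model package.
      // Assumes model is a populated Model and prop is the Property of interest.
      ResIterator resources = model.listResourcesWithProperty(prop);
      while (resources.hasNext()) {
          Resource r = resources.nextResource();
          // List every statement on this resource that uses the property
          StmtIterator stmts = r.listProperties(prop);
          while (stmts.hasNext()) {
              System.out.println("\t" + stmts.nextStatement().getObject().toString());
          }
      }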
    • But getting the parent of any of those resources is not supported. It looks like this requires using the Jena Ontology API, so on to the next tutorial…
    • Got Gregg’s simpleCredentials.owl file and was able to parse. Now I need to unpack it and create a dictionary.
    • Finished with the Jena Ontology API. No useful navigation, so very disappointing. Going to take the model.listStatements and see if I can assemble a tree (with relationships?) for the dictionary taxonomy conversion tomorrow.

Phil 1.22.16

6:45 – 2:15 VTX

  • Timesheet day? Nope. Next week.
  • Ok, now that I think I understand Laplace Transforms and why they matter, I think I can get back to Calibrating Noise to Sensitivity in Private Data Analysis. Ok, kinda hit the wall on the math on this one. These aren’t formulas that I would be using at this point in the research. It’s nice to know that they’re here, and can probably help me determine the amount of noise that would be needed in calculating the biometric projection (which inherently removes information/adds noise).
  • Starting on Security-Control Methods for Statistical Databases: A Comparative Study
  • Article on useful AI chatbots. Sent SemanticMachines an email asking about their chatbot technology.
  • Got the name disambiguation working pretty well. Here’s the text:
    • – RateMDs Name Signup | Login Claim Doctor Profile | Claim Doctor Profile See what’s new! Account User Dashboard [[ doctor.name ]] Claim Doctor Profile Reports Admin Sales Admin: Doctor Logout Toggle navigation Menu Find A Doctor Find A Facility Health Library Health Blog Health Forum Doctors › Columbia › Family Doctor / G.P. › Unfollow Follow Share this Doctor: twitter facebook Dr. Robert S. Goodwin Family Doctor / G.P. 29 reviews #9 of 70 Family Doctors / G.P.s in Columbia, Maryland Male Dr Goodwin & Associates Unavailable View Map & ……………plus a lot more ………………..Hospitalizes Infant In Spain Wellness How Did Google Cardboard Save This baby’s life? Health 7 Amazing Stretches To Do On a Plane Follow Us You may also like Dr. Charles L. Crist Family Doctor / G.P. 24 reviews Top Family Doctors / G.P.s in Columbia, MD Dr. Mark V. Sivieri 21 reviews #1 of 70 Dr. Susan B. Brown Schoenfeld 8 reviews #2 of 70 Dr. Nj Udochi 4 reviews #3 of 70 Dr. Sarah L. Connor 4 reviews #4 of 70 Dr. Kisa S. Crosse 7 reviews #5 of 70 Sign up for our newsletter and get the latest health news and tips. Name Email Address Subscribe About RateMDs About Press Contact FAQ Advertise Privacy & Terms Claim Doctor Profile Top Specialties Family G.P. Gynecologist/OBGYN Dentist Orthopedics/Sports Cosmetic Surgeon Dermatologist View all specialties > Top Local Doctors New York Chicago Houston Los Angeles Boston Toronto Philadelphia Follow Us Facebook Twitter Google+ ©2004-2016 RateMDs Inc. – The original and largest doctor rating site.
    • Here’s the list of extracted people:
      PERSON: Robert S. Goodwin
      PERSON: Robert S. Goodwin
      PERSON: L. Crist
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: G
      PERSON: Robert S. Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Ajay Kumar
      PERSON: Charles L. Crist
      PERSON: Mark V. Sivieri
      PERSON: B. Brown Schoenfeld
      PERSON: L. Connor
      PERSON: S. Crosse
    • And here are some tests against that set (lower scores are better; the score is an information distance):
      Best match for Robert S. Goodwin is PERSON: Robert S. Goodwin (score = 0.0)
      Best match for Goodwin Robert S. is PERSON: Robert S. Goodwin (score = 0.0)
      Best match for Dr. Goodwin is PERSON: Robert S. Goodwin (score = 1.8)
      Best match for Bob Goodwin is PERSON: Robert S. Goodwin (score = 2.0)
      Best match for Rob Goodman is PERSON: Robert S. Goodwin (score = 2.6)
  • So I can cluster together similar (and misspelled) words, and SNLP hands me information about DATE, DURATION, PERSON, ORGANIZATION, LOCATION
  • Don’t know why I didn’t see this before – this is the page for the NER with associated papers. That’s about as close to a guide as I think you’ll find in this system
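
To make the name-matching idea above concrete, here is a rough sketch of picking the best match for a query name from a list of extracted PERSON strings. It is not the information-distance measure used for the scores above and will not reproduce those numbers. It swaps in a simpler stand-in: an order-insensitive, token-level Levenshtein edit distance. All names (`NameMatchSketch`, `score`, `bestMatch`) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NameMatchSketch {
    /** Lowercases, strips punctuation, and splits a name into tokens. */
    static String[] tokens(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", " ").trim().split("\\s+");
    }

    /** Classic two-row dynamic-programming Levenshtein edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Sum, over query tokens, of the distance to the closest candidate token. Lower is better. */
    static int score(String query, String candidate) {
        String[] candidateTokens = tokens(candidate);
        int total = 0;
        for (String q : tokens(query)) {
            int best = Integer.MAX_VALUE;
            for (String c : candidateTokens) {
                best = Math.min(best, levenshtein(q, c));
            }
            total += best;
        }
        return total;
    }

    /** Returns the candidate with the lowest score; ties go to the earliest candidate. */
    static String bestMatch(String query, List<String> candidates) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String c : candidates) {
            int s = score(query, c);
            if (s < bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Robert S. Goodwin", "Charles L. Crist", "Mark V. Sivieri");
        for (String q : Arrays.asList("Goodwin Robert S.", "Rob Goodman")) {
            String match = bestMatch(q, people);
            System.out.println("Best match for " + q + " is PERSON: " + match
                    + " (score = " + score(q, match) + ")");
        }
    }
}
```

Because each query token is matched independently, word order does not matter, which is why a reordered name still scores zero. A known weakness of this stand-in is that single-letter initials can look deceptively close to short tokens.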