Phil 2.17.16

7:00 – 5:00 VTX

  • Starting to list strawman hypotheses
  • Reading Connectivism paper. Very good so far.
  • Albert-László Barabási publications, Google Scholar Profile
  • LexRank: graph-based lexical centrality as salience in text
  • Talked to Theresa about the human rating app/results and sent her this article on Schema.org
  • Add doctor disambiguation popup – done
  • Add a ‘total results’ search. That shows how many relevant documents exist.
    MariaDB [googlecse1]> select distinct search_type, total_results from query_object where total_results > 0 order by total_results desc;
    +------------------------------------------------+---------------+
    | search_type                                    | total_results |
    +------------------------------------------------+---------------+
    | RESTRICTED_COM(Ram Singh: board actions)       |         12600 |
    | RESTRICTED_COM(Ram Singh: criminal)            |          7490 |
    | ALL_ORG(Ram Singh: board actions)              |          4200 |
    | BASELINE(Ram Singh: board actions)             |          3360 |
    | BASELINE(Ram Singh: criminal)                  |          1880 |
    | RESTRICTED_COM(Ram Singh: sanctions)           |          1580 |
    | ALL_ORG(Ram Singh: criminal)                   |          1390 |
    | ALL_ORG(Ram Singh: sanctions)                  |           539 |
    | ALL_GOV(Ram Singh: board actions)              |           401 |
    | BASELINE(Ram Singh: sanctions)                 |           284 |
    | ALL_US(Ram Singh: board actions)               |           157 |
    | ALL_EDU(Ram Singh: criminal)                   |           126 |
    | ALL_EDU(Ram Singh: board actions)              |           125 |
    | RESTRICTED_COM(Ram Singh: malpractice)         |           108 |
    | ALL_US(Ram Singh: criminal)                    |           103 |
    | ALL_GOV(Ram Singh: criminal)                   |            57 |
    | ALL_EDU(Ram Singh: sanctions)                  |            50 |
    | BASELINE(Ram Singh: malpractice)               |            34 |
    | ALL_ORG(Ram Singh: malpractice)                |            31 |
    | ALL_GOV(Ram Singh: sanctions)                  |            15 |
    | RESTRICTED_COM(Russell Johnson: criminal)      |             9 |
    | ALL_US(Ram Singh: sanctions)                   |             8 |
    | RESTRICTED_COM(Tommy Osborne: criminal)        |             8 |
    | ALL_EDU(Ram Singh: malpractice)                |             7 |
    | RESTRICTED_COM(Russell Johnson: board actions) |             7 |
    | RESTRICTED_COM(Tommy Osborne: board actions)   |             7 |
    | RESTRICTED_COM(Tommy Osborne: malpractice)     |             7 |
    | ALL_ORG(Tommy Osborne: board actions)          |             5 |
    | ALL_GOV(Ram Singh: malpractice)                |             4 |
    | ALL_US(Ram Singh: malpractice)                 |             4 |
    | ALL_ORG(Tommy Osborne: malpractice)            |             3 |
    | BASELINE(Tommy Osborne: board actions)         |             3 |
    | BASELINE(Tommy Osborne: malpractice)           |             3 |
    | ALL_GOV(Tommy Osborne: board actions)          |             2 |
    | ALL_GOV(Tommy Osborne: criminal)               |             2 |
    | ALL_GOV(Tommy Osborne: malpractice)            |             2 |
    | ALL_GOV(Tommy Osborne: sanctions)              |             2 |
    | ALL_ORG(Tommy Osborne: criminal)               |             2 |
    | RESTRICTED_COM(Tommy Osborne: sanctions)       |             2 |
    | BASELINE(Tommy Osborne: criminal)              |             1 |
    | BASELINE(Tommy Osborne: sanctions)             |             1 |
    | RESTRICTED_COM(Russell Johnson: malpractice)   |             1 |
    | RESTRICTED_COM(Russell Johnson: sanctions)     |             1 |
    +------------------------------------------------+---------------+
    43 rows in set (0.00 sec)
  • Need to run about 30 doctors through the system to get statistical significance for making recommendations
  • CommonCrawl vs. Google approximation. For this analysis, I listed all the domains that produced a ‘flaggable match’ and fed them into the common crawl index search for November 2015 (the most recent at the time of this writing). In the results listed below, the number indicates the number of blocks stored in the CommonCrawl. A value of zero indicates that the CommonCrawl index did not contain any reference to that domain:
    1 - w3.health.state.ny.us
    6 - www.consumerwatchdog.org
    2 - law.resource.org
    3 - www.ncmedboard.org
    40 - caselaw.findlaw.com
    0 - www.courtlistener.com
    1 - www.rfhha.org
    1 - www.dhp.virginia.gov
    2 - www.vahealthprovider.com
    0 - w3.nyhealth.gov
    2 - medboard.nv.gov
    2 - www.courts.state.va.us
    0 - www.physicianus.org
    0 - wwwapps.ncmedboard.org
    240 - www.healthgrades.com
    0 - www.dos.pa.gov
    3 - law.justia.com
    3 - ezdoctor.com
  • As can be seen, 5 out of 18 domains, or approximately 28% of the domains containing useful information, are missing. Of the remaining sites, it is an open question whether the crawl contains the full data from the site.
  • Here are the ratios of hits to search results:
    search type			pertinence	relevance	ratio
    ALL_GOV(Tommy Osborne: board actions)	2		2	100.00%
    ALL_GOV(Tommy Osborne: criminal)	2		2	100.00%
    ALL_GOV(Tommy Osborne: malpractice)	2		2	100.00%
    ALL_GOV(Tommy Osborne: sanctions)	2		2	100.00%
    BASELINE(Tommy Osborne: criminal)	1		1	100.00%
    BASELINE(Tommy Osborne: sanctions)	1		1	100.00%
    RESTRICTED_COM(Russell Johnson: malpractice)	1	1	100.00%
    ALL_ORG(Tommy Osborne: malpractice)	2		3	66.67%
    ALL_ORG(Tommy Osborne: board actions)	3		5	60.00%
    RESTRICTED_COM(Tommy Osborne: board actions)	4	7	57.14%
    ALL_GOV(Ram Singh: malpractice)		2		4	50.00%
    RESTRICTED_COM(Tommy Osborne: sanctions)	1	2	50.00%
    BASELINE(Tommy Osborne: board actions)	1		3	33.33%
    BASELINE(Tommy Osborne: malpractice)	1		3	33.33%
    RESTRICTED_COM(Russell Johnson: board actions)	2	7	28.57%
    RESTRICTED_COM(Tommy Osborne: malpractice)	2	7	28.57%
    ALL_US(Ram Singh: malpractice)		1		4	25.00%
    ALL_GOV(Ram Singh: sanctions)		2		15	13.33%
    RESTRICTED_COM(Tommy Osborne: criminal)	1		8	12.50%
    ALL_ORG(Ram Singh: malpractice)		3		31	9.68%
    ALL_GOV(Ram Singh: criminal)		1		57	1.75%
    ALL_GOV(Ram Singh: board actions)	4		401	1.00%
    ALL_US(Ram Singh: criminal)		1		103	0.97%
    RESTRICTED_COM(Ram Singh: malpractice)	1		108	0.93%
    ALL_ORG(Ram Singh: criminal)		2		1390	0.14%
    ALL_ORG(Ram Singh: board actions)	3		4200	0.07%
    RESTRICTED_COM(Ram Singh: criminal)	2		7490	0.03%
    RESTRICTED_COM(Ram Singh: board actions)	2	12600	0.02%
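A minimal sketch of the two calculations in this entry: the share of flaggable-match domains missing from the CommonCrawl index, and the hit ratio per search type. The block counts and the three sample rows are copied from the lists above; the names `cc_blocks`, `hit_ratio` and `rows` are made up for illustration, and this is not the code that produced the tables.

```python
# Blocks per domain, copied from the CommonCrawl list in this entry.
cc_blocks = {
    "w3.health.state.ny.us": 1,
    "www.consumerwatchdog.org": 6,
    "law.resource.org": 2,
    "www.ncmedboard.org": 3,
    "caselaw.findlaw.com": 40,
    "www.courtlistener.com": 0,
    "www.rfhha.org": 1,
    "www.dhp.virginia.gov": 1,
    "www.vahealthprovider.com": 2,
    "w3.nyhealth.gov": 0,
    "medboard.nv.gov": 2,
    "www.courts.state.va.us": 2,
    "www.physicianus.org": 0,
    "wwwapps.ncmedboard.org": 0,
    "www.healthgrades.com": 240,
    "www.dos.pa.gov": 0,
    "law.justia.com": 3,
    "ezdoctor.com": 3,
}

# A domain counts as missing when the index holds zero blocks for it.
missing = [domain for domain, blocks in cc_blocks.items() if blocks == 0]
missing_share = len(missing) / len(cc_blocks)
print(f"{len(missing)} of {len(cc_blocks)} domains missing ({missing_share:.1%})")


def hit_ratio(pertinent: int, total_results: int) -> float:
    """Fraction of reported results that were rated as hits."""
    return pertinent / total_results if total_results else 0.0


# (search type, pertinence, relevance) for three rows of the table above.
rows = [
    ("ALL_GOV(Tommy Osborne: board actions)", 2, 2),
    ("ALL_ORG(Ram Singh: malpractice)", 3, 31),
    ("RESTRICTED_COM(Ram Singh: board actions)", 2, 12600),
]
for search_type, pertinent, total in rows:
    print(f"{search_type}\t{pertinent}\t{total}\t{hit_ratio(pertinent, total):.2%}")
```

The zero guard in `hit_ratio` only matters for search types that returned nothing; those are already filtered out of the table by the `total_results > 0` clause in the query.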
    

Phil 2.16.16

7:00 – 4:00 VTX

  • Interesting stuff from Stephen Wolfram’s blog: Data Science of the Facebook World. Makes me wonder if you can infer age and gender from writing. Is this global or just US?
  • Meeting today with Wayne at 4:00
  • Added Config load
  • Added provider load
  • Added query generation. I realized that there is no need to generate a new query that is the same as a query that has never been run, so after generating all the potential new queries, I compare them to the untested list and remove any common items before persisting.
    • HOWEVER, while doing the calculation, I was adding all the QueryObjects to the ProviderObjects and then deleting them, so that when I persisted, I was adding HUGE numbers of lines. Moved the testing around so that it happens before a potential QueryObject is created.
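A rough sketch of the ordering fix described above: check each candidate against the untested list first, and only build an object for the ones that survive. Queries are modeled as plain tuples here, and `new_queries` is a hypothetical name; the real QueryObject and ProviderObject classes are not shown in these notes.

```python
def new_queries(candidates, untested):
    """Return the candidates that are not already waiting in the untested list.

    Order is preserved, and duplicates among the candidates are dropped too,
    so nothing is created (or persisted) twice.
    """
    seen = set(untested)
    fresh = []
    for query in candidates:
        if query not in seen:
            seen.add(query)
            fresh.append(query)
    return fresh


# Hypothetical (search_type, provider, term) keys for illustration.
untested = [("ALL_GOV", "Ram Singh", "criminal")]
candidates = [
    ("ALL_GOV", "Ram Singh", "criminal"),     # already queued, skipped
    ("ALL_GOV", "Ram Singh", "malpractice"),  # new
    ("ALL_GOV", "Ram Singh", "malpractice"),  # duplicate candidate, skipped
]
print(new_queries(candidates, untested))
```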

Phil 2.15.16

7:30 – 1:30 VTX

Phil 2.12.16

6:30 – 4:30 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Create xml configuration file
  • Integrate Flyway?
  • Meeting on rating tool. Thoughts:
    • Add an ‘I goofed’ button to the GUI (or maybe a ‘back’ button that lets you change the rating?)
    • Add more info that pops up about the medical provider.
    • Add an analytics app that looks for ratings that disagree, either as outliers (watch out for that reviewer) or there is disagreement (are we having problems with terms, fuzzy matching, or what?)
    • Add a second app that tags the ontology onto the ‘Flaggable Match’
    • Write up a guidance manual for edge conditions. Comes up when you click ‘help’
    • When a URL comes up that has already been reviewed more than N times and the reviews match substantially (A majority? – means odd numbers of reviews) for the same provider, don’t run that result item, just add a copy of the rating object with the name of (‘computed’)
  • Return from NJ
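One way the ‘computed’ rating rule from the meeting notes could look, as a sketch only. It assumes "match substantially" means a strict majority, and that the previous ratings for one URL and provider arrive as a list of strings; `computed_rating` and `min_reviews` are hypothetical names.

```python
from collections import Counter


def computed_rating(previous_ratings, min_reviews=3):
    """Return the majority rating if the item can be skipped, else None.

    The item is skipped only when it has been reviewed more than
    `min_reviews` times and one rating holds a strict majority.
    """
    if len(previous_ratings) <= min_reviews:
        return None
    rating, count = Counter(previous_ratings).most_common(1)[0]
    if count * 2 > len(previous_ratings):
        return rating
    return None


print(computed_rating(["legal", "legal", "legal", "match"]))  # reviewed 4 times, 3 agree
print(computed_rating(["legal", "match", "legal"]))           # not reviewed often enough
```

A returned rating would be stored as a copy of the rating object under the user name ‘computed’; a result of None means the item still goes to a human reviewer.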

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <dependency>
      <groupId>org.hibernate.javax.persistence</groupId>
      <artifactId>hibernate-jpa-2.0-api</artifactId>
      <version>1.0.1.Final</version>
    </dependency>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, labels, etc. for generation.

Data Architecture Meeting 2.11.16

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 2.10.16

8:00 – 6:00 VTX

  • Finished Anonymity Loves Company – Anonymous Web Transactions with Crowds
  • Figured out how to use code families. Not obvious at all from the documentation (too many types of families!), but obvious once you see it. Just select one or more codes in the code manager, right-click in the ‘family’ pane and select ‘New from Selected Items’
  • Enough with the cryptography and back to people! Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Up to NJ with Aaron for the rest of the week.
  • Start adding capability to rate existing query results. Done
  • Some output!
    MariaDB [googlecse1]> select search_type, display_link, rating, date_rated, user_name from view_rated_items order by rating;
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | search_type                         | display_link                 | rating          | date_rated          | user_name |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | ALL_ORG(Ram Singh: malpractice)     | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:43:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | caselaw.findlaw.com          | flaggable match | 2016-02-10 15:37:25 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:26:19 | Phil      |
    | ALL_US(Ram Singh: criminal)         | w3.health.state.ny.us        | flaggable match | 2016-02-10 15:17:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:33:06 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | law.resource.org             | flaggable match | 2016-02-10 15:27:10 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.courtlistener.com        | flaggable match | 2016-02-10 15:39:12 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.ncmedboard.org           | flaggable match | 2016-02-10 15:31:59 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | law.resource.org             | flaggable match | 2016-02-10 15:32:12 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.rfhha.org                | flaggable match | 2016-02-10 15:43:25 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.ncmedboard.org           | flaggable match | 2016-02-10 15:44:43 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | legal           | 2016-02-10 15:36:26 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | imageserver.library.yale.edu | legal           | 2016-02-10 15:36:28 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | legal           | 2016-02-10 15:35:44 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.co.jefferson.tx.us       | legal           | 2016-02-10 15:16:41 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:25:51 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | legal           | 2016-02-10 15:15:23 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | archive.org                  | legal           | 2016-02-10 15:45:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:26:00 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:32:34 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.legalindia.com           | legal           | 2016-02-09 14:57:59 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.norcobar.org             | legal           | 2016-02-10 15:40:44 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.indianbarassociation.org | legal           | 2016-02-10 15:34:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:30:54 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.indiankanoon.com         | legal           | 2016-02-10 15:38:38 | Phil      |
    | ALL_US(Ram Singh: board actions)    | docslide.us                  | legal           | 2016-02-09 14:59:35 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | legal           | 2016-02-10 15:43:52 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | ww3.lawschool.cornell.edu    | legal           | 2016-02-10 15:36:20 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.clarkcountymedical.org   | match           | 2016-02-10 15:41:51 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.healthgrades.com         | match           | 2016-02-09 14:57:29 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.intelius.com             | match           | 2016-02-10 15:38:22 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | jmidlifehealth.org           | medical         | 2016-02-10 15:44:17 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | mic.com                      | Not appropriate | 2016-02-10 15:37:09 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | Not appropriate | 2016-02-10 15:42:24 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:33:18 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.pbs.org                  | Not appropriate | 2016-02-10 15:45:57 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wtkr.com                     | Not appropriate | 2016-02-10 15:39:23 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.law.fsu.edu              | Not appropriate | 2016-02-10 15:34:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | modelminority.com            | Not appropriate | 2016-02-10 15:38:56 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | Not appropriate | 2016-02-10 15:34:42 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wiki.verkata.com             | Not appropriate | 2016-02-10 15:38:30 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.facebook.com             | Not appropriate | 2016-02-10 15:37:55 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | search.ancestry.com          | Not appropriate | 2016-02-10 15:37:40 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | Not appropriate | 2016-02-10 15:35:18 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:36:36 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:35:53 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.utexas.edu               | Not appropriate | 2016-02-10 15:34:55 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | netsecu.org                  | Not appropriate | 2016-02-10 15:32:47 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.gutenberg.us             | Not appropriate | 2016-02-09 14:59:57 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.leg.state.mn.us          | Not appropriate | 2016-02-09 14:59:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:59:02 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.acoe.k12.ca.us           | Not appropriate | 2016-02-09 14:58:59 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:58:30 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 14:58:16 | Phil      |
    | ALL_US(Ram Singh: board actions)    | newweb.altoona.k12.wi.us     | Not appropriate | 2016-02-09 14:58:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.linkedin.com             | Not appropriate | 2016-02-09 14:57:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | en.wikipedia.org             | Not appropriate | 2016-02-09 14:57:06 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.dailymail.co.uk          | Not appropriate | 2016-02-09 14:57:02 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.ndtv.com                 | Not appropriate | 2016-02-09 14:56:56 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.india.com                | Not appropriate | 2016-02-09 14:56:52 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:52:41 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.youtube.com              | Not appropriate | 2016-02-09 14:48:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.curatedobject.us         | Not appropriate | 2016-02-09 15:00:04 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 15:00:10 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.curatedobject.us         | Not appropriate | 2016-02-10 15:14:14 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.acoe.org                 | Not appropriate | 2016-02-10 15:31:06 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | en.wikipedia.org             | Not appropriate | 2016-02-10 15:30:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | fr.wikipedia.org             | Not appropriate | 2016-02-10 15:28:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikipedia.org             | Not appropriate | 2016-02-10 15:26:40 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:26:35 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:25:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikisource.org            | Not appropriate | 2016-02-10 15:24:59 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:24:43 | Phil      |
    | ALL_US(Ram Singh: criminal)         | hodges-directory.us          | Not appropriate | 2016-02-10 15:18:46 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | Not appropriate | 2016-02-10 15:15:52 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:37 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:34 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.acoe.k12.ca.us           | Not appropriate | 2016-02-10 15:15:31 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.gutenberg.us             | Not appropriate | 2016-02-10 15:14:33 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:46:59 | Phil      |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    80 rows in set (0.02 sec)

Phil 2.9.16

7:00 – 4:00 VTX

  • Finished Publius: A robust, tamper-evident, censorship-resistant web publishing system
  • Starting Anonymity Loves Company – Anonymous Web Transactions with Crowds by Mike Reiter and Aviel Rubin, who was one of the co-authors on the Publius paper.
    • Crowds could probably be built with PeerJS. The ISP would still know traffic, but that’s it.
  • Found this nice article in Communications of the ACM: Schema.org: Evolution of Structured Data on the Web. Nice overview. Very current.
  • The Big List of Naughty Strings
  • Time to combine everything
    • Optional generation of Providers and queries – default is to load them from the DB
    • Run queries from the DB
      • Show the number available and allow a request – done
      • Iterating over the queries and pages. Need to create, append and persist a rating – done
      • Named queries for
        • Queries that have the lowest number of results.ratings – done-ish. Currently it looks for -1 as a flag. Should also look for queries that have unrated results.
        • Queries associated with ‘bad’ providers
        • Queries associated with ‘good’ providers
      • Connect to DB remotely
    • Wrap the app (done, with Launch4j. Very nice!) and test it on the other laptop. Note, it doesn’t have enough disk to install java on. That will have to wait.
    • Packing up the laptop. Debating bringing multi monitor support. I’ll have the other laptop…
    • Gratuitous screenshot: SwingFlashback

Phil 2.8.16

7:00 – 5:00 VTX

  • My 401k still isn’t being done right. Sheesh.
  • More Publius: A robust, tamper-evident, censorship-resistant web publishing system
    • Very good introduction, then it dives into the weeds of how the system was implemented and the cryptologic challenges. Good stuff, and should be addressed. It does imply that the information stored in my system could be encrypted and sharded as an additional layer of protection against malicious editing, since in this case text can have annotations pointing to it but the source should be archival.
    • I think I also need to set up a new doc db of news items that I can use to make the story more readable.
      • Stories of people fooled by misinformation
      • Stories of people damaged by lack of anonymity
      • Stories about citizen journalism
      • Stories about computational journalism
      • Something about CSCW, Wikipedia maybe?
    • Anderson’s Eternity Service?
  • Need to make the ProviderObject persistent. Done
  • Need a rating object – date, who, the rating, anything else? Done-ish
  • Need to make a quick & dirty swing app for people to use – started. Once that’s working, then build the rating object that it will create
  • Need to connect to a remote DB
    • Will also need summary statistics and charts to see how queries do.
    • Will also need to store the good (“match” and “flaggable”) pages for later training.
  • Should make the app stand-alone-ish. JSmooth?
  • Discussion with Mike G., Heath, Bob H., and Theresa on how to integrate current NLP/NER

Phil 2.5.16

6:45 – 4:15 VTX

  • Change the JsonLoaded class to only look at declared fields – done
  • Register for Periscope Charts – done. Callback on Monday?
  • Working on parsing the query result.
    • Had to set the charset to UTF-8. Huh.
    • Can we pull back items by cacheId? Then we don’t need to load the primary store with internet info.
    • Had a STUPID mistake in getting JPA set up. Had all the annotations pointing at each other, but forgot when creating the result objects that I had to pass the ‘parent’ query object in to get the mapping. Sigh.
    • Adding a dirt-simple rating scheme
      • Java app iterates over all the urls returned and the user can pick from:
        1 - Not appropriate at all
        2 - Medical and/or legal
        3 - Correct person
        4 - Correct person with flaggable

        The Java app then either opens the page or downloads and opens the file with the default application.

      • The user picks the value, the result object persists with the rating and we move on to the next item. Right now the DB is on my local machine, but if we made it networkable everyone could rate a few pages. Most of the results should only take a few seconds to evaluate.
  • I have the Google/db code running in one sandbox and the user eval running in another. Monday I’ll integrate them.
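A small sketch of the dirt-simple rating scheme above, written in Python rather than the Java of the actual app. The four values mirror the list; the field names loosely follow the columns of the rated-items view, but `Rating`, `RatedItem` and `rate_results` are hypothetical, and the `choose` callback stands in for the user picking a value in the GUI.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from typing import Callable, Iterable, List


class Rating(IntEnum):
    NOT_APPROPRIATE = 1
    MEDICAL_OR_LEGAL = 2
    CORRECT_PERSON = 3
    CORRECT_PERSON_FLAGGABLE = 4


@dataclass
class RatedItem:
    url: str
    rating: Rating
    user_name: str
    date_rated: datetime


def rate_results(urls: Iterable[str],
                 choose: Callable[[str], int],
                 user_name: str) -> List[RatedItem]:
    """Walk the result URLs in order and record one rating per URL."""
    rated = []
    for url in urls:
        # Rating(...) raises ValueError for anything outside 1-4.
        rating = Rating(choose(url))
        rated.append(RatedItem(url, rating, user_name, datetime.now()))
    return rated


# Canned choices instead of a real user, for illustration only.
choices = {"http://one.example/page": 4, "http://two.example/page": 1}
items = rate_results(choices.keys(), choices.get, "Phil")
for item in items:
    print(item.url, item.rating.name, item.user_name)
```

In the real app each item would be persisted as soon as it is rated, so a reviewer can stop partway through a query without losing work.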

Phil 2.4.16

7:00 – 4:00 VTX

  • The way to handle multidimensional (human) ranking of documents (i.e. web pages) is to take the dimensions and webpages and put them on a matrix? Each page has a greater or lesser score on that dimension. Then apply page rank. Tweak weights until pages order the way we think they should
  • Does “authority” mean quality? predicting expert quality ratings of Web documents
  • LandScan (Oak Ridge Labs)
  • Uppsala Conflict Data Program Geo-referenced Event Dataset
  • Nils Weidmann Dataverse (University of Konstanz)
  • Continuing On the Accuracy of Media-based Conflict Event Data. Done. Wow. And look at all the databases ^^^ !
  • Microsoft bot API
  • Back to GoogleHacking
    • Added ‘CredEngine1’ as BASELINE search engine
    • Looks like we blew through our limits. Using my key. Verified that the BASELINE search runs. That does mean that the current 4 queries factor out to 24 searches (6 search engines * 4 queries)
    • Building search persistent object
    • Building result item object. Actually, building a JsonLoadable base class since this trick is going to be used for the query items and info object
    • Need a result info object that stores the meta information.
    • Just stumbled across a GCS twitter search. Neat.
    • Hitting the CSE and getting results. Tomorrow I’ll finish off the classes that will persist the search results. I’ve got a buffered search result to use instead of hitting Google, although it will still need to pull down the document referenced in the result. I wonder how Jsoup handles pdf and Word documents?

Phil 2.3.16

7:00 – 3:00 VTX

  • Just discovered Publius – a Web publishing system that is highly resistant to censorship and provides publishers with a high degree of anonymity. No longer active, but produced a paper.
  • Continuing On the Accuracy of Media-based Conflict Event Data. Currently starting Matching Media-based Conflict Reports with Military Records
  • Back to Googlehacking
    • Since I’ve got the provider JSON, setting up objects that I can use for more in-depth parsing. Thinking that this could be an example of ‘code’ in the dictionary. A word can be an object that knows how to look through a section of text to see if it can find itself.
    • I think running several dictionaries over a document could be interesting. For example, using a medical and a legal dictionary on a document would let the system infer malpractice as opposed to a document on foreign aid.
    • Generating the right queries and they work in the browser:
      "Ram Singh"
      	ALL_GOV(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_GOV(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_GOV(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_GOV(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
      	ALL_US(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+sanctions
      	ALL_US(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+criminal
      	ALL_US(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+malpractice
      	ALL_US(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:9qwxkhnqoi0&q=%22Ram+Singh%22+VA+board+actions
      	ALL_ORG(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+sanctions
      	ALL_ORG(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+criminal
      	ALL_ORG(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+malpractice
      	ALL_ORG(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:ux1lfnmx3ou&q=%22Ram+Singh%22+VA+board+actions
      	RESTRICTED_COM(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+sanctions
      	RESTRICTED_COM(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+criminal
      	RESTRICTED_COM(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+malpractice
      	RESTRICTED_COM(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:swl1wknfxia&q=%22Ram+Singh%22+VA+board+actions
      	ALL_EDU(sanctions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+sanctions
      	ALL_EDU(criminal): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+criminal
      	ALL_EDU(malpractice): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+malpractice
      	ALL_EDU(board actions): https://www.googleapis.com/customsearch/v1?key=AIzaSyAj6wa-zWuNWXrjeJ4FteuBMKj92mRP4vo&cx=017379340413921634422:lqt7ih7tgci&q=%22Ram+Singh%22+VA+board+actions
  • So the next thing is to start running these queries and looking at the results to see if there are patterns. I would be further along, but IntelliJ choked when I tried to add JPA. After flailing for a while I gave up, created a new project, copied the lib, src, and persistence directories over, updated the project structure, and it all works. Grumble grumble.
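
The query URLs above all follow one pattern: the same endpoint, one custom search engine ID (cx) per search type, and a quoted name plus state plus flag term as the query. A minimal Python sketch of generating them, assuming a placeholder API key and hypothetical names (ENGINES, build_url); the cx values and flag terms are copied from the list above:

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Search type -> custom search engine ID (cx), copied from the URLs above.
ENGINES = {
    "ALL_US": "017379340413921634422:9qwxkhnqoi0",
    "ALL_ORG": "017379340413921634422:ux1lfnmx3ou",
    "RESTRICTED_COM": "017379340413921634422:swl1wknfxia",
    "ALL_EDU": "017379340413921634422:lqt7ih7tgci",
}

FLAGS = ["sanctions", "criminal", "malpractice", "board actions"]


def build_url(api_key, cx, name, state, flag):
    """Build one Custom Search request URL for a quoted name, state, and flag term."""
    query = f'"{name}" {state} {flag}'
    # safe=":" keeps the colon in the cx value unescaped, matching the URLs above.
    params = urlencode({"key": api_key, "cx": cx, "q": query}, safe=":")
    return f"{ENDPOINT}?{params}"


def build_all(api_key, name, state):
    """Return {"SEARCH_TYPE(flag)": url} for every engine/flag combination."""
    return {
        f"{search_type}({flag})": build_url(api_key, cx, name, state, flag)
        for search_type, cx in ENGINES.items()
        for flag in FLAGS
    }


if __name__ == "__main__":
    # YOUR_API_KEY is a placeholder, not a working key.
    for label, url in build_all("YOUR_API_KEY", "Ram Singh", "VA").items():
        print(f"{label}: {url}")
```

Keeping the engine IDs in one table means that adding a search type or a flag term is a one-line change rather than another batch of hand-edited URLs.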

Phil 2.2.16

7:00 –

Phil 2.1.16

9:00 – 4:00 VTX

Phil 1.29.16

7:00 – 3:30 VTX

Phil 1.28.16

5:30 – 3:30 VTX

  • Continuing The Hybrid Representation Model for Web Document Classification. Good stuff, well written. This paper (An Efficient Algorithm for Discovering Frequent Subgraphs) may be good for recognizing patterns between stories. Possibly also images.
  • Useful page for set symbols that I can never remember: http://www.rapidtables.com/math/symbols/Set_Symbols.htm
  • Finally discovered why the RdfStatementNodes weren’t assembling properly: there was no root statement. Fixed! We can now go from:
    <rdf:RDF
      xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
      xmlns:vCard='http://www.w3.org/2001/vcard-rdf/3.0#'>

      <rdf:Description rdf:about="http://somewhere/JohnSmith/">
        <vCard:FN>John Smith</vCard:FN>
        <vCard:N rdf:parseType="Resource">
          <vCard:Family>Smith</vCard:Family>
          <vCard:Given>John</vCard:Given>
        </vCard:N>
      </rdf:Description>

      <rdf:Description rdf:about="http://somewhere/RebeccaSmith/">
        <vCard:FN>Becky Smith</vCard:FN>
        <vCard:N rdf:parseType="Resource">
          <vCard:Family>Smith</vCard:Family>
          <vCard:Given>Rebecca</vCard:Given>
        </vCard:N>
      </rdf:Description>

      <rdf:Description rdf:about="http://somewhere/SarahJones/">
        <vCard:FN>Sarah Jones</vCard:FN>
        <vCard:N rdf:parseType="Resource">
          <vCard:Family>Jones</vCard:Family>
          <vCard:Given>Sarah</vCard:Given>
        </vCard:N>
      </rdf:Description>

      <rdf:Description rdf:about="http://somewhere/MattJones/">
        <vCard:FN>Matt Jones</vCard:FN>
        <vCard:N
          vCard:Family="Jones"
          vCard:Given="Matthew"/>
      </rdf:Description>

    </rdf:RDF>

    to this:

    [1]: http://somewhere/SarahJones/
    --[5] Subject: http://somewhere/SarahJones/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#FN, Object Literal:  "Sarah Jones"
    --[4] Subject: http://somewhere/SarahJones/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#N, Object(b81a776:1528928f544:-7ffd)
    ----[6] Subject: b81a776:1528928f544:-7ffd, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Given, Object Literal:  "Sarah"
    ----[7] Subject: b81a776:1528928f544:-7ffd, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Family, Object Literal:  "Jones"
    [3]: http://somewhere/MattJones/
    --[15] Subject: http://somewhere/MattJones/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#FN, Object Literal:  "Matt Jones"
    --[14] Subject: http://somewhere/MattJones/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#N, Object(b81a776:1528928f544:-7ffc)
    ----[11] Subject: b81a776:1528928f544:-7ffc, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Family, Object Literal:  "Jones"
    ----[10] Subject: b81a776:1528928f544:-7ffc, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Given, Object Literal:  "Matthew"
    [0]: http://somewhere/RebeccaSmith/
    --[3] Subject: http://somewhere/RebeccaSmith/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#FN, Object Literal:  "Becky Smith"
    --[2] Subject: http://somewhere/RebeccaSmith/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#N, Object(b81a776:1528928f544:-7ffe)
    ----[9] Subject: b81a776:1528928f544:-7ffe, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Family, Object Literal:  "Smith"
    ----[8] Subject: b81a776:1528928f544:-7ffe, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Given, Object Literal:  "Rebecca"
    [2]: http://somewhere/JohnSmith/
    --[12] Subject: http://somewhere/JohnSmith/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#N, Object(b81a776:1528928f544:-7fff)
    ----[1] Subject: b81a776:1528928f544:-7fff, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Family, Object Literal:  "Smith"
    ----[0] Subject: b81a776:1528928f544:-7fff, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#Given, Object Literal:  "John"
    --[13] Subject: http://somewhere/JohnSmith/, Predicate: http://www.w3.org/2001/vcard-rdf/3.0#FN, Object Literal:  "John Smith"
  • Some thoughts about information retrieval using graphs
  • Sent a note to Theresa asking for people to do manual flag extraction
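
The root-statement fix noted above for the RdfStatementNodes can be illustrated with a small sketch. This is not the actual RdfStatementNode code (which is not shown in this log), only a stdlib Python illustration of the idea: a subject that never appears as the object of another statement is treated as a root, and each root's statements are walked recursively. The names (TRIPLES, find_roots, render) and the blank-node ID "_:b0" are hypothetical, and the graph is assumed to be acyclic.

```python
from collections import defaultdict

VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# (subject, predicate, object, object_is_literal).
# "_:b0" is a stand-in for the generated blank-node ID.
TRIPLES = [
    ("http://somewhere/SarahJones/", VCARD + "FN", "Sarah Jones", True),
    ("http://somewhere/SarahJones/", VCARD + "N", "_:b0", False),
    ("_:b0", VCARD + "Given", "Sarah", True),
    ("_:b0", VCARD + "Family", "Jones", True),
]


def find_roots(triples):
    """Subjects that are never the (non-literal) object of another statement."""
    referenced = {o for _, _, o, is_literal in triples if not is_literal}
    roots = []
    for subject, _, _, _ in triples:
        if subject not in referenced and subject not in roots:
            roots.append(subject)
    return roots


def render(triples):
    """Return the statements as indented lines, one tree per root subject."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    lines = []

    def walk(subject, depth):
        # Assumes no cycles; a cyclic graph would need a visited set.
        for s, p, o, is_literal in by_subject.get(subject, []):
            obj = f'Object Literal: "{o}"' if is_literal else f"Object({o})"
            lines.append(f"{'--' * depth} Subject: {s}, Predicate: {p}, {obj}")
            if not is_literal:
                walk(o, depth + 1)

    for root in find_roots(triples):
        lines.append(root)
        walk(root, 1)
    return lines


if __name__ == "__main__":
    print("\n".join(render(TRIPLES)))
```

Without the root-finding step, the blank node behind vCard:N has nothing to attach to, which is one way a tree like this can fail to assemble.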