Category Archives: thesis

Phil 2.15.16

7:30 – 1:30 VTX

Phil 2.12.16

6:30 – 4:30 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Create xml configuration file
  • Integrate Flyway?
  • Meeting on rating tool. Thoughts:
    • Add an ‘I goofed’ button to the GUI (or maybe a ‘back’ button that lets you change the rating?)
    • Add more info that pops up about the medical provider.
    • Add an analytics app that looks for ratings that disagree, either as outliers (watch out for that reviewer) or there is disagreement (are we having problems with terms, fuzzy matching, or what?)
    • Add a second app that tags the ontology onto the ‘Flaggable Match’
    • Write up a guidance manual for edge conditions. Comes up when you click ‘help’
    • When a URL comes up that has already been reviewed more than N times and the reviews match substantially (a majority? That means odd numbers of reviews) for the same provider, don’t run that result item; just add a copy of the rating object with the name ‘computed’
  • Return from NJ
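
The ‘already reviewed more than N times’ idea from the rating tool meeting could look something like the sketch below. This is only an illustration under assumptions: the class and method names are hypothetical, ratings are treated as plain strings for one URL and provider pair, and ‘match substantially’ is read as a strict majority.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Sketch of the "already reviewed" shortcut; all names here are hypothetical. */
final class ComputedRating {

    /**
     * Returns the rating to copy as 'computed' when more than minReviews ratings
     * exist for one (url, provider) pair and a strict majority of them agree.
     * Returns empty when the item should still be shown to a human reviewer.
     */
    static Optional<String> majorityRating(List<String> ratings, int minReviews) {
        if (ratings.size() <= minReviews) {
            return Optional.empty();               // not reviewed often enough yet
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String rating : ratings) {
            counts.merge(rating, 1, Integer::sum); // tally each distinct rating
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() * 2 > ratings.size()) {
                return Optional.of(entry.getKey()); // more than half agree
            }
        }
        return Optional.empty();                   // reviewers disagree, keep asking
    }

    public static void main(String[] args) {
        List<String> ratings = Arrays.asList("legal", "legal", "match", "legal", "legal");
        System.out.println(majorityRating(ratings, 3).orElse("needs review"));
    }
}
```

An even split returns empty, which is the case the ‘odd numbers of reviews’ remark is worried about.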

Phil 2.11.16

6:00 – 4:00 VTX

  • Continuing Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Need to see if I can get this on Monday: Rethinking Journalism: trust and participation in a transformed news landscape. Got the kindle book.
  • Need to add a menubar to the Gui app that has a ‘data’ and ‘queries’ tab. Data runs the data generation code. Queries has a list of questions that clears the output and then sends the results to the text area.
  • Still need to move the db to a server. Just realized that it could be a MySql db on Dreamhost too. Having trouble with that. It might be the eclipse jar? Here’s the hibernate jar location in maven:
    <groupId>org.hibernate.javax.persistence</groupId>
    <artifactId>hibernate-jpa-2.0-api</artifactId>
    <version>1.0.1.Final</version>
  • Gave up on connecting to Dreamhost. I think it’s a permissions thing. Asked Heath to look into creating a stable DB somewhere. He needs to talk to Damien.
  • Webhose.io – direct access to live & structured data from millions of sources.
  • Search by date: https://support.google.com/news/answer/3334?hl=en
    • Google news search that produces Json for the last 24 hours:
      ?q=malpractice&safe=off&hl=en&gl=us&authuser=0&tbm=nws&source=lnt&tbs=qdr:d
  • Played around with a bunch of queries, but in the end, I figured that it was better to write the whole works out in a .csv file and do pivot tables in Excel.
  • Adding the ability to read a config file to set the search engines, labels, etc. for generation.

Data Architecture Meeting 2.11.16

Testing what we have

  • Relevance score
  • Pertinence score
  • Charts for management

Vinny

  • Terminology
  • gov
  • Bias towards trustworthy unstructured sources.
  • What about getting structured data.

Aaron

  • Isolate V1 capability
  • Metrics!
  • We need the structured data!!

Matt

  • Dsds

Scott

  • Questions about unstructured query

Phil 2.10.16

Phil 8:00 – 6:00 VTX

  • Finished Anonymity Loves Company – Anonymous Web Transactions with Crowds
  • Figured out how to use code families. Not obvious at all from the documentation (too many types of families!), but obvious once you see it. Just select one or more codes in the code manager, right-click in the ‘family’ pane and select ‘New from Selected Items’
  • Enough with the cryptography and back to people! Participatory journalism – the (r)evolution that wasn’t. Content and user behavior in Sweden 2007–2013
  • Up to NJ with Aaron for the rest of the week.
  • Start adding capability to rate existing query results. Done
  • Some output!
    MariaDB [googlecse1]> select search_type, display_link, rating, date_rated, user_name from view_rated_items order by rating;
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | search_type                         | display_link                 | rating          | date_rated          | user_name |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    | ALL_ORG(Ram Singh: malpractice)     | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:43:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | caselaw.findlaw.com          | flaggable match | 2016-02-10 15:37:25 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:26:19 | Phil      |
    | ALL_US(Ram Singh: criminal)         | w3.health.state.ny.us        | flaggable match | 2016-02-10 15:17:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.consumerwatchdog.org     | flaggable match | 2016-02-10 15:33:06 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | law.resource.org             | flaggable match | 2016-02-10 15:27:10 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.courtlistener.com        | flaggable match | 2016-02-10 15:39:12 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.ncmedboard.org           | flaggable match | 2016-02-10 15:31:59 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | law.resource.org             | flaggable match | 2016-02-10 15:32:12 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.rfhha.org                | flaggable match | 2016-02-10 15:43:25 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.ncmedboard.org           | flaggable match | 2016-02-10 15:44:43 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | legal           | 2016-02-10 15:36:26 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | imageserver.library.yale.edu | legal           | 2016-02-10 15:36:28 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | legal           | 2016-02-10 15:35:44 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.co.jefferson.tx.us       | legal           | 2016-02-10 15:16:41 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:25:51 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | legal           | 2016-02-10 15:15:23 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | archive.org                  | legal           | 2016-02-10 15:45:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | indiankanoon.org             | legal           | 2016-02-10 15:26:00 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:32:34 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.legalindia.com           | legal           | 2016-02-09 14:57:59 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.norcobar.org             | legal           | 2016-02-10 15:40:44 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.indianbarassociation.org | legal           | 2016-02-10 15:34:02 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | indiankanoon.org             | legal           | 2016-02-10 15:30:54 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.indiankanoon.com         | legal           | 2016-02-10 15:38:38 | Phil      |
    | ALL_US(Ram Singh: board actions)    | docslide.us                  | legal           | 2016-02-09 14:59:35 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | legal           | 2016-02-10 15:43:52 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | ww3.lawschool.cornell.edu    | legal           | 2016-02-10 15:36:20 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.clarkcountymedical.org   | match           | 2016-02-10 15:41:51 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.healthgrades.com         | match           | 2016-02-09 14:57:29 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.intelius.com             | match           | 2016-02-10 15:38:22 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | jmidlifehealth.org           | medical         | 2016-02-10 15:44:17 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | mic.com                      | Not appropriate | 2016-02-10 15:37:09 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | indiankanoon.org             | Not appropriate | 2016-02-10 15:42:24 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:33:18 | Phil      |
    | ALL_ORG(Ram Singh: malpractice)     | www.pbs.org                  | Not appropriate | 2016-02-10 15:45:57 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wtkr.com                     | Not appropriate | 2016-02-10 15:39:23 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.law.fsu.edu              | Not appropriate | 2016-02-10 15:34:38 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | modelminority.com            | Not appropriate | 2016-02-10 15:38:56 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.alasu.edu                | Not appropriate | 2016-02-10 15:34:42 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | wiki.verkata.com             | Not appropriate | 2016-02-10 15:38:30 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | www.facebook.com             | Not appropriate | 2016-02-10 15:37:55 | Phil      |
    | RESTRICTED_COM(Ram Singh: criminal) | search.ancestry.com          | Not appropriate | 2016-02-10 15:37:40 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.academia.edu             | Not appropriate | 2016-02-10 15:35:18 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:36:36 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | lists.washlaw.edu            | Not appropriate | 2016-02-10 15:35:53 | Phil      |
    | ALL_EDU(Ram Singh: criminal)        | www.utexas.edu               | Not appropriate | 2016-02-10 15:34:55 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | netsecu.org                  | Not appropriate | 2016-02-10 15:32:47 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.gutenberg.us             | Not appropriate | 2016-02-09 14:59:57 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.leg.state.mn.us          | Not appropriate | 2016-02-09 14:59:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:59:02 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.acoe.k12.ca.us           | Not appropriate | 2016-02-09 14:58:59 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-09 14:58:30 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 14:58:16 | Phil      |
    | ALL_US(Ram Singh: board actions)    | newweb.altoona.k12.wi.us     | Not appropriate | 2016-02-09 14:58:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.linkedin.com             | Not appropriate | 2016-02-09 14:57:11 | Phil      |
    | BASELINE(Ram Singh: board actions)  | en.wikipedia.org             | Not appropriate | 2016-02-09 14:57:06 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.dailymail.co.uk          | Not appropriate | 2016-02-09 14:57:02 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.ndtv.com                 | Not appropriate | 2016-02-09 14:56:56 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.india.com                | Not appropriate | 2016-02-09 14:56:52 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:52:41 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.youtube.com              | Not appropriate | 2016-02-09 14:48:13 | Phil      |
    | ALL_US(Ram Singh: board actions)    | www.curatedobject.us         | Not appropriate | 2016-02-09 15:00:04 | Phil      |
    | ALL_US(Ram Singh: board actions)    | datab.us                     | Not appropriate | 2016-02-09 15:00:10 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.curatedobject.us         | Not appropriate | 2016-02-10 15:14:14 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | www.acoe.org                 | Not appropriate | 2016-02-10 15:31:06 | Phil      |
    | ALL_ORG(Ram Singh: board actions)   | en.wikipedia.org             | Not appropriate | 2016-02-10 15:30:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | fr.wikipedia.org             | Not appropriate | 2016-02-10 15:28:13 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikipedia.org             | Not appropriate | 2016-02-10 15:26:40 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | www.vacouncilofchurches.org  | Not appropriate | 2016-02-10 15:26:35 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:25:21 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | en.wikisource.org            | Not appropriate | 2016-02-10 15:24:59 | Phil      |
    | ALL_ORG(Ram Singh: criminal)        | ca.wikipedia.org             | Not appropriate | 2016-02-10 15:24:43 | Phil      |
    | ALL_US(Ram Singh: criminal)         | hodges-directory.us          | Not appropriate | 2016-02-10 15:18:46 | Phil      |
    | ALL_US(Ram Singh: criminal)         | docslide.us                  | Not appropriate | 2016-02-10 15:15:52 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:37 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.nhusd.k12.ca.us          | Not appropriate | 2016-02-10 15:15:34 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.acoe.k12.ca.us           | Not appropriate | 2016-02-10 15:15:31 | Phil      |
    | ALL_US(Ram Singh: criminal)         | www.gutenberg.us             | Not appropriate | 2016-02-10 15:14:33 | Phil      |
    | BASELINE(Ram Singh: board actions)  | www.firstpost.com            | Not appropriate | 2016-02-09 14:46:59 | Phil      |
    +-------------------------------------+------------------------------+-----------------+---------------------+-----------+
    80 rows in set (0.02 sec)

Phil 2.9.16

7:00 – 4:00 VTX

  • Finished Publius: A robust, tamper-evident, censorship-resistant web publishing system
  • Starting Anonymity Loves Company – Anonymous Web Transactions with Crowds by Mike Reiter and Aviel Rubin, who was one of the co-authors on the Publius paper.
    • Crowds could probably be built with PeerJS. The ISP would still know traffic, but that’s it.
  • Found this nice article in Communications of the ACM: Schema.org: Evolution of Structured Data on the Web. Nice overview. Very current.
  • The Big List of Naughty Strings
  • Time to combine everything
    • Optional generation of Providers and queries – default is to load them from the DB
    • Run queries from the DB
      • Show the number available and allow a request – done
      • Iterating over the queries and pages. Need to create, append and persist a rating Done
      • Named queries for
        • Queries that have the lowest number of results.ratings – done-ish. Currently it looks for -1 as a flag. Should also look for queries that have unrated results.
        • Queries associated with ‘bad’ providers
        • Queries associated with ‘good’ providers
      • Connect to DB remotely
    • Wrap the app (done, with Launch4j. Very nice!) and test it on the other laptop. Note, it doesn’t have enough disk to install java on. That will have to wait.
    • Packing up the laptop. Debating bringing multi monitor support. I’ll have the other laptop…
    • Gratuitous screenshot: SwingFlashback

Phil 2.4.16

7:00 – 4:00 VTX

  • The way to handle multidimensional (human) ranking of documents (i.e. web pages) is to take the dimensions and webpages and put them on a matrix? Each page has a greater or lesser score on each dimension. Then apply page rank. Tweak weights until pages order the way we think they should.
  • Does “authority” mean quality? predicting expert quality ratings of Web documents
  • LandScan (Oak Ridge Labs)
  • Uppsala Conflict Data Program Geo-referenced Event Dataset
  • Nils Weidmann Dataverse (University of Konstanz)
  • Continuing On the Accuracy of Media-based Conflict Event Data. Done. Wow. And look at all the databases ^^^ !
  • Microsoft bot API
  • Back to GoogleHacking
    • Added ‘CredEngine1’ as BASELINE search engine
    • Looks like we blew through our limits. Using my key. Verified that the BASELINE search runs. That does mean that the current 4 queries factor out to 24 searches (6 search engines * 4 queries)
    • Building search persistent object
    • Building result item object. Actually, building a JasonLoadable base class since this trick is going to be used for the query items and info object
    • Need a result info object that stores the meta information.
    • Just stumbled across a GCS twitter search. Neat.
    • Hitting the CSE and getting results. Tomorrow I’ll finish off the classes that will persist the search results. I’ve got a buffered search result to use instead of hitting Google, although it will still need to pull down the document referenced in the result. I wonder how Jsoup handles PDF and Word documents?

Phil 2/2/16

7:00 –

Phil 2.1.16

9:00 – 4:00 VTX

Phil 1.29.16

7:00 – 3:30 VTX

Phil 1.24.16

7:00 – 9:00(am)

  • Boy, that was a lot of snow…
  • Finished Security-Controlled Methods for Statistical Databases. Lots of good stuff, but the main takeaway is that data from each user could be adjusted by a fixed value so that its means and variances would be indistinguishable from some other user. We’d have to save those offsets for differentiation, but those are small values that can be encrypted and even stored offline.
  • Starting Crowdseeding Conflict Data.
    • Just found out about FrontlineSMS and SimLab
    • ACLED (Armed Conflict Location & Event Data Project)
    • We close with reflections on the ethical implications of taking a project like this to scale. During the pilot project we faced no incidents that threatened the safety of the phone holders. However, this might be different when the project is scaled up and the attention of armed groups is drawn to it. For both humanitarian and research purposes a project such as Voix des Kivus becomes truly useful only when it is taken to scale; but those are precisely the conditions which might create the greatest risks. We did not assess these risks because we could not bear them ourselves. But given the importance and utility of the data these are risks that others might be better placed to bear.
    • Internal validation seems to help a lot. This really does beg the question as to what the interface should look like to enforce conformity without leading to information overload.
    • So restrict the user choice (like the codes used here), or have the system infer categories? A mix? Maybe like the search autocomplete?
    • Remember, this needs to work for mobile, even SMS. I’m thinking that maybe a system that has a simple question/answer interaction that leads down a tree might be general enough. As the system gets more sophisticated, the text could get more conversational.
    • This could be tested on Twitter as a bot. It would need to keep track of the source’s id to maintain the conversation, and could ask for posts of images, videos, etc.

Phil 1.22.16

6:45 – 2:15 VTX

  • Timesheet day? Nope. Next week.
  • Ok, now that I think I understand Laplace Transforms and why they matter, I think I can get back to Calibrating Noise to Sensitivity in Private Data Analysis. Ok, kinda hit the wall on the math on this one. These aren’t formulas that I would be using at this point in the research. It’s nice to know that they’re here, and can probably help me determine the amount of noise that would be needed in calculating the biometric projection (which inherently removes information/adds noise).
  • Starting on Security-Control Methods for Statistical Databases: A Comparative Study
  • Article on useful AI chatbots. Sent SemanticMachines an email asking about their chatbot technology.
  • Got the name disambiguation working pretty well. Here’s the text:
    • – RateMDs Name Signup | Login Claim Doctor Profile | Claim Doctor Profile See what’s new! Account User Dashboard [[ doctor.name ]] Claim Doctor Profile Reports Admin Sales Admin: Doctor Logout Toggle navigation Menu Find A Doctor Find A Facility Health Library Health Blog Health Forum Doctors › Columbia › Family Doctor / G.P. › Unfollow Follow Share this Doctor: twitter facebook Dr. Robert S. Goodwin Family Doctor / G.P. 29 reviews #9 of 70 Family Doctors / G.P.s in Columbia, Maryland Male Dr Goodwin & Associates Unavailable View Map & ……………plus a lot more ………………..Hospitalizes Infant In Spain Wellness How Did Google Cardboard Save This baby’s life? Health 7 Amazing Stretches To Do On a Plane Follow Us You may also like Dr. Charles L. Crist Family Doctor / G.P. 24 reviews Top Family Doctors / G.P.s in Columbia, MD Dr. Mark V. Sivieri 21 reviews #1 of 70 Dr. Susan B. Brown Schoenfeld 8 reviews #2 of 70 Dr. Nj Udochi 4 reviews #3 of 70 Dr. Sarah L. Connor 4 reviews #4 of 70 Dr. Kisa S. Crosse 7 reviews #5 of 70 Sign up for our newsletter and get the latest health news and tips. Name Email Address Subscribe About RateMDs About Press Contact FAQ Advertise Privacy & Terms Claim Doctor Profile Top Specialties Family G.P. Gynecologist/OBGYN Dentist Orthopedics/Sports Cosmetic Surgeon Dermatologist View all specialties > Top Local Doctors New York Chicago Houston Los Angeles Boston Toronto Philadelphia Follow Us Facebook Twitter Google+ ©2004-2016 RateMDs Inc. – The original and largest doctor rating site.
    • Here’s the list of extracted people:
      PERSON: Robert S. Goodwin
      PERSON: Robert S. Goodwin
      PERSON: L. Crist
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: G
      PERSON: Robert S. Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Goodwin
      PERSON: Ajay Kumar
      PERSON: Charles L. Crist
      PERSON: Mark V. Sivieri
      PERSON: B. Brown Schoenfeld
      PERSON: L. Connor
      PERSON: S. Crosse
    • And here are some tests against that set (Information Distance; lower scores are better):
      Best match for Robert S. Goodwin is PERSON: Robert S. Goodwin (score = 0.0)
      Best match for Goodwin Robert S. is PERSON: Robert S. Goodwin (score = 0.0)
      Best match for Dr. Goodwin is PERSON: Robert S. Goodwin (score = 1.8)
      Best match for Bob Goodwin is PERSON: Robert S. Goodwin (score = 2.0)
      Best match for Rob Goodman is PERSON: Robert S. Goodwin (score = 2.6)
  • So I can cluster together similar (and misspelled) words, and SNLP hands me information about DATE, DURATION, PERSON, ORGANIZATION, LOCATION
  • Don’t know why I didn’t see this before – this is the page for the NER with associated papers. That’s about as close to a guide as I think you’ll find in this system.

Phil 1.21.16

7:00 – 4:00 VTX

  • Inverse Laplace examples
  • Dirac delta function
  • Useful link of the day: Firefox user agent strings
  • Design Overview presentation.
  • Working on (simple!) name disambiguation
    • Building word chains of sequential tokens that are entities (PERSON and ORGANIZATION) Done
    • Given a name, split by spaces and get best match on last name, then look ahead one or two words for best match on first name. If both sets are triples, then check the middle. Wound up iterating over all the elements looking for the best match. This does let things like reverse order work. Not sure if it’s best
    • Checks need to look for initials for first and middle name in source and target. Still working on this one.
    • Results (lower is better):
      ------------------------------
      Robert S. Goodwin
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: L. Crist score = 6.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: G score = 2.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Ajay Kumar score = 9.0
      PERSON: Charles L. Crist score = 13.0
      PERSON: Mark V. Sivieri score = 10.0
      PERSON: B. Brown Schoenfeld score = 13.0
      PERSON: L. Connor score = 6.0
      PERSON: S. Crosse score = 6.0
      
      ------------------------------
      Goodwin Robert S.
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: L. Crist score = 6.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: G score = 2.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Ajay Kumar score = 9.0
      PERSON: Charles L. Crist score = 13.0
      PERSON: Mark V. Sivieri score = 10.0
      PERSON: B. Brown Schoenfeld score = 13.0
      PERSON: L. Connor score = 6.0
      PERSON: S. Crosse score = 6.0
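
A minimal sketch of the token-wise matching described above, not the code that produced these scores. The class and method names are hypothetical, the distance is plain Levenshtein, and the initials rule is an assumption about how the unfinished initials check might work. It will not reproduce the numbers in the listings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Order-independent name scoring sketch; lower scores are better. */
final class NameMatcher {

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    /** Lowercases, splits on whitespace, and keeps only the letters a-z. */
    static List<String> tokens(String name) {
        List<String> out = new ArrayList<>();
        for (String t : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            String cleaned = t.replaceAll("[^a-z]", "");
            if (!cleaned.isEmpty()) out.add(cleaned);
        }
        return out;
    }

    /** Assumption: a one-letter token matches any token with the same first letter. */
    static int tokenDistance(String a, String b) {
        boolean initial = (a.length() == 1 || b.length() == 1) && a.charAt(0) == b.charAt(0);
        return initial ? 0 : editDistance(a, b);
    }

    /** Sums, for each query token, the distance to its closest candidate token. */
    static int score(String query, String candidate) {
        List<String> cand = tokens(candidate);
        if (cand.isEmpty()) return Integer.MAX_VALUE;
        int total = 0;
        for (String q : tokens(query)) {
            int best = Integer.MAX_VALUE;
            for (String c : cand) best = Math.min(best, tokenDistance(q, c));
            total += best;
        }
        return total;
    }

    /** Returns the lowest-scoring candidate (first wins on ties), or null if none. */
    static String bestMatch(String query, List<String> candidates) {
        String best = null;
        int bestScore = Integer.MAX_VALUE;
        for (String c : candidates) {
            int s = score(query, c);
            if (best == null || s < bestScore) { best = c; bestScore = s; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> people = Arrays.asList("Ajay Kumar", "Robert S. Goodwin", "Charles L. Crist");
        System.out.println(bestMatch("Goodwin Robert S.", people));
    }
}
```

Because every query token is compared against every candidate token, reversed names such as ‘Goodwin Robert S.’ score the same as the original order.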

Phil 1.20.16

7:00 – 5:30 VTX

 

Phil 1.18.16

7:00 – 4:00 VTX

  • Started Calibrating Noise to Sensitivity in Private Data Analysis.
    • In TAJ, I think the data source (what’s been typed into the browser) may need to be perturbed before it gets to the server in a way that someone looking at the text can’t figure out who wrote it. The trick here is to create a mapping function that can recognize but not reconstruct. My intuition is that this would resemble a noisy mapping function (Which is why this paper is in the list). Think of a 3D shape. It can cast a shadow that can be recognizable, and with no other information, could not be used to reconstruct the 3D shape. However, multiple samples over time as the shape rotates could be used to reconstruct the shape. To get around that, either the original 3D or the derived 2D shape might have to have noise introduced in some way.
    • And reading the paper means that I have to brush up on Laplace Transforms. Hello, Khan Academy….
  • Next step is getting the dictionary to produce networks. Time to drill down more into the Stanford NLP. Looking at the paper and the book to begin with. Chapter 18 looks to be particularly useful. Also downloaded all of 3.6 for reference. It contains the Stanford typed dependencies manual, which is also looking useful (but impossible to use without this guide to the Penn Treebank tags). There don’t seem to be any tutorials to speak of. Interestingly, the Cognitive Computation Group at Urbana has similar research and better documentation (example), including Medical NLP Packages. Fallback?
  • Checking through the documentation, and both lemmas (edu.stanford.nlp.process.Morphology) and edit distance (edu.stanford.nlp.util.EditDistance) appear to be supported in a straightforward way.
  • Getting an Exception in thread “main” java.lang.RuntimeException: edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model.
  • Which seems to be caused by: Unable to resolve “edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger” as either class path, filename or URL
  • Which is not in the code that I downloaded. Making a full download from Github. Huh. Not there either.
  • Ah! It’s in the stanford-corenlp-xxx-models.jar.
  • Ok, everything works. It’s installed from the Maven Repo, so it’s version 3.5.2, except for the models, which are 3.6, which are contained in the download mentioned above. I also pulled out the models directory, since some of the examples want to use some files explicitly. Anyway, I’m not sure what all the pieces do, but I can start playing with parts.
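
Going back to the noise question from the Calibrating Noise paper at the top of this entry: the central mechanism in that paper adds noise drawn from a Laplace distribution whose scale is the query’s sensitivity divided by a privacy parameter epsilon. The sketch below shows only that one step. The class and method names are hypothetical, and how sensitivity would be defined for the TAJ text mapping is left open.

```java
import java.util.Random;

/** Sketch of Laplace noise calibrated to sensitivity; names are hypothetical. */
final class LaplaceNoise {

    /** Draws one sample from a zero-mean Laplace distribution via inverse CDF. */
    static double sample(double scale, Random rng) {
        double u;
        do {
            u = rng.nextDouble() - 0.5;   // uniform in [-0.5, 0.5)
        } while (u == -0.5);              // avoid log(0) at the boundary
        return -scale * Math.signum(u) * Math.log(1.0 - 2.0 * Math.abs(u));
    }

    /** Returns trueValue plus Laplace noise with scale sensitivity / epsilon. */
    static double privatize(double trueValue, double sensitivity, double epsilon, Random rng) {
        if (sensitivity < 0 || epsilon <= 0) {
            throw new IllegalArgumentException("need sensitivity >= 0 and epsilon > 0");
        }
        return trueValue + sample(sensitivity / epsilon, rng);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // A count query changes by at most 1 when one person is added or removed.
        System.out.println(privatize(42.0, 1.0, 0.5, rng));
    }
}
```

A smaller epsilon means a larger scale and therefore more noise, which is the trade-off between how recognizable and how reconstructable the ‘shadow’ is.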

Phil 1.15.16

7:00 – 4:00 VTX

  • Finished Communication Power and Counter-power in the Network Society
  • Started The Future of Journalism: Networked Journalism
  • Here’s a good example of a page with a lot of outbound links, videos and linked images. It’s about the Tunisia uprising before it got real traction. So can we now vet it as a trustworthy source? Is this a good pattern? The post is by Ethan Zuckerman. He directs the Center for Civic Media at MIT, among other things.
  • Public Insight Network: “Every day, sources in the Public Insight Network add context, depth, humanity and relevance to news stories at trusted newsrooms around the country.”
  • Hey, my computer wasn’t restarted last night. Picking up JPA at Queries and Uncommitted Changes.
  • Updating all the nodes as objects:
    //@NamedQuery(name = "BaseNode.getAll", query = "SELECT bn FROM base_nodes bn")
    TypedQuery<BaseNode> getNodes = em.createNamedQuery("BaseNode.getAll", BaseNode.class);
    List<BaseNode> nodeList = getNodes.getResultList();
    Date date = new Date();
    em.getTransaction().begin();
    for(BaseNode bn : nodeList){
        bn.setLastAccessedOn(date);
        bn.setAccessCount(bn.getAccessCount()+1);
        em.persist(bn);
    }
    em.getTransaction().commit();
  • Updating all nodes with a JPQL call:
    //@NamedQuery(name = "BaseNode.touchAll", query = "UPDATE base_nodes bn set bn.accessCount = (bn.accessCount+1), bn.lastAccessedOn = :lastAccessed")
    em.getTransaction().begin();
    // An UPDATE returns no entities, so use an untyped Query; some JPA providers reject a TypedQuery here.
    Query touchAllQuery = em.createNamedQuery("BaseNode.touchAll");
    touchAllQuery.setParameter("lastAccessed", new Date());
    touchAllQuery.executeUpdate();
    em.getTransaction().commit();
  • And we can even add in query logic. This updates the accessed date and increments the accessed count if it’s not null:
    @NamedQuery(name = "BaseNode.touchAll", query = "UPDATE base_nodes bn " +
            "set bn.accessCount = (bn.accessCount+1), " +
            "bn.lastAccessedOn = :lastAccessed " +
            "where NOT (bn.accessCount IS NULL )")