Phil 2.23.16

7:00 – 3:30VTX

  • Much-needed vacation is now history. I started Information Rules – A Strategic Guide to the Network Economy. Probably not going to read the whole book, but it does address the economics issues I’m thinking about, though with more of a focus on financial transactions. For example, it discusses how people place different values on information, which makes me think about the differences between Sweden, Egypt and Turkey, as well as Crisis Informatics in general.
    • From my notes: This is why there is no blogging in Sweden. Since the reporting is good enough for most people, the only reason people blog is to write about things that aren’t covered in the news – personal expression or similar arcana. Where news is not available, the value of this kind of information goes up, and people who respond to the perceived need step in to fill the gap. This is important – an individual can have an information need, but also a perception of information needs in others, and a need/desire to provide for that need.
  • And based on one of Paul Krugman’s blog entries, I went and found the Wikipedia entry on information economics, which looks like it will be worth looking at. This part in particular leaped out at me: The subject of “information economics” is treated under Journal of Economic Literature classification code JEL D8 – Information, Knowledge, and Uncertainty. The present article reflects topics included in that code. There are several subfields of information economics. Information as signal has been described as a kind of negative measure of uncertainty.[2] It includes complete and scientific knowledge as special cases. The first insights in information economics related to the economics of information goods.
  • Submitting paperwork for CHIR
  • And, back to normal… Continue to refine the rating app?
    • Make uploading a super user thing. Which means user accounts and passwords. Probably add everyone to a DB and just let them put in/change passwords.
    • Add code to scan the DB for previous pages that had the same rating for the same doctor (and the same term?)
    • Add an analytics app that looks for ratings that disagree, either as outliers (watch out for that reviewer) or there is disagreement (are we having problems with terms, fuzzy matching, or what?)
    • Add a second app that tags the ontology onto the ‘Flaggable Match’
    • Write up a guidance manual for edge conditions. Comes up when you click ‘help’
    • Add a ‘total MATCH’ search that shows how many relevant documents were returned
    • Add a ‘total NO MATCH’ search that shows how many non-relevant documents were returned – basically:
      select search_type, count(*) as matches, total_results from view_rated_items2 where rating NOT LIKE '%match%' group by search_type;
    • Add a blacklist query that lists all root domains that only show up in non-match results
    • Incorporate Flywaydb
      • Verified that I can generate just the table structure with mysqldump: mysqldump -u xxx -pyyy -d googlecse1 > gcse1Tables.sql
    • Get DB deployed somewhere and validate – talk to Damien and specify what’s needed. He’ll cost out hours. Done
    • Build a web repo that contains gold standard data that we can point a special test GoogleCSE at, so we can keep track of changes in what gets returned.
    • Machine Learning framework
      • Get back up to speed on WEKA
      • Probably have to write some Java data translator/generator code
      • Run some tests, get some results in the interactive mode
      • Redo programmatically, so it runs over a collection of URLs (text? Yeah, extracted text. Compare Stanford and Alchemy?)
      • Data flow:
        • Raw pages
        • Cleaned content
        • Machine learning (per provider?) returns scored pages
        • Extraction of flags from highly-ranked pages
  • Took all of the above and rolled it into stories. For points I built an Excel spreadsheet. Turns out that Excel doesn’t have Fibonacci, so I used this version.
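
The story-point scale can also be generated outside Excel. The following is only a minimal Python sketch, not the spreadsheet formula referred to above. The function names fibonacci_scale and round_up_to_scale are made up for illustration, and it assumes the common 1, 2, 3, 5, 8, 13 planning scale.

```python
def fibonacci_scale(count):
    """Return the first `count` distinct Fibonacci values, starting 1, 2, 3, 5..."""
    scale = []
    a, b = 1, 2
    while len(scale) < count:
        scale.append(a)
        a, b = b, a + b
    return scale


def round_up_to_scale(estimate, scale):
    """Snap a raw estimate to the smallest scale value that is >= estimate.

    Estimates larger than the whole scale are capped at the largest value.
    """
    for points in scale:
        if points >= estimate:
            return points
    return scale[-1]


if __name__ == "__main__":
    scale = fibonacci_scale(6)
    print(scale)
    print(round_up_to_scale(4, scale))
```

Rounding up rather than to the nearest value is a design choice here: it keeps an uncertain story from being under-pointed.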
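
The blacklist query from the rating-app list (root domains that only ever show up in non-match results) could look something like the sketch below. It uses Python's built-in sqlite3 with an in-memory table so that it runs on its own. The table name rated_items, the root_domain column, and the sample ratings are all assumptions for illustration, not the real googlecse1 schema. It follows the same LIKE '%match%' convention as the NO MATCH query in the list.

```python
import sqlite3

# Hypothetical schema: one row per rated result, with the root domain
# already extracted from the URL. A domain is a blacklist candidate only
# if none of its rows carries a rating containing "match".
BLACKLIST_SQL = """
SELECT root_domain
FROM rated_items
GROUP BY root_domain
HAVING SUM(CASE WHEN rating LIKE '%match%' THEN 1 ELSE 0 END) = 0
ORDER BY root_domain
"""


def blacklist_candidates(conn):
    """Return root domains that never appear in a match-rated result."""
    return [row[0] for row in conn.execute(BLACKLIST_SQL)]


def build_demo_db():
    """Create an in-memory database with made-up sample ratings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rated_items (root_domain TEXT, rating TEXT)")
    conn.executemany(
        "INSERT INTO rated_items VALUES (?, ?)",
        [
            ("example.org", "Flaggable Match"),
            ("example.org", "Not relevant"),
            ("example.net", "Not relevant"),
            ("example.com", "Not relevant"),
            ("example.com", "Not relevant"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(blacklist_candidates(build_demo_db()))
```

Grouping by domain and counting match rows in the HAVING clause means a domain with even one match-rated page is left off the list, which is the "only show up in non-match results" condition.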