7:00 – 3:30 VTX
- Continuing The Hybrid Representation Model for Web Document Classification.
- Finished. That one’s a keeper.
- Based on a discussion with a retired cop (Deputy Sheriff of Worcester County; criminal, narcotics, etc.) who mentioned the Reid technique for evaluating truth-telling, I thought it might be a good idea to look for an overview of the field (snowball methods!). So I’m starting Eliciting Information and Detecting Lies in Intelligence Interviewing: An Overview of Recent Research. Both authors, Pär Anders Granhag and Aldert Vrij, have additional publications on credibility. There are quite a few papers on cognitive load, so that would be an interesting piece to incorporate into the interface…
- Mothballing the Ontology to Dictionary work for a bit
- Stanford Entity Resolution Framework (SERF)
- Learning-based Entity Resolution with MapReduce
- Palantir Gotham
- IBM Infosphere
- A taxonomy of tools that support the fluent and flexible use of visualizations
- Modern Information Retrieval: A Brief Overview (by Google, 2001; describes how all the pieces work)
- Starting on White Paper
- Definitions
- Precision – the fraction of retrieved instances that are relevant
- We can measure this. In the top N results from our test query, how many were useful?
- Recall – the fraction of relevant instances that are retrieved. We can’t measure this against live Google results, but we could with a static repository like CommonCrawl. (A quick sketch of both measures follows these definitions.)
- Rank – the ordering of the returned results, determined by some algorithm (e.g. the eigenvector computed by PageRank; a power-iteration sketch follows below)
- Entity Resolution
- Precision – in the entity-resolution setting, the fraction of proposed entity matches that are correct
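To make the first two measures concrete, here is a minimal Python sketch of precision@N and recall, assuming we have a hand-labeled gold-standard set of relevant documents for a single test query. All names and data are placeholders of my own, not pulled from any existing codebase.

```python
# Precision@N and recall for one test query, given a hand-labeled gold set.

def precision_at_n(retrieved, relevant, n):
    """Fraction of the top-N retrieved documents that are relevant."""
    top_n = retrieved[:n]
    if not top_n:
        return 0.0
    return sum(1 for doc in top_n if doc in relevant) / len(top_n)

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved.
    Only computable when the full relevant set is known, e.g. against a
    static repository like CommonCrawl rather than live Google results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in relevant if doc in retrieved) / len(relevant)

if __name__ == "__main__":
    retrieved = ["d1", "d7", "d3", "d9", "d2"]     # ranked results from a test query
    relevant = {"d1", "d2", "d5"}                  # hand-labeled gold standard
    print(precision_at_n(retrieved, relevant, 5))  # 0.4   (d1 and d2 in the top 5)
    print(recall(retrieved, relevant))             # ~0.67 (d1 and d2 of d1, d2, d5)
```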
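And since the eigenvector from PageRank comes up under Rank: the rank vector is the principal eigenvector of the damped link matrix, usually found by power iteration. Below is a bare-bones sketch of that iteration; the toy link graph is made up purely for illustration.

```python
# Bare-bones PageRank by power iteration (the eigenvector mentioned above).

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # teleport term
        for page, outlinks in links.items():
            if not outlinks:                 # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(toy_web))   # C ends up with the highest rank in this toy graph
```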
- Previous Work
- Research
- Other Systems
- The problems as I see them (a rough pipeline sketch follows this list)
- Finding the Corpus to search for entities (best signal-to-noise)
- Finding correct entities within the corpus
- Finding information that corresponds to Flags
- Associating Flags with Entities
- Ordering
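Here is the rough pipeline sketch mentioned above: a skeleton of how these problem areas might chain together, from corpus selection through ordering. Every function, name, and data shape is an invented stub for illustration, not part of the current system.

```python
# Skeleton of the corpus -> entities -> flags -> association -> ordering flow.
# All functions are stubs with invented names and toy logic.

def select_corpus(query):
    """Pick the document source with the best signal-to-noise for the query."""
    return [f"doc about {query}"]                     # placeholder corpus

def extract_entities(corpus):
    """Find candidate entity mentions within the corpus."""
    return [{"name": "Acme Corp", "source": doc} for doc in corpus]

def find_flag_evidence(corpus, flags):
    """Find passages that correspond to each Flag of interest."""
    return {flag: [doc for doc in corpus if flag in doc] for flag in flags}

def associate_flags(entities, flag_evidence):
    """Attach Flag evidence to the entities whose source documents carry it."""
    for entity in entities:
        entity["flags"] = [flag for flag, docs in flag_evidence.items()
                           if entity["source"] in docs]
    return entities

def order_results(entities):
    """Order entities by how much Flag evidence was associated with them."""
    return sorted(entities, key=lambda e: len(e["flags"]), reverse=True)

if __name__ == "__main__":
    corpus = select_corpus("fraud")
    entities = extract_entities(corpus)
    evidence = find_flag_evidence(corpus, flags=["fraud", "sanctions"])
    print(order_results(associate_flags(entities, evidence)))
```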
- The current model
- No baseline data currently exists
- Building ‘Gold Standard’ data to aid in production. Also, here’s a Google video showing how they use human raters to build ‘gold standard’ data to evaluate information retrieval quality: https://www.youtube.com/watch?v=nmo3z8pHX1E (a sketch of aggregating rater judgments into such labels follows the Mechanical Turk item below)
- Improving the current model
- Mechanical Turk
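As a sketch of how redundant rater judgments (whether from trained raters like Google’s or from Mechanical Turk workers) could be collapsed into gold-standard relevance labels, here is a simple majority-vote aggregator. The data layout, label strings, and minimum-rater threshold are all assumptions of mine for illustration.

```python
# Collapse per-(query, document) rater judgments into gold-standard labels.
from collections import Counter

def majority_label(judgments, min_raters=3):
    """Return the strict-majority label for one (query, doc) pair,
    or None if too few raters saw it or no label wins outright."""
    if len(judgments) < min_raters:
        return None
    label, votes = Counter(judgments).most_common(1)[0]
    return label if votes > len(judgments) / 2 else None

def build_gold_standard(raw_judgments):
    """raw_judgments: dict of (query, doc_id) -> list of 'relevant'/'not relevant'."""
    gold = {}
    for pair, judgments in raw_judgments.items():
        label = majority_label(judgments)
        if label is not None:                # drop ambiguous or under-rated pairs
            gold[pair] = label
    return gold

if __name__ == "__main__":
    raw = {
        ("acme fraud", "d1"): ["relevant", "relevant", "not relevant"],
        ("acme fraud", "d2"): ["relevant", "not relevant"],            # too few raters
        ("acme fraud", "d3"): ["not relevant", "not relevant", "relevant"],
    }
    print(build_gold_standard(raw))
    # {('acme fraud', 'd1'): 'relevant', ('acme fraud', 'd3'): 'not relevant'}
```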
- Alternate models
- Finding the Corpus to search for entities (best signal-to-noise)
- Finding correct entities within the corpus
- Finding information that corresponds to Flags
- Associating Flags with Entities
- Conclusions and Recommendations
