7:00 – 3:30 VTX
- Continuing The Hybrid Representation Model for Web Document Classification.
- Finished. That one’s a keeper.
- Based on a discussion with a retired cop (Deputy Sheriff of Worcester County; criminal, narcotics, etc.) who mentioned the Reid technique for evaluating truth-telling, I thought it might be a good idea to look for an overview of the field (snowball methods!). So I’m starting Eliciting Information and Detecting Lies in Intelligence Interviewing: An Overview of Recent Research. Both authors, Pär Anders Granhag and Aldert Vrij, have additional publications on credibility. There are quite a few papers on cognitive load, so that would be an interesting piece to incorporate into the interface…
- Mothballing the Ontology to Dictionary work for a bit
- Stanford Entity Resolution Framework (SERF)
- Learning-based Entity Resolution with MapReduce
- Palantir Gotham
- IBM Infosphere
- A taxonomy of tools that support the fluent and flexible use of visualizations
- Modern Information Retrieval: A Brief Overview (by Google, 2001; describes how all the pieces work)
- Starting on White Paper
- Definitions
- Precision – the fraction of retrieved instances that are relevant
- We can measure this. In the top N results from our test query, how many were useful?
- Recall – the fraction of relevant instances that are retrieved. We can’t measure this against live Google results, but we could with a static repository like CommonCrawl. (A quick sketch of both measures follows these definitions.)
- Rank – the ordering of the returned results, determined by some algorithm (e.g. the eigenvector computed by PageRank; a power-iteration sketch follows below)
- Entity Resolution
- Precision – in the entity-resolution setting, the fraction of proposed entity matches that are correct
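To make the first two measures concrete, here is a minimal Python sketch of precision@N and recall, assuming we have a hand-labeled gold-standard set of relevant documents for a single test query. All names and data are placeholders of my own, not pulled from any existing codebase.

```python
# Precision@N and recall for one test query, given a hand-labeled gold set.

def precision_at_n(retrieved, relevant, n):
    """Fraction of the top-N retrieved documents that are relevant."""
    top_n = retrieved[:n]
    if not top_n:
        return 0.0
    return sum(1 for doc in top_n if doc in relevant) / len(top_n)

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved.
    Only computable when the full relevant set is known, e.g. against a
    static repository like CommonCrawl rather than live Google results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in relevant if doc in retrieved) / len(relevant)

if __name__ == "__main__":
    retrieved = ["d1", "d7", "d3", "d9", "d2"]     # ranked results from a test query
    relevant = {"d1", "d2", "d5"}                  # hand-labeled gold standard
    print(precision_at_n(retrieved, relevant, 5))  # 0.4   (d1 and d2 in the top 5)
    print(recall(retrieved, relevant))             # ~0.67 (d1 and d2 of d1, d2, d5)
```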
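And since the eigenvector from PageRank comes up under Rank: the rank vector is the principal eigenvector of the damped link matrix, usually found by power iteration. Below is a bare-bones sketch of that iteration; the toy link graph is made up purely for illustration.

```python
# Bare-bones PageRank by power iteration (the eigenvector mentioned above).

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}  # teleport term
        for page, outlinks in links.items():
            if not outlinks:                 # dangling page: spread rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(toy_web))   # C ends up with the highest rank in this toy graph
```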
- Previous Work
- Research
- Other Systems
- The problems as I see them (a rough pipeline sketch follows this list)
- Finding the Corpus to search for entities (best signal-to-noise)
- Finding correct entities within the corpus
- Finding information that corresponds to Flags
- Associating Flags with Entities
- Ordering
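Here is the rough pipeline sketch mentioned above: a skeleton of how these problem areas might chain together, from corpus selection through ordering. Every function, name, and data shape is an invented stub for illustration, not part of the current system.

```python
# Skeleton of the corpus -> entities -> flags -> association -> ordering flow.
# All functions are stubs with invented names and toy logic.

def select_corpus(query):
    """Pick the document source with the best signal-to-noise for the query."""
    return [f"doc about {query}"]                     # placeholder corpus

def extract_entities(corpus):
    """Find candidate entity mentions within the corpus."""
    return [{"name": "Acme Corp", "source": doc} for doc in corpus]

def find_flag_evidence(corpus, flags):
    """Find passages that correspond to each Flag of interest."""
    return {flag: [doc for doc in corpus if flag in doc] for flag in flags}

def associate_flags(entities, flag_evidence):
    """Attach Flag evidence to the entities whose source documents carry it."""
    for entity in entities:
        entity["flags"] = [flag for flag, docs in flag_evidence.items()
                           if entity["source"] in docs]
    return entities

def order_results(entities):
    """Order entities by how much Flag evidence was associated with them."""
    return sorted(entities, key=lambda e: len(e["flags"]), reverse=True)

if __name__ == "__main__":
    corpus = select_corpus("fraud")
    entities = extract_entities(corpus)
    evidence = find_flag_evidence(corpus, flags=["fraud", "sanctions"])
    print(order_results(associate_flags(entities, evidence)))
```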
- The current model
- No baseline data currently exists
- Building ‘Gold Standard’ data to aid in production. Also, here’s a Google video showing how they use human raters to build ‘gold standard’ data to evaluate information retrieval quality: https://www.youtube.com/watch?v=nmo3z8pHX1E (a sketch of aggregating rater judgments into such labels follows the Mechanical Turk item below)
- Improving the current model
- Mechanical Turk
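As a sketch of how redundant rater judgments (whether from trained raters like Google’s or from Mechanical Turk workers) could be collapsed into gold-standard relevance labels, here is a simple majority-vote aggregator. The data layout, label strings, and minimum-rater threshold are all assumptions of mine for illustration.

```python
# Collapse per-(query, document) rater judgments into gold-standard labels.
from collections import Counter

def majority_label(judgments, min_raters=3):
    """Return the strict-majority label for one (query, doc) pair,
    or None if too few raters saw it or no label wins outright."""
    if len(judgments) < min_raters:
        return None
    label, votes = Counter(judgments).most_common(1)[0]
    return label if votes > len(judgments) / 2 else None

def build_gold_standard(raw_judgments):
    """raw_judgments: dict of (query, doc_id) -> list of 'relevant'/'not relevant'."""
    gold = {}
    for pair, judgments in raw_judgments.items():
        label = majority_label(judgments)
        if label is not None:                # drop ambiguous or under-rated pairs
            gold[pair] = label
    return gold

if __name__ == "__main__":
    raw = {
        ("acme fraud", "d1"): ["relevant", "relevant", "not relevant"],
        ("acme fraud", "d2"): ["relevant", "not relevant"],            # too few raters
        ("acme fraud", "d3"): ["not relevant", "not relevant", "relevant"],
    }
    print(build_gold_standard(raw))
    # {('acme fraud', 'd1'): 'relevant', ('acme fraud', 'd3'): 'not relevant'}
```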
- Alternate models
- Finding the Corpus to search for entities (best signal-to-noise)
- Finding correct entities within the corpus
- Finding information that corresponds to Flags
- Associating Flags with Entities
- Conclusions and Recommendations
