Phil 2.4.16

7:00 – 4:00 VTX

  • Is the way to handle multidimensional (human) ranking of documents (i.e. web pages) to take the dimensions and the web pages and put them in a matrix? Each page has a greater or lesser score on each dimension. Then apply PageRank. Tweak the weights until the pages order the way we think they should.
  • Does “authority” mean quality? Predicting expert quality ratings of Web documents
  • LandScan (Oak Ridge Labs)
  • Uppsala Conflict Data Program Geo-referenced Event Dataset
  • Nils Weidmann Dataverse (University of Konstanz)
  • Continuing On the Accuracy of Media-based Conflict Event Data. Done. Wow. And look at all the databases ^^^ !
  • Microsoft bot API
  • Back to GoogleHacking
    • Added ‘CredEngine1’ as BASELINE search engine
    • Looks like we blew through our limits. Using my key. Verified that the BASELINE search runs. That does mean that the current 4 queries factor out to 24 searches (6 search engines * 4 queries)
    • Building search persistent object
    • Building result item object. Actually, building a JasonLoadable base class since this trick is going to be used for the query items and info object
    • Need a result info object that stores the meta information.
    • Just stumbled across a GCS twitter search. Neat.
    • Hitting the CSE and getting results. Tomorrow I’ll finish off the classes that will persist the search results. I’ve got a buffered search result to use instead of hitting Google, although it will still need to pull down the document referenced in the result. I wonder how Jsoup handles PDF and Word documents?
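
A rough sketch of the matrix-plus-PageRank idea from the first bullet, in Python. This is only one possible reading of the note, not a settled method: pages and dimensions become nodes in a two-sided graph, each page-to-dimension score becomes an edge weight, and the tunable dimension weights enter as the teleport (personalization) vector. The function name `rank_pages`, the page names, and the dimension names are all made up for illustration.

```python
def rank_pages(scores, weights, damping=0.85, iterations=100):
    """Order pages by a personalized PageRank over a page/dimension graph.

    scores:  {page: {dimension: score}}, the page-by-dimension matrix
    weights: {dimension: weight}, the knobs to tweak (assumed positive)
    Page names and dimension names are assumed not to collide.
    """
    pages = sorted(scores)
    dims = sorted(weights)
    nodes = pages + dims

    # Each nonzero matrix cell becomes a weighted edge in both directions.
    out = {n: {} for n in nodes}
    for p in pages:
        for d, s in scores[p].items():
            if s > 0:
                out[p][d] = s
                out[d][p] = s

    # Random jumps land on dimensions in proportion to their weight.
    total_weight = sum(weights.values())
    teleport = {n: 0.0 for n in nodes}
    for d in dims:
        teleport[d] = weights[d] / total_weight

    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            edge_total = sum(out[n].values())
            if edge_total == 0:
                # Node with no edges: hand its rank back to the teleport targets.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
                continue
            for m, w in out[n].items():
                nxt[m] += damping * rank[n] * w / edge_total
        rank = nxt

    return sorted(pages, key=lambda p: rank[p], reverse=True)


# Hypothetical two-page, two-dimension matrix.
example_scores = {
    "pageA": {"accuracy": 0.9, "freshness": 0.1},
    "pageB": {"accuracy": 0.1, "freshness": 0.9},
}
favor_accuracy = rank_pages(example_scores, {"accuracy": 0.9, "freshness": 0.1})
favor_freshness = rank_pages(example_scores, {"accuracy": 0.1, "freshness": 0.9})
```

Changing the dimension weights flips the order of the two pages, which is the "tweak weights until pages order the way we think they should" loop in miniature.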
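
A minimal Python sketch of the persistence objects described in the GoogleHacking bullets, plus the engine-by-query fan-out. It is an illustration under assumptions, not the actual classes: the notes call the base class JasonLoadable, the sketch spells it `JsonLoadable`, and the field names (`title`, `link`, `snippet`, `engine`, `query`, `total_results`) and every engine name other than ‘CredEngine1’ are placeholders.

```python
import json
from dataclasses import asdict, dataclass, fields
from itertools import product


@dataclass
class JsonLoadable:
    """Base class: build an object from JSON, keeping only known fields."""

    @classmethod
    def from_dict(cls, data):
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @classmethod
    def from_json(cls, text):
        return cls.from_dict(json.loads(text))

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


@dataclass
class ResultItem(JsonLoadable):
    """One search hit (field names are assumptions)."""
    title: str = ""
    link: str = ""
    snippet: str = ""


@dataclass
class ResultInfo(JsonLoadable):
    """Meta information about one search run (field names are assumptions)."""
    engine: str = ""
    query: str = ""
    total_results: int = 0


def plan_searches(engines, queries):
    """Every (engine, query) pair that has to be run."""
    return list(product(engines, queries))


# Placeholder names: only 'CredEngine1' comes from the notes.
engines = ["CredEngine1", "engine2", "engine3", "engine4", "engine5", "engine6"]
queries = ["query1", "query2", "query3", "query4"]
planned = plan_searches(engines, queries)

# A buffered result can stand in for a live call while the classes are built.
buffered = '{"title": "Example", "link": "http://example.com", "snippet": "text", "extra": 1}'
item = ResultItem.from_json(buffered)
```

With six engines and four queries, `planned` holds the 24 searches mentioned above. Unknown keys such as `extra` are dropped on load, so the same base class can back the query items and the info object without each subclass writing its own parser.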
