9:00 – 4:00 VTX
- Seminar today from 10:00 – 12:00. Need to send Coleen an email asking how to charge that. Done.
- Need to write a brief description of each coding term. How convenient! Atlas lets you edit a code’s description, and then the code’s icon changes. I am liking this package…
- Back to reading about interrogation. Done. The section on the Scharff technique isn’t directly related to what I’m doing, but it was still interesting.
- Adding the Armed Conflict Location & Event Data Project (ACLED) User’s Manual. A nice example of good coding and definitions, I think. It also pointed to On the Accuracy of Media-based Conflict Event Data, which looks like a must-read.
- OK, let’s get back to better searches.
- Looked at Common Crawl and the Common Crawl index some more. I’m worried that it misses smaller targets: philfeldman.com doesn’t show up, and that’s been up for years. We’ll come back to that later if I can’t make Google play nicer.
- Playing with the Google search API(s)
- This lovely example (from Google, no less) seems to provide everything you need in JSON, even without a key…
- Their example: https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Paris%20Hilton&userip=USERS-IP-ADDRESS
- A version that excludes all .com sites and irs.gov: https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=1040+-site%3A*.com+-site%3Airs.gov
- Wow, you can get backlinks
- Many options, including start and num. num doesn’t seem to work with the JSON version, but start does (it’s zero-based). So you seem to be limited to four results at a time?
- Same query starting at the 20th result
- Looks like a complete list of operators
- So now I’m going to try getting better provider queries
- https://www.google.com/search?safe=off&q=ramh+singh+malpractice+-site%3A+healthgrades.com
- This kinda works, but it seems to exclude a lot more than I was expecting: healthgrades is gone, but so are a bunch of other sites, like doctorwiki.com.
- Regions work though: https://www.google.com/search?q=%22ram+singh%22+malpractice+site%3A.org&cr=countryUS
- When Aaron is in tomorrow, I’ll ask him how the CSE/JSON integration works and where to get IDs. I got one from Google, but it sure doesn’t look right.
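A minimal sketch of building the query URLs above in code rather than by hand, in Python. The endpoints and the v, q, start and cr parameters are copied from the example URLs in these notes; the helper names (build_query, ajax_search_url, page_urls, google_search_url) are hypothetical, and the page size of 4 is just what I observed, not a documented value. The helpers always write an operator and its argument together (-site:irs.gov, as in the .com/irs.gov example), whereas the healthgrades query has a space, encoded as +, after site:.

```python
from urllib.parse import urlencode

# Endpoints copied from the URLs in these notes; the AJAX one may not stay live.
AJAX_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/web"
WEB_ENDPOINT = "https://www.google.com/search"


def build_query(terms, phrase=None, site=None, exclude_sites=()):
    """Join search terms and operators into one query string (hypothetical helper).

    Each operator is glued to its argument ("-site:irs.gov") with no space.
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')  # exact-phrase match
    parts.extend(terms)
    if site:
        parts.append(f"site:{site}")  # restrict results to a site or TLD
    parts.extend(f"-site:{s}" for s in exclude_sites)  # drop these sites
    return " ".join(parts)


def ajax_search_url(query, start=0):
    """URL for one page of AJAX JSON results; start is zero-based."""
    params = {"v": "1.0", "q": query, "start": start}
    return f"{AJAX_ENDPOINT}?{urlencode(params)}"


def page_urls(query, pages, page_size=4):
    """URLs for consecutive pages, assuming page_size results per response."""
    return [ajax_search_url(query, start=i * page_size) for i in range(pages)]


def google_search_url(query, **extra):
    """Plain web-search URL; extra parameters such as cr="countryUS" are appended."""
    params = {"q": query, **extra}
    return f"{WEB_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    q = build_query(["1040"], exclude_sites=["*.com", "irs.gov"])
    for url in page_urls(q, pages=3):
        print(url)  # start=0, 4, 8
```

For example, google_search_url(build_query(["malpractice"], phrase="ram singh", site=".org"), cr="countryUS") reproduces the region query above, character for character.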
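To double-check whether philfeldman.com really is missing from Common Crawl, the index can be queried per domain. This sketch assumes the CDX-style index server at index.commoncrawl.org, which accepts url and output=json parameters and returns one JSON record per line with a "url" field; the crawl ID is only an example (each crawl has its own index), and cc_index_url and captured_urls are hypothetical names. It only builds the query and parses a successful response; fetching is left to whatever HTTP client is handy.

```python
import json
from urllib.parse import urlencode

# Example crawl ID only (an assumption): each Common Crawl release has its own index.
CRAWL_ID = "CC-MAIN-2016-18"


def cc_index_url(domain, crawl_id=CRAWL_ID):
    """Build an index query for every captured URL under a domain (prefix match via /*)."""
    params = {"url": f"{domain}/*", "output": "json"}
    return f"https://index.commoncrawl.org/{crawl_id}-index?{urlencode(params)}"


def captured_urls(response_text):
    """Parse newline-delimited JSON records into a list of captured URLs."""
    urls = []
    for line in response_text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            urls.append(json.loads(line)["url"])
    return urls


if __name__ == "__main__":
    print(cc_index_url("philfeldman.com"))
```

Running the query against a few different crawl IDs would show whether the site is missing from one crawl or from all of them.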
