7:00 – 5:00 VTX
- Starting to list strawman hypotheses
- Reading Connectivism paper. Very good so far.
- Albert-László Barabási – publications Google Scholar Profile
- LexRank: graph-based lexical centrality as salience in text
- Talked to Thresea about the human rating app/results and sent her this article on Schema.org
- Add doctor disambiguation popup – done
- Add a ‘total results’ search that shows how many relevant documents exist.
MariaDB [googlecse1]> select distinct search_type, total_results from query_object where total_results > 0 order by total_results desc;
+------------------------------------------------+---------------+
| search_type                                    | total_results |
+------------------------------------------------+---------------+
| RESTRICTED_COM(Ram Singh: board actions)       |         12600 |
| RESTRICTED_COM(Ram Singh: criminal)            |          7490 |
| ALL_ORG(Ram Singh: board actions)              |          4200 |
| BASELINE(Ram Singh: board actions)             |          3360 |
| BASELINE(Ram Singh: criminal)                  |          1880 |
| RESTRICTED_COM(Ram Singh: sanctions)           |          1580 |
| ALL_ORG(Ram Singh: criminal)                   |          1390 |
| ALL_ORG(Ram Singh: sanctions)                  |           539 |
| ALL_GOV(Ram Singh: board actions)              |           401 |
| BASELINE(Ram Singh: sanctions)                 |           284 |
| ALL_US(Ram Singh: board actions)               |           157 |
| ALL_EDU(Ram Singh: criminal)                   |           126 |
| ALL_EDU(Ram Singh: board actions)              |           125 |
| RESTRICTED_COM(Ram Singh: malpractice)         |           108 |
| ALL_US(Ram Singh: criminal)                    |           103 |
| ALL_GOV(Ram Singh: criminal)                   |            57 |
| ALL_EDU(Ram Singh: sanctions)                  |            50 |
| BASELINE(Ram Singh: malpractice)               |            34 |
| ALL_ORG(Ram Singh: malpractice)                |            31 |
| ALL_GOV(Ram Singh: sanctions)                  |            15 |
| RESTRICTED_COM(Russell Johnson: criminal)      |             9 |
| ALL_US(Ram Singh: sanctions)                   |             8 |
| RESTRICTED_COM(Tommy Osborne: criminal)        |             8 |
| ALL_EDU(Ram Singh: malpractice)                |             7 |
| RESTRICTED_COM(Russell Johnson: board actions) |             7 |
| RESTRICTED_COM(Tommy Osborne: board actions)   |             7 |
| RESTRICTED_COM(Tommy Osborne: malpractice)     |             7 |
| ALL_ORG(Tommy Osborne: board actions)          |             5 |
| ALL_GOV(Ram Singh: malpractice)                |             4 |
| ALL_US(Ram Singh: malpractice)                 |             4 |
| ALL_ORG(Tommy Osborne: malpractice)            |             3 |
| BASELINE(Tommy Osborne: board actions)         |             3 |
| BASELINE(Tommy Osborne: malpractice)           |             3 |
| ALL_GOV(Tommy Osborne: board actions)          |             2 |
| ALL_GOV(Tommy Osborne: criminal)               |             2 |
| ALL_GOV(Tommy Osborne: malpractice)            |             2 |
| ALL_GOV(Tommy Osborne: sanctions)              |             2 |
| ALL_ORG(Tommy Osborne: criminal)               |             2 |
| RESTRICTED_COM(Tommy Osborne: sanctions)       |             2 |
| BASELINE(Tommy Osborne: criminal)              |             1 |
| BASELINE(Tommy Osborne: sanctions)             |             1 |
| RESTRICTED_COM(Russell Johnson: malpractice)   |             1 |
| RESTRICTED_COM(Russell Johnson: sanctions)     |             1 |
+------------------------------------------------+---------------+
43 rows in set (0.00 sec)
- Need to run about 30 doctors through the system to get statistical significance for making recommendations
- CommonCrawl vs. Google approximation. For this analysis, I listed all the domains that produced a ‘flaggable match’ and fed them into the CommonCrawl index search for November 2015 (the most recent at the time of this writing). In the results listed below, each number is the count of blocks stored in CommonCrawl for that domain. A value of zero indicates that the CommonCrawl index did not contain any reference to that domain:
  1 - w3.health.state.ny.us
  6 - www.consumerwatchdog.org
  2 - law.resource.org
  3 - www.ncmedboard.org
 40 - caselaw.findlaw.com
  0 - www.courtlistener.com
  1 - www.rfhha.org
  1 - www.dhp.virginia.gov
  2 - www.vahealthprovider.com
  0 - w3.nyhealth.gov
  2 - medboard.nv.gov
  2 - www.courts.state.va.us
  0 - www.physicianus.org
  0 - wwwapps.ncmedboard.org
240 - www.healthgrades.com
  0 - www.dos.pa.gov
  3 - law.justia.com
  3 - ezdoctor.com
- As can be seen, 5 of the 18 domains containing useful information, or approximately 28%, are missing from the CommonCrawl index. For the remaining sites, it is an open question whether the crawl contains the full data from the site.
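As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the missing share from the block counts listed above. The `blocks` dictionary and the `missing_share` helper are names made up for this illustration; the counts are copied from the listing, and a zero is treated as "not in the index".

```python
# Block counts per domain, copied from the CommonCrawl listing above.
blocks = {
    "w3.health.state.ny.us": 1,
    "www.consumerwatchdog.org": 6,
    "law.resource.org": 2,
    "www.ncmedboard.org": 3,
    "caselaw.findlaw.com": 40,
    "www.courtlistener.com": 0,
    "www.rfhha.org": 1,
    "www.dhp.virginia.gov": 1,
    "www.vahealthprovider.com": 2,
    "w3.nyhealth.gov": 0,
    "medboard.nv.gov": 2,
    "www.courts.state.va.us": 2,
    "www.physicianus.org": 0,
    "wwwapps.ncmedboard.org": 0,
    "www.healthgrades.com": 240,
    "www.dos.pa.gov": 0,
    "law.justia.com": 3,
    "ezdoctor.com": 3,
}


def missing_share(counts):
    """Return (list of domains with zero blocks, their fraction of all domains)."""
    missing = [domain for domain, n in counts.items() if n == 0]
    return missing, len(missing) / len(counts)


missing, share = missing_share(blocks)
print(f"{len(missing)} of {len(blocks)} domains missing ({share:.1%})")
for domain in missing:
    print("  ", domain)
```

Note that a nonzero count only says the domain appears in the index, not that the pages holding the flaggable content were crawled.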
- Here are the ratios of hits to search results:
search type                                     pertinence  relevance    ratio
ALL_GOV(Tommy Osborne: board actions)                    2          2  100.00%
ALL_GOV(Tommy Osborne: criminal)                         2          2  100.00%
ALL_GOV(Tommy Osborne: malpractice)                      2          2  100.00%
ALL_GOV(Tommy Osborne: sanctions)                        2          2  100.00%
BASELINE(Tommy Osborne: criminal)                        1          1  100.00%
BASELINE(Tommy Osborne: sanctions)                       1          1  100.00%
RESTRICTED_COM(Russell Johnson: malpractice)             1          1  100.00%
ALL_ORG(Tommy Osborne: malpractice)                      2          3   66.67%
ALL_ORG(Tommy Osborne: board actions)                    3          5   60.00%
RESTRICTED_COM(Tommy Osborne: board actions)             4          7   57.14%
ALL_GOV(Ram Singh: malpractice)                          2          4   50.00%
RESTRICTED_COM(Tommy Osborne: sanctions)                 1          2   50.00%
BASELINE(Tommy Osborne: board actions)                   1          3   33.33%
BASELINE(Tommy Osborne: malpractice)                     1          3   33.33%
RESTRICTED_COM(Russell Johnson: board actions)           2          7   28.57%
RESTRICTED_COM(Tommy Osborne: malpractice)               2          7   28.57%
ALL_US(Ram Singh: malpractice)                           1          4   25.00%
ALL_GOV(Ram Singh: sanctions)                            2         15   13.33%
RESTRICTED_COM(Tommy Osborne: criminal)                  1          8   12.50%
ALL_ORG(Ram Singh: malpractice)                          3         31    9.68%
ALL_GOV(Ram Singh: criminal)                             1         57    1.75%
ALL_GOV(Ram Singh: board actions)                        4        401    1.00%
ALL_US(Ram Singh: criminal)                              1        103    0.97%
RESTRICTED_COM(Ram Singh: malpractice)                   1        108    0.93%
ALL_ORG(Ram Singh: criminal)                             2       1390    0.14%
ALL_ORG(Ram Singh: board actions)                        3       4200    0.07%
RESTRICTED_COM(Ram Singh: criminal)                      2       7490    0.03%
RESTRICTED_COM(Ram Singh: board actions)                 2      12600    0.02%
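
The ratio column appears to be the first count divided by the second, where the second count matches total_results from the query above. A minimal Python sketch of that calculation, using three rows copied from the table; `hit_ratio` and `rows` are hypothetical names for this illustration, not part of the actual app:

```python
def hit_ratio(hits, total_results):
    """Fraction of returned results that were flagged as hits."""
    if total_results <= 0:
        raise ValueError("total_results must be positive")
    return hits / total_results


# (search type, hits, total results) for three rows of the table above.
rows = [
    ("RESTRICTED_COM(Ram Singh: board actions)", 2, 12600),
    ("ALL_ORG(Tommy Osborne: malpractice)", 2, 3),
    ("ALL_GOV(Tommy Osborne: board actions)", 2, 2),
]

# Sort from best to worst ratio, as in the table.
ranked = sorted(rows, key=lambda r: hit_ratio(r[1], r[2]), reverse=True)
for label, hits, total in ranked:
    print(f"{label:<46}  {hits:>3}  {total:>6}  {hit_ratio(hits, total):>7.2%}")
```

With only a handful of results for most searches, the high ratios rest on very small counts, which is one reason to run more doctors through the system before making recommendations.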