Phil 5.20.16

7:00 – 3:30

  • Writing
  • Going to try LSI. I think the term clustering is simply the sum of the TF-IDF across docs by term. That should give a topic list. Then use that for centrality calculations? Take the top n words?
    • Actually, then the user could group words into concepts and that could make a smaller matrix where the concept count is the union of the counts of its component terms.
  • Have an LSI-lite version going that sums the TF-IDF scores per term and ranks each term by (sum of all scores) * (number of docs with score / number of docs). Then sort and take the top n terms.
  • Need to multiply the matrix by something so that the count gets populated with something reasonable. Maybe 100? Tried that – it looks good.
  • Got the PDF parsing working. Need to get it to work with webpages next and try it on Moby Dick. Then try it on output from the flag data.
  • Need to make sure that I use the above pointing at the demo system. From Andy’s email:

    Yes …looks you are looking at dev….in Confluence, search on environment details…that Will give you the urls for the dashboards on dev, ci and demo…we are working on demo now.
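
A rough sketch of the LSI-lite ranking from the notes above, for reference. It is not the actual code: the function names (tfidf_matrix, rank_terms, scaled_counts, group_concepts) are made up for illustration, the TF-IDF variant (term frequency times log(N/df)) and the whitespace tokenizer are assumptions, and reading "union of the counts" as a plain sum is also an assumption.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return one {term: tf-idf} dict per document (assumed variant: tf * log(N/df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many docs contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        rows.append({t: (c / total) * math.log(n_docs / df[t])
                     for t, c in counts.items()})
    return rows


def rank_terms(rows, top_n=10):
    """Rank terms by (sum of scores) * (docs with a score / number of docs)."""
    n_docs = len(rows)
    sums = Counter()
    docs_with_score = Counter()
    for row in rows:
        for term, score in row.items():
            if score > 0:
                sums[term] += score
                docs_with_score[term] += 1
    ranked = {t: sums[t] * docs_with_score[t] / n_docs for t in sums}
    # Highest score first; ties broken alphabetically.
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def scaled_counts(rows, terms, scale=100):
    """Doc-by-term matrix of integer 'counts': tf-idf * scale, rounded."""
    return [[round(row.get(t, 0.0) * scale) for t in terms] for row in rows]


def group_concepts(matrix, terms, concepts):
    """Collapse term columns into concept columns by summing (assumed meaning of 'union')."""
    index = {t: i for i, t in enumerate(terms)}
    names = list(concepts)
    grouped = [[sum(row[index[t]] for t in concepts[name] if t in index)
                for name in names]
               for row in matrix]
    return names, grouped


if __name__ == "__main__":
    docs = ["a b", "a c", "a c"]  # toy corpus, not real data
    rows = tfidf_matrix(docs)
    top = rank_terms(rows, top_n=2)
    terms = [t for t, _ in top]
    matrix = scaled_counts(rows, terms)
    print(top)
    print(matrix)
    print(group_concepts(matrix, terms, {"bc": ["b", "c"]}))
```

The scale factor of 100 is the value tried in the notes. A term that shows up in every document gets a TF-IDF of zero under this variant, so it drops out of the ranking entirely.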

