Phil 5.20.16

7:00 – 3:30

  • Writing
  • Going to try LSI. I think the term clustering is simply the sum of the TF-IDF scores across docs, by term. That should give a topic list. Then use that for centrality calculations? Take the top n words?
    • Actually, then the user could group words into concepts and that could make a smaller matrix where the concept count is the union of the counts of its component terms.
  • Have an LSI-lite version going: for each term, sum the TF-IDF scores across docs, weight that sum by (number of docs with a score / number of docs), then sort and take the top n terms.
  • Need to multiply the matrix by something so that the count gets populated with something reasonable. Maybe 100? Tried that – it looks good.
  • Got the PDF parsing working. Need to get it to work with webpages next and try it on Moby Dick. Then output from the flag data
  • Need to make sure that I use the above pointing at the demo system. From Andy’s email:

    Yes… looks you are looking at dev… in Confluence, search on environment details… that will give you the URLs for the dashboards on dev, ci and demo… we are working on demo now.
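
A rough sketch of the LSI-lite ranking described in the notes above, in plain Python. Everything here is an assumption for illustration, not the actual code: the function names (tfidf_rows, top_terms, scaled_counts, group_concepts) are hypothetical, tokenizing is a naive whitespace split, TF-IDF is the basic tf * log(N / df) form, and the concept "union" is read as a sum of the component term counts.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (terms, rows), where rows[d][j] is the TF-IDF of terms[j] in doc d."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: in how many docs each term appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        rows.append([
            (counts[t] / length) * math.log(n_docs / doc_freq[t])
            for t in terms
        ])
    return terms, rows


def top_terms(terms, rows, n):
    """Rank terms by (sum of TF-IDF) * (docs with a score / number of docs)."""
    n_docs = len(rows)
    scored = []
    for j, term in enumerate(terms):
        column = [row[j] for row in rows]
        docs_with_score = sum(1 for v in column if v > 0)
        scored.append((sum(column) * docs_with_score / n_docs, term))
    # Highest weight first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:n]]


def scaled_counts(rows, scale=100):
    """Multiply the matrix by a constant so the entries become usable integer counts."""
    return [[round(v * scale) for v in row] for row in rows]


def group_concepts(terms, counts, concepts):
    """Collapse term columns into concept columns by summing component term counts."""
    index = {t: j for j, t in enumerate(terms)}
    names = sorted(concepts)
    grouped = [
        [sum(row[index[t]] for t in concepts[name] if t in index) for name in names]
        for row in counts
    ]
    return names, grouped


docs = ["whale whale sea ship", "whale captain sea", "harbor ship"]
terms, rows = tfidf_rows(docs)
best = top_terms(terms, rows, 2)
counts = scaled_counts(rows)
names, grouped = group_concepts(terms, counts, {"ocean": ["whale", "sea"]})
print(best)
print(names, grouped)
```

Note that with this form of IDF a term that appears in every doc scores zero, so the doc-fraction weight mostly rewards terms that are spread across several docs without being universal. The scale of 100 is the value tried in the notes; any constant that gives reasonable integer counts would do.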
