7:00 – 3:30
- Writing
- Going to try LSI. I think the term clustering is simply the sum of the TF-IDF scores across docs for each term. That should give a topic list. Then use that for centrality calculations? Take the top n words?
- Actually, the user could then group words into concepts, which would make a smaller matrix where each concept's count is the union of the counts of its component terms.
- Have an LSI-lite version going that sums the TF-IDF scores for each term and weights the sum by document coverage: sum of all scores * (number of docs with score / number of docs). Then sort on that value and take the top n terms.
- Need to multiply the matrix by something so that the count gets populated with something reasonable. Maybe 100? Tried that – it looks good.
- Got the PDF parsing working. Need to get it to work with webpages next and try it on Moby Dick. Then output from the flag data:
https://dockerapps5.eip.nj.vistronix.com:9443/authenticationendpoint/login.do?client_id=w674kmsNj7flgKkTp_t_8ArPES0a&commonAuthCallerPath=%2Foauth2%2Fauthorize&forceAuth=false&passiveAuth=false&redirect_uri=http%3A%2F%2Fdockerapps.vistronix.com%2Flogin&response_type=code&scope=openid&state=RrKxRY&tenantDomain=carbon.super&sessionDataKey=fbcaf4a0-679a-4eed-93df-5464bca702ff&relyingParty=w674kmsNj7flgKkTp_t_8ArPES0a&type=oidc&sp=EIP-CI&isSaaSApp=false&authenticators=BasicAuthenticator:LOCAL
http://dockerapps.vistronix.com/gtc-server/physicianservice/flags
- Need to make sure that I point the above at the demo system rather than dev. From Andy’s email:
Yes …looks you are looking at dev….in Confluence, search on environment details…that Will give you the urls for the dashboards on dev, ci and demo…we are working on demo now.
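
A minimal sketch of the LSI-lite scoring and the concept grouping from the notes above, in plain Python. It assumes the TF-IDF scores already exist as one dict per document (term -> score). The function names, the dict layout, and applying the 100x multiplier to the final score are assumptions for illustration, not the actual implementation. Reading "union of the counts" as the sum of the component terms' scores is also an assumption.

```python
from collections import defaultdict


def rank_terms(tfidf_rows, top_n=10, scale=100):
    """Score each term as scale * (sum of scores) * (docs with score / docs).

    tfidf_rows: list of {term: tfidf_score} dicts, one per document (assumed layout).
    Returns the top_n (term, score) pairs, highest score first.
    """
    num_docs = len(tfidf_rows)
    totals = defaultdict(float)    # term -> sum of TF-IDF across docs
    doc_counts = defaultdict(int)  # term -> number of docs with a nonzero score

    for row in tfidf_rows:
        for term, score in row.items():
            if score > 0:
                totals[term] += score
                doc_counts[term] += 1

    ranked = [
        (term, scale * totals[term] * doc_counts[term] / num_docs)
        for term in totals
    ]
    # Sort by descending score; break ties alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


def group_into_concepts(tfidf_rows, concepts):
    """Collapse term columns into concept columns.

    concepts: {concept_name: [term, ...]} as chosen by the user (hypothetical input).
    Each concept's value in a document is the sum of its member terms' scores.
    """
    return [
        {
            concept: sum(row.get(term, 0.0) for term in terms)
            for concept, terms in concepts.items()
        }
        for row in tfidf_rows
    ]


if __name__ == "__main__":
    rows = [
        {"whale": 0.5, "ship": 0.2},
        {"whale": 0.3},
        {"sea": 0.4, "ship": 0.1},
    ]
    print(rank_terms(rows, top_n=2))
    print(group_into_concepts(rows, {"ocean": ["whale", "sea"]}))
```

The coverage factor pushes down terms that score high in only one document, so the top n list leans toward terms shared across the corpus.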
