Phil 3.4.16

VTX 7:00 – 5:00

  • Continuing A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web
    • Adding N. Diakopoulos and M. Naaman. Topicality, Time, and Sentiment in Online News Comments. Conference on Human Factors in Computing Systems (CHI) Works in Progress. May, 2011. [PDF] Short! Yay!
    • Added Adaptive Faceted Ranking for Social Media Comments. I think it may touch on my idea of Pertinence ranking using Markov Chains.
  • Scanned Exploiting Social Context for Review Quality Prediction and realized that it’s got some very good hints for markers that can be used for machine learning on the doctor records:
    Feature Name 	Type 		Feature Description
    NumToken 	Text-Stat 	Total number of tokens.
    NumSent 	Text-Stat 	Total number of sentences.
    UniqWordRatio 	Text-Stat 	Ratio of unique words.
    SentLen 	Text-Stat 	Average sentence length.
    CapRatio 	Text-Stat 	Ratio of capitalized sentences.
    POS:NN 		Syntactic 	Ratio of nouns.
    POS:ADJ 	Syntactic 	Ratio of adjectives.
    POS:COMP 	Syntactic 	Ratio of comparatives.
    POS:V 		Syntactic 	Ratio of verbs.
    POS:RB 		Syntactic 	Ratio of adverbs.
    POS:FW 		Syntactic 	Ratio of foreign words.
    POS:SYM 	Syntactic 	Ratio of symbols.
    POS:CD 		Syntactic 	Ratio of numbers.
    POS:PP 		Syntactic 	Ratio of punctuation symbols.
    KLall 		Conformity 	KL divergence D_KL(T_r || T_i).
    PosSEN 		Sentiment 	Ratio of positive sentiment words.
    NegSEN 		Sentiment 	Ratio of negative sentiment words.
  • This means I need to store the whole page in the rating app so that I can evaluate machine ratings after getting human ratings.
  • Finished the UI part of the display. Next is changing the DB back end. I’m going to start the DB over again since there is so much new stuff.
  • Cleaning up classes. Moved LoginDialog and CheckboxGroup to utils.
  • Meeting about the relative merits of StanfordNLP and Rosette. We’ll stick with Stanford for now. I have some questions about how Webhose.io will be handled, but Aaron thinks that it can be filtered in the TAS, with a query string preprocessor.
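
A minimal sketch of how the Text-Stat rows and the KLall row from the feature table above might be computed for a stored page. This is not the paper's implementation: the regex tokenizer and sentence splitter are crude stand-ins for a real NLP pipeline, the function names are hypothetical, the add-one smoothing is my own choice, and I'm assuming T_r is the unigram distribution of one review and T_i the unigram distribution of the reference text it is compared against.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[A-Za-z0-9']+", text.lower())


def split_sentences(text):
    """Naive split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def text_stats(text):
    """Compute the Text-Stat features named in the table."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    num_token = len(tokens)
    num_sent = len(sentences)
    return {
        "NumToken": num_token,
        "NumSent": num_sent,
        "UniqWordRatio": len(set(tokens)) / num_token if num_token else 0.0,
        "SentLen": num_token / num_sent if num_sent else 0.0,
        # Fraction of sentences whose first character is uppercase.
        "CapRatio": (
            sum(s[0].isupper() for s in sentences) / num_sent if num_sent else 0.0
        ),
    }


def kl_divergence(review_tokens, item_tokens):
    """D_KL(T_r || T_i) over unigram distributions.

    Add-one smoothing over the shared vocabulary is an assumption made
    here so that no probability is zero.
    """
    vocab = set(review_tokens) | set(item_tokens)
    r_counts = Counter(review_tokens)
    i_counts = Counter(item_tokens)
    r_total = len(review_tokens) + len(vocab)
    i_total = len(item_tokens) + len(vocab)
    kl = 0.0
    for word in vocab:
        p = (r_counts[word] + 1) / r_total
        q = (i_counts[word] + 1) / i_total
        kl += p * math.log(p / q)
    return kl
```

Storing the whole page makes it possible to rerun something like `text_stats(page_text)` later and compare the machine features against the human ratings. The Syntactic and Sentiment rows would need a POS tagger and a sentiment lexicon, which this sketch leaves out.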