Phil 12.2.16

7:00 – 6:00 ASRC

  • Wrote up notes from yesterday’s meeting with Aaron
  • Google had lined this up for me: A Study of Factuality, Objectivity and Relevance: Three Desiderata in Large-Scale Information Retrieval?
  • Back to Sociophysics
    • I appear to be working with (maybe?) class ‘C’ social networks, where links connect people indirectly [pg 19]. These are covered in chapter 7, “Of Flocks, Flows and Transports”.
  • Starting on NMF LMN tool. I need some interactivity to see how manipulation works
    • Copied the guts over from LMN. I think the best first step is to show the two factor matrices as tables and the result as text, then scale rows, columns or cells.
    • Changing Labeled2DMatrix to handle reading from a given sheet name/number – done
  • Meeting with Don to work through Opinion Dynamics paper math.
    • Word for today – quotient set: the set of equivalence classes that a rule splits a set into. With isDigit() as the rule (two elements are equivalent when they give the same answer), {1, w, 3, r, 5, h} splits into {{1, 3, 5}, {w, r, h}}.
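A minimal Python sketch of the idea, assuming the rule is given as a function and that two elements land in the same class when the function returns the same value for both (the quotient_set name is just for illustration):

```python
from collections import defaultdict


def quotient_set(items, key):
    """Partition items into equivalence classes: two items share a class
    exactly when key() returns the same value for both."""
    classes = defaultdict(set)
    for item in items:
        classes[key(item)].add(item)
    # A set of frozensets, since the classes themselves are unordered
    return {frozenset(c) for c in classes.values()}


print(quotient_set(["1", "w", "3", "r", "5", "h"], str.isdigit))
```

Any function works as the rule; a boolean one like isDigit() can only ever produce two classes.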

Phil 12.1.16

7:00 – 6:00 ASRC

  • Back to Sociophysics
  • More NMF
    • Run through the row/col mats and get the top N items as topic or document clusters. Look for big jumps? Cluster using DBSCAN?
    • Got the factor matrices’ columns labeled by setting all the columns but one in the row/column weight matrix to zero. The recreated matrix can then be sorted by row or column. The thought is that the highest values are the items that are most sensitive. It’s hard to get a feel for it, though. I think I have to build an interactive app where I can watch the effects. Intuitively, since we are building the matrix from the dot products of the rows in the two factor matrices, the effects should be matrix-wide.
    • I’m also saving the wrong items from the corpus manager. I need to save the matrix factors, not the recreated matrix. I think one document with two spreadsheets would be a nice way to store them. Done. I also included the reconstructed matrix since it’s (a) needed to produce the column names and (b) stochastically produced, so it’s unique and tightly coupled to the factor matrices.
    • Results for today:
      rMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5, 3, 0, 1, 
      Doc2, 4, 0, 0, 1, 
      Doc3, 1, 1, 0, 5, 
      Doc4, 1, 0, 0, 4, 
      Doc5, 0, 1, 5, 4, 
      
      newMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5.03, 2.91, 4.39, 0.99, 
      Doc2, 3.97, 2.3, 3.64, 0.99, 
      Doc3, 1.09, 0.77, 5.03, 4.99, 
      Doc4, 0.96, 0.67, 4.08, 3.99, 
      Doc5, 2.05, 1.3, 4.9, 4.05, 
      average difference = 0.0757473875544334
      sorted columns {Trm3=22.05575184668782, Trm4=15.022558101226334, Trm1=13.097328480010614, Trm2=7.951735152321482}
      sorted rows {Doc1=13.33704116430316, Doc5=12.301738392587723, Doc3=11.885670818212631, Doc2=10.911189185303712, Doc4=9.691734019839023}
      
      rowMat
       , Trm1-Trm3, Trm4-Trm3, 
      Doc1, 2.33, 0.3, 
      Doc2, 1.83, 0.33, 
      Doc3, 0.4, 2.04, 
      Doc4, 0.36, 1.63, 
      Doc5, 0.87, 1.63, 
      
      colMat
       , Doc1-Doc2, Doc3-Doc5, 
      Trm1, 2.15, 0.11, 
      Trm2, 1.23, 0.14, 
      Trm3, 1.61, 2.15, 
      Trm4, 0.11, 2.43,
  • More BRC? At least verify what my story is.
  • Axios? We are a new media company delivering vital, trustworthy news and analysis in the most efficient, illuminating and shareable ways possible. We offer a mix of original and smartly narrated coverage of media trends, tech, business and politics with expertise, voice AND smart brevity — on a new and innovative mobile platform. At Axios — the Greek word for worthy — we provide only content worthy of people’s time, attention and trust.
  • Dr. Phyllis Schneck – 3:30pm Thursday, 1 December 2016, UC 310, UMBC
    • Indicators of attack
      • IP address
      • Domain name
      • attachment
      • clusters
      • No PII
    • Reputation system? What kind of feeds do you need? Looking for very high accuracy. Multiple streams for improved statistical power?
    • Got me thinking about a ‘legal-targeted Stuxnet’. Imagine something that was set into legal databases (Legislation, regulation and case law) that simply changed some small percentage of ‘shalls’ to ‘wills’. That could be pretty damaging over the long run. Something smarter could be more subtle and directed, just like the Natanz attack. Wound up stopping by Aaron M’s office and chatted for about an hour about this. Also some potential research discussions before Dec 21.
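A sketch of the “look for big jumps” idea in Python, assuming the factor matrices are plain lists of rows with the labels kept separately, and that a cluster is simply the items above the largest drop in one factor’s sorted values (a simple stand-in, not DBSCAN). It also shows why zeroing every factor but f and re-sorting should give the same order as sorting that factor column directly: the recreated matrix is then rank one, so row i sums to P[i][f] times the sum of Q’s column f, and with nonnegative factors that constant doesn’t change the order. The data is today’s rowMat and colMat; the function names are hypothetical.

```python
def rank_by_factor(labels, factors, f):
    """Sort labels by their value in latent-factor column f, highest first."""
    return sorted(zip(labels, (row[f] for row in factors)),
                  key=lambda pair: pair[1], reverse=True)


def single_factor_row_sums(P, Q, f):
    """Row sums of the matrix rebuilt from factor f alone (P[:, f] times Q[:, f]^T)."""
    q_total = sum(row[f] for row in Q)
    # Row i of the rank-one matrix is P[i][f] * Q[j][f] over j, so its sum is P[i][f] * q_total
    return [row[f] * q_total for row in P]


def top_group_by_gap(ranked):
    """Keep the items above the largest drop in a descending (label, value) list."""
    if len(ranked) < 2:
        return [label for label, _ in ranked]
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [label for label, _ in ranked[:cut]]


docs = ["Doc1", "Doc2", "Doc3", "Doc4", "Doc5"]
row_mat = [[2.33, 0.3], [1.83, 0.33], [0.4, 2.04], [0.36, 1.63], [0.87, 1.63]]
col_mat = [[2.15, 0.11], [1.23, 0.14], [1.61, 2.15], [0.11, 2.43]]

for f in range(2):
    print(f, top_group_by_gap(rank_by_factor(docs, row_mat, f)))
# 0 ['Doc1', 'Doc2']
# 1 ['Doc3', 'Doc4', 'Doc5']
```

Those two groups line up with the Doc1-Doc2 and Doc3-Doc5 labels on today’s colMat.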

Phil 11.30.16

7:00 – 3:30 ASRC

  • Wrote up my notes from the chat with Shimei. I think the first step is to look through the UTOPIAN paper again and see how (if?) summary and co-clustering are being handled.
    • Downloaded her suggested papers
    • It looks like the row and column matrices might be useful and manipulable. Digging into the NMF Java class for some more manipulation
    • Added raw, weight and scaled matrices
    • Need to add ranked row, column and cell output for L2DMat – done. Here’s some data and thoughts:
      rMat
       , D1, D2, D3, D4, 
      U1, 5, 3, 0, 1, 
      U2, 4, 0, 0, 1, 
      U3, 1, 1, 0, 5, 
      U4, 1, 0, 0, 4, 
      U5, 0, 1, 5, 4, 
      
      newMat
       , D1, D2, D3, D4, 
      U1, 5.05, 2.87, 5.26, 1, 
      U2, 3.96, 2.25, 4.27, 1, 
      U3, 1.11, 0.71, 4.4, 4.99, 
      U4, 0.94, 0.6, 3.57, 3.99, 
      U5, 2.35, 1.39, 4.87, 4.05, 
      average difference = 0.09750770110043207
      sorted columns {D3=22.36862672329615, D4=15.038484762558607, D1=13.410342394629499, D2=7.815842574518472}
      sorted rows {U1=14.17790755369198, U5=12.657839100920228, U2=11.485548694067901, U3=11.209516468182759, U4=9.102484638139858}
      
      Manipulating row weights by column
      
      newMat weight col 0 set to 1.0
       , D1, D2, D3, D4, 
      U1, 4.9, 2.76, 4.44, 0, 
      U2, 3.81, 2.15, 3.45, 0, 
      U3, 0.35, 0.2, 0.32, 0, 
      U4, 0.34, 0.19, 0.31, 0, 
      U5, 1.73, 0.98, 1.57, 0, 
      sorted columns {D1=11.121458227331996, D3=10.081893895718448, D2=6.276360972184673, D4=0.0}
      sorted rows {U1=12.101008726739368, U2=9.406070587569038, U5=4.271932099958697, U3=0.869188591004756, U4=0.8315130899632566}
      
      newMat weight col 1 set to 1.0
       , D1, D2, D3, D4, 
      U1, 0.15, 0.1, 0.82, 1, 
      U2, 0.15, 0.1, 0.82, 1, 
      U3, 0.76, 0.51, 4.08, 4.99, 
      U4, 0.61, 0.41, 3.26, 3.99, 
      U5, 0.62, 0.41, 3.31, 4.05, 
      sorted columns {D4=15.038484762558607, D3=12.286732827577703, D1=2.2888841672975038, D2=1.539481602333799}
      sorted rows {U3=10.340327877178003, U5=8.38590700096153, U4=8.2709715481766, U2=2.079478106498862, U1=2.076898826952612}
    • According to Choo, the columns in the factor matrices are the latent topics. That means, for example, when all the document columns but one are zeroed out, the high-ranked terms are the topics for that document (and will LSI extract those terms?). And when all the term columns but one are zeroed out, the documents are sorted by relevance to that term. Do big gaps mean clusters, or does the cluster just run up to the first gap?
  • Add this one to the list? Characteristics to look for? Hate Spin: The Twin Political Strategies of Religious Incitement and Offense-Taking
  • Deep Learning MIT book (pdf)
  • Back to Sociophysics.
    • To build a scale-free network, A.-L. Barabási and R. Albert (Emergence of scaling in random networks) start with a small random network and incrementally add nodes, where the probability of connecting a new node to an existing node is proportional to how many connections that node already has.
      network.createInitialNodes(SOME_SMALL_VALUE) // seed nodes need at least one link, or totalLinks is 0
      for(i = 0 to desired)
      	n = createNewNode()
      	totalLinks = countAllLinks()
      	for(j = 0 to network.numNodes)
      		curNode = getNode(j)
      		links = curNode.getLinks()
      		probability = links/totalLinks
      		if(random() < probability) // connect with probability proportional to degree
      			curNode.addNeighbor(n)
      	network.addNode(n)
    • Does node aging matter in this model?
    • Null Models For Social Networks (for comparison and testing)
  • Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources <- One of the most popular articles from 2016 via Altmetric
  • Skype messenger meeting with Aaron and Katy going over the data we have
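A runnable Python sketch of the growth loop. It swaps in the common fixed-m variant: each new node gets exactly m links, picked with probability proportional to degree, instead of testing every existing node against a probability. The complete seed graph on m + 1 nodes, the value of m, and the function name are assumptions.

```python
import random
from collections import defaultdict


def preferential_attachment(n_nodes, m, seed=None):
    """Grow an undirected graph where each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    if m < 1 or n_nodes <= m:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    adj = defaultdict(set)
    # Seed network: a complete graph on m + 1 nodes, so no node starts at degree 0
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            adj[a].add(b)
            adj[b].add(a)
    # Every node appears here once per link end, so a uniform pick from this
    # list selects a node with probability degree / total degree
    link_ends = [node for node in adj for _ in adj[node]]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:  # m distinct targets, so no duplicate links
            targets.add(rng.choice(link_ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            link_ends.extend((new, t))
    return adj


g = preferential_attachment(1000, 2, seed=42)
print(sorted((len(nbrs) for nbrs in g.values()), reverse=True)[:5])  # largest degrees (hubs)
```

Sampling from a list that holds each node once per link end is a cheap way to get degree-proportional picks without recomputing a probability for every node at each step.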

Phil 11.29.16

7:00 – 5:30 ASRC

  • How Le Monde is taking on fake news
  • Thinking about Jonathan Albright’s work. How is it crawled? Is it really just inbound links? Can I get the data? I need to ask.
  • Back to Sociophysics.
    • Clustering coefficient (video)
      CC = 0
      numNodes = 0
      for(i = 0 to max)
      	for(j = 0 to max)
      		n = node(i,j)
      		k = n.numNeighbors()
      		a = n.numLinksBetweenNeighbors()
      		if(k < 2)
      			n.setNodeCC(0) // undefined with fewer than two neighbors
      		else
      			n.setNodeCC((2.0*a)/(k*(k-1))) // 2.0 avoids integer division
      		CC += n.getNodeCC()
      		numNodes++
      CC = CC/numNodes
    • Clustering coefficient ordering, lowest to highest: random -> small world -> regular
  • Got the NMF built into CorpusManager. Here’s how the first four chapters of Moby Dick come out under each approach:
    • BOW: there think harpooneer about little landlord sleep could would
    • TF-IDF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • NMF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • BOW/centrality: there think would queequeg could about whale little first
    • TF-IDF/centrality: about harpooneer night landlord stand light nantucket where other
    • NMF/centrality: harpooneer queequeg landlord water nantucket circumstance sailor passenger about
    • (centrality with equalized docs)
  • Meeting with Shimei
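The same computation as a Python sketch, assuming an adjacency map (node -> set of neighbors) instead of the node(i,j) grid, and scoring nodes with fewer than two neighbors as 0, which is a common convention rather than the only one:

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked: 2a / (k(k - 1))."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; 0 by convention here
    a = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * a / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# A triangle (0, 1, 2) with one extra node hanging off node 2
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(graph))  # (1 + 1 + 1/3 + 0) / 4
```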

Phil 11.28.16

7:00 – 5:00 ASRC

  • Stumbled upon the ACM Transactions on Interactive Intelligent Systems (TIIS). They have two interesting upcoming issues:
  • Jonathan Albright came up on my Twitter feed. He’s doing interesting data journalism. Here are his thoughts on fake news. It’s really odd that he hasn’t published peer-reviewed work. Is this because he’s at a teaching university?
  • Looking through Sociophysics, and finding some interesting references.
    • Minority Opinion Spreading in Random Geometry
      • Abstract: The dynamics of spreading of the minority opinion in public debates (a reform proposal, a behavior change, a military retaliation) is studied using a diffusion reaction model. People move by discrete step on a landscape of random geometry shaped by social life (offices, houses, bars, and restaurants). A perfect world is considered with no advantage to the minority. A one person-one argument principle is applied to determine locally individual mind changes. In case of equality, a collective doubt is evoked which in turn favors the Status Quo. Starting from a large in favor of the proposal initial majority, repeated random size local discussions are found to drive the majority reversal along the minority hostile view. Total opinion refusal is completed within few days. Recent national collective issues are revisited. The model may apply to rumor and fear propagation.
  • Updating IntelliJ and waiting for 497 MB to download
  • Continue to generalize NMF: test k and make it implicit in the matrices being passed. Start an NMF class as part of JavaUtils. Done
  • Start to integrate NMF into CorpusManager. Initially, I’m just going to use it to produce the matrix, like TF-IDF.
    • It’s computing; now I need to sort and trim
  • Fika with Aaron on writing. Need to ask for his slide deck.
  • Meeting with Wayne, mostly catching up. What book should I give him? The most tabbed are Sciences of the Artificial, Last Place on Earth, and Social Science.
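The “sorted columns” lines in the later entries look like plain column sums of the recreated matrix (on 11.30, D3’s column adds up to 22.37), so one minimal way to sort and trim is to rank columns by their sums and keep the top n. A Python sketch under that assumption, using the newMat from the 11.22 Java run; top_columns and n are illustrative:

```python
def top_columns(matrix, col_labels, n):
    """Rank columns by their summed weight, highest first, and keep the top n."""
    totals = {label: sum(row[j] for row in matrix) for j, label in enumerate(col_labels)}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# newMat from the 11.22 Java run (rows U1..U5)
new_mat = [
    [5.14, 2.55, 5.28, 1.01],
    [3.91, 1.94, 4.10, 1.00],
    [1.12, 0.56, 2.98, 5.01],
    [0.94, 0.47, 2.42, 4.00],
    [3.47, 1.72, 4.83, 4.01],
]
print(top_columns(new_mat, ["D1", "D2", "D3", "D4"], 2))
```

For a term/document matrix the same call trims to the n heaviest terms.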

Phil 11.23.16

7:30 – 10:30 ASRC

  • Wrote up notes from yesterday’s meetings with Don and Shimei.
  • Really just getting ready for T-day, but I ran my list of recipes through the TF-IDF and LMN tools and now I have a nice, sparse matrix that I can try the NMF on.
  • Finish matrix dot-product code and promote to Labeled2DMatrix – done!!
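A labeled dot product, sketched in Python with a dict of dicts standing in for Labeled2DMatrix (whose actual API isn’t shown here). It multiplies P (rows × k) by Q (k × columns) and carries the labels through; fed the P and Q printed on 11.22, it reproduces the newMat values there:

```python
def labeled_product(row_labels, P, col_labels, Qt):
    """Multiply P (rows x k) by Qt (k x cols), keeping the row and column labels."""
    k = len(Qt)
    if any(len(row) != k for row in P) or any(len(row) != len(col_labels) for row in Qt):
        raise ValueError("matrix dimensions do not match")
    return {
        r: {c: sum(P[i][f] * Qt[f][j] for f in range(k)) for j, c in enumerate(col_labels)}
        for i, r in enumerate(row_labels)
    }


# P and Q as printed by the 11.22 Java run (Q is already k x cols there)
P = [[0.1714659334, 2.4334642215], [0.2222526463, 1.8424266034],
     [1.8809519431, 0.3877676639], [1.5002592207, 0.3319796716],
     [1.398228183, 1.5413729554]]
Qt = [[0.1642944844, 0.083284122, 1.152720993, 2.6155442597],
      [2.0998133805, 1.0434120295, 2.0884233062, 0.228777745]]
new_mat = labeled_product(["U1", "U2", "U3", "U4", "U5"], P, ["D1", "D2", "D3", "D4"], Qt)
print(round(new_mat["U1"]["D1"], 2))  # 5.14, matching newMat
```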

Phil 11.22.16

7:00 – 5:00 ASRC

  • Worked on getting the spreadsheet of conferences, journals and grants started
  • Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
    • When δ increases, the communities become smaller but more densely connected.
    • It should be very interesting to look at belief velocity at different scales.
  • A Plethora of Data Set Repositories
  • More NMF. Getting closer
  • Installing Python on the laptop for discussion with Don
  • Got everything working in Java! Need to move the dot product code into Labeled2DMatrix and flesh out the other cases.
    rMat
     , D1, D2, D3, D4, 
    U1, 5, 3, 0, 1, 
    U2, 4, 0, 0, 1, 
    U3, 1, 1, 0, 5, 
    U4, 1, 0, 0, 4, 
    U5, 0, 1, 5, 4, 
    
    rowMat
    
    U1, 0.67, 0.89, 
    U2, 0.36, 0.47, 
    U3, 0.51, 0.27, 
    U4, 0.11, 0.84, 
    U5, 0.23, 0.88, 
    
    colMat
    
    D1, 0.36, 0.68, 
    D2, 0.84, 0.06, 
    D3, 0.07, 0.06, 
    D4, 0.65, 0.16, 
    
    steps = 5000
    
    P
    Array2DRowRealMatrix{{0.1714659334,2.4334642215},{0.2222526463,1.8424266034},{1.8809519431,0.3877676639},{1.5002592207,0.3319796716},{1.398228183,1.5413729554}}
    
    Q
    Array2DRowRealMatrix{{0.1642944844,0.083284122,1.152720993,2.6155442597},{2.0998133805,1.0434120295,2.0884233062,0.228777745}}
    
    rowMat
    
    U1, 0.17, 2.43, 
    U2, 0.22, 1.84, 
    U3, 1.88, 0.39, 
    U4, 1.5, 0.33, 
    U5, 1.4, 1.54, 
    
    colMat
    
    D1, 0.16, 2.1, 
    D2, 0.08, 1.04, 
    D3, 1.15, 2.09, 
    D4, 2.62, 0.23, 
    
    newMat
     , D1, D2, D3, D4, 
    U1, 5.14, 2.55, 5.28, 1.01, 
    U2, 3.91, 1.94, 4.1, 1, 
    U3, 1.12, 0.56, 2.98, 5.01, 
    U4, 0.94, 0.47, 2.42, 4, 
    U5, 3.47, 1.72, 4.83, 4.01,
  • Meeting with Don.
    • Looked through the modelling and UTOPIAN papers, and walked through some of the math. We’ll meet next Friday to try to convert some of the equations into Java code
  • Meeting with Shimei
    • There are ways of getting better stability with LDA. Still ok to do NMF, though there may be issues with scaling. That’s where a stable version of LDA might make sense.
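In the Java output above, the zero cells of rMat come back far from zero in newMat (U1/D3 is 5.28 against a 0), while the nonzero cells land close to their targets. That suggests the gradient descent only fits the nonzero entries and treats the zeros as missing. A minimal Python sketch of that kind of update, assuming that reading; only k = 2 and steps = 5000 come from the log, while alpha, beta and the seeding are my choices. Note that nothing in this sketch keeps P and Q nonnegative.

```python
import random


def factorize(R, k, steps=5000, alpha=0.0002, beta=0.02, seed=0):
    """Approximate R ~ P Q^T by gradient descent, fitting only the nonzero cells."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.random() for _ in range(k)] for _ in range(n)]
    Q = [[rng.random() for _ in range(k)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if R[i][j] != 0]
    for _ in range(steps):
        for i, j in observed:
            err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                # Step down the squared-error gradient, with an L2 penalty weighted by beta
                P[i][f] += alpha * (2 * err * q - beta * p)
                Q[j][f] += alpha * (2 * err * p - beta * q)
    return P, Q


def mean_abs_error(R, P, Q):
    """Average |R - P Q^T| over the nonzero cells of R."""
    cells = [(i, j) for i in range(len(R)) for j in range(len(R[0])) if R[i][j] != 0]
    return sum(abs(R[i][j] - sum(a * b for a, b in zip(P[i], Q[j]))) for i, j in cells) / len(cells)


R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]
P, Q = factorize(R, k=2)
print(mean_abs_error(R, P, Q))
```

Because the zero cells never enter the loss, whatever P·Qᵀ puts there is a prediction, not a fit, which is how the filled-in values in newMat arise under this reading.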

Phil 11.21.16

6:45 – 4:45 ASRC

  • Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
  • More NMF
    P = [[ 0.67503659  0.89795272]
     [ 0.36939303  0.47816356]
     [ 0.51019257  0.27772317]
     [ 0.1130504   0.84860109]
     [ 0.23238542  0.88222005]]
    
    Q = [[ 0.36692407  0.6844149 ]
     [ 0.84469693  0.06331073]
     [ 0.07366106  0.06603799]
     [ 0.65677669  0.16947152]]
    
    nP = [[ 0.16286496  2.42456084]
     [ 0.21647521  1.83981127]
     [ 1.9047257   0.39049035]
     [ 1.52103295  0.33509559]
     [ 1.41350212  1.51711067]]
    
    nQ = [[ 0.15875994  2.09665688]
     [ 0.08334172  1.04818927]
     [ 1.16320811  2.09280482]
     [ 2.56431807  0.24424636]]
    
    nQt = [[ 0.15875994  0.08334172  1.16320811  2.56431807]
     [ 2.09665688  1.04818927  2.09280482  0.24424636]]
    
    R = [[5 3 0 1]
     [4 0 0 1]
     [1 1 0 5]
     [1 0 0 4]
     [0 1 5 4]]
    
    nR = [[ 5.10932861  2.55497211  5.26357846  1.00982771]
     [ 3.89182055  1.94651185  4.10217161  1.00447849]
     [ 1.12111842  0.56805092  3.03281247  4.97969837]
     [ 0.94405957  0.4780091   2.47056752  3.98225815]
     [ 3.40526805  1.70802283  4.81921366  3.99521777]]
    • Hard-coded the random starting values for gradient descent so I can compare Python and Java
    • Stepping h
  • Sprint stuff?
    • Scrum
    • Sent Jeremy the svn file names for my Vistronix code
  • Fika
  • Meeting with Wayne? Basic catching up. Started the spreadsheet of conferences and grants
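Both runs appear to start from the same hard-coded values (the Java rowMat/colMat printed on 11.22 match P and Q here to two decimals), so one quick check is the largest element-wise difference between the Python nR and the Java newMat. A small sketch; it assumes both runs also used the same step count and learning rate, and since the Java values are only printed to two decimals, differences below about 0.005 mean nothing.

```python
def max_abs_diff(a, b):
    """Largest element-wise |a - b| between two same-shaped matrices, and where it is."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    worst, where = 0.0, (0, 0)
    for i, (ra, rb) in enumerate(zip(a, b)):
        for j, (x, y) in enumerate(zip(ra, rb)):
            if abs(x - y) > worst:
                worst, where = abs(x - y), (i, j)
    return worst, where


python_nR = [[5.10932861, 2.55497211, 5.26357846, 1.00982771],
             [3.89182055, 1.94651185, 4.10217161, 1.00447849],
             [1.12111842, 0.56805092, 3.03281247, 4.97969837],
             [0.94405957, 0.4780091, 2.47056752, 3.98225815],
             [3.40526805, 1.70802283, 4.81921366, 3.99521777]]
java_newMat = [[5.14, 2.55, 5.28, 1.01], [3.91, 1.94, 4.10, 1.00],
               [1.12, 0.56, 2.98, 5.01], [0.94, 0.47, 2.42, 4.00],
               [3.47, 1.72, 4.83, 4.01]]
print(max_abs_diff(python_nR, java_newMat))  # worst cell is row 4 (U5), column 0 (D1)
```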

Phil 11.17.16

7:00 – 10:00, 10:30 – 5:30 ASRC

Phil 11.16.16

7:00 – 4:00 ASRC

Phil 11.14.16

7:00 – 5:00 ASRC

Phil 11.11.16

8:00 – 12:00 – UMBC

  • Finished the IUI reviews
  • Doing Shimei’s review
  • Setting up meeting with Christelle Viauroux
  • Too frazzled to do coding. Reading Last Place on Earth.