Phil 12.14.16

7:00 – 6:00 ASRC

  • Continuing with Sociophysics
    • Social Phenomena on complex networks
    • Loops of nodes behave differently from trees. What to do about that? I think loops drive the echo chamber process? It is, after all, feedback.
    • There is also a ‘freezing’ issue: a stable state is reached in which two cliques holding different states are lightly connected, but not strongly enough for the neighbors in one clique to be convinced to change their opinion [Fig. 6.2, pg 135]
    • Residual Energy: The difference between the actual energy and the known energy of the perfectly-ordered ground state (full consensus).
  • BRC
    • Need to not split quoted columns
    • Generate a matrix where flags that are Yes -> 1 and empty/null -> 0
    • Retrospective
    • Had a thought that NMF might work in tensors as well. I need to rewrite the gradient descent so that it takes an arbitrary number of dimensions.
    • Meeting with Nir. Sold him on clustering.
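
A minimal sketch of the two BRC parsing notes above, assuming the export is comma-separated text with double-quoted fields. The sample rows and the column names (id, name, flag_a, flag_b) are hypothetical. Python's csv module keeps a quoted column in one piece, and each flag cell becomes 1 for Yes and 0 for empty or missing.

```python
import csv
import io

def flags_to_matrix(text, flag_columns):
    """Parse CSV text and return (ids, matrix) with Yes -> 1 and empty/null -> 0."""
    reader = csv.DictReader(io.StringIO(text))  # csv does not split quoted commas
    ids, matrix = [], []
    for row in reader:
        ids.append(row["id"])
        matrix.append([1 if (row.get(col) or "").strip().lower() == "yes" else 0
                       for col in flag_columns])
    return ids, matrix

# Hypothetical sample: the "name" column contains a quoted comma.
SAMPLE = 'id,name,flag_a,flag_b\n1,"Smith, J",Yes,\n2,"Doe, A",,Yes\n'
ids, matrix = flags_to_matrix(SAMPLE, ["flag_a", "flag_b"])
```

The same function could later read from a database cursor instead of text, since it only needs rows that behave like dictionaries.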

Phil 12.13.16

7:00 – 5:30 ASRC

  • Added a page for my model notes
  • Continuing with Sociophysics
  • Integrity meeting
    • Bellrock needs a summary of why no HCAHPS display. We could put something really simple that does not get rolled into the scoring of the impactors. Like claim numbers. Katy suggests a ‘ticker’ (Sparkline?) of claim volume/amounts
    • Need Gregg’s suggestion of what the ‘hot button’ indicators should be (action item), based on the claims data.
    • Small number of items that we can be tracking the actual values of (no calculation) that we can roll up and display.
    • NPPS as an input that gets added to stand in for self-reported flags??
    • Goal of the system is to decide whether the users should spend coordination time on expensive patients.
    • NDC – National Drug Code. Counts of drugs by claim period. Where does this come from? Counts of denied?
    • Claim period is monthly?
  • Having issues with getting lines read cleanly. For the time being, I’m going to throw away the bad lines, but later, I want to make persistent objects and get the data from postgres directly.
  • Tensor spectral clustering for partitioning higher-order network structures
  • Multilinear PageRank
    • Abstract: In this paper, we first extend the celebrated PageRank modification to a higher-order Markov chain. Although this system has attractive theoretical properties, it is computationally intractable for many interesting problems. We next study a computationally tractable approximation to the higher-order PageRank vector that involves a system of polynomial equations called multilinear PageRank. This is motivated by a novel “spacey random surfer” model, where the surfer remembers bits and pieces of history and is influenced by this information. The underlying stochastic process is an instance of a vertex-reinforced random walk. We develop convergence theory for a simple fixed-point method, a shifted fixed-point method, and a Newton iteration in a particular parameter regime. In marked contrast to the case of the PageRank vector of a Markov chain where the solution is always unique and easy to compute, there are parameter regimes of multilinear PageRank where solutions are not unique and simple algorithms do not converge. We provide a repository of these non-convergent cases that we encountered through exhaustive enumeration and randomly sampling that we believe is useful for future study of the problem.

Phil 12.12.16

7:00 – 3:00 ASRC

  • Register for Spring 2017
  • Continuing with Sociophysics
  • Integrity data
    • Pulling all empty columns, and columns that contain encrypted data. Stupid slow and error prone. Going to write a quick app. It should be able to store and output data about all the useful columns in the tables. I should also be able to incorporate Gregg’s data dictionary (And look for toLower(“need definition”))
    • Possibly output persistent queries for Java?
    • Sparsity score?
    • Column matches across tables?
    • Once master matrix is built, then use NMF to cluster columns for predictive capability
  • Sprint review
    • Ticket to fix my vpn access?
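
A rough sketch of the "Sparsity score?" idea from the Integrity data notes, assuming the score is simply the fraction of rows in which a column is null or blank. That definition, the 0.99 cutoff and the column names are assumptions for illustration, not something taken from the BRC tables.

```python
def column_sparsity(rows, columns):
    """Return {column: fraction of rows where the value is None or blank}."""
    total = len(rows)
    scores = {}
    for col in columns:
        empty = sum(1 for r in rows
                    if r.get(col) is None or str(r.get(col)).strip() == "")
        scores[col] = empty / total if total else 0.0
    return scores

def useful_columns(rows, columns, max_sparsity=0.99):
    """Keep the columns that are not (almost) entirely empty."""
    scores = column_sparsity(rows, columns)
    return [c for c in columns if scores[c] <= max_sparsity]

# Hypothetical rows: "fax" is always empty, "notes" is half empty.
ROWS = [{"npi": "1", "notes": "", "fax": None},
        {"npi": "2", "notes": "x", "fax": None}]
SCORES = column_sparsity(ROWS, ["npi", "notes", "fax"])
KEPT = useful_columns(ROWS, ["npi", "notes", "fax"])
```

Running this over every table would give the list of useful columns to store, and the per-column score could be written out next to the data dictionary entry.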

Phil 12.9.16

7:00 – 5:00 ASRC

  • Clickbait? How to Make an Amazing Tensorflow Chatbot Easily
  • Good article on information bubbles (Clinton and Trump). Visualizations clearly show bubble and star. Parallel Narratives
  • Continuing with Sociophysics
    • Opinion Formation
    • On [page 62], the authors discuss Phase transitions in a two-parameter model of opinion dynamics with random kinetic exchanges, which they say shows that agent behavior is unrealistic when there is only positive influence. This could support the Anti-belief element in my model.
    • On [page 66] the authors briefly discuss Opinion dynamics with confidence threshold: an alternative to the Axelrod model, where voters have continuous opinions. Clusters happen when the confidence threshold c satisfies 0 < c < 1, i.e. c lies in the open interval (0, 1).
    • An important point seems to be the number of opinions. For low numbers of discretized opinions and many agents, clustering happens. For the reverse, pretty much every agent has their own opinion. Runs at different numbers of opinions can show the thresholds at which these transitions happen (called precipitation?).
    • On [page 69] there is a brief mention of a model with a vector of opinions, which sounds a lot like my ‘belief’ being a set of statements. The title looks good too: Different topologies for a herding model of opinion (abstract below)
    • Back to NMF
      • Have sliders adjust scalar matrices
      • Chain the raw, scalar and scaled matrices together.
      • Make sure that the changed matrices are visible.
      • Add a load option that if there is only one matrix, that factorization can be run on that matrix, rather than having to read in the factored spreadsheets. Will need a (k) size select and a ‘calculate factors’ button
    • BRC
      • It’s impossible to get the VPN to work. Asked Stan for a better machine.
      • Matt is getting the full DB downloaded so I can work on it locally. Done
      • Can’t create the DB. Messages into Matt and Gregg.
        • Wound up doing the following to create the postgres db in pgAdminIII
          1. Created user postgres
          2. Created new db npi_raw as a UTF8 db, with postgres as owner.
          3. Created dev_ci and integrity_ci schemas with postgres as owner
          4. Right-click on the dev_ci schema and select restore. Navigate to the npi_raw.backup file and click ‘Restore’
          5. Right-click on the integrity_ci schema and select restore. Navigate to the integrity_ci folder and then restore each of the backup files.

Phil 12.8.16

7:00 – 4:00 ASRC

  • Continuing with Sociophysics
    • Found while reading about opinion dynamics modelling [page 56]: Heterogeneous bounds of confidence: meet, discuss and find consensus! – In this paper, heterogeneous bounds of confidence are studied. The surprising result is that a society of agents with two different bounds of confidence (open-minded and closed minded agents) can find consensus even when both bounds of confidence are significantly below the critical bound of confidence of a homogeneous society. I think that this may represent exploiters and explorers. Need to read.
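
To get a feel for the open-minded / closed-minded idea before reading the paper, here is a small sketch of a pairwise bounded-confidence update in which every agent has its own bound. The update rule (move a fraction mu toward the other agent when the gap is inside your own bound) is an assumption in the style of Deffuant-type models; the paper's exact model may differ, and all names and parameters here are hypothetical.

```python
import random

def step(opinions, bounds, mu=0.5, rng=random):
    """One random meeting: each agent moves toward the other only if the
    opinion gap is smaller than that agent's own bound of confidence."""
    i, j = rng.sample(range(len(opinions)), 2)
    xi, xj = opinions[i], opinions[j]
    gap = abs(xi - xj)
    if gap < bounds[i]:
        opinions[i] = xi + mu * (xj - xi)
    if gap < bounds[j]:
        opinions[j] = xj + mu * (xi - xj)

def run(n_open, n_closed, eps_open, eps_closed, steps=20000, seed=1):
    """Start from uniform random opinions in [0, 1) and apply `steps` meetings."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_open + n_closed)]
    bounds = [eps_open] * n_open + [eps_closed] * n_closed
    for _ in range(steps):
        step(opinions, bounds, rng=rng)
    return opinions
```

Calling run(25, 25, 0.3, 0.1) and looking at how many distinct opinion clusters survive, compared with a homogeneous run such as run(50, 0, 0.1, 0.1), would be one way to probe the consensus claim.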

  • Back to NMF
    • setting up sliders to manipulate row, column or cell for a matrix.
      • Changes to the row or col mat should result in a recalculation of the product matrix. Product matrix should just recalculate based on scalars
      • Got the event handlers set up, working on putting sliders in place
  • Getting the VPN set up so I can access the DB – no luck. Need to take it up with Heath tomorrow
  • 1:00 – 4:00 meeting with Aaron, Theresa, Katy, Greg & Jeremy

Phil 12.7.16

7:00 – 5:30 ASRC

  • Continuing with Sociophysics
  • Opinion Formation
  • NMF WMN
    • Got the product matrix calculated and showing
    • Added RowSum
    • Adding ColumnSum – done
    • Thinking about how to manipulate all this.
      • Adding scalar and scaled matrices. I still need to wire them up, but I’m going to work on modifying the weight matrix first.
      • Got the row, column and value of the selected cell. Next is to set up sliders to scale same on the selected matrix/cell.
  • Progress for today nmftestbed_12_7_16
  • More discussions on data modeling.
  • Meeting with Aaron and Katy.

Phil 12.6.16

7:00 – 4:00 ASRC

  • Getting a server
    • Campus cluster (underutilized) Free Student access (Damian Doyle) HPCF.umbc.edu. Write a note for Wayne to send. Done
  • Note to Wayne on getting a Google grant. Done.
  • Using Java Persistence API (JPA) with Cloud SQL
  • Google datasets
  • Continuing with Sociophysics – nope, maybe this afternoon.
  • Working on incorporating the factor matrices into the NmfModelGui. Looks like I use Maps as discussed at the bottom of this page. Building a L2DMat2Table class that should take care of doing this for each matrix.
  • Need to be able to output a Labled2DMatrix as a map. Working on that. Done. Oh, I need column maps. No, not really, just a complete header list.
  • And it’s working! Here’s the two factored matrices: factoredtables
  • Worked with Aaron on the data document. The tables are poorly put together and let’s just say, not self-documenting.

Phil 12.5.16

7:00 – 5:00 ASRC

  • Here’s Where Donald Trump Gets His News

    • It would be interesting to do a crawl on those sources (weighted by the amount visited?) and do a word model analysis.
  • Continuing with Sociophysics
  • Fixed the header problem in Labled2dMatrix.fromExcelSheet()
  • Here’s the term extraction using NMF with a k = 2:
    chapter-3-the-spouter-inn, chapter-1-loomings, chapter-2-the-carpet-bag
    harpooneer
    water
    landlord
    about
    light
    stand
    thousand
    other
    night
    passenger
    
    chapter-2-the-carpet-bag, chapter-4-the-counterpane, chapter-1-loomings
    queequeg
    landlord
    harpooneer
    sailor
    money
    nantucket
    passenger
    sight
    where
    dream
  • And here are the document clusters. Just looking at this, I see that each term-triple (the top three from the above list) could easily cluster into three groups each:
    queequeg-landlord-harpooneer
    chapter-2-the-carpet-bag 4.158255493
    chapter-4-the-counterpane 2.889135473
    chapter-1-loomings 2.370002854
    chapter-3-the-spouter-inn 0.651989511
    
    harpooneer-water-landlord
    chapter-3-the-spouter-inn 7.111998945
    chapter-1-loomings 1.72918472
    chapter-2-the-carpet-bag 1.157125575
    chapter-4-the-counterpane 0.000284084
  • Working on incorporating the factor matrices into the NmfModelGui. Looks like I use Maps as discussed at the bottom of this page. Building a L2DMat2Table class that should take care of doing this for each matrix.
  • Some thoughts about Jobs, work and the Amish. Are they economically competitive? Value-add? Why does this work?
  • Fika talk Dr. Quincey Brown
    • Educational Games – Microsoft Imagine Cup
    • Mobile intelligent tutoring systems. Math on iPads
    • Children using touch and gesture. Different than adult usage?
    • Broadening participation in computing, Grace Hopper, etc.
    • Science and Technology fellowships at the AAAS (during healthcare.gov launch)
    • White House nation of makers
    • No real effort to cultivate press relationships and new media ways to get the word out. Leverage YouTube?
  • Meeting with Wayne
    • Latent social hacking
  • Getting a server
    • Contact Ron for an abandoned Blade
    • Campus cluster (underutilized) Free Student access (Damian Doyle) HPCF.umbc.edu. Write a note for Wayne to send.
    • UMBC agreement with AWS
  • Note to Wayne on getting a Google grant

Phil 12.2.16

7:00 – 6:00 ASRC

  • Wrote up notes from yesterday’s meeting with Aaron
  • Google had lined this up for me: A Study of Factuality, Objectivity and Relevance: Three Desiderata in Large-Scale Information Retrieval?
  • Back to Sociophysics
    • I appear to be working with (maybe?) class ‘C’ social networks, where links connect people indirectly [pg 19]. Covered in chapter 7 – Of Flocks, Flows and Transports
  • Starting on NMF LMN tool. I need some interactivity to see how manipulation works
    • Copied the guts over from LMN. I think the best way to do this first is to show the two factored mats as tables and then show the result as text. Then scale rows, cols or cells
    • Changing Labled2DMatrix to handle reading from a given sheet name/number – done
  • Meeting with Don to work through Opinion Dynamics paper math.
    • Word for today – Quotient Set: the partition of a set into the subsets a rule induces, here the elements the rule accepts and the elements it rejects. The rule isDigit() applied to the set {1,w,3,r,5,h} would produce {{1, 3, 5}, {w, r, h}}
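
The isDigit() example can be written out directly. This sketch treats two elements as equivalent when the rule gives the same answer for both, and collects each group of equivalent elements into one class; the function name quotient_set is made up for illustration.

```python
def quotient_set(items, rule):
    """Group items into classes; two items share a class when rule() agrees on them."""
    classes = {}
    for item in items:
        classes.setdefault(rule(item), []).append(item)
    return list(classes.values())

# The set from the note, as one-character strings.
parts = quotient_set(["1", "w", "3", "r", "5", "h"], str.isdigit)
```

A rule with more than two possible answers (for example, string length) would produce more than two classes with the same code.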

Phil 12.1.16

7:00 – 6:00 ASRC

  • Back to Sociophysics
  • More NMF
    • Run through the row/col mats and get the top N items as topic or document clusters. Look for big jumps? Cluster using DBSCAN?
    • Got the factor matrices’ columns labeled by setting all the columns but one in the row/column weight mat to zero. The recreated matrix can then be sorted by row or column. The thought is that the highest values are the items that are most sensitive. It’s hard to get a feel though. I think I have to build an interactive app that I can watch the effects. Intuitively, since we are building a matrix from the dot product of the rows in the two factor mats, the effects should be matrix wide.
    • I’m also saving the wrong items out from the corpus manager. I need to save the matrix factors, not the recreated matrix. I think one document with two spreadsheets would be a nice way to store. Done. Included the reconstructed matrix since it’s (a) needed to produce the column names and (b) stochastically produced, so it’s unique and tightly coupled to the factor matrices.
    • Results for today:
      rMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5, 3, 0, 1, 
      Doc2, 4, 0, 0, 1, 
      Doc3, 1, 1, 0, 5, 
      Doc4, 1, 0, 0, 4, 
      Doc5, 0, 1, 5, 4, 
      
      newMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5.03, 2.91, 4.39, 0.99, 
      Doc2, 3.97, 2.3, 3.64, 0.99, 
      Doc3, 1.09, 0.77, 5.03, 4.99, 
      Doc4, 0.96, 0.67, 4.08, 3.99, 
      Doc5, 2.05, 1.3, 4.9, 4.05, 
      average difference = 0.0757473875544334
      sorted columns {Trm3=22.05575184668782, Trm4=15.022558101226334, Trm1=13.097328480010614, Trm2=7.951735152321482}
      sorted rows {Doc1=13.33704116430316, Doc5=12.301738392587723, Doc3=11.885670818212631, Doc2=10.911189185303712, Doc4=9.691734019839023}
      
      rowMat
       , Trm1-Trm3, Trm4-Trm3, 
      Doc1, 2.33, 0.3, 
      Doc2, 1.83, 0.33, 
      Doc3, 0.4, 2.04, 
      Doc4, 0.36, 1.63, 
      Doc5, 0.87, 1.63, 
      
      colMat
       , Doc1-Doc2, Doc3-Doc5, 
      Trm1, 2.15, 0.11, 
      Trm2, 1.23, 0.14, 
      Trm3, 1.61, 2.15, 
      Trm4, 0.11, 2.43,
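
The relationship between the tables above can be checked by hand: each cell of newMat is the dot product of a rowMat row and a colMat row. This sketch uses the rounded factor values printed above, so it matches newMat only to about two decimals. The keep argument is a hypothetical helper that zeroes every latent column except one, which is the manipulation described in the notes.

```python
# Rounded factors copied from the rowMat and colMat tables above.
ROW_MAT = [[2.33, 0.30], [1.83, 0.33], [0.40, 2.04], [0.36, 1.63], [0.87, 1.63]]  # Doc1..Doc5
COL_MAT = [[2.15, 0.11], [1.23, 0.14], [1.61, 2.15], [0.11, 2.43]]                # Trm1..Trm4

def reconstruct(row_mat, col_mat, keep=None):
    """Dot each document row with each term row. If keep is given, only that
    latent column contributes, as if all other columns were set to zero."""
    k = len(row_mat[0])
    cols = range(k) if keep is None else [keep]
    return [[sum(r[c] * t[c] for c in cols) for t in col_mat] for r in row_mat]

full = reconstruct(ROW_MAT, COL_MAT)            # approximately newMat
topic0 = reconstruct(ROW_MAT, COL_MAT, keep=0)  # contribution of latent column 0 only
```

Because every cell is a sum over latent columns, changing one factor value shifts a whole row or column of the product, which fits the intuition that the effects should be matrix wide.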
  • More BRC? At least verify what my story is.
  • Axios? We are a new media company delivering vital, trustworthy news and analysis in the most efficient, illuminating and shareable ways possible. We offer a mix of original and smartly narrated coverage of media trends, tech, business and politics with expertise, voice AND smart brevity — on a new and innovative mobile platform. At Axios — the Greek word for worthy — we provide only content worthy of people’s time, attention and trust.
  • Dr. Phyllis Schneck  – 3:30pm Thursday, 1 December 2016, UC 310, UMBC
    • Indicators of attack
      • IP address
      • Domain name
      • attachment
      • clusters
      • No PII
    • Reputation system? What kind of feeds do you need? Looking for very high accuracy. Multiple streams for improved statistical power?
    • Got me thinking about a ‘legal-targeted Stuxnet’. Imagine something that was set into legal databases (Legislation, regulation and case law) that simply changed some small percentage of ‘shalls’ to ‘wills’. That could be pretty damaging over the long run. Something smarter could be more subtle and directed, just like the Natanz attack. Wound up stopping by Aaron M’s office and chatted for about an hour about this. Also some potential research discussions before Dec 21.

Phil 11.30.16

7:00 – 3:30 ASRC

  • Wrote up my notes from chat with Shimei. I think the first step is to look through the UTOPIAN paper again and see how (if?) summary and coclustering is being handled.
    • Downloaded her suggested papers
    • It looks like the row and column matrices might be useful and manipulable. Digging into the NMF java class for some more manipulation
    • Added raw, weight and scaled matrices
    • Need to add ranked row, column and cell output for L2DMat – done. Here’s some data and thoughts:
      rMat
       , D1, D2, D3, D4, 
      U1, 5, 3, 0, 1, 
      U2, 4, 0, 0, 1, 
      U3, 1, 1, 0, 5, 
      U4, 1, 0, 0, 4, 
      U5, 0, 1, 5, 4, 
      
      newMat
       , D1, D2, D3, D4, 
      U1, 5.05, 2.87, 5.26, 1, 
      U2, 3.96, 2.25, 4.27, 1, 
      U3, 1.11, 0.71, 4.4, 4.99, 
      U4, 0.94, 0.6, 3.57, 3.99, 
      U5, 2.35, 1.39, 4.87, 4.05, 
      average difference = 0.09750770110043207
      sorted columns {D3=22.36862672329615, D4=15.038484762558607, D1=13.410342394629499, D2=7.815842574518472}
      sorted rows {U1=14.17790755369198, U5=12.657839100920228, U2=11.485548694067901, U3=11.209516468182759, U4=9.102484638139858}
      
      Manipulating row weights by column
      
      newMat weight col 0 set to 1.0
       , D1, D2, D3, D4, 
      U1, 4.9, 2.76, 4.44, 0, 
      U2, 3.81, 2.15, 3.45, 0, 
      U3, 0.35, 0.2, 0.32, 0, 
      U4, 0.34, 0.19, 0.31, 0, 
      U5, 1.73, 0.98, 1.57, 0, 
      sorted columns {D1=11.121458227331996, D3=10.081893895718448, D2=6.276360972184673, D4=0.0}
      sorted rows {U1=12.101008726739368, U2=9.406070587569038, U5=4.271932099958697, U3=0.869188591004756, U4=0.8315130899632566}
      
      newMat weight col 1 set to 1.0
       , D1, D2, D3, D4, 
      U1, 0.15, 0.1, 0.82, 1, 
      U2, 0.15, 0.1, 0.82, 1, 
      U3, 0.76, 0.51, 4.08, 4.99, 
      U4, 0.61, 0.41, 3.26, 3.99, 
      U5, 0.62, 0.41, 3.31, 4.05, 
      sorted columns {D4=15.038484762558607, D3=12.286732827577703, D1=2.2888841672975038, D2=1.539481602333799}
      sorted rows {U3=10.340327877178003, U5=8.38590700096153, U4=8.2709715481766, U2=2.079478106498862, U1=2.076898826952612}
    • According to Choo, the columns in the factor mats are the latent topics. That means, for example, when all the document columns are zeroed out but one, the high-ranked terms are the topics for that document (And LSI will extract those terms???). And when all the term columns are zeroed out but one, the documents are sorted relevant to that term. Big gaps mean clusters, or maybe just the cluster is up to the first gap???
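
One way to test the "big gaps mean clusters" question is to cut the sorted list at its largest drop. This is only one reading of the idea (largest gap rather than first gap), and the function name is made up. The scores are the sorted row sums from the two runs above, rounded to two decimals.

```python
def cluster_to_biggest_gap(scores):
    """Sort items by score (high to low), find the largest drop between
    neighbors, and return the names that sit above that drop."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))
    return [name for name, _ in ranked[:cut + 1]]

# Sorted rows with weight col 0 set to 1.0 (rounded).
ROWS_COL0 = {"U1": 12.10, "U2": 9.41, "U5": 4.27, "U3": 0.87, "U4": 0.83}
# Sorted rows with weight col 1 set to 1.0 (rounded).
ROWS_COL1 = {"U3": 10.34, "U5": 8.39, "U4": 8.27, "U2": 2.08, "U1": 2.08}

group0 = cluster_to_biggest_gap(ROWS_COL0)
group1 = cluster_to_biggest_gap(ROWS_COL1)
```

On this data the two latent columns split the rows into U1, U2 and U3, U5, U4, which lines up with the block structure visible in rMat. Something like DBSCAN would be the heavier alternative if a single cut turns out to be too crude.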
  • Add this one to the list? Characteristics to look for? Hate Spin: The Twin Political Strategies of Religious Incitement and Offense-Taking
  • Deep Learning MIT book (pdf)
  • Back to Sociophysics.
    • To build a scale-free network, A.-L. Barabási and R. Albert, in Emergence of scaling in random networks, start with a small random network and incrementally add nodes, where the probability of connecting a new node to an existing node is proportional to how many connections that existing node already has.
      network.createInitialNodes(SOME_SMALL_VALUE)
      for(i = 0 to desired)
      	n = createNewNode()
      	totalLinks = countAllLinks()
      	for(j = 0 to network.numNodes)
      		curNode = getNode(j)
      		links = curNode.getLinks
      		probability = links/totalLinks
      		curNode.addNeighbor(n, probability)
      	network.addNode(n)
    • Does node aging matter in this model?
    • Null Models For Social Networks (for comparison and testing)
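
A runnable sketch of the preferential-attachment pseudocode above. Two things here are assumptions rather than part of the notes: the seed network is a small fully connected graph (so no node starts with zero links), and each new node adds exactly m links. Sampling uniformly from a list of link endpoints gives the degree-proportional probability without computing links/totalLinks explicitly.

```python
import random

def barabasi_albert(n_nodes, m=2, seed=None):
    """Grow a graph to n_nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Seed network: m + 1 fully connected nodes, so every degree is nonzero.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears here once per link end, so a uniform draw from
    # this list is a degree-proportional draw over nodes.
    ends = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in sorted(targets):
            edges.append((new, t))
            ends.extend((new, t))
    return edges

def degrees(edges):
    """Count links per node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg
```

Node aging could be explored in this sketch by dropping old entries from the ends list, which would lower the chance of attaching to older nodes.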
  • Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources <- One of the most popular articles from 2016 via Altmetric
  • Skype messenger meeting with Aaron and Katy going over the data we have

Phil 11.29.16

7:00 – 5:30 ASRC

  • How Le Monde is taking on fake news
  • Thinking about Jonathan Albright‘s work. How is it crawled? Is it really just inbound links? Can I get the data? I need to ask.
  • Back to Sociophysics.
    • Clustering coefficient (video)
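      CC = 0
      numNodes = 0
      for(i = 0 to max)
      	for(j = 0 to max)
      		n = node(i,j)
      		k = n.numNeighbors()
      		a = n.numLinksBetweenNeighbors()
      		// fewer than two neighbors means no neighbor pairs, so avoid dividing by zero
      		if(k < 2) n.setNodeCC(0)
      		else n.setNodeCC((2*a)/(k*(k-1)))
      		CC += n.getNodeCC()
      		numNodes++
      CC = CC/numNodes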
    • Clustering coefficient ordering: random -> small world -> regular
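
The clustering coefficient pseudocode above, as a small runnable sketch. It assumes an undirected graph stored as a dictionary from node to neighbor set (a hypothetical representation, not the grid lookup used in the pseudocode) and counts a node with fewer than two neighbors as 0.

```python
def clustering_coefficient(adj):
    """adj maps node -> set of neighbors (undirected). Returns the mean of the
    per-node values 2a / (k (k - 1)); nodes with k < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        # a = number of links that exist between pairs of this node's neighbors
        a = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        total += (2 * a) / (k * (k - 1))
    return total / len(adj) if adj else 0.0

TRIANGLE = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
STAR = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
TRIANGLE_WITH_TAIL = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

A triangle scores 1 and a star scores 0, which are the two extremes of the per-node value.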
  • Got the NMF built into CorpusManager. Here’s the first four chapters of Moby Dick as:
    • BOW: there think harpooneer about little landlord sleep could would
    • TF-IDF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • NMF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
    • BOW/centrality: there think would queequeg could about whale little first
    • TF-IDF/centrality : about harpooneer night landlord stand light nantucket where other
    • NMF/centrality : harpooneer queequeg landlord water nantucket circumstance sailor passenger about
    • (centrality with equalized docs)
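
A toy sketch of why the BOW list above is dominated by words like "there" while the TF-IDF list surfaces distinctive terms. It assumes the plain textbook weighting, tf = count / document length and idf = ln(N / number of documents containing the term); CorpusManager may weight differently, and the two tiny documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps name -> list of tokens. Returns name -> {term: tf * idf}."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in counts.items()}
    return scores

def top_terms(weights, n=3):
    """Highest-weighted terms first."""
    return sorted(weights, key=weights.get, reverse=True)[:n]

# Made-up token lists; "there" is the most frequent word in both.
DOCS = {"ch1": "there whale there sea".split(),
        "ch2": "there landlord harpooneer there".split()}
WEIGHTS = tf_idf(DOCS)
```

A term that appears in every document gets idf = ln(1) = 0, so the most common BOW word drops to the bottom of the TF-IDF ranking.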
  • Meeting with Shimei

Phil 11.28.16

7:00 – 5:00 ASRC

  • Stumbled upon the ACM Transactions on Interactive Intelligent Systems (TIIS). They have two interesting upcoming issues:
  • Jonathan Albright came up on my Twitter feed. He’s doing interesting data journalism. Here are his thoughts on fake news. It’s really odd that he hasn’t published any peer-reviewed work. Is this because he’s at a teaching university?
  • Looking through Sociophysics, and finding some interesting references.
    • Minority Opinion Spreading in Random Geometry
      • Abstract: The dynamics of spreading of the minority opinion in public debates (a reform proposal, a behavior change, a military retaliation) is studied using a diffusion reaction model. People move by discrete step on a landscape of random geometry shaped by social life (offices, houses, bars, and restaurants). A perfect world is considered with no advantage to the minority. A one person-one argument principle is applied to determine locally individual mind changes. In case of equality, a collective doubt is evoked which in turn favors the Status Quo. Starting from a large in favor of the proposal initial majority, repeated random size local discussions are found to drive the majority reversal along the minority hostile view. Total opinion refusal is completed within few days. Recent national collective issues are revisited. The model may apply to rumor and fear propagation.
  • Updating intellij and waiting for 497MB to download
  • Continue to generalize NMF. Get k tested and implicit in the matrix passing. Start NMF class as part of JavaUtils. Done
  • Start to integrate NMF into CorpusManager. Initially, I’m just going to use it to produce the matrix, like TF-IDF.
    • Computing, now I need to sort and trim
  • Fika with Aaron on writing. Need to ask for his slide deck.
  • Meeting with Wayne, mostly catching up. What book should I give him? The most tabbed are Sciences of the Artificial, Last Place on Earth, and Social Science.
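
For reference while generalizing NMF, here is a compact sketch of a factorization with k as an explicit parameter. It is not the JavaUtils class: it assumes gradient descent on the nonzero cells only (zeros treated as missing), a small regularization term, and clipping at zero to keep the factors non-negative. The learning rate, step count and seed are arbitrary choices, and the 5 x 4 matrix is the small test matrix used elsewhere in these notes.

```python
import random

R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]

def factorize(r, k=2, steps=5000, alpha=0.002, beta=0.02, seed=0):
    """Projected gradient descent on the nonzero cells of r.
    Returns (row_mat, col_mat) with r ~= row_mat x col_mat^T on those cells."""
    rng = random.Random(seed)
    p = [[rng.random() for _ in range(k)] for _ in r]      # one row per matrix row
    q = [[rng.random() for _ in range(k)] for _ in r[0]]   # one row per matrix column
    for _ in range(steps):
        for i, row in enumerate(r):
            for j, val in enumerate(row):
                if val == 0:
                    continue  # zeros are treated as missing, not as targets
                err = val - sum(p[i][c] * q[j][c] for c in range(k))
                for c in range(k):
                    pic, qjc = p[i][c], q[j][c]
                    p[i][c] = max(0.0, pic + alpha * (2 * err * qjc - beta * pic))
                    q[j][c] = max(0.0, qjc + alpha * (2 * err * pic - beta * qjc))
    return p, q

def mean_abs_error(r, p, q):
    """Average |r - p.q| over the nonzero cells of r."""
    cells = [(i, j) for i, row in enumerate(r) for j, v in enumerate(row) if v]
    return sum(abs(r[i][j] - sum(a * b for a, b in zip(p[i], q[j])))
               for i, j in cells) / len(cells)
```

Because the starting factors are random, different seeds give different factor matrices for the same input, so the factors and the reconstructed matrix should be stored together.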