Monthly Archives: November 2016

Phil 11.30.16

7:00 – 3:30 ASRC

Wrote up my notes from the chat with Shimei. I think the first step is to look through the UTOPIAN paper again and see how (if?) summary and co-clustering are being handled.

Downloaded her suggested papers
It looks like the row and column matrices might be useful and manipulable. Digging into the NMF Java class for some more manipulation
Added raw, weight and scaled matrices

Need to add ranked row, column and cell output for L2DMat – done. Here’s some data and thoughts:

rMat
 , D1, D2, D3, D4, 
U1, 5, 3, 0, 1, 
U2, 4, 0, 0, 1, 
U3, 1, 1, 0, 5, 
U4, 1, 0, 0, 4, 
U5, 0, 1, 5, 4, 

newMat
 , D1, D2, D3, D4, 
U1, 5.05, 2.87, 5.26, 1, 
U2, 3.96, 2.25, 4.27, 1, 
U3, 1.11, 0.71, 4.4, 4.99, 
U4, 0.94, 0.6, 3.57, 3.99, 
U5, 2.35, 1.39, 4.87, 4.05, 
average difference = 0.09750770110043207
sorted columns {D3=22.36862672329615, D4=15.038484762558607, D1=13.410342394629499, D2=7.815842574518472}
sorted rows {U1=14.17790755369198, U5=12.657839100920228, U2=11.485548694067901, U3=11.209516468182759, U4=9.102484638139858}

Manipulating row weights by column

newMat weight col 0 set to 1.0
 , D1, D2, D3, D4, 
U1, 4.9, 2.76, 4.44, 0, 
U2, 3.81, 2.15, 3.45, 0, 
U3, 0.35, 0.2, 0.32, 0, 
U4, 0.34, 0.19, 0.31, 0, 
U5, 1.73, 0.98, 1.57, 0, 
sorted columns {D1=11.121458227331996, D3=10.081893895718448, D2=6.276360972184673, D4=0.0}
sorted rows {U1=12.101008726739368, U2=9.406070587569038, U5=4.271932099958697, U3=0.869188591004756, U4=0.8315130899632566}

newMat weight col 1 set to 1.0
 , D1, D2, D3, D4, 
U1, 0.15, 0.1, 0.82, 1, 
U2, 0.15, 0.1, 0.82, 1, 
U3, 0.76, 0.51, 4.08, 4.99, 
U4, 0.61, 0.41, 3.26, 3.99, 
U5, 0.62, 0.41, 3.31, 4.05, 
sorted columns {D4=15.038484762558607, D3=12.286732827577703, D1=2.2888841672975038, D2=1.539481602333799}
sorted rows {U3=10.340327877178003, U5=8.38590700096153, U4=8.2709715481766, U2=2.079478106498862, U1=2.076898826952612}

According to Choo, the columns in the factor mats are the latent topics. That means, for example, when all the document columns are zeroed out but one, the high-ranked terms are the topics for that document (And LSI will extract those terms???). And when all the term columns are zeroed out but one, the documents are sorted relevant to that term. Big gaps mean clusters, or maybe just the cluster is up to the first gap???
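The column-weighting experiment above can be sketched in a few lines of Python. This is a minimal sketch, not the L2DMat code: it assumes the weight vector simply scales each latent column before the factors are multiplied back together, and that the "sorted" output is a ranking by row and column sums. The factor values are the rounded rowMat and colMat from the 11.22.16 entry, so the numbers will not match the tables above. `weighted_product`, `ranked` and `rank_for_topic` are hypothetical names.

```python
# Rounded factors from the 11.22.16 entry (one row per label, one column per latent topic).
ROW_LABELS = ["U1", "U2", "U3", "U4", "U5"]
COL_LABELS = ["D1", "D2", "D3", "D4"]
ROW_MAT = [[0.17, 2.43], [0.22, 1.84], [1.88, 0.39], [1.5, 0.33], [1.4, 1.54]]
COL_MAT = [[0.16, 2.1], [0.08, 1.04], [1.15, 2.09], [2.62, 0.23]]


def weighted_product(row_mat, col_mat, weights):
    """Rebuild the matrix, scaling latent column k by weights[k] (assumed behavior)."""
    return [
        [sum(w * r[k] * c[k] for k, w in enumerate(weights)) for c in col_mat]
        for r in row_mat
    ]


def ranked(labels, vectors):
    """Return (label, sum) pairs sorted from largest to smallest sum."""
    totals = [(label, sum(vec)) for label, vec in zip(labels, vectors)]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


def rank_for_topic(topic):
    """Zero every latent column except `topic`, then rank rows and columns."""
    weights = [1.0 if k == topic else 0.0 for k in range(len(ROW_MAT[0]))]
    mat = weighted_product(ROW_MAT, COL_MAT, weights)
    rows = ranked(ROW_LABELS, mat)
    cols = ranked(COL_LABELS, list(zip(*mat)))  # zip(*mat) transposes to columns
    return rows, cols


if __name__ == "__main__":
    for topic in range(2):
        rows, cols = rank_for_topic(topic)
        print("topic", topic, "rows", rows)
        print("topic", topic, "cols", cols)
```

With only one latent column kept, the row ranking is just the ordering of that column in the row factor, and likewise for the columns, which is why big gaps in the ranked sums look like cluster boundaries.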

Add this one to the list? Characteristics to look for? Hate Spin: The Twin Political Strategies of Religious Incitement and Offense-Taking
Deep Learning MIT book (pdf)
Back to Sociophysics.
- To build a scale-free network, A.-L. Barabási and R. Albert (in Emergence of scaling in random networks) start with a small random network and incrementally add nodes, where the probability of connecting a new node to an existing node is proportional to how many connections that existing node already has.
```
// Start from a small seed network
network.createInitialNodes(SOME_SMALL_VALUE)
for(i = 0 to desired)
	n = createNewNode()
	totalLinks = countAllLinks()
	for(j = 0 to network.numNodes)
		curNode = getNode(j)
		links = curNode.getLinks()
		// Preferential attachment: well-connected nodes are more likely to gain the new link
		probability = links/totalLinks
		curNode.addNeighbor(n, probability)
	network.addNode(n)
```
- Does node aging matter in this model?
- Null Models For Social Networks (for comparison and testing)
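The preferential-attachment pseudocode above can be turned into a small runnable Python sketch. This is my own minimal version under stated assumptions, not the construction from the paper: the seed network is a fully connected clique rather than a random one, each existing node gets one independent trial weighted by its share of all link ends, and a newcomer that wins no trial is forced to take one degree-weighted link. `build_scale_free`, `seed_size` and `rng_seed` are hypothetical names.

```python
import random


def build_scale_free(num_nodes, seed_size=3, rng_seed=None):
    """Grow a network by preferential attachment; returns {node: set(neighbors)}."""
    rng = random.Random(rng_seed)
    # Assumption: the seed network is a small fully connected clique, so every
    # starting node already has links to attract newcomers.
    links = {i: set(range(seed_size)) - {i} for i in range(seed_size)}
    for new_node in range(seed_size, num_nodes):
        existing = list(links)
        degrees = [len(links[node]) for node in existing]
        total = sum(degrees)
        # One independent trial per existing node, weighted by its share of link ends.
        chosen = [n for n, d in zip(existing, degrees) if rng.random() < d / total]
        if not chosen:
            # Assumption: force one degree-weighted link so no node is isolated.
            chosen = rng.choices(existing, weights=degrees, k=1)
        links[new_node] = set(chosen)
        for node in chosen:
            links[node].add(new_node)
    return links


if __name__ == "__main__":
    net = build_scale_free(1000, rng_seed=1)
    top = sorted((len(nbrs) for nbrs in net.values()), reverse=True)[:5]
    print("largest degrees:", top)
```

Node aging would slot in as an extra factor on the per-node probability, which makes this a convenient place to try it.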
Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources <- One of the most popular articles from 2016 via Altmetric
Skype messenger meeting with Aaron and Katy going over the data we have

Phil 11.29.16

7:00 – 5:30 ASRC

How Le Monde is taking on fake news
Thinking about Jonathan Albright’s work. How is it crawled? Is it really just inbound links? Can I get the data? I need to ask.

Back to Sociophysics.

Clustering coefficient (video)

CC = 0
numNodes = 0
for(i = 0 to max)
	for(j = 0 to max)
		n = node(i,j)
		k = n.numNeighbors()
		a = n.numLinksBetweenNeighbors()
		n.setNodeCC(k > 1 ? (2*a)/(k*(k-1)) : 0) // fewer than two neighbors: avoid dividing by zero
		CC += n.getNodeCC()
		numNodes++
CC = CC/numNodes

Clustering coefficient ordering: random -> small world -> regular
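The clustering coefficient loop above translates fairly directly into Python. This is a minimal sketch that assumes the graph is stored as a dict mapping each node to a set of neighbors (my representation, not the one in the video), and that nodes with fewer than two neighbors count as 0. `local_cc`, `average_cc` and `ring_lattice` are hypothetical names.

```python
from itertools import combinations


def local_cc(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumption: undefined cases count as zero
    a = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return (2 * a) / (k * (k - 1))


def average_cc(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_cc(graph, n) for n in graph) / len(graph)


def ring_lattice(n, reach=2):
    """Regular ring where each node links to `reach` neighbors on each side."""
    return {
        i: {(i + d) % n for d in range(-reach, reach + 1) if d != 0}
        for i in range(n)
    }


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 hanging off node 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(average_cc(graph))
    print(average_cc(ring_lattice(10)))
```

The ring lattice is the "regular" end of the ordering: with two neighbors on each side, three of each node's six neighbor pairs are linked.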

Got the NMF built into CorpusManager. Here’s the first four chapters of Moby Dick as:
- BOW: there think harpooneer about little landlord sleep could would
- TF-IDF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
- NMF: nantucket harpooneer queequeg landlord euroclydon bedford lazarus passenger circumstance
- BOW/centrality: there think would queequeg could about whale little first
- TF-IDF/centrality : about harpooneer night landlord stand light nantucket where other
- NMF/centrality : harpooneer queequeg landlord water nantucket circumstance sailor passenger about
- (centrality with equalized docs)
Meeting with Shimei

Phil 11.28.16

7:00 – 5:00 ASRC

Stumbled upon the ACM Transactions on Interactive Intelligent Systems (TIIS). They have two interesting upcoming issues:
- Trust and Influence in Intelligent Human-Machine Interaction
- Human-Centered Machine Learning
- Added to spreadsheet
Jonathan Albright came up on my Twitter feed. He’s doing interesting data journalism. Here are his thoughts on fake news. It’s really odd that he hasn’t published anything peer reviewed. Is this because he’s at a teaching university?
Looking through Sociophysics, and finding some interesting references.
- Minority Opinion Spreading in Random Geometry
  - Abstract: The dynamics of spreading of the minority opinion in public debates (a reform proposal, a behavior change, a military retaliation) is studied using a diffusion reaction model. People move by discrete step on a landscape of random geometry shaped by social life (offices, houses, bars, and restaurants). A perfect world is considered with no advantage to the minority. A one person-one argument principle is applied to determine locally individual mind changes. In case of equality, a collective doubt is evoked which in turn favors the Status Quo. Starting from a large in favor of the proposal initial majority, repeated random size local discussions are found to drive the majority reversal along the minority hostile view. Total opinion refusal is completed within few days. Recent national collective issues are revisited. The model may apply to rumor and fear propagation.
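The local-discussion mechanism in the abstract can be sketched as a toy simulation. This is only my reading of the abstract, not the paper's model: there is no random geometry or movement here, just a shuffle and a split into random-size groups each round. Group sizes of 1 to 6, the population size and the starting support are all assumptions. `discuss` and `simulate` are hypothetical names.

```python
import random


def discuss(group):
    """Local majority rule; a tie resolves against the proposal (status quo)."""
    in_favor = sum(group)
    verdict = 1 if in_favor * 2 > len(group) else 0
    return [verdict] * len(group)


def simulate(population=1000, initial_support=0.6, rounds=10, max_group=6, seed=0):
    """Return the fraction in favor after each round of local discussions."""
    rng = random.Random(seed)
    supporters = round(population * initial_support)
    opinions = [1] * supporters + [0] * (population - supporters)
    history = [sum(opinions) / population]
    for _ in range(rounds):
        rng.shuffle(opinions)
        updated = []
        start = 0
        while start < population:
            size = rng.randint(1, max_group)  # assumption: group sizes 1..max_group
            updated.extend(discuss(opinions[start:start + size]))
            start += size
        opinions = updated
        history.append(sum(opinions) / population)
    return history


if __name__ == "__main__":
    for step, support in enumerate(simulate()):
        print(step, round(support, 3))
```

The only asymmetry is the tie rule in `discuss`, so any drift away from the initial majority in a run comes from even-sized groups.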
Updating IntelliJ and waiting for 497MB to download
Continue to generalize NMF. Get k tested and implicit in the matrix passing. Start NMF class as part of JavaUtils. Done
Start to integrate NMF into CorpusManager. Initially, I’m just going to use it to produce the matrix, like TF-IDF.
- Computing, now I need to sort and trim
Fika with Aaron on writing. Need to ask for his slide deck.
Meeting with Wayne, mostly catching up. What book should I give him? The most tabbed are Sciences of the Artificial, Last Place on Earth, and Social Science.

Phil 11.24.16

8:00 – 10:00 ASRC

Finished Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
Orgnet – network analytics by Valdis Krebs Website and blog. Lots of stuff here.

Phil 11.23.16

7:30 – 10:30 ASRC

Wrote up notes from yesterday’s meetings with Don and Shimei.
Really just getting ready for T-day, but I ran my list of recipes through the TF-IDF and LMN tools and now I have a nice, sparse matrix that I can try the NMF on.
Finish Matrix dot-product code and promote to Labeled2DMatrix – done!!

Phil 11.22.16

7:00 – 5:00 ASRC

Worked on getting the spreadsheet of conferences, journals and grants started
Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
- When δ increases, the communities become smaller but more densely connected.
- It should be very interesting to look at belief velocity at different scales.
A Plethora of Data Set Repositories
More NMF. Getting closer
Installing Python on the laptop for discussion with Don

Got everything working in java! Need to move the dot product code into Labeled2DMatrix and flesh out the other cases.

rMat
 , D1, D2, D3, D4, 
U1, 5, 3, 0, 1, 
U2, 4, 0, 0, 1, 
U3, 1, 1, 0, 5, 
U4, 1, 0, 0, 4, 
U5, 0, 1, 5, 4, 

rowMat

U1, 0.67, 0.89, 
U2, 0.36, 0.47, 
U3, 0.51, 0.27, 
U4, 0.11, 0.84, 
U5, 0.23, 0.88, 

colMat

D1, 0.36, 0.68, 
D2, 0.84, 0.06, 
D3, 0.07, 0.06, 
D4, 0.65, 0.16, 

steps = 5000

P
Array2DRowRealMatrix{{0.1714659334,2.4334642215},{0.2222526463,1.8424266034},{1.8809519431,0.3877676639},{1.5002592207,0.3319796716},{1.398228183,1.5413729554}}

Q
Array2DRowRealMatrix{{0.1642944844,0.083284122,1.152720993,2.6155442597},{2.0998133805,1.0434120295,2.0884233062,0.228777745}}

rowMat

U1, 0.17, 2.43, 
U2, 0.22, 1.84, 
U3, 1.88, 0.39, 
U4, 1.5, 0.33, 
U5, 1.4, 1.54, 

colMat

D1, 0.16, 2.1, 
D2, 0.08, 1.04, 
D3, 1.15, 2.09, 
D4, 2.62, 0.23, 

newMat
 , D1, D2, D3, D4, 
U1, 5.14, 2.55, 5.28, 1.01, 
U2, 3.91, 1.94, 4.1, 1, 
U3, 1.12, 0.56, 2.98, 5.01, 
U4, 0.94, 0.47, 2.42, 4, 
U5, 3.47, 1.72, 4.83, 4.01,

Meeting with Don.
- Looked through the modelling and UTOPIAN papers, and walked through some of the math. We’ll meet next Friday to try to convert some of the equations into java code
Meeting with Shimei
- There are ways of getting better stability with LDA. Still ok to do NMF, though there may be issues with scaling. That’s where a stable version of LDA might make sense.

Phil 11.21.16

6:45 – 4:45 ASRC

Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
- Bubbles at scales? “Stability measures the quality of a partition by giving a positive contribution to communities from which a random walker is unlikely to escape within the given time scale. For small values of t, this gives more weights to small communities whereas for larger values of t, larger communities are favored. Thus, by searching the partitions maximizing the stability for several values of t, one can detect communities at several scales.”
- The algebraic connectivity of a graph G is the second-smallest eigenvalue of the Laplacian matrix of G
- Data sources for the paper:
  - The political blogosphere and the 2004 US election: divided they blog
  - An information flow model for conflict and fission in small groups
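A small worked example of the algebraic connectivity definition above (my own illustration, not from the paper): take the path graph on three nodes, 1-2-3. With D the degree matrix and A the adjacency matrix, the Laplacian and its characteristic polynomial are:

```latex
L = D - A =
\begin{pmatrix}
 1 & -1 &  0 \\
-1 &  2 & -1 \\
 0 & -1 &  1
\end{pmatrix},
\qquad
\det(L - \lambda I) = -\lambda(\lambda - 1)(\lambda - 3)
```

The eigenvalues are 0, 1 and 3, so the algebraic connectivity (the second-smallest) is 1. The smallest eigenvalue of a graph Laplacian is always 0, and the second-smallest is greater than 0 exactly when the graph is connected.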

More NMF

P = [[ 0.67503659  0.89795272]
 [ 0.36939303  0.47816356]
 [ 0.51019257  0.27772317]
 [ 0.1130504   0.84860109]
 [ 0.23238542  0.88222005]]

Q = [[ 0.36692407  0.6844149 ]
 [ 0.84469693  0.06331073]
 [ 0.07366106  0.06603799]
 [ 0.65677669  0.16947152]]

nP = [[ 0.16286496  2.42456084]
 [ 0.21647521  1.83981127]
 [ 1.9047257   0.39049035]
 [ 1.52103295  0.33509559]
 [ 1.41350212  1.51711067]]

nQ = [[ 0.15875994  2.09665688]
 [ 0.08334172  1.04818927]
 [ 1.16320811  2.09280482]
 [ 2.56431807  0.24424636]]

nQt = [[ 0.15875994  0.08334172  1.16320811  2.56431807]
 [ 2.09665688  1.04818927  2.09280482  0.24424636]]

R = [[5 3 0 1]
 [4 0 0 1]
 [1 1 0 5]
 [1 0 0 4]
 [0 1 5 4]]

nR = [[ 5.10932861  2.55497211  5.26357846  1.00982771]
 [ 3.89182055  1.94651185  4.10217161  1.00447849]
 [ 1.12111842  0.56805092  3.03281247  4.97969837]
 [ 0.94405957  0.4780091   2.47056752  3.98225815]
 [ 3.40526805  1.70802283  4.81921366  3.99521777]]

Hard coded the random values for gradient descent to compare python and java
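The gradient descent being compared can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the Python or Java code from these entries: zeros in R are treated as missing, every observed cell takes one regularized gradient step per pass, and the learning rate `alpha` and regularization `beta` are values I picked for illustration. The starting factors are the rounded P and Q listed above, and `steps=5000` follows the 11.22.16 entry. `factorize`, `predict` and `mean_abs_error` are hypothetical names.

```python
R = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4], [0, 1, 5, 4]]
# Hard-coded starting factors (rounded from the P and Q printed above).
P0 = [[0.67, 0.89], [0.36, 0.47], [0.51, 0.27], [0.11, 0.84], [0.23, 0.88]]
Q0 = [[0.36, 0.68], [0.84, 0.06], [0.07, 0.06], [0.65, 0.16]]


def predict(P, Q, i, j):
    """Dot product of row factor i and column factor j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def factorize(R, P, Q, steps=5000, alpha=0.0002, beta=0.02):
    """Fit P and Q to the observed (non-zero) cells of R by gradient descent."""
    P = [row[:] for row in P]  # work on copies so the inputs stay untouched
    Q = [row[:] for row in Q]
    for _ in range(steps):
        for i, row in enumerate(R):
            for j, value in enumerate(row):
                if value <= 0:
                    continue  # assumption: zero means "no observation"
                err = value - predict(P, Q, i, j)
                for k in range(len(P[i])):
                    p, q = P[i][k], Q[j][k]
                    P[i][k] = p + alpha * (2 * err * q - beta * p)
                    Q[j][k] = q + alpha * (2 * err * p - beta * q)
    return P, Q


def mean_abs_error(R, P, Q):
    """Average absolute difference over the observed cells only."""
    diffs = [
        abs(value - predict(P, Q, i, j))
        for i, row in enumerate(R)
        for j, value in enumerate(row)
        if value > 0
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    P, Q = factorize(R, P0, Q0)
    print("error before:", mean_abs_error(R, P0, Q0))
    print("error after:", mean_abs_error(R, P, Q))
    for i in range(len(R)):
        print([round(predict(P, Q, i, j), 2) for j in range(len(R[0]))])
```

Fixing the starting factors, as done here, is what makes two implementations comparable step by step. Nothing in this update rule forces the factors to stay non-negative, so that is worth checking on each run.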
Stepping h

Sprint stuff?
- Scrum
- Sent Jeremy the svn file names for my Vistronix code
Fika
Meeting with Wayne? Basic catching up. Started the spreadsheet of conferences and grants

Phil 11.18.16

7:00 – 4:00 ASRC

Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
Working my way through Matrix Factorization for Movie Recommendations – CME 510
Still working the NMF math, particularly the gradient descent. This is still the best version I can find
Mark Zuckerberg on fake news

Phil 11.17.16

7:00 – 10:00, 10:30 – 5:30 ASRC

Tenure review meeting at 10:00? Show up and see, I guess
Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. Details here.
We prove, under some conditions, the existence of a solution to the system dynamics, convergence to clusters, and a non-trivial lower bound on the distance between clusters. Huh. So bubbles must exist at a certain minimum information distance from each other??
Inverse matrices are needed because we can’t divide by a matrix but we can multiply by its reciprocal.
Lemma – a subsidiary or intermediate theorem in an argument or proof.
(From Wikipedia) In measure theory, the Lebesgue measure, named after French mathematician Henri Lebesgue, is the standard way of assigning a measure to subsets of n-dimensional Euclidean space. For n = 1, 2, or 3, it coincides with the standard measure of length, area, or volume. In general, it is also called n-dimensional volume, n-volume, or simply volume.^[1] It is used throughout real analysis, in particular to define Lebesgue integration. Sets that can be assigned a Lebesgue measure are called Lebesgue measurable; the measure of the Lebesgue measurable set A is denoted by λ(A).
The backward slash is kind of the set theory equivalent of subtracting, i.e., $A \setminus B = \{a \in A \mid a \notin B\}$.
Group doc on how to fix fake news
Back to working through NMF.
Looks like we’ll watch videos tomorrow morning

Phil 11.16.16

7:00 – 4:00 ASRC

Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. My notes are here.
Clean data and apis from propublica
Ulrich Krause
Professor of Mathematics, Bremen University
Positive dynamical systems, opinion dynamics, algebra
- Opinion dynamics and bounded confidence models, analysis, and simulation – looks incredibly clear and helpful.
- Opinion dynamics under the influence of radical groups, charismatic leaders, and other constant signals: A simple unifying model – derived from the above.
From Is there negative social influence? Disentangling effects of dissimilarity and disliking on opinion shifts
- This finding implies for models of opinion dynamics that a complex non-linear social influence function might be unnecessary to characterize the relationship between similarity and opinion change. Our results suggest that not only for the sake of simplicity, but also for the sake of realism, model builders should be cautioned against resorting too readily to a more complex assumption than a simple linear influence function.
I need to add recursion to the QueryComponent (content list and child list) and work through the combinations that way and get rid of the lists. Done! Had an issue with the BufferedWriter not flushing.
BRC kickoff meeting
Made a new arff for Aaron of the BRC doctor data I tagged. Should be enough for a starting junk filter.
Finished the Utopian paper. Need to get up to speed on NMF.

Phil 11.15.16

7:00 – 3:30 ASRC

Continuing Opinion Dynamics With Decaying Confidence: Application to Community Detection in Graphs. My notes are here.
Working on getting the crawl payload builder. I got messed up with permutations. Tomorrow I need to add recursion to the QueryComponent (content list and child list) and work through the combinations that way and get rid of the lists.

Phil 11.14.16

7:00 – 5:00 ASRC

Pick up printer paper!
My intuition is that there is a form of information ‘flocking behavior’ with respect to information space. There wouldn’t be quite the same physics as birds or fish in motion, but there do seem to be rules.

Surprisingly, when I started to look at the literature, many of my hits came back from swarm robotics, for example Stable social foraging swarms in a noisy Environment. This is particularly interesting since information search behavior has long been equated with foraging behavior.
The Max Planck Department of Collective Behaviour: “If it’s collective, and a great system for asking questions, then it is of interest to us.”
So, Reading up on flocking.
- Found Stable social foraging swarms in a noisy Environment
- On Krause’s multi-agent consensus model with state-dependent connectivity
- Opinion dynamics with decaying confidence: Application to community detection in graphs <- going to start with this one.
Add Wayne to my resume
Quick meetings with Shimei and Aaron

Phil 11.11.16

8:00 – 12:00 – UMBC

Finished the IUI reviews
Doing Shimei’s review
Setting up meeting with Christelle Viauroux
Too frazzled to do coding. Reading Last Place on Earth.

Phil 11.10.16

7:00 – 4:30 ASRC

Had some thoughts last night about how flocking at different scales in Hilbert space might work. Flocks built upon flocks. There is some equivalent of mass and velocity, where mass might be influence (positive and negative attraction). Velocity is related to how fast beliefs change.
Also thought about maps some more, weather maps in particular. A weather map maintains a coordinate frame, even though nothing in that frame is stable. Something like this, with a sense of history (playback of the last X years) could provide an interesting framework for visualization.
Continuing Novelty Learning via Collaborative Proximity Filtering review. Done! Need to submit both now.
Adding StrVec to the ARFF outputs – done
Starting this tutorial on Nonnegative Matrix Factorization
- These slides are also very nice
Working on building JSON files for loading CI
Meeting about Healthdatapalooza

Phil 11.9.16

7:00 – 5:00 ASRC

President-elect Trump. Wow. Just wow.
Starting Novelty Learning via Collaborative Proximity Filtering review
Working with Aaron to get the java version of the classifier working
LibRec (http://www.librec.net) is a Java library for recommender systems (Java version 1.7 or higher required). It implements a suite of state-of-the-art recommendation algorithms. It consists of three major components: Generic Interfaces, Data Structures and Recommendation Algorithms. This should save a *lot* of work. Remember to thank and cite.
The forces that drove this election’s media failure are likely to get worse – Lots of stuff on echo chambers and social media

viztales

Dimension reduction, State, Orientation, and Speed
