Monthly Archives: February 2017

Aaron 2.28.17

9:00 – BRC

TensorFlow
- Installed following TF installation guide.
- Found issues with the install instructions almost immediately. Found this link with a suggestion that I followed to get it installed.
- Almost immediately found that the Hello World example succeeded, but with a list of errors. Apparently it’s a known issue for the release candidate, which was just fixed in the nightly build as per this link.
- I haven’t had a chance to try it yet, but found a good Reddit link for a brief TF tutorial.
- I went through the process of trying to get my IntelliJ project to connect and be happy with the Python interpreter in my Anaconda install, and although I was able to RUN the TF tutorials, it was still acting really wacky for features like code completion. Given Phil was able to get up and running with no problems doing a direct pip install to local Python, I scrapped my intent to run through Anaconda and did the local install. Tada! Everything is working fine now.

Unsupervised Learning (Clustering)
- Our plan is to implement our unsupervised learning for the IH customer in an automated fashion by writing a MR app dispatched by MicroService that populates a Protobuf matrix for TensorFlow.
- The trick about this is that there is no built-in density-based clustering algorithm native to TF like the DBSCAN we used on last sprint’s deliverable. TF supports K-Means “out of the box”, but with the high number of dimensions in our data set this isn’t ideal. Here is a great article explaining why.
- However, one possible method of successfully utilizing K-Means (or improving the scalability of DBSCAN) is to convert our high-dimensional data to polar coordinates. We’ll be investigating this once we’re comfortable with TensorFlow’s matrix math operations.
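
As a rough illustration of the polar-coordinate idea above, here is a minimal Python sketch. It is only one possible reading of "polar": each high-dimensional point is collapsed to a radius (its Euclidean length) and a single angle measured against a reference direction. The function name `to_polar`, the default all-ones reference vector, and the choice to keep only one angle are assumptions made for this example, not something we have settled on.

```python
import math

def to_polar(vector, reference=None):
    """Collapse an n-dimensional point to (radius, angle in radians).

    Assumption: the angle is measured against a reference direction,
    which defaults to the all-ones vector. This is a lossy reduction.
    """
    if reference is None:
        reference = [1.0] * len(vector)
    radius = math.sqrt(sum(v * v for v in vector))
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if radius == 0.0 or ref_norm == 0.0:
        # The angle is undefined for a zero-length vector; report 0.0
        return 0.0, 0.0
    cosine = sum(v * r for v, r in zip(vector, reference)) / (radius * ref_norm)
    # Clamp to guard against floating-point drift outside [-1, 1]
    cosine = max(-1.0, min(1.0, cosine))
    return radius, math.acos(cosine)

radius, angle = to_polar([3.0, 4.0], reference=[1.0, 0.0])
print(radius, angle)
```

The two-column output could then be handed to K-Means or DBSCAN in place of the full-width rows, at the cost of discarding whatever structure the single angle cannot capture.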

Proposal Work
- Spent a fun hour of my day converting a bunch of content from previous white-papers and RFI documents into a one-page write-up of our Cognitive Computing capabilities. Ironically the more we have to write these the easier it gets because I’ve already written it all before. Also more importantly as time goes by more and more of the content describes things we’ve actually done instead of things we have in mind to do.

Phil 2.28.17

7:00 – 8:30 Research

Sent a note to Don about getting together on Thursday
Added to the list of journals and conferences. Included the Journal of Political Philosophy, which is where The Law of GP was originally published

9:00 – 4:30 BRC

Installing Tensorflow as per the instructions for CUDA and Anaconda
- May need to install Python 2.7 version due to TF incompatibility with prebuilt python distros
- Looks like I need visual studio for CUDA support, which is excessive. Going to try the CPU-only version
- Installing Anaconda3-4.3.0.1-Windows-x86_64.exe. Nope, based on Aaron’s experiences, I’m going to install natively

Tensorflow native pip installation

Uninstalled old Python
Installed python-3.5.2-amd64.exe from here
Did the cpu install:
```
 pip3 install --upgrade tensorflow
```

Ran the ‘hello world’ program

```python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
```

Success!!!

E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op: UpdateFertileSlots
b'Hello, Tensorflow!'
>>>

The errors are some kind of cruft that has been fixed in the nightly build as per this thread
Got my new Python running in IntelliJ
Working through the tutorials. So far so good, and the support for matrices is very nice
- Getting Started
Some Tensorflow stuff from O’Reilly
- Hello, TensorFlow
- Learning TensorFlow – A guide to building deep learning systems
Some proposal work slinging text on cognitive computing

Phil 2.27.17

7:00 – 8:30 Research

Call Senate, house about CBP, SCOTUS, etc
Add Antibelief, Leader, etc. mentions to Future work

9:00 – 6:00 BRC

How to invoke a trained TensorFlow model from Java programs
Deploying TF in production. Serving up predictions in production
- 14:34 (video)
TF Javadoc: https://www.tensorflow.org/api_docs/java/reference/org/tensorflow/package-summary
TF jar: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/java
Or if you want to call the C++ via JNI: https://www.tensorflow.org/extend/language_bindings
Tensorflow Serving (microservice)
Tensorflow security (WSO2)
Hadoop data filters to train the model
XLA compiler github root
And we wrote a 5 page white paper

Phil 2.24.17

7:00 – 8:00 Research

Continuing paper.
Robert Mercer: the big data billionaire waging war on mainstream media
Defense Against the Dark Arts: Networked Propaganda and Counter-Propaganda
- Jonathan Stray
Hannah Arendt: From an Interview
- They learn whom to kill and how to kill and how to do it together. This is the much talked about Gleichschaltung—the coordination process. You are coordinated not with the powers that be, but with your neighbor—coordinated with the majority.
The fake news phenomenon: How it spreads, and how to fight it
Downward comparison principles in social psychology.
GovTrack.us. Looks like a good source for parsable xml-formatted data for:

Congressional Bills	04-Jan-2017 02:05	–
Bill Status	04-Jan-2017 11:14	–
Bill Summaries	04-Jan-2017 05:13	–
Commerce Business Daily	19-Mar-2012 05:52	–
Code of Federal Regulations (Annual Edition)	22-Feb-2016 03:06	–
Electronic Code of Federal Regulations	20-Sep-2016 09:11	–
Federal Register	31-Dec-2016 08:37	–
United States Government Manual	14-Feb-2017 06:49	–
House Rules and Manual	13-Oct-2016 06:13	–
Privacy Act Issuances	09-Feb-2016 06:41	–
Public Papers of the Presidents of the United States	17-Jan-2017 09:32	–
Supreme Court Decisions 1937-1975 (FLITE)

8:30 – 4:30 BRC

More TensorFlow
- https://www.youtube.com/watch?v=kAOanJczHA0&list=PLOU2XLYxmsIKGc_NBoIhTn2Qhraji53cv
- Speed vs. Memory compiler options
- Unrolling and vectorizing
- Long Short Term Memory (LSTM) overview
- https://www.youtube.com/watch?v=t64ortpgS-E&list=PLOU2XLYxmsIKGc_NBoIhTn2Qhraji53cv&index=5
Need to try turning the integrity data into angle/radius form and running DBSCAN on it on Monday
Writing up the justification/needs for going to TensorFlow/GPU

Phil 2.23.17

7:00 – 8:00 Research

Working on paper. Telling the story about how clustering needs to be done both as snapshot and over time using DTW

8:30 – 4:00 BRC

DTW clustering writeup
DBSCAN won’t scale well
Three Myths about Dynamic Time Warping Data Mining
Started on the TensorFlow Dev Summit 2017 videos
Found out about SIGMOD 2017

Phil 2.22.17

7:00 – 2:00 Research

Starting full paper
Finished porting abstract into gdocs
Working on adding the DTW work. Building charts. Lots of charts.

2:00 – 6:00 BRC

Worked with Aaron on accessing the classifier microservice
Writing up DTW as a mechanism for predicting behaviors
Found my old scripting engine code. Need to download and check

Phil 2.21.17

7:00 – 12:00 Research

Biting the bullet on Dynamic Time Warping as a way of identifying cluster members. Still not sure why a least squares approach isn’t a standard approach.
- This post seems to be helpful: stats.stackexchange.com/questions/131281/dynamic-time-warping-clustering
- FastDTW (java)
- The JavaML library: java-ml.sourceforge.net
- Well, that seems pretty straightforward. I put the full folder in my svn so I don’t have to deal with Sourceforge’s ads.

```java
import net.sf.javaml.distance.fastdtw.dtw.FastDTW;
import net.sf.javaml.distance.fastdtw.timeseries.TimeSeries;
import net.sf.javaml.distance.fastdtw.timeseries.TimeSeriesPoint;

// The wrapper class name is illustrative; the original was a bare snippet
public class FastDtwDemo {
    public static void main(String[] args) {
        // Two one-dimensional time series to compare
        TimeSeries tsI = new TimeSeries(1);
        TimeSeries tsJ = new TimeSeries(1);

        double t = 0;
        double offset = 0.0;     // phase shift applied to the second series
        double amplitude = 2.0;  // scale factor applied to the second series
        double step = 0.1;       // sampling interval
        while (t < 10) {
            // One sample per series at time t
            double[] v1 = {Math.sin(t)};
            double[] v2 = {Math.sin(t + offset) * amplitude};
            tsI.addLast(t, new TimeSeriesPoint(v1));
            tsJ.addLast(t, new TimeSeriesPoint(v2));

            t += step;
        }

        System.out.println("FastDTW.getWarpDistBetween(tsI, tsJ) = " + FastDTW.getWarpDistBetween(tsI, tsJ));
    }
}
```

FastDTW.getWarpDistBetween(tsI, tsJ) = 46.33334518229166

Note that the measure can be computed across all of the dimensions, so this may take some refactoring
Next step is to add this to the FlockRecorder class and output to excel and ARFF. I think this should replace the ‘deltas’ outputs. Done!
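
For comparison with a point-by-point (least squares style) measure, here is a minimal Python sketch of the classic dynamic-programming form of Dynamic Time Warping. It is not FastDTW and not the Java-ML code used above: it is the plain O(n*m) version with absolute difference as the local cost, so its numbers should not be expected to match the Java output. The function name `dtw_distance` is made up for this example.

```python
def dtw_distance(series_a, series_b):
    """Classic O(n*m) DTW distance between two 1-D sequences.

    Assumption: local cost is the absolute difference between samples.
    """
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j samples
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # repeat a sample of series_b
                cost[i][j - 1],      # repeat a sample of series_a
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]

# The same pulse, shifted by one time step
a = [0.0, 1.0, 0.0, 0.0]
b = [0.0, 0.0, 1.0, 0.0]
pointwise = sum(abs(x - y) for x, y in zip(a, b))
print(pointwise, dtw_distance(a, b))
```

The shifted pulse shows the difference: the point-by-point sum penalizes the shift, while DTW can warp the time axis to line the pulses up. That tolerance to timing differences is one reason DTW tends to be preferred for grouping trajectories that have the same shape but are out of phase.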

Running DBSCAN clustering in WEKA on the outputs

All Exploit – Social Radius = 0: All NOISE
All Exploit – Social Radius = 0.1: All NOISE

All Exploit – Social Radius = 0.2 (32 NOISE)

=== Model and evaluation on training set ===

Clustered Instances

0       68 (100%)

Unclustered instances : 32

Class attribute: AgentBias_
Classes to Clusters:

  0  -- assigned to cluster
 68 | EXPLOITER

Cluster 0 -- EXPLOITER

Incorrectly clustered instances :	0.0	  0      %

All Exploit – Social Radius = 0.4 (86 NOISE)

=== Model and evaluation on training set ===

Clustered Instances

0       14 (100%)

Unclustered instances : 86

Class attribute: AgentBias_
Classes to Clusters:

  0  -- assigned to cluster
 14 | EXPLOITER

Cluster 0 -- EXPLOITER

Incorrectly clustered instances :	0.0	  0      %

All Exploit – Social Radius = 0.8 (41 NOISE)

=== Model and evaluation on training set ===

Clustered Instances

0       45 ( 76%)
1        7 ( 12%)
2        7 ( 12%)

Unclustered instances : 41

Class attribute: AgentBias_
Classes to Clusters:

  0  1  2  -- assigned to cluster
 45  7  7 | EXPLOITER

Cluster 0 -- EXPLOITER
Cluster 1 -- No class
Cluster 2 -- No class

Incorrectly clustered instances :	14.0	 14      %

All Exploit – Social Radius = 1.6 (51 NOISE)

=== Model and evaluation on training set ===

Clustered Instances

0       49 (100%)

Unclustered instances : 51

Class attribute: AgentBias_
Classes to Clusters:

  0  -- assigned to cluster
 49 | EXPLOITER

Cluster 0 -- EXPLOITER

Incorrectly clustered instances :	0.0	  0      %

All Exploit – Social Radius = 3.2 (9 NOISE)

=== Model and evaluation on training set ===

Clustered Instances 

0       91 (100%)

Unclustered instances : 9

Class attribute: AgentBias_
Classes to Clusters:

  0  -- assigned to cluster
 91 | EXPLOITER

Cluster 0 -- EXPLOITER

Incorrectly clustered instances :	0.0	  0      %

All Exploit – Social Radius = 6.4 (8 NOISE)

=== Model and evaluation on training set ===

Clustered Instances

0       86 ( 93%)
1        6 (  7%)

Unclustered instances : 8

Class attribute: AgentBias_
Classes to Clusters:

  0  1  -- assigned to cluster
 86  6 | EXPLOITER

Cluster 0 -- EXPLOITER
Cluster 1 -- No class

Incorrectly clustered instances :	6.0	  6      %

All Exploit – Social Radius = 10

=== Model and evaluation on training set ===

Clustered Instances

0       82 ( 91%)
1        8 (  9%)

Unclustered instances : 10

Class attribute: AgentBias_
Classes to Clusters:

  0  1  -- assigned to cluster
 82  8 | EXPLOITER

Cluster 0 -- EXPLOITER
Cluster 1 -- No class

Incorrectly clustered instances :	8.0	  8      %

So what this all means is that the DTW produces reasonable data that can be used for clustering. The results seem to match the plots. I think I can write this up now…
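
For reference, here is a minimal pure-Python sketch of the DBSCAN idea behind the runs above, mainly to show where the "noise" counts come from. It is not WEKA's implementation; the parameter names `eps` and `min_pts` and the toy 1-D data are assumptions for this example, and the simulator's social radius is not modeled here.

```python
def dbscan(points, eps, min_pts, distance):
    """Label each point with a cluster index, or -1 for noise.

    Assumption: min_pts counts the point itself, and neighbors are
    found by brute force, which is only reasonable for small inputs.
    """
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep growing
    return labels

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
print(dbscan(data, eps=0.15, min_pts=2, distance=lambda a, b: abs(a - b)))
```

Points that never have enough neighbors within `eps`, and are never reached from a core point, stay labeled as noise, which is the "Unclustered instances" line in the WEKA output.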

12:00 – 5:00 BRC

Clustering discussions with Aaron
GEM Meeting

Phil 2.20.17

7:00 – 11:00 Research

PathNet article and paper. Using genetic techniques to produce better NN systems. GAs are treated like gradient descent. Which makes sense, as gradient descent and hillclimbing are pretty much the same thing
- “Since scientists started building and training neural networks, Transfer Learning has been the main bottleneck. Transfer Learning is the ability of an AI to learn from different tasks and apply its pre-learned knowledge to a completely new task. It is implicit that with this precedent knowledge, the AI will perform better and train faster than de novo neural networks on the new task.”
Adding angle and mean deltas. Interesting results, but still not sure on the best approach to classify…
Newest version is at philfeldman.com/GroupPolarization
So here’s a pretty typical population. It’s 10% Explorer, 90% Exploiter. Exploit social influence radius is 0.2. These settings produce an orbiting flock. Between-group interaction is allowed. This is a grid where the accumulated relationship of each agent to every other agent is shown. Red is closest, green is farthest. You can see the different populations pretty well. One thing that isn’t that obvious is that explorers are on average slightly closer to each other than to exploiters.
A more extreme example is where the Exploit influence distance is 10. These tables show just relative position when compared to the origin.

Although I can’t figure out how to classify using this data, clustering works pretty well. This is Canopy (WEKA) on the top dataset above:

=== Run information ===

Scheme: weka.clusterers.Canopy -N -1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t2 -1.0 -t1 -1.25 -S 1
Relation: ORIGIN_POSITION_DELTA
Instances: 100
Attributes: 102
[list of attributes omitted]
Test mode: Classes to clusters evaluation on training data

=== Clustering model (full training set) ===

Canopy clustering
=================

Number of canopies (cluster centers) found: 2
T2 radius: 3.137
T1 radius: 3.922

Cluster 0: 0.283631,0.443357,0.240249,0.280277,0.396611,0.258673,0.28608,0.27558,0.312295,0.215801,0.249255,0.25779,0.280719,0.273191,0.58818,0.258901,0.196191,0.240405,0.201927,0.273491,0.271862,0.266807,0.249377,0.269756,0.265874,0.252873,0.299417,0.244208,0.284257,0.253868,0.234348,0.213578,0.242031,0.248292,0.215259,0.236993,0.301843,0.245444,0.282464,0.290885,0.216585,0.375846,0.223493,0.278251,0.375965,0.764462,0.338657,0.280672,0.316447,0.261622,0.265026,0.436098,0.246442,0.246887,0.289306,0.470806,0.43541,0.209845,0.220971,0.21506,0.247576,0.249173,0.468053,0.28907,0.418987,0.293851,0.452858,0.267638,0.243671,0.248868,0.242674,0.371534,0.29843,0.221506,0.25575,0.242182,0.335877,0.28386,0.303986,0.235298,0.282083,0.427425,0.26635,0.251009,0.304134,0.281157,0.212644,0.367693,0.222213,0.247862,0.780248,0.894699,0.713413,0.865287,0.826024,0.868741,0.757008,0.807287,0.785141,0.756071,{88}
Cluster 1: 0.919922,0.669721,0.908035,0.73578,0.591465,0.752733,0.774358,0.826861,0.84364,0.884803,0.939301,0.958981,0.629587,0.76459,0.545587,0.715267,0.853073,0.803545,0.851979,0.693952,0.954557,0.703606,0.897206,0.698297,0.926263,0.91898,0.733686,0.818759,0.763319,0.776199,0.843167,0.811708,0.903011,0.814435,0.804113,0.916336,0.639919,0.779399,0.663897,0.754696,0.77482,0.682512,0.832556,0.764008,0.703999,0.513612,0.693526,0.734279,0.723504,0.903016,0.777757,0.597915,0.86509,0.900357,0.724636,0.648915,0.577278,0.883327,0.828117,0.813873,0.860062,0.915821,0.684886,0.979451,0.556747,0.667678,0.556487,0.941671,0.898276,0.902846,0.686763,0.664381,0.709607,0.706246,0.890753,0.898794,0.588379,1.001214,0.625244,0.761188,0.828436,0.661864,0.759379,0.944355,0.728272,0.764909,0.761139,0.65028,0.845547,0.87213,0.586679,0.500194,0.498893,0.513267,0.493026,0.58192,0.620756,0.469854,0.540532,0.496272,{12}

Time taken to build model (full training data) : 0.03 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       88 ( 88%)
1       12 ( 12%)

Class attribute: AgentBias_
Classes to Clusters:

  0  1  -- assigned to cluster
  0 10 | EXPLORER
 88  2 | EXPLOITER

Cluster 0 -- EXPLOITER
Cluster 1 -- EXPLORER

Incorrectly clustered instances :	2.0	  2      %

The next analysis is on the second dataset. The results are essentially the same, even though the differences are more dramatic (the tight clusters are very tight).

=== Run information ===

Scheme:       weka.clusterers.Canopy -N -1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t2 -1.0 -t1 -1.25 -S 1
Relation:     ORIGIN_POSITION_DELTA
Instances:    100
Attributes:   102
              [list of attributes omitted]
Test mode:    Classes to clusters evaluation on training data

=== Clustering model (full training set) ===


Canopy clustering
=================

Number of canopies (cluster centers) found: 2
T2 radius: 3.438     
T1 radius: 4.297     

Cluster 0: 0.085848,0.050964,0.0513,0.053288,0.05439,0.054653,0.21758,0.057725,0.058775,0.050894,0.053768,0.130821,0.051098,0.050923,0.051115,0.050893,0.051012,0.051009,0.060649,0.051454,0.051089,0.051032,0.050894,0.053364,0.276684,0.051857,0.050984,0.050942,0.0509,0.050952,0.051025,0.056953,0.050914,0.050962,0.050903,0.052129,0.128196,0.051023,0.054222,0.274438,0.053978,0.050934,0.051124,0.054563,0.050995,0.074289,0.051077,0.05094,0.053644,0.050941,0.051343,0.050967,0.062704,0.052333,0.050936,0.051013,0.050922,0.051007,0.051038,0.050899,0.501239,0.051574,0.051005,0.050898,0.050944,0.204398,0.06076,0.050947,0.050904,0.408553,0.051263,0.0511,0.051574,0.069173,0.050997,0.162314,0.051353,0.096523,0.498648,0.339103,0.051125,0.050888,0.051002,0.051124,0.080711,0.05105,0.051024,0.050988,0.100492,0.132793,0.630178,0.882598,0.832132,0.86452,0.55151,0.729317,0.755526,0.513822,0.782104,0.768836,{92} 
Cluster 1: 0.799117,0.793729,0.79643,0.7929,0.797843,0.797642,0.709935,0.78817,0.805937,0.794095,0.7972,0.76062,0.793743,0.79418,0.794846,0.794247,0.794677,0.793599,0.800359,0.794787,0.793849,0.793805,0.793613,0.784762,0.774656,0.79547,0.794308,0.793527,0.794406,0.793292,0.793513,0.800151,0.793775,0.793652,0.794123,0.793645,0.73331,0.794506,0.788542,0.710244,0.793332,0.793313,0.794184,0.801119,0.79448,0.802416,0.793669,0.7947,0.794813,0.794533,0.796484,0.794512,0.797614,0.794607,0.793716,0.793642,0.793548,0.794789,0.793551,0.793989,0.539133,0.79391,0.793443,0.793969,0.794472,0.715896,0.790956,0.794494,0.794293,0.678147,0.79434,0.793611,0.794221,0.802197,0.793753,0.759132,0.794164,0.798071,0.55929,0.698333,0.79444,0.79424,0.793585,0.793581,0.779958,0.79394,0.793567,0.794795,0.764686,0.754727,0.482214,0.518683,0.434538,0.501648,0.790616,0.4855,0.464554,0.691735,0.405411,0.496892,{8} 



Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       88 ( 88%)
1       12 ( 12%)


Class attribute: AgentBias_
Classes to Clusters:

  0  1  -- assigned to cluster
  0 10 | EXPLORER
 88  2 | EXPLOITER

Cluster 0 -- EXPLOITER
Cluster 1 -- EXPLORER

Incorrectly clustered instances :	2.0	  2      %

Online clustering, fear and uncertainty in Egypt’s transition (Published today). Wow. Downloaded

11:00 – 6:00 BRC

Spent the rest of the day working on the CHIMERA paper with Aaron

Phil 2.17.17

7:00 – 8:00 research

I think I want to navigate the information space of Trump’s tweets
Still working on how to classify an agent. After struggling a bit, I can classify very well if I eliminate extraneous info from the mean angle stats, leaving only bias and variance

8:30 – 10:30, 4:00 – 5:00

Working on creating, extracting and classifying cluster membership from flocks.
Had to leave early to help Barbara with Buck
Discussed exec summary with Aaron. Will write on Monday

Phil 2.16.17

7:00 – 8:00 Research

Had a great time NOT DOING ANY THINKING yesterday
Rechecking the velocity comparison matrix. It’s correct. Looking at multiplying or adding relative position vs relative velocity
Sent a few charts to Don to see if he can make anything pretty
Uploaded new version

8:30 – 5:00 BRC

Scrum. Getting started on next sprint. Discussion with Aaron. Need to time and size a 3k col by 2M row matrix with DBSCAN.
Also need to write up an exec summary of all the analytic tools and give it a good name. By COB Monday?
This looks very interesting: http://pending.schema.org/ClaimReview. It’s from here, and this is the referring text:
- Google News determines whether an article might contain fact checks in part by looking for the schema.org ClaimReview markup. We also look for sites that follow the commonly accepted criteria for fact checks. Publishers who create fact-checks and would like to see it appear with the “Fact check” tag should use that markup in fact-check articles. For more information, head on over to our help center.
Had some good luck chasing down clustering algorithms for Aaron.
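
A quick back-of-the-envelope sizing for the 3k column by 2M row matrix mentioned above, as a Python sketch. The assumptions are mine: "3k" and "2M" are read as 3,000 and 2,000,000, storage is dense, and values are 8-byte floats by default. The helper name `dense_matrix_bytes` is hypothetical.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Memory needed for a dense rows x cols matrix of fixed-width values."""
    return rows * cols * bytes_per_value

rows, cols = 2_000_000, 3_000

# The data itself: 6e9 values
data_bytes = dense_matrix_bytes(rows, cols)

# A full row-to-row distance matrix, if one were ever materialized: 4e12 values
pairwise_bytes = dense_matrix_bytes(rows, rows)

gib = 1024 ** 3
print(f"data as float64:     {data_bytes / gib:,.1f} GiB")
print(f"data as float32:     {dense_matrix_bytes(rows, cols, 4) / gib:,.1f} GiB")
print(f"pairwise as float64: {pairwise_bytes / gib:,.1f} GiB")
```

Under these assumptions the raw data is 48,000,000,000 bytes as float64, and a materialized pairwise distance matrix would be about three orders of magnitude larger. A DBSCAN implementation does not necessarily build that matrix, but the number gives a sense of why neighbor lookups dominate the timing question.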

Phil 2.14.17

7:00 – 8:00 Research

Based on the charts from yesterday, I think I’m going to build two matrices to point WEKA at. Essentially, these matrices will be filled with meta-cluster information.

Average distance from agent to agent. Tightly clustered agents should have low average distances. DBSCAN should also work on this, as well as bootstrapping. That should cover this case:
Average velocity from agent to agent. I’m not sure what I’ll get from this, but in looking at the explore-explore case and the explore-exploit case, it strikes me that there may be some difference that is meaningful. And in the exploit-exploit case, the velocities should be near zero. (Screenshots: Explore-exploit, Explore-explore)

Start with Excel, and then add an ARFF
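
The average agent-to-agent distance matrix described above could be sketched like this in Python. This is an illustration, not the simulator's Java code: the function names, the `trajectories[agent][time_step]` layout, and the use of plain Euclidean distance are all assumptions.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_distance_matrix(trajectories):
    """Average distance between every pair of agents over all time steps.

    Assumption: trajectories[a][t] is the position of agent a at time
    step t, and every agent has at least one recorded time step.
    """
    count = len(trajectories)
    matrix = [[0.0] * count for _ in range(count)]
    for i in range(count):
        for j in range(i + 1, count):
            pairs = list(zip(trajectories[i], trajectories[j]))
            total = sum(euclidean(p, q) for p, q in pairs)
            # The matrix is symmetric, so fill both halves at once
            matrix[i][j] = matrix[j][i] = total / len(pairs)
    return matrix

flock = [
    [(0.0, 0.0), (0.0, 0.0)],  # agent 0 stays at the origin
    [(3.0, 4.0), (0.0, 0.0)],  # agent 1 starts away, then joins agent 0
]
print(mean_distance_matrix(flock))
```

The velocity version would have the same shape, with per-step displacement vectors in place of positions. Each row of the result is one agent's relationship to every other agent, which is the form a clusterer or classifier can take as an instance.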

Got most of the methods built. Might finish this morning at work.
Indeed, you can get a lot done when you’re sitting in on a Skype meeting and they’re not talking about your part…
Ok, so I’ve added comparison matrices as Excel and ARFF output. In this case WEKA does better charting, so here goes. The first chart is exploit-exploit. Note that the majority of points are at 0,0. Next, an explore-exploit. In this case, there’s a cluster on the left side of the chart. Last is the explore-explore chart, which has a cluster towards the middle.

This data also seems to be good to train a NaiveBayes Classifier. Here’s the result of an initial run:

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          97               97      %
Incorrectly Classified Instances         3                3      %
Kappa statistic                          0.94  
Mean absolute error                      0.03  
Root mean squared error                  0.1732
Relative absolute error                  6      %
Root relative squared error             34.6337 %
Total Number of Instances              100     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 1.000    0.059    0.942      1.000    0.970      0.942    0.971     0.942     EXPLORER
                 0.941    0.000    1.000      0.941    0.970      0.942    1.000     1.000     EXPLOITER
Weighted Avg.    0.970    0.029    0.972      0.970    0.970      0.942    0.986     0.972     

=== Confusion Matrix ===

  a  b   -- classified as
 49  0 |  a = EXPLORER
  3 48 |  b = EXPLOITER
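
As a sanity check on the summary numbers, the accuracy and Kappa statistic can be recomputed from the confusion matrix alone. The Python sketch below assumes the reported Kappa is Cohen's kappa, which is my understanding of what WEKA reports; the function name is made up for this example.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    Assumption: chance agreement is below 1, so the division is defined.
    """
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: the diagonal over everything
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column marginals per class
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

accuracy, kappa = accuracy_and_kappa([[49, 0], [3, 48]])
print(round(accuracy, 2), round(kappa, 2))
```

For the matrix above this gives an accuracy of 0.97 and a kappa that rounds to 0.94, in line with the WEKA summary.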

Velocity also works. The plots aren’t as crisp, but the classifier accuracy is about the same. (Plots: Exploit-Exploit, Explore-Exploit, Explore-Explore)

Again, classification looks good:

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          99               99      %
Incorrectly Classified Instances         1                1      %
Kappa statistic                          0.98  
Mean absolute error                      0.01  
Root mean squared error                  0.1   
Relative absolute error                  2      %
Root relative squared error             19.9957 %
Total Number of Instances              100     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 1.000    0.020    0.980      1.000    0.990      0.980    1.000     1.000     EXPLORER
                 0.980    0.000    1.000      0.980    0.990      0.980    1.000     1.000     EXPLOITER
Weighted Avg.    0.990    0.010    0.990      0.990    0.990      0.980    1.000     1.000     

=== Confusion Matrix ===

  a  b   -- classified as
 49  0 |  a = EXPLORER
  1 50 |  b = EXPLOITER

Uploaded new version of the tool to philfeldman.com/GroupPolarization/GroupPloarizationModel.jar

8:30 – 3:30. BRC

Either start on the ResearchBrowser or continue with meta-clustering.
Grooming and sprint planning today – done! And good progress while hanging out on the phone.

Phil 2.13.17

7:00 – 8:00, 3:00 – 5:30 Research

Getting the toArff methods working with the WEKA date format: The default format string accepts the ISO-8601 combined date and time format: yyyy-MM-dd'T'HH:mm:ss
Going to go with the Milliseconds format, and then set the ARFF to default to 1 sec increments. Nope, that didn’t work. Going with the default format and incrementing by seconds.
Fika
Meeting with Wayne. Basically catching up, but I also got to show him the sim. And for a numerical model of community behavior based on sociophysics, we got into a very involved conversation. Visualization helps tell stories.
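
The ARFF date handling from the notes above (default ISO-8601 format, one row per second) can be sketched as follows. This is a Python illustration and not the simulator's toArff method: `to_arff`, its parameters, and the attribute name `timestamp` are hypothetical. The date attribute declaration follows the ARFF convention, as I understand it, of quoting a Java SimpleDateFormat pattern.

```python
from datetime import datetime, timedelta

def to_arff(relation, column_names, rows, start, step_seconds=1):
    """Build ARFF text with a leading date column that advances per row.

    Assumption: all data columns are numeric and every row has one
    value per column name.
    """
    lines = ["@relation " + relation, ""]
    # The pattern below is the ISO-8601 combined date and time format
    lines.append("@attribute timestamp date \"yyyy-MM-dd'T'HH:mm:ss\"")
    for name in column_names:
        lines.append("@attribute " + name + " numeric")
    lines += ["", "@data"]
    for index, row in enumerate(rows):
        stamp = start + timedelta(seconds=index * step_seconds)
        stamp_text = stamp.strftime("%Y-%m-%dT%H:%M:%S")
        values = ",".join(str(v) for v in row)
        lines.append(stamp_text + "," + values)
    return "\n".join(lines)

print(to_arff("flock", ["agent_0", "agent_1"],
              [[0.1, 0.2], [0.3, 0.4]], datetime(2017, 2, 13)))
```

The start time is arbitrary here; the rows only need evenly spaced timestamps for the time series tools to treat them as a regular series.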

8:30 – 2:30 BRC

Testing periodicity and time series classifiers
I can forecast in a bunch of different ways, but I can’t seem to classify. There are several filters, but no real explanation
Redoing some simulations, to see the best way of checking for Explorer/exploiter behavior. Here are some screenshots:
Explore-Explore
Explore-Exploit
Exploit-Exploit

Phil 2.10.17

7:00 – 8:30 Research

Adding the ability to set maxSlew and slew Variance on init. Can comment out particle selection. Done!
Commenting out particle to save GUI space Done!
Using ATLAS.ti 8 Windows in Literature Reviews
Uploaded the newest version and pinged Don.

10:00 – 5:00 BRC

The Porsche didn’t start, so I had to bike out and get the Honda. Brrr!
Submit for travel expenses
Learning how to do time series in WEKA
Looks like I need to rotate the matrix so that each column is an agent, and each row is a time step. Yep. Done. Had to add a case to toArff() where there are no row names
Discovered the arff viewer, which is awesome
Discovered the plugin manager. Nice. wrappers for other AI packages!
Predictive Learning. You just need to read in the file, then use the Forecast tab, select the time column, and then select the items you want to track.
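
The matrix rotation mentioned above (each column an agent, each row a time step) is just a transpose. A minimal Python sketch, assuming the original layout is one row per agent; the function name `rotate` is made up for this example.

```python
def rotate(matrix):
    """Transpose a rectangular list-of-lists matrix.

    Assumption: input rows are agents and columns are time steps, so
    the output has one row per time step and one column per agent.
    """
    return [list(column) for column in zip(*matrix)]

by_agent = [
    [1, 2, 3],  # agent 0 at time steps 0..2
    [4, 5, 6],  # agent 1 at time steps 0..2
]
print(rotate(by_agent))
```

After the rotation there are no per-row agent names left, only time steps, which is presumably why the ARFF writer needed a case for rows without names.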

Phil 2.9.17

7:00 – 8:30, 4:00 – 5:00 Research

Submitted edits on Collective Intelligence extended abstract
Finished Filter bubbles, echo chambers, and online news consumption. Post is here
Uploaded the executable jar https://philfeldman.com/GroupPolarization/GroupPolarizationModel.jar
Meeting with Don
- Played around with the app a bit. There may be other phases!
- Add the ability to set maxSlew and slew Variance on init. Can comment out particle selection.

8:30 – 3:30 BRC

Assembling notes from the trip – done
Starting to work on automated clustering. Nope, Jira tasking. Done enough…?

Phil 2.8.17

8:00 – 8:00 BRC

At Wall. Brown Bag presentation today – went well, I think
BRC demo went well too
On Slashdot today: De-anonymizing Web Browsing Data with Social Networks
Apache cTAKES™ – clinical Text Analysis Knowledge Extraction System
Worked on the ML architecture. At four pages and counting.
