Author Archives: pgfeldman

Phil 2.22.17

7:00 – 2:00 Research

  • Starting full paper
  • Finished porting abstract into gdocs
  • Working on adding the DTW work. Building charts. Lots of charts.

2:00 – 6:00 BRC

  • Worked with Aaron on accessing the classifier microservice
  • Writing up DTW as a mechanism for predicting behaviors
  • Found my old scripting engine code. Need to download and check

Phil 2.21.17

7:00 – 12:00 Research

import net.sf.javaml.distance.fastdtw.dtw.FastDTW;
import net.sf.javaml.distance.fastdtw.timeseries.TimeSeries;
import net.sf.javaml.distance.fastdtw.timeseries.TimeSeriesPoint;

// Two one-dimensional series: a sine wave and an amplitude-scaled,
// optionally phase-shifted copy of it.
TimeSeries tsI = new TimeSeries(1);
TimeSeries tsJ = new TimeSeries(1);

double t = 0;
double offset = 0.0;
double amplitude = 2.0;
double step = 0.1;
while (t < 10) {
    double[] v1 = {Math.sin(t)};
    double[] v2 = {Math.sin(t + offset) * amplitude};
    tsI.addLast(t, new TimeSeriesPoint(v1));
    tsJ.addLast(t, new TimeSeriesPoint(v2));
    t += step;
}

System.out.println("FastDTW.getWarpDistBetween(tsI, tsJ) = " + FastDTW.getWarpDistBetween(tsI, tsJ));

FastDTW.getWarpDistBetween(tsI, tsJ) = 46.33334518229166
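As a sanity check on the library call, the textbook O(n²) DTW recurrence can be written out directly. This is only a sketch, not the javaml implementation: it assumes 1-D series and an absolute-difference point cost, so its numbers won't necessarily match FastDTW's radius-limited search.

```java
// Minimal 1-D dynamic time warping: cost(i,j) = |a[i]-b[j]|,
// D(i,j) = cost + min(D(i-1,j), D(i,j-1), D(i-1,j-1)).
public class SimpleDtw {
    public static double dist(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                d[i][j] = cost + Math.min(d[i - 1][j - 1],
                                 Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        // Same sin() vs. 2*sin() comparison as above: an identical series
        // warps to 0, the amplitude-doubled one cannot.
        double[] a = new double[100], b = new double[100];
        for (int k = 0; k < 100; k++) {
            double t = k * 0.1;
            a[k] = Math.sin(t);
            b[k] = Math.sin(t) * 2.0;
        }
        System.out.println("dist(a, a) = " + dist(a, a));
        System.out.println("dist(a, b) = " + dist(a, b));
    }
}
```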
  • Note that the measure can be taken across all of the dimensions, so this may take some refactoring
  • Next step is to add this to the FlockRecorder class and output to Excel and ARFF. I think this should replace the ‘deltas’ outputs. Done!
  • Running DBSCAN clustering in WEKA on the outputs
    • All Exploit – Social Radius = 0: All NOISE
    • All Exploit – Social Radius = 0.1: All NOISE
    • All Exploit – Social Radius = 0.2 (32 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       68 (100%)
      
      Unclustered instances : 32
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  -- assigned to cluster
       68 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      
      Incorrectly clustered instances :	0.0	  0      %
    • All Exploit – Social Radius = 0.4 (86 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       14 (100%)
      
      Unclustered instances : 86
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  -- assigned to cluster
       14 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      
      Incorrectly clustered instances :	0.0	  0      %
    • All Exploit – Social Radius = 0.8 (41 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       45 ( 76%)
      1        7 ( 12%)
      2        7 ( 12%)
      
      Unclustered instances : 41
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  1  2  -- assigned to cluster
       45  7  7 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      Cluster 1 -- No class
      Cluster 2 -- No class
      
      Incorrectly clustered instances :	14.0	 14      %
    • All Exploit – Social Radius = 1.6 (51 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       49 (100%)
      
      Unclustered instances : 51
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  -- assigned to cluster
       49 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      
      Incorrectly clustered instances :	0.0	  0      %
    • All Exploit – Social Radius = 3.2 (9 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances 
      
      0       91 (100%)
      
      Unclustered instances : 9
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  -- assigned to cluster
       91 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      
      Incorrectly clustered instances :	0.0	  0      %
    • All Exploit – Social Radius = 6.4 (8 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       86 ( 93%)
      1        6 (  7%)
      
      Unclustered instances : 8
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  1  -- assigned to cluster
       86  6 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      Cluster 1 -- No class
      
      Incorrectly clustered instances :	6.0	  6      %
      
    • All Exploit – Social Radius = 10 (10 NOISE)
      === Model and evaluation on training set ===
      
      Clustered Instances
      
      0       82 ( 91%)
      1        8 (  9%)
      
      Unclustered instances : 10
      
      Class attribute: AgentBias_
      Classes to Clusters:
      
        0  1  -- assigned to cluster
       82  8 | EXPLOITER
      
      Cluster 0 -- EXPLOITER
      Cluster 1 -- No class
      
      Incorrectly clustered instances :	8.0	  8      %
  • So what this all means is that DTW produces reasonable data for clustering. The results seem to match the plots. I think I can write this up now…
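For reference, the NOISE counts in those runs are DBSCAN leaving low-density points unlabeled. A minimal 1-D sketch of that mechanism (not WEKA's implementation, and 1-D only; the real inputs are the DTW features):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Minimal 1-D DBSCAN: points with at least minPts neighbors within eps seed
// clusters and expand; points reachable from no core point stay NOISE (-1).
public class MiniDbscan {
    private static final int UNSEEN = Integer.MIN_VALUE;

    /** Returns labels: -1 = NOISE, 0..k = cluster ids. */
    public static int[] cluster(double[] pts, double eps, int minPts) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNSEEN);
        int id = -1;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNSEEN) continue;
            List<Integer> nbrs = range(pts, i, eps);
            if (nbrs.size() < minPts) { label[i] = -1; continue; }  // noise (for now)
            id++;
            label[i] = id;
            Deque<Integer> queue = new ArrayDeque<>(nbrs);
            while (!queue.isEmpty()) {
                int q = queue.pop();
                if (label[q] == -1) label[q] = id;      // noise becomes a border point
                if (label[q] != UNSEEN) continue;
                label[q] = id;
                List<Integer> qn = range(pts, q, eps);
                if (qn.size() >= minPts) queue.addAll(qn);  // core point: keep expanding
            }
        }
        return label;
    }

    private static List<Integer> range(double[] pts, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < pts.length; j++)
            if (Math.abs(pts[j] - pts[i]) <= eps) out.add(j);
        return out;
    }
}
```

Sweeping eps (or, indirectly, anything that changes how tightly the agents bunch) is what moves points in and out of the NOISE bucket.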

12:00 – 5:00 BRC

  • Clustering discussions with Aaron
  • GEM Meeting

Phil 2.20.17

7:00 – 11:00 Research

  • PathNet article and paper. Using genetic techniques to produce better NN systems. GAs are treated like gradient descent, which makes sense, since gradient descent and hillclimbing are pretty much the same thing
    • “Since scientists started building and training neural networks, Transfer Learning has been the main bottleneck. Transfer Learning is the ability of an AI to learn from different tasks and apply its pre-learned knowledge to a completely new task. It is implicit that with this precedent knowledge, the AI will perform better and train faster than de novo neural networks on the new task.”
  • Adding angle and mean deltas. Interesting results, but still not sure on the best approach to classify…
  • Newest version is at philfeldman.com/GroupPolarization
  • So here’s a pretty typical population. It’s 10% Explorer, 90% Exploiter. Exploit social influence radius is 0.2. These settings produce an orbiting flock, and between-group interaction is allowed. This is a grid where the accumulated relationship of each agent to every other agent is shown; red is closest, green is farthest: colorizedpositions You can see the different populations pretty well. One thing that isn’t that obvious is that exploiters are on average slightly closer to each other than to explorers.
  • A more extreme example is where the Exploit influence distance is 10: colorizedpositions2 These tables show just relative position when compared to the origin.
  • Although I can’t figure out how to classify using this data, clustering works pretty well. This is Canopy (WEKA) on the top dataset above:
    === Run information ===
    
    Scheme: weka.clusterers.Canopy -N -1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t2 -1.0 -t1 -1.25 -S 1
    Relation: ORIGIN_POSITION_DELTA
    Instances: 100
    Attributes: 102
    [list of attributes omitted]
    Test mode: Classes to clusters evaluation on training data
    
    === Clustering model (full training set) ===
    
    Canopy clustering
    =================
    
    Number of canopies (cluster centers) found: 2
    T2 radius: 3.137
    T1 radius: 3.922
    
    Cluster 0: 0.283631,0.443357,0.240249,0.280277,0.396611,0.258673,0.28608,0.27558,0.312295,0.215801,0.249255,0.25779,0.280719,0.273191,0.58818,0.258901,0.196191,0.240405,0.201927,0.273491,0.271862,0.266807,0.249377,0.269756,0.265874,0.252873,0.299417,0.244208,0.284257,0.253868,0.234348,0.213578,0.242031,0.248292,0.215259,0.236993,0.301843,0.245444,0.282464,0.290885,0.216585,0.375846,0.223493,0.278251,0.375965,0.764462,0.338657,0.280672,0.316447,0.261622,0.265026,0.436098,0.246442,0.246887,0.289306,0.470806,0.43541,0.209845,0.220971,0.21506,0.247576,0.249173,0.468053,0.28907,0.418987,0.293851,0.452858,0.267638,0.243671,0.248868,0.242674,0.371534,0.29843,0.221506,0.25575,0.242182,0.335877,0.28386,0.303986,0.235298,0.282083,0.427425,0.26635,0.251009,0.304134,0.281157,0.212644,0.367693,0.222213,0.247862,0.780248,0.894699,0.713413,0.865287,0.826024,0.868741,0.757008,0.807287,0.785141,0.756071,{88}
    Cluster 1: 0.919922,0.669721,0.908035,0.73578,0.591465,0.752733,0.774358,0.826861,0.84364,0.884803,0.939301,0.958981,0.629587,0.76459,0.545587,0.715267,0.853073,0.803545,0.851979,0.693952,0.954557,0.703606,0.897206,0.698297,0.926263,0.91898,0.733686,0.818759,0.763319,0.776199,0.843167,0.811708,0.903011,0.814435,0.804113,0.916336,0.639919,0.779399,0.663897,0.754696,0.77482,0.682512,0.832556,0.764008,0.703999,0.513612,0.693526,0.734279,0.723504,0.903016,0.777757,0.597915,0.86509,0.900357,0.724636,0.648915,0.577278,0.883327,0.828117,0.813873,0.860062,0.915821,0.684886,0.979451,0.556747,0.667678,0.556487,0.941671,0.898276,0.902846,0.686763,0.664381,0.709607,0.706246,0.890753,0.898794,0.588379,1.001214,0.625244,0.761188,0.828436,0.661864,0.759379,0.944355,0.728272,0.764909,0.761139,0.65028,0.845547,0.87213,0.586679,0.500194,0.498893,0.513267,0.493026,0.58192,0.620756,0.469854,0.540532,0.496272,{12}
    
    Time taken to build model (full training data) : 0.03 seconds
    
    === Model and evaluation on training set ===
    
    Clustered Instances
    
    0 88 ( 88%)
    1 12 ( 12%)
    
    Class attribute: AgentBias_
    Classes to Clusters:
    
    0 1 -- assigned to cluster
    0 10 | EXPLORER
    88 2 | EXPLOITER
    
    Cluster 0 -- EXPLOITER
    Cluster 1 -- EXPLORER
    
    Incorrectly clustered instances : 2.0 2 %
  • The next analysis is on the second dataset. The results are essentially the same, even though the differences are more dramatic (the tight clusters are very tight):
    === Run information ===
    
    Scheme:       weka.clusterers.Canopy -N -1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t2 -1.0 -t1 -1.25 -S 1
    Relation:     ORIGIN_POSITION_DELTA
    Instances:    100
    Attributes:   102
                  [list of attributes omitted]
    Test mode:    Classes to clusters evaluation on training data
    
    === Clustering model (full training set) ===
    
    
    Canopy clustering
    =================
    
    Number of canopies (cluster centers) found: 2
    T2 radius: 3.438     
    T1 radius: 4.297     
    
    Cluster 0: 0.085848,0.050964,0.0513,0.053288,0.05439,0.054653,0.21758,0.057725,0.058775,0.050894,0.053768,0.130821,0.051098,0.050923,0.051115,0.050893,0.051012,0.051009,0.060649,0.051454,0.051089,0.051032,0.050894,0.053364,0.276684,0.051857,0.050984,0.050942,0.0509,0.050952,0.051025,0.056953,0.050914,0.050962,0.050903,0.052129,0.128196,0.051023,0.054222,0.274438,0.053978,0.050934,0.051124,0.054563,0.050995,0.074289,0.051077,0.05094,0.053644,0.050941,0.051343,0.050967,0.062704,0.052333,0.050936,0.051013,0.050922,0.051007,0.051038,0.050899,0.501239,0.051574,0.051005,0.050898,0.050944,0.204398,0.06076,0.050947,0.050904,0.408553,0.051263,0.0511,0.051574,0.069173,0.050997,0.162314,0.051353,0.096523,0.498648,0.339103,0.051125,0.050888,0.051002,0.051124,0.080711,0.05105,0.051024,0.050988,0.100492,0.132793,0.630178,0.882598,0.832132,0.86452,0.55151,0.729317,0.755526,0.513822,0.782104,0.768836,{92} 
    Cluster 1: 0.799117,0.793729,0.79643,0.7929,0.797843,0.797642,0.709935,0.78817,0.805937,0.794095,0.7972,0.76062,0.793743,0.79418,0.794846,0.794247,0.794677,0.793599,0.800359,0.794787,0.793849,0.793805,0.793613,0.784762,0.774656,0.79547,0.794308,0.793527,0.794406,0.793292,0.793513,0.800151,0.793775,0.793652,0.794123,0.793645,0.73331,0.794506,0.788542,0.710244,0.793332,0.793313,0.794184,0.801119,0.79448,0.802416,0.793669,0.7947,0.794813,0.794533,0.796484,0.794512,0.797614,0.794607,0.793716,0.793642,0.793548,0.794789,0.793551,0.793989,0.539133,0.79391,0.793443,0.793969,0.794472,0.715896,0.790956,0.794494,0.794293,0.678147,0.79434,0.793611,0.794221,0.802197,0.793753,0.759132,0.794164,0.798071,0.55929,0.698333,0.79444,0.79424,0.793585,0.793581,0.779958,0.79394,0.793567,0.794795,0.764686,0.754727,0.482214,0.518683,0.434538,0.501648,0.790616,0.4855,0.464554,0.691735,0.405411,0.496892,{8} 
    
    
    
    Time taken to build model (full training data) : 0.01 seconds
    
    === Model and evaluation on training set ===
    
    Clustered Instances
    
    0       88 ( 88%)
    1       12 ( 12%)
    
    
    Class attribute: AgentBias_
    Classes to Clusters:
    
      0  1  -- assigned to cluster
      0 10 | EXPLORER
     88  2 | EXPLOITER
    
    Cluster 0 -- EXPLOITER
    Cluster 1 -- EXPLORER
    
    Incorrectly clustered instances :	2.0	  2      %
  • Online clustering, fear and uncertainty in Egypt’s transition (Published today). Wow. Downloaded
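For reference, the T1/T2 radii in the Canopy runs above are the standard loose/tight thresholds. A minimal sketch of the two-threshold loop for 1-D points (WEKA's version samples candidates and prunes by density; this is just the core idea):

```java
import java.util.ArrayList;
import java.util.List;

// Two-threshold canopy pass: every point within the loose radius T1 of a
// center joins that canopy; points within the tight radius T2 can no longer
// seed new centers. Canopies may overlap, which is the point of the method.
public class CanopySketch {
    public static List<List<Double>> canopies(double[] pts, double t1, double t2) {
        List<Double> seeds = new ArrayList<>();
        for (double p : pts) seeds.add(p);
        List<List<Double>> out = new ArrayList<>();
        while (!seeds.isEmpty()) {
            final double c = seeds.get(0);               // cheap seed pick
            List<Double> canopy = new ArrayList<>();
            for (double p : pts)
                if (Math.abs(p - c) < t1) canopy.add(p); // loose T1 membership
            out.add(canopy);
            seeds.removeIf(p -> Math.abs(p - c) < t2);   // tight T2 removal
        }
        return out;
    }
}
```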

11:00 – 6:00 BRC

  • Spent the rest of the day working on the CHIMERA paper with Aaron

Phil 2.17.17

7:00 – 8:00 research

  • I think I want to navigate the information space of Trump’s tweets
  • Still working on how to classify an agent. After struggling a bit, I can classify very well if I eliminate extraneous info from the mean angle stats, leaving only bias and variance

8:30 – 10:30, 4:00 – 5:00

  • Working on creating, extracting and classifying cluster membership from flocks.
  • Had to leave early to help Barbara with Buck
  • Discussed exec summary with Aaron. Will write on Monday

Phil 2.16.17

7:00 – 8:00 Research

  • Had a great time NOT DOING ANY THINKING yesterday
  • Rechecking the velocity comparison matrix. It’s correct. Looking at multiplying or adding relative position vs relative velocity
  • Sent a few charts to Don to see if he can make anything pretty
  • Uploaded new version

8:30 – 5:00 BRC

Phil 2.14.17

7:00 – 8:00 Research

  • Based on the charts from yesterday, I think I’m going to build two matrices to point WEKA at. Essentially, these matrices will be filled with meta-cluster information
    • Average distance from agent to agent. Tightly clustered agents should have low average distances. DBSCAN should also work on this, as well as bootstrapping. That should cover this case: exploit_vs_exploit
    • Average velocity from agent to agent. I’m not sure what I’ll get from this, but in looking at the explore-explore case and the explore-exploit case, it strikes me that there may be some difference that is meaningful. And in the exploit-exploit case, the velocities should be near zero: explore_vs_exploit (Explore-exploit), explore_vs_explore (Explore-explore)
    • Start with Excel, and then add an ARFF
      • Got most of the methods built. Might finish this morning at work.
      • Indeed, you can get a lot done when you’re sitting in on a Skype meeting and they’re not talking about your part…
      • Ok, so I’ve added comparison matrices as Excel and ARFF output. In this case WEKA does better charting, so here goes. The first chart is exploit-exploit. Note that the majority of points are at 0,0: explit-exploit-deltas Next, an explore-exploit; in this case, there’s a cluster on the left side of the chart: exploit-explore-deltas Last is the explore-explore chart, which has a cluster towards the middle: explore-explore-deltas
      • This data also seems to be good to train a NaiveBayes Classifier. Here’s the result of an initial run:
        === Stratified cross-validation ===
        === Summary ===
        
        Correctly Classified Instances          97               97      %
        Incorrectly Classified Instances         3                3      %
        Kappa statistic                          0.94  
        Mean absolute error                      0.03  
        Root mean squared error                  0.1732
        Relative absolute error                  6      %
        Root relative squared error             34.6337 %
        Total Number of Instances              100     
        
        === Detailed Accuracy By Class ===
        
                         TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                         1.000    0.059    0.942      1.000    0.970      0.942    0.971     0.942     EXPLORER
                         0.941    0.000    1.000      0.941    0.970      0.942    1.000     1.000     EXPLOITER
        Weighted Avg.    0.970    0.029    0.972      0.970    0.970      0.942    0.986     0.972     
        
        === Confusion Matrix ===
        
          a  b   -- classified as
         49  0 |  a = EXPLORER
          3 48 |  b = EXPLOITER
      • Velocity also works. The plots aren’t as crisp, but the classifier accuracy is about the same: exploit-exploit-velocity (Exploit-Exploit), exploit-explore-velocity (Explore-Exploit), explore-explore-velocity (Explore-Explore)
      • Again, classification looks good:
        === Stratified cross-validation ===
        === Summary ===
        
        Correctly Classified Instances          99               99      %
        Incorrectly Classified Instances         1                1      %
        Kappa statistic                          0.98  
        Mean absolute error                      0.01  
        Root mean squared error                  0.1   
        Relative absolute error                  2      %
        Root relative squared error             19.9957 %
        Total Number of Instances              100     
        
        === Detailed Accuracy By Class ===
        
                         TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                         1.000    0.020    0.980      1.000    0.990      0.980    1.000     1.000     EXPLORER
                         0.980    0.000    1.000      0.980    0.990      0.980    1.000     1.000     EXPLOITER
        Weighted Avg.    0.990    0.010    0.990      0.990    0.990      0.980    1.000     1.000     
        
        === Confusion Matrix ===
        
          a  b   -- classified as
         49  0 |  a = EXPLORER
          1 50 |  b = EXPLOITER
    • Uploaded new version of the tool to philfeldman.com/GroupPolarization/GroupPloarizationModel.jar
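Both kappa values reported above can be recomputed straight from their confusion matrices: kappa = (p_o - p_e) / (1 - p_e), with chance agreement p_e taken from the row and column marginals. A quick sketch:

```java
// Cohen's kappa from a square confusion matrix: observed accuracy on the
// diagonal vs. chance agreement computed from the marginals.
public class Kappa {
    public static double kappa(int[][] m) {
        int n = m.length;
        double total = 0, diag = 0;
        double[] row = new double[n], col = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                total += m[i][j];
                row[i] += m[i][j];
                col[j] += m[i][j];
                if (i == j) diag += m[i][j];
            }
        double po = diag / total;   // observed accuracy
        double pe = 0;              // expected agreement by chance
        for (int i = 0; i < n; i++) pe += (row[i] / total) * (col[i] / total);
        return (po - pe) / (1 - pe);
    }

    public static void main(String[] args) {
        // The two WEKA runs above report kappa 0.94 and 0.98.
        System.out.println(kappa(new int[][]{{49, 0}, {3, 48}}));
        System.out.println(kappa(new int[][]{{49, 0}, {1, 50}}));
    }
}
```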

8:30 – 3:30. BRC

  • Either start on the ResearchBrowser or continue with meta-clustering.
  • Grooming and sprint planning today – done! And good progress while hanging out on the phone.

Phil 2.13.17

7:00 – 8:00, 3:00 – 5:30 Research

  • Getting the toArff methods working with the WEKA date format: The default format string accepts the ISO-8601 combined date and time format: yyyy-MM-dd'T'HH:mm:ss
  • Going to go with the milliseconds format, and then set the ARFF to default to 1 sec increments. Nope, that didn’t work. Going with the default format and incrementing by seconds.
  • Fika
  • Meeting with Wayne. Basically catching up, but I also got to show him the sim. Even for a numerical model of community behavior based on sociophysics, we got into a very involved conversation. Visualization helps tell stories.
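A sketch of the ARFF output the toArff work aims at, using WEKA's documented default date format (the ISO-8601 combined form) and one-second increments. The relation name, attribute names, and start timestamp here are hypothetical:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Emit a tiny ARFF with a date attribute; WEKA's default date format is
// yyyy-MM-dd'T'HH:mm:ss, and rows here step by one second.
public class ArffSketch {
    public static String toArff(double[] values) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");
        StringBuilder sb = new StringBuilder();
        sb.append("@relation flock_sample\n\n");
        sb.append("@attribute timestamp date \"yyyy-MM-dd'T'HH:mm:ss\"\n");
        sb.append("@attribute value numeric\n\n");
        sb.append("@data\n");
        LocalDateTime t = LocalDateTime.of(2017, 2, 13, 0, 0, 0);  // arbitrary start
        for (double v : values) {
            sb.append(t.format(fmt)).append(",").append(v).append("\n");
            t = t.plusSeconds(1);   // default: one-second increments
        }
        return sb.toString();
    }
}
```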

8:30 – 2:30 BRC

  • Testing periodicity and time series classifiers
  • I can forecast in a bunch of different ways, but I can’t seem to classify. There are several filters, but no real explanation
  • Redoing some simulations, to see the best way of checking for Explorer/exploiter behavior. Here are some screenshots:
  • Explore-Explore explore_vs_explore
  • Explore-Exploit explore_vs_exploit
  • Exploit-Exploit exploit_vs_exploit

Phil 2.10.17

7:00 – 8:30 Research

  • Adding the ability to set maxSlew and slewVariance on init. Can comment out particle selection. Done!
  • Commenting out particle to save GUI space Done!
  • Using ATLAS.ti 8 Windows in Literature Reviews
  • Uploaded the newest version and pinged Don.

10:00 – 5:00 BRC

  • The Porsche didn’t start, so I had to bike out and get the Honda. Brrr!
  • Submit for travel expenses
  • Learning how to do time series in WEKA
  • Looks like I need to rotate the matrix so that each column is an agent, and each row is a time step. Yep. Done. Had to add a case to toArff() where there are no row names
  • Discovered the arff viewer, which is awesome
  • Discovered the plugin manager. Nice. wrappers for other AI packages!
  • Predictive learning: you just need to read in the file, then use the Forecast tab, select the time column, and then the items you want to track: flockingprediction
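The rotation mentioned above, so that each column is an agent and each row is a time step, is just a transpose before writing the ARFF. A sketch (assumes the original layout was agent-major):

```java
// Transpose agent-major data [agent][timestep] into the row-per-timestep,
// column-per-agent layout WEKA's time series forecasting expects.
public class Rotate {
    public static double[][] transpose(double[][] m) {
        double[][] out = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                out[j][i] = m[i][j];
        return out;
    }
}
```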

Phil 2.9.17

7:00 – 8:30, 4:00 – 5:00  Research

8:30 – 3:30 BRC

  • Assembling notes from the trip – done
  • Starting to work on automated clustering. Nope, Jira tasking. Done enough…?

Phil 2.1.17

7:00 – 8:00 Research

  • Notes from meeting with Don to go over abstract
    • See how boids work if speed is integrated with the size of the vector. Should be a matter of commenting out the normalization step and forcing speed to be 1
    • Worked out the flocking equation, need to learn the equation editor…
  • Submitted! Now it needs to get fleshed out into a full paper
  • Continuing Filter bubbles, echo chambers, and online news consumption
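The change from the meeting with Don, integrating speed with the size of the combined steering vector rather than normalizing and forcing speed to 1, really is just one branch. A sketch (2-D, and not the tool's actual code):

```java
// Classic boids step normalizes the combined steering vector to unit length
// and applies a fixed speed; the variant keeps the vector's own magnitude,
// so agents with stronger combined influences move faster.
public class BoidStep {
    public static double[] velocity(double[] steer, boolean normalize) {
        double mag = Math.hypot(steer[0], steer[1]);
        if (!normalize || mag == 0) return steer.clone();        // speed = |steer|
        return new double[]{steer[0] / mag, steer[1] / mag};     // unit speed
    }
}
```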

8:30 – 5:00 BRC

  • Talking with Aaron about Research Browser and Clustering proposals
  • Working on clustering code – trying out some different agent motion code

Phil 1.31.17

Do today

7:00 – 8:00 Research

  • I have a server! tacjour.rs.umbc.edu
  • Starting Filter bubbles, echo chambers, and online news consumption
    • Seth R. Flaxman – I am currently undertaking a postdoc with Yee Whye Teh at Oxford in the computational statistics and machine learning group in the Department of Statistics. My research is on scalable methods and flexible models for spatiotemporal statistics and Bayesian machine learning, applied to public policy and social science areas including crime, emotion, and public health. I helped make a very accessible animation answering the question, What is Machine Learning?
    • Sharad Goel – I’m an Assistant Professor at Stanford in the Department of Management Science & Engineering (in the School of Engineering). I also have courtesy appointments in Sociology and Computer Science. My primary area of research is computational social science, an emerging discipline at the intersection of computer science, statistics, and the social sciences. I’m particularly interested in applying modern computational and statistical techniques to understand and improve public policy.
    • Justin M. Rao – I am a Senior Researcher at Microsoft Research. A member of our New York City lab, an interdisciplinary research group combining social science with computational and theoretical methods, I am currently located at company HQ in the Seattle area, where I am also an Affiliate Professor of Economics at the University of Washington.
    • Spearman’s Rank-Order Correlation
    • Goel, Mason, and Watts (2010) show that a substantial fraction of ties in online social networks are between individuals on opposite sides of the political spectrum, opening up the possibility for diverse content discovery. [p 299]
      • I think this helps in areas where flocking can occur. Changing heading is hardest when opinions are moving in opposite directions. Finding a variety of perspectives may change the dynamic.
    • Specifically, users who predominately visit left-leaning news outlets only very
      rarely read substantive news articles from conservative sites, and vice versa
      for right-leaning readers, an effect that is even more pronounced for opinion
      articles.

      • Is the range of information available from left or right-leaning sites different? Is there another way to look at the populations? I think it’s very easy to get polarized left or right, but seeking diversity is different, and may have a pattern of seeking less polarized voices?
    • Interestingly, exposure to opposing perspectives is higher for the
      channels associated with the highest segregation, search, and social. Thus,
      counterintuitively, we find evidence that recent technological changes both
      increase and decrease various aspects of the partisan divide.

      • To me this follows, because anti belief helps in the polarization process.
    • We select an initial universe of news outlets (i.e., web domains) via the Open Directory Project (ODP, dmoz.org), a collective of tens of thousands of editors who hand-label websites into a classification hierarchy. This gives 7,923 distinct domains labeled as news, politics/news, politics/media, and regional/news. Since the vast majority of these news sites receive relatively little traffic,
      •  Still a good option for mapping. Though I’d like to compare with schema.org
    • Specifically, our primary analysis is based on the subset of users who have read at least ten substantive news articles and at least two opinion pieces in the three-month time frame we consider. This first requirement reduces our initial sample of 1.2 million individuals to 173,450 (14 percent of the total); the second requirement further reduces the sample to 50,383 (4 percent of the total). These numbers are generally lower than past estimates, likely because of our focus on substantive news and opinion (which excludes sports, entertainment, and other soft news), and our explicit activity measures (as opposed to self-reports).
      • Good indicator of explore-exploit in the user population at least in the context of news.
    • We now define the polarity of an individual to be the typical polarity of the news outlet that he or she visits. We then define segregation to be the expected distance between the polarity scores of two randomly selected users. This definition of segregation, which is in line with past work (Dandekar, Goel, and Lee 2013), intuitively captures the idea that segregated populations are those in which pairs of individuals are, on average, far apart.
      • This fits nicely with my notion of belief space
    • ideological-segregation-across-channels
      • This is interesting. Figure 3 shows that aggregators and direct (which have some level of external curation) are substantially less polarized than the social and search-based channels. That’s a good indicator that the visible information horizon makes a difference in what is accessed.
    • our findings do suggest that the relatively recent ability to instantly query large corpora of news articles—vastly expanding users’ choice sets—contributes to increased ideological segregation
      • The frictionlessness of being able to find exactly what you want to see, without being exposed to things that you disagree with.
    • In particular, that level of segregation corresponds to the ideological distance between Fox News and Daily Kos, which represents meaningful differences in coverage (Baum and Groeling 2008) but is within the mainstream political spectrum. Consequently, though the predicted filter bubble and echo chamber mechanisms do appear to increase online segregation, their overall effects at this time are somewhat limited.
      • But this depends on how opinion is moving. We are always redefining normal. It would also be good to look at the news producers using this approach…?
    • This finding of within-user ideological concentration is driven in part by the fact that individuals often simply turn to a single news source for information: 78 percent of users get the majority of their news from a single publication, and 94 percent get a majority from at most two sources. …even when individuals visit a variety of news outlets, they are, by and large, frequenting publications with similar ideological perspectives.
    • opposingpartisanexposure
      • Although I think focusing on ‘opposing’ rather than ‘diverse’ biases these results, this still shows that populations of users behave differently, and that the channel has a distinct effect.
    • …relatively high within-user variation is a product of reading a variety of centrist and right-leaning outlets, and not exposure to truly ideologically diverse content.
      • So left leaning is more diverse across ideology
    • the outlets that dominate partisan news coverage are still relatively mainstream, ranging from the New York Times on the left to Fox News on the right; the more extreme ideological sites (e.g., Breitbart), which presumably benefited from the rise of online publishing, do not appear to qualitatively impact the dynamics of news consumption.
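The segregation definition quoted above, the expected distance between the polarity scores of two randomly selected users, reduces for a sample to the mean absolute pairwise difference:

```java
// Segregation as defined in the paper: expected |polarity_i - polarity_j|
// over distinct user pairs. Far-apart pairs on average = segregated population.
public class Segregation {
    public static double segregation(double[] polarity) {
        double sum = 0;
        int pairs = 0;
        for (int i = 0; i < polarity.length; i++)
            for (int j = i + 1; j < polarity.length; j++) {
                sum += Math.abs(polarity[i] - polarity[j]);
                pairs++;
            }
        return sum / pairs;
    }
}
```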

8:30 – 4:00 BRC

  • Finished a second pass through the ResearchBrowser white paper
  • Thinking about optimal sequential clustering
    • A Framework of Mining Semantic Regions from Trajectories
    • This also makes me wonder if we should be looking at our patients as angle from mean
    • Phase 1: optimize current algorithm to hillclimb for the most clustered and fewest unclustered instances by varying EPS for a given cluster minimum
    • Phase 2: Do NMF analysis of patient clusters to extract meaningful labels
    • Phase 3: Model patient trajectories through diagnosis space
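Phase 1's EPS hillclimb could be as simple as stepping while the clustered-instance count improves. A sketch, where `score` is a hypothetical stand-in for running DBSCAN and counting clustered (non-noise) instances:

```java
import java.util.function.DoubleUnaryOperator;

// Greedy hillclimb over a single parameter: keep stepping eps upward while
// the score (e.g., number of clustered instances) improves, stop at the
// first step that doesn't help.
public class EpsClimb {
    public static double climb(DoubleUnaryOperator score, double eps, double step) {
        double best = score.applyAsDouble(eps);
        while (true) {
            double next = score.applyAsDouble(eps + step);
            if (next <= best) return eps;   // no improvement: stop here
            best = next;
            eps += step;
        }
    }
}
```

This only finds a local optimum, which is probably fine for a first pass; a coarse sweep plus refinement would be the next step if the score surface turns out to be bumpy.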