
Phil 4.22.16

7:00 – 4:30 VTX

  • Had a thought while going to sleep last night that it would be interesting to see the difference between a ‘naive’ ranking based on the number of quotes and PageRank. Pretty much as soon as I got up, I pulled down the spreadsheet and got the lists. They’re in the previous post, but I’ll post them here too:
    • Sorted from most to least quotes
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      P 5: Saracevic_relevance_75.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      P77: The Law of Group Polarization.pdf
      P43: On the Accuracy of Media-based Conflict Event Data.pdf
      System Trust
      P37: Security-control methods for statistical databases – a comparative study.pdf
    • Sorted by PageRank eigenvector
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      System Trust
      Social Trust
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      Credibility Cues
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • To me it’s really interesting how much better the codes are mixed into the results. I actually thought it could go the other way, since the codes are common across many papers. Also, the concepts of System Trust, Social Trust and Credibility Cues very much became central in my mind as I worked through the papers.
  • A second thought, which is the next step in the research, is to see how weighting affects relationships. Right now, the papers and codes are weighted by the number of quotes. What happens when all the weights are normalized (set to 1.0)? And then there is the question of interactivity. With zero optimizations, this took 4.2 seconds to calculate on a modern laptop. Not slider-bar rates, but fast enough to change one or more values and click a ‘Run’ button.
  • So, moving forward, the next steps are to create the Swing App that will:
    • read in a spreadsheet (xls and xlsx)
    • Write out spreadsheets, with a page containing information about the run:
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
  • I’d like to do something with the connectivity graph. Not sure what yet.
  • And I think I’ll do this in JavaFX rather than Swing this time.
  • Huh. JavaFX Scene Builder is no longer supported by Oracle. Now it’s a Gluon project.
  • The documentation still seems to be at Oracle, though.
  • Spent most of the day seeing what’s going on with the Crawl. Turns out it was bad formatting on the terms?
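The ‘Normalize all’ button and the ‘+/- change in rank’ display from the feature list don’t depend on the GUI toolkit, so they could be written and tested before settling on Swing vs. JavaFX. A minimal sketch, assuming the paper/code weights sit in a `double[][]` and each ranking is a `List<String>` of labels; `RankingExperiment`, `normalizeAll` and `rankChanges` are hypothetical names, not existing code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical GUI-free core for the planned app; names are illustrative only. */
class RankingExperiment {

    /**
     * "Normalize all": every non-zero weight becomes 1.0, zeros stay 0.0.
     * Returns a copy, so the original quote counts survive for "Reset all".
     */
    static double[][] normalizeAll(double[][] weights) {
        double[][] out = new double[weights.length][];
        for (int i = 0; i < weights.length; i++) {
            out[i] = new double[weights[i].length];
            for (int j = 0; j < weights[i].length; j++) {
                out[i][j] = weights[i][j] != 0.0 ? 1.0 : 0.0;
            }
        }
        return out;
    }

    /**
     * Labels each item of the new ordering with its movement versus the old one:
     * "+n" = moved up n places, "-n" = moved down n places, "0" = unchanged,
     * "new" = not present in the old ordering.
     */
    static List<String> rankChanges(List<String> before, List<String> after) {
        Map<String, Integer> oldPos = new HashMap<>();
        for (int i = 0; i < before.size(); i++) oldPos.put(before.get(i), i);

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < after.size(); i++) {
            String item = after.get(i);
            Integer old = oldPos.get(item);
            String change;
            if (old == null) {
                change = "new";
            } else if (old > i) {
                change = "+" + (old - i);
            } else {
                change = String.valueOf(old - i); // "0" or a negative number like "-3"
            }
            lines.add((i + 1) + ". " + item + " (" + change + ")");
        }
        return lines;
    }
}
```

Fed the two top-10 lists from this entry (abbreviated to their P-numbers, with quote count as `before` and PageRank as `after`), `rankChanges` reports P85 at +3, System Trust at +7, P61 at -3, P 1 unchanged at 0, and Social Trust as new.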

Phil 4.21.16

7:00 – VTX

  • A little more bitcoin
  • Installed *another* new Java 1.8.0_92
  • Discovered the arXiv API page. This might be very helpful. I need to dig into it a bit.
  • Testing ranking code. I hate to say this, but if it works I think I’m going to write *another* Swing app to check interactivity rates. Which means I need to instrument the matrix calculations for timing.
  • Ok, the rank table is consistent across all columns. In my test code, the eigenvector stabilizes after 5 iterations:
    initial
     , col1, col2, col3, col4,
    row1, 11, 21, 31, 41,
    row2, 12, 22, 32, 42,
    row3, 13, 23, 33, 43,
    
    derived
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 1, 0, 0, 0.26, 0.49, 0.72, 0.95,
    row2, 0, 1, 0, 0.28, 0.51, 0.74, 0.98,
    row3, 0, 0, 1, 0.3, 0.53, 0.77, 1,
    col1, 0.26, 0.28, 0.3, 1, 0, 0, 0,
    col2, 0.49, 0.51, 0.53, 0, 1, 0, 0,
    col3, 0.72, 0.74, 0.77, 0, 0, 1, 0,
    col4, 0.95, 0.98, 1, 0, 0, 0, 1,
    
    rank
    , row1, row2, row3, col1, col2, col3, col4,
    row1, 0.61, 0.62, 0.64, 0.22, 0.41, 0.59, 0.78,
    row2, 0.62, 0.65, 0.67, 0.23, 0.42, 0.61, 0.8,
    row3, 0.64, 0.67, 0.69, 0.24, 0.43, 0.63, 0.83,
    col1, 0.22, 0.23, 0.24, 0.08, 0.15, 0.22, 0.29,
    col2, 0.41, 0.42, 0.43, 0.15, 0.27, 0.4, 0.52,
    col3, 0.59, 0.61, 0.63, 0.22, 0.4, 0.58, 0.76,
    col4, 0.78, 0.8, 0.83, 0.29, 0.52, 0.76, 1,
    
    EigenVec
    row1, 1, 0.71, 0.62, 0.61, 0.61, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row2, 0, 0.46, 0.61, 0.62, 0.62, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    row3, 0, 0.48, 0.63, 0.64, 0.64, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col1, 0.26, 0.13, 0.21, 0.22, 0.22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col2, 0.49, 0.25, 0.38, 0.41, 0.41, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col3, 0.72, 0.37, 0.55, 0.59, 0.59, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    col4, 0.95, 0.49, 0.73, 0.78, 0.78, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
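Instrumenting the matrix calculations for timing can be as light as bracketing each step with `System.nanoTime()`. A sketch, assuming matrices are plain `double[][]`; `MatrixTiming`, `time` and `square` are hypothetical names. A single measurement like this includes JIT warm-up, so it answers "is this fast enough to feel interactive?" rather than serving as a proper benchmark.

```java
import java.util.function.Supplier;

/** Hypothetical helpers for timing matrix steps; not the author's code. */
class MatrixTiming {

    /** A computed value plus the wall-clock time it took. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        double millis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the calculation once, bracketing it with System.nanoTime(). */
    static <T> Timed<T> time(Supplier<T> calculation) {
        long start = System.nanoTime();
        T result = calculation.get();
        return new Timed<>(result, System.nanoTime() - start);
    }

    /** Straightforward O(n^3) square of an n x n matrix: the step worth measuring. */
    static double[][] square(double[][] m) {
        int n = m.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    out[i][j] += m[i][k] * m[k][j];
        return out;
    }
}
```

For example, `MatrixTiming.time(() -> MatrixTiming.square(m)).millis()` gives the cost of one squaring step, which is the operation the eigenvector iteration repeats.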
  • And after a lot of banging away, here’s my lit review ranked by PageRank.
  • And here’s the difference between PageRank and sorting based on number of quotes:
  • PageRank
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    System Trust
    Social Trust
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    Credibility Cues
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • Sorted from most to least quotes
    P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
    P13: The Egyptian Blogosphere.pdf
    P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
    P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
    P 5: Saracevic_relevance_75.pdf
    P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
    P77: The Law of Group Polarization.pdf
    P43: On the Accuracy of Media-based Conflict Event Data.pdf
    System Trust
    P37: Security-control methods for statistical databases – a comparative study.pdf
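The ‘naive’ list is just totals: each paper’s row sum and each code’s column sum, merged and sorted, which is why a code like System Trust can land among the papers. A sketch, assuming each cell of the matrix holds the number of quotes for that paper/code pair; `QuoteCountRanking` and `rank` are hypothetical names.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical "naive" ranking: papers (rows) and codes (columns) ordered by total quotes. */
class QuoteCountRanking {

    static List<String> rank(String[] rowNames, String[] colNames, double[][] quotes) {
        List<Map.Entry<String, Double>> totals = new ArrayList<>();
        for (int i = 0; i < rowNames.length; i++) {
            double sum = 0.0;
            for (double q : quotes[i]) sum += q;       // all quotes from paper i
            totals.add(new AbstractMap.SimpleEntry<>(rowNames[i], sum));
        }
        for (int j = 0; j < colNames.length; j++) {
            double sum = 0.0;
            for (double[] row : quotes) sum += row[j];  // all quotes tagged with code j
            totals.add(new AbstractMap.SimpleEntry<>(colNames[j], sum));
        }
        // Most quotes first; List.sort is stable, so ties keep rows-then-columns order.
        totals.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Double> e : totals) ordered.add(e.getKey());
        return ordered;
    }
}
```

On the 3×4 test matrix from this entry it returns col4 (126), row3 (112), row2 (108), row1 (104), col3 (96), col2 (66), col1 (36).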

Phil 4.20.16

7:00 – 4:00 VTX

  • Read a little more of the BitCoin article. Nice description of the blockchain.
  • Generated a fresh Excel matrix of codes and papers. I excluded the ‘meta’ codes (Definitions, Methods, etc.) and all the papers that have zero quotes.
  • Duke Ellington & His Orchestra make great coding music.
  • Installed new Java
  • Well drat. I can’t increase the size of an existing matrix. Will have to return a new one.
  • Yay!
    FileUtils.getInputFileName() opening: mat1.xls
    initial
     , col1, col2, col3, col4, 
    row1, 11.0, 21.0, 31.0, 41.0, 
    row2, 12.0, 22.0, 32.0, 42.0, 
    row3, 13.0, 23.0, 33.0, 43.0, 
    done calculating
    done creating
    derived
     , row1, row2, row3, col1, col2, col3, col4, 
    row1, 0.0, 0.0, 0.0, 11.0, 21.0, 31.0, 41.0, 
    row2, 0.0, 0.0, 0.0, 12.0, 22.0, 32.0, 42.0, 
    row3, 0.0, 0.0, 0.0, 13.0, 23.0, 33.0, 43.0, 
    col1, 11.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 
    col2, 21.0, 22.0, 23.0, 0.0, 0.0, 0.0, 0.0, 
    col3, 31.0, 32.0, 33.0, 0.0, 0.0, 0.0, 0.0, 
    col4, 41.0, 42.0, 43.0, 0.0, 0.0, 0.0, 0.0,
  • Need to set the identity (1.0 on the diagonal) and do other housekeeping.
  • Added ‘normalizeByMatrix’, which puts the entire matrix on a unit scale.
  • Need a calcRank function that squares and normalizes the matrix until the difference between successive output eigenvectors is below a threshold or an iteration limit is reached. Done?
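The calcRank idea, together with the matrix plumbing from this entry, can be sketched in Java. Assumptions: the paper/code matrix is a plain `double[][]`; `toSymmetric` builds the `[[0, A], [A^T, 0]]` layout seen in the ‘derived’ output (returning a new array, since a Java array can’t be resized); `normalizeByMax` stands in for normalizeByMatrix by dividing by the largest entry, which matches the 4.21 ‘derived’ table (e.g. 11/43 ≈ 0.26); and convergence is judged on the first column. All class and method names are hypothetical. Repeatedly squaring a symmetric matrix is power iteration in disguise: every column converges to a multiple of the principal eigenvector. The 1.0 diagonal shifts every eigenvalue up by one. My reading is that this matters here, because a bipartite matrix has eigenvalue pairs +σ and -σ of equal size, and without the shift, squaring could not isolate a single principal eigenvector. Note this is a shifted eigenvector centrality, not Google’s damped PageRank with teleportation.

```java
/** Hypothetical sketch of a square-and-normalize eigenvector ranking. */
class BipartiteRank {

    /** Returns the (r+c) x (r+c) matrix [[0, A], [A^T, 0]] as a new array. */
    static double[][] toSymmetric(double[][] a) {
        int r = a.length, c = a[0].length, n = r + c;
        double[][] m = new double[n][n];
        for (int i = 0; i < r; i++) {
            for (int j = 0; j < c; j++) {
                m[i][r + j] = a[i][j]; // row -> column
                m[r + j][i] = a[i][j]; // column -> row (mirror image)
            }
        }
        return m;
    }

    /** Divides every entry by the largest absolute entry, putting the matrix on a unit scale. */
    static double[][] normalizeByMax(double[][] m) {
        double max = 0.0;
        for (double[] row : m) for (double v : row) max = Math.max(max, Math.abs(v));
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            out[i] = new double[m[i].length];
            for (int j = 0; j < m[i].length; j++) out[i][j] = max == 0.0 ? 0.0 : m[i][j] / max;
        }
        return out;
    }

    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length, inner = y.length, p = y[0].length;
        double[][] out = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < inner; k++)
                for (int j = 0; j < p; j++)
                    out[i][j] += x[i][k] * y[k][j];
        return out;
    }

    static double[] column(double[][] m, int col) {
        double[] v = new double[m.length];
        for (int i = 0; i < m.length; i++) v[i] = m[i][col];
        return v;
    }

    /**
     * Squares and rescales until successive first columns differ by less than
     * threshold (max-abs difference) or maxIterations is reached; returns that
     * column scaled so its largest entry is 1.0.
     */
    static double[] calcRank(double[][] a, double threshold, int maxIterations) {
        double[][] m = normalizeByMax(toSymmetric(a));
        for (int i = 0; i < m.length; i++) m[i][i] = 1.0; // identity on the diagonal
        double[] prev = column(m, 0);
        for (int iter = 0; iter < maxIterations; iter++) {
            m = normalizeByMax(multiply(m, m));
            double[] cur = column(m, 0);
            double diff = 0.0;
            for (int i = 0; i < cur.length; i++) diff = Math.max(diff, Math.abs(cur[i] - prev[i]));
            prev = cur;
            if (diff < threshold) break;
        }
        double max = 0.0;
        for (double v : prev) max = Math.max(max, v);
        double[] rank = new double[prev.length];
        for (int i = 0; i < prev.length; i++) rank[i] = prev[i] / max;
        return rank;
    }
}
```

On the 3×4 test matrix, `calcRank(a, 1e-9, 50)` puts col4 on top (1.0), and within the rows and within the columns the order matches the EigenVec listing in the 4.21 entry.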

Phil 3.24.16

7:00 – 10:00, 11:00 – 3:00 VTX

  • Was going to continue The Law of Group Polarization, but got sucked into the following. On a related note, I peeked at the group sensemaking paper from CSCW and realized that they are dealing with group polarization issues.
  • Soooooooooo, I went back to check the links that the Google search “link:http://dotearth.blogs.nytimes.com” brings up. Looking at the pages (mostly other blog-like sites), the link to dotearth is almost always in the blogroll off to the side on many of these sites. For example, look at the lower right of climatecentral.org, and you’ll see the link.
  • I think this makes sense. These are the generic pages that point to other generic pages. So I went back to Google, searched for ‘Paul Krugman blog’, and then looked for the oldest post I could find in the results, which was this one from January 16. A top rating means that it has to be linked to a lot, so I tried “link:krugman.blogs.nytimes.com/2016/01/23/how-to-make-donald-trump-president/”. Alas, that doesn’t return anything, though “link:krugman.blogs.nytimes.com” does.
  • So I went to Wikipedia’s ‘most referenced pages’ page. The top-ranked page was Geographic coordinate system, which has over 600k inbound links. But –
  • Apparently, this is Google being coy. Searching for backlinks can be expensive. Moz has plans that start at $500/month. Bing also seems to have something with an API. Starting to check that out.
    • Added philfeldman.com to my bing webmaster profile. Had to add BingSiteAuth.xml to the site.
    • Nope, looks like it’s just the verified pages
  • Looking at SEMrush. Pretty straightforward and $15 buys you 7,500 lines of results.
    • Here’s the REST-ish API
    • Here’s the first format I’ve tried:
      http://api.semrush.com/analytics/v1/?key=xxxxxxxxxxxxxxxxxxxxxx&target=boardsanctions.com/&type=backlinks&target_type=root_domain&display_sort=page_score_desc&display_limit=10
    • The first thing I tried was my Angular blog entry, and this is what came back:
      page_score;source_title;source_url;target_url;anchor;external_num;internal_num;first_seen;last_seen
      1;Philip Feldman;http://philfeldman.com/resume.html;https://phifel.wordpress.com/;blog;7;2;1435698192;1452178691
      1;Phil Feldman Resume (WebGL);http://philfeldman.com/;https://phifel.wordpress.com/;My Primary Blog;15;4;1424207638;1452178080
      1;Phil Feldman Resume (WebGL);http://www.philfeldman.com/;https://phifel.wordpress.com/;My Primary Blog;15;4;1435689880;1452178091
    • Pretty good! Very clean. Then I tried boardsanctions.com:
      page_score;source_title;source_url;target_url;anchor;external_num;internal_num;first_seen;last_seen
      0;Plastic Surgery - Avoiding The Nightmare Case - Social Gaming Wiki FR;http://fr.socialgamingwiki.com/index.php/Plastic_Surgery_-_Avoiding_The_Nightmare_Case;http://boardsanctions.com/;Georgia Medical Board Actions;4;32;1454582397;1454582397
      0;Plastic Surgeon - Advice To Allow You Choose – TFC;http://www.tvfc.de/index.php?printable=yes&title=Plastic_Surgeon_-_Advice_To_Allow_You_Choose;http://boardsanctions.com/;Doctors to avoid;2;28;1452634501;1452634501
      0;Finding A Plastic Surgeon In Your Area – TheorieWiki;http://theoriewiki.org/index.php?oldid=8721&title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Ohio Medical Board Actions;4;40;1451297137;1451297137
      0;How To Prepare For Your Breast Augmentation – TheorieWiki;http://theoriewiki.org/index.php?title=How_To_Prepare_For_Your_Breast_Augmentation;http://boardsanctions.com/;Doctor Complaints;4;33;1444916428;1453210146
      0;Finding A Plastic Surgeon In Your Area: Unterschied zwischen den Versionen – TheorieWiki;http://theoriewiki.org/index.php?diff=8723&oldid=8721&title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Florida Medical Board Sanctions;4;39;1457400844;1457400844
      0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;NC Medical Board Actions;5;35;1448297485;1458043290
      0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;http://boardsanctions.com/;5;35;1448297485;1458043290
      0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?printable=yes&title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;NC Medical Board Actions;5;30;1456257160;1457931212
      0;Benutzer:FelicaAngelo06 – TheorieWiki;http://theoriewiki.org/index.php?printable=yes&title=Benutzer%3AFelicaAngelo06;http://boardsanctions.com/;http://boardsanctions.com/;5;30;1456257160;1457931212
      0;Finding A Plastic Surgeon In Your Area – TheorieWiki;http://theoriewiki.org/index.php?title=Finding_A_Plastic_Surgeon_In_Your_Area;http://boardsanctions.com/;Florida Medical Board Sanctions;4;33;1443858328;1457622408
    • Note that it’s a good thing I’m limiting the results to 10! The second thing to notice is that every one of these links is SEO garbage. This one is my favorite. Now, this is ordered by rank (however that’s calculated), and maybe there are better ways to order the results, but this does make me nervous about using backlinks without some checking. Maybe cosine similarity?
    • So the last option, if we want to spend some money, is to use the Common Crawl for backlinks. Not sure it would make any difference, but there would be more insight. As an example, there’s wikireverse, which did exactly that.
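The SEMrush request and its reply are simple enough to script: the reply is semicolon-separated with a header line first. A minimal sketch: `backlinksUrl` only reassembles the parameters from the query tried in this entry (not a full reading of SEMrush’s API), the key is a placeholder, and `parse` assumes no field ever contains a semicolon, because it does no quoting. `SemrushBacklinks`, `backlinksUrl` and `parse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers around the SEMrush backlinks query used in this entry. */
class SemrushBacklinks {

    /** Rebuilds the query string from this entry; apiKey stands in for a real key. */
    static String backlinksUrl(String apiKey, String target, int limit) {
        return "http://api.semrush.com/analytics/v1/?key=" + apiKey
                + "&target=" + target
                + "&type=backlinks&target_type=root_domain"
                + "&display_sort=page_score_desc&display_limit=" + limit;
    }

    /** Turns the header line plus data lines into one header-keyed map per backlink. */
    static List<Map<String, String>> parse(String body) {
        String[] lines = body.trim().split("\\r?\\n");
        String[] header = lines[0].split(";", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) continue;
            String[] fields = lines[i].split(";", -1); // -1 keeps trailing empty fields
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < header.length && j < fields.length; j++) {
                row.put(header[j], fields[j]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Parsing the philfeldman.com reply gives three maps whose `anchor` values are "blog", "My Primary Blog" and "My Primary Blog". Keyed rows like these are a convenient place to hang a filter (cosine similarity or otherwise) before trusting a backlink.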

Phil 3.22.16

7:00 – 7:30

  • I think I want to install this??? https://github.com/dthree/cash
  • Still thinking about social trust and system trust. Today, Brussels was attacked by ISIS or ISIS sympathisers. An official when interviewed said that Belgium had been ‘prepared’ and was ready. No one was surprised that one group of people would try to kill another group of people. In other news, the iPhone from another set of killers was unflaggingly resisting attempts to unlock it. In many ways, every day (ironically because of the news) we are informed how horrible and untrustworthy people can be. And at the same time, every day, our machines generally do what they are supposed to do, and when looked at over time, get better at it. Is it any wonder that we have high system trust and low social trust (or high cynicism?).
  • This isn’t really new. Music can be pure. Musicians can be awful.
  • Continuing The Law of Group Polarization.
    • Page 181: Thus when the context emphasizes each person’s membership in the social group engaging in deliberation, polarization increases. This finding is in line with more general evidence that social ties among deliberating group members tend to suppress dissent and in that way to lead to inferior decisions.
      • So a website with a strong point of view (Breitbart, MoveOn or PETA, for example) should have less variance among commenters, while a more balanced one should have more variance? Data may be here: http://www.journalism.org/2014/10/21/political-polarization-media-habits/. I would think that these could be compared against edit histories on Wikipedia for a more star-like pattern?
    • Persuasive Arguments Theory (PAT)?
    • Interaction with others increases decision confidence but not decision quality: evidence against information collection views of interactive decision making.
      • In this case, the paper was scanned and protected, so I couldn’t run OCR on it. The workaround was to export the pages as JPGs, open the first JPG in Acrobat DC, select Tools->Organize Pages, then Insert->From File, shift-click all the pages, select ‘Insert After’ and read them in. Once that’s done, go to ‘Enhance Scans’ and run OCR on the file.
      • Anyway, the paper looks interesting, with quantitative support. I wonder why all this research seems to be focused on the 1990s through the early 2000s? The Wikipedia page on Group Polarization has a wider date range.
  • Working on the rating app. Worried that jsoup doesn’t seem to be pulling down pages that well.
    • Got a 403 on https://stackoverflow.com/questions/10716828/joptionpane-showconfirmdialog using URL.openStream, but it works on Google.
    • Going to try a more web-scraper-like pattern. Checking out Jaunt.
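One plausible cause of the 403 (an assumption, not verified here) is that `URL.openStream()` sends Java’s default User-Agent, something like `Java/1.8.0_92`, which some sites refuse. A sketch using `HttpURLConnection` with an explicit header and timeouts; `PageFetcher`, `prepare`, `fetch` and the header string are hypothetical, and the body is decoded as UTF-8 by assumption. jsoup can send the same header via `Jsoup.connect(url).userAgent(...)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical fetch helper that replaces Java's default User-Agent. */
class PageFetcher {
    // Placeholder value: the point is sending a descriptive, browser-like string.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; lit-review-crawler)";

    /** Configures a GET request; no network traffic happens until it is used. */
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    /** Returns the body decoded as UTF-8 (assumed), or throws with the HTTP status. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = prepare(address);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n);
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Because `fetch` throws on anything but a 200, a 403 shows up as an explicit error message instead of a silently empty page.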
  • Changing the selection lists
  • Adding a check to see what ratings have changed as a user check – Done
  • Need to start on the backlinks.
  • Meeting with Aaron about next steps based on the