Monthly Archives: December 2016

Phil 12.12.16

7:00 – 3:00 ASRC

  • Register for Spring 2017
  • Continuing with Sociophysics
  • Integrity data
    • Pulling out all empty columns and columns that contain encrypted data. Doing it by hand is stupid slow and error prone, so I’m going to write a quick app. It should be able to store and output data about all the useful columns in the tables. I should also be able to incorporate Gregg’s data dictionary (and look for entries where toLower() matches “need definition”)
    • Possibly output persistent queries for Java?
    • Sparsity score?
    • Column matches across tables?
    • Once master matrix is built, then use NMF to cluster columns for predictive capability
  • Sprint review
    • Ticket to fix my vpn access?
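
A minimal sketch of the sparsity-score idea from the Integrity data notes above. It assumes the table rows have already been pulled into memory as dictionaries; the sample rows, the column names and the `column_sparsity` and `needs_definition` helpers are all hypothetical, not part of the actual app.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def column_sparsity(rows: List[Row]) -> Dict[str, float]:
    """Fraction of rows in which each column is empty (None or blank)."""
    columns = sorted({name for row in rows for name in row})
    scores = {}
    for name in columns:
        empty = sum(1 for row in rows if not (row.get(name) or "").strip())
        scores[name] = empty / len(rows) if rows else 1.0
    return scores


def needs_definition(data_dictionary: Dict[str, str]) -> List[str]:
    """Columns whose dictionary text contains 'need definition', ignoring case."""
    return sorted(col for col, text in data_dictionary.items()
                  if "need definition" in text.lower())


# Hypothetical sample rows and dictionary entries, for illustration only.
rows = [
    {"provider_id": "1001", "fax": None, "notes": ""},
    {"provider_id": "1002", "fax": None, "notes": "called"},
]
dictionary = {"provider_id": "Unique provider key", "fax": "NEED DEFINITION"}

scores = column_sparsity(rows)
useful = [col for col, score in scores.items() if score < 1.0]  # drop fully empty columns
print(scores)
print(useful, needs_definition(dictionary))
```

A score of 1.0 marks a completely empty column, so anything below that is kept as potentially useful. Detecting encrypted columns would need a separate check and is not attempted here.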

Phil 12.9.16

7:00 – 5:00 ASRC

  • Clickbait? How to Make an Amazing Tensorflow Chatbot Easily
  • Good article on information bubbles (Clinton and Trump). Visualizations clearly show bubble and star. Parallel Narratives
  • Continuing with Sociophysics
    • Opinion Formation
    • On [page 62], the authors discuss Phase transitions in a two-parameter model of opinion dynamics with random kinetic exchanges, which they say shows that agent behavior is unrealistic when there is only positive influence. This could support the Anti-belief element in my model.
    • On [page 66] the authors briefly discuss Opinion dynamics with confidence threshold: an alternative to the Axelrod model, where voters have continuous opinions. Clusters happen when the confidence c satisfies 0 < c < 1, or (0, 1) in interval notation.
    • An important point seems to be the number of opinions. For low numbers of discretized opinions and many agents, clustering happens. For the reverse, pretty much every agent has their own opinion. Runs at different numbers of opinions can show the thresholds at which these transitions happen (called precipitation?).
    • On [page 69] there is a brief mention of a model with a vector of opinions, which sounds a lot like my ‘belief’ being a set of statements. The title looks good too: Different topologies for a herding model of opinion (abstract below)
    • Back to NMF
      • Have sliders adjust scalar matrices
      • Chain the raw, scalar and scaled matrices together.
      • Make sure that the changed matrices are visible.
      • Add a load option so that if there is only one matrix, factorization can be run on that matrix directly, rather than having to read in the factored spreadsheets. Will need a (k) size select and a ‘calculate factors’ button
    • BRC
      • It’s impossible to get the VPN to work. Asked Stan for a better machine.
      • Matt is getting the full DB downloaded so I can work on it locally. Done
      • Can’t create the DB. Messages into Matt and Gregg.
        • Wound up doing the following to create the postgres db in pgAdminIII
          1. Created user postgres
          2. Created new db npi_raw as a UTF8 db, with postgres as owner.
          3. Created dev_ci and integrity_ci schemas with postgres as owner
          4. Right-click on the dev_ci schema and select restore. Navigate to the npi_raw.backup file and click ‘Restore’
          5. Right-click on the integrity_ci schema and select restore. Navigate to the integrity_ci folder and then restore each of the backup files.
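
To get a feel for the confidence-threshold behavior in the Sociophysics notes above, here is a small simulation sketch. It uses a Deffuant-style pairwise update as a stand-in, which is an assumption on my part and not necessarily the model in the paper mentioned on page 66. The function names, the `mu` convergence rate and the `gap` used to count clusters are all hypothetical.

```python
import random
from typing import List


def bounded_confidence(n_agents: int, threshold: float, steps: int,
                       mu: float = 0.5, seed: int = 0) -> List[float]:
    """Continuous opinions in [0, 1); pairs interact only if they are close enough."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)  # pick two distinct agents
        diff = opinions[j] - opinions[i]
        if abs(diff) < threshold:
            # Both agents move toward each other; the mean opinion is conserved.
            opinions[i] += mu * diff
            opinions[j] -= mu * diff
    return opinions


def count_clusters(opinions: List[float], gap: float = 0.05) -> int:
    """Count groups of opinions separated by more than `gap` on the opinion axis."""
    ordered = sorted(opinions)
    return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > gap)


for c in (0.1, 0.3, 1.0):
    final = bounded_confidence(n_agents=50, threshold=c, steps=5000)
    print(c, count_clusters(final))
```

Sweeping the threshold like this is one way to look for where the population stops splitting into several clusters and collapses into one.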

Phil 12.8.16

7:00 – 4:00 ASRC

  • Continuing with Sociophysics
    • Found while reading about opinion dynamics modelling [page 56]: Heterogeneous bounds of confidence: meet, discuss and find consensus! – In this paper, heterogeneous bounds of confidence are studied. The surprising result is that a society of agents with two different bounds of confidence (open-minded and closed minded agents) can find consensus even when both bounds of confidence are significantly below the critical bound of confidence of a homogeneous society. I think that this may represent exploiters and explorers. Need to read.

  • Back to NMF
    • Setting up sliders to manipulate a row, column or cell of a matrix.
      • Changes to the row or col mat should result in a recalculation of the product matrix. Product matrix should just recalculate based on scalars
      • Got the event handlers set up, working on putting sliders in place
  • Getting the VPN set up so I can access the DB – no luck. Need to take it up with Heath tomorrow
  • 1:00 – 4:00 meeting with Aaron, Theresa, Katy, Greg & Jeremy
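
The slider behavior described in the NMF notes above can be sketched as follows. This assumes the raw factor matrices stay untouched, a scalar matrix of the same shape holds the slider values, and the product is always recomputed from raw × scalar. The matrices are tiny made-up examples and the function names are hypothetical, not the testbed’s actual code.

```python
from typing import List

Matrix = List[List[float]]


def scale(raw: Matrix, scalars: Matrix) -> Matrix:
    """Element-wise product of a raw matrix and its scalar (slider) matrix."""
    return [[r * s for r, s in zip(raw_row, scalar_row)]
            for raw_row, scalar_row in zip(raw, scalars)]


def product(row_mat: Matrix, col_mat: Matrix) -> Matrix:
    """Cell [d][t] is the dot product of row d of row_mat and row t of col_mat."""
    return [[sum(r * c for r, c in zip(row, col)) for col in col_mat]
            for row in row_mat]


def set_row_scalar(scalars: Matrix, row: int, value: float) -> None:
    """What a row slider would do: apply one scalar to every cell in the row."""
    scalars[row] = [value] * len(scalars[row])


# Made-up factor matrices: 2 documents x 2 factors and 3 terms x 2 factors.
raw_row = [[1.0, 2.0], [3.0, 4.0]]
raw_col = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row_scalars = [[1.0, 1.0], [1.0, 1.0]]
col_scalars = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

before = product(scale(raw_row, row_scalars), scale(raw_col, col_scalars))
set_row_scalar(row_scalars, 0, 2.0)  # slide row 0 up to 2x
after = product(scale(raw_row, row_scalars), scale(raw_col, col_scalars))
print(before)
print(after)
```

Keeping the raw values separate from the scalars means a slider can always be returned to 1.0 to recover the original product.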

Phil 12.7.16

7:00 – 5:30 ASRC

  • Continuing with Sociophysics
  • Opinion Formation
  • NMF WMN
    • Got the product matrix calculated and showing
    • Added RowSum
    • Adding ColumnSum – done
    • Thinking about how to manipulate all this.
      • Adding scalar and scaled matrices. I still need to wire them up, but I’m going to work on modifying the weight matrix first.
      • Got the row, column and value of the selected cell. Next is to set up sliders to scale same on the selected matrix/cell.
  • Progress for today: nmftestbed_12_7_16
  • More discussions on data modeling.
  • Meeting with Aaron and Katy.

Phil 12.6.16

7:00 – 4:00 ASRC

  • Getting a server
    • Campus cluster (underutilized) Free Student access (Damian Doyle) HPCF.umbc.edu. Write a note for Wayne to send. Done
  • Note to Wayne on getting a Google grant. Done.
  • Using Java Persistence API (JPA) with Cloud SQL
  • Google datasets
  • Continuing with Sociophysics – nope, maybe this afternoon.
  • Working on incorporating the factor matrices into the NmfModelGui. Looks like I use Maps as discussed at the bottom of this page. Building a L2DMat2Table class that should take care of doing this for each matrix.
  • Need to be able to output a Labled2DMatrix as a map. Working on that. Done. Oh, I need column maps. No, not really, just a complete header list.
  • And it’s working! Here are the two factored matrices: factoredtables
  • Worked with Aaron on the data document. The tables are poorly put together and let’s just say, not self-documenting.
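
A rough sketch of the matrix-to-map conversion mentioned above: one map per row, keyed by column header, plus a complete header list for the table. It is in Python rather than the Java used by the GUI, and the function name and the "label" key are hypothetical. The sample values are the first two rows of rMat from the 12.1.16 entry.

```python
from typing import Dict, List, Tuple, Union

Cell = Union[str, float]


def to_row_maps(row_labels: List[str], col_labels: List[str],
                values: List[List[float]]) -> Tuple[List[str], List[Dict[str, Cell]]]:
    """Return (complete header list, one map per row keyed by header)."""
    header = ["label"] + col_labels
    rows = []
    for name, row_values in zip(row_labels, values):
        row_map: Dict[str, Cell] = {"label": name}
        row_map.update(zip(col_labels, row_values))
        rows.append(row_map)
    return header, rows


header, rows = to_row_maps(
    ["Doc1", "Doc2"],
    ["Trm1", "Trm2", "Trm3", "Trm4"],
    [[5, 3, 0, 1], [4, 0, 0, 1]],
)
print(header)
for row in rows:
    print([row[name] for name in header])
```

With the header list carried alongside the maps, a table view can fix the column order without needing separate column maps.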

Phil 12.5.16

7:00 – 5:00 ASRC

  • Here’s Where Donald Trump Gets His News

    • It would be interesting to do a crawl on those sources (weighted by the amount visited?) and do a word model analysis.
  • Continuing with Sociophysics
  • Fixed the header problem in Labled2DMatrix.fromExcelSheet()
  • Here’s the term extraction using NMF with k = 2:
    chapter-3-the-spouter-inn, chapter-1-loomings, chapter-2-the-carpet-bag
    harpooneer
    water
    landlord
    about
    light
    stand
    thousand
    other
    night
    passenger
    
    chapter-2-the-carpet-bag, chapter-4-the-counterpane, chapter-1-loomings
    queequeg
    landlord
    harpooneer
    sailor
    money
    nantucket
    passenger
    sight
    where
    dream
  • And here are the document clusters. Just looking at this, I see that the documents under each term-triple (the top three terms from the lists above) could easily cluster into three groups:
    queequeg-landlord-harpooneer
    chapter-2-the-carpet-bag 4.158255493
    chapter-4-the-counterpane 2.889135473
    chapter-1-loomings 2.370002854
    chapter-3-the-spouter-inn 0.651989511
    
    harpooneer-water-landlord
    chapter-3-the-spouter-inn 7.111998945
    chapter-1-loomings 1.72918472
    chapter-2-the-carpet-bag 1.157125575
    chapter-4-the-counterpane 0.000284084
  • Working on incorporating the factor matrices into the NmfModelGui. Looks like I use Maps as discussed at the bottom of this page. Building a L2DMat2Table class that should take care of doing this for each matrix.
  • Some thoughts about jobs, work and the Amish. Are they economically competitive? Value-add? Why does this work?
  • Fika talk: Dr. Quincey Brown
    • Educational Games – Microsoft Imagine Cup
    • Mobile intelligent tutoring systems. Math on iPads
    • Children using touch and gesture. Different than adult usage?
    • Broadening participation in computing, Grace Hopper, etc.
    • Science and Technology fellowships at the AAAS (during healthcare.gov launch)
    • White House nation of makers
    • No real effort to cultivate press relationships and new media ways to get the word out. Leverage YouTube?
  • Meeting with Wayne
    • Latent social hacking
  • Getting a server
    • Contact Ron for an abandoned Blade
    • Campus cluster (underutilized) Free Student access (Damian Doyle) HPCF.umbc.edu. Write a note for Wayne to send.
    • UMBC agreement with AWS
  • Note to Wayne on getting a Google grant
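
One way to make the "three groups" observation about the document clusters above concrete is to sort the weights and cut at the largest gaps. This is only a sketch of that idea: the `split_at_largest_gaps` helper is hypothetical, and the weights are copied from the queequeg-landlord-harpooneer cluster listed earlier in this entry.

```python
from typing import Dict, List


def split_at_largest_gaps(weights: Dict[str, float], groups: int = 3) -> List[List[str]]:
    """Sort items by weight (descending) and cut at the (groups - 1) biggest drops."""
    ordered = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    gaps = [(ordered[i][1] - ordered[i + 1][1], i) for i in range(len(ordered) - 1)]
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:groups - 1])
    result, start = [], 0
    for i in cut_after:
        result.append([name for name, _ in ordered[start:i + 1]])
        start = i + 1
    result.append([name for name, _ in ordered[start:]])
    return result


queequeg_cluster = {
    "chapter-2-the-carpet-bag": 4.158255493,
    "chapter-4-the-counterpane": 2.889135473,
    "chapter-1-loomings": 2.370002854,
    "chapter-3-the-spouter-inn": 0.651989511,
}
print(split_at_largest_gaps(queequeg_cluster))
```

With only four documents this is trivial, but the same cut-at-big-jumps approach should carry over to longer ranked lists.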

Phil 12.2.16

7:00 – 6:00 ASRC

  • Wrote up notes from yesterday’s meeting with Aaron
  • Google had lined this up for me: A Study of Factuality, Objectivity and Relevance: Three Desiderata in Large-Scale Information Retrieval?
  • Back to Sociophysics
    • I appear to be working with (maybe?) class ‘C’ social networks, where links connect people indirectly [pg 19]. Covered in chapter 7 – Of Flocks, Flows and Transports
  • Starting on NMF LMN tool. I need some interactivity to see how manipulation works
    • Copied the guts over from LMN. I think the best way to do this first is to show the two factored mats as tables and then show the result as text. Then scale rows, cols or cells
    • Changing Labled2DMatrix to handle reading from a given sheet name/number – done
  • Meeting with Don to work through Opinion Dynamics paper math.
    • Word for today – Quotient Set: the splitting of a set into the subset that satisfies a rule and the subset that does not. The rule isDigit() applied to the set {1,w,3,r,5,h} would produce {{1, 3, 5}, {w, r, h}}
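
The isDigit() example above can be sketched in a few lines. This treats "gives the same result under the rule" as the equivalence relation, so each distinct result of the rule becomes one subset. The `quotient_set` helper name is hypothetical, and Python’s `str.isdigit` stands in for isDigit().

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def quotient_set(items: Iterable[T], rule: Callable[[T], Hashable]) -> List[List[T]]:
    """Group items into classes that share the same result under `rule`."""
    classes = {}
    for item in items:
        classes.setdefault(rule(item), []).append(item)
    return list(classes.values())  # classes appear in first-seen order


print(quotient_set(["1", "w", "3", "r", "5", "h"], str.isdigit))
```

A rule with more than two possible results, such as string length, would simply produce more than two subsets.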

Phil 12.1.16

7:00 – 6:00 ASRC

  • Back to Sociophysics
  • More NMF
    • Run through the row/col mats and get the top N items as topic or document clusters. Look for big jumps? Cluster using DBSCAN?
    • Got the factor matrices’ columns labeled by setting all the columns but one in the row/column weight mat to zero. The recreated matrix can then be sorted by row or column. The thought is that the highest values are the items that are most sensitive. It’s hard to get a feel though. I think I have to build an interactive app in which I can watch the effects. Intuitively, since we are building a matrix from the dot product of the rows in the two factor mats, the effects should be matrix wide.
    • I’m also saving the wrong items out from the corpus manager. I need to save the matrix factors, not the recreated matrix. I think one document with two spreadsheets would be a nice way to store. Done. Included the reconstructed matrix since it’s (a) needed to produce the column names and (b) stochastically produced, so it’s unique and tightly coupled to the factor matrices.
    • Results for today:
      rMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5, 3, 0, 1, 
      Doc2, 4, 0, 0, 1, 
      Doc3, 1, 1, 0, 5, 
      Doc4, 1, 0, 0, 4, 
      Doc5, 0, 1, 5, 4, 
      
      newMat
       , Trm1, Trm2, Trm3, Trm4, 
      Doc1, 5.03, 2.91, 4.39, 0.99, 
      Doc2, 3.97, 2.3, 3.64, 0.99, 
      Doc3, 1.09, 0.77, 5.03, 4.99, 
      Doc4, 0.96, 0.67, 4.08, 3.99, 
      Doc5, 2.05, 1.3, 4.9, 4.05, 
      average difference = 0.0757473875544334
      sorted columns {Trm3=22.05575184668782, Trm4=15.022558101226334, Trm1=13.097328480010614, Trm2=7.951735152321482}
      sorted rows {Doc1=13.33704116430316, Doc5=12.301738392587723, Doc3=11.885670818212631, Doc2=10.911189185303712, Doc4=9.691734019839023}
      
      rowMat
       , Trm1-Trm3, Trm4-Trm3, 
      Doc1, 2.33, 0.3, 
      Doc2, 1.83, 0.33, 
      Doc3, 0.4, 2.04, 
      Doc4, 0.36, 1.63, 
      Doc5, 0.87, 1.63, 
      
      colMat
       , Doc1-Doc2, Doc3-Doc5, 
      Trm1, 2.15, 0.11, 
      Trm2, 1.23, 0.14, 
      Trm3, 1.61, 2.15, 
      Trm4, 0.11, 2.43,
  • More BRC? At least verify what my story is.
  • Axios? We are a new media company delivering vital, trustworthy news and analysis in the most efficient, illuminating and shareable ways possible. We offer a mix of original and smartly narrated coverage of media trends, tech, business and politics with expertise, voice AND smart brevity — on a new and innovative mobile platform. At Axios — the Greek word for worthy — we provide only content worthy of people’s time, attention and trust.
  • Dr. Phyllis Schneck – 3:30pm Thursday, 1 December 2016, UC 310, UMBC
    • Indicators of attack
      • IP address
      • Domain name
      • attachment
      • clusters
      • No PII
    • Reputation system? What kind of feeds do you need? Looking for very high accuracy. Multiple streams for improved statistical power?
    • Got me thinking about a ‘legal-targeted Stuxnet’. Imagine something that was set into legal databases (Legislation, regulation and case law) that simply changed some small percentage of ‘shalls’ to ‘wills’. That could be pretty damaging over the long run. Something smarter could be more subtle and directed, just like the Natanz attack. Wound up stopping by Aaron M’s office and chatted for about an hour about this. Also some potential research discussions before Dec 21.
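
As a check on the NMF results listed in this entry, the sketch below rebuilds newMat from the printed rowMat and colMat and labels each factor by its two strongest terms. It sorts the factor column directly instead of zeroing the other columns and sorting the recreated matrix; with only one factor kept, both give the same ordering. The function names are hypothetical, and because the printed factors are rounded to two decimals the rebuilt values only match newMat approximately.

```python
from typing import List

terms = ["Trm1", "Trm2", "Trm3", "Trm4"]
docs = ["Doc1", "Doc2", "Doc3", "Doc4", "Doc5"]

# Rounded values copied from the rowMat, colMat and newMat printouts above.
row_mat = [[2.33, 0.3], [1.83, 0.33], [0.4, 2.04], [0.36, 1.63], [0.87, 1.63]]
col_mat = [[2.15, 0.11], [1.23, 0.14], [1.61, 2.15], [0.11, 2.43]]
printed_new_mat = [
    [5.03, 2.91, 4.39, 0.99],
    [3.97, 2.3, 3.64, 0.99],
    [1.09, 0.77, 5.03, 4.99],
    [0.96, 0.67, 4.08, 3.99],
    [2.05, 1.3, 4.9, 4.05],
]


def reconstruct(rows: List[List[float]], cols: List[List[float]]) -> List[List[float]]:
    """Cell [d][t] is the dot product of document row d and term row t."""
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in rows]


def factor_label(weights: List[List[float]], labels: List[str],
                 factor: int, top_n: int = 2) -> str:
    """Join the labels of the top_n items with the largest weight on one factor."""
    ranked = sorted(zip(labels, weights), key=lambda pair: pair[1][factor], reverse=True)
    return "-".join(name for name, _ in ranked[:top_n])


rebuilt = reconstruct(row_mat, col_mat)
max_diff = max(abs(a - b)
               for rebuilt_row, printed_row in zip(rebuilt, printed_new_mat)
               for a, b in zip(rebuilt_row, printed_row))
print([factor_label(col_mat, terms, f) for f in range(2)])
print(round(max_diff, 3))
```

The term labels come out matching the rowMat column headers, which suggests the headers were built from the top two terms per factor.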