Category Archives: Java

Phil 6.1.16

7:00 – 2:00 VTX

Phil 5.31.16

7:00 – 4:30 VTX

  • Writing. Working on describing how maintaining many codes in a network contains more (and more subtle) information than grouping similar codes.
  • Working on the UrlChecker
    • In the process, I discovered that the annotation.xml file is unique only for the account and not for the CSE. All CSEs for one account are contained in one annotation file
    • Created a new annotation called ALL_annotations.xml
    • Fixed a few things in Andy’s file
    • Reading in everything. Now to produce the new sets of lists.
    • I think it’s just easier to delete all the lists and start over.
    • Done and verified. You run UrlChecker from the command line, with the inputs being a list of domains (one per line) and the ALL_annotations.xml file (a minimal sketch of that flow is at the end of this entry).
  • https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2
  • Need to add a Delete or Hide button to reduce down a large corpus to a more effective size.
  • Added. Tomorrow I’ll wire up the deletion of a row or column and the recreation of the initialMatrix
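  • Since the checker boils down to “read a domain list, read ALL_annotations.xml, see what overlaps,” here is a minimal sketch of that flow. The <Annotation about=”…”> element/attribute names are an assumption about the export format, and UrlCheckerSketch is a hypothetical stand-in for the real class:

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.*;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.*;

    public class UrlCheckerSketch {
        public static void main(String[] args) throws Exception {
            // args[0] = domain list (one per line), args[1] = ALL_annotations.xml
            List<String> domains = Files.readAllLines(Paths.get(args[0]));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new File(args[1]));
            Set<String> annotated = new HashSet<>();
            NodeList nodes = doc.getElementsByTagName("Annotation");
            for (int i = 0; i < nodes.getLength(); i++) {
                annotated.add(((Element) nodes.item(i)).getAttribute("about"));
            }
            for (String d : domains) {
                String domain = d.trim();
                if (domain.isEmpty()) continue;
                boolean found = annotated.stream().anyMatch(a -> a.contains(domain));
                System.out.println((found ? "already listed: " : "new: ") + domain);
            }
        }
    }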

Phil 5.30.16

7:00 – 10:00 Thesis/VTX

  • Built a new matrix for the coded lit review. I had coded a couple more papers
  • Working on copying over the read papers into a new folder that I can run text analytics over
  • After carefully reading through the doc manager list and copying over each paper, I discovered I could simply have exported the selected papers.
  • Ooops: Exception in thread “JavaFX Application Thread” java.lang.IllegalArgumentException: Invalid column index (16384).  Allowable column range for EXCEL2007 is (0..16383) or (‘A’..’XFD’)
    • Going to add a limit of
      SpreadsheetVersion.EXCEL2007.getMaxColumns()-8

      columns for now. Clearly that can be cut down.

    • Figuring out where to cut the terms. I’m summing the columns of the LSI calculation, then accumulating those sums from the highest value down and dividing the running total by the sum of all values. The top 20% of rank weights gives 280 columns. Going to try that first (a sketch of the cutoff is at the end of this entry).
    • Success! Some initial thoughts
      • The coded version is much more ‘crisp’
      • There are interesting hints in the LSI version
      • Clicking on a term or paper to see the associated items is really nice.
      • I think that document subgroups might be good/better, and it might be possible to use the tool to help build those subgroups. This goes back to the ‘hiding’ concept. (hide item / hide item and associated)
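  • A minimal sketch of that cutoff, assuming the per-column sums of the LSI matrix are already in an array (columnSums, keepTopColumns, and the class name are all hypothetical); the Excel 2007 cap from the exception above is folded in as a guard:

    import java.util.*;
    import org.apache.poi.ss.SpreadsheetVersion;

    public class TermCutoff {
        // Keep the columns that make up the top 'fraction' (e.g. 0.2) of the total weight.
        public static int[] keepTopColumns(double[] columnSums, double fraction) {
            Integer[] order = new Integer[columnSums.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            Arrays.sort(order, (a, b) -> Double.compare(columnSums[b], columnSums[a]));

            double total = Arrays.stream(columnSums).sum();
            int cap = SpreadsheetVersion.EXCEL2007.getMaxColumns() - 8; // leave room for bookkeeping columns
            List<Integer> kept = new ArrayList<>();
            double running = 0;
            for (int idx : order) {
                if (running / total >= fraction || kept.size() >= cap) break;
                kept.add(idx);
                running += columnSums[idx];
            }
            return kept.stream().mapToInt(Integer::intValue).toArray();
        }
    }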

Phil 5.27.16

7:00 – 2:00 VTX

  • Wound up writing the introduction and saving the old intro to a new document – Themesurfing
  • Renamed the current document
  • Got the parser working. Old artifact settings.
  • Added some tweaks to show progress better. I’m kinda stuck with the single JavaFX Application Thread having to execute before text can get shown (a background Task, sketched at the end of this entry, is the usual way around that).
  • Need an XML parser to find out what sites have already been added. Added an IntelliJ project to the GoogleCseConfigFiles SVN repo. Should be able to finish it on Tuesday.
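  • The usual way around the Application Thread problem is to do the parsing in a javafx.concurrent.Task and bind a label to its message property, so updates land on the FX thread without blocking it. A minimal sketch (parseNextArtifact() and the counts are hypothetical stand-ins for the real parser):

    import javafx.concurrent.Task;
    import javafx.scene.control.Label;

    public class ParseProgress {
        public static void runParse(Label statusLabel, int artifactCount) {
            Task<Void> task = new Task<Void>() {
                @Override protected Void call() {
                    for (int i = 0; i < artifactCount; i++) {
                        // parseNextArtifact();  // the real work would go here
                        updateMessage("parsed " + (i + 1) + " of " + artifactCount);
                    }
                    return null;
                }
            };
            statusLabel.textProperty().bind(task.messageProperty()); // message updates are delivered on the FX thread
            Thread t = new Thread(task, "parser");
            t.setDaemon(true);
            t.start();
        }
    }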

Phil 5.17.16

7:00 – 7:00

  • Great discussion with Greg yesterday. Very encouraging.
  • Some thoughts that came up during Fahad’s (Successful!) defense
    • It should be possible to determine the ‘deletable’ codes at the bottom of the ranking by setting the allowable difference between the initial ranking and the trimmed rank.
    • The ‘filter’ box should also be set by clicking on one of the items in the list of associations for the selected items. This way, selection is a two-step process in this context.
    • Suggesting grouping of terms based on connectivity? Maybe second degree? Allows for domain independence?
    • Using a 3D display to show the shared second, third and nth degree as different layers
    • NLP tagged words for TF-IDF to produce a more characterized matrix?
    • 50 samples per iteration, 2,000 iterations? Check! And add info to spreadsheet! Done, and it’s 1,000 iterations
  • Writing
  • Parsing Jeremy’s JSON file (a minimal loading sketch is at the end of this entry)
    • Moving the OptionalContent and JsonLoadable over to JavaUtils2
    • Adding javax.persistence-2.1.0
    • Adding json-simple-1.1.1
    • It worked, but it’s junk. It looks like these are un-curated pages
  • Long discussion with Aaron about calculating flag rollups.
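  • For reference, loading a file with json-simple is only a few lines; the field names (“pages”, “url”) below are hypothetical, since the real structure of Jeremy’s file is whatever it turns out to be:

    import java.io.FileReader;
    import org.json.simple.JSONArray;
    import org.json.simple.JSONObject;
    import org.json.simple.parser.JSONParser;

    public class JsonLoadSketch {
        public static void main(String[] args) throws Exception {
            JSONParser parser = new JSONParser();
            Object root = parser.parse(new FileReader(args[0]));
            // Hypothetical shape: { "pages": [ { "url": ... }, ... ] }
            JSONArray pages = (JSONArray) ((JSONObject) root).get("pages");
            for (Object o : pages) {
                JSONObject page = (JSONObject) o;
                System.out.println(page.get("url"));
            }
        }
    }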

Phil 5.11.16

7:00 – 4:30 VTX

  • Continuing paper – working on the ‘motivations’ section
  • Need to set the mode to interactive after a successful load
  • Need to find out where the JSON ratings are in the medicalpractitioner db? Or just rely on Jeremy’s interface? I guess it depends on what gets blown away. But it doesn’t seem like the JSON is in the db.
  • Added a stanfordNLP package to JavaUtils
    • NLPtoken stores all the extracted information about a token (word, lemma, index, POS, etc)
    • DocumentStatistics holds token data across one or more documents
    • StringAnnotator parses strings into NLPtokens (the underlying CoreNLP calls are sketched at the end of this entry).
  • Fixed a bunch of math issues (in Excel, too), but here are the two versions:
    am = 1.969
    be = 2.523
    da = 0.984
    do = 1.892
    i = 1.761
    is = 1.130
    it = 1.130
    let = 1.130
    not = 1.380
    or = 3.523
    thfor = 1.380
    think = 1.380
    to = 1.469
    what = 1.380

    And Excel:

     da	is	 it	 let	 not	 thfor	 think	 what	 to	 i	 do	 am	 be	 or
    0.984	1.130	1.130	1.130	1.380	1.380	1.380	1.380	1.469	1.761	1.892	1.969	2.523	3.523
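  • For reference, the bare CoreNLP calls behind something like StringAnnotator/NLPtoken look roughly like this (the pipeline setup is standard Stanford CoreNLP; the class here is a hypothetical stand-in, not the JavaUtils code):

    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class TokenSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("I am not sure what to think about this.");
            pipeline.annotate(doc);
            for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                // word, lemma, POS tag, and index -- the fields an NLPtoken would store
                System.out.println(token.word() + "\t" + token.lemma() + "\t"
                        + token.tag() + "\t" + token.index());
            }
        }
    }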
    

Phil 5.9.16

7:00 – 4:00 VTX

  • Started the paper describing the slider interface
  • TF-IDF today!
    • Read docs from web and PDF
    • Calculate the rank
    • Create matrix of terms and documents, weighted by occurrence.
  • Hmm. What I’m actually looking for is the lowest-occurring terms within a document that occur across the largest number of documents. I’ve used this page as a starting point. After flailing for many hours in Java, I wound up walking through the algorithm in Excel and I think I’ve got it (a minimal sketch of the counting is below). This is the spreadsheet that embodies my delusional thinking ATM.
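  • A minimal sketch of the counting behind that, assuming each document is already reduced to a list of terms (the names are hypothetical). The usual TF-IDF weight is tf * log(N/df); the “rare in the doc, common across docs” idea above amounts to flipping which factor gets emphasized, so the weighting line is the part to experiment with:

    import java.util.*;

    public class TfIdfSketch {
        public static Map<String, Double> weights(List<List<String>> docs, int docIndex) {
            Map<String, Integer> df = new HashMap<>();               // documents per term
            for (List<String> doc : docs) {
                for (String term : new HashSet<>(doc)) df.merge(term, 1, Integer::sum);
            }
            Map<String, Integer> tf = new HashMap<>();               // terms per document
            for (String term : docs.get(docIndex)) tf.merge(term, 1, Integer::sum);

            Map<String, Double> w = new HashMap<>();
            int n = docs.size();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                w.put(e.getKey(), e.getValue() * idf);               // classic tf-idf
            }
            return w;
        }
    }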

Phil 5.6.16

7:00 – 4:00 VTX

  • Today’s shower thought is to compare the variance of the difference of two (unitized) rank matrices. The maximum difference would be (matrix size), so we do have a scale. If we assume a binomial distribution (there are many ways to be slightly different, only two ways to be completely different), then we can use a binomial (one tailed?) distribution centered on zero and ending at (matrix size). That should mean that I can see how far one item is from the other? But it will be within the context of a larger distribution (all zeros vs all ones)…
  • Before going down that rabbit hole, I decided to use the bootstrap method just to see if the concept works (a rough sketch of the check is at the end of this entry). It looks mostly good.
    • Verified that scaling a low-ranked item (ACLED) by 10 has less impact than scaling the highest ranking item (P61) by 1.28.
    • Set the stats text to red if it’s outside 1 SD and green if it’s within.
    • I think the terms can be played around with more because the top one (Pertinence) gets ranked at .436, while P61 has a rank of 1.
    • There are some weird issues with the way the matrix recalculates. Some states are statistically similar to others. I think I can do something with the thoughts above, but later.
  • There seems to be a bug calculating the current mean when compared to the unit mean. It may be that the values are so small? It’s occasional….
  • Got the ‘top’ button working.
  • And that’s it for the week…
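  • A rough sketch of the bootstrap check, in case it’s useful later (the method and class names are hypothetical; the idea is just to resample the baseline ranks, build a distribution of means, and flag anything that lands outside one standard deviation):

    import java.util.Random;

    public class RankBootstrap {
        public static boolean withinOneSd(double[] baselineRanks, double observedMean,
                                          int iterations, int samplesPerIteration) {
            Random rand = new Random();
            double[] means = new double[iterations];
            for (int i = 0; i < iterations; i++) {
                double sum = 0;
                for (int j = 0; j < samplesPerIteration; j++) {
                    sum += baselineRanks[rand.nextInt(baselineRanks.length)];
                }
                means[i] = sum / samplesPerIteration;
            }
            double mean = 0, sd = 0;
            for (double m : means) mean += m;
            mean /= iterations;
            for (double m : means) sd += (m - mean) * (m - mean);
            sd = Math.sqrt(sd / iterations);
            return Math.abs(observedMean - mean) <= sd;   // green if true, red if not
        }
    }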

LMT With Data2

Oh yeah – Everything You Ever Wanted To Know About Motorcycle Safety Gear

Phil 5.3.16

7:00 – 3:30 VTX

  • Out riding, I realized that I could have a column called ‘counts’ that would add up the total number of ‘terms per document’ and ‘documents per term’. Unitizing the values would then show the number of unique terms per document. That’s useful, I think.
  • Helena pointed to an interesting CHI 2016 site. This is sort of the other side of extracting pertinence from relevant data. I wonder where they got their data from?
    • Found it! It’s in a public set of Google docs, in XML and JSON formats. I found it by looking at the GitHub home page. In the example code there was this structure:
      source: {
          gdocId: '0Ai6LdDWgaqgNdG1WX29BanYzRHU4VHpDUTNPX3JLaUE',
          tables: "Presidents"
        }

      That gave me a hint of what to look for in the document source of the demo, where I found this:

      var urlBase = 'https://ca480fa8cd553f048c65766cc0d0f07f93f6fe2f.googledrive.com/host/0By6LdDWgaqgNfmpDajZMdHMtU3FWTEkzZW9LTndWdFg0Qk9MNzd0ZW9mcjA4aUJlV0p1Zk0/CHI2016/';
      

      And that’s the link from above.

    • There appear to be other useful data sets as well. For example, there is an extensive CHI paper database sitting behind this demo.
    • So this makes generalizing the PageRank approach much simpler, since it looks like I can pull the data down pretty easily. In my case I think the best thing for now would be to write small apps that pull down the data and build Excel spreadsheets that are read in by the tool.
  • Exporting a new data set from Atlas. Done and committed. I need to do runs before meeting with Wayne.
  • Added Counts in and refactored a bit.
  • I think I want a list of what a doc or term is directly linked to and the number of references. Added the basics. Wiring up next. Done! But now I want to click on an item in the counts list and have it be selected? Or at least highlighted?
  • Stored the new version on dropbox: https://www.dropbox.com/s/92err4z2posuaa1/LMN.zip?dl=0
  • Meeting with Wayne
    • There’s some bug with counts. Add it to the WeightedItem.toString() and test.
    • Add a ‘move to top’ button near the weight slider that adds just enough weight to move the item to the top of the list. This could be iterative?
    • Add code that compares the population of ranks with the population of scaled ranks. Maybe bootstrapping? Apache Commons Math has KolmogorovSmirnovTest, which has public double kolmogorovSmirnovTest(double[] x, double[] y, boolean strict), which looks promising (a usage sketch is at the end of this entry).
  • Added ability to log out of the rating app.
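  • A usage sketch for that last item; the two arrays are hypothetical placeholders for the baseline and scaled eigenvector populations, and the boolean flag is the ‘strict’ parameter from the signature above:

    import org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest;

    public class RankCompare {
        // Returns the p-value for the hypothesis that the two rank populations
        // come from the same distribution.
        public static double compare(double[] baselineRanks, double[] scaledRanks) {
            KolmogorovSmirnovTest ks = new KolmogorovSmirnovTest();
            return ks.kolmogorovSmirnovTest(baselineRanks, scaledRanks, false);
        }
    }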

Phil 4.29.16

7:00 – 5:00 VTX

  • Expense reports and timesheets! Done.
  • Continuing Informed Citizenship in a Media-Centric Way of Life
    • The pertinence interface may be an example of a UI affording the concept of monitorial citizenship.
      • Page 219: The monitorial citizen, in Schudson’s (1998) view, does environmental surveillance rather than gathering in-depth information. By implication, citizens have social awareness that spans vast territory without having in-depth understanding of specific topics. Related to the idea of monitorial instead of informed citizenship, Pew Center (2008) data identified an emerging group of young (18–34) mobile media users called news grazers. These grazers find what they need by switching across media platforms rather than waiting for content to be served.
    • Page 222: Risk as Feelings. The abstract is below. There is an emotional hacking aspect here that traditional journalism has used (heuristically?) for most(?) of its history.
      • Virtually all current theories of choice under risk or uncertainty are cognitive and consequentialist. They assume that people assess the desirability and likelihood of possible outcomes of choice alternatives and integrate this information through some type of expectation-based calculus to arrive at a decision. The authors propose an alternative theoretical perspective, the risk-as-feelings hypothesis, that highlights the role of affect experienced at the moment of decision making. Drawing on research from clinical, physiological, and other subfields of psychology, they show that emotional reactions to risky situations often diverge from cognitive assessments of those risks. When such divergence occurs, emotional reactions often drive behavior. The risk-as-feelings hypothesis is shown to explain a wide range of phenomena that have resisted interpretation in cognitive–consequentialist terms.
    • At page 223 – Elections as the canon of participation

  • Working on getting tables to sort – Done

  • Loading Excel file – done
  • Calculating – done
  • Using weights – done
  • Reset weights – done
  • Saving (don’t forget to add sheet with variables!) – done
  • Wrapped in executable – done
  • Uploading to dropbox. Wow – the files with JavaFX are *much* bigger than Swing.

Phil 4.28.16

7:00 – 5:00 VTX

  • Reading Informed Citizenship in a Media-Centric Way of Life
    • Jessica Gall Myrick
    • This is a bit out of the concentration of the thesis, but it addresses several themes that relate to system and social trust. And I’m thinking that behind these themes of social vs. system is the Designer’s Social Trust of the user. Think of it this way: If the designer has a high Social Trust intention with respect to the benevolence of the users, then a more ‘human’ interactive site may result, with more opportunities for the user to see more deeply into the system and contribute more meaningfully. There are risks in this, such as hellish comment sections, but also rewards (see the YouTube comments section for The Idea Channel episodes). If the designer has a System Trust intention with respect to, say, the reliability of the user watching ads, then different systems get designed that learn to generate click-bait using neural networks (such as clickotron). Or, closer to home, Instagram might decide to curate a feed for you without affordances to support changing of feed options. The truism goes ‘If you’re not paying, then you’re the product’. And products aren’t people. Products are systems.
    • Page 218: Graber (2001) argues that researchers often treat the information value of images as a subsidiary to verbal information, rather than having value themselves. Slowly, studies employing visual measures and examining how images facilitate knowledge gain are emerging (Grabe, Bas, & van Driel, 2015; Graber, 2001; Prior, 2014). In a burgeoning media age with citizens who overwhelmingly favor (audio)visually distributed information, research momentum on the role of visual modalities in shaping informed citizenship is needed. Paired with it, reconsideration of the written word as the preeminent conduit of information and rational thought are necessary.
      • The rise of infographics makes me believe that it’s not image and video per se, but clear information with low cognitive load.
  • ————————–
  • Bob had a little trouble with inappropriate and unclear identity, as well as education, info and other
  • Got tables working for terms and docs.
  • Got callbacks working from table clicks (the selection-listener pattern is sketched at the end of this entry)
  • Couldn’t get the table to display. Had to use this ugly hack.
  • Realized that I need name, weight and eigenval. Sorting is by eigenval. Weight is the multiplier of the weights in a row or column associated with a term or document. Mostly done.
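  • The table-click callbacks are just listeners on the selection model; a minimal sketch (the Row class is a hypothetical stand-in for the real name/weight/eigenval item):

    import javafx.scene.control.TableColumn;
    import javafx.scene.control.TableView;
    import javafx.scene.control.cell.PropertyValueFactory;

    public class TableWiring {
        public static class Row {
            private final String name;
            private final double eigenVal;
            public Row(String name, double eigenVal) { this.name = name; this.eigenVal = eigenVal; }
            public String getName() { return name; }
            public double getEigenVal() { return eigenVal; }
            @Override public String toString() { return name + " (" + eigenVal + ")"; }
        }

        public static void wire(TableView<Row> table) {
            TableColumn<Row, String> nameCol = new TableColumn<>("name");
            nameCol.setCellValueFactory(new PropertyValueFactory<>("name"));
            TableColumn<Row, Double> eigenCol = new TableColumn<>("eigenVal");
            eigenCol.setCellValueFactory(new PropertyValueFactory<>("eigenVal"));
            table.getColumns().setAll(nameCol, eigenCol);

            // fires whenever the user clicks (selects) a row
            table.getSelectionModel().selectedItemProperty().addListener(
                    (obs, oldItem, newItem) -> {
                        if (newItem != null) {
                            System.out.println("selected " + newItem);
                        }
                    });
        }
    }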

Phil 4.22.16

7:00 – 4:30 VTX

  • Had a thought going to sleep last night that it would be interesting to see the difference between a ‘naive’ ranking based on the number of quotes vs. PageRank. Pretty much as soon as I got up, I pulled down the spreadsheet and got the lists. It’s in the previous post, but I’ll post them here too:
    • Sorted from most to least quotes
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      P 5: Saracevic_relevance_75.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      P77: The Law of Group Polarization.pdf
      P43: On the Accuracy of Media-based Conflict Event Data.pdf
      System Trust
      P37: Security-control methods for statistical databases – a comparative study.pdf
    • Sorted on Page Rank eigenvector
      P85: Technology Humanness and Trust-Rethinking Trust in Technology.pdf
      System Trust
      Social Trust
      P61: A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web.pdf
      P84: What is Trust_ A Conceptual Analysis–AMCIS-2000.pdf
      P 1: Social Media and Trust during the Gezi Protests in Turkey.pdf
      Credibility Cues
      P13: The Egyptian Blogosphere.pdf
      P10: Sensing_And_Shaping_Emerging_Conflicts.pdf
      P82: The ‘like me’ framework for recognizing and becoming an intentional agent.pdf
  • To me it’s really interesting how much better the codes are mixed into the results. I actually thought it could be the other way, since the codes are common across many papers. Also, the concepts of System Trust, Social Trust and Credibility Cues very much became a central point in my mind as I worked through the papers.
  • A second thought, which is the next step in the research, is to see how weighting affects relationships. Right now, the papers and codes are weighted by the number of quotes. What happens when all the weights are normalized (set to 1.0)? And then there is the setup of the interactivity. With zero optimizations, this took 4.2 seconds to calculate on a modern laptop. Not slider-bar rates, but fast enough to change a value (or a few) and click a ‘run’ button.
  • So, moving forward, the next steps are to create the Swing App that will:
    • read in a spreadsheet (xls and xlsx) – see the POI sketch at the end of this entry
    • Write out spreadsheets (page containing the data information
      • File
      • User
      • Date run
      • Settings used
    • allow for manipulation of row and column values (in this case, papers and codes, but the possibilities are endless)
      • Select the value to manipulate (reset should be an option)
      • Spinner/entry field to set changes (original value in label)
      • ‘Calculate’ button
      • Sorted list(s) of rows and columns. (indicate +/- change in rank)
    • Reset all button
    • Normalize all button
  • I’d like to do something with the connectivity graph. Not sure what yet.
  • And I think I’ll do this in JavaFX rather than Swing this time.
  • Huh. JavaFX Scene Builder is no longer supported by Oracle. Now it’s a Gluon project.
  • Documentation still seems to be at Oracle though
  • Spent most of the day seeing what’s going on with the Crawl. Turns out it was bad formatting on the terms?
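  • A sketch of the “read in a spreadsheet (xls and xlsx)” step: POI’s WorkbookFactory handles both formats behind the common ss.usermodel interfaces. The layout assumption here (first row = column labels, first column = row labels, everything else numeric) is mine:

    import java.io.File;
    import org.apache.poi.ss.usermodel.*;

    public class MatrixLoader {
        public static double[][] load(File file) throws Exception {
            try (Workbook wb = WorkbookFactory.create(file)) {
                Sheet sheet = wb.getSheetAt(0);
                int rows = sheet.getLastRowNum();                    // data rows, after the label row
                int cols = sheet.getRow(0).getLastCellNum() - 1;     // data columns, after the label column
                double[][] mat = new double[rows][cols];
                for (int r = 1; r <= rows; r++) {
                    Row row = sheet.getRow(r);
                    for (int c = 1; c <= cols; c++) {
                        Cell cell = (row == null) ? null : row.getCell(c);
                        mat[r - 1][c - 1] = (cell == null) ? 0.0 : cell.getNumericCellValue();
                    }
                }
                return mat;
            }
        }
    }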

Phil 4.20.16

7:00 – 4:00 VTX

  • Read a little more of the BitCoin article. Nice description of the blockchain.
  • Generated a fresh Excel matrix of codes and papers. I excluded the ‘meta’ codes (Definitions, Methods, etc) and all the papers that have zero quotes
  • Duke Ellington & His Orchestra make great coding music.
  • Installed new Java
  • Well drat. I can’t increase the size of an existing matrix. Will have to return a new one.
  • Yay!
    FileUtils.getInputFileName() opening: mat1.xls
    initial
     , col1, col2, col3, col4, 
    row1, 11.0, 21.0, 31.0, 41.0, 
    row2, 12.0, 22.0, 32.0, 42.0, 
    row3, 13.0, 23.0, 33.0, 43.0, 
    done calculating
    done creating
    derived
     , row1, row2, row3, col1, col2, col3, col4, 
    row1, 0.0, 0.0, 0.0, 11.0, 21.0, 31.0, 41.0, 
    row2, 0.0, 0.0, 0.0, 12.0, 22.0, 32.0, 42.0, 
    row3, 0.0, 0.0, 0.0, 13.0, 23.0, 33.0, 43.0, 
    col1, 11.0, 12.0, 13.0, 0.0, 0.0, 0.0, 0.0, 
    col2, 21.0, 22.0, 23.0, 0.0, 0.0, 0.0, 0.0, 
    col3, 31.0, 32.0, 33.0, 0.0, 0.0, 0.0, 0.0, 
    col4, 41.0, 42.0, 43.0, 0.0, 0.0, 0.0, 0.0,
  • Need to set Identity and other housekeeping.
  • Added ‘normalizeByMatrix’ that sets the entire matrix on a unit scale
  • Need to have a calcRank function that squares and normalizes until the difference between output eigenvectors is below a certain threshold or a limit of iterations is hit. Done? (A sketch of the iteration is below.)
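  • A sketch of the iteration, written as plain power iteration on the derived matrix (repeatedly multiply a vector by the matrix and renormalize until it stops moving), which converges to the same dominant eigenvector as squaring-and-normalizing the matrix itself. The names and the epsilon/iteration defaults are placeholders:

    public class RankCalc {
        public static double[] calcRank(double[][] m, double epsilon, int maxIterations) {
            int n = m.length;
            double[] v = new double[n];
            java.util.Arrays.fill(v, 1.0 / n);                 // start with a uniform vector
            for (int iter = 0; iter < maxIterations; iter++) {
                double[] next = new double[n];
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) next[i] += m[i][j] * v[j];
                }
                double norm = 0;
                for (double x : next) norm += x * x;
                norm = Math.sqrt(norm);
                double diff = 0;
                for (int i = 0; i < n; i++) {
                    next[i] /= norm;
                    diff += Math.abs(next[i] - v[i]);
                }
                v = next;
                if (diff < epsilon) break;                     // eigenvector has stopped changing
            }
            return v;
        }
    }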

Phil 4.14.16

7:00 – 3:30 VTX

  • Continuing The ‘like me’ framework for recognizing and becoming an intentional agent
  • Page 2: Perception influences production, and production influences perception, with substantial implications for social cognition.
    • This must be a foundational element of Social Trust. I see you do a thing. I imitate the thing. I feel (not think!) that it is the same thing. I do a thing. You imitate the thing. Think peekaboo. We establish a rapport. This is different from System Trust, where I put something somewhere and it’s still there. System trust may be derived fundamentally from Object Permanence, while Social Trust comes from imitation?
    • This is(?) tied to motor neurons. From Mirror neurons: Enigma of the metaphysical modular brain: Essentially, mirror neurons respond to actions that we observe in others. The interesting part is that mirror neurons fire in the same way when we actually recreate that action ourselves.
      • Implications for design? Journalism is definitely built around the ‘like me’ concept, in that it is built around stories. IR is much less so, and is more data-focused.
    • At section 3 – Experiment 1: learning tool-use by observing others
      • We have Social Trust first. Then we learn to use tools. Tools are different from, though related to, the environment. They are not ‘like me’, but they extend me (Heidegger again). More later.
  • Page 3: For example, there is an intimate relation between striving to achieve a goal and a concomitant facial expression and effortful bodily acts.
    • This is like the boot loader or initial dictionary entry. Hard-wired common vocabulary.
  • Page 3: Humans, including preverbal infants, imbue the acts of others with felt meaning not solely (or at first) through a formal process of step-by-step reasoning, but because the other is processed as ‘like me.’ This is underwritten by the way humans represent action—the supramodal action code—and self experience
    • So is there a ‘more like me’ and ‘less like me’?
  • Meeting with Wayne this evening
    • Go over notes
    • Coding session
  • ——————
  • Check to see that reports are being made correctly
    • Fix “Get all rated” – numerous issues, including strings with commas
    • Fix “Get Match Counts” – all zeros
    • Fix “Get No Match Counts” – redundant
    • Change “Get Blacklist (CSV)” to “Black/White list (CSV)”
    • Add “Get Whitelist (Google CSE)”
    • Change the Sets in getBlack/Whitelist to use maps rather than sets so blacklist culling can be used with more informative rows (sketched at the end of this entry).
  • Update remote DB and test a few pages. Ran into a problem with LONGTEXT and Postgres. Went back to TEXT
  • Went over Aaron’s ASB slides a couple of times. Introduced him to Partial Least Squares Structural Equation Modeling (PLS-SEM).
  • Present new system to Andy, Margarita and John. Tomorrow…
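  • A sketch of the Set-to-Map change for the black/white lists: key on the domain but keep an informative row alongside it, so culling can show why an entry is there (ListEntry and its fields are hypothetical):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class BlacklistSketch {
        public static class ListEntry {
            public final String domain, reason, ratedBy;
            public ListEntry(String domain, String reason, String ratedBy) {
                this.domain = domain; this.reason = reason; this.ratedBy = ratedBy;
            }
            public String toCsvRow() { return domain + "," + reason + "," + ratedBy; }
        }

        private final Map<String, ListEntry> blacklist = new LinkedHashMap<>();

        public void add(ListEntry e)           { blacklist.put(e.domain, e); }
        public boolean contains(String domain) { return blacklist.containsKey(domain); } // same membership test as the old Set
        public ListEntry cull(String domain)   { return blacklist.remove(domain); }      // culling now hands back the full row
    }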

Phil 4.5.16

7:00 – 4:30 VTX

  • Had a good discussion with Patrick yesterday. He’s approaching his wheelchair work from a Heideggerian framework, where the controls may be present-at-hand or ready-to-hand. I think those might be frameworks that apply to non-social systems (Hammers, Excel, Search), while social systems more align with being-with. The evaluation of trustworthiness is different. True in a non-social sense is a property of exactness; a straightedge may be true or out-of-true. In a social sense, true is associated with a statement that is in accordance with reality.
  • While reading Search Engine Agendas in Communications of the ACM, I came upon a mention of Frank Pasquale, who wrote an article on the regulation of Search, given its impact (Federal Search Commission? Access, Fairness, and Accountability in the Law of Search). The point of Search Engine Agendas is that the ranking of political candidates affects people’s perception of them (higher is better). This ties into my thoughts from March 29th: that there are situations where the idea of ordering among pertinent documents may be problematic, and further that how users might interact with the ordering process might be instructive.
  • Continuing Technology, Humanness, and Trust: Rethinking Trust in Technology.
  • ————————
  • Added the sites Andy and Margarita found to the blacklist and updated the repo
  • Theresa has some sites too – in process.
  • Finished my refactoring party – more debugging than I was expecting
  • Converted the Excel spreadsheet to JSON and read the whole thing in. Need to do that just for a subsample now.
  • Added a request from Andy about creating a JSON object for the comments in the flag dismissal field.
  • Worked with Gregg on setting up the Postgres db.