Author Archives: pgfeldman

Phil 4.21.17

6:00 – 7:00 Research

8:30 – 5:00 BRC

  • Need to think about handling time, so we can see if people are getting better
  • All hands meeting
    • Transforming healthcare WRT identifying risks and anomalies for the purpose of reducing variance in care. From what to what?
  • 4 things:
    • Technology: get to and stay at the leading edge of what we’re marketing. Investment commitment (CCRi These guys? Alias, Commonwealth university)
    • Sales development
    • Partnering
    • Capital (direct raise from investors) Alignment at a capital level?
  • V2 & V3 timelines and capabilities
  • Sales and capital story
  • Discussion (2 hours)

Phil 4.20.17

7:00 – 8:00 Research

8:30 – 6:00, 7:00 – 10:00 BRC

  • Drove up to NJ
  • Still working on temporal coherence of clusters. Talked through it with Aaron, and we both believe it’s close
  • Another good discussion with Bob
  • BRC dinner meet-n-greet

Phil 4.19.17

7:00 – 8:00 Research

8:30 – 5:00 BRC

  • Have Aaron read abstract
  • Finishing up temporal coherence in clustering. Getting differences; now I have to figure out how to sort, and when to make a new cluster.
    timestamp = 10.07
    	t=10.07, id=0, members = ['ExploitSh_54', 'ExploitSh_65', 'ExploitSh_94', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_17', 'ExploreSh_2', 'ExploreSh_21', 'ExploreSh_24', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_35', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_43', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_8']
    	t=10.07, id=1, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_64', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_20', 'ExploreSh_23', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_9']
    	t=10.07, id=-1, members = ['ExploitSh_63', 'ExploitSh_68', 'ExploitSh_86', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    
    timestamp = 10.18
    	t=10.18, id=0, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_63', 'ExploitSh_64', 'ExploitSh_65', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_86', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_94', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_17', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_2', 'ExploreSh_20', 'ExploreSh_21', 'ExploreSh_23', 'ExploreSh_24', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_35', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_43', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_8', 'ExploreSh_9']
    	t=10.18, id=-1, members = ['ExploitSh_54', 'ExploitSh_68', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    current[0] 32.43% similar to previous[0]
    current[0] 87.80% similar to previous[1]
    current[0] 3.96% similar to previous[-1]
    current[-1] 7.41% similar to previous[0]
    current[-1] 82.35% similar to previous[-1]

    In the above example, we originally have three clusters and then two. The two that map are pretty straightforward: current[0] is 87.80% similar to previous[1], and current[-1] is 82.35% similar to previous[-1]. I’m not sure what to do about the group that fell away. I think cluster IDs should increase monotonically, with the exception of [-1], which is always the unclustered group. Once a cluster goes away, it can’t come back (sketched below).
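    A minimal sketch of the matching step I have in mind, assuming each timestamp’s clusters come in as a dict of {cluster_id: [member names]}. The Jaccard overlap, the 50% threshold, and the way new IDs are seeded are placeholders rather than the real code:

    from itertools import count

    def jaccard_pct(a, b):
        """Percent membership overlap between two clusters."""
        a, b = set(a), set(b)
        return (100.0 * len(a & b) / len(a | b)) if (a or b) else 0.0

    def relabel(prev, curr, new_ids, threshold=50.0):
        """Map current cluster IDs onto previous ones by best membership overlap.
        -1 (unclustered) always maps to itself; anything below the threshold gets
        a fresh, monotonically increasing ID, so a cluster that dies never comes back."""
        mapping = {-1: -1}
        used = set()
        for cid, members in curr.items():
            if cid == -1:
                continue
            scores = [(jaccard_pct(members, pm), pid)
                      for pid, pm in prev.items() if pid != -1 and pid not in used]
            score, pid = max(scores, default=(0.0, None))
            if pid is not None and score >= threshold:
                mapping[cid] = pid
                used.add(pid)
            else:
                mapping[cid] = next(new_ids)  # seed new_ids past the largest ID already handed out
        return mapping

    # e.g. new_ids = count(2); mapping = relabel(prev_clusters, curr_clusters, new_ids)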

  • Long discussion with Bob and Aaron, basically coordinating and giving Bob a sense of where we are. That wound up being most of the day.

Phil 4.18.17

7:00 – 8:00, 4:00 – 5:00 Research

  • Redid the paper in the ACM format. I have to say, LaTeX did make that pretty easy…
  • Got the gridded population data. Now I have to find a Python reader for ArcGIS .asc files. This looks like it might work (ASCII to Raster); there’s also a sketch of a direct reader at the end of this list.
  • Chat with Helena
    • Now in CSCW
    • “In the field of CSCW there are emergent trends”. Starbird, etc. Check mail.
    • Abstract due on Thursday!
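  • In case the ASCII-to-Raster route doesn’t pan out, the ESRI .asc format is simple enough to read directly. A minimal sketch, assuming the standard six-line header (the file name is hypothetical):
    import numpy as np

    def read_esri_ascii(path):
        """Read an ESRI ASCII grid (.asc): six header lines, then rows of cell values."""
        header = {}
        with open(path) as f:
            for _ in range(6):
                key, value = f.readline().split()
                header[key.lower()] = float(value)
        grid = np.loadtxt(path, skiprows=6)
        if 'nodata_value' in header:
            grid[grid == header['nodata_value']] = np.nan  # mask missing cells
        return header, grid

    # header, population = read_esri_ascii('gpw_population.asc')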

8:30 – 3:30 BRC

Phil 4.17.17

7:00 – 8:00, 3:00 – 4:00 Research

  • This looks good: Bayesian data analysis for newcomers
  • Also this: Seeing Theory
  • I want to do a map that is based on population vs geography:
    • Gridded Population of the world (Matplotlib to generate image?). Can’t get the data directly. Need to see if available via UMBC (or maybe GLOBE?)
    • Wilbur terrain generation (installed. Will accept an image as the heightmap source)
  • Tried using Qt Designer, but it can’t find the web plugin?
  • Installing Python 3.6 on my home dev box
  • Downloaded all the python code, and my simulation data. I want to be able to merge tables to produce networks that can then be plotted, so I think it’s mostly going to be installing things this morning
  • NOTE: When installing Python, the only way to install for all users is to go through the advanced setup.
  • Installing packages. CMD needs to run as admin, which blows.
  • After some brief issues with the IDE not being set in structure, got all the pieces that use numpy, pandas and matplotlib running. That should be enough for table parsing (although there will be the excel reading and writing installs), though I still need to get started with graph-tool
  • Paper was rejected – time to try ACM? LaTeX format. Downloaded and compiled! Now I just have to move the text over? Wrap the existing text? That’s something for tomorrow.

8:30 – 2:30, BRC

  • Working on table joins. That was pretty straightforward. Note that for a column collision you have to provide a suffix. Makes me think that I want to compare across DataFrames instead:
    # eu is the Excel utility module: load the workbook once, then pull sheets out as DataFrames
    eu.read_dataframe_excel(args.excelfile, None)
    cluster_df = eu.read_dataframe_sheet("Cluster ID")
    print("cluster_df")
    #  print(cluster_df)
    dist_df = eu.read_dataframe_sheet("Distance from mean center")
    print("dist_df")
    #  print(dist_df)
    # join on the index; lsuffix/rsuffix resolve the column-name collisions
    merged_df = cluster_df.join(other=dist_df, lsuffix='_c', rsuffix='_d')
    print("merged_df")
    print(merged_df)
  • So now that I can read in and analyze sheets, what am I trying to do? I think that for each time slice, and by cluster, I want to produce a sorted list of membership from most to least common (roughly sketched below).
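    A rough sketch (the column names are my guesses at the merged layout, not the actual sheet structure):

    import pandas as pd

    def membership_by_slice(df: pd.DataFrame) -> pd.DataFrame:
        """For each time slice and cluster, count agent memberships and sort
        from most to least common."""
        counts = (df.groupby(['timestamp', 'cluster_id'])['agent']
                    .value_counts()
                    .rename('n')
                    .reset_index())
        return counts.sort_values(['timestamp', 'cluster_id', 'n'],
                                  ascending=[True, True, False])

    # for (t, cid), grp in membership_by_slice(merged_df).groupby(['timestamp', 'cluster_id']):
    #     print(t, cid, grp['agent'].tolist())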


Phil 4.15.17

Thoughts about CSCW ‘Fake News’ class

A political system cannot last long if its appropriate principle is lacking. Montesquieu claims, for example, that the English failed to establish a republic after the Civil War (1642–1651) because the society lacked the requisite love of virtue.

Phil 4.14.17

7:00 – 8:00 Research

  • This Is Why Trump’s Conspiracy Theories Work, Say Experts
  • Good advice on writing papers
  • Setting up to get clustering data.
    • Explore/Exploit ratio 50/50
    • Explore will be in the flocking range: 0.1 – 1.6
    • Exploit will be in the echo chamber stage: 3.2 – 10.0
    • Cluster EPS of 0.25 gives good diversity
    • Create a DataFrame from the “Cluster ID” sheet
    • Ran the sims, but the default destination is wrong. Re-running
    • Discriminate between explorer and exploiter cluster membership over time
      • Clusters the agent belonged to
      • Will need to post-process to fix cluster switching, probably by taking all the cluster average positions and applying a common ID if the difference between samples is less than a given delta (sketched below)
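  • A quick sketch of that center-distance remap (the cluster-center dicts, the delta, and the ID seeding are all assumptions, not the real pipeline):
    import numpy as np
    from itertools import count

    def remap_by_center(prev_centers, curr_centers, new_ids, delta=0.5):
        """Give a current cluster the ID of the nearest previous cluster if its
        mean position moved less than delta; otherwise hand out a fresh ID.
        prev_centers / curr_centers: {cluster_id: np.array([x, y])}"""
        mapping = {}
        for cid, center in curr_centers.items():
            dists = {pid: np.linalg.norm(center - pc) for pid, pc in prev_centers.items()}
            pid = min(dists, key=dists.get) if dists else None
            mapping[cid] = pid if pid is not None and dists[pid] < delta else next(new_ids)
        return mapping

    # new_ids = count(max(prev_centers, default=-1) + 1)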

8:30 – 3:30 BRC

  • Generating data for clustering
  • Will try to read in and merge the DataFrames so that I have position (angle or origin distance) to calculate group persistence
  • Human Motion Recognition Using Isomap and Dynamic Time Warping
  • Fixing DataFrames. Mostly this is a case of bad data handling. This is in the data:
    359000.00,836,2,2,4,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,#<Geocoder::Result::Bing:0x007fe9a1b39718>,#<Geocoder::Result::Bing:0x007fe9a1b39150>

    Which means that we have to handle ‘#<Geocoder::Result::Bing:0x007fe9a1b39718>,#<Geocoder::Result::Bing:0x007fe9a1b39150>’. I am very disappointed in this book.

  • Giving up. Wrote a review. Monday I’ll try doing joins on Dataframes. That being said, I learned a lot on how to check for errors.

Phil 4.13.17

7:00 – 8:00 Research

  • Reading the HCIC Boaster Poster description. Downloaded to HCIC 2017 folder
    • A “boaster-poster” is a poster that describes your most current research endeavor and/or interest. The idea is to foster dialogue about your topic of interest/research so you can meet like-minded HCIC 2017 attendees. Format for a “boaster-poster” is as follows: a short description of your perspective and interest in this area, plus a description of your work in form of a single page (8.3 × 11.7 inches) poster. Boaster-posters offer an opportunity to showcase the work of new and experienced authors alike. You can use images and text to frame and illustrate your ideas. A list with boaster-poster titles, authors & abstracts will be distributed at the conference, and the posters will be available for view at the HCIC conference. We strongly encourage all student attendees to submit a boaster to HCIC, as boaster authors will have opportunities across the conference to discuss their work with other attendees through a new interactive format for 2017.
      Boaster-poster deadline: June 2nd, 2017
      A pdf that includes:

      • A cover page with
        • Title, author(s) (indicate those available to chat at meeting)
        • At least three keywords
        • A 150 word abstract
      • A draft of your poster
    • So something like “sociophysics-informed design”? I’m thinking that if I can take agent cluster membership and use that to construct a social network graph, I could show something that looks like this: [image: twitterdata1-01]
    • Maybe use the graph-tool Python library? [image: polblogs_pr]
    • Need to look at Zappos and McMaster websites as examples of explorational interfaces
    • Facebook’s guide to handling Fake News. High effort. I wonder what kind of feedback mechanisms there are?

8:30 – 6:00 BRC

  • Sprint planning
  • Doctor visit, 10:15 – 11:00
  • Discussion with Aaron about visualizing high-dimensional clusters in low-dimensional space for intuitive understanding
  • Working through Thoughtful Machine Learning. Very disappointed. The code in GitHub doesn’t match the book, doesn’t even have an entry point, and blows up in the init. Sad! Here’s the offending line (df is the read-in DataFrame):
    df = (df - df.mean()) / (df.max() - df.min())
  • Learning more about the pandas DataFrame here so maybe I can fix the above.
  • Actually, Skillport has useful stuff, but all the videos crash before the end
  • The problem is that the floating-point values in the file are being read in as string values, which crashes the calculation. I’ve tried an apply function that changes the value, but it doesn’t result in the type change. Going to try changing everything to float tomorrow (see the sketch at the end of this list).
  • Helped Aaron break down the tasking for this sprint’s efforts.
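  • A sketch of the float coercion mentioned above (it assumes the frame is numeric apart from stray string entries; pd.to_numeric with errors='coerce' is my guess at the fix, not something from the book):
    import pandas as pd

    def coerce_numeric(df: pd.DataFrame) -> pd.DataFrame:
        """Force every column to numeric; unparseable strings become NaN,
        and columns that end up entirely NaN are dropped."""
        df = df.apply(pd.to_numeric, errors='coerce')
        return df.dropna(axis=1, how='all')

    # df = coerce_numeric(df)
    # df = (df - df.mean()) / (df.max() - df.min())  # the book's normalization should then work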


Phil 4.12.17

7:00 – 8:00 Research

  • Just found the HCIC Boaster Poster description
  • Resistbot is kind of along the lines of what I was thinking about for anonymous news input. An article on chatbot technology from CIO: one of the interesting platforms is ChatFuel, which has a non-programming (example-based) creation process. It’s really tied into FB; not sure I want to do that without setting up a specific account.
  • Gupshup is another bot system that deploys to a lot of platforms (FB, Twitter, Slack, etc.)
  • NLP/NLU services. Here’s Google’s documentation for NLP and Prediction, which seems related
  • Downloading the Qt Community IDE. Not sure if it has Designer or not. Also, there was only the option to download versions 5.x, so the V4 options of PyQt may be problematic. Big install; looks like there’s lots of documentation, though.
  • Downloaded and running. Tutorials tomorrow

8:30 – 1:00 BRC

  • Adding the stats DataFrame – done. Interesting results in the test128 data: [image: clustersVsUnclustered]
  • Sprint grooming. And while that’s running, reading Thoughtful Machine Learning. Downloaded accompanying code from GitHub

Phil 4.11.17

7:00 – 8:00 Research

8:30 – 4:00 BRC

  • Running clustering on local machine.
  • Need to add a stats DataFrame that has eps, min_size, c.num_clusters, c.get_num_clustered, and c.get_num_items, and write that out at the end. Tomorrow (a sketch is at the end of this list).
  • Added reading of the csv file and conversion to a DataFrame/xlsx. Played around with that in Excel. Some nice results if I take the log of the value. Need to try clustering on that, but I want to add the stats output first. [image: integrity3]
  • Jonker-Volgenant Algorithm + t-SNE = Super Powers. This could be a way of producing a map view for the research browser.
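  • A sketch of that stats DataFrame (the attribute/method names follow the note above; whether each is an attribute or a call is a guess, as is the output file name):
    import pandas as pd

    def append_stats(rows: list, eps: float, min_size: int, c) -> None:
        """Collect one row of clustering stats per run; c is the clusterer object."""
        rows.append({'eps': eps,
                     'min_size': min_size,
                     'num_clusters': c.num_clusters,
                     'num_clustered': c.get_num_clustered(),
                     'num_items': c.get_num_items()})

    def write_stats(rows: list, filename: str = 'cluster_stats.xlsx') -> None:
        """Write the accumulated rows out as a single sheet at the end of the sweep."""
        pd.DataFrame(rows).to_excel(filename, sheet_name='stats', index=False)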

Phil 4.10.17

7:00 – 8:30, 3:00 – 6:00 Research

  • Continuing submission. Figuring out PhySH. That’s a pretty nice system!
  • Done!

    05Apr2017 es2017apr05_602 Physical Review E (Regular Article)
    Status: Submitted 10Apr2017-07:31 EDT
    Title: Modeling the Law of Group Polarization
    Authors: Feldman,Philip / Engel,Don

  • Starting on business cards

9:00 – 2:30 BRC

  • Sprint retrospective
  • I seem to have lost the cluster run from last week. Rerunning. I thought there might be an error since there were no (-1) entries, but it’s because the DataFrame stores each item’s value as the cluster ID + 2. That means a cell with a zero is completely empty, and a 1 is unclustered (decoded in the sketch below).
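  • For future reference, a tiny sketch of decoding that (assuming the stored value really is cluster ID + 2, with 0 meaning an empty cell):
    import numpy as np

    def decode_cluster_id(value):
        """Stored value -> cluster ID: 0 = empty, 1 = unclustered (-1), 2+ = cluster (value - 2)."""
        return np.nan if value == 0 else value - 2

    # cluster_ids = df.applymap(decode_cluster_id)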

Phil 4.8.17

8:00pm – 12:00am, plus an hour on the 9th

  • Spent much time trying to figure out why the PDF wouldn’t build on the submission website from the LaTeX file. As a result, I learned how to read the log file, fixed some image-scaling issues, and learned to be VERY careful about checking the case of file names. Also, as per here, I learned how to take the .bib file and convert it to a .bbl file, which can then be pasted into the .tex file, resulting in a single file for submission. No errors, and only minor quibbles on citations!

BRC

Phil 4.7.17

7:15 – 8:15 Research

  • So this happened: Dozens of U.S. Missiles Hit Air Base in Syria. Wonder how it will play out. Like Infinite Reach? That actually had more justification, since Americans were killed in the preceding events…
  • Direction matching for sparse movement datasets: determining interaction rules in social groups. Data is here, in the very cool Movebank. Reminds me of GLOBE.
  • Still working on getting paper submitted. Word is hanging, which is weird.
  • The APS journal submissions login page is here
  • Length Check
    Please be aware that this is only an estimate and meant to help avoid delays associated with excessive length.
    
    Journal/article type is not length constrained (4281)
    
     *** Word count calculation may be inaccurate for the following reasons ***
       * No acknowledgment environment found - acknowledgments may have been counted
       * No bibliography environment found - references may have been counted
    
      Figure   Aspect Ratio   Wide?   Word Equivalent
         1         1.33         No        132
         2         0.98         No        173
         3         0.93         No        181
         4         1.01         No        168
         5         1.75         No        105
         6         1.67         No        109
         7         2.99         No         70
         8         1.22         No        142
         9         0.95         No        177
        10         1.66         No        110
    
    WORD COUNT SUMMARY
        Note: Text word count excludes title, abstract, byline, PACS,
              receipt date, acknowledgments, and references.
    
                  Text word count   2914 
            Equations word equiv.      0 
              Figures word equiv.   1367
               Tables word equiv.      0
                                  ------
                           TOTAL    4281 (Maximum length is unlimited)
  • Getting PDF conversion errors. Not sure what to do next…
    WARNING: Supplemental file: ModelingGroupPolarizationNotes.bib.
    Checking figure order in 'ModelingGroupPolarization.tex'...
    PDF file creation failed:
    ERROR: Following figure files not called into TeX:
    DTWmatrix.png,toolScreenshot.png,ExplorerPDF.png
    AND TeX file calls in the following missing figure files:
    toolscreenshot.png,explorerPDF.png,DTWMatrix.png
    Please check and fix these file names.

8:30 – 2:30 BRC

  • Imperative programming in TensorFlow
  • Working on reading in the old integrity as a DataFrame from Excel
  • Had to install xlrd
  • TypeError: Image data can not convert to float. This pattern seems useful:
    import numpy as np  # needed for the float64 cast below

    mat = df.as_matrix()  # pull the raw data matrix out of the DataFrame
    mat = mat.astype(np.float64)  # force it to float64
    df.update(mat)  # replace the 'float' mat with the 'float64' mat
  • Well, it was working before, but now it won’t update the int64 to float64. This worked, though. When in doubt, build a new mat:
    import numpy as np
    import pandas

    mat = mat.astype(np.float64)  # force it to float64
    indices = df.index.values
    cols = df.columns.values
    df = pandas.DataFrame(mat, indices, cols)  # rebuild the frame around the float64 matrix
  • Got the data in, though. Interesting. This is the densest corner of the sorted matrix: [image: integrity]

Phil 4.6.17

8:30 – 5:30 BRC

  • Worked too late yesterday and slept in. Will try to get the submission issues worked on once the sprint review gets going
  • Added code that deletes previous file if it exists
  • Refactored the fitness landscape test to be 100 eps increments by 10 cluster increments, starting at the min_cluster. Seems to produce more useful results.


  • Sprint review at 9:30, 11:00, and 1:00. Done! Went over my stuff too fast.
    • Need to figure out a way to cluster diagnosis codes
    • Discovered that the data is still the same integrity data from the last sprint. Need to write a read_dataframe(file_name: str) -> pandas.DataFrame method for hdfs_csv_reader tomorrow (first cut sketched below).
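    • A first cut at that method (a sketch only; it assumes the path is directly readable, or an hdfs:// URL the environment can resolve, which may not match how hdfs_csv_reader actually gets at files):
      import pandas

      def read_dataframe(file_name: str) -> pandas.DataFrame:
          """Read an exported CSV into a DataFrame."""
          return pandas.read_csv(file_name)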

Phil 4.5.17

7:00 – 8:00 Research

  • Finishing up poster? CI_GP_Poster
  • Starting submission for Phys Rev E, based on my notes here.
    • Created account
    • Created ORCID account
    • Started submission. Errors! There were problems with the following fields:
      • AllExploit_SI0.0.psd: Classification description value must be selected for each file and cannot be left unknown.
      • 90_ExploitR10_10_ExploreR0.psd: Classification description value must be selected for each file and cannot be left unknown.
      • polarized.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • clusterMembership.png: Classification description value must be selected for each file and cannot be left unknown.
      • toolScreenshot.png: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization2col.pdf: Classification description value must be selected for each file and cannot be left unknown.
      • exploring.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization.pdf: Classification description value must be selected for each file and cannot be left unknown.
      • psheader.txt: Classification description value must be selected for each file and cannot be left unknown.
      • influenced.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • ExplorerPDF.png: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization.synctex.gz: Classification description value must be selected for each file and cannot be left unknown.
      • AllExploit_SI10.0.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarizationDraft.pdf: Classification description value must be selected for each file and cannot be left unknown.
      • flocking.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • DTWmatrix.png: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization.log: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization.dvi: Classification description value must be selected for each file and cannot be left unknown.
      • AllExploit_SI10.0.psd: Classification description value must be selected for each file and cannot be left unknown.
      • ModelingGroupPolarization.blg: Classification description value must be selected for each file and cannot be left unknown.
      • 90_ExploitR10_10_ExploreR0.jpg: Classification description value must be selected for each file and cannot be left unknown.
      • AllExploit_SI0.2.psd: Classification description value must be selected for each file and cannot be left unknown.
      • SimilarPaths.png: Classification description value must be selected for each file and cannot be left unknown.
      • RevTex41_example.aux: Classification description value must be selected for each file and cannot be left unknown.
      • label 0 is duplicated
      • ModelingGroupPolarization.aux: Classification description value must be selected for each file and cannot be left unknown.
      • revtex41_template.blg: Classification description value must be selected for each file and cannot be left unknown.
      • RevTex41_example.blg: Classification description value must be selected for each file and cannot be left unknown.
      • DTWclusters.png: Classification description value must be selected for each file and cannot be left unknown.
      • Main text file is required

9:00 – 5:00, 6:00, 7:30 BRC

  • Working on HDFS reading and writing
  • Integrating code
  • Compiled! Now waiting for things to blow up
  • Success! After fixing this:
    writer.write(u'%s' % cstr) # good
    writer.write('%s', cstr) # bad