Monthly Archives: April 2017

Phil 4.28.17

7:00 – 8:00 Research

8:30 – 4:30 BRC

  • Working on finding an exit condition for the subdivision surface
  • I’m currently calculating the corners of a constricting rectangle that contracts towards the best point. Each iteration is saved, and I’m working on visualizing that surface, but my brain has shut down, and I can’t do simple math anymore.
  • Had a thought for Aaron about how to visualize his dimension reduction. Turns out to do well.
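
A minimal sketch of the contracting rectangle described above, with an exit condition. It is not the BRC implementation: the function name contract_rectangle, the 2D fitness callable, and the halving rule are all assumptions made for illustration. Each iteration evaluates the four corners, records them, and halves the rectangle towards the best corner until the longest side drops below epsilon or max_iterations is reached.

```python
from typing import Callable, Dict, List, Tuple


def contract_rectangle(fitness: Callable[[float, float], float],
                       x_range: Tuple[float, float],
                       y_range: Tuple[float, float],
                       epsilon: float = 1e-3,
                       max_iterations: int = 50) -> List[Dict]:
    """Greedy sketch: shrink a rectangle towards its best corner (hypothetical helper)."""
    (x0, x1), (y0, y1) = x_range, y_range
    history = []
    for iteration in range(max_iterations):
        corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
        values = [fitness(x, y) for x, y in corners]
        best_index = max(range(4), key=values.__getitem__)
        best_x, best_y = corners[best_index]
        # Save every iteration so the surface can be visualized later.
        history.append({"iteration": iteration, "corners": corners,
                        "values": values, "best": (best_x, best_y)})
        # Exit condition: the rectangle has shrunk below epsilon.
        if max(x1 - x0, y1 - y0) < epsilon:
            break
        # Keep the half of each axis that contains the best corner.
        mid_x, mid_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        x0, x1 = (x0, mid_x) if best_x == x0 else (mid_x, x1)
        y0, y1 = (y0, mid_y) if best_y == y0 else (mid_y, y1)
    return history
```

Because it only ever follows the best corner, this version can lock onto a local maximum; comparing the average of the four corner values instead of the single best is one alternative.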

Aaron 4.27.17

  • Cycling
    • Got a late start in the office today, so as soon as I got in I put my gear on for a brain-cleaning ride. Pushed really hard, and combined with some nice weather and low traffic, hit my first 16+ mph door-to-door average. Landed a 16.4 mph average, and felt really proud of it.
  • Focus today was on learning some more about Manifold learning and its applications for reduction of high dimensional data for unsupervised learning.
    • SciKit includes some great documentation and resources including a working sample comparing various Manifold learning techniques against test data sets.
    • My goal now is to take the sorted code from yesterday and compare the manifold learning examples against the clustered output of the unreduced data. Once I have a benchmark set up I can do the same for the sample live data.
    • The output of the SciKit examples in MatPlotLib is really attractive as well. (image: manifold_learning_sample)

Phil 4.27.17

7:00 – 9:00 Research

  • Some more echo chamber flocking: Iran Deal Is More Popular Than Ever, Poll Shows (image: 170426_iran-1). Republicans registered the biggest uptick in support for the deal, which has been heavily criticized by GOP lawmakers since its inception in July 2015: 53 percent of Republican voters said they supported it, compared with 37 percent who backed it last summer and just 10 percent who supported it shortly after it was announced. Democratic support for the deal has been largely unchanged since August, and a larger share of independents are getting on board, from 41 percent in August to 48 percent now.
  • Finishing corrections to paper
  • This really is my phase I research question: If ‘laws of motion’ can indeed be ascribed to behavior, we should be able to model the effects of those laws. The question then becomes what form do these models take? Also, how do we detect these behaviors with domain independence and at scale?
  • Submitted!
  • The Relevance of Hannah Arendt’s Reflections on Evil: Globalization and Rightlessness

BRC 9:30 – 5:00

  • Continuing Subdivision surfacing
  • Didn’t like the documentation on sortedcollections. Going to try a pandas Series
  • Allowable options in an arg:
    parser.add_argument("--algorithm", type=str, choices=['naive', 'subdivision', 'genetic'], default='naive', help="hill climbing algorithm")

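    Note that range() should also work for choices: argparse accepts any container that supports the in operator, and in Python 3 range() returns a range object rather than a list.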

  • And here’s how you get the key/values from a pandas Series:
    print("calc_subdivision_fitness_landsape(): key = {0}, val = {1}".format(fitness.index[0], fitness.values[0]))
  • Looks like it’s working. I think I should be using the average of the 4 fitnesses to decide if I’m done
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    calc_subdivision_fitness_landsape(): fitness = 
    1    10.0
    0     7.0
    3     6.0
    2     6.0
    dtype: float64
    done calc_subdivision_fitness_landsape
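
A small sketch of the Series handling above, using the four corner fitnesses from the log. The is_done helper and its eps threshold are assumptions: the idea of comparing against the average of the four fitnesses comes from the note above, but the exact test is a guess.

```python
import pandas as pd

# Corner fitnesses keyed by corner index, copied from the log output above.
fitness = pd.Series({0: 7.0, 1: 10.0, 2: 6.0, 3: 6.0}).sort_values(ascending=False)

# After sorting, the first index/value pair is the best corner.
best_key = fitness.index[0]
best_val = fitness.values[0]
mean_val = fitness.mean()


def is_done(series: pd.Series, eps: float) -> bool:
    """Assumed exit test: stop once the best corner is within eps of the mean."""
    return bool((series.max() - series.mean()) < eps)


print("key = {0}, val = {1}, mean = {2}".format(best_key, best_val, mean_val))
```

With the values in the log the best corner is 2.75 above the mean, so a small eps would keep the search going.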

Phil 4.26.17

7:00 – 8:30 Research

  • Proofreading and tweaking the CSCW paper.
  • Finished the paper edit. Started to roll in the changes
  • Made a 10D chart of the explorer probability distribution. I think it tells the story better:
  • (image: ExplorerPDF)
  • Had to install a dictionary in TexStudio. This helped a lot.
  • Started rolling in the changes to the tex file

BRC 9:00 – 4:30

  • Looks like the sort changes to the code haven’t been pushed yet
  • Starting on subdivision surfacing
    def calc_subdivision_fitness_landsape(self, eps_step: float, min_cluster: int) -> pandas.DataFrame:
        # create the four extreme corners. These will work their way in
        # calculate halfway points
        # keep the square with the greatest (single? average?) value
        # repeat until an epsilon, max value, or max iterations are reached
        # construct a sparse matrix with spacing equal to the smallest spacing
        # fill in the values that have been calculated
        # build a dataframe and return it for visualization
        raise NotImplementedError  # stub so the outline parses; the steps above are still to be written
  • I need to sort a dict, so I’m trying SortedContainers.
  • Then things went off the rails a bit, and I wrote a haiku program as a haiku that prints itself:
    def haiku(sequence):
        this_is_not_needed = ""
        return "".join(sequence)
    if __name__ == "__main__":
        f = open('')

Aaron 4.25.17

  • Wasted a ton of time today tracking down progress of integration of additional teams into our program.
  • Spent a couple of hours tackling a poster presentation to be delivered at a technical leadership summit next week. I’ll be presenting the “Advanced Analytics” presentation and discussing all of our tools, and capabilities. Phil helped a lot, and I ended up quite pleased with the results. One of the nice things is we were able to include screenshots of actual tools and graphs of the data we’re using. I think this will be a nice difference from the rest of the presenters.
  • Did some good pair programming with Phil on the Pandas DataFrame.sort issue, moved to the non-deprecated version of DataFrame.sort_values and got it working correctly at all matrix sizes.
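
For reference, a minimal sketch of the DataFrame.sort_values call that replaces the deprecated DataFrame.sort. The column names and values are made up for illustration and are not the project data.

```python
import pandas as pd

# Illustrative data only: agent names, cluster IDs and distances are invented.
df = pd.DataFrame({
    "agent": ["a", "b", "c", "d"],
    "cluster": [1, 0, 1, 0],
    "distance": [0.4, 0.9, 0.1, 0.3],
})

# Sort by cluster ascending, then by distance descending within each cluster.
# sort_values returns a new DataFrame unless inplace=True is passed.
sorted_df = df.sort_values(by=["cluster", "distance"], ascending=[True, False])
print(sorted_df)
```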

Phil 4.25.17

7:00 – 8:30 Research

  • Wikipedia founder Jimmy Wales launches Wikitribune, a large-scale attempt to combat fake news
  • Listening to the BBC Business Daily on Machine Learning. They had an interview with Joanna J Bryson (Scholar). She has an approach for explaining the behavior of AI that seems to involve simulation? Here are some papers that look interesting:
    • Behavior Oriented Design (MIT Dissertation: Intelligence by Design: Principles of Modularity and Coordination for Engineering Complex Adaptive Agents)
    • Learning from Play: Facilitating character design through genetic programming and human mimicry
      • Mimicry and play are fundamental learning processes by which individuals can acquire behaviours, skills and norms. In this paper we utilise these two processes to create new game characters by mimicking and learning from actual human players. We present our approach towards aiding the design process of game characters through the use of genetic programming. The current state of the art in game character design relies heavily on human designers to manually create and edit scripts and rules for game characters. Computational creativity approaches this issue with fully autonomous character generators, replacing most of the design process using black box solutions such as neural networks. Our GP approach to this problem not only mimics actual human play but creates character controllers which can be further authored and developed by a designer. This keeps the designer in the loop while reducing repetitive labour. Our system also provides insights into how players express themselves in games and into deriving appropriate models for representing those insights. We present our framework and preliminary results supporting our claim.
    • Replicators, Lineages and Interactors: One page note on cultural evolution
      • If we adopt the other option and refer to culture itself is the lineage, then the culture itself can evolve since the replicators are the ideas and practices that exist within that culture. However, if it is the culture that is the lineage, we cannot say that it evolves when it takes more territory, in the same way that a species does not evolve with more individuals. Adaptation is presently understood to be about changes in the frequency of replicators, not about absolute numbers of interactors. In sum, cultural evolution (changes of practices within a group) is necessarily a separate process from cultural group selection (changes of the frequency of group-types at a specific location).
    • The behavior-oriented design of modular agent intelligence
    • Should probably cite some of these and a reference to Behavior-Oriented Design in the conclusions section of the paper
  • Continuing Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter
    • We collected data using the Twitter Streaming API, tracking on the following terms (shooter, shooting, gunman, gunmen, gunshot, gunshots, shooters, gun shot, gun shots, shootings) for a ten-month period between January 1 and October 5, 2016. This collection resulted in 58M total tweets. We then scoped that data to include only tweets related to alternative narratives of the event—false flag, falseflag, crisis actor, crisisactor, staged, hoax and “1488”.
      • These keywords specify a ‘primary information space’. Bag-of-words of text correlated with each term could make this a linear axis
    • Of 15,150 users who sent at least one tweet with a link, only 1372 sent (over the course of the collection period) tweets citing more than one domain.
      • This is the difference between implicit behaviors (clicking, reading, navigating) and explicit actions. Twitter monitors what people are willing to write
    • Interestingly, the two most highly tweeted domains were both associated with significant automated account or “bot” activity. The Real Strategy, an alternative news site with a conspiracy theory orientation, is the most tweeted domain in our dataset (by far). The temporal signature of tweets citing this domain reveals a consistent pattern of coordinated bursts of activity at regular intervals generated by 200 accounts that appear to be connected to each other (via following relationships) and coordinated through an external tool.
      • There is clearly a desire to have a greater effect through the use of bots. Two questions: 1) How does this work? 2) How did this emerge?
    • The InfoWars domain, an alternative news website that focuses on Alt-Right and conspiracy theory themes, was the second-most tweeted domain, but as (Figure 1) shows it was only tenuously connected to one other node.
      • Why? Is InfoWars more polarized? Is it using something other than Twitter?
      • Infowars Inbound links
        Domain score  Domain trust score  Backlinks  Country  First seen  Last seen
        0             0                   1857029    us       2015-09-28  2017-03-26
        4             4                   1335835    us       2014-01-19  2017-03-21
        33            39                  648958     us       2013-06-07  2017-03-25
        1             0                   346153     us       2014-01-19  2017-03-21
        13            31                  182060     cz       2013-06-07  2017-03-26
        12            30                  151778     us       2016-06-27  2017-03-22
        1             0                   92766      us       2014-11-14  2017-03-23
        4             29                  49288      us       2015-02-04  2017-03-26
        14            30                  47195      us       2014-10-02  2017-03-20
        1             0                   43748      us       2016-06-08  2017-03-24
        (Domain and IP Address values were not captured)

9:00 – 5:30 BRC

  • John is having trouble getting Linux running on the laptop
    • No luck. Re-submitting for an Alienware deskside
  • Back to getting the temporal coherence. Last try to finish up, then switching to fitness landscape optimization, which I dreamed about last night
  • Finished coherence! Had to include a state check for a timeline to see if a DIRTY state had been touched with an update. If not, then the timeline is set to CLOSED. If a new cluster appears that would have had some overlap, a new timeline is created anyway. This could be an optional behavior.
    • Still need to test rigorously across multiple data sets
  • Long scrum, then ML meeting.
    • Hard tasks
      • TF server set up to work in our environment
      • Pre-calculated models to speed up training from research browser
      • T-SNE or other mapping of returned CSE text to support exploration
      • Fast, on-the-fly classification and entity extraction within the research browser framework. Plus interactive training
      • NMF (or other) topic extraction tied to human labeling and curation, plus cross-user validation of topics
  • Poster with Aaron later? Yep. Couple of hours. Done?
  • Oh, just why? Spent an hour on this before going brute force:
    def get_last_cluster(self) -> ClusterSample:
        # TODO: return self._cluster_dict[self._cluster_dict.keys()[-1]] fails in Python 3, because keys() returns a view that cannot be indexed
        toReturn = None
        for key in self._cluster_dict:
            toReturn = self._cluster_dict[key]
        return toReturn
  • Walked through some gradient descent regression code with Bob. More tomorrow?
  • Got the new sort working with Aaron. Much faster progress as a pair
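
A possible shorter form of the get_last_cluster lookup above, as a sketch. It assumes the clusters are kept in an OrderedDict in insertion order; the ClusterTimeline class, its add method, and the string stand-ins for ClusterSample are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Optional


class ClusterTimeline:
    """Hypothetical container; only the lookup pattern matters here."""

    def __init__(self) -> None:
        # Insertion order is the timeline order.
        self._cluster_dict = OrderedDict()

    def add(self, timestamp: float, cluster: Any) -> None:
        self._cluster_dict[timestamp] = cluster

    def get_last_cluster(self) -> Optional[Any]:
        if not self._cluster_dict:
            return None
        # reversed() walks the keys from newest to oldest without copying them.
        last_key = next(reversed(self._cluster_dict))
        return self._cluster_dict[last_key]


timeline = ClusterTimeline()
timeline.add(10.07, "cluster at t=10.07")
timeline.add(10.18, "cluster at t=10.18")
print(timeline.get_last_cluster())
```

list(self._cluster_dict.values())[-1] also works, at the cost of building a temporary list.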

Phil 4.24.17

7:00 – 8:00, 3:00 – 4:00 Research

  • Continuing to tweak paper
  • Starting Examining the Alternative Media Ecosystem through the Production of Alternative Narratives of Mass Shooting Events on Twitter
    • From the introduction. Do I need something like this? Our contributions include an increased understanding of the underlying nature of this subsection of alternative media — which hosts conspiratorial content and conducts various anti-globalist political agendas. Noting thematic convergence across domains, we theorize about how alternative media may contribute to conspiratorial thinking by creating a false perception of information diversity.
  • Conspiracy Theories
  • A cool thing on explorers and veloviewer (image: maxsquare). Here’s an overview of the project
  • Brownbag
    • Teaching abstract concepts to children and tweens (STEM)
    • Cohesive understanding of science over time
    • Wearable technology as the gateway for elementary-school-aged kids? Research shows that they find them valuable
    • How are these attributes measured? <——-!!!!!!!
    • Live sensing and visualization
    • Zephyr bioharness
    • Gender/age differences? Augmented reality? Through-phone?

8:30 – 2:30 BRC

  • Expense report! Done? Had to get a charge number, and re-enter. Took forever.
  • Found out that I’m getting a laptop rather than what I asked for
    • Having John install Ubuntu and verify that multiple monitors work in Linux
  • Helped Bob set up Git repo
  • Still working on temporal coherence. Think I’ve figured out the logic. Now I need to set clusters in ClusterTimelines
  • Learned how to do Enums in Python
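
A minimal Enum sketch using the standard library enum module. The TimelineState name and its members are illustrative assumptions, loosely modelled on timeline states, and are not taken from the project code.

```python
from enum import Enum, auto


class TimelineState(Enum):
    """Hypothetical states for a cluster timeline."""
    OPEN = auto()
    DIRTY = auto()
    CLOSED = auto()


state = TimelineState.DIRTY

# Members compare by identity, and can be looked up by name.
if state is TimelineState.DIRTY:
    state = TimelineState.CLOSED

print(state.name, state.value)
```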

Phil 4.21.17

6:00 – 7:00 Research

8:30 – 5:00 BRC

  • Need to think about handling time, so we can see if people are getting better
  • All hands meeting
    • Transforming healthcare WRT identifying risks and anomalies for the purpose of reducing variance in care. From what to what?
  • 4 things:
    • Technology: get to and stay at the leading edge of what we’re marketing. Investment commitment (CCRi These guys? Alias, Commonwealth university)
    • Sales development
    • Partnering
    • Capital (direct raise from investors) Alignment at a capital level?
  • V2 & V3 timelines and capabilities
  • Sales and capital story
  • Discussion (2 hours)

Phil 4.20.17

7:00 – 8:00 Research

8:30 – 6:00, 7:00 – 10:00 BRC

  • Drove up to NJ
  • Still working on temporal coherence of clusters. Talked through it with Aaron, and we both believe it’s close
  • Another good discussion with Bob
  • BRC dinner meet-n-greet

Phil 4.19.17

7:00 – 8:00 Research

8:30 – 5:00 BRC

  • Have Aaron read abstract
  • Finishing up temporal coherence in clustering. Getting differences, now I have to figure out how to sort, and when to make a new cluster.
    timestamp = 10.07
    	t=10.07, id=0, members = ['ExploitSh_54', 'ExploitSh_65', 'ExploitSh_94', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_17', 'ExploreSh_2', 'ExploreSh_21', 'ExploreSh_24', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_35', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_43', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_8']
    	t=10.07, id=1, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_64', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_20', 'ExploreSh_23', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_9']
    	t=10.07, id=-1, members = ['ExploitSh_63', 'ExploitSh_68', 'ExploitSh_86', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    timestamp = 10.18
    	t=10.18, id=0, members = ['ExploitSh_50', 'ExploitSh_51', 'ExploitSh_52', 'ExploitSh_53', 'ExploitSh_55', 'ExploitSh_56', 'ExploitSh_57', 'ExploitSh_58', 'ExploitSh_59', 'ExploitSh_60', 'ExploitSh_61', 'ExploitSh_62', 'ExploitSh_63', 'ExploitSh_64', 'ExploitSh_65', 'ExploitSh_66', 'ExploitSh_67', 'ExploitSh_69', 'ExploitSh_70', 'ExploitSh_71', 'ExploitSh_72', 'ExploitSh_73', 'ExploitSh_74', 'ExploitSh_75', 'ExploitSh_76', 'ExploitSh_77', 'ExploitSh_78', 'ExploitSh_79', 'ExploitSh_80', 'ExploitSh_81', 'ExploitSh_82', 'ExploitSh_83', 'ExploitSh_84', 'ExploitSh_85', 'ExploitSh_86', 'ExploitSh_87', 'ExploitSh_88', 'ExploitSh_89', 'ExploitSh_90', 'ExploitSh_91', 'ExploitSh_92', 'ExploitSh_93', 'ExploitSh_94', 'ExploitSh_95', 'ExploitSh_96', 'ExploitSh_97', 'ExploitSh_99', 'ExploreSh_0', 'ExploreSh_1', 'ExploreSh_10', 'ExploreSh_11', 'ExploreSh_13', 'ExploreSh_14', 'ExploreSh_15', 'ExploreSh_16', 'ExploreSh_17', 'ExploreSh_18', 'ExploreSh_19', 'ExploreSh_2', 'ExploreSh_20', 'ExploreSh_21', 'ExploreSh_23', 'ExploreSh_24', 'ExploreSh_25', 'ExploreSh_26', 'ExploreSh_27', 'ExploreSh_28', 'ExploreSh_29', 'ExploreSh_3', 'ExploreSh_30', 'ExploreSh_31', 'ExploreSh_32', 'ExploreSh_33', 'ExploreSh_34', 'ExploreSh_35', 'ExploreSh_36', 'ExploreSh_37', 'ExploreSh_38', 'ExploreSh_4', 'ExploreSh_40', 'ExploreSh_41', 'ExploreSh_42', 'ExploreSh_43', 'ExploreSh_45', 'ExploreSh_46', 'ExploreSh_47', 'ExploreSh_48', 'ExploreSh_49', 'ExploreSh_5', 'ExploreSh_7', 'ExploreSh_8', 'ExploreSh_9']
    	t=10.18, id=-1, members = ['ExploitSh_54', 'ExploitSh_68', 'ExploitSh_98', 'ExploreSh_12', 'ExploreSh_22', 'ExploreSh_39', 'ExploreSh_44', 'ExploreSh_6']
    current[0] 32.43% similar to previous[0]
    current[0] 87.80% similar to previous[1]
    current[0] 3.96% similar to previous[-1]
    current[-1] 7.41% similar to previous[0]
    current[-1] 82.35% similar to previous[-1]

    In the above example, we originally have 3 clusters and then 2. The two that map are pretty straightforward: current[0] 87.80% similar to previous[1], and current[-1] 82.35% similar to previous[-1]. Not sure what to do about the group that fell away. I think there should be an increasing ID number for clusters, with the exception of [-1], which is unclustered. Once a cluster goes away, it can’t come back.

  • Long discussion with Bob and Aaron, basically coordinating and giving Bob a sense of where we are. That wound up being most of the day.
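
A sketch of the cluster matching discussed above. The percentages in the log are consistent with 2 * |A ∩ B| / (|A| + |B|): for example, 7 shared members between clusters of 8 and 9 agents gives 14/17 = 82.35%. The log does not say which measure was actually used, so that formula, the function names, and the 0.5 threshold are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Optional


def dice_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return 2 * |A & B| / (|A| + |B|), or 0.0 when both sets are empty."""
    set_a, set_b = set(a), set(b)
    total = len(set_a) + len(set_b)
    if total == 0:
        return 0.0
    return 2.0 * len(set_a & set_b) / total


def best_matches(current: Dict[int, List[str]],
                 previous: Dict[int, List[str]],
                 threshold: float = 0.5) -> Dict[int, Optional[int]]:
    """Map each current cluster ID to its most similar previous ID, or None."""
    matches = {}
    for cur_id, cur_members in current.items():
        scored = [(dice_similarity(cur_members, prev_members), prev_id)
                  for prev_id, prev_members in previous.items()]
        score, prev_id = max(scored, default=(0.0, None))
        # Below the threshold the cluster is treated as new.
        matches[cur_id] = prev_id if score >= threshold else None
    return matches
```

A current cluster that maps to None would get the next unused ID, and any previous ID that nothing maps to would be retired, which fits the rule that a cluster cannot come back once it goes away.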

Phil 4.18.17

7:00 – 8:00, 4:00 – 5:00 Research

  • Redid the paper in the ACM format. I have to say, LaTeX did make that pretty easy…
  • Got the gridded population data. Now I have to find a Python reader for ArcGIS .asc files. This looks like it might work (ASCII to Raster)
  • Chat with Helena
    • Now in CSCW
    • “In the field of CSCW there are emergent trends”. Starbird & etc. Check mail
    • Abstract due on Thursday!
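
For the .asc reader mentioned above, a minimal sketch with numpy. It assumes the usual Esri ASCII grid layout: header lines such as ncols, nrows, xllcorner, yllcorner, cellsize and NODATA_value, followed by rows of cell values. The function name read_ascii_grid is hypothetical, and this has not been run against the gridded population files.

```python
import io
from typing import Dict, TextIO, Tuple

import numpy as np


def read_ascii_grid(handle: TextIO) -> Tuple[Dict[str, float], np.ndarray]:
    """Parse an ASCII grid into (header dict, 2D float array)."""
    header = {}
    data_lines = []
    for line in handle:
        parts = line.split()
        if not parts:
            continue
        # Header lines are "keyword value"; data rows start with a number.
        if len(parts) == 2 and parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            data_lines.append(line)
    grid = np.loadtxt(io.StringIO("".join(data_lines)), ndmin=2)
    nodata = header.get("nodata_value")
    if nodata is not None:
        # Replace the NODATA marker with NaN so plots and means ignore it.
        grid = np.where(grid == nodata, np.nan, grid)
    return header, grid
```

The returned array can be passed to matplotlib's imshow to produce a heightmap-style image.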

8:30 – 3:30 BRC

Phil 4.17.17

7:00 – 8:00, 3:00 – 4:00 Research

  • This looks good: Bayesian data analysis for newcomers
  • Also this: Seeing Theory
  • I want to do a map that is based on population vs geography:
    • Gridded Population of the world (Matplotlib to generate image?). Can’t get the data directly. Need to see if available via UMBC (or maybe GLOBE?)
    • Wilbur terrain generation (installed. Will accept an image as the heightmap source)
  • Tried using QT designer, but it can’t find the web plugin?
  • Installing Python 3.6 on my home dev box
  • Downloaded all the python code, and my simulation data. I want to be able to merge tables to produce networks that can then be plotted, so I think it’s mostly going to be installing things this morning
  • NOTE: When installing Python, the only way to install for all users is to go through the advanced setup.
  • Installing packages. CMD needs to run as admin, which blows.
  • After some brief issues with the IDE not being set in structure, got all the pieces that use numpy, pandas and matplotlib running. That should be enough for table parsing (although there will be the excel reading and writing installs), though I still need to get started with graph-tool
  • Paper was rejected – time to try ACM? LaTeX format. Downloaded and compiled! Now I just have to move the text over? Wrap the existing text? That’s something for tomorrow.

8:30 – 2:30, BRC

  • Working on table joins. That was pretty straightforward. Note that for column collision you have to provide a suffix. Makes me think that I want to compare across DataFrames instead
    eu.read_dataframe_excel(args.excelfile, None)
    cluster_df = eu.read_dataframe_sheet("Cluster ID")
    #  print(cluster_df)
    dist_df = eu.read_dataframe_sheet("Distance from mean center")
    #  print(dist_df)
    merged_df = cluster_df.join(other=dist_df, lsuffix='_c', rsuffix='_d')
  • So now that I can read in and analyze sheets, what am I trying to do? I think that for each time slice, and by cluster, produce a sorted list from most to least common membership.
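
A self-contained sketch of the join with suffixes, plus one way to get membership counts for a time slice. The two small DataFrames stand in for the "Cluster ID" and "Distance from mean center" sheets; the cluster IDs follow the log output, the distances are invented, and the layout (rows are timestamps, columns are agents) is an assumption.

```python
import pandas as pd

timestamps = [10.07, 10.18]
# Stand-in for the "Cluster ID" sheet.
cluster_df = pd.DataFrame(
    {"ExploreSh_0": [0, 0], "ExploitSh_50": [1, 0], "ExploitSh_51": [1, 0]},
    index=timestamps)
# Stand-in for the "Distance from mean center" sheet (values are made up).
dist_df = pd.DataFrame(
    {"ExploreSh_0": [0.42, 0.40], "ExploitSh_50": [1.9, 1.1], "ExploitSh_51": [2.0, 1.2]},
    index=timestamps)

# Both sheets use the agent names as columns, so join() needs suffixes.
merged_df = cluster_df.join(other=dist_df, lsuffix="_c", rsuffix="_d")

# Membership for the first time slice, most common cluster first.
counts = cluster_df.iloc[0].value_counts()
print(merged_df.columns.tolist())
print(counts)
```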


Phil 4.15.17

Thoughts about CSCW ‘Fake News’ class

A political system cannot last long if its appropriate principle is lacking. Montesquieu claims, for example, that the English failed to establish a republic after the Civil War (1642–1651) because the society lacked the requisite love of virtue.

Phil 4.14.17

7:00 – 8:00 Research

  • This Is Why Trump’s Conspiracy Theories Work, Say Experts
  • Good advice on writing papers
  • Setting up to get clustering data.
    • Explore/Exploit ratio 50/50
    • Explore will be in the flocking range: 0.1 – 1.6
    • Exploit will be in the echo chamber stage: 3.2 – 10.0
    • Cluster EPS of 0.25 gives good diversity
    • Create a DataFrame from the “Cluster ID” sheet
    • Ran the sims, but the default destination is wrong. Re-running
    • Discriminate between explorer and exploiter cluster membership over time
      • Clusters the agent belonged to
      • Will need to post-process to fix cluster switching. Probably taking all the cluster average positions and applying a common ID if the difference between the samples is less than a given delta
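
One way the post-processing step above could look, as a sketch under stated assumptions: cluster average positions are 2D points, each time sample is a dict of raw cluster ID to average position, and -1 marks unclustered agents. The function name relabel_clusters and the greedy nearest-position rule are illustrative, not the approach that was finally used.

```python
import math
from typing import Dict, List, Tuple

Centroid = Tuple[float, float]


def relabel_clusters(samples: List[Dict[int, Centroid]],
                     delta: float) -> List[Dict[int, int]]:
    """For each time sample, map raw cluster IDs to persistent IDs."""
    next_id = 0
    previous = {}  # persistent ID -> average position in the last sample
    result = []
    for sample in samples:
        mapping = {}
        current = {}
        for raw_id, (x, y) in sample.items():
            if raw_id == -1:
                mapping[raw_id] = -1  # unclustered stays unclustered
                continue
            # Distance to every persistent cluster not already claimed.
            candidates = [(math.hypot(x - px, y - py), pid)
                          for pid, (px, py) in previous.items()
                          if pid not in current]
            best = min(candidates, default=None)
            if best is not None and best[0] < delta:
                pid = best[1]  # close enough: reuse the common ID
            else:
                pid = next_id  # otherwise start a new persistent ID
                next_id += 1
            mapping[raw_id] = pid
            current[pid] = (x, y)
        previous = current
        result.append(mapping)
    return result
```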

8:30 – 3:30 BRC

  • Generating data for clustering
  • Will try to read in and merge the DataFrames so that I have position (angle or origin distance) to calculate group persistence
  • Human Motion Recognition Using Isomap and Dynamic Time Warping
  • Fixing DataFrames. Mostly this is a case of bad data handling. This is in the data:

    Which means that we have to handle ‘#<Geocoder::Result::Bing:0x007fe9a1b39718>,#<Geocoder::Result::Bing:0x007fe9a1b39150>’. I am very disappointed in this book

  • Giving up. Wrote a review. Monday I’ll try doing joins on Dataframes. That being said, I learned a lot on how to check for errors.

Phil 4.13.17

7:00 – 8:00 Research

  • Reading the HCIC Boaster Poster description. Downloaded to HCIC 2017 folder
    • A “boaster-poster” is a poster that describes your most current research endeavor and/or interest. The idea is to foster dialogue about your topic of interest/research so you can meet like-minded HCIC 2017 attendees. Format for a “boaster-poster” is as follows: a short description of your perspective and interest in this area, plus a description of your work in form of a single page (8.3 × 11.7 inches) poster. Boaster-posters offer an opportunity to showcase the work of new and experienced authors alike. You can use images and text to frame and illustrate your ideas. A list with boaster-poster titles, authors & abstracts will be distributed at the conference, and the posters will be available for view at the HCIC conference. We strongly encourage all student attendees to submit a boaster to HCIC, as boaster authors will have opportunities across the conference to discuss their work with other attendees through a new interactive format for 2017.
      Boaster-poster deadline: June 2nd, 2017
      A pdf that includes:

      • A cover page with
        • Title, author(s) (indicate those available to chat at meeting)
        • At least three keywords
        • A 150 word abstract
      • A draft of your poster
    • So something like “sociophysics-informed design”? I’m thinking that if I can take agent cluster membership and use that to construct a social network graph, I could show something that looks like this: (image: twitterdata1-01)
    • Maybe use the graph-tool Python library? (image: polblogs_pr)
    • Need to look at Zappos and McMaster websites as examples of explorational interfaces
    • Facebook’s guide to handling Fake News. High effort. I wonder what kind of feedback mechanisms there are?

8:30 – 6:00 BRC

  • Sprint planning
  • Doctor visit, 10:15 – 11:00
  • Discussion with Aaron about visualizing high-dimensional clusters in low-dimensional space for intuitive understanding
  • Working through Thoughtful Machine Learning. Very disappointed. The code in GitHub doesn’t match the book, doesn’t even have an entry point, and blows up in the init. Sad! Here’s the offending line (df is the read-in DataFrame):
    df = (df - df.mean()) / (df.max() - df.min())
  • Learning more about the pandas DataFrame here so maybe I can fix the above.
  • Actually, Skillport has useful stuff, but all the videos crash before the end
  • The problem is that the floating point values in the file are being read in as string values, and crashing the calculation. I’ve tried doing an apply function that changes the value but it doesn’t result in the type change. Going to try changing everything to float tomorrow.
  • Helped Aaron break down the tasking for this sprint’s efforts.
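
A sketch of one way to handle the string-to-float problem before the normalization line above. The sample values are invented, and treating unparseable cells as missing and dropping those rows is an assumption about what is wanted. Note that apply returns a new DataFrame rather than changing the original, so the result has to be assigned.

```python
import pandas as pd

# Invented sample: numbers read in as strings, plus one bad cell.
df = pd.DataFrame({"a": ["1.0", "2.5", "4.0"], "b": ["10", "bad", "30"]})

# Convert column by column; errors="coerce" turns unparseable cells into NaN.
numeric_df = df.apply(pd.to_numeric, errors="coerce")

# Drop rows with missing values (an assumption; imputing is another option).
numeric_df = numeric_df.dropna()

# The normalization from the book code now works on float columns.
normalized = (numeric_df - numeric_df.mean()) / (numeric_df.max() - numeric_df.min())
print(normalized)
```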