Monthly Archives: March 2017

Phil 3.31.17

Keeping this for reference. From Wikipedia:

Principles that motivate citizen behaviour according to Montesquieu

Driving each classification of political system, according to Montesquieu, must be what he calls a “principle”. This principle acts as a spring or motor to motivate behavior on the part of the citizens in ways that will tend to support that regime and make it function smoothly.

  • For democratic republics (and to a somewhat lesser extent for aristocratic republics), this spring is the love of virtue—the willingness to put the interests of the community ahead of private interests.
  • For monarchies, the spring is the love of honor—the desire to attain greater rank and privilege.
  • Finally, for despotisms, the spring is the fear of the ruler.

A political system cannot last long if its appropriate principle is lacking. Montesquieu claims, for example, that the English failed to establish a republic after the Civil War (1642–1651) because the society lacked the requisite love of virtue.

7:00 – 8:00 Research

  • Some back-and-forth with Wane on him attending the CI conference.
  • Nice talk with Don yesterday. Played with LaTeX and discussed how, what and why to change behaviors of the particles going forward.
  • Started on the Illustrator version of the poster.
  • In Defense of Interactive Graphics

8:30 – 4:30 BRC

  • Working on
    • Write out sparse file queue – done
    • Read in queue – partial. Currently at line 60:
      print(filtered) # TODO: go and read all these files. Will need to be HADOOP-compatible later
    • Assemble DataFrame
    • By the way, this line of code in Python:
      filtered = [dir_name+"/"+name for name in all_files if file_prefix in name]
    • Is pretty much the same as this piece of C code. I guess some things have gotten better?
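      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      char** buildFilteredList(char* dirName, char* filePrefix, char* allFiles){
      	// count the number of spaces
      	size_t i;
      	int fileCount = 1;
      	for (i = 0; i < strlen(allFiles); ++i){
      		if(allFiles[i] == ' ')
      			fileCount++;
      	}
      	// heap-allocate a NULL-terminated list so it survives the return
      	char **fileList = calloc(fileCount + 1, sizeof(char*));
      	size_t numchars = strlen(filePrefix);
      	fileCount = 0;
      	char *copy = malloc(strlen(allFiles) + 1);
      	strcpy(copy, allFiles);
      	for (char *name = strtok(copy, " "); name != NULL; name = strtok(NULL, " ")){
      		if(strncmp(filePrefix, name, numchars) == 0){
      			char *path = malloc(strlen(dirName) + strlen(name) + 2);
      			sprintf(path, "%s/%s", dirName, name);
      			fileList[fileCount++] = path;
      		}
      	}
      	free(copy);
      	return fileList;
      }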
  • Long discussion with Aaron about ResearchBrowser design. We’re thinking about SPAs communicating through a lightweight Chrome (and other?) plugin that mediates communication between tabs.

Phil 3.30.17

7:00 – 8:00, 4:00 – 6:00

  • Looking more closely at Qt and PyQt. First, the integrated browser is nice. Second, if it’s possible to wireframe UIs in Qt and connect them to Python for matrix calculations and server interactions, then I have some real leverage.
  • Really good overview. Difference between 4 and 5, etc.
  • Lotsa python and machine learning videos. These look promising
  • Meeting with Don

8:30 – 3:30 BRC

  • Cleaning up reading code and adding argument handling
  • Need to add a row-reader to replace the slow_read_pbf and the slow_write_pbf methods. They also need to turn the row into a JSON object, and to have a separate method that will assemble a DataFrame from a set of JSON objects in memory
  • Nope, strike that. We’re now going to read CSV files containing sparse data and assemble them into a DataFrame. The format is:
  • The row id is quoted, otherwise key:value pairs are comma separated. Rows are terminated by CR/LF, and there can be multiple rows per file.
  • Python hadoop: Pydoop
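A minimal sketch of reading the sparse rows described above, using only the standard library. The exact file layout is an assumption here: each line is taken to look like `"row_id",key:value,key:value`, and the names `parse_sparse_rows` and `to_dense` are hypothetical, not part of the BRC code.

```python
def parse_sparse_rows(text):
    """Parse sparse rows into {row_id: {key: value}}.

    Assumed layout (not confirmed by the post): a quoted row id first,
    then comma-separated key:value pairs, one row per CR/LF-terminated line.
    """
    rows = {}
    for line in text.splitlines():  # splitlines() handles CR/LF endings
        line = line.strip()
        if not line:
            continue
        row_id, *pairs = line.split(",")
        cells = {}
        for pair in pairs:
            if not pair.strip():
                continue  # tolerate a trailing comma
            key, _, value = pair.partition(":")
            cells[key.strip()] = float(value)
        rows[row_id.strip().strip('"')] = cells
    return rows


def to_dense(rows, fill=0.0):
    """Expand the sparse rows into row ids, sorted column names, and a dense matrix."""
    columns = sorted({key for cells in rows.values() for key in cells})
    row_ids = list(rows)
    matrix = [[rows[r].get(c, fill) for c in columns] for r in row_ids]
    return row_ids, columns, matrix
```

The dict of dicts could then be handed to something like `pandas.DataFrame.from_dict(rows, orient="index")` to get the DataFrame, with missing cells filled afterwards.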

Phil 3.29.17

7:00 – 8:30 Research

  • Starting to think seriously about the Research Browser. There are three parts that I want to leverage against each other
    1. I need to get good at Python
    2. I need a level of user-centered design that shows the interaction between analog controls and information. This is hard to do with words, paper design, or Wizard of Oz, so I want to have at least some GUI wrapper
    3. I want to start working with my school server, and get enough up that there is an application that connects to the back end and that has enough of a web wrapper that I can point to that in papers, business cards, etc.
  • If it’s going to be a thick client, then there needs to be a way of building an executable that does not require a Python install. And there is:
    • PyInstaller is a program that freezes (packages) Python programs into stand-alone executables, under Windows, Linux, Mac OS X, FreeBSD, Solaris and AIX. Its main advantages over similar tools are that PyInstaller works with Python 2.7 and 3.3—3.5, it builds smaller executables thanks to transparent compression, it is fully multi-platform, and use the OS support to load the dynamic libraries, thus ensuring full compatibility.
  • Now we have the option of thick or thin client. For thick client we have
    • Kivy – Open source Python library for rapid development of applications
      that make use of innovative user interfaces, such as multi-touch apps
    • PyQt, which is a set of Python wrappers for Qt, a huge UI framework. Here’s a whitepaper discussing the integration. Seems powerful, but maybe a lot of moving parts
    • PyGUI
      • Develop a GUI API that is designed specifically for Python, taking advantage of Python’s unique language features and working smoothly with Python’s data types.
      • Provide implementations of the API for the three major platforms (Unix, Macintosh and Windows) that are small and lightweight, interposing as little code as possible between the Python application and the platform’s underlying GUI facilities, and not bloating the Python installations or applications which use them.
      • Document the API purely in Python terms, so that the programmer does not need to read the documentation for another GUI library, in terms of another language, and translate into Python.
      • Get the library and its documentation included in the core Python distribution, so that truly cross-platform GUI applications may be written that will run on any Python installation, anywhere.
  • Thin Client:
    • Flexx is a pure Python toolkit for creating graphical user interfaces (GUI’s), that uses web technology for its rendering. Apps are written purely in Python; Flexx’ transpiler generates the necessary JavaScript on the fly.
    • Reahl lets you build a web application purely in Python, and in terms of useful objects that shield you from low-level web implementation issues.
    • Remi is a GUI library for Python applications which transpiles an application’s interface into HTML to be rendered in a web browser. This removes platform-specific dependencies and lets you easily develop cross-platform applications in Python!
      • screenshot
    • WDOM: GUI library for browser-based desktop applications
    • pyjs is a Rich Internet Application (RIA) Development Platform for both Web and Desktop. With pyjs you can write your JavaScript-powered web applications entirely in Python. pyjs contains a Python-to-JavaScript compiler, an AJAX framework and a Widget Set API. pyjs started life as a Python port of Google Web Toolkit, the Java-to-JavaScript compiler.
    • There are more frameworks here. There is some overlap with the above, but the list seems to include more obscure systems. The CEFBrowser embedded browser is pretty interesting and is a needed piece. This note indicates that calling the Qt QWebEngineView class from Python (PyQt) is a known pattern.

9:00 – 5:00 BRC

  • Continuing to try and figure out how to assemble and write out a pandas.DataFrame as TFRecords. Now looking at this post
  • Success! Here’s the slow way:
    def slow_write_pbf(self, file_name: str) -> bool:
        df = self._data_frame
        row_label_array = np.array(df.index.values)
        col_label_array = np.array(df.columns.values)
        value_array = df.as_matrix()
        # print(row_label_array)
        # print(col_label_array)
        # print(value_array)
        writer = tf.python_io.TFRecordWriter(file_name)
        rows = value_array.shape[0]
        cols = value_array.shape[1]
        # one Example per cell, tagged with its row and column labels
        for row in range(rows):
            for col in range(cols):
                val = value_array[row, col]
                unit = {
                    'row_name': self.bytes_feature(str.encode(row_label_array[row])),
                    'col_name': self.bytes_feature(str.encode(col_label_array[col])),
                    'rows': self.int64_feature(rows),
                    'cols': self.int64_feature(cols),
                    'val': self.float_feature(val)
                }
                cur_feature = tf.train.Features(feature=unit)
                example = tf.train.Example(features=cur_feature)
                # presumably: serialize the Example and append it to the file
                writer.write(example.SerializeToString())
        writer.close()
        return True
  • Here’s the fast way:
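    def write_pbf(self, file_name: str) -> bool:
        df = self._data_frame
        # store the whole DataFrame as a single CSV-encoded bytes feature
        df_str = df.to_csv()
        df_enc = str.encode(df_str)
        writer = tf.python_io.TFRecordWriter(file_name)
        unit = {
            'DataFrame': self.bytes_feature(df_enc)
        }
        cur_feature = tf.train.Features(feature=unit)
        example = tf.train.Example(features=cur_feature)
        # presumably: serialize the Example and append it to the file
        writer.write(example.SerializeToString())
        writer.close()
        return True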
  • Reading in data.
    def read_pbf(self, file_name: str) -> tf.Tensor:  # a string tensor, one CSV blob per record
        features = {'DataFrame': tf.FixedLenFeature([1], tf.string)}
        data = []
        for s_example in tf.python_io.tf_record_iterator(file_name):
            example = tf.parse_single_example(s_example, features=features)
            data.append(tf.expand_dims(example['DataFrame'], 0))
        result = tf.concat(data,0)
        return result
  • Parsing the data requires running a TF Session
    # call the function that gets the data from the file
    df_graph_list = pbr.read_pbf(
    # start up a tf.Session to get the data from the graph and parse
    with tf.Session() as sess: #set up a TF Session and initialize
        df_list = sess.run([df_graph_list]) # get the data
        for df_element in df_list:
            ptr = df_element[0][0] # dereference. TODO: Understand all the levels
            dfPtr = StringIO(bytes.decode(ptr)) # make a pointer to the string so it can be treated as a file
            df = pandas.read_csv(dfPtr, index_col=0) # create the dataframe. It will have 'float' datatype which imshow chokes on
            mat = df.as_matrix() # get the data matrix
            mat = mat.astype(np.float64) # force it to float64
            df.update(mat) # replace the 'float' mat with the 'float64' mat
            if df.shape[0] < 20:
                if == True:

Phil 3.28.17

7:00 – 8:00 Research

  • Still working my way through The Origins of Totalitarianism. Arendt makes some really strong points:
    • Lawfulness sets limitations to actions, but does not inspire them; the greatness, but also the perplexity of laws in free societies is that they only tell what one should not, but never what one should do. Her point here is that laws provide the boundaries of acceptable belief space. Freedom (in a republic) is the ability to move unfettered within these spaces, not outside of them. Totalitarianism eliminates the freedom to move, or to make decisions, either by the perpetrator or the victim.
    • It substitutes for the boundaries and channels of communication between individual men a band of iron which holds them so tightly together that it is as though their plurality had disappeared into One Man of gigantic dimensions. This is what I see in my simulations. There are a couple of issues that aren’t treated, though – spontaneity and lifespan. In my simulations, spontaneity is approximated by the initial placement and orientation of the explorers. At the very least, I should see how the change to a random walk would affect supergroup formation. The second issue of lifespan is also important:
    • The laws hedge in each new beginning and at the same time assure its freedom of movement, the potentiality of something entirely new and unpredictable; the boundaries of positive laws are for the political existence of man what memory is for his historical existence: they guarantee the pre-existence of a common world, the reality of some continuity which transcends the individual life span of each generation, absorbs all new origins and is nourished by them. One way of looking at this is that the rules and the population affect each other. This makes intuitive sense – slavery used to be legal, and the changing ethics of the population changed the laws. These are not either law or population, they are matters of emphasis, and I think these qualities can be traced in court decisions (particularly Supreme Court, since they get harder decisions). SCOTUS, in Dred Scott, lagged behind the popular will, while in Miranda (and Roe, as discussed in quantitative terms here), it probably led it. So lifespan, as expressed in demographics, plays a part in moving these boundaries of belief. This is certainly the case in a republic, and may also be the case in a more oppressive regime (e.g. student protests). And according to Arendt, this could be the greatest risk to totalitarianism.
    • The last point she makes is about ideology. The word “ideology” seems to imply that an idea can become the subject matter of a science just as animals are the subject matter of zoology, and that the suffix -logy in ideology, as in zoology, indicates nothing but the logoi, the scientific statements made on it. The fact that ideology has at its core a set of assumptions about past, present and future history makes it an organizational structure, or a framing narrative. And there is no reason to think that ideologies can’t have the same cascade effects as other stories.
  • So here’s the point. Laws are not the only kind of rules. Interfaces are as well. And the amount of freedom, as described above, means the allowable motion in belief space that the interface affords. Bad interfaces can literally be a tyranny or worse…

8:30 – 4:00 BRC

Phil 3.24.17

No research this morning. Had flu-like symptoms from midnight to 2:00 or so and slept in. Considering how bad I felt last night, I’m pleasantly surprised to feel good enough to get into work at 9:00.

I did take the Superpedestrian wheel out (yes, as I was getting sick) for my 18 mile hilly loop. It took around an hour and performed really well. It just flattens hills, while behaving like a normal bike at all other times.

Google Says Its Job Is to Promote Climate Change Conspiracy Theories

9:00 – 4:00 BRC

  • My fix from yesterday doesn’t work on Aaron’s machine. Verified that it works on any of my cmd-line interfaces. Weird.
  • Testing to see if my Excel has been fixed. Nope.
  • More work on getting Python imports to behave. All written up here.
  • Got the data sheets working with the plots:
    plt.imshow(df, extent=[3, points/3, 0.9, 0.1], aspect='auto', cmap="hot")

    This took a while to figure out. The X, Y arrays are used to create the mesh. The df.as_matrix() contains the Z values. Just make sure that row size and column size match!

    fig = plt.figure()
    ax = Axes3D(fig)
    X = co.get_min_cluster_array()
    Y = co.get_eps_array()
    X, Y = np.meshgrid(X, Y)
    Z = df.as_matrix()
    ax.plot_surface(X, Y, Z, cmap="hot")
  • Next is to save the best cluster run with its EPS and cluster size
  • Wound up just using the number of clusters for now. That seems to be a better fitness test. Also, I discovered that all DataFrames are references, so it’s important to make a deep copy if you’re trying to keep the best one around! Here are the best results so far:
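On the deep-copy point above, a minimal sketch of keeping the best run around. The names `keep_best` and `mutating_runs` and the scores are hypothetical; the only library call relied on is `DataFrame.copy(deep=True)`. The generator reuses one DataFrame across runs, which is the situation where a plain assignment would leave "best" pointing at whatever the last run wrote.

```python
import pandas as pd


def keep_best(runs):
    """Return (score, frame) for the highest-scoring run.

    The frame is deep-copied at the moment it becomes the best, so later
    in-place changes to the working frame cannot alter it.
    """
    best_score, best_frame = None, None
    for score, frame in runs:
        if best_score is None or score > best_score:
            best_score = score
            best_frame = frame.copy(deep=True)
    return best_score, best_frame


def mutating_runs():
    """Hypothetical clustering runs that overwrite one shared DataFrame."""
    frame = pd.DataFrame({"cluster": [0, 0, 0]})
    for score, label in [(0.2, 1), (0.9, 2), (0.5, 3)]:
        frame["cluster"] = label  # in-place update of the shared frame
        yield score, frame


best_score, best_frame = keep_best(mutating_runs())
```

Without the `copy(deep=True)`, `best_frame` would be the same object as the working frame and would show the labels from the final run rather than the best one.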

Phil 3.23.17

7:00 – 8:00, 4:00 – 5:00 Research

8:30 – 10:30, 12:30 – 3:30 BRC

  • I don’t think my plots are right. Going to add some points to verify…
  • First, build a matrix of all the values. Then we can visualize as a surface, and look for the best values after calculation
  • Okay………. So there is a very weird bug that Aaron stumbled across in running python scripts from the command line. There are many, many, many thoughts on this, and it comes from a legacy issue between py2 and py3 apparently. So, much flailing:
    python -i -m
    python -m OptimizedClustererPackage\
    C:\Development\Sandboxes\TensorflowPlayground\OptimizedClustererPackage>C:\Users\philip.feldman\AppData\Local\Programs\Python\Python35\python.exe -m C:\Development\Sandboxes\TensorflowPlayground\OptimizedClustererPackage\


  • After I’d had enough of this, I realized that the IDE is running all of this just fine, so something works. So, following this link, I set the run config to “Show command line afterwards”. The outputs are very helpful:
    C:\Users\philip.feldman\AppData\Local\Programs\Python\Python35\python.exe C:\Users\philip.feldman\.IntelliJIdea2017.1\config\plugins\python\helpers\pydev\ 60741 60742 C:/Development/Sandboxes/TensorflowPlayground/OptimizedClustererPackage/
  • Editing out the middle part, we get
    C:\Users\philip.feldman\AppData\Local\Programs\Python\Python35\python.exe C:/Development/Sandboxes/TensorflowPlayground/OptimizedClustererPackage/

    And that worked! Note the backslashes on the executable and the forward slashes on the argument path.

  • Update #1. Aaron’s machine was not able to run a previous version of the code, so we poked at the issues, and I discovered that I had left some code in my imports that was not in his code. It’s the “Solution #4: Use absolute imports and some boilerplate code” section from this StackOverflow post. Specifically, before importing the local files, the following four lines of code need to be added:
    import sys # if you haven't already done so
    from pathlib import Path # if you haven't already done so
    root = str(Path(__file__).resolve().parents[1])
    sys.path.append(root) # as in the StackOverflow answer: put the package's parent directory on the import path
  • After which, you can add your absolute imports as I do in the next two lines:
    from OptimizedClustererPackage.protobuf_reader import ProtobufReader
    from OptimizedClustererPackage.DBSCAN_clusterer import DBSCANClusterer
  • And that seems to really, really, really work (so far).

Phil 3.22.17

8:30 – 6:00 BRC

  • Working on GA optimizer. I have the fitness function running and it seems reasonable. First, here’s the data with one clustering run:
  • And here’s the PDF of fitness by min cluster size. Note that there are at least three PDFs, though the overall best value doesn’t change
  • Aaron is importing now. For some output, I now write the cluster iterations to a text file

Aaron 3.21.17

Missed my blog yesterday as I got overwhelmed with a bunch of tasks. I’ll include some elements here:

  • KeyGeneratorLibrary
    • I got totally derailed for multiple hours as one of the core libraries we use throughout the system to generate 128-bit non-crypto hashes for things like rowIds had gotten thoroughly dorked up. Someone had accidentally dumped 70 mb of binary unstructured content into the library and checked it in.
    • While I was clearing out all the binary content, I was asked to remove all of the unused dependencies from our library template. All of our other libraries include SpringBoot and a bunch of other random crap, but I took the time to rip it all out and build a new version, and update our Hadoop jobs to use the latest one. The combined changes dropped the JAR from ~75 mb to 3k. XD
  • Hadoop Development
    • More flailing wildly trying to get our Hadoop testing and development process fixed. We’re on a new environment, and essentially it broke everything, so we have no way to develop, update, or test any of our Hadoop code.
    • Apparently this has been fixed (again).
  • TensorFlow / Sci-Py Clustering
    • Sat in with Phil for a bit looking at his latest fancy code and the output of the clusters. Very impressive, and the code is nice and clean. I’m really looking forward to moving over to predominantly Python code. I’m super burned out on Java right now, and would far rather be working on pure machine learning content rather than infrastructure and pre-processing. Maybe next sprint?
  • TFRecord Output
    • Got a chance to write a playground for TFRecord output and Python integration, before realizing that the TF ecosystem code only supports InputFormat/OutputFormat for Hadoop, and due to our current issues I cannot run those tests locally at all. *sad trombone*
  • Python Integration
    • My day is rapidly winding to a close, but slapping out the test code for the Python process launching so I can at least feel like I accomplished something today.
  • Cycling / Health
    • Didn’t get to cycle today because I spent 2 hours trying to get a blood test so my doctor can verify my triglycerides have gone down.

Phil 3.21.17

7:00 – 8:00 Research

8:30 – 3:00 BRC

  • Switching gears from LaTeX to Python takes effort. Neither is natural or comfortable yet
  • Sent Jeremy a note on conferences and vacation. Using the hours on my paycheck stub, which *could* be correct…
  • More clustering. Adding output that will be used for the optimizer:
    clusters = 4
    Total  = 512
    clustered = 437
    unclustered = 75
  • Built out the optimizer and filled it with a placeholder function. Will fill in after lunch
  • Had to leave to take care of dad, who fainted. But here are my thoughts on the GA construction. The issue with the fitness test is that we have two variables to optimize, the EPS and the minimum cluster size, based on the number of clusters and the number of unclustered. I want to unitize the outputs so that 2.0 is best and 0.0 is worst. The unclustered should be 1.0 – unclustered/total. The number of clusters should be clusters/(total/min_cluster_size).
  • The way the GA should work is that we start with a set of initial EPSs (0 – 1) and a set of cluster sizes (3 – total/3). We try each, throw the bottom half away, keep the top result and breed a new set by interpolating (random distances?) between the remaining. We also randomly generate a new allele or two in case we get trapped on a local maximum. When we are no longer getting any improvement (some epsilon) we stop. All the points can be plotted and we can try to fit a polyline as well (one for eps and for minimum cluster? Could plot as a surface…)
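The fitness test and one GA generation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the optimizer that was actually built: `fitness` and `next_generation` are hypothetical names, a genome is assumed to be an `(eps, min_cluster_size)` tuple, and `score` stands in for whatever runs the clusterer and returns the fitness of a genome.

```python
import random


def fitness(clusters, unclustered, total, min_cluster_size):
    """Unitized fitness: 2.0 is best, 0.0 is worst."""
    unclustered_score = 1.0 - unclustered / total
    cluster_score = clusters / (total / min_cluster_size)
    return unclustered_score + cluster_score


def next_generation(population, score, total, rng, num_random=1):
    """Breed a new population of (eps, min_cluster_size) genomes.

    Assumes at least four genomes. The bottom half is discarded, the top
    result is kept as-is, children are interpolated a random distance
    between two survivors, and num_random fresh genomes are added to
    help escape a local maximum.
    """
    ranked = sorted(population, key=score, reverse=True)
    survivors = ranked[:max(2, len(ranked) // 2)]
    children = [survivors[0]]  # keep the top result
    while len(children) < len(population) - num_random:
        a, b = rng.sample(survivors, 2)
        t = rng.random()  # random interpolation distance
        eps = a[0] + t * (b[0] - a[0])
        size = int(round(a[1] + t * (b[1] - a[1])))
        children.append((eps, size))
    for _ in range(num_random):
        # new random allele: eps in 0 - 1, cluster size in 3 - total/3
        children.append((rng.random(), rng.randint(3, total // 3)))
    return children
```

Calling `next_generation` in a loop and stopping once the best score improves by less than some epsilon would give the stopping rule described above. With the run output shown earlier (4 clusters, 75 unclustered out of 512) and an assumed minimum cluster size of 8, `fitness(4, 75, 512, 8)` comes to about 0.916.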

Phil 3.20.17


7:00 – 8:30 Research

  • Morning thought. If the perimeter is set to ‘lethal’, what is the SHR that required the lowest replenishment in an “All Exploit” scenario? Also, how many explorers are needed to keep a runaway echo chamber in range?
  • Need to factor in some of what Arendt talks about in her Bose-Einstein Condensate model of end-stage totalitarianism. Also ordered this
  • MASON is a fast discrete-event multiagent simulation library core in Java, designed to be the foundation for large custom-purpose Java simulations, and also to provide more than enough functionality for many lightweight simulation needs. MASON contains both a model library and an optional suite of visualization tools in 2D and 3D. Documentation here.
  • Working on poster. Going to try LaTeX mostly to get better at it. Need to pull up my TEI poster to see the format we use. Using the beamerposter format. So far, pretty painless.

9:00 – 5:00 BRC

  • Create the framework
    • Reader – built the generator part
    • Clusterer – have simple DBSCAN working. It’s pickier than I would have thought
    • Optimizer
    • Writer
  • Request time “off” for collective intelligence (June 15 – 16), HCIC (June 25 – 29), and vacation (June 4 – 11)


Monday task!!!

Call OPM at 1-888-767-6738 after scrum

And this looks pretty interesting: Found it looking for bill full text to feed into the LMN system. Here’s an example of tagged xml

<?xml version="1.0"?>
<bill bill-stage="Introduced-in-House" dms-id="H7B2411C180AA4EF7AE87C3F9B3844016" public-private="public" bill-type="olc"> 
<metadata xmlns:dc="">
<dc:title>113 HR 1237 IH: To authorize and request the President to award the Medal of Honor posthumously to Major Dominic S. Gentile of the United States Army Air Forces for acts of valor during World War II.</dc:title>
<dc:publisher>U.S. House of Representatives</dc:publisher>
<dc:rights>Pursuant to Title 17 Section 105 of the United States Code, this file is not subject to copyright protection and is in the public domain.</dc:rights>
<distribution-code display="yes">I</distribution-code> 
<congress>113th CONGRESS</congress>
<session>1st Session</session>
<legis-num>H. R. 1237</legis-num> 
<current-chamber>IN THE HOUSE OF REPRESENTATIVES</current-chamber> 
<action-date date="20130318">March 18, 2013</action-date> 
<action-desc><sponsor name-id="B001281">Mrs. Beatty</sponsor> introduced the following bill; which was referred to the <committee-name committee-id="HAS00">Committee on Armed Services</committee-name></action-desc>
<legis-type>A BILL</legis-type> 
<official-title>To authorize and request the President to award the Medal of Honor posthumously to Major Dominic S. Gentile of the United States Army Air Forces for acts of valor during World War II.</official-title> 
<legis-body id="HC4FC3A2EC9CD480F8E7100E4CF3C2F3C" style="OLC"> 
<section id="HED5DF0B8F90849ECB7D6A49028BC38E1" section-type="section-one"><enum>1.</enum><header>Authorization and request for award of Medal of Honor to Dominic S. Gentile for acts of valor during World War II</header> 
<subsection id="H21C369FA21D644EB9F38767C54E49A0B"><enum>(a)</enum><header>Findings</header><text display-inline="yes-display-inline">Congress makes the following findings:</text> 
<paragraph id="H6A5FBF181F68426CB457AB237F565723"><enum>(1)</enum><text display-inline="yes-display-inline">Major Dominic S. Gentile of the United States Army Air Forces destroyed at least 30 enemy aircraft during World War II, making him one of the highest scoring fighter pilots in American history and earning him the title of <quote>Ace of Aces</quote>.</text></paragraph> 
<paragraph id="HF3355B912FA6432295B768BE5B58842A"><enum>(2)</enum><text>Major Gentile was the first American fighter pilot to surpass Captain Eddie Rickenbacker’s WWI record of 26 enemy aircraft destroyed.</text></paragraph> 
<paragraph id="H18EE27FA1F7A48CB8AE62301BF0B31ED"><enum>(3)</enum><text>Major Gentile was awarded several medals in recognition of his acts of valor during World War II, including two Distinguished Service Crosses, seven Distinguished Flying Crosses, the Silver Star, the Air Medal, and received similar honors from Great Britain, Italy, Belgium, and Canada.</text></paragraph> 
<paragraph id="H2F7E271C44E84E5DBC95C9F58127B93E"><enum>(4)</enum><text display-inline="yes-display-inline">Major Gentile was born in Piqua, Ohio, and died January 23, 1951, after which he was posthumously appointed to the rank of major.</text></paragraph> 
<paragraph id="HA6F4601200454270939A016AC9B9F96D"><enum>(5)</enum><text>Major Gentile is buried in Columbus, Ohio. Gentile Air Force Station in Kettering, Ohio, is named in his honor and he was inducted into the National Aviation Hall of Fame in 1995.</text></paragraph></subsection> 
<subsection display-inline="no-display-inline" id="H5B70830D32B64B4B858A89AAC16A8A4D"><enum>(b)</enum><header>Authorization</header><text display-inline="yes-display-inline">Notwithstanding the time limitations specified in <external-xref legal-doc="usc" parsable-cite="usc/10/3744">section 3744</external-xref> of title 10, United States Code, or any other time limitation with respect to the awarding of certain medals to persons who served in the Armed Forces, the President is authorized and requested to award the Medal of Honor posthumously under section 3741 of such title to former Major Dominic S. Gentile of the United States Army Air Forces for the acts of valor during World War II described in subsection (c).</text></subsection> 
<subsection commented="no" display-inline="no-display-inline" id="H0003A92F45354335B4497C04FE62D068"><enum>(c)</enum><header>Acts of valor described</header><text display-inline="yes-display-inline">The acts of valor referred to in subsection (b) are the actions of then Major Dominic S. Gentile who, as a pilot of a P–51 Mustang in the Army’s 336th Fighter Squadron, Fourth Fighter Group, of the Eighth Air Force in Europe during World War II, distinguished himself conspicuously by gallantry and intrepidity at the risk of his life above and beyond the call of duty by destroying at least 30 enemy aircraft during his service in the United State Army Air Forces.</text></subsection></section> 
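As a rough sketch of how the tagged bill text could be pulled out before feeding it to something like the LMN system, here is a standard-library example. The `SAMPLE` fragment is a shortened, hypothetical stand-in that reuses the tag names from the excerpt above (the excerpt itself is not a complete document), and `bill_text` is an assumed helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened fragment using the same tag names as the bill XML.
SAMPLE = """<bill bill-stage="Introduced-in-House">
<legis-num>H. R. 1237</legis-num>
<legis-body>
<section><enum>1.</enum><header>Example header</header>
<subsection><enum>(a)</enum><header>Findings</header><text>Congress makes the following findings:</text>
<paragraph><enum>(1)</enum><text>First finding with a <quote>quoted phrase</quote>.</text></paragraph>
</subsection></section>
</legis-body>
</bill>"""


def bill_text(xml_string):
    """Return the contents of every <text> element, in document order.

    itertext() flattens nested inline tags such as <quote>, so the
    quoted words stay inside their sentence.
    """
    root = ET.fromstring(xml_string)
    return ["".join(node.itertext()).strip() for node in root.iter("text")]


root = ET.fromstring(SAMPLE)
bill_number = root.findtext("legis-num")
sentences = bill_text(SAMPLE)
```

The `enum` and `header` elements are skipped here on purpose; whether they belong in the text handed downstream is a design choice that depends on what the consumer expects.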

Aaron 3.17.17

  • Hadoop Environment
    • More fun discussions on our changes to Hadoop development today. Essentially we have a DevOps box with a baby Hadoop cluster we can use for development.
  • ClusteringService scaffold / deploy
    • I spent a bit of time today building out the scaffold MicroService that will manage clustering requests, dispatch the MapReduce to populate the comparison tensor, and interact with the TensorFlow Python.
    • I ran into a few fits and starts with syntax issues where the service name was causing fits because of errant “-“. I resolved those and updated the dockerfile with the new TensorFlow docker image. Once I have a finished list of the packages I need installed for Python integration I’ll have to have them updated to that image.
    • Bob said he would look at moving over the scaffold of our MapReduce job launching code from a previous service, and I suggested he not blow away all the work I had just done and copy the as needed pieces in.
  • TFRecord output
    • Trying to complete the code for outputting MapReduce results as TFRecord protobuf objects for TensorFlow.
    • I created a PythonIntegrationPlayground project with a class responsible for building a populated test matrix in a format that TensorFlow can view.
    • Google supports this with their ecosystem libraries here. The library includes instructions with versions and a working sample for MapReduce as well as Spark.
    • The frustrating thing is that presumably to avoid issues with version mismatches, they require you to compile your own .proto files with the protoc compiler, then build your own JAR for the ecosystem.hadoop library. Enough changes have happened with protoc and how it handles the locations of multiple inter-connected proto files that you absolutely HAVE to use the locations they specify for your TensorFlow installation or it will not work. In the old days you could copy the .proto files local to where you wanted to output them to avoid path issues, but that is now a Bad Thing(tm).
    • The correct commands to use are:
      • protoc --proto_path=%TF_SRC_ROOT% --java_out=src\main\java\ %TF_SRC_ROOT%\tensorflow\core\example\example.proto
      • protoc --proto_path=%TF_SRC_ROOT% --java_out=src\main\java\ %TF_SRC_ROOT%\tensorflow\core\example\feature.proto
    • After this you will need Apache Maven to build the ecosystem JAR and install so it can be used. I pulled down the latest (v3.3.9) from
    • Because I’m a sad, sad man developing on a Windows box I had to disable the Maven tests to build the JAR, but it’s finally built and in my repo.
  • Java/Python interaction
    • I looked at a bunch of options for Java/Python interaction that would be performant enough, and allow two-way communication between Java/Python if necessary. This would allow the service to provide the location in HDFS to the TensorFlow/Sci-Kit Python clustering code and receive success/fail messages at the very least.
    • Digging on StackOverflow led me to a few options.
    • Digging a little further I found JPServe, a small library based on PyServe that uses JSON to send complex messages back to Java.
    • I think for our immediate needs it’s most straightforward to use the ProcessBuilder approach:
      • ProcessBuilder pb = new ProcessBuilder("python","",""+number1,""+number2);
      • Process p = pb.start();
    • This does allow return codes, although not complex return data, but it avoids having to manage a PyServe instance inside a Java MicroService.
  • Cycling
    • I’ve been looking forward to a good ride for several days now, as the weather has been awful (snow/ice). Got up to high 30s today, and no visible ice on the roads so Phil and I went out for our ride together.
    • It was the first time I’ve been out with Phil on a bike with gears, and it’s clear how much I’ve been able to abuse him being on a fixie. If he’s hard to keep up with on a fixed gear, it’s painful on gears. That being said, I think I surprised him a bit when I kept a 9+ mph pace up the first hill next to him and didn’t die.
    • My average MPH dropped a bit because I burned out early, but I managed to rally and still clock a ~15 mph average with some hard pedaling towards the end.
    • I’m really enjoying cycling. It’s not a hobby I would have expected would click with me, but it’s a really fun combination of self improvement, tenacity, min-maxing geekery, and meditation.
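Going back to the ProcessBuilder approach under Java/Python interaction above: since only a return code comes back to Java, the Python side mostly needs to turn success or failure into an exit status. A minimal sketch, assuming the script receives the two numbers as command-line arguments as in the Java snippet; the function name `main` and the specific codes 0 and 2 are illustrative choices, not the project's.

```python
import sys


def main(argv):
    """Return a process exit code: 0 on success, 2 on bad arguments."""
    try:
        # argv[0] is the script name; the two numbers follow it
        number1, number2 = (float(arg) for arg in argv[1:3])
    except ValueError:
        # raised for a missing argument or one that is not a number
        print("expected two numeric arguments", file=sys.stderr)
        return 2
    # placeholder for the real work (e.g. kicking off the clustering run)
    print(number1 + number2)
    return 0

# In the launched script, the last line would be: sys.exit(main(sys.argv))
```

On the Java side, `Process.waitFor()` returns this value, which is enough for a success/fail signal without managing a server process inside the MicroService.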

Phil 3.17.17

7:00 – 8:00 Research

8:30 – 6:00 BRC

Phil 3.16.17

7:00 – 8:00, 4:00 – 5:30 Research

8:30 – 3:30 BRC

  • Added subtasks for the clustering optimizer
  • Meeting with Aaron and Heath about scalability
  • Converting a pandas DataFrame to a numpy ndarray – done
    df = createDictFrame(rna, cna)
    df = df.sort_values(by='sum', ascending=True)
    mat = df.as_matrix()
  • Working on polynomials – done
  • Played with the Mandelbrot set as well. Speedy!
  • This came across my feed: Scikit-Learn Tutorial Series