Monthly Archives: March 2017

Aaron 3.1.17

  • TensorFlow
    • Figuring out TensorFlow documentation and tutorials (with a focus on matrix operations, loading from Hadoop, and clustering).
    • Really basic examples with tiny data sets, like linear regression with gradient descent optimizers, are EASY (see the linear-regression sketch after this list). Sessions, variables, placeholders, and the other core artifacts all make sense. Across the room, Phil's hair is getting increasingly frizzy as he deals with more complicated examples that are far less straightforward.
  • Test extraction of Hadoop records
    • Create TF tensors using Python against HBase tables to see if the result is performant enough (otherwise, recommend we write a MapReduce job to build out a proto file consumed by TF). A rough sketch of the test follows this list.
  • Test polar coordinates against client data
    • See if we can use k-means/DBSCAN against polar coordinates to generate the correct clusters with known data (see the clustering sketch after this list). If we cannot use polar coordinates for dimension reduction, what process is required to implement DBSCAN in TensorFlow?
  • Architecture Diagram
    • The artifacts for this sprint's completion are architecture diagrams and a proposal for next sprint's implementation. I haven't gotten feedback from the customer about our proposed framework, but it will come up in our end-of-sprint activities. The design path and flow diagram are due on Wednesday.
  • Cycling
    • I did my first 15.2-mile ride today. My everything hurts, and my average speed was way down from yesterday, but I finished.
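
A minimal sketch of the "easy" case mentioned above: linear regression with a gradient descent optimizer on a toy data set. The data and variable names here are ours, not from any particular tutorial:

      import tensorflow as tf

      # Toy data drawn from y = 2x + 1
      x_train = [1.0, 2.0, 3.0, 4.0]
      y_train = [3.0, 5.0, 7.0, 9.0]

      # Model parameters, initialized away from the true values
      W = tf.Variable([0.0])
      b = tf.Variable([0.0])

      x = tf.placeholder(tf.float32)
      y = tf.placeholder(tf.float32)

      linear_model = W * x + b
      loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of squared errors

      train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

      sess = tf.Session()
      sess.run(tf.global_variables_initializer())
      for _ in range(1000):
          sess.run(train_step, feed_dict={x: x_train, y: y_train})

      print(sess.run([W, b]))  # should land close to [2.0] and [1.0]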
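
For the HBase extraction test, the first cut will look roughly like the sketch below, using the happybase Thrift client. The host, table, and column names are placeholders, and whether a plain Python scan is performant enough is exactly the open question:

      import happybase
      import numpy as np
      import tensorflow as tf

      # Placeholder connection details; the real host and table names are TBD
      connection = happybase.Connection('hbase-thrift-host')
      table = connection.table('client_events')

      rows = []
      for row_key, data in table.scan(limit=10000):
          # Assumes each cell stores a float serialized as text in family 'f'
          rows.append([float(data[b'f:x']), float(data[b'f:y'])])

      # Hand the scanned rows to TF as a constant tensor; sanity-check the shape
      points = tf.constant(np.array(rows, dtype=np.float32))
      with tf.Session() as sess:
          print(sess.run(tf.shape(points)))  # e.g. [10000, 2]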
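
For the polar-coordinate clustering test, a first pass might use scikit-learn's DBSCAN as the reference implementation before worrying about TensorFlow at all. The concentric-ring data below is a synthetic stand-in for the known client data, and the eps/min_samples values will need tuning:

      import numpy as np
      from sklearn.cluster import DBSCAN

      # Synthetic stand-in for the known data: two concentric rings
      theta = np.random.uniform(0, 2 * np.pi, 400)
      radius = np.concatenate([np.full(200, 1.0), np.full(200, 3.0)])
      radius += np.random.normal(0, 0.05, 400)
      xy = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

      # Cartesian -> polar: rings that k-means can't separate in x/y become
      # well-separated bands in r
      r = np.hypot(xy[:, 0], xy[:, 1])
      phi = np.arctan2(xy[:, 1], xy[:, 0])
      polar = np.column_stack([r, phi])

      labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(polar)
      print(np.unique(labels))  # want exactly two clusters, no noise label (-1)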

Phil 3.1.17

It’s March and no new wars! Hooray!

7:00 – 8:00 Research

8:30 – 4:30 BRC

  • More TensorFlow
    • MNIST tutorial – clear, but a LOT of stuff
    • Neural Networks and Deep Learning, Michael Nielsen’s online book, is referenced in the TF documentation (at least in the softmax chapter)
    • A one-hot vector is a vector that is 0 in most dimensions and 1 in a single dimension. In this case, the nth digit is represented as a vector that is 1 in the nth dimension; for example, 3 is [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats (see the one-hot example at the end of this entry).
    • If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later on, when we train more sophisticated models, the final step will be a layer of softmax (see the softmax example at the end of this entry).
    • x = tf.placeholder(tf.float32, [None, 784])

      We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. (Here None means that a dimension can be of any length.)

    • A good explanation of cross-entropy, apparently.
    • tf.reduce_mean computes the mean of a tensor’s elements across the given dimensions (see the reduce_mean/reduce_sum example at the end of this entry)
    • Success!!! Here’s the code:
      import tensorflow as tf
      from tensorflow.examples.tutorials.mnist import input_data

      # Download (if necessary) and load MNIST with one-hot labels
      mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

      # Each 28x28 image is flattened to 784 floats; None = any batch size
      x = tf.placeholder(tf.float32, [None, 784])

      # Weights and biases for a single softmax layer, initialized to zeros
      W = tf.Variable(tf.zeros([784, 10]))
      b = tf.Variable(tf.zeros([10]))

      # Predicted probability distribution over the ten digits
      y = tf.nn.softmax(tf.matmul(x, W) + b)

      y_ = tf.placeholder(tf.float32, [None, 10])  # note that y_ means 'y prime' (the true labels)

      # Average cross-entropy over the batch
      cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

      train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

      sess = tf.InteractiveSession()

      tf.global_variables_initializer().run()

      # Train on 1,000 mini-batches of 100 images each
      for _ in range(1000):
          batch_xs, batch_ys = mnist.train.next_batch(100)
          sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

      # Fraction of test images whose most probable digit matches the true label
      correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
      print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    • And here are the results:
      C:\Users\philip.feldman\AppData\Local\Programs\Python\Python35\python.exe C:/Development/Sandboxes/TensorflowPlayground/HelloPackage/MNIST_tutorial.py
      Extracting MNIST_data/train-images-idx3-ubyte.gz
      Extracting MNIST_data/train-labels-idx1-ubyte.gz
      Extracting MNIST_data/t10k-images-idx3-ubyte.gz
      Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
      
      0.9192
    • Working on the advanced tutorial. Fixed fully_connected_feed.py to work with local data.
    • And then my brain died
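
As promised above, a quick illustration of the one-hot representation, using numpy’s identity matrix as the lookup table (the digits here are just an example):

      import numpy as np

      # Row n of the 10x10 identity matrix is exactly the one-hot vector for digit n
      digits = np.array([3, 0, 7])
      one_hot = np.eye(10)[digits]
      print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -- matches the example for 3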
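
And the softmax example: a plain numpy version showing that the outputs land between 0 and 1 and sum to 1 (the scores are made up):

      import numpy as np

      def softmax(logits):
          # Subtracting the max is for numerical stability; it doesn't change the result
          exps = np.exp(logits - np.max(logits))
          return exps / np.sum(exps)

      scores = np.array([2.0, 1.0, 0.1])
      probs = softmax(scores)
      print(probs)        # roughly [0.659, 0.242, 0.099]
      print(probs.sum())  # 1.0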
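
Finally, the reduce_mean/reduce_sum example: the same sum-each-row-then-average pattern as the cross_entropy line in the code above, on a tensor small enough to check by hand:

      import tensorflow as tf

      t = tf.constant([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
      row_sums = tf.reduce_sum(t, reduction_indices=[1])  # [6.0, 15.0]
      mean = tf.reduce_mean(row_sums)                     # 10.5

      with tf.Session() as sess:
          print(sess.run([row_sums, mean]))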