Monthly Archives: March 2017

Aaron 3.1.17

  • TensorFlow
    • Figuring out TensorFlow documentation and tutorials (with a focus on matrix operations, loading from Hadoop, and clustering).
    • Really basic examples with tiny data sets, like linear regression with gradient descent optimizers, are EASY (see the linear-regression sketch after this list). Sessions, variables, placeholders, and the other core artifacts all make sense. Across the room, Phil's hair is getting increasingly frizzy as he deals with more complicated examples that are far less straightforward.
  • Test extraction of Hadoop records
    • Create TF tensors using Python against HBase tables to see if the result is performant enough (otherwise, recommend we write a MapReduce job to build out a proto file consumed by TF). A rough sketch of the test follows this list.
  • Test polar coordinates against client data
    • See if we can use k-means/DBSCAN against polar coordinates to generate the correct clusters with known data (see the clustering sketch after this list). If we cannot use polar coordinates for dimension reduction, what process is required to implement DBSCAN in TensorFlow?
  • Architecture Diagram
    • The artifacts for this sprint's completion are architecture diagrams and a proposal for next sprint's implementation. I haven't gotten feedback from the customer about our proposed framework, but it will come up in our end-of-sprint activities. The design path and flow diagram are due on Wednesday.
  • Cycling
    • I did my first 15.2-mile ride today. My everything hurts, and my average speed was way down from yesterday, but I finished.
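
A minimal sketch of the "easy" case mentioned above: linear regression with a gradient descent optimizer on a toy data set. The data and variable names here are ours, not from any particular tutorial:

      import tensorflow as tf

      # Toy data drawn from y = 2x + 1
      x_train = [1.0, 2.0, 3.0, 4.0]
      y_train = [3.0, 5.0, 7.0, 9.0]

      # Model parameters, initialized away from the true values
      W = tf.Variable([0.0])
      b = tf.Variable([0.0])

      x = tf.placeholder(tf.float32)
      y = tf.placeholder(tf.float32)

      linear_model = W * x + b
      loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of squared errors

      train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

      sess = tf.Session()
      sess.run(tf.global_variables_initializer())
      for _ in range(1000):
          sess.run(train_step, feed_dict={x: x_train, y: y_train})

      print(sess.run([W, b]))  # should land close to [2.0] and [1.0]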
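
For the HBase extraction test, the first cut will look roughly like the sketch below, using the happybase Thrift client. The host, table, and column names are placeholders, and whether a plain Python scan is performant enough is exactly the open question:

      import happybase
      import numpy as np
      import tensorflow as tf

      # Placeholder connection details; the real host and table names are TBD
      connection = happybase.Connection('hbase-thrift-host')
      table = connection.table('client_events')

      rows = []
      for row_key, data in table.scan(limit=10000):
          # Assumes each cell stores a float serialized as text in family 'f'
          rows.append([float(data[b'f:x']), float(data[b'f:y'])])

      # Hand the scanned rows to TF as a constant tensor; sanity-check the shape
      points = tf.constant(np.array(rows, dtype=np.float32))
      with tf.Session() as sess:
          print(sess.run(tf.shape(points)))  # e.g. [10000, 2]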
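
For the polar-coordinate clustering test, a first pass might use scikit-learn's DBSCAN as the reference implementation before worrying about TensorFlow at all. The concentric-ring data below is a synthetic stand-in for the known client data, and the eps/min_samples values will need tuning:

      import numpy as np
      from sklearn.cluster import DBSCAN

      # Synthetic stand-in for the known data: two concentric rings
      theta = np.random.uniform(0, 2 * np.pi, 400)
      radius = np.concatenate([np.full(200, 1.0), np.full(200, 3.0)])
      radius += np.random.normal(0, 0.05, 400)
      xy = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

      # Cartesian -> polar: rings that k-means can't separate in x/y become
      # well-separated bands in r
      r = np.hypot(xy[:, 0], xy[:, 1])
      phi = np.arctan2(xy[:, 1], xy[:, 0])
      polar = np.column_stack([r, phi])

      labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(polar)
      print(np.unique(labels))  # want exactly two clusters, no noise label (-1)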

Phil 3.1.17

It’s March and no new wars! Hooray!

7:00 – 8:00 Research

8:30 – 4:30 BRC

  • More TensorFlow
    • MNIST tutorial – clear, but a LOT of stuff
    • Neural Networks and Deep Learning, Michael Nielsen’s online book, is referenced in the TF documentation (at least in the softmax chapter)
    • A one-hot vector is a vector that is 0 in most dimensions and 1 in a single dimension. In this case, the nth digit is represented as a vector that is 1 in the nth dimension; for example, 3 is [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist.train.labels is a [55000, 10] array of floats (see the one-hot example at the end of this entry).
    • If you want to assign probabilities to an object being one of several different things, softmax is the thing to do, because softmax gives us a list of values between 0 and 1 that add up to 1. Even later on, when we train more sophisticated models, the final step will be a layer of softmax (see the softmax example at the end of this entry).
    • x = tf.placeholder(tf.float32, [None, 784])

      We represent this as a 2-D tensor of floating-point numbers, with a shape [None, 784]. (Here None means that a dimension can be of any length.)

    • A good explanation of cross-entropy, apparently.
    • tf.reduce_mean computes the mean of a tensor’s elements across the given dimensions (see the reduce_mean/reduce_sum example at the end of this entry)
    • Success!!! Here’s the code:
      import tensorflow as tf
      from tensorflow.examples.tutorials.mnist import input_data

      # Download (if necessary) and load MNIST with one-hot labels
      mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

      # Each 28x28 image is flattened to 784 floats; None = any batch size
      x = tf.placeholder(tf.float32, [None, 784])

      # Weights and biases for a single softmax layer, initialized to zeros
      W = tf.Variable(tf.zeros([784, 10]))
      b = tf.Variable(tf.zeros([10]))

      # Predicted probability distribution over the ten digits
      y = tf.nn.softmax(tf.matmul(x, W) + b)

      y_ = tf.placeholder(tf.float32, [None, 10])  # note that y_ means 'y prime' (the true labels)

      # Average cross-entropy over the batch
      cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

      train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

      sess = tf.InteractiveSession()

      tf.global_variables_initializer().run()

      # Train on 1,000 mini-batches of 100 images each
      for _ in range(1000):
          batch_xs, batch_ys = mnist.train.next_batch(100)
          sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

      # Fraction of test images whose most probable digit matches the true label
      correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
      print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
    • And here are the results:
      C:\Users\philip.feldman\AppData\Local\Programs\Python\Python35\python.exe C:/Development/Sandboxes/TensorflowPlayground/HelloPackage/MNIST_tutorial.py
      Extracting MNIST_data/train-images-idx3-ubyte.gz
      Extracting MNIST_data/train-labels-idx1-ubyte.gz
      Extracting MNIST_data/t10k-images-idx3-ubyte.gz
      Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
      
      0.9192
    • Working on the advanced tutorial. Fixed fully_connected_feed.py to work with local data.
    • And then my brain died
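
As promised above, a quick illustration of the one-hot representation, using numpy’s identity matrix as the lookup table (the digits here are just an example):

      import numpy as np

      # Row n of the 10x10 identity matrix is exactly the one-hot vector for digit n
      digits = np.array([3, 0, 7])
      one_hot = np.eye(10)[digits]
      print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] -- matches the example for 3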
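
And the softmax example: a plain numpy version showing that the outputs land between 0 and 1 and sum to 1 (the scores are made up):

      import numpy as np

      def softmax(logits):
          # Subtracting the max is for numerical stability; it doesn't change the result
          exps = np.exp(logits - np.max(logits))
          return exps / np.sum(exps)

      scores = np.array([2.0, 1.0, 0.1])
      probs = softmax(scores)
      print(probs)        # roughly [0.659, 0.242, 0.099]
      print(probs.sum())  # 1.0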
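
Finally, the reduce_mean/reduce_sum example: the same sum-each-row-then-average pattern as the cross_entropy line in the code above, on a tensor small enough to check by hand:

      import tensorflow as tf

      t = tf.constant([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
      row_sums = tf.reduce_sum(t, reduction_indices=[1])  # [6.0, 15.0]
      mean = tf.reduce_mean(row_sums)                     # 10.5

      with tf.Session() as sess:
          print(sess.run([row_sums, mean]))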