Aaron 4.3.17

ML Architecture
- Spent a bunch of time last Friday meeting with Phil to discuss the proposed path for the Machine Learning epics to develop the research browser.
- Our plan uses a thin-client Angular 2 app for the bulk of the annotation/tagging process, with an optional companion browser plugin developed later to do in-document tagging, capturing the URL and snippet text.
- We intend to use a simple Naive Bayes classifier for document categories, and more complex classifiers (DNNs) for snippet content and user behaviors in the future.
- Given this, we’re feeling pretty confident about the proposed timeframe. It’s unclear how we’ll implement the Bayesian classifier: it has already been developed in Weka/Java, so it may not be in our best interests to rewrite it in Python.
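To give a sense of how small a Python version of the document classifier could be, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the existing Weka/Java implementation; the class name, the whitespace tokenizer, and the category labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()                  # all tokens seen in training

    def train(self, text, category):
        tokens = text.lower().split()
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_tokens = sum(counts.values())
            # Log prior plus the sum of add-one smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

Usage would be a series of `train(text, category)` calls over tagged documents followed by `classify(text)` on new ones. Working in log space avoids underflow on long documents, and the smoothing keeps unseen tokens from zeroing out a category.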
Python integration
- Using ProcessBuilder works for the simple case where we essentially want to do batch clustering, but it is very difficult to debug in CI/Prod instances because it becomes a “black box”. There are ways to make it more communicative, but we should investigate a Python-based, WSO2-secured microservice, which would make it far easier to integrate Python code into our stack.
- I looked at multiple methods for HDFS integration using Python, and found some recent canonical examples for Python 3.x:
  - http://wesmckinney.com/blog/python-hdfs-interfaces/
  - https://pypi.python.org/pypi/hdfs/
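As a rough illustration of the microservice idea, the sketch below exposes a clustering function over HTTP using only the Python standard library. Everything in it is an assumption: the `/cluster` endpoint, the JSON payload shape, and the placeholder `cluster_points` function are hypothetical, and the WSO2 security layer is not shown (it would presumably sit in front of the service).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def cluster_points(points, threshold=1.0):
    """Placeholder clustering: group sorted 1-D values whose gap is <= threshold."""
    clusters = []
    for value in sorted(points):
        if clusters and value - clusters[-1][-1] <= threshold:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


class ClusterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/cluster":  # hypothetical endpoint name
            self.send_error(404, "unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            result = {"clusters": cluster_points(payload["points"])}
        except (ValueError, KeyError, TypeError) as exc:
            # Report the failure to the caller instead of hiding it.
            self.send_error(400, "bad request", explain=str(exc))
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer((host, port), ClusterHandler)
```

Running `make_server().serve_forever()` starts the service. Unlike a ProcessBuilder call, a bad request comes back as an HTTP error with the reason attached, which is the debugging advantage we are after.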
Hadoop is dead, long live ML?
- http://www.datasciencecentral.com/profiles/blogs/goodbye-age-of-hadoop-hello-cambrian-explosion-of-deep-learning
- https://www.thoughtworks.com/radar
ClusteringService
- Reviewed the MapReduce code for the service. It’s pretty straightforward, using the mapper to build the row data and the reducer to format it for output.
- The actual table it needs to pull from is currently missing… so the tests fail when pointed at the real table. Once my new laptop is set up, I will be able to make changes.
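The mapper/reducer split described above can be sketched in a few lines of Python. This is not the service's actual code: the record layout (an `id` field plus arbitrary columns), the tab-separated output format, and the in-memory `run_job` driver are all hypothetical stand-ins for the real job.

```python
from collections import defaultdict


def mapper(record):
    """Build row data: emit (row_key, (column, value)) for one input record."""
    row_key = record["id"]
    for column, value in record.items():
        if column != "id":
            yield row_key, (column, value)


def reducer(row_key, cells):
    """Format all cells of one row as a single tab-separated output line."""
    ordered = sorted(cells, key=lambda cell: cell[0])
    fields = [f"{column}={value}" for column, value in ordered]
    return "\t".join([str(row_key)] + fields)


def run_job(records):
    """Simulate the shuffle phase in memory by grouping mapper output by key."""
    grouped = defaultdict(list)
    for record in records:
        for key, cell in mapper(record):
            grouped[key].append(cell)
    return [reducer(key, grouped[key]) for key in sorted(grouped)]
```

Because `run_job` takes plain dictionaries, a test can feed it fixture rows directly, which may be one way to keep the tests passing while the real table is unavailable.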
