Aaron 4.3.17

  • ML Architecture
    • Spent a bunch of time last Friday meeting with Phil to discuss the proposed path for the Machine Learning epics to develop the research browser.
    • Our plan uses a thin-client Angular 2 app for the bulk of the annotation/tagging process, with an optional companion browser plugin developed later to do in-document tagging, which will capture the URL and snippet text.
    • We’re intending to use a simple Naive Bayes classifier for document categories, and to use more complex classifiers (DNNs) for snippet content and user behaviors in the future.
    • Given this, we’re feeling pretty confident about the proposed timeframe. It’s unclear how we’ll implement the Bayesian classifier: it has already been developed in Weka/Java, so it may not be in our best interests to rewrite it in Python.
  • Python integration
    • Using ProcessBuilder works for the simple case where we essentially want to do batch clustering, but it is very difficult to debug in CI/Prod instances because it becomes a “black box”. There are ways to make it more communicative, but we should investigate a Python-based, WSO2-secured microservice, which would make it far easier to integrate Python code into our stack.
    • I looked at multiple methods for HDFS integration from Python, and found some recent canonical examples using Python 3.x.
  • Hadoop is dead, long live ML?
  • ClusteringService
    • Reviewed the MapReduce code for the service. It’s pretty straightforward, using the mapper to build the row data and the reducer to format it for output.
    • The actual table it needs to pull from is currently missing, so the tests fail when pointed at the real table. Once my new laptop is set up I will be able to make changes.
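
As a rough illustration of the mapper/reducer split described under ClusteringService, here is a minimal sketch in plain Python. It is not the service's code (which is Java MapReduce). The record layout, the sample data, and the names mapper, shuffle, reducer, and run_job are all assumptions made up for the example, and the shuffle step only stands in for what the framework does between the two phases.

```python
from collections import defaultdict

# Hypothetical input records: (document id, term, weight).
# The real service reads from a table; this data is invented for the sketch.
RECORDS = [
    ("doc1", "hadoop", 2.0),
    ("doc2", "python", 1.0),
    ("doc1", "python", 3.0),
]


def mapper(record):
    """Build row data: emit (row key, (column, value)) for one input record."""
    doc_id, term, weight = record
    yield doc_id, (term, weight)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Format one row for output as a tab-separated line."""
    cells = ",".join(f"{term}:{weight}" for term, weight in sorted(values))
    return f"{key}\t{cells}"


def run_job(records):
    """Run the map, shuffle, and reduce phases in memory, in key order."""
    pairs = (pair for record in records for pair in mapper(record))
    grouped = shuffle(pairs)
    return [reducer(key, grouped[key]) for key in sorted(grouped)]


if __name__ == "__main__":
    for line in run_job(RECORDS):
        print(line)
```

Keeping the row-building logic in the mapper and the formatting in the reducer means each half can be tested against small in-memory fixtures like the one above, which may help while the real table is unavailable.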
