Phil 12.15.15

7:00 – 3:30 VTX

  • Representations: Classes, Trajectories, Transitions
    • Inner language, the language with which we think
    • Semantic nets
      • parasitic semantics – where we project knowing onto the machine. We contain the meaning, not the machine.
    • Combinators = edge
    • Reification – linking links?
    • Sequence
    • Minsky – Frames or templates add a localization layer.
    • Classification
    • Transition
      • Vocabulary of change, not state
      • Increase, decrease, change, appear, disappear – each can also be negated (not-increase, not-decrease, etc.)
    • Trajectory
      • Objects moving along trajectories
      • Trajectory frame (prepositions help refine the roles – by, with, from, for, etc.)
        • Starts at a source
        • Arranged by agent, possibly with collaborator
        • assisted by instrument
        • can have a conveyance
        • Arrives at destination
        • Beneficiary
      • Wall Street Journal Corpus
        • 25% transitions or trajectories.
      • Pat comforted Chris (see the frame sketch after this list)
        • Role Frame
          • Agent: Pat
          • Action: ??
          • Object: Chris
          • Result: Transition Frame
            • Object: Chris
            • Mood: Improved (increased)
    • Story Libraries
      • Event Frames – adds time and place
        • Disaster – adds fatalities, cost
          • Earthquake – adds magnitude, fault
          • Hurricane – adds name, category
        • Party
          • Birthday
          • Wedding – adds bride and groom
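    • To make the frame idea concrete, a minimal sketch in TypeScript (my own illustration, not from the lecture) of the role and transition frames, using the ‘Pat comforted Chris’ example above:

      // Hypothetical frame shapes; the field names are my own invention.
      interface TransitionFrame {
        object: string;
        attribute: string; // e.g. 'mood'
        change: 'increased' | 'decreased' | 'changed' | 'appeared' | 'disappeared';
      }

      interface RoleFrame {
        agent: string;
        action?: string; // unknown here – 'comforted' doesn't name a concrete act
        object: string;
        result?: TransitionFrame;
      }

      const patComfortedChris: RoleFrame = {
        agent: 'Pat',
        object: 'Chris',
        result: { object: 'Chris', attribute: 'mood', change: 'increased' },
      };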
  • Scrum
  • Working on downloading and running the NLP code
    • Downloaded Java EE 7u2
    • Downloaded Gradle 2.9
    • Installed and compiled. Took 41 minutes!
    • Working on running it now, which looks like it needs Tomcat. To run Tomcat on port 80, I finally had to chase down what was blocking that port. I found it by running NET stop HTTP (from here), which gave me a list that I could check against the services. I monitored this with Xampp’s nifty Netstat tool. The offending process was BranchCache, which I disabled. Now we’ll see what that breaks…
    • Tomcat up and running
    • NLPService blew up. More secret knowledge:
      Local RabbitMQ Setup

      Install Erlang
      1. Download and install: http://www.erlang.org/download/otp_win64_17.5.exe
      2. Set ERLANG_HOME in the system variables (e.g. C:\Program Files\erl6.4)

      Install RabbitMQ
      1. Download and install: http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.3/rabbitmq-server-3.5.3.exe
         • If you get Windows Security Alert(s) for epmd.exe and/or erl.exe, check "Domain networks..." and uncheck "Private networks" and "Public networks"
      2. Open the command prompt as administrator
      3. Go to C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.3\sbin
      4. Run the following commands:

         rabbitmq-plugins.bat enable rabbitmq_web_stomp rabbitmq_stomp rabbitmq_management

         rabbitmq-service.bat stop
         rabbitmq-service.bat install
         rabbitmq-service.bat start

      RabbitMQ Admin Console
      http://localhost:15672/mgmt
      guest/guest
    • Installed Erlang and RabbitMQ. We’ll try running tomorrow.

Phil 12.14.15

7:00 – 3:30 VTX

  • Learning: Boosting
    • Binary classifications
    • Weak Classifier = one that is barely better than chance.
    • Adaboost for credibility analysis? Politifact is the test. Speakers, subjects, etc. are the weak classifiers. What mix of classifiers produces the most accurate news? Something like this (check citations in the paper); a sketch of the algorithm is below.
    • Which means that we can keep track of the items that are always moved to the top of the pertinence list and score them as true(?). We can then use that result to weight the sources that appear credible, so that they in turn become more relevant on the next query (we can also look at the taxonomy terms that get maximized and minimized).
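    • A minimal AdaBoost sketch in TypeScript – my own illustration of the lecture idea, not anything from the VTX codebase; the weak classifiers here are placeholders:

      type Sample = { features: number[]; label: 1 | -1 };
      type Weak = (s: Sample) => 1 | -1;

      // Each round: pick the weak classifier with the lowest weighted error,
      // then reweight the samples so the next round focuses on its mistakes.
      function adaboost(samples: Sample[], pool: Weak[], rounds: number) {
        let w = samples.map(() => 1 / samples.length);
        const ensemble: { h: Weak; alpha: number }[] = [];
        for (let t = 0; t < rounds; t++) {
          let best = pool[0];
          let bestErr = Infinity;
          for (const h of pool) {
            const err = samples.reduce((e, s, i) => e + (h(s) !== s.label ? w[i] : 0), 0);
            if (err < bestErr) { bestErr = err; best = h; }
          }
          if (bestErr >= 0.5) break; // nothing left that beats chance
          const eps = 1e-10; // guard against log(0) on a perfect classifier
          const alpha = 0.5 * Math.log((1 - bestErr + eps) / (bestErr + eps));
          ensemble.push({ h: best, alpha });
          // Reweight: multiply by exp(-alpha * label * h(x)), then normalize.
          w = w.map((wi, i) => wi * Math.exp(-alpha * samples[i].label * best(samples[i])));
          const z = w.reduce((a, b) => a + b, 0);
          w = w.map(wi => wi / z);
        }
        // Final classifier: weighted vote of the selected weak classifiers.
        return (s: Sample) => (ensemble.reduce((sum, e) => sum + e.alpha * e.h(s), 0) >= 0 ? 1 : -1);
      }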
  • Discussion with Jeremy about the RDB schemas
  • Scrum – really short
  • RDB design meeting. Lots of discussion about data sources but nothing clear. Jeremy didn’t like the unoptimized storage of the general model
  • Follow-on discussions with Jeremy. I showed him how unions can fix his concerns. He adjusted the schema, but I can’t get on the VPN at home for some reason. Will see tomorrow.

Phil 12.11.15

8:00 – 5:00 VTX

  • No AI course this morning, had to drop off the car.
  • Some preliminary discussions about sprint planning with Aaron yesterday. Aside from getting the two ‘Derived’ database structures reconciled, I need to think about a few things:
    • Who the network ‘users’ are. I think it could be VTX, or the system customers, like Aetna.
    • What kinds of networks exist?
      • Each individual doctor is a network of doctors, keywords, entities, sources, threats and ratings. That can certainly run on the browser
      • Then there is the larger network of ‘relevant’ doctors – certainly in the 10s – 100s range. On the lower end of that scale it could be done directly in the browser. For larger networks, we might have to use the GPU? That seems very doable, via Steve Sanderson.
      • Then there is the master ranking, which should be something like most threatening to least threatening, probably. Queries with additional parameters pull a subset of the ordered data (SELECT foo, bar from ?? ORDER BY eigenvalue). Interestingly, according to this IEEE article from 2010, GPU processing was handling 10 million nodes in about 30 seconds using optimized sparse matrix-vector (SpMV) calculations – see the sketch below. So it’s conceivable that the calculations could be done in real time.
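      • For a feel of what SpMV actually is, a plain TypeScript sketch (mine, single-threaded – the GPU versions parallelize the row loop) using the standard compressed sparse row layout:

        // Compressed Sparse Row: non-zero values, their column indices, and
        // the offset in those arrays where each row starts.
        interface CSRMatrix {
          values: Float64Array;
          colIdx: Int32Array;
          rowPtr: Int32Array; // length = number of rows + 1
        }

        // y = M * x
        function spmv(m: CSRMatrix, x: Float64Array, y: Float64Array): void {
          for (let row = 0; row < m.rowPtr.length - 1; row++) {
            let sum = 0;
            for (let k = m.rowPtr[row]; k < m.rowPtr[row + 1]; k++) {
              sum += m.values[k] * x[m.colIdx[k]];
            }
            y[row] = sum;
          }
        }
        // A ranking eigenvector is then just repeated spmv + normalize
        // (power iteration) until y stops changing.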
  • More documentation
  • More discussions with Aaron about where data lives and how it’s structured.
  • Sprint planning

Phil 12.10.15

7:00 – 3:30 VTX

  • Sandy Spring Bank!
  • Honda!
  • Learning: Support Vector Machines
    • More sophisticated decision boundaries, with fewer ad hoc choices than GAs and NNs
    • A positive sample must have a dot product with the ‘normal vector’ that is >= 1.0. Similarly, a negative sample must be <= -1.0 (see the sketch below).
    • Gotta minimize with constraints: Lagrange Multipliers from Multivariable Calculus
    • Guaranteed no local maxima
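    • A tiny TypeScript check of that margin constraint (my own illustration; w and b here would come from the optimization, which the sketch doesn’t do):

      function dot(a: number[], b: number[]): number {
        return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
      }

      // A sample satisfies the margin constraint when label * (w·x + b) >= 1,
      // with label +1 for positives and -1 for negatives.
      function satisfiesMargin(w: number[], b: number, x: number[], label: 1 | -1): boolean {
        return label * (dot(w, x) + b) >= 1;
      }
      // Training minimizes |w|²/2 subject to this constraint on every sample –
      // which is exactly where the Lagrange multipliers come in.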
  • System Description (putting it up here)

Phil 12.9.15

7:00 – VTX

  • Learning: Near Misses, Felicity Conditions
    • One shot learning
    • Describing the difference between the desired goal/object and near misses. The model is decorated with information about which relations matter.
      • Relations are in imperative form (must not touch, must support, etc.)
    • Pick a seed
    • Apply your heuristics until all the positives are included
    • Then use negatives to throw away unneeded heuristics
    • Use a beam search
    • Near misses lead to specialization, while positive examples lead to generalization (look for close items using low disorder measures for near misses and high for examples?). A sketch of the loop follows this list.
    • Model Heuristics (from ‘An application of variable-valued logic to inductive learning of plant disease diagnostic rules’):

      • Require Link (Specialization step)
      • Forbid Link (Specialization step)
      • Extend Set (Generalization step)
      • Drop Link (Generalization step)
      • Climb Tree (Generalization step)
    • Packaging ideas
      • Symbol associated with the work – a visual handle
      • Slogan – a verbal handle (‘Near Miss’ learning)
      • Surprise – Machine can learn something definite from a single example
      • Salient – something that sticks out (One shot learning via near misses)
      • Story
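    • A rough TypeScript sketch of that loop (my own reading of the lecture; the relation strings are hypothetical):

      type Relation = string; // e.g. 'top-touches-side', 'brick-supports-board'
      interface Model { required: Set<Relation>; forbidden: Set<Relation>; }
      interface Example { relations: Set<Relation>; positive: boolean; }

      // Specialize on a near miss: a relation present only in the miss becomes
      // forbidden (forbid-link); one present only in the positive seed becomes
      // required (require-link).
      function specialize(model: Model, miss: Example, seed: Example): Model {
        for (const r of miss.relations)
          if (!seed.relations.has(r)) model.forbidden.add(r);
        for (const r of seed.relations)
          if (!miss.relations.has(r)) model.required.add(r);
        return model;
      }

      // Generalize on a new positive example: drop required relations it lacks
      // (drop-link) so the model still covers it.
      function generalize(model: Model, ex: Example): Model {
        for (const r of model.required)
          if (!ex.relations.has(r)) model.required.delete(r);
        return model;
      }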
  • More dev machine setup
    • Added typescript-install to the makefile tasks, since I keep on forgetting about it.
    • Compiled and ran WebGlNeworkCSS. Now I need to set up the database.
    • Got that in, but the new db had a problem with the text type of PASSWORD(). I had to add COLLATE to the WHERE clause as follows:
      "UPDATE tn_users set password = PASSWORD(:newPassword) where password = PASSWORD(:oldPassword) COLLATE utf8_unicode_ci and login = :login"
    • Last error: the temp network wasn’t being set in the dropdown for available networks. Fixed. It turned out to be related to the new typescript compiler catching some interface errors that the old version didn’t.
  • Ok, I think it’s time to start writing up what the current system is and how it works.

Phil 12.8.15

7:00 – 4:30 VTX

Phil 12.7.15

8:00 – 5:00 VTX

  • Got my laptop from John and got it set up. Incredibly slow network performance, which I figured was the wifi. Hooked up the hard line and disabled the wifi; now the machine doesn’t see the network at all and won’t let me re-enable the wifi. Working from home for the rest of the day.
  • At seminar, had a really interesting discussion about how taxonomies intersecting with knowledge graphs essentially result in a kind of pro-forma synthesis. Hmm.

Phil 12.4.15

8:00 – VTX

  • Scrum
  • Found an interesting tidbit on the WaPo this morning. It implies that a pattern of statement, followed by a search for confirming information, followed by a public citation of that confirming information, could be the basic unit of an information bubble. For this to be a bubble, I think the pertinent information extracted from the relevant search results would have to be somehow identifiable as a minority view. This could be done by comparing the Jaccard index of the adjusted results with the raw returns of a search (sketch below)? In other words, if the world (relevant search) has an overall vector in one direction and the individual preferences produce a pertinent result pointing in the opposite direction (strongly negative dot product), then the likelihood of those results being the product of echo-chamber processes is higher?
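    • A quick TypeScript sketch of that comparison (my own illustration; the ‘raw’ vs. ‘adjusted’ result sets are hypothetical):

      // Jaccard index: |A ∩ B| / |A ∪ B|. 1.0 means identical result sets,
      // 0.0 means disjoint; a low score for personalized vs. raw results
      // could flag a possible bubble.
      function jaccard<T>(a: Set<T>, b: Set<T>): number {
        let intersection = 0;
        for (const x of a) if (b.has(x)) intersection++;
        const union = a.size + b.size - intersection;
        return union === 0 ? 1 : intersection / union;
      }

      const raw = new Set(['url1', 'url2', 'url3', 'url4']);
      const adjusted = new Set(['url3', 'url5', 'url6']);
      console.log(jaccard(raw, adjusted)); // 1/6 ≈ 0.17 – largely divergent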
  • If the Derived DB depends on analyst examination of the data, this could be a way of flagging analyst bias.
  • Researching WebScaleSQL, I stumbled on another DB from Facebook. This one, RocksDB, is more focused on speed. From the splash page:
    • RocksDB can be used by applications that need low latency database accesses. A user-facing application that stores the viewing history and state of users of a website can potentially store this content on RocksDB. A spam detection application that needs fast access to big data sets can use RocksDB. A graph-search query that needs to scan a data set in realtime can use RocksDB. RocksDB can be used to cache data from Hadoop, thereby allowing applications to query Hadoop data in realtime. A message-queue that supports a high number of inserts and deletes can use RocksDB.
  • Interestingly, RocksDB appears to have integration with MongoDB and is working on MySQL integration. Cassandra appears to be implementing similar optimizations.
  • Just discovered reported.ly, which is a social-media-sourced, reporter-curated news stream. Could be a good source of data to compare against things like news feeds from Google or major news venues.
  • Control System Meeting
    • Send RCS and Search Competition to Bob
    • Seems like this whole system is a lot like what Databricks is doing?

Phil 12.3.15

7:00 – 5:00 VTX

  • Learning: Genetic Algorithms
    • Rank space (selection probability is based on rank order, not raw fitness values – sketch below)
    • Simulated annealing – reducing step size.
    • Diversity rank (from the previous generation) plus fitness rank
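    • A small sketch of rank-space selection in TypeScript (my illustration): the top-ranked candidate is picked with constant probability p, the next with p(1-p), the next with p(1-p)², and so on, with the leftover mass going to the last.

      // n candidates, already sorted by combined fitness + diversity rank.
      // Returns the index of the selected candidate (0 = best).
      function rankSpaceSelect(n: number, p: number): number {
        for (let i = 0; i < n - 1; i++) {
          if (Math.random() < p) return i;
        }
        return n - 1;
      }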
  • Some more timing results. The view test (select count(*) from tn_view_network_items where network_id = 1) for the small network_1 is about the same as the pull for the large network_8, about 0.75 sec. The pull from the association table without the view is very fast – 0.01 sec for network_1 and 0.02 sec for network_8. So this should mean that a 1,000,000-item pull would take 1-2 seconds.
  • mysql> select count(*) from tn_associations where network_id = 1;
     11 
    1 row in set (0.01 sec)
    
    mysql> select count(*) from tn_associations where network_id = 8;
     10000 
    1 row in set (0.01 sec)
    
    mysql> select count(*) from tn_view_network_items where network_id = 8;
     10000 
    1 row in set (0.88 sec)
    
    mysql> select count(*) from tn_view_network_items where network_id = 1;
     11 
    1 row in set (0.71 sec)
  • Field trip to Wall NJ
    • Learned more about the project, started to put faces to names
    • Continued to look at DB engines for the derived DB. Discovered WebScaleSQL, which is a collaboration between Alibaba, Facebook, Google, LinkedIn, and Twitter to produce a big(!!) version of MySQL.
    • More discussions with Aaron D. about control systems, which means I’m going to be leaning on my NIST work again.

Phil 12.2.15

7:00 –

  • Learning: Neural Nets, Back Propagation
    • Synaptic weights are higher for some synapses than others
    • Cumulative stimulus
    • All-or-none threshold for propagation.
    • Once we have a model, we can ask what we can do with it.
    • Now I’m curious about the MIT approach to calculus. It’s online too: MIT 18.01 Single Variable Calculus
    • Back-propagation algorithm. Starts from the output end and works backward, so that each new calculation depends only on its local information plus values that have already been calculated (sketch below).
    • Overfitting and under/over damping issues are also considerations.
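    • A compact back-propagation sketch in TypeScript (my own minimal version: one sigmoid neuron per layer, just to make the reuse of already-computed deltas visible):

      const sigmoid = (x: number) => 1 / (1 + Math.exp(-x));

      // Two weights in series: input -> hidden -> output.
      let w1 = 0.5;
      let w2 = 0.5;
      const rate = 0.5;

      function trainStep(x: number, target: number): void {
        // Forward pass.
        const h = sigmoid(w1 * x);
        const y = sigmoid(w2 * h);
        // Backward pass: the hidden delta reuses the output delta, which is
        // the locality property from the lecture.
        const dOut = (y - target) * y * (1 - y);
        const dHid = dOut * w2 * h * (1 - h);
        w2 -= rate * dOut * h;
        w1 -= rate * dHid * x;
      }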
  • Scrum meeting
  • Remember to bring a keyboard tomorrow!!!!
  • Checking that my home dev code is the same as what I pulled down from the repository
    • No change in definitelytyped
    • No change in the other files either, so those were real bugs. Don’t know why they didn’t get caught. But that means the repo is good and the bugs are fixed.
  • Validate that PHP runs and debugs in the new dev env. Done
  • Add a new test that inputs large numbers (thousands -> millions) of unique ENTITY entries with small-ish star networks of partially shared URL entries. Time view retrieval for SELECT COUNT(*) from tn_view_network_items WHERE network_id = 8;
    • Computer: 2008 Dell Precision M6300
    • System: Processor Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 2201 Mhz, 2 Core(s), 2 Logical Processor(s), Available Physical Memory 611 MB
    • 100 is 0.09 sec
    • 1000 is 0.14 sec
    • 10,000 is 0.84 sec
    • Using Open Office’s linear regression function, I get the equation t = 0.00007657x + 0.0733 with an R squared of 0.99948.
    • That means 1,000,000 view entries can be processed in 75 seconds or so as long as things don’t get IO bound
  • Got the PHP interpreter and debugger working. In this case, it was just refreshing in settings->languages->php

Phil 12.1.15

7:30 – 5:00

  • Learning: Identification Trees, Disorder
    • Trees of tests
    • Identification Tree (Not a decision tree!)
    • Measuring Disorder – lowest disorder is best test
      • Disorder(set of binaries) = −(P/T)·log2(P/T) − (N/T)·log2(N/T), where P = positives, N = negatives, T = P + N
        • Is the log base related to the number of possible classes in the set?
        • Add up the disorder of each branch of the test, weighted by that branch’s share of the samples, to get the disorder of the test. The test with the lowest disorder wins (sketch below).
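      • A direct TypeScript translation of the formula (mine):

        // Disorder of a binary set: 0 when homogeneous, 1 at a 50/50 split.
        function disorder(pos: number, neg: number): number {
          const total = pos + neg;
          if (total === 0 || pos === 0 || neg === 0) return 0;
          const p = pos / total;
          const n = neg / total;
          return -p * Math.log2(p) - n * Math.log2(n);
        }

        // Quality of a test = sample-weighted disorder of its branches;
        // the test with the lowest value wins.
        function testDisorder(branches: { pos: number; neg: number }[]): number {
          const total = branches.reduce((s, b) => s + b.pos + b.neg, 0);
          return branches.reduce(
            (s, b) => s + ((b.pos + b.neg) / total) * disorder(b.pos, b.neg), 0);
        }

        disorder(1, 1); // 1.0 – maximum disorder
        disorder(4, 0); // 0.0 – homogeneous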
  • Bringing in my machine learning, pattern recognition and stats books.
  • Bringing in my big laptop
  • Setting up dev environment.
    • Using the new IDEA 15.x, which seems to be OK for the typescript, will check PHP tomorrow.
    • Installed grunt (grunt-global, then grunt-local from the makefiles)
    • installed typescript (npm i -g typescript)
    • Installed gnuWin32, which has make and touch support, along with all the important DLLs. It turns out that there is also a gnuWin64. Will use that next time.
    • Fixed bugs that didn’t get caught before. Older compiler?
      • commented out the waa.d.ts from the three.d.ts definitelytyped file
      • deleted the { antialias: boolean; alpha: boolean; } args from the CanvasRenderer call in classes/WebGlCanvasClasses
      • added title?:string and assoc_name?:string to IPostObject in RssController
      • had to add the experiments/wglcharts2 folder to the xampp Apache htdocs
      • added word?:string to IPostObj in RssAppDirectives
      • added word_type_name?:string to IPostObj in RssAppDirectives
      • fixed the font calls in WebGl3dCharts IComponentConfig.
    • Since these issues really shouldn’t have happened, I’m going to verify that they are not in my home dev environment before checking in.
  • And the new computer arrived, so I get to do some of the install tomorrow.

Phil 11.30.15

7:00 – 2:30: ???

  • Introduction to Learning, Nearest Neighbors
    • Learning based on observations of regularity (Bulldozer Computing)
      • Nearest Neighbor
        • Pattern Recognition
      • Neural Networks
      • Boosting
    • Learning based on constraint (Human-Like)
      • One Shot Learning
      • Explanation-based learning
    • Pattern Recognition
      • Feature detector produces a vector of values.
      • Fed into a Comparator which tests the new vector against a library of other vectors
      • Can use decision boundaries
      • If something is similar in some respects, it is likely to be similar in other respects (see the sketch below).
      • Robotic motion is a search problem these days??
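      • A minimal nearest-neighbor comparator in TypeScript (my own illustration of the feature-vector idea):

        interface LabeledVector { features: number[]; label: string; }

        function distance(a: number[], b: number[]): number {
          return Math.sqrt(a.reduce((s, ai, i) => s + (ai - b[i]) ** 2, 0));
        }

        // The comparator: test the new feature vector against the library of
        // stored vectors and return the label of the closest one.
        function nearestNeighbor(library: LabeledVector[], query: number[]): string {
          let best = library[0];
          for (const item of library) {
            if (distance(item.features, query) < distance(best.features, query)) {
              best = item;
            }
          }
          return best.label;
        }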
  • Work
    • Standard first-day stuff
    • Discussions with Aaron about design
    • And the interesting thought for the day:
      • Do we need a sort of crowd-sourced weighting determination of machine ethics? Right now, the person who writes the code for the first self-driving car that decides the runaway trolley problem could reasonably be thought of as having committed premeditated murder. But what if we all set those outcomes together, in a way that reflected our current culture and local values?

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, a maximizing function that integrates the area under the signal with respect to offsets (translation and rotation) is very good, even with noise (sketch below).
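    • A sketch of that matching idea in TypeScript (my own simplification: 1-D signals, translation offsets only, no rotation):

      // Score an alignment by summing the product of the two signals at a
      // given offset; the best offset is the one that maximizes the score.
      function matchScore(a: number[], b: number[], offset: number): number {
        let sum = 0;
        for (let i = 0; i < a.length; i++) {
          const j = i + offset;
          if (j >= 0 && j < b.length) sum += a[i] * b[j];
        }
        return sum;
      }

      function bestOffset(a: number[], b: number[]): number {
        let best = 0;
        let bestScore = -Infinity;
        for (let off = -(b.length - 1); off < a.length; off++) {
          const s = matchScore(a, b, off);
          if (s > bestScore) { bestScore = s; best = off; }
        }
        return best;
      }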
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to the dictionary, because we’ll need temporary dicts associated with users, like networks are.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for Thanksgiving. On a really good note!

(Screenshot: AllWorking)

Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over- and under-allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors whose domain (set of choices) has been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.24.15

7:00 – Leave

  • Constraints: Interpreting Line Drawings
    • Successful research:
      • Finds a problem
      • Finds a method that solves the problem
      • Using some principle (that can be generalized)
  • Gave Aaron M. a Subversion account and sent him a description of the structure of the project
  • Back to dictionary creation
    • Wire up Extract into Dictionary
      • I think I’m going to do most of this on the server. If I do a select text from tn_view_network_items where network = X, then I can run that text that is already in the DB through the term extractor, which should be the fastest thing I can do.
      • The next fastest thing would be to pull the text from the url (if it exists) and add that to the text pull.
      • Added a getTextFromNetwork() method to NetworkDbObject.
      • The HTML was getting extracted badly, so I had to add a call to Alchemy to return the cleaned text. TODO: in the future, add a ‘clean_text’ column to tn_items so this is done on ingestion.
      • Added all the pieces to the rssPull.php file and tested. And integrated with the client. Looks like it takes about 8 seconds to go through my resume, so some offline processing will probably be needed for ACM papers, for example.
    • Wire up Attach Dictionary to Network
      • The current setup is such that a new item that is read in will associate with the current network dictionary. Need to add a way to have the items that are already in the network check themselves against the new dictionary.
      • Added class AlchemyDictReflect that will place keywords in the DB. Still need to debug. And don’t forget that the controller will have to reload the network after all the changes are made.