Category Archives: Derived DB

Phil 12.8.15

7:00 – 4:30 VTX

Learning: Sparse Spaces, Phonology
- Structure and Interpretation of Computer Programs – (constraints/propagators)
- Pick a positive example to start learning (seed)
- Generalize by matching the minimum attributes that allow the difference to be observed.
- Points in a high-dimensional sparse space are more easily separated with a hyperplane.
  - Sparse Representations for Fast, One-Shot Learning
- Artificial Intelligence–A Personal View (Marr’s catechism)
  - A good representation makes the right things explicit and exposes constraints. A representation that allows for local rather than global identification is better.
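The seed-and-generalize bullets can be sketched in the style of the classic Find-S algorithm: start from the first positive example as the most specific hypothesis, then relax only the attributes that disagree with later positive examples. This is a minimal sketch, not necessarily the procedure from the lecture. The attribute names and the `?` wildcard convention are hypothetical, and it ignores negative examples and near misses entirely.

```python
from typing import Dict, List

WILDCARD = "?"  # hypothetical marker for "any value is acceptable"


def generalize(hypothesis: Dict[str, str], example: Dict[str, str]) -> Dict[str, str]:
    """Relax only the attributes that disagree with a new positive example."""
    return {
        attr: value if example.get(attr) == value else WILDCARD
        for attr, value in hypothesis.items()
    }


def learn_from_positives(positives: List[Dict[str, str]]) -> Dict[str, str]:
    # The first positive example is the seed: the most specific hypothesis.
    hypothesis = dict(positives[0])
    for example in positives[1:]:
        hypothesis = generalize(hypothesis, example)
    return hypothesis


def matches(hypothesis: Dict[str, str], example: Dict[str, str]) -> bool:
    """An example matches if every non-wildcard attribute agrees."""
    return all(v == WILDCARD or example.get(a) == v for a, v in hypothesis.items())


positives = [
    {"shape": "arch", "top": "brick", "support": "column"},
    {"shape": "arch", "top": "wedge", "support": "column"},
]
h = learn_from_positives(positives)
print(h)  # {'shape': 'arch', 'top': '?', 'support': 'column'}
```

Only `top` differs between the two positives, so it is the only attribute generalized away; everything else stays as specific as the seed.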
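A quick way to see the sparse-space bullet: random sparse binary vectors in a high-dimensional space, even with random labels, are essentially always linearly separable when there are fewer points than dimensions (the intuition behind Cover's theorem). The sketch below trains a plain perceptron through the origin on vectors stored as sets of active indices. The sizes (2,000 dimensions, 40 active bits, 30 points) are arbitrary assumptions, and this is not the method from the cited paper.

```python
import random
from typing import Dict, List, Set


def sparse_dot(weights: Dict[int, float], active: Set[int]) -> float:
    # A binary sparse vector is stored as the set of its nonzero (value 1) indices.
    return sum(weights.get(i, 0.0) for i in active)


def train_perceptron(points: List[Set[int]], labels: List[int],
                     max_epochs: int = 1000) -> Dict[int, float]:
    weights: Dict[int, float] = {}
    for _ in range(max_epochs):
        mistakes = 0
        for active, y in zip(points, labels):
            if y * sparse_dot(weights, active) <= 0:
                # Standard perceptron update: w += y * x, touching only active indices.
                for i in active:
                    weights[i] = weights.get(i, 0.0) + y
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side of the hyperplane
            break
    return weights


def accuracy(weights: Dict[int, float], points: List[Set[int]], labels: List[int]) -> float:
    correct = sum(1 for active, y in zip(points, labels) if y * sparse_dot(weights, active) > 0)
    return correct / len(points)


rng = random.Random(0)
DIM, ACTIVE, N = 2000, 40, 30
points = [set(rng.sample(range(DIM), ACTIVE)) for _ in range(N)]
labels = [rng.choice([-1, 1]) for _ in range(N)]  # random labels: no real structure
w = train_perceptron(points, labels)
print(accuracy(w, points, labels))
```

Because two random 40-of-2,000 vectors share very few active bits, each update barely disturbs the other points, which is why the labels can be fit even though they carry no pattern.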
Sprint end scrum
Spent the rest of the day getting my new machine running. Almost there!

Phil 12.4.15

8:00 – VTX

Scrum
Found an interesting tidbit in the WaPo this morning. It implies that a pattern of a statement, followed by a search for confirming information, followed by a public citation of that confirming information, could be the basic unit of an information bubble. For this to be a bubble, I think the pertinent information extracted from the relevant search results would have to be identifiable as a minority view. Could this be done by comparing the Jaccard index of the adjusted results with the raw search returns? In other words, if the world (the relevant search) has an overall vector pointing in one direction and the individual preferences produce a pertinent result pointing in the opposite direction (a large negative dot product), is it more likely that those results come from echo-chamber processes?
If the Derived DB depends on analyst examination of the data, this could be a way of flagging analyst bias.
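The bubble heuristic can be sketched with two measures: Jaccard overlap between the raw search returns and the results actually cited, and cosine similarity between a vector summarizing the raw results ("the world") and one summarizing the cited results. Vectors pointing in opposite directions have a negative cosine. The vectors, the 0.3 overlap threshold, and `flag_possible_bubble` are hypothetical placeholders, not a validated method.

```python
import math
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union; 1.0 if both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_possible_bubble(raw_results: Set[str], cited_results: Set[str],
                         world_vector: Sequence[float], cited_vector: Sequence[float],
                         max_overlap: float = 0.3, max_cosine: float = 0.0) -> bool:
    """Hypothetical rule: little overlap AND the cited results point away from the world."""
    return (jaccard(raw_results, cited_results) <= max_overlap
            and cosine(world_vector, cited_vector) < max_cosine)


raw = {"a", "b", "c", "d", "e"}
cited = {"e", "x"}
print(jaccard(raw, cited))
print(flag_possible_bubble(raw, cited, [0.8, -0.2], [-0.7, 0.3]))
```

A small cited subset has low overlap with the raw returns even without any bias, so the overlap test alone is weak; the direction test does most of the work here.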
Researching WebScaleSQL, I stumbled on another db from Facebook. This one, RocksDB, is more focused on speed. From the splash page:
- RocksDB can be used by applications that need low latency database accesses. A user-facing application that stores the viewing history and state of users of a website can potentially store this content on RocksDB. A spam detection application that needs fast access to big data sets can use RocksDB. A graph-search query that needs to scan a data set in realtime can use RocksDB. RocksDB can be used to cache data from Hadoop, thereby allowing applications to query Hadoop data in realtime. A message-queue that supports a high number of inserts and deletes can use RocksDB.
Interestingly, RocksDB appears to have integration with MongoDB and is working on MySQL integration. Cassandra appears to be implementing similar optimizations.
Just discovered reported.ly, a social-media-sourced, reporter-curated news stream. It could be a good source of data to compare against news feeds from Google or major news outlets.
Control System Meeting
- Send RCS and Search Competition to Bob
- Seems like this whole system is a lot like what Databricks is doing?

Phil 12.3.15

7:00 – 5:00 VTX

Learning: Genetic Algorithms
- Rank space (selection probability is based on each candidate's rank after sorting by fitness, not on the raw fitness values??)
- Simulated annealing – reducing step size.
- Diversity rank (from the previous generation) plus fitness rank
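Rank-space selection, as it is commonly described, sorts candidates by fitness and gives the best one probability p, the next p(1 - p), then p(1 - p)^2, and so on, with the last candidate taking whatever probability is left. The selection probability then depends only on sorted order, not on the raw fitness magnitudes. A minimal sketch, with p = 0.5 as an arbitrary default:

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def rank_space_probabilities(n: int, p: float) -> List[float]:
    """Probability by rank (0 = fittest): p, p(1-p), p(1-p)^2, ...; the last gets the rest."""
    probs = [p * (1 - p) ** i for i in range(n - 1)]
    probs.append((1 - p) ** (n - 1))  # remaining mass, so the list sums to 1
    return probs


def rank_space_select(candidates: Sequence[T], fitness: Callable[[T], float],
                      p: float = 0.5, rng=random) -> T:
    ranked = sorted(candidates, key=fitness, reverse=True)
    weights = rank_space_probabilities(len(ranked), p)
    return rng.choices(ranked, weights=weights, k=1)[0]


print(rank_space_probabilities(4, 0.5))  # [0.5, 0.25, 0.125, 0.125]
```

Replacing fitness values of 1, 2, 3 with 1, 2, 1000 changes nothing here, which is the point of ranking: one outlier can't dominate selection.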
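The annealing bullet can be sketched as a one-dimensional maximizer whose step size shrinks geometrically each iteration. The note only mentions step size; this sketch also adds the usual simulated-annealing rule of sometimes accepting a worse move with probability exp(delta / T), with the temperature T decaying alongside the step. The decay rate, starting values, and test function are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, Tuple


def anneal(f: Callable[[float], float], x0: float, step: float = 2.0,
           temperature: float = 1.0, decay: float = 0.97,
           iterations: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Maximize f; both the step size and the temperature shrink every iteration."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(iterations):
        candidate = x + rng.uniform(-step, step)
        fc = f(candidate)
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if fc >= fx or rng.random() < math.exp((fc - fx) / temperature):
            x, fx = candidate, fc
            if fx > best_f:
                best_x, best_f = x, fx
        step *= decay         # smaller jumps as the search settles
        temperature *= decay  # fewer downhill moves accepted over time
    return best_x, best_f


best_x, best_f = anneal(lambda x: -(x - 3.0) ** 2, x0=-5.0)
print(best_x, best_f)
```

Large early steps let the search cross the landscape quickly; the shrinking step then lets it refine near the peak at x = 3 instead of bouncing past it.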
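Combining diversity rank with fitness rank can be sketched as greedy survivor selection: take the fittest candidate first, then repeatedly pick the candidate with the lowest sum of fitness rank and diversity rank, where diversity is the distance to the survivors already chosen. That definition of diversity is an assumption; "from the previous generation" could instead mean distance to the previous population, which would only change how `div` is computed. The candidates are hypothetical (position, fitness) pairs.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[float, float]  # hypothetical (position, fitness) pair


def rank_of(values: Sequence[float]) -> List[int]:
    """Rank 0 goes to the largest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def select_diverse(candidates: Sequence[Candidate],
                   fitness: Callable[[Candidate], float],
                   distance: Callable[[Candidate, Candidate], float],
                   k: int) -> List[Candidate]:
    pool = list(candidates)
    chosen: List[Candidate] = []
    while pool and len(chosen) < k:
        fit = [fitness(c) for c in pool]
        if not chosen:
            # Nothing to be diverse from yet: take the fittest.
            best = max(range(len(pool)), key=lambda i: fit[i])
        else:
            # Diversity = distance to the nearest survivor already chosen.
            div = [min(distance(c, s) for s in chosen) for c in pool]
            fit_rank, div_rank = rank_of(fit), rank_of(div)
            # Lowest combined rank wins; ties go to the fitter candidate.
            best = min(range(len(pool)), key=lambda i: (fit_rank[i] + div_rank[i], fit_rank[i]))
        chosen.append(pool.pop(best))
    return chosen


population = [(0.0, 10.0), (1.0, 9.0), (2.0, 8.8), (10.0, 8.9)]
survivors = select_diverse(population, fitness=lambda c: c[1],
                           distance=lambda u, v: abs(u[0] - v[0]), k=2)
print(survivors)  # [(0.0, 10.0), (10.0, 8.9)]
```

With pure fitness ranking the top two would be (0.0, 10.0) and (1.0, 9.0); the diversity term trades a little fitness for the distant (10.0, 8.9).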
Some more timing results. The view test (select count(*) from tn_view_network_items where network_id = 1) on the small network_1 takes about as long as the same query on the large network_8, roughly 0.7-0.9 sec. The pull from the association table without the view is very fast: 0.01 sec for both network_1 and network_8. Since 10,000 rows come back in about 0.01 sec, a 1,000,000-item pull should take roughly 1-2 seconds.

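mysql> select count(*) from tn_associations where network_id = 1;
+----------+
| count(*) |
+----------+
|       11 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from tn_associations where network_id = 8;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from tn_view_network_items where network_id = 8;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.88 sec)

mysql> select count(*) from tn_view_network_items where network_id = 1;
+----------+
| count(*) |
+----------+
|       11 |
+----------+
1 row in set (0.71 sec)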
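The extrapolation from these timings can be checked with a couple of lines of arithmetic. This sketch assumes query time is linear in row count; two points can't confirm that, and the client only reports to the nearest 0.01 sec, so the association-table figure sits at the timer's floor. The second function fits `fixed + per_row * n` to the two view timings, which is one way to read the view's nearly flat cost as overhead that doesn't shrink with the filter. Function names are hypothetical.

```python
from typing import Tuple


def scale_proportionally(t_measured: float, n_measured: int, n_target: int) -> float:
    """Assume time grows linearly with row count and has no fixed overhead."""
    return t_measured * n_target / n_measured


def fit_fixed_plus_per_row(n1: int, t1: float, n2: int, t2: float) -> Tuple[float, float]:
    """Two-point fit of t = fixed + per_row * n."""
    per_row = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_row * n1
    return fixed, per_row


# Association table: 10,000 rows in 0.01 sec (the client's reporting resolution).
assoc_estimate = scale_proportionally(0.01, 10_000, 1_000_000)

# View: 11 rows in 0.71 sec, 10,000 rows in 0.88 sec.
fixed, per_row = fit_fixed_plus_per_row(11, 0.71, 10_000, 0.88)
view_estimate = fixed + per_row * 1_000_000

print(f"association table, 1M rows: ~{assoc_estimate:.1f} sec")
print(f"view: fixed ~{fixed:.2f} sec, 1M rows ~{view_estimate:.0f} sec")
```

If the linear model holds, the view's large fixed cost dominates small networks while its per-row cost would dominate at a million rows, so the direct association-table pull looks like the safer path at scale.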

Field trip to Wall, NJ
- Learned more about the project, started to put faces to names
- Continued to look at DB engines for the Derived DB. Discovered WebScaleSQL, a collaboration between Alibaba, Facebook, Google, LinkedIn, and Twitter to produce a big(!!) version of MySQL.
- More discussions with Aaron D. about control systems, which means I’m going to be leaning on my NIST work again.

viztales

Dimension reduction, State, Orientation, and Speed