Category Archives: mysql

Phil 12.1.15

7:30 – 5:00

  • Learning: Identification Trees, Disorder
    • Trees of tests
    • Identification Tree (Not a decision tree!)
    • Measuring Disorder – lowest disorder is best test
      • Disorder(set of binaries) = -(positive/total * log2(positive/total)) - (negative/total * log2(negative/total))
        • is the base log related to the base of the set?
        • Weight the disorder of each result (branch) of the test by the fraction of the samples that land in it, then add those up to get the disorder of the test. Lowest disorder is the winner.
  • Bringing in my machine learning, pattern recognition and stats books.
  • Bringing in my big laptop
  • Setting up dev environment.
    • Using the new IDEA 15.x, which seems to be OK for the typescript, will check PHP tomorrow.
    • Installed grunt (grunt-global, then grunt-local from the makefiles)
    • installed typescript (npm i -g typescript)
    • Installed gnuWin32, which has makefile and touch support, along with all the important DLLs. It turns out that there is also gnuWin64. Will use that next time.
    • Fixed bugs that didn’t get caught before. Older compiler?
      • commented out the waa.d.ts from the three.d.ts definitelytyped file
      • deleted the { antialias: boolean; alpha: boolean; } args from the CanvasRenderer call in classes/WebGlCanvasClasses
      • added title?:string and assoc_name?:string to IPostObject in RssController
      • had to add the experiments/wglcharts2 folder to the xampp Apache htdocs
      • added word?:string to IPostObj in RssAppDirectives
      • added word_type_name?:string to IPostObj in RssAppDirectives
      • fixed the font calls in WebGl3dCharts IComponentConfig.
    • Since these issues really shouldn’t have happened, I’m going to verify that they are not in my home dev environment before checking in.
  • And the new computer arrived, so I get to do some of the install tomorrow.
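
A rough sketch of the disorder measure from the identification tree notes above, written in TypeScript. The names `disorder`, `testDisorder` and `IBranch` are made up for illustration and are not from the lecture. Each branch of a test is scored with the two-class disorder formula, then the branches are combined, weighted by the fraction of samples in each.

```typescript
// Disorder of one branch holding `positive` and `negative` samples.
// Uses log base 2; a 0 * log2(0) term is treated as 0.
function disorder(positive: number, negative: number): number {
    var total = positive + negative;
    if (total === 0) { return 0; }
    var counts = [positive, negative];
    var sum = 0;
    for (var i = 0; i < counts.length; ++i) {
        var p = counts[i] / total;
        if (p > 0) {
            sum -= p * (Math.log(p) / Math.LN2);
        }
    }
    return sum;
}

// Hypothetical shape for the sample counts in one outcome of a test.
interface IBranch { positive: number; negative: number; }

// Disorder of a whole test: each branch's disorder weighted by the
// fraction of all samples that land in that branch.
function testDisorder(branches: IBranch[]): number {
    var totalSamples = 0;
    for (var i = 0; i < branches.length; ++i) {
        totalSamples += branches[i].positive + branches[i].negative;
    }
    if (totalSamples === 0) { return 0; }
    var result = 0;
    for (var j = 0; j < branches.length; ++j) {
        var b = branches[j];
        var weight = (b.positive + b.negative) / totalSamples;
        result += weight * disorder(b.positive, b.negative);
    }
    return result;
}
```

With base 2 the result is in bits, so an even positive/negative split scores 1 and a pure branch scores 0. A test that separates the samples perfectly therefore has disorder 0 and would win.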

Phil 11.26.15

7:00 – Leave

  • Constraints: Visual Object Recognition
    • To see if two signals match, a maximising function that integrates the area under the signal with respect to offsets (translation and rotation) is very good, even with noise.
  • Dictionary
    • Add ‘Help Choose Doctor’, ‘Help Choose Investments’, ‘Help Choose Healthcare Plan’, ‘Navigate News’ and ‘Help Find CHI Paper’ dictionaries. At this point they can be empty. We’ll talk about them in the paper.
    • Added ‘archive’ to dictionary, because we’ll need temporary dicts associated with users like networks.
    • Deploy new system. Done!
      • Reloaded the DB
      • Copied over the server code
      • Ran the simpleTests() for AlchemyDictText. That adds network[5] with tests against the words that are in my manual resume dictionary. Then network[2] is added with no dictionary.
      • Commented out simpleTests for AlchemyDictText
      • copied over all the new client code
      • Ran the client and verified that all the networks and dictionaries were there as they were supposed to be.
      • Loaded network[2] ‘Using extracted dict’
      • Selected the empty dictionary[2] ‘Phil’s extracted resume dict’
      • Ran Extract from Network, which is faster on Dreamhost! That populated the dictionary.
      • Deleted the entry for ‘3’
      • Ran Attach to Network. Also fast 🙂
  • And now time for ThanksGiving. On a really good note!
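
A minimal sketch of the signal-matching idea from the Visual Object Recognition note above, under some loud assumptions: the signals are 1-D sampled arrays, only translation is searched (no rotation), and the "area" is taken to be the sum of products of the two signals at a given offset. `correlationAt` and `bestOffset` are illustrative names.

```typescript
// Sum of products of `signal` and `template` with the template placed at `offset`.
function correlationAt(signal: number[], template: number[], offset: number): number {
    var sum = 0;
    for (var i = 0; i < template.length; ++i) {
        var j = i + offset;
        if (j >= 0 && j < signal.length) {
            sum += signal[j] * template[i];
        }
    }
    return sum;
}

// Offset (template fully inside the signal) that maximises the correlation.
function bestOffset(signal: number[], template: number[]): number {
    var best = 0;
    var bestScore = -Infinity;
    for (var offset = 0; offset + template.length <= signal.length; ++offset) {
        var score = correlationAt(signal, template, offset);
        if (score > bestScore) {
            bestScore = score;
            best = offset;
        }
    }
    return best;
}
```

For example, searching a slightly noisy signal `[0.1, -0.2, 0.1, 1, 2, 1, -0.1, 0.2]` for the template `[1, 2, 1]` picks the offset where the bump actually sits.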

AllWorking

Phil 11.25.15

7:00 – 1:00 Leave

  • Constraints: Search, Domain Reduction
    • Order from most constrained to least.
    • For a constrained problem, check over and under allocations to see where the gap between fast failure and fast completion lies.
    • Only recurse through neighbors where domain (choices) have been reduced to 1.
  • Dictionary
    • Add an optional ‘source_text’ field to the tn_dictionaries table so that user added words can be compared to the text. Done. There is the issue that the dictionary could be used against a different corpus, at which point this would be little more than a creation artifact
    • Add a ‘source_count’ to the tn_dictionary_entries table that is shown in the directive. Defaults to zero? Done. Same issue as above, when compared to a new corpus, do we recompute the counts?
    • Wire up Attach Dictionary to Network
      • Working on AlchemyDictReflect that will place keywords in the tn_items table and connect them in the tn_associations table.
      • Had to add a few helper methods in networkDbIo.php to handle the modifying of the network tables, since alchemyNLPbase doesn’t extend baseBdIo. Not the cleanest thing I’ve ever done, but not *horrible*.
      • Done and working! Need to deploy.

Phil 11.24.15

7:00 – Leave

  • Constraints: Interpreting Line Drawings
    • Successful research:
      • Finds a problem
      • Finds a method that solves the problem
      • Using some principle (that can be generalized)
  • Gave Aaron M. a Subversion account and sent him a description of the structure of the project
  • Back to dictionary creation
    • Wire up Extract into Dictionary
      • I think I’m going to do most of this on the server. If I do a select text from tn_view_network_items where network = X, then I can run that text that is already in the DB through the term extractor, which should be the fastest thing I can do.
      • The next fastest thing would be to pull the text from the url (if it exists) and add that to the text pull.
      • Added a getTextFromNetwork() method to NetworkDbObject.
      • The html was getting extracted badly, so I had to add a call to alchemy to return the cleaned text. TODO: in the future add a ‘clean_text’ column to tn_items so this is done on ingestion. I also added
      • Added all the pieces to the rssPull.php file and tested. And integrated with the client. Looks like it takes about 8 seconds to go through my resume, so some offline processing will probably be needed for ACM papers, for example.
    • Wire up Attach Dictionary to Network
      • The current setup is set so that a new item that is read in will associate with the current network dictionary. Need to add a way to have the items that are already in the network to check themselves against the new dictionary.
      • Added class AlchemyDictReflect that will place keywords in the DB. Still need to debug. And don’t forget that the controller will have to reload the network after all the changes are made.

 

Phil 10.30.15

8:00 – 4:00 SR

  • Working from home today, waiting for people to show up.
  • Here’s the fix for the Reqonciler issue:
    • Open Reqonciler in your browser.
    • click Post-Processing button to see all queries
    • Double click the one that you disabled this morning to edit it: Order 2100, update month 1 year 2 to 100% from month 12 year 1
    • Add " AND NOT ISNULL(bc.uid)" at the end of the query, without the double quotes. Make sure there is a space before it.
    • Save, run, and check the data
  • In the process of getting my home dev environment working again. I swear I should just do this once a week so it’s less stressful.
    • Fixed the Imagick load so that there is a test for the extension and whether the extension is installed correctly.
    • Disabled the world wide web service so that apache could run on port 80
    • Updated all the files in the Apache htdocs directory. Forgot that I had updated the server access methods to take an object.
    • It occurs to me that I can load up the DB directly on the server if I don’t get everything done with the dictionary by Wednesday.
  • Examine AlchemyNLP and see if there is a hierarchy that can be used. Not without a lot of work.
  • Buy and download the fivefilters term extractor and see how to integrate.
    • Ordered. Waiting for confirmation to show up.
    • Installed. Time to see if it’ll work. It looks good, though possibly slow? starting to put together a dictionary class to examine more deeply.
  • Add dictionary Flyout directive
    • Name the dictionary
    • Choose the networks (add/remove from list)
    • Input html, text or url
    • Get the clean text and show the machine extracted terms. We could look up potential definitions too – from wordnik. Set up an account and applied for a developer key.
    • Show a list of selected terms with checkboxes
      • Checked items can be deleted or grouped
      • Items can be added by typing into a field
    • Show a list of ‘group items’. This displays a list of the items whose index appears in the ‘parent’ field
      • Selecting an item in this list reorders the item list to show the appropriate group first
  • There should also be a select dictionary option on the network flyout

Phil 10.29.15

8:00 – 4:30 SR

  • Sent Dong screenshots of the issue. He’s checking queries and code now.
  • Added simpleTests($dbObj) to each class in AlchemyNLP
  • Added ‘skill’, ‘capability’ and ‘task’ as parents in the dictionary
  • Add flyout directive to create and assign dictionaries and entries.
  • Set the dictionary to zero in the networkDbIo.addNetwork()  PHP code and add the dict_id to the typescript interface. Done
  • Make sure that an association between a keyword and another item is always from the keyword. Otherwise PageRank won’t calculate correctly. Done.
  • Chain up the dictionary and add parent keywords to the network (parents point to children). That way, for example, all ‘skills’ can be elevated, while all ‘tasks’ can be suppressed. Done
  • Changed keywords to be ‘editable’ so they have adjustable link weights. It does make the keywords in the network editable as well. May need to just add a slider to ITEMS of certain types. Still need to think about this…
  • Next step is to buy and download the fivefilters term extractor and see how to integrate?
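
A small sketch of what "chaining up" the dictionary could look like, in TypeScript rather than the PHP the server actually uses. The `IDictEntry` shape, the convention that `parent` of 0 means "no parent", and the `parentChain` function are all assumptions for illustration. It walks from an entry up through its parents, so that, for example, everything under ‘skill’ can be found and elevated together.

```typescript
// Hypothetical dictionary row: `parent` is the id of the parent entry, 0 for none.
interface IDictEntry { id: number; word: string; parent: number; }

// Returns the words of all ancestors of the entry with id `startId`,
// nearest parent first. Stops on a missing parent or a cycle.
function parentChain(entries: IDictEntry[], startId: number): string[] {
    var byId: { [id: number]: IDictEntry } = {};
    for (var i = 0; i < entries.length; ++i) {
        byId[entries[i].id] = entries[i];
    }
    var chain: string[] = [];
    var seen: { [id: number]: boolean } = {};
    seen[startId] = true;
    var current = byId[startId];
    while (current && current.parent !== 0 && !seen[current.parent]) {
        seen[current.parent] = true;
        var parentEntry = byId[current.parent];
        if (!parentEntry) { break; }
        chain.push(parentEntry.word);
        current = parentEntry;
    }
    return chain;
}
```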

Phil 10.28.15

8:00 – 5:00 SR

  • Walked through the FA bug with Dong on the phone. Took some screenshots that I will send over tonight.
  • Add a DictionaryText class that uses a passed-in tag list to determine what items to create associations to. Low edit-distance matches get added to the item. Possibly the keyword list can be hierarchical?
  • Add a tn_dictionary table with fields for word, type (optional), description (optional), server_code (optional), parent (optional), and user_id. Multiple users can have different versions of the same word. When a new word is entered, the content of the network is rescanned and items that contain the keyword link to it. We will need to know which definition is being used in the network, since it will point to the master item. – Done, except for the last part
    • The server_code field would include scripts/regexes or something similar that could do special text scanning. This would require the use of eval, for example. In the db, but not used.
  • So now, when an external query is made, only items from the result that contain words in the dictionary will be added to the network. Done and working in the DB and PHP!
  • There should also be a ‘resubmit’ button that looks for new material while running the stored queries. TODO
  • It’s possible to use NLP, particularly five filter’s, to create a strawman dictionary as a starting point. TODO
  • Meeting with Dr. Pan
    • There are different contexts that a keyword dictionary needs to be aware of. Resumes have skills, tasks and achievements. Scientific papers have contributions and methods, financial data has budget centers, companies, clients, invoices, etc.
    • Phrases add specificity, single words can be very noisy.
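
A hedged sketch of the "low edit-distance matches" idea from the DictionaryText bullet above. It is in TypeScript for illustration (the real class is server-side), uses plain Levenshtein distance as an assumed metric, and only handles single words, not phrases. `editDistance`, `matchDictionary` and the `maxDistance` threshold are all hypothetical.

```typescript
// Levenshtein edit distance: insertions, deletions and substitutions each cost 1.
function editDistance(a: string, b: string): number {
    var prev: number[] = [];
    for (var j = 0; j <= b.length; ++j) { prev.push(j); }
    for (var i = 1; i <= a.length; ++i) {
        var curr: number[] = [i];
        for (var k = 1; k <= b.length; ++k) {
            var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
            curr.push(Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost));
        }
        prev = curr;
    }
    return prev[b.length];
}

// Returns the dictionary words that are within `maxDistance` edits of
// at least one word in `text` (case-insensitive).
function matchDictionary(text: string, dictionary: string[], maxDistance: number): string[] {
    var words = text.toLowerCase().split(/[^a-z0-9]+/);
    var matches: string[] = [];
    for (var d = 0; d < dictionary.length; ++d) {
        var entry = dictionary[d].toLowerCase();
        for (var w = 0; w < words.length; ++w) {
            if (words[w].length > 0 && editDistance(words[w], entry) <= maxDistance) {
                matches.push(dictionary[d]);
                break;
            }
        }
    }
    return matches;
}
```

An item whose text contains a near-miss like "TypeScrpt" would still associate with a "typescript" dictionary entry when `maxDistance` is 1. Short words need a tighter threshold, since one edit can turn one short word into another.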

Phil 10.19.15

8:00 – 12:00 SR

  • Add the check in the flyout directive that looks at the user_id and read_only coming back. If the user_id is not the current user_id from the session object, force the network to be read_only regardless of its listing in the db
    • Populated network 3 for user_id = 3. Done, and verified on the server.
  • Need to change the QueryHistory list to Visible Ranks.
  • Try running a pdf through alchemy. It returns a ‘non-html’ error. However, I can parse the pdf in PHP and then send the text off to Alchemy for analysis
    • I think though, that the way to do this is to add a ‘manual’ item. This would add the following fields(?):
      • Text_content – cut and paste of the text that matters
      • Authors (comma separated authors – add validation and parsing)
      • The rest of the items could be the RSS2.0 spec, which makes sense anyway.
      • This would require a new button and a new directive. On ‘save’, the text_content gets sent to alchemy for the creation of the keyword(?) network. The other items (title, author, etc) get added to the network explicitly.
      • This does mean that when an item is added to the network, there are other ‘items’ like author that should be attached automatically.
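
The read_only check from the first bullet of this entry could look roughly like the sketch below. The `user_id` and `read_only` field names come from the notes; the `INetworkInfo` interface and the `isEffectivelyReadOnly` function are invented here for illustration and are not the actual directive code.

```typescript
// Hypothetical subset of the network record that comes back from the server.
interface INetworkInfo { user_id: number; read_only: boolean; }

// A network owned by someone else is always treated as read-only,
// regardless of what the db says; otherwise the stored flag decides.
function isEffectivelyReadOnly(network: INetworkInfo, sessionUserId: number): boolean {
    if (network.user_id !== sessionUserId) {
        return true;
    }
    return network.read_only;
}
```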

Phil 10.16.15

8:00 – 4:00 SR

  • I have my access back!
  • More justification for dev machine
  • Updated truancy reports.
  • Add ‘read only’ to network and use it to disable buttons on the GUI. It should have a checkbox on the flyout next to private.
    • Added. Had to add a read_only field to tn_networks.
    • Next, make the various buttons that cause writes to the DB to be conditional. Done
    • Updated the public server.
    • Changed findNetworks() so $queryString .= "select * from tn_networks where user_id = :user_id or is_private = 0"; Now I need to verify that I’ve reconciled this in the Flyout Directive. Need to make a new network for that. Monday.

Phil 10.14.15

8:00 – 5:00 SR

  • Got an error message after trying to explore a URL (http://www.sportsgrid.com/mlb/back-to-the-future-part-ii-predicted-cubs-as-2015-world-series-champions/):
    • Error: itemTableArray is undefined
      RssController</RssController.prototype.parseDataObject@http://philfeldman.com/WglCharts2/controllers/rssControllerTS.js:195:29
    • Need to flag that so that it shows up as an error message in the feed rather than having the ‘waiting’ message appear
    • Really need to re-weight the Alchemy items. Changed to 0.5 for now. Still a hack
    • Adding behaviors to support the changes I made to the GUI.
      • Unlink (from Wampeter)
        • GUI – done. Lots of oddness with the <select> directive.
          • Fixed the blank first selection by setting the initial values in the guiVars object (I’m using to keep things clean) to null.
          • The ng-model is being set to a JSON string. So I had to get the object pointer from the list used to populate the select. In this case I used the guid. What a hack!
        • DB – Done. Had to add a addDirectedAssociation()
      • Link to wampeter
        • GUI – Done
        • DB – TODO
      • Update wampeter value change DB (means updating association weights) – TODO
      • Update Rating value change DB – TODO

Phil 10.13.15

8:00 – 5:00 SR

  • Sent the (hopefully) fixed FA and RA to Bill
  • Think I got my accesses fixed. Again
  • Adding behaviors to support the changes I made to the GUI yesterday.
    • Unlink (from Wampeter)
      • GUI – done. Lots of oddness with the <select> directive.
        • Fixed the blank first selection by setting the initial values in the guiVars object (I’m using to keep things clean) to null.
        • The ng-model is being set to a JSON string. So I had to get the object pointer from the list used to populate the select. In this case I used the guid. What a hack!
      • DB – TODO
    • Link to wampeter
      • GUI – TODO
      • DB – TODO
    • Update wampeter DB – TODO
    • Update Rating DB – TODO

Phil 10.7.15

8:00 – 6:00 SR

  • Gina’s still having problems logging on
  • Write up the DB and programming test for Lenny, since the next interview(?) won’t have me along
  • Banged away at PageRank and finally have it working. The matrices were evaporating, so I wound up layering inbound and outbound links over an identity matrix. That worked great. I need to think about why though.
  • Meeting with Dr. Pan and Dr. Lutters at 4:00(?). Went well.
    • One thing that came up in the discussion was how to feed in enough information to make the analytics work. Dr. Pan suggested using Politifact, since it’s well formatted ground truth that references a source. In essence, I could build a small java program that could iterate over all the PF reports and build a network for each of the Entities (or concepts) referenced. One nice side effect is that this could be a ‘seed set’ of vetted data that could be used to inform other searches.
    • The other item discussed was what kind of view this is. Dr. Lutters suggested that this might be the analyst’s (development) view, while a consumer’s view would be more like a traditional news feed, like inkl.com
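
One possible explanation for why layering the links over an identity matrix stopped the PageRank values from "evaporating", offered as a guess rather than a diagnosis of the actual bug: a node with no outbound links gives an all-zero column, so whatever rank reaches it is dropped on the next step. Adding the identity gives every node a self-link, so every column can be normalized to sum to 1 and the total rank is conserved. The TypeScript below is a sketch with made-up names (`normalizeColumns`, `withIdentity`, `rankStep`, `rankTotal`), where `m[i][j]` is assumed to hold the weight of the link from node j to node i.

```typescript
// Scale each column so it sums to 1; all-zero columns are left as zeros.
function normalizeColumns(m: number[][]): number[][] {
    var n = m.length;
    var out: number[][] = [];
    for (var r = 0; r < n; ++r) { out.push(m[r].slice()); }
    for (var c = 0; c < n; ++c) {
        var sum = 0;
        for (var r2 = 0; r2 < n; ++r2) { sum += m[r2][c]; }
        if (sum > 0) {
            for (var r3 = 0; r3 < n; ++r3) { out[r3][c] = m[r3][c] / sum; }
        }
    }
    return out;
}

// Layer the link matrix over an identity matrix (adds a self-link to every node).
function withIdentity(m: number[][]): number[][] {
    var out: number[][] = [];
    for (var r = 0; r < m.length; ++r) {
        out.push(m[r].slice());
        out[r][r] += 1;
    }
    return out;
}

// One power-iteration step: next = m * rank.
function rankStep(m: number[][], rank: number[]): number[] {
    var next: number[] = [];
    for (var r = 0; r < m.length; ++r) {
        var acc = 0;
        for (var c = 0; c < rank.length; ++c) { acc += m[r][c] * rank[c]; }
        next.push(acc);
    }
    return next;
}

// Total rank in the vector.
function rankTotal(v: number[]): number {
    var sum = 0;
    for (var i = 0; i < v.length; ++i) { sum += v[i]; }
    return sum;
}
```

For a three-node chain 0 → 1 → 2, where node 2 has no outbound links, one step with the plain normalized matrix leaves only 2/3 of the starting rank, while the identity-layered version keeps the total at 1.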

Phil 10.5.15

8:00 – 4:00 SR
  • Change the divisor in the PageRank class to be a scalar.
  • Change the ‘History’ list to show the ranked items. Either all, or individually. (tab view)?
  • Add sliders to the rating view in the feed. Will also need a save with associated PHP code
  • Add update rating PHP
  • Add update linkSelected PHP
  • Show the changes in the list of items as the sliders are adjusted.
  • add an ‘annotation’ field to the rating? Better tracking?

Phil 10.2.15

8:00 – 5:00 SR

  • So far today, the roof is leaking, I’ve lost my badge, and the espresso machine is fried. Perfect Friday.
  • Still wondering where to calculate pagerank. Here’s the algorithm in JavaScript: https://github.com/stevemacn/PageRank/blob/master/lib/pagerank.js
  • Looks like there isn’t a TypeScript version. I think I’ll start with the client version. That should allow for more interactivity. Maybe just calculate for all users on the server?
  • Implemented a PageRank algorithm from first principles. First, the versions that are out there are buggy. Second, because they use variable length arrays of links, you can’t put weights. The downside is that there is a performance hit, but on small networks that shouldn’t(?) be a problem.
  • One easy performance mod is to change the normalization process to first calculate the fraction, then create a multiplier of 1/that, so we’re multiplying rather than dividing.
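
A minimal sketch of a weighted PageRank over a dense matrix, to show the two ideas in the notes above: every link can carry a weight because the links live in a full matrix, and normalization computes a reciprocal once and then multiplies. This is not the code from the project. The `pageRank` and `normalizeRank` names, the `damping` parameter (the usual textbook value is 0.85) and the convention that `weights[i][j]` is the weight of the link from node j to node i are all assumptions.

```typescript
// Scale a vector so its entries sum to 1, computing the reciprocal once.
function normalizeRank(v: number[]): number[] {
    var sum = 0;
    for (var i = 0; i < v.length; ++i) { sum += v[i]; }
    if (sum === 0) { return v.slice(); }
    var scale = 1 / sum; // multiply by this rather than dividing each entry
    var out: number[] = [];
    for (var j = 0; j < v.length; ++j) { out.push(v[j] * scale); }
    return out;
}

// Weighted PageRank by power iteration over a dense n x n matrix.
function pageRank(weights: number[][], damping: number, iterations: number): number[] {
    var n = weights.length;
    // Reciprocal of each node's total outbound weight (0 if it has none).
    var colScale: number[] = [];
    for (var c = 0; c < n; ++c) {
        var colSum = 0;
        for (var r = 0; r < n; ++r) { colSum += weights[r][c]; }
        colScale.push(colSum > 0 ? 1 / colSum : 0);
    }
    var rank: number[] = [];
    for (var i = 0; i < n; ++i) { rank.push(1 / n); }
    for (var it = 0; it < iterations; ++it) {
        var next: number[] = [];
        for (var row = 0; row < n; ++row) {
            var acc = 0;
            for (var col = 0; col < n; ++col) {
                acc += weights[row][col] * colScale[col] * rank[col];
            }
            next.push((1 - damping) / n + damping * acc);
        }
        rank = normalizeRank(next);
    }
    return rank;
}
```

The dense matrix costs O(n²) per iteration, which matches the performance hit mentioned above and should be tolerable for small networks.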

Phil 10.14.15

8:00 – 4:00 SR

  • Updated truancy report
  • Seminar today 11:30 – 12:30
  • Automated culling. I think I’m going to start with the cull on the canvas, rather than the server. This way I can still have the items in the list, but not shown on the network. This implies for an item to be visible (and for physics to operate on it) the item should be the source and/or target of one or more associations. DONE!
    • Tried to be fancy by just using the target array but I need a ‘source’ listing of IComponents.
    • This is what I needed to do, basically. The physics only require checks for visibility:
      var tarray:ITarget[] = this.getTargetArray();
      var sarray:IComponentBase[] = this.getSourceArray();
      var numLinks = tarray.length + sarray.length;
      var makeVisible = false; // or maybe we're also visible when selected?
      if(numLinks > 1){
          makeVisible = true;
      }
      this.setIsVisible(makeVisible);
    • I also added a ‘forceVisibility’ argument to the config object so that items like Queries always show up.
    • And, because it was suddenly easy, I added the list of sources to the item selection code. So now we ‘shift select’ targets and sources, which feels intuitively right:
      tarray = componentBase.getTargetArray();
      for(i = 0; i < tarray.length; ++i){
          targ = tarray[i].target;
          if(!this.isModelSelected(targ)){
              this.selectedModels.push(targ);
              targ.highlight();
          }
      }
      sarray = componentBase.getSourceArray();
      for(i = 0; i < sarray.length; ++i){
          targ = sarray[i];
          if(!this.isModelSelected(targ)){
              this.selectedModels.push(targ);
              targ.highlight();
          }
      }
  • Rating. Need a new modal directive. There should also be a way of having additional information about an item, like an annotation. For example, in the query I’m currently working on, “Horse and Jockey”, There was a horse “Tirpitz”, which is also the name of a battleship. There should be a way to resolve this ambiguity.
  • Pagerank. “Links” are determined by shared associations.  TODO
  • Explicit associations TODO
  • Today’s Progress: Weeds