
Phil 1.14.16

7:00 – 4:00 VTX

  • Good Meeting with Thom Lieb
    • Here’s a good checklist for reporting on different types of stories: http://www.sbcc.edu/journalism/manual/checklist/index.php
    • Ordered Melvin Mencher’s News Reporting and Writing
    • Discussed Chatbots, fashion in technology, and NewsTrust, a fact-checking site that piloted out of Baltimore in 2011. This post explains why it wound up folding. Important note: Tie into social media for inputs and outputs!!!
  • Added Communication Power and Counter-power in the Network Society to the corpus
  • Manuel Castells is the author of the above. Really clear thinking. Added another paper, The Future of Journalism: Networked Journalism
  • Had an interesting chat with an ex-cop about trustworthiness. He’s a fan of the Reid Technique and had a bunch of perspectives that I hadn’t considered. Looking for applications to text, I came across this, which looks potentially relevant: Eliciting Information and Detecting Lies in Intelligence Interviewing: An Overview Of Recent Research
  • Todd Schneider analyzes big data in interesting posts on his blog.
  • Chapter 7 Using Queries
    • JPQL
    • Totally digging the @NamedQuery annotation.
    • How to paginate a result:
      int pageSize = 15;
      int maxPages = 10;
      for(int curPage = 0; curPage < maxPages; ++curPage){
          List l = nt.runRawPagedQuery(GuidBase.class, curPage, pageSize, "SELECT gb.id, gb.name, gb.guid FROM guid_base gb");
          if(l == null || l.size() == 0){
              break;
          }else{
              System.out.println("Batch ["+curPage+"]");
              nt.printListContents(l);
          }
          System.out.println();
      }
    • Stopping at Queries and Uncommitted Changes, in case my computer is rebooted under me tonight.
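The @NamedQuery annotation and the paging loop above can also be combined using only the standard JPA API. A sketch, not the helper used above (the query name is made up, and em is assumed to be an EntityManager):

```java
// Declared on the entity:
@Entity
@NamedQuery(name = "GuidBase.findAll",
            query = "SELECT gb FROM GuidBase gb ORDER BY gb.id")
public class GuidBase { /* ... */ }

// Executed with the standard paging calls instead of a raw query helper:
List<GuidBase> page = em.createNamedQuery("GuidBase.findAll", GuidBase.class)
        .setFirstResult(curPage * pageSize) // row offset for this page
        .setMaxResults(pageSize)            // rows per page
        .getResultList();
```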

Phil 1.13.16

7:00 – 3:00 VTX

  • More document coding
  • Review today?
  • On to Chapter 6.
  • Thinking about next steps.
    • Server
      • Produce a dictionary from a combination of manual entry and corpus extraction
      • Add word-specific code like stemming, edit distance
      • Look into synonyms. They are dictionary specific (Java as in drink, Java as in Language, Java as in island)
      • Analyze documents using the dictionary to produce the master network of items and associations. This resides on the server. I think this negates the need for flags, since the Eigenrank of the doctor will be explained by the associations, and the network can be interrogated by looking for explanatory items within some number of hops. The dictionary entry that was used to extract an item is also added to the network as an item
        • PractitionerDictionary finds medical practitioners <membership roles?>. Providers are added to the item table and to the master network
          • Each practitioner is checked for associations like office, hospital, specialty. New items are created as needed and associations are created
        • LegalDictionary finds (disputes and findings?) in legal proceedings, and adds legal items that are associated with items currently in the network. Items that are associated with GUILTY get low (negative?) weight. A directly attributable malpractice conviction should be a marker that is always relevant. Maybe a reference to it is part of the practitioner record directly?
        • SocialDictionary finds rating items from places like Yelp. High ratings provide higher weight, low ratings produce lower weight. The weight of a rating shouldn’t be more important than a conviction, but a lot of ratings should have a cumulative effect.
        • Other dictionaries? Healthcare providers? Diseases? Medical Schools?
        • Link age. Should link weight move back to the default state as a function of time?
        • Matrix calculation. I think we calculate the rank of all items and their adjacency once per day. Queries are run against the matrix.
    • Client
        • Corporate
          • The user is presented with a dashboard ordered by pre-specified criteria (“show new bad practitioners?”). This is calculated by the server walking the eigenrank from the top, looking for N items that contain text/tags that match the query (high Jaccard index?). It returns the set to eliminate duplication. The dictionary entries that were associated with the creation of the item are also returned.
        • Consumer
          • The user types in a search: “cancer specialist maryland carefirst”
          • The search walks the eigenrank from the top, looking for N items that contain text/tags that match the query (high Jaccard index?). It returns the set to eliminate duplication. The dictionary entries that were associated with the creation of the item are also returned.
        • Common
          • In the browser, the section(s) of the network are reproduced, and the words associated with the items are displayed beside the search results, along with sliders that adjust their weights on the local browser network. If the user increases a slider, items associated with that entry rise (as does the entry in the list?). This allows the user to reorder their results based on interactive refinement of their preferences.
          • When the user clicks on a result, the position of the clicked item, the positions of the other items, and the settings of the entry sliders are recorded on the server (with the user info?). These weights can be fed back into the master network so that the generalized user preferences are reflected. If we just want to adjust things to the particular user, the Eigenrank will have to be recalculated on a per-user basis. I think this does not have to include a full network recalculation.
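The daily "matrix calculation" step above is essentially an eigenvector-centrality computation. A minimal power-iteration sketch over a toy item adjacency matrix (plain Java; names and data are illustrative only, not the production schema):

```java
import java.util.Arrays;

public class EigenrankSketch {
    // Power iteration: repeatedly multiply the rank vector by the adjacency
    // matrix and renormalize; for a connected, non-bipartite item network the
    // vector converges to the dominant eigenvector (the "eigenrank").
    static double[] powerIterate(double[][] adj, int iterations) {
        int n = adj.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start with uniform rank
        for (int it = 0; it < iterations; ++it) {
            double[] next = new double[n];
            for (int i = 0; i < n; ++i) {
                for (int j = 0; j < n; ++j) {
                    next[i] += adj[i][j] * rank[j];
                }
            }
            double sum = 0;
            for (double v : next) { sum += v; }
            for (int i = 0; i < n; ++i) { next[i] /= sum; } // renormalize to sum 1
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Toy network: item 0 linked to 1, 2, 3; items 1 and 2 also linked.
        double[][] adj = {
                {0, 1, 1, 1},
                {1, 0, 1, 0},
                {1, 1, 0, 0},
                {1, 0, 0, 0}};
        double[] rank = powerIterate(adj, 100);
        System.out.println(Arrays.toString(rank)); // item 0 ranks highest, item 3 lowest
    }
}
```

Queries would then run against the stored rank vector; weighting edges (e.g., down-weighting GUILTY associations) just changes the matrix entries before the iteration.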

Phil 1.12.16

7:00 – 4:00 VTX

  • So I ask myself, is there some kind of public repository of crawled data? Why, of course there is! Common Crawl. So there is a way of getting the deep link structure for a given site without crawling it. That could give me the ability to determine how ‘bubbly’ a site is. I’m thinking there may be a ratio of bidirectional to unidirectional links (per site?) that could help here.
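A minimal sketch of that ‘bubbliness’ ratio, assuming the crawl data has been reduced to a set of directed page-to-page links (names and data here are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class LinkRatioSketch {
    // Fraction of directed links that are reciprocated, i.e. both
    // "A->B" and "B->A" are present in the link set.
    static double bidirectionalRatio(Set<String> links) {
        int reciprocated = 0;
        for (String link : links) {
            String[] parts = link.split("->");
            if (links.contains(parts[1] + "->" + parts[0])) {
                reciprocated++;
            }
        }
        return (double) reciprocated / links.size();
    }

    public static void main(String[] args) {
        Set<String> links = new HashSet<>();
        links.add("a->b");
        links.add("b->a"); // reciprocated pair
        links.add("a->c"); // one-way
        links.add("c->d"); // one-way
        System.out.println(bidirectionalRatio(links)); // 2 of 4 links reciprocated -> 0.5
    }
}
```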
  • More lit review and integration.
  • Making diagrams for the Sprint review today
    • Overview
      • The purpose of this effort is to provide a capability for the system to do more sophisticated queries that do several things
        • Allow the user to emphasize/de-emphasize words or phrases that relate to the particular search and to do this interactively based on linguistic analysis of the returned text.
        • Get user value judgments on the information provided based on the link results reordering
        • Use this to feed back to the selection criteria for provider Flags.
      • This work leans on the paper PageRank without Hyperlinks if you want more background/depth.
    • Eiphcone 129 – Design database table schema.
      • Took my existing MySql db schema and migrated it to Java Persistent Entities. Basically this meant taking a db that was designed for precompiled query access and retrieval (direct data access for adding data, views for retrieval) and restructuring it. So we go from: beforeTables
      • to
      • afterTables
      • The classes are annotated POJOs in a simple hierarchy. The classes that have ‘Base’ in their names I expect to be extended, though there may be enough capability here already. GuidBase has some additional capability to make sure that adding data to one class that has a data relation to another class gets filled out properly in both: JavaClassHierarchy. Since multiple dictionary entries can be present in multiple corpora, BaseDictionaryEntry and Corpus both have a <Set> of BaseEntryContext that connects the corpora and entries with additional information that might be useful, such as counts.
      • This manifests itself in the database as the following: ER Diagram. It’s not the prettiest drawing, but I can’t get IntelliJ to draw any better. You can see that the tables match directly to the classes. I used the InheritanceType.JOINED strategy since Jeremy was concerned about wasted space in the tables.
      • The next steps will be to start to create test cases that allow for tuning and testing of this setup at different data scales.
    • Eiphcone 132 – Document current progress on relationship/taxonomy design & existing threat model
      • Currently, a threat is extracted by comparing a set of known entities to surrounding text for keywords. In the model shown above, practitioners would exist in a network that includes items like the practice, attending hospitals, legal representation, etc. Because of this relationship, flags could be extended to the other members of the network. If a near neighbor in this network has a Flag attached, it will weight the surrounding edges and influence the practitioner. So if one doctor in a practice is convicted of malpractice, then other doctors in the practice will get lower scores.
      • The dictionary and corpus can interact as their own network to determine the amount of weight that is given to a particular score. For example, words in a dictionary that are used to extract data from a legal corpus may carry more weight than words run against a social media corpus.
    • Eiphcone 134 – Design/document NER processing in relation to future taxonomy
      • I compiled and ran the NER codebase and also walked through the Stanford NLP documentation. The current NER system looks to be somewhat basic, but solid and usable. Using it to populate the dictionaries and annotate the corpus appears to be a straightforward addition of the capabilities already present in the Stanford API.
    • Demo – I don’t really have a demo, unless people want to see some tests compile and run. To save time, I have this exciting printout that shows the return of dynamically created data:
[EL Info]: 2016-01-12 14:09:40.481--ServerSession(1842102517)--EclipseLink, version: Eclipse Persistence Services - 2.6.1.v20150916-55dc7c3
[EL Info]: connection: 2016-01-12 14:09:40.825--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService login successful

Users
firstName(firstname_0), lastName(lastname_0), login(login_0), networks( network_0)
firstName(firstname_1), lastName(lastname_1), login(login_1), networks( network_4)
firstName(firstname_2), lastName(lastname_2), login(login_2), networks( network_3)
firstName(firstname_3), lastName(lastname_3), login(login_3), networks( network_1 network_2)
firstName(firstname_4), lastName(lastname_4), login(login_4), networks()

Networks
name(network_0), owner(login_0), type(WAMPETER), archived(false), public(false), editable(true)
	[92]: name(DataNode_6_to_BaseNode_8), guid(network_0_DataNode_6_to_BaseNode_8), weight(0.5708945393562317), type(IDENTITY), network(network_0)
		Source: [86]: name('DataNode_6'), type(ENTITIES), annotation('annotation_6'), guid('50836752-221a-4095-b059-2055230d59db'), double(18.84955592153876), int(6), text('text_6')
		Target: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
	[91]: name(BaseNode_5_to_UrlNode_4), guid(network_0_BaseNode_5_to_UrlNode_4), weight(0.3703539967536926), type(COMPUTED), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[98]: name(BaseNode_5_to_UrlNode_1), guid(network_0_BaseNode_5_to_UrlNode_1), weight(0.4556456208229065), type(ENTITIES), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_1), owner(login_3), type(WAMPETER), archived(false), public(false), editable(true)
	[96]: name(BaseNode_2_to_UrlNode_1), guid(network_1_BaseNode_2_to_UrlNode_1), weight(0.5733484625816345), type(URL), network(network_1)
		Source: [82]: name('BaseNode_2'), type(ITEM), annotation('annotation_2'), guid('c5867557-2ac3-4337-be34-da9da0c7e25d')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[95]: name(DataNode_0_to_UrlNode_7), guid(network_1_DataNode_0_to_UrlNode_7), weight(0.85154128074646), type(MERGE), network(network_1)
		Source: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[94]: name(DataNode_9_to_BaseNode_5), guid(network_1_DataNode_9_to_BaseNode_5), weight(0.72845458984375), type(KEYWORDS), network(network_1)
		Source: [89]: name('DataNode_9'), type(USER), annotation('annotation_9'), guid('5bdb67de-5319-42db-916e-c4050dc682dd'), double(28.274333882308138), int(9), text('text_9')
		Target: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')

name(network_2), owner(login_3), type(EXPLICIT), archived(false), public(false), editable(true)
	[90]: name(BaseNode_8_to_UrlNode_7), guid(network_2_BaseNode_8_to_UrlNode_7), weight(0.2619180679321289), type(WAMPETER), network(network_2)
		Source: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_3), owner(login_2), type(EXPLICIT), archived(false), public(false), editable(true)
	[93]: name(UrlNode_4_to_DataNode_3), guid(network_3_UrlNode_4_to_DataNode_3), weight(0.7689594030380249), type(ITEM), network(network_3)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
	[97]: name(DataNode_3_to_DataNode_0), guid(network_3_DataNode_3_to_DataNode_0), weight(0.5808262825012207), type(URL), network(network_3)
		Source: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
		Target: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')

name(network_4), owner(login_1), type(ITEM), archived(false), public(false), editable(true)
	[99]: name(UrlNode_4_to_UrlNode_7), guid(network_4_UrlNode_4_to_UrlNode_7), weight(0.48601675033569336), type(WAMPETER), network(network_4)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')


Dictionaries
[30]: name(dictionary_0), guid(943ea8b6-6def-48ea-8b0f-a4e52e53954f), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_11
	Parent = word_10
	word_11 has 790 occurances in corpora0_chapter_1

	Entry = word_14
	word_14 has 4459 occurances in corpora1_chapter_2

	Entry = word_1
	Parent = word_0
	word_1 has 3490 occurances in corpora1_chapter_2

	Entry = word_10
	word_10 has 3009 occurances in corpora3_chapter_4

	Entry = word_4
	word_4 has 2681 occurances in corpora3_chapter_4

	Entry = word_5
	Parent = word_4
	word_5 has 5877 occurances in corpora1_chapter_2


[31]: name(dictionary_1), guid(c7b62a4b-b21a-4ebe-a939-0a71a891a3f9), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_3
	Parent = word_2
	word_3 has 4220 occurances in corpora0_chapter_1

	Entry = word_6
	word_6 has 4852 occurances in corpora2_chapter_3

	Entry = word_17
	Parent = word_16
	word_17 has 8394 occurances in corpora2_chapter_3

	Entry = word_2
	word_2 has 1218 occurances in corpora3_chapter_4

	Entry = word_19
	Parent = word_18
	word_19 has 8921 occurances in corpora2_chapter_3

	Entry = word_8
	word_8 has 4399 occurances in corpora3_chapter_4



Corpora
[27]: name(corpora1_chapter_2), guid(08803d93-deeb-4699-bdb2-ffa9f635c373), totalWords(1801), importer(login_1), url(http://americanliterature.com/author/herman-melville/book/moby-dick-or-the-whale/chapter-2-the-carpet-bag)
	word_15 has 5338 occurances in corpora1_chapter_2
	word_13 has 2181 occurances in corpora1_chapter_2
	word_14 has 4459 occurances in corpora1_chapter_2
	word_1 has 3490 occurances in corpora1_chapter_2
	word_5 has 5877 occurances in corpora1_chapter_2
	word_16 has 2625 occurances in corpora1_chapter_2

[EL Info]: connection: 2016-01-12 14:09:41.116--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService logout successful
  • Sprint review delayed. Tomorrow
  • Filling in some knowledge holes in JPA. Finished Chapter 4.
  • Tried getting enumerated types to work. No luck…?
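  • For reference, the standard JPA way to persist an enum is the @Enumerated annotation; EnumType.STRING stores the name rather than the fragile ordinal. A sketch using a hypothetical type field on one of the node classes (I haven’t verified this against the sandbox):

```java
public enum NodeType { UNKNOWN, IDENTITY, COMPUTED, RATING }

@Entity
public class BaseNode extends GuidBase {
    @Enumerated(EnumType.STRING) // column holds "RATING", not the ordinal 3
    private NodeType type;
    // ...
}
```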

Phil 1.11.16

7:00 – 3:00 VTX

  • Good bye David Bowie. I was hoping to see you on tour this year.
  • Working my way through papers, building a corpus and a taxonomy
  • The last sprint task is to “Design/document NER processing in relation to future taxonomy”. I think that’s the dictionary/corpus integration, but I need to check with Aaron, since he wrote it…
  • Beware is software that scans public records for risks that police face when engaging with the public. Threats and dashboards. From WaPo.
  • Added in the BaseEntryContext to replace the Join table between Corpus and BaseDictionaryEntry. It’s nice, actually; I’d rather have a join table that actually does something. It’s based on this stackoverflow post

Phil 1.7.16

7:00 – 4:00 VTX

  • Adding more codes in Atlas.
  • Found a good stemming algorithm/implementation, including Java
  • Discovered the Lemur Project: “The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset. Our software and datasets are used widely in scientific and research applications, as well as in some commercial applications.”
  • Also, discovered that TREC also has sets of queries that they use. Here’s an example
  • Ok. Pro JPA, Chapter 2.
    • Got the example code from here
    • How to drop and create the DB from the schema: http://antoniogoncalves.org/2014/12/11/generating-database-schemas-with-jpa-2-1/
    • STILL having problems finding the provider! – Exception in thread “main” javax.persistence.PersistenceException: No Persistence provider for EntityManager
    • Finally found this on stackoverflow
      • Go to Project Structure.
      • Select your module.
      • Find the folder in the tree on the right and select it.
      • Click the Sources button above that tree (with the blue folder) to make that folder a sources folder.
    • And that worked. Here’s the ‘after’ screenshot: AddToIntelliJPath
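For the record, the drop-and-create behavior from the link above comes down to two standard JPA 2.1 properties in persistence.xml (the unit name here is made up):

```xml
<persistence-unit name="SomeUnit" transaction-type="RESOURCE_LOCAL">
  <properties>
    <!-- drop and recreate the tables on startup -->
    <property name="javax.persistence.schema-generation.database.action"
              value="drop-and-create"/>
    <!-- generate the schema from the annotated entity classes -->
    <property name="javax.persistence.schema-generation.create-source"
              value="metadata"/>
  </properties>
</persistence-unit>
```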

Phil 1.4.16

7:00 – 2:30 VTX

  • Got my copy of AtlasTi. Going to try using it to organize my papers/thoughts for the proposal. Imported a bunch of papers. Next, I’m going to re-do my annotations of the Gezi paper in Atlas and then see if I can start to cross-correlate, code, and so forth. After that, we’ll try some fancy things like getting eigenvectors out of taxonomies.
  • Realized that I should be able to automate Hibernate criteria so that a query like
    • Criteria criteria = drilldown(session, Showroom.customers, LIKE, 'Aaron') should be possible.
  • But before that, I’m going to try out spring JPA and Intellij spring / springboot integration.
  • Replicated the hibernate sandbox (SpringHibernate1) using Spring. Not really sure what it gave me yet.
  • Adding in JPA support in the IDE
  • Still some missing jars. Since I can’t think of any other way to do it, grabbing the jars as needed from Maven.
  • Ok, I think I got everything in, but it blows up:
    [2016-01-04 11:18:13.409] - 3116 INFO [main] --- com.philfeldman.mains.SpringJPATest: Starting SpringJPATest on PFELDMAN-NCS with PID 3116 (C:\Development\Sandboxes\SpringHibernate1\out\production\SpringHibernate1 started by philip.feldman in C:\Development\Sandboxes\SpringHibernate1)
    [2016-01-04 11:18:13.428] - 3116 INFO [main] --- com.philfeldman.mains.SpringJPATest: No active profile set, falling back to default profiles: default
    [2016-01-04 11:18:13.476] - 3116 INFO [main] --- org.springframework.context.annotation.AnnotationConfigApplicationContext: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6321e813: startup date [Mon Jan 04 11:18:13 EST 2016]; root of context hierarchy
    [2016-01-04 11:18:14.504] - 3116 INFO [main] --- org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
    [2016-01-04 11:18:14.577] - 3116 WARNING [main] --- org.springframework.context.annotation.AnnotationConfigApplicationContext: Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private javax.sql.DataSource org.springframework.boot.autoconfigure.orm.jpa.JpaBaseConfiguration.dataSource; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [javax.sql.DataSource] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
    [2016-01-04 11:18:14.588] - 3116 SEVERE [main] --- org.springframework.boot.SpringApplication: Application startup failed
    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private javax.sql.DataSource org.springframework.boot.autoconfigure.orm.jpa.JpaBaseConfiguration.dataSource; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [javax.sql.DataSource] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
  • Taking a break on the Spring JPA to add in the ability to drill down to a class element with hibernate. This really isn’t provided somewhere?
    /**
     * For some reason, hibernate can't create a nested alias. This loops over the path to create one.
     * @param rootClass - The root class that we are going to query
     * @param leafNodeName - the path to the node we want to restrict on (e.g. "Foo.bar.baz").
     * @return - A Criteria if successful, null if not.
     */
    public Criteria drillDownAlias(Class rootClass, String leafNodeName){
        String className = rootClass.getSimpleName();
        System.out.println("Class name = "+className);
    
        String[] nodeNames = leafNodeName.split("\\.");
    
        if(nodeNames.length < 1){
            return null;
        }
        Criteria criteria = session.createCriteria(rootClass, nodeNames[0]);
    
        // TODO: add some testing that verifies the path is valid
        for(int i = 1; i < nodeNames.length; ++i){
            String prevNode = nodeNames[i-1];
            String curNode = nodeNames[i];
            criteria.createAlias(prevNode+"."+curNode, curNode);
        }
    
        return criteria;
    }

Phil 12.31.15

Phil 7:00 – 4:00 VTX

  • Decided to get a copy (hopefully with student discount) of Atlas. It does taxonomic analysis and outputs a matrix to Excel that I should be able to use to produce PageRank???
  • Timesheets! Done
  • Seeing if I can add a bit of reflection to addXXX(YYY) that will invoke add/setYYY(XXX). Since the target is a map, it shouldn’t care, but I do need to worry about recursion…
  • Added addSourceToTarget() and testSourceInTarget() to GuidBase. So now addM2M looks like
    public void addM2M(GuidBase obj) throws Exception {
        System.out.println("person.addM2M");
        addSourceToTarget(this, obj);
        addSourceToTarget(obj, this);
    }

    and the example of Showroom.addCar() looks like

    public void addCar(Car car){
        if(cars == null){
            cars = new HashSet<>();
        }
        cars.add(car);
    
        try {
            if(!testSourceInTarget(this, car)){
                addSourceToTarget(this, car);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    Which now means that the two-way mapping is automatic. And in case you’re wondering, testSourceInTarget looks first for a method that returns the source type, and then looks for a method that returns a Set<source type>. If it finds the source in either one of those, it returns true.
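A minimal standalone sketch of that testSourceInTarget idea (reflection only, no Hibernate; the toy classes stand in for Showroom/Car):

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

public class ContainmentSketch {
    // True if some zero-arg getter on 'target' returns 'source' directly,
    // or returns a Set that contains 'source'.
    static boolean testSourceInTarget(Object source, Object target) throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getParameterCount() != 0) continue;
            if (m.getReturnType().isInstance(source)) {
                if (m.invoke(target) == source) return true;
            } else if (Set.class.isAssignableFrom(m.getReturnType())) {
                Set<?> s = (Set<?>) m.invoke(target);
                if (s != null && s.contains(source)) return true;
            }
        }
        return false;
    }

    // Toy classes standing in for the mapped entities:
    public static class Car { }
    public static class Showroom {
        private final Set<Car> cars = new HashSet<>();
        public Set<Car> getCars() { return cars; }
        public void addCar(Car c) { cars.add(c); }
    }

    public static void main(String[] args) throws Exception {
        Showroom sr = new Showroom();
        Car car = new Car();
        System.out.println(testSourceInTarget(car, sr)); // false: not added yet
        sr.addCar(car);
        System.out.println(testSourceInTarget(car, sr)); // true
    }
}
```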

  • Got queries running. Simple queries are easy, but the more complex ones can be pretty ugly. Here’s an example that pulls a Showroom Object based on a nested Person Object (instanced as ‘customer’):
    // do a query based on a nested item's value. Kinda yucky...
    Criteria criteria = session.createCriteria(Showroom.class, "sh");
    criteria.createAlias("sh.customers", "c");
    List result = criteria.add(Restrictions.like("c.name", "%Aaron%")).list();
    for(Object o : result){
        System.out.println(o.toString());
    }

Phil 12.29.15

7:00 – 5:30 VTX

  • Finished Social Media and Trust during the Gezi Protests in Turkey.
  • More Hibernate
    • After stepping through the ‘Showroom’ example on pg 54 (Using a Join Table) of Just Hibernate, I think I see my problem. In my case, the corpus has already been created and exists as an entry in the corpora table. I need to add a relationship when a word is run against a new corpus. Which, come to think of it, should have the word count in it. Maybe?
    • Ok, I don’t like the way that ManyToMany is implemented. Hibernate should be smart enough to figure out how to make a default mapping table. Sigh.
    • And you have to call each object with the other object or it doesn’t load properly.
    • After being annoyed for a while, I decided to try reflection as a way of making fewer calls. The following still needs the get/set member element calls, but I like the direction it’s going. Will work on it some more tomorrow (need to add the call on this class):
      public void addM2M(Object obj){
          System.out.println("person.addM2M");
          Class thisClass = this.getClass();
          String thisName = thisClass.getName();
          Class thatClass = obj.getClass();
          Method[] thatMethods = thatClass.getMethods();
          Method thatMethod = null;
          for(int i = 0; i < thatMethods.length; ++i){
              Method m = thatMethods[i];
              Type[] types = m.getGenericParameterTypes();
              if (types.length == 1) { // looking for one arg setters
                  for (int j = 0; j < types.length; ++j) {
                      Type t = types[j];
                      if (t.getTypeName().equals(thisName)) { // == compares references; equals compares the names
                          thatMethod = m;
                          break;
                      }
                  }
              }
              if(thatMethod != null){
                  break;
              }
          }
          if(thatMethod != null){
              try {
                  thatMethod.setAccessible(true);
                  thatMethod.invoke(obj, this);
              } catch (IllegalAccessException e) {
                  e.printStackTrace();
              } catch (InvocationTargetException e) {
                  System.out.println("addM2M failed: "+e.getCause().getMessage());
                  e.printStackTrace();
              }
          }
      }

Phil 12.28.15

7:00 – 5:00 VTX

  • Oliver, J. Eric, and Thomas J. Wood. “Conspiracy theories and the paranoid style (s) of mass opinion.” American Journal of Political Science 58.4 (2014): 952-966., and the Google Scholar page of papers that cite this. Looking for insight as to the items that make (a type of?) person believe false information.
  • This follows up on an On the Media show called To Your Health, that had two interesting stories: An interview with John Bohannon, who published the intentionally bad study on chocolate, and an interview with Taryn Harper Wright, a blogger who chases down cases of Munchausen by Internet, and says that excessive drama is a strong indicator of this kind of hoax.
  • Reading Social Media and Trust during the Gezi Protests in Turkey.
    • Qualitative study that proposes Social Trust and System Trust
      • Social Trust
      • System Trust
  • Hibernating Moderately
    • Working on the dictionary
    • Working on the Corpus
      • Name
      • Date created
      • Source URL
      • Raw Content
      • Cleaned Content
      • Importer
      • Word count
      • guid
    • I think I’ll need a table that has the word id that points to a corpus and gives the count of that word in that corpus. The table gets updated whenever a dictionary is run against a corpus. Since words are not shared between dictionaries (Java != Java), getting the corpus to dictionary relationship is straightforward if needed.
    • Created a GuidBase that handles the name, id, and guid code that’s shared across most of the items.
    • Discovered Jsoup, which has some nice (fast?) html parsing.
    • Finished most of Corpus. Need to add a join to users. Done
    • Added BaseDictionary.
    • Added BaseDictionaryEntry.
    • Working on getting a join table working that maps words to corpora and am getting a “WARN: SQL Error: 1062, SQLState: 23000”. I was thinking that I could create a new HashMap, but I think I may have to point to the list in a different way. Here’s the example from JustHibernate:
              session.beginTransaction();

              Showroom showroom = new Showroom();
              showroom.setLocation("East Croydon, Greater London");
              showroom.setManager("Barry Larry");

              Set cars = new HashSet();
              cars.add(new Car("Toyota", "Racing Green"));
              cars.add(new Car("Nissan", "White"));
              cars.add(new Car("BMW", "Black"));
              cars.add(new Car("Mercedes", "Silver"));

              showroom.setCars(cars);
              session.save(showroom);
              session.getTransaction().commit();
    • Where the Showroom class has the Cars Set annotation as follows:
       @OneToMany
       @JoinTable(name = "SHOWROOM_CAR_SET_ANN_JOINTABLE",
                  joinColumns = @JoinColumn(name = "SHOWROOM_ID"))
       @Cascade(CascadeType.ALL)
       private Set cars = null;
      
    • Anyway, more tomorrow…
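One common cause of that 1062 duplicate-entry error with Set-valued collections is an entity class without a value-based equals()/hashCode(): the HashSet ends up holding two copies of what is logically one row, and the cascade inserts both. A sketch of the fix, with “Word” and its guid standing in for whatever the real entity uses (both names are hypothetical here, though a guid-based key would fit the GuidBase classes above):

```java
import java.util.Objects;

// Hypothetical entity: equals()/hashCode() based on a natural key (the guid)
// lets a HashSet deduplicate logically-equal instances before Hibernate
// cascades the save, avoiding duplicate inserts into the join table.
public class Word {
    private String guid;
    private String text;

    public Word(String guid, String text) {
        this.guid = guid;
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Word)) return false;
        return Objects.equals(guid, ((Word) o).guid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(guid);
    }
}
```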
    • Start on queries that:
      • List networks for users
      • List dictionaries for users
      • List Corpora
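A first cut at those three queries, in the @NamedQuery-style JPQL from Chapter 7. The entity and field names (Network, Dictionary, Corpus, owner) are assumptions about the schema, not the project’s actual mappings:

```java
// Hypothetical JPQL for the three queries listed above. All entity and
// field names are guesses, for illustration only.
public class QuerySketches {
    public static final String NETWORKS_FOR_USER =
            "SELECT n FROM Network n WHERE n.owner.id = :userId";
    public static final String DICTIONARIES_FOR_USER =
            "SELECT d FROM Dictionary d WHERE d.owner.id = :userId";
    public static final String ALL_CORPORA =
            "SELECT c FROM Corpus c ORDER BY c.name";
}
```

Each of these could then be registered with @NamedQuery on the corresponding entity and run through the same paging loop as the raw query from the 1.14 entry.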

Phil 12.15.15

7:00 – 3:30 VTX

  • Representations: Classes, Trajectories, Transitions
    • Inner language, the language with which we think
    • Semantic nets
      • parasitic semantics – where we project knowing to the machine. We contain the meaning, not the machine.
    • Combinators = edge
    • Reification – linking links?
    • Sequence
    • Minsky – Frames or templates add a localization layer.
    • Classification
    • Transition
      • Vocabulary of change, not state
      • (!)Increase, (!)decrease, (!)change, (!)appear, (!)disappear
    • Trajectory
      • Objects moving along trajectories
      • Trajectory frame (prepositions help refine – by, with, from, for, etc)
        • Starts at a source
        • Arranged by agent, possibly with collaborator
        • assisted by instrument
        • can have a conveyance
        • Arrives at destination
        • Beneficiary
      • Wall Street Journal Corpus
        • 25% transitions or trajectories.
      • Pat comforted Chris
        • Role Frame
          • Agent: Pat
          • Action: ??
          • Object: Chris
          • Result: Transition Frame
            • Object: Chris
            • Mood: Improved (increased)
    • Story Libraries
      • Event Frames – adds time and place
        • Disaster – adds fatalities, cost
          • Earthquake – adds magnitude, fault
          • Hurricane – adds name, category
        • Party
          • Birthday
          • Wedding – adds bride and groom
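The slot-inheritance idea above can be sketched in a few lines: each frame subclass adds its own slots to the ones inherited from its parent. The slot names come from the notes; the Java shape is my own invention.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal frame sketch: a frame is a named bag of slots, and subclasses
// specialize by adding slots in their constructors.
class Frame {
    final Map<String, Object> slots = new LinkedHashMap<>();
    Frame set(String slot, Object value) { slots.put(slot, value); return this; }
}

class EventFrame extends Frame {
    EventFrame() { set("time", null); set("place", null); } // events add time and place
}

class PartyFrame extends EventFrame { }

class WeddingFrame extends PartyFrame {
    WeddingFrame() { set("bride", null); set("groom", null); } // weddings add bride and groom
}
```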
  • Scrum
  • Working on downloading and running the NLP code
    • Downloaded Java EE 7u2
    • Downloaded Gradle 2.9
    • Installed and compiled. Took 41 minutes!
    • Working on running it now, which looks like I need Tomcat. To run Tomcat on port 80, I had to finally chase down what was blocking the port. I found it by running NET stop HTTP (from here), which gave me a list of dependent services to check against the Services panel, and I monitored the ports with Xampp’s nifty Netstat tool. The offending process was BranchCache, which I disabled. Now we’ll see what that breaks…
    • Tomcat up and running
    • NLPService blew up. More secret knowledge:
      Local RabbitMQ Setup

      Install Erlang
        1. Download http://www.erlang.org/download/otp_win64_17.5.exe
        2. Set ERLANG_HOME in the system variables (e.g. C:\Program Files\erl6.4)

      Install RabbitMQ
        1. Download http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.3/rabbitmq-server-3.5.3.exe
           If you get Windows Security Alert(s) for epmd.exe and/or erl.exe, check
           "Domain networks..." and uncheck "Private networks" and "Public networks"
        2. Open the command prompt as administrator
        3. Go to C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.5.3\sbin
        4. Run the following commands:

           rabbitmq-plugins.bat enable rabbitmq_web_stomp rabbitmq_stomp rabbitmq_management
           rabbitmq-service.bat stop
           rabbitmq-service.bat install
           rabbitmq-service.bat start

      RabbitMQ Admin Console: http://localhost:15672/mgmt (guest/guest)
    • Installed Erlang and RabbitMQ. We’ll try running tomorrow.

Phil 12.14.15

7:00 – 3:30 VTX

  • Learning: Boosting
    • Binary classifications
    • Weak Classifier = one that is barely better than chance.
    • AdaBoost for credibility analysis? PolitiFact is the test. Speakers, subjects, etc. are the weak classifiers. What mix of classifiers produces the most accurate news? Something like this (check the citations in the paper).
    • Which means that we can keep track of the items that are always moved to the top of the pertinence list and score them as true(?). We can then use that result to weight the sources that appear to be credible, so that they in turn become more relevant in the next query (we can also look at the taxonomy terms that get maximized and minimized).
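The weighted-vote half of the AdaBoost idea above is simple to sketch: each weak classifier gets a weight alpha derived from its weighted error rate, and the ensemble takes a weighted majority vote. Everything here is a toy illustration (invented names and labels), not the credibility pipeline itself:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Toy sketch of AdaBoost's weighted vote. Weak classifiers map a sample id
// to a label in {+1, -1}; low-error classifiers vote louder via alpha.
public class BoostSketch {
    // alpha = 0.5 * ln((1 - error) / error); a coin-flip classifier gets weight 0
    static double alpha(double error) {
        return 0.5 * Math.log((1.0 - error) / error);
    }

    // weighted majority vote over the weak classifiers
    static int classify(int sample, List<IntUnaryOperator> weak, double[] alphas) {
        double sum = 0;
        for (int i = 0; i < weak.size(); i++) {
            sum += alphas[i] * weak.get(i).applyAsInt(sample);
        }
        return sum >= 0 ? 1 : -1;
    }
}
```

The training loop (reweighting samples toward the ones the ensemble gets wrong) is the other half; this just shows why an accurate speaker- or subject-based classifier would dominate the final vote.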
  • Discussion with Jeremy about the RDB schemas
  • Scrum – really short
  • RDB design meeting. Lots of discussion about data sources but nothing clear. Jeremy didn’t like the unoptimized storage of the general model
  • Follow-on discussions with Jeremy. I showed him how unions can address his concerns. He adjusted the schema, but I can’t get on the VPN at home for some reason. Will see tomorrow.

Phil 12.11.15

8:00 – 5:00 VTX

  • No AI course this morning, had to drop off the car.
  • Some preliminary discussions about sprint planning with Aaron yesterday. Aside from getting the two ‘Derived’ database structures reconciled, I need to think about a few things:
    • Who the network ‘users’ are. I think it could be VTX, or the system customers, like Aetna.
    • What kinds of networks exist?
      • Each individual doctor is a network of doctors, keywords, entities, sources, threats and ratings. That can certainly run on the browser
      • Then there is the larger network of ‘relevant’ doctors. That’s a larger network, certainly in the 10s – 100s range. On the lower end of the scale that could be done directly in the browser. For larger networks, we might have to use the GPU? Which seems very doable, via Steve Sanderson.
      • Then there is the master ranking, which should be something like most threatening to least threatening, probably. Queries with additional parameters pull a subset of the ordered data (SELECT foo, bar from ?? ORDER BY eigenvalue). Interestingly, according to this IEEE article from 2010, GPU processing  was handling 10 million nodes in about 30 seconds using optimized sparse matrix (SpMV) calculations. So it’s conceivable that calculations could be done in real time.
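The eigenvalue-ordered master ranking could come from something as simple as power iteration over the adjacency matrix; the GPU SpMV work in that IEEE article is essentially this loop done in parallel on sparse matrices. A toy sketch (the 4-node graph in the test is invented; node 0 links to everyone, including itself, which guarantees convergence):

```java
// Minimal power-iteration sketch: repeat x <- normalize(A x) until it settles,
// then rank nodes by their eigenvector-centrality score.
public class RankSketch {
    static double[] powerIterate(double[][] a, int iterations) {
        int n = a.length;
        double[] x = new double[n];
        java.util.Arrays.fill(x, 1.0); // uniform starting vector

        for (int it = 0; it < iterations; it++) {
            // next = A * x
            double[] next = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    next[i] += a[i][j] * x[j];

            // renormalize so the vector doesn't blow up or vanish
            double norm = 0;
            for (double v : next) norm += v * v;
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) next[i] /= norm;

            x = next;
        }
        return x; // approximate dominant eigenvector; sort nodes by these scores
    }
}
```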
  • More documentation
  • More discussions with Aaron about where data lives and how it’s structured.
  • Sprint planning

Phil 12.10.15

7:00 – 3:30 VTX

  • Sandy Spring Bank!
  • Honda!
  • Learning: Support Vector Machines
    • More sophisticated decision boundaries, with fewer ad hoc choices than GAs and NNs
    • A positive sample must have a dot product with the ‘normal vector’ that is >= 1.0. Similarly, a negative sample must be <= -1.0.
    • Gotta minimize with constraints: Lagrange Multipliers from Multivariable Calculus
    • Guaranteed no local maxima
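The two constraints in the notes collapse into one once you fold the label in: a sample satisfies the margin when label * (w · x + b) >= 1. A quick sketch (the vectors in the test are invented):

```java
// Sketch of the SVM margin constraint from the notes above:
// a sample (x, label) is correctly outside the margin when
// label * (w . x + b) >= 1, with label in {+1, -1}.
public class MarginSketch {
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static boolean satisfiesMargin(double[] w, double b, double[] x, int label) {
        return label * (dot(w, x) + b) >= 1.0;
    }
}
```

Minimizing |w| subject to this constraint for every sample is the Lagrange-multiplier problem the lecture sets up.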
  • System Description (putting it up here)

Phil 12.9.15

7:00 – VTX

  • Learning: Near Misses, Felicity Conditions
    • One shot learning
    • Describing the difference between the desired goal/object and near misses. The model is decorated with information about what is important.
      • Relations are in imperative form (must not touch, must support, etc.)
    • Pick a seed
    • Apply your heuristics until all the positives are included
    • Then use negatives to throw away unneeded heuristics
    • Use a beam search
    • Near misses lead to specialization, compare to general models lead to generalization (look for close items using low disorder measures for near misses and high for examples?)
    • Model Heuristics (from “An application of variable-valued logic to inductive learning of plant disease diagnostic rules”)

      • Require Link (Specialization step)
      • Forbid Link (Specialization step)
      • Extend Set (Generalization step)
      • Drop Link (Generalization step)
      • Climb Tree (Generalization step)
    • Packaging ideas
      • Symbol associated with the work – a visual handle
      • Slogan – a verbal handle (‘Near Miss’ learning)
      • Surprise – Machine can learn something definite from a single example
      • Salient – something that sticks out (One shot learning via near misses)
      • Story
  • More dev machine setup
    • Added typescript-install to the makefile tasks, since I keep on forgetting about it.
    • Compiled and ran WebGlNeworkCSS. Now I need to set up the database.
    • Got that in, but had a problem with the new db and the text type of PASSWORD(). I had to add COLLATE to the where clause as follows:
      "UPDATE tn_users set password = PASSWORD(:newPassword) where password = PASSWORD(:oldPassword) COLLATE utf8_unicode_ci and login = :login"
    • The last error is that the temp network isn’t being set in the dropdown for available networks. Fixed. It turned out to be related to the new TypeScript compiler catching some interface errors that the old version didn’t.
  • Ok, I think it’s time to start writing up what the current system is and how it works.

Phil 12.8.15

7:00 – 4:30 VTX