Category Archives: Java

Phil 1.21.16

7:00 – 4:00 VTX

  • Inverse Laplace examples
  • Dirac delta function
  • Useful link of the day: Firefox user agent strings
  • Design Overview presentation.
  • Working on (simple!) name disambiguation
    • Building word chains of sequential tokens that are entities (PERSON and ORGANIZATION) Done
    • Given a name, split by spaces and get best match on last name, then look ahead one or two words for best match on first name. If both sets are triples, then check the middle. Wound up iterating over all the elements looking for the best match. This does let things like reverse order work. Not sure if it’s best.
    • Checks need to look for initials for first and middle name in source and target. Still working on this one.
    • Results (lower is better):
      ------------------------------
      Robert S. Goodwin
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: L. Crist score = 6.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: G score = 2.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Ajay Kumar score = 9.0
      PERSON: Charles L. Crist score = 13.0
      PERSON: Mark V. Sivieri score = 10.0
      PERSON: B. Brown Schoenfeld score = 13.0
      PERSON: L. Connor score = 6.0
      PERSON: S. Crosse score = 6.0
      
      ------------------------------
      Goodwin Robert S.
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: L. Crist score = 6.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: G score = 2.0
      PERSON: Robert S. Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Goodwin score = 0.0
      PERSON: Ajay Kumar score = 9.0
      PERSON: Charles L. Crist score = 13.0
      PERSON: Mark V. Sivieri score = 10.0
      PERSON: B. Brown Schoenfeld score = 13.0
      PERSON: L. Connor score = 6.0
      PERSON: S. Crosse score = 6.0
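
A minimal sketch of the kind of token matching described above, not the code that produced these results. It assumes Levenshtein edit distance as the per-token cost, treats an initial as a free match for any token with the same first letter, and sums the cheapest match for each candidate token. The class and method names (NameMatchSketch, tokenCost, score) are hypothetical, and the scores will not reproduce the numbers in the printout. Because every token is compared against every other, reversed order scores the same as forward order.

```java
import java.util.Locale;

class NameMatchSketch {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(sub, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Lower-cases a token and strips a trailing period, so "S." becomes "s". */
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.endsWith(".") ? t.substring(0, t.length() - 1) : t;
    }

    /** Cost of matching two tokens; an initial matches any token with the same first letter (assumption). */
    static int tokenCost(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        if (x.isEmpty() || y.isEmpty()) return Math.max(x.length(), y.length());
        boolean initial = x.length() == 1 || y.length() == 1;
        if (initial && x.charAt(0) == y.charAt(0)) return 0;
        return editDistance(x, y);
    }

    /** Sums, over each candidate token, the cheapest match among the target tokens. Lower is better. */
    static double score(String target, String candidate) {
        String[] targetTokens = target.trim().split("\\s+");
        String[] candidateTokens = candidate.trim().split("\\s+");
        double total = 0;
        for (String c : candidateTokens) {
            int best = Integer.MAX_VALUE;
            for (String t : targetTokens) best = Math.min(best, tokenCost(c, t));
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] candidates = {"Robert S. Goodwin", "Goodwin", "Ajay Kumar"};
        for (String target : new String[] {"Robert S. Goodwin", "Goodwin Robert S."}) {
            System.out.println(target);
            for (String c : candidates) {
                System.out.println("PERSON: " + c + " score = " + score(target, c));
            }
        }
    }
}
```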

Phil 1.20.16

7:00 – 5:30 VTX

 

Phil 1.15.16

7:00 – 4:00 VTX

  • Finished Communication Power and Counter-power in the Network Society
  • Started The Future of Journalism: Networked Journalism
  • Here’s a good example of a page with a lot of outbound links, videos and linked images. It’s about the Tunisia uprising before it got real traction. So can we now vet it as a trustworthy source? Is this a good pattern? The post is by Ethan Zuckerman. He directs the Center for Civic Media at MIT, among other things.
  • Public Insight Network: “Every day, sources in the Public Insight Network add context, depth, humanity and relevance to news stories at trusted newsrooms around the country.”
  • Hey, my computer wasn’t restarted last night. Picking up JPA at Queries and Uncommitted Changes.
  • Updating all the nodes as objects:
    //@NamedQuery(name = "BaseNode.getAll", query = "SELECT bn FROM base_nodes bn")
    TypedQuery<BaseNode> getNodes = em.createNamedQuery("BaseNode.getAll", BaseNode.class);
    List<BaseNode> nodeList = getNodes.getResultList();
    Date date = new Date();
    em.getTransaction().begin();
    for(BaseNode bn : nodeList){
        bn.setLastAccessedOn(date);
        bn.setAccessCount(bn.getAccessCount()+1);
        em.persist(bn);
    }
    em.getTransaction().commit();
  • Updating all nodes with a JPQL call:
    //@NamedQuery(name = "BaseNode.touchAll", query = "UPDATE base_nodes bn set bn.accessCount = (bn.accessCount+1), bn.lastAccessedOn = :lastAccessed")
    em.getTransaction().begin();
    // an UPDATE returns no entities, so use an untyped javax.persistence.Query
    Query touchAllQuery = em.createNamedQuery("BaseNode.touchAll");
    touchAllQuery.setParameter("lastAccessed", new Date());
    touchAllQuery.executeUpdate();
    em.getTransaction().commit();
  • And we can even add in query logic. This updates the accessed date and increments the accessed count if it’s not null:
    @NamedQuery(name = "BaseNode.touchAll", query = "UPDATE base_nodes bn " +
            "set bn.accessCount = (bn.accessCount+1), " +
            "bn.lastAccessedOn = :lastAccessed " +
            "where NOT (bn.accessCount IS NULL )")

Phil 1.13.16

7:00 – 3:00 VTX

  • More document coding
  • Review today?
  • On to Chapter 6.
  • Thinking about next steps.
    • Server
      • Produce a dictionary from a combination of manual entry and corpus extraction
      • Add word-specific code like stemming, edit distance
      • Look into synonyms. They are dictionary specific (Java as in drink, Java as in Language, Java as in island)
      • Analyze documents using the dictionary to produce the master network of items and associations. This resides on the server.  I think this negates the need for flags, since the Eigenrank of the doctor will be explained by the associations, and the network can be interrogated by looking for explanatory items within some number of hops. The dictionary entry that was used to extract that item is also added to the network as an item
        • PractitionerDictionary finds medical practitioners <membership roles?>. Providers are added to the item table and to the master network
          • Each practitioner is checked for associations like office, hospital, specialty. New items are created as needed and associations are created
        • LegalDictionary finds (disputes and findings?) in legal proceedings, and adds legal items that are associated with items currently in the network. Items that are associated with GUILTY get low (negative?) weight. A directly attributable malpractice conviction should be a marker that is always relevant. Maybe a reference to it is part of the practitioner record directly?
        • SocialDictionary finds rating items from places like Yelp. High ratings provide higher weight, low ratings produce lower weight. The weight of a rating shouldn’t be more important than a conviction, but a lot of ratings should have a cumulative effect.
        • Other dictionaries? Healthcare providers? Diseases? Medical Schools?
        • Link age. Should link weight move back to the default state as a function of time?
        • Matrix calculation. I think we calculate the rank of all items and their adjacency once per day. Queries are run against the matrix
      • Client
        • Corporate
          • The user is presented with a dashboard ordered by pre-specified criteria (“show new bad practitioners?”). This is calculated by the server looking through the eigenrank starting at the top looking for N items that contain text/tags that match the query (high Jaccard index?). It returns the set to eliminate duplication. The dictionary entries that were associated with the creation of the item are also returned.
        • Consumer
          • The user types in a search: “cancer specialist maryland carefirst”
          • The search looks through the eigenrank starting at the top looking for N items that contain text/tags that match the query (high Jaccard index?). It returns the set to eliminate duplication. The dictionary entries that were associated with the creation of the item are also returned.
        • Common
          • In the browser, the section(s) of the network are reproduced, and the words associated with the items are displayed beside search results, along with sliders that adjust their weights on the local browser network. If the user increases the slider, items associated with that entry rise (as does the entry in the list?). This allows the user to reorder their results based on interactive refinement of their preferences.
          • When the user clicks on a result, the position of the clicked item, the positions of the other items, and the settings of the entry sliders are recorded on the server (with the user info?). These weights can be fed back into the master network so that the generalized user preferences are reflected. If we just want to adjust things to the particular user, the Eigenrank will have to be recalculated on a per user basis. I think this does not have to include a full network recalculation.
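
The search step in the client notes above (walk the items from the top of the rank, keep the first N whose tags match the query) could be sketched roughly as follows. This is only an illustration under assumptions: the Item class, the precomputed rank values, the whitespace tokenizing and the threshold parameter are all hypothetical, and the Jaccard index is just the matching measure the notes float as a possibility.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class RankedSearchSketch {

    /** Hypothetical item: a name, a precomputed rank, and a set of tags. */
    static final class Item {
        final String name;
        final double rank;
        final Set<String> tags;

        Item(String name, double rank, String... tags) {
            this.name = name;
            this.rank = rank;
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /** Jaccard index: size of the intersection over size of the union; 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Walks items from highest rank down and keeps the first n whose tags reach the threshold. */
    static List<Item> search(List<Item> items, String query, int n, double threshold) {
        Set<String> queryTerms =
                new HashSet<>(Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        List<Item> byRank = new ArrayList<>(items);
        byRank.sort(Comparator.comparingDouble((Item i) -> i.rank).reversed());
        List<Item> hits = new ArrayList<>();
        for (Item item : byRank) {
            if (hits.size() == n) break;
            if (jaccard(queryTerms, item.tags) >= threshold) hits.add(item);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
                new Item("practitioner_c", 0.5, "cancer", "specialist", "maryland"),
                new Item("practitioner_a", 0.9, "cancer", "specialist", "maryland", "carefirst"),
                new Item("practitioner_b", 0.7, "dentist", "virginia"));
        for (Item hit : search(items, "cancer specialist maryland carefirst", 2, 0.5)) {
            System.out.println(hit.name + " rank = " + hit.rank);
        }
    }
}
```

Filtering in rank order means the result order comes from the network, and the query only decides which items are eligible. The slider reweighting described above would then change the ranks, not the matching.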

Phil 1.12.16

7:00 – 4:00 VTX

  • So I ask myself, is there some kind of public repository of crawled data? Why, of course there is! Common Crawl. So there is a way of getting the deep link structure for a given site without crawling it. That could give me the ability to determine how ‘bubbly’ a site is. I’m thinking there may be a ratio of bidirectional to unidirectional links (per site?) that could help here.
  • More lit review and integration.
  • Making diagrams for the Sprint review today
    • Overview
      • The purpose of this effort is to provide a capability for the system to do more sophisticated queries that do several things:
        • Allow the user to emphasize/de-emphasize words or phrases that relate to the particular search and to do this interactively based on linguistic analysis of the returned text.
        • Get user value judgments on the information provided based on the link results reordering
        • Use this to feed back to the selection criteria for provider Flags.
      • This work leans on the paper PageRank without Hyperlinks if you want more background/depth.
    • Eiphcone 129 – Design database table schema.
      • Took my existing MySQL db schema and migrated it to Java Persistent Entities. Basically this meant taking a db that was designed for precompiled query access and retrieval (direct data access for adding data, views for retrieval) and restructuring it. So we go from (image: beforeTables)
      • to
      • (image: afterTables)
      • The classes are annotated POJOs in a simple hierarchy. The classes that have ‘Base’ in their names I expect to be extended, though there may be enough capability here. GuidBase has some additional capability to make sure that data added to one class that has a relation to another class gets filled out properly in both (image: JavaClassHierarchy). Since multiple dictionary entries can be present in multiple corpora, BaseDictionaryEntry and Corpus both have a <Set> of BaseEntryContext that connects the corpora and entries with additional information that might be useful, such as counts.
      • This manifests itself in the database as the following (image: ER Diagram). It’s not the prettiest drawing, but I can’t get IntelliJ to draw any better. You can see that the tables match directly to the classes. I used the InheritanceType.JOINED strategy since Jeremy was concerned about wasted space in the tables.
      • The next steps will be to start to create test cases that allow for tuning and testing of this setup at different data scales.
    • Eiphcone 132 – Document current progress on relationship/taxonomy design & existing threat model
      • Currently, a threat is extracted by comparing a set of known entities to surrounding text for keywords. In the model shown above, practitioners would exist in a network that includes items like the practice, attending hospitals, legal representation, etc. Because of this relationship, flags could be extended to the other members of the network. If a near neighbor in this network has a Flag attached, it will weight the surrounding edges and influence the practitioner. So if one doctor in a practice is convicted of malpractice, then other doctors in the practice will get lower scores.
      • The dictionary and corpus can interact as their own network to determine the amount of weight that is given to a particular score. For example, words in a dictionary that are used to extract data from a legal corpus may have more weight than a social media corpus.
    • Eiphcone 134 – Design/document NER processing in relation to future taxonomy
      • I compiled and ran the NER codebase and also walked through the Stanford NLP documentation. The current NER system looks to be somewhat basic, but solid and usable. Using it to populate the dictionaries and annotate the corpus appears to be a straightforward addition to the capabilities already present in the Stanford API.
    • Demo – I don’t really have a demo, unless people want to see some tests compile and run. To save time, I have this exciting printout that shows the return of dynamically created data:
[EL Info]: 2016-01-12 14:09:40.481--ServerSession(1842102517)--EclipseLink, version: Eclipse Persistence Services - 2.6.1.v20150916-55dc7c3
[EL Info]: connection: 2016-01-12 14:09:40.825--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService login successful

Users
firstName(firstname_0), lastName(lastname_0), login(login_0), networks( network_0)
firstName(firstname_1), lastName(lastname_1), login(login_1), networks( network_4)
firstName(firstname_2), lastName(lastname_2), login(login_2), networks( network_3)
firstName(firstname_3), lastName(lastname_3), login(login_3), networks( network_1 network_2)
firstName(firstname_4), lastName(lastname_4), login(login_4), networks()

Networks
name(network_0), owner(login_0), type(WAMPETER), archived(false), public(false), editable(true)
	[92]: name(DataNode_6_to_BaseNode_8), guid(network_0_DataNode_6_to_BaseNode_8), weight(0.5708945393562317), type(IDENTITY), network(network_0)
		Source: [86]: name('DataNode_6'), type(ENTITIES), annotation('annotation_6'), guid('50836752-221a-4095-b059-2055230d59db'), double(18.84955592153876), int(6), text('text_6')
		Target: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
	[91]: name(BaseNode_5_to_UrlNode_4), guid(network_0_BaseNode_5_to_UrlNode_4), weight(0.3703539967536926), type(COMPUTED), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[98]: name(BaseNode_5_to_UrlNode_1), guid(network_0_BaseNode_5_to_UrlNode_1), weight(0.4556456208229065), type(ENTITIES), network(network_0)
		Source: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_1), owner(login_3), type(WAMPETER), archived(false), public(false), editable(true)
	[96]: name(BaseNode_2_to_UrlNode_1), guid(network_1_BaseNode_2_to_UrlNode_1), weight(0.5733484625816345), type(URL), network(network_1)
		Source: [82]: name('BaseNode_2'), type(ITEM), annotation('annotation_2'), guid('c5867557-2ac3-4337-be34-da9da0c7e25d')
		Target: [81]: name('UrlNode_1'), type(UNKNOWN), annotation('annotation_1'), guid('f9693110-6b5b-4888-9585-99b97062a4e4'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[95]: name(DataNode_0_to_UrlNode_7), guid(network_1_DataNode_0_to_UrlNode_7), weight(0.85154128074646), type(MERGE), network(network_1)
		Source: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
	[94]: name(DataNode_9_to_BaseNode_5), guid(network_1_DataNode_9_to_BaseNode_5), weight(0.72845458984375), type(KEYWORDS), network(network_1)
		Source: [89]: name('DataNode_9'), type(USER), annotation('annotation_9'), guid('5bdb67de-5319-42db-916e-c4050dc682dd'), double(28.274333882308138), int(9), text('text_9')
		Target: [85]: name('BaseNode_5'), type(RATING), annotation('annotation_5'), guid('bf28f478-626d-4e8f-9809-b4a37f2ad504')

name(network_2), owner(login_3), type(EXPLICIT), archived(false), public(false), editable(true)
	[90]: name(BaseNode_8_to_UrlNode_7), guid(network_2_BaseNode_8_to_UrlNode_7), weight(0.2619180679321289), type(WAMPETER), network(network_2)
		Source: [88]: name('BaseNode_8'), type(COMPUTED), annotation('annotation_8'), guid('77250282-3b5e-416e-a469-bbade10c5e88')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')

name(network_3), owner(login_2), type(EXPLICIT), archived(false), public(false), editable(true)
	[93]: name(UrlNode_4_to_DataNode_3), guid(network_3_UrlNode_4_to_DataNode_3), weight(0.7689594030380249), type(ITEM), network(network_3)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
	[97]: name(DataNode_3_to_DataNode_0), guid(network_3_DataNode_3_to_DataNode_0), weight(0.5808262825012207), type(URL), network(network_3)
		Source: [83]: name('DataNode_3'), type(UNKNOWN), annotation('annotation_3'), guid('e7565935-6429-451f-b7f4-cc2d612ca3fd'), double(9.42477796076938), int(3), text('text_3')
		Target: [80]: name('DataNode_0'), type(USER), annotation('annotation_0'), guid('e9b7fa0a-37f1-41bd-a2c1-599841d1507a'), double(0.0), int(0), text('text_0')

name(network_4), owner(login_1), type(ITEM), archived(false), public(false), editable(true)
	[99]: name(UrlNode_4_to_UrlNode_7), guid(network_4_UrlNode_4_to_UrlNode_7), weight(0.48601675033569336), type(WAMPETER), network(network_4)
		Source: [84]: name('UrlNode_4'), type(IDENTITY), annotation('annotation_4'), guid('bffe13ae-bb70-46a6-b1b4-9f58cadad04e'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')
		Target: [87]: name('UrlNode_7'), type(QUERY), annotation('annotation_7'), guid('b9351194-d10e-4f6a-b997-b84c61344fcf'), Date(2016-01-11 11:51Z), html(some text), text('some text'), link('http://source.com/source.html'), image('http://source.com/soureImage.jpg')


Dictionaries
[30]: name(dictionary_0), guid(943ea8b6-6def-48ea-8b0f-a4e52e53954f), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_11
	Parent = word_10
	word_11 has 790 occurances in corpora0_chapter_1

	Entry = word_14
	word_14 has 4459 occurances in corpora1_chapter_2

	Entry = word_1
	Parent = word_0
	word_1 has 3490 occurances in corpora1_chapter_2

	Entry = word_10
	word_10 has 3009 occurances in corpora3_chapter_4

	Entry = word_4
	word_4 has 2681 occurances in corpora3_chapter_4

	Entry = word_5
	Parent = word_4
	word_5 has 5877 occurances in corpora1_chapter_2


[31]: name(dictionary_1), guid(c7b62a4b-b21a-4ebe-a939-0a71a891a3f9), Owner(login_0), archived(false), public(false), editable(true)
	Entry = word_3
	Parent = word_2
	word_3 has 4220 occurances in corpora0_chapter_1

	Entry = word_6
	word_6 has 4852 occurances in corpora2_chapter_3

	Entry = word_17
	Parent = word_16
	word_17 has 8394 occurances in corpora2_chapter_3

	Entry = word_2
	word_2 has 1218 occurances in corpora3_chapter_4

	Entry = word_19
	Parent = word_18
	word_19 has 8921 occurances in corpora2_chapter_3

	Entry = word_8
	word_8 has 4399 occurances in corpora3_chapter_4



Corpora
[27]: name(corpora1_chapter_2), guid(08803d93-deeb-4699-bdb2-ffa9f635c373), totalWords(1801), importer(login_1), url(http://americanliterature.com/author/herman-melville/book/moby-dick-or-the-whale/chapter-2-the-carpet-bag)
	word_15 has 5338 occurances in corpora1_chapter_2
	word_13 has 2181 occurances in corpora1_chapter_2
	word_14 has 4459 occurances in corpora1_chapter_2
	word_1 has 3490 occurances in corpora1_chapter_2
	word_5 has 5877 occurances in corpora1_chapter_2
	word_16 has 2625 occurances in corpora1_chapter_2

[EL Info]: connection: 2016-01-12 14:09:41.116--ServerSession(1842102517)--/file:/C:/Development/Sandboxes/JPA_2_1/out/production/JPA_2_1/_NetworkService logout successful
  • Sprint review delayed. Tomorrow
  • Filling in some knowledge holes in JPA. Finished Chapter 4.
  • Tried getting enumerated types to work. No luck…?
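
Going back to the Common Crawl idea at the top of this entry, here is a rough sketch of one way to put a number on the bidirectional versus unidirectional link idea. It is only an assumption about what a useful measure might look like: instead of a raw ratio (which divides by zero when there are no one-way links) it reports the fraction of linked site pairs that link in both directions. The Link class and the site names are hypothetical, and nothing here reads Common Crawl data.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class LinkReciprocitySketch {

    /** A directed link between two hypothetical site names. */
    static final class Link {
        final String from;
        final String to;

        Link(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Link && ((Link) o).from.equals(from) && ((Link) o).to.equals(to);
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }
    }

    /** Fraction of linked pairs that link in both directions; 0 when there are no links. */
    static double reciprocity(Set<Link> links) {
        int bidirectional = 0;  // counted once per direction, so each mutual pair adds 2
        int unidirectional = 0;
        for (Link l : links) {
            if (l.from.equals(l.to)) continue; // ignore self-links
            if (links.contains(new Link(l.to, l.from))) bidirectional++;
            else unidirectional++;
        }
        int mutualPairs = bidirectional / 2;
        int totalPairs = mutualPairs + unidirectional;
        return totalPairs == 0 ? 0.0 : (double) mutualPairs / totalPairs;
    }

    public static void main(String[] args) {
        Set<Link> links = new HashSet<>();
        links.add(new Link("site_a", "site_b"));
        links.add(new Link("site_b", "site_a"));
        links.add(new Link("site_a", "site_c"));
        links.add(new Link("site_c", "site_d"));
        System.out.println("reciprocity = " + reciprocity(links));
    }
}
```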

Phil 1.11.16

7:00 – 3:00 VTX

  • Good bye David Bowie. I was hoping to see you on tour this year.
  • Working my way through papers, building a corpus and a taxonomy
  • The last sprint task is to “Design/document NER processing in relation to future taxonomy”. I think that’s the dictionary/corpus integration, but I need to check with Aaron, since he wrote it…
  • Beware is software that scans public records for risks that police face when engaging with the public. Threats and dashboards. From WaPo.
  • Added in the BaseEntryContext to replace the Join table between Corpus and BaseDictionaryEntry. It’s nice actually, I’d rather have a join table that actually does something. It’s based on this Stack Overflow post.

Phil 1.8.16

8:00 – 5:00

  • Today is Roy Batty’s Birthday
  • Had a thought this morning. Rather than just having anonymous people post what they think is newsworthy, have a Journalist chatbot (something as simple as Eliza could work) tease out more information. The pattern of response, possibly augmented by server pulls for additional information might get to some really interesting responses, and a lot more input from the user.
  • Ok, now that I’ve got the path information figured out, migrating to vanilla JPA.
  • Viewing the SQL requires a library-specific property, but everything else is vanilla. This gets the tables built:
    <persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
        <persistence-unit name="NetworkService" transaction-type="RESOURCE_LOCAL">
            <class>com.philfeldman.mappings.GuidBase</class>
            <class>com.philfeldman.mappings.BaseAssociation</class>
            <class>com.philfeldman.mappings.BaseDictionary</class>
            <class>com.philfeldman.mappings.BaseDictionaryEntry</class>
            <class>com.philfeldman.mappings.BaseNetwork</class>
            <class>com.philfeldman.mappings.BaseNode</class>
            <class>com.philfeldman.mappings.BaseUser</class>
            <class>com.philfeldman.mappings.Corpus</class>
            <class>com.philfeldman.mappings.DataNode</class>
            <class>com.philfeldman.mappings.NetworkType</class>
            <class>com.philfeldman.mappings.UrlNode</class>
            <validation-mode>NONE</validation-mode>
            <properties>
                <property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver"/>
                <property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost:3306/projpa"/>
                <property name="javax.persistence.jdbc.user" value="root"/>
                <property name="javax.persistence.jdbc.password" value="edge"/>
                <property name="javax.persistence.schema-generation.database.action" value="drop-and-create"/>
                <!-- enable this property to see SQL and other logging -->
                <property name="eclipselink.logging.level" value="FINE"/>
            </properties>
        </persistence-unit>
    </persistence>
  • Here’s a simple JPA commit:
    public void addUsers(int num){
        em.getTransaction().begin();
        for(int i = 0; i < num; ++i) {
            BaseUser bu = new BaseUser("firstname_" + i, "lastname_" + i, "login_" + i, "password_" + i);
            em.persist(bu);
        }
    
        em.getTransaction().commit();
    }
  • Here’s a simple Criteria pull:
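    public void getAllUsers(){
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<BaseUser> cq = cb.createQuery(BaseUser.class);
        // a portable criteria query needs an explicit root (javax.persistence.criteria.Root)
        Root<BaseUser> root = cq.from(BaseUser.class);
        cq.select(root);
        TypedQuery<BaseUser> tq = em.createQuery(cq);
        users = new ArrayList<>(tq.getResultList());
    }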
  • Here’s a more sophisticated query. This can be made much better easily, but that’s for next week.
    System.out.println("\nDictionaries");
    String Query = "SELECT bd FROM dictionaries bd WHERE bd.owner.login LIKE '%_4%'";
    TypedQuery<BaseDictionary> dictQuery = em.createQuery(Query, BaseDictionary.class);
    List<BaseDictionary> bds = dictQuery.getResultList();
    for(BaseDictionary bd : bds){
        System.out.println(bd.toString());
    }
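
One thing to watch in the LIKE pattern above: in JPQL, as in SQL, an underscore matches any single character, so '%_4%' matches any login with a 4 after at least one character, not only logins containing the literal text "_4". A small sketch of one way to handle that, assuming a hypothetical helper named escapeLike and '!' chosen as the escape character. The escaped pattern would be bound as a parameter to a query ending in LIKE :pattern ESCAPE '!'.

```java
class LikeEscapeSketch {

    /** Prefixes the LIKE wildcards '%' and '_' (and the escape character '!' itself) with '!'. */
    static String escapeLike(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (c == '!' || c == '%' || c == '_') sb.append('!');
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Intended for a JPQL query such as:
        //   SELECT bd FROM dictionaries bd WHERE bd.owner.login LIKE :pattern ESCAPE '!'
        String pattern = "%" + escapeLike("_4") + "%";
        System.out.println(pattern);
    }
}
```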

Phil 1.7.16

7:00 – 4:00 VTX

  • Adding more codes in Atlas.
  • Found a good stemming algorithm/implementation, including Java
  • Discovered the Lemur Project: “The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset. Our software and datasets are used widely in scientific and research applications, as well as in some commercial applications.”
  • Also discovered that TREC has sets of queries that they use. Here’s an example
  • Ok. Pro JPA, Chapter 2.
    • Got the example code from here
    • How to drop and create the DB from the schema: http://antoniogoncalves.org/2014/12/11/generating-database-schemas-with-jpa-2-1/
    • STILL having problems finding the provider! – Exception in thread “main” javax.persistence.PersistenceException: No Persistence provider for EntityManager
    • Finally found this on stackoverflow
      • Go to Project Structure.
      • Select your module.
      • Find the folder in the tree on the right and select it.
      • Click the Sources button above that tree (with the blue folder) to make that folder a sources folder.
    • And that worked. Here’s the ‘after’ screenshot (image: AddToIntelliJPath)

Phil 1.6.16

10:30 – 6:00 VTX

  • Took Mom in for a colonoscopy. Her insides are looking good for 89 years old…
  • Was able to generate a matrix of codes from AtlasTi, which means that I should be able to do centrality calculations of the Excel exports.
  • Also placed the main Atlas work files in SVN. It’s a little tricky since the project library is on Google Drive. My fix has been to leave the ‘MyLibrary’ location in its default location and just update the library information when asked. I think it’s just populating a file in the emptier(?) library file. I think it’s important for the Google Drive file locations to be identical though.
  • Flailing stupidly at getting a JPA hello world to run. Constantly getting: Exception in thread “main” javax.persistence.PersistenceException: No Persistence provider for EntityManager named instrument
  • Trying to flail a little smarter. Got Pro JPA 2, 2nd ed.
  • Added checking to the criteria string so that if there is no match on the criteria field in question, it’ll throw an exception.

Phil 12.31.15

7:00 – 4:00 VTX

  • Decided to get a copy (hopefully with student discount) of Atlas. It does taxonomic analysis and outputs a matrix to Excel that I should be able to use to produce PageRank???
  • Timesheets! Done
  • Seeing if I can add a bit of reflection to addXXX(YYY) that will invoke add/setYYY(XXX). Since the target is a map, it shouldn’t care, but I do need to worry about recursion…
  • Added addSourceToTarget() and testSourceInTarget() to GuidBase. So now addM2M looks like
    public void addM2M(GuidBase obj) throws Exception {
        System.out.println("person.addM2M");
        addSourceToTarget(this, obj);
        addSourceToTarget(obj, this);
    }

    and the example of Showroom.addCar() looks like

    public void addCar(Car car){
        if(cars == null){
            cars = new HashSet<>();
        }
        cars.add(car);
    
        try {
            if(!testSourceInTarget(this, car)){
                addSourceToTarget(this, car);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    Which now means that the two way mapping is automatic. And in case you’re wondering, testSourceInTarget looks first for a method that returns the source type, and then looks for a method that returns a Set<source type>. If it finds the source in either one of those, it returns true.

  • Got queries running. Simple queries are easy, but the more complex ones can be pretty ugly. Here’s an example that pulls a Showroom Object based on a nested Person Object (instanced as ‘customer’):
    // do a query based on a nested item's value. Kinda yucky...
    Criteria criteria = session.createCriteria(Showroom.class, "sh");
    criteria.createAlias("sh.customers", "c");
    List result = criteria.add(Restrictions.like("c.name", "%Aaron%")).list();
    for(Object o : result){
        System.out.println(o.toString());
    }
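
For the curious, the lookup that testSourceInTarget is described as doing earlier in this entry can be sketched with plain reflection. This is a guess at the shape, not the GuidBase code: it is a static method, it checks both cases in a single pass over the declared no-argument methods, and the Showroom, Car and Person classes are hypothetical stand-ins.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashSet;
import java.util.Set;

class SourceInTargetSketch {

    /** True if a no-arg method on target returns source itself, or a Set of source's type containing it. */
    static boolean testSourceInTarget(Object source, Object target) {
        Class<?> sourceType = source.getClass();
        try {
            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.getParameterCount() != 0) continue;
                m.setAccessible(true);
                if (m.getReturnType() == sourceType) {
                    if (m.invoke(target) == source) return true;
                } else if (isSetOf(m.getGenericReturnType(), sourceType)) {
                    Set<?> set = (Set<?>) m.invoke(target);
                    if (set != null && set.contains(source)) return true;
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return false;
    }

    /** True if type is exactly Set<elementType>. */
    static boolean isSetOf(Type type, Class<?> elementType) {
        if (!(type instanceof ParameterizedType)) return false;
        ParameterizedType pt = (ParameterizedType) type;
        return pt.getRawType() == Set.class
                && pt.getActualTypeArguments().length == 1
                && pt.getActualTypeArguments()[0] == elementType;
    }

    // Hypothetical stand-ins for mapped classes.
    static class Showroom { }

    static class Car {
        Showroom showroom;
        Showroom getShowroom() { return showroom; }
    }

    static class Person {
        final Set<Showroom> showrooms = new HashSet<>();
        Set<Showroom> getShowrooms() { return showrooms; }
    }

    public static void main(String[] args) {
        Showroom showroom = new Showroom();
        Car car = new Car();
        car.showroom = showroom;
        Person person = new Person();
        System.out.println(testSourceInTarget(showroom, car));
        System.out.println(testSourceInTarget(showroom, person));
    }
}
```

Only methods declared directly on the target class are inspected here, so getters inherited from a base class would be missed.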

Phil 12.30.15

7:00 – 5:00 VTX

  • Finished up notes on the Gezi paper
  • Back to hibernate
    • finishing up M2M method
    • Have both items pointing at each other. Currently throwing a plain Exception. Will probably need to change that later, if the message turns out not to be enough.
    • Moved M2M to the GuidBase class and tested with the showroom example I’ve been playing with. Success! Here’s the use case from the header:
      Class that sets up the mutual relationships between two classes that have to have access to each other's
      Sets in the context of a 'ManyToMany' Hibernate configuration. To set up the classes, do the following:
      1) Add the annotations that define the mapping table in one class - e.g:
      For the class 'Kitten'
         @ManyToMany(cascade = CascadeType.ALL)
         @JoinTable(name = "kittens_puppies",
            joinColumns = @JoinColumn(name = "kitten_id"),
            inverseJoinColumns = @JoinColumn(name = "puppy_id")
         )
         private Set<Puppy> puppies;
      Similarly, for the class 'Puppy'
      @ManyToMany(cascade = javax.persistence.CascadeType.ALL, mappedBy = "puppies" )
       private Set<Kitten> kittens;
      Each class will need an 'addXXX(XXX xxx)' method that adds a single element of XXX to the Set. This is the
      template that M2M is looking for. There needs to be one and only one method for each mapping.
      An example from the class 'Kitten' is shown below:
         public void addPuppy(Puppy puppy){
             if(puppies == null){
                 puppies = new HashSet<>();
             }
             puppies.add(puppy);
         }
      Lastly, the code that handles the session needs to call xxx.M2M once for each of the relationships:
      session.beginTransaction();
      Kitten k1 = new Kitten();
      Kitten k2 = new Kitten();
      Kitten k3 = new Kitten();
      Kitten k4 = new Kitten();
      Puppy p1 = new Puppy();
      Puppy p2 = new Puppy();
      Puppy p3 = new Puppy();
      Puppy p4 = new Puppy();
      k1.M2M(p1);
      k1.M2M(p2);
      k1.M2M(p2);
      k1.M2M(p4);
      k2.M2M(p2);
      k2.M2M(p4);
      k3.M2M(p1);
      k3.M2M(p3);
      session.save(k1);
      session.save(k2);
      session.save(k3);
      session.save(k4);
      session.getTransaction().commit();
    • And that seems to be working! I got a little confused as to which item should be mapped to, but now understand that the collection to be mapped to has to (should?) be the Set<XXX> of items that are referenced in the YYY class that contains the @JoinTable annotations.
    • Need to add a relation between networks and associations and items. Done.
    • Time to figure out queries. Get all the networks names for a user, then get a network, that sort of thing.
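
A stripped-down sketch of how the M2M lookup described in the header above might work, without Hibernate. This is an assumption about the mechanism, not the GuidBase implementation: it uses static helpers where the real M2M is an instance method, it finds the single one-argument add method by name prefix and parameter type, and the Kitten and Puppy classes here are bare stand-ins with no annotations.

```java
import java.lang.reflect.Method;
import java.util.HashSet;
import java.util.Set;

class M2MSketch {

    /** Finds the one and only one-argument "add..." method on target that takes source's type. */
    static Method findAdder(Object source, Object target) {
        Method found = null;
        for (Method m : target.getClass().getDeclaredMethods()) {
            if (!m.getName().startsWith("add") || m.getParameterCount() != 1) continue;
            if (m.getParameterTypes()[0] != source.getClass()) continue;
            if (found != null) {
                throw new IllegalStateException("more than one add method for "
                        + source.getClass().getSimpleName());
            }
            found = m;
        }
        if (found == null) {
            throw new IllegalStateException("no add method for " + source.getClass().getSimpleName());
        }
        return found;
    }

    /** Invokes target.addXXX(source) through reflection. */
    static void addSourceToTarget(Object source, Object target) {
        try {
            Method adder = findAdder(source, target);
            adder.setAccessible(true);
            adder.invoke(target, source);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sets up both sides of the relationship. */
    static void m2m(Object a, Object b) {
        addSourceToTarget(a, b);
        addSourceToTarget(b, a);
    }

    // Bare stand-ins for the annotated entity classes.
    static class Kitten {
        final Set<Puppy> puppies = new HashSet<>();
        void addPuppy(Puppy puppy) { puppies.add(puppy); }
    }

    static class Puppy {
        final Set<Kitten> kittens = new HashSet<>();
        void addKitten(Kitten kitten) { kittens.add(kitten); }
    }

    public static void main(String[] args) {
        Kitten k1 = new Kitten();
        Puppy p1 = new Puppy();
        m2m(k1, p1);
        System.out.println(k1.puppies.contains(p1) && p1.kittens.contains(k1));
    }
}
```

Because the add methods here only touch their own Set, calling both directions cannot recurse. An add method that itself calls back into the other side would need a guard like the testSourceInTarget check.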

Phil 11.9.15

7:00 – 3:00 SR

  • Training
  • Got all the Java files built and burned to disk. The main problem that I had was getting a Tomcat runtime instance to show up. Here was the fix: http://stackoverflow.com/questions/2000078/apache-tomcat-not-showing-in-eclipse-server-runtime-environments