Phil 9.9.16

7:00 – 5:00 ASRC

Finished section 3.14
Back to reading Data Mining.Currently on Chapter 3. Done!
Chapter 4.
Discussion with Aaron, then Bob about sprint-ish planning.

Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration

Jonathan D. Cohen
Samuel M. McClur
Angela J. Yu
…often our decisions depend on a higher level choice: whether to exploit well known but possibly suboptimal alternatives or to explore risky but potentially more proﬁtable ones. How adaptive agents choose between exploitation and exploration remains an important and open question that has received relatively limited attention in the behavioural and brain sciences. The choice could depend on a number of factors, including the familiarity of the environment, how quickly the environment is likely to change and the relative value of exploiting known sources of reward versus the cost of reducing uncertainty through exploration.
The need to balance exploitation with exploration is confronted at all levels of behaviour and time-scales of decision making from deciding what to do next in the day to planning a career path.
The signiﬁcance of Gittins’ contribution is that it reduced the decision problem to computing and comparing these scalar indices. In practice, computing the Gittins index is not tractable for many problems for which it is known to be optimal. However, for some limited problems, explicit solutions have been found. For instance, the Gittins index has been computed for certain two armed bandit problems (in which the agent chooses between two options with independent probabilities of generating a reward), and compared to the foraging behaviour of birds under comparable circumstances; the birds were found to behave approximately optimally
Perhaps, the most important exception to Gittins’ assumptions is that real-world environments are typically non-stationary; i.e. they change with time. To understand how organisms manage the balance between exploration and exploitation in non-stationary environments, investigators have begun to study how organisms adapt their behaviour in response to the experimentally induced changes in reward contingencies. Several studies have now shown that both humans and other animals dynamically update their estimates of rewards associated with speciﬁc courses of action, and abandon actions that are deemed to be diminishing in value in search of others that may be more rewarding
At the same time, there is also longstanding evidence that humans sometimes exhibit an opposing tendency. When reward diminishes (e.g. following an error in performance), subjects often try harder at what they have been doing rather than less (e.g. Rabbitt 1966; Laming 1979; Gratton et al. 1992).
The balance between exploration and exploitation also seems to be sensitive to time horizons. Humans show a greater tendency to explore when there is more time left in a task, presumably because this allows them sufﬁcient time later to enjoy the fruits of those explorations (Carstensen et al. 1999). – is this related to (lack of) stress? Something about cognitive bandwidth?
Bandit problems are well suited for studying the tension between exploitation and exploitation since they offer a direct trade-offbetween exploiting a known source of reward (continuing to play one arm of the bandit) and exploring the environment (trying other arms) to acquire information about other sources of reward
The investigators found that the time at which birds stopped exploring (operationalized as the point at which they stayed at one feeding post) closely approximated that predicted by the optimal solution. Despite their ﬁndings, Krebs et al. (1978) recognized that it was highly unlikely that their birds were carrying out the complex calculations required by the Gittins index. Rather, they suggested that the birds were using simple behavioural heuristics that produces exploration times that qualitatively approximate the optimal solution – this might be good for the modelling section.
Nevertheless, to our knowledge, the Daw et al. (2006)study was the ﬁrst to address formally the question of how subjects weigh exploration against exploitation in a non-stationary, but experimentally controlled environment. It also produced some interesting neurobiological ﬁndings. Their subjects performed the n-armed bandit task while being scanned using functional magnetic resonance imaging (fMRI). Among the observations reported was task-related activity in two sets of regions of prefrontal cortex (PFC). One set of regions was in ventromedial PFC and was associated with both the magnitude of reward associated with a choice, and that predicted by their computational model of the task (using the softmax decision rule). This area has been consistently associated with the encoding of reward value across a variety of task domain – biological basis for different behaviors
Yu & Dayan (2005) proposed that a critical function of two important neuromodulators—acetylcholine (ACh) and norepinephrine (NE)—may be to signal expected and unexpected sources of uncertainty. While the model they developed for this was not intended to address the trade-off between exploitation and exploration, the distinction between expected and unexpected uncertainty is likely to be an important factor in regulating this trade-off. For example, the detection of unexpected uncertainty can be an important signal of the need to promote exploration.
…the distinction between expected and unexpected forms ofuncertainty may be an important element in choosing between exploitation versus exploration. As long as prediction errors can be accounted for in terms of expected uncertainty—that is the amount that we expect a given outcome to vary—then all other things being equal (e.g. ignoring potential non-stationarities in the environment), we should persist in our current behaviour (exploit). However, if errors in prediction begin to exceed the degree expected—i.e. unexpected uncertainty mounts—then we should revise our strategy and consider alternatives (explore).
Yu & Dayan (2005) proposed that ACh levels are used to signal expected uncertainty, and NE to signal unexpected uncertainty. They describe a computationally tractable algorithm by which these maybe estimated that approximates the Bayesian optimal computation of those estimates. Furthermore, they proposed how these estimates, reﬂected by NE and ACh levels, could be used to determine when to revise expectations

Phil 9.8.16

7:00 – 4:00 ASRC

Shimei and Wayne have responded to the Doodle that I sent out last night. No reading as a result.
Need to write an abstract!
Working through the TODOs
- Working on confirming, avoiding and exploring section.
  - Wow. Need to take a look at this as an in-depth analysis of explorer patterns: Human-inspired algorithms for search A framework for human-machine multi-armed bandit problems. It’s Paul Reverdy’s Dissertation, and he’s done some followup work as well. Send him an email!
  - Looking through the above, found this reference: Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Downloading and reading.
  - And there’s this as well: Humans Use Directed and Random Exploration to Solve the Explore–Exploit Dilemma “We thus conclude that both information seeking and choice variability can be controlled and put to use in the service of exploration.” In other words, there is biological support for the belief that this can be designed for.
Lunchtime ride thoughts
- Social Trust is the prisoner’s dilemma. It depends on negotiation. The natural communication is stories. Behaviors are dominance, submission, rejection, etc… God of stories
- System Trust is the multi-armed bandit problem. It depends on navigation. The natural communication is diagrams and maps. Behaviors are explore/exploit
- Collection Trust is about storage and access. It depends on counting The natural communication is lists and numbers. Behaviors are organizing, misplacing, loosing, hoarding, etc
- Knowledge Dieties
True to my word, I now have a WebExceptionHandler that launches stackoverflow.
Need to register for EMNLP 2016. Early registration ends October 1.
Reviewing Chapter2
Reviewing Chapter 3
Discussion with Aaron about building corpora
- Build a LanguageModelNetworks browser using WebView. Backend connects to DB for clickstream logging, page storage, CSEs, etc.
  - Name/select the collection that’s being worked on
  - Enter the search term(s)
  - Results come back from the specified CSEs
  - When a page is found that looks good, add it to the collection
    - TF-IDF and centrality is calculated based on the updated corpus (tab for the current display that allows for manipulation). Top n words are made available for insertion into the search term
    - Tag the page with some kind of smart, integrated tagger?
  - Rinse, lather, repeat.

Phil 9.7.16

7:00 – 4:30 ASRC

Sent a follow up note to Shimei. Regardless, I’ll send out the schedule options this evening. Thinking about the 27th and 28th as my preference. Structure so that it begins before rush hour and ends after for Thom?
Fixed the research through design section to focus on explicitly designing fo behaviors.
Need to read up on what worked yesterday – NaiveBayes, SGD (stochastic gradient descent), SMO (sequential minimal optimization algorithm for training a support vector classifier)
- NaiveBayes
- SGD
- SMO
Try implementing NB in straight Java?
Talked to Bob and was re-inspired to build a static method that fires off the browser to stackoverflow with the exception.

Phil 9.6.16

7:00 – 4:30 ASRC

Have everyone’s schedule for proposal but Shimei.
Saw an interesting article in Gamasutra on Behaviourism, In-game Economies and the Steam Community Market, which led me to get Hooked: How to Build Habit-Forming Products, which should be good for gamification of the UI
Working on section 1.6 – What the rest of the proposal looks like. Kinda done?
Just found a blog post that mentions this reviewer guideline for registered reports, which is kind of like a study proposal, where the research methods of a paper are submitted before the study is done. Interesting. Need to make sure that my proposal fits with this…

Back to WEKA and the analysis of the physician data.

Overall stats – 30 ‘good’, 12 junk, per these rules in RatingObj2 in the GoogleCSE2 project:

public String junkOrGood(){
    boolean junk = true;
    if(personCharacterization.equals(INAPPROPRIATE)){
        return "junk";
    }
    if(sourceType.equals(MACHINE_GENERATED)){
        return "junk";
    }
    if(qualityCharacterization.equals(LOW) || qualityCharacterization.equals(MINIMAL))
    {
        return "junk";
    }
    if(trustworthiness.equals(NOT_CREDIBLE) || trustworthiness.equals(DISTRUSTWORTHY) || trustworthiness.equals(VERY_DISTRUSTWORTHY)){
        return "junk";
    }
    return "good";
}

This shows the second pass using just the text. It turns out that the classifiers were targeting the meta information as the best predictor. And of course they were right. Pulled out the meta information and got the following. I do want to try some of the other meta information as well, like trustworthiness and see if there’s anything that makes sense. Not that this corpus is just html pages that were successfully downloaded and scanned. No MSWORD or PDF.

NaiveBayes:

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 33 78.5714 %
Incorrectly Classified Instances 9 21.4286 %
Kappa statistic 0.5116
Mean absolute error 0.2143
Root mean squared error 0.4629
Relative absolute error 51.8311 %
Root relative squared error 102.1856 %
Total Number of Instances 42 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 0.750 0.200 0.600 0.750 0.667 0.519 0.747 0.509 junk
 0.800 0.250 0.889 0.800 0.842 0.519 0.810 0.876 good
Weighted Avg. 0.786 0.236 0.806 0.786 0.792 0.519 0.792 0.771 

=== Confusion Matrix ===

 a b <-- classified as
 9 3 | a = junk
 6 24 | b = good

SGD (stochastic gradient descent):

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 35 83.3333 %
Incorrectly Classified Instances 7 16.6667 %
Kappa statistic 0.637 
Mean absolute error 0.1667
Root mean squared error 0.4082
Relative absolute error 40.3131 %
Root relative squared error 90.1193 %
Total Number of Instances 42 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 0.917 0.200 0.647 0.917 0.759 0.660 0.858 0.617 junk
 0.800 0.083 0.960 0.800 0.873 0.660 0.858 0.911 good
Weighted Avg. 0.833 0.117 0.871 0.833 0.840 0.660 0.858 0.827 

=== Confusion Matrix ===

 a b <-- classified as
 11 1 | a = junk
 6 24 | b = good

SMO (sequential minimal optimization algorithm for training a support vector classifier.):

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 32 76.1905 %
Incorrectly Classified Instances 10 23.8095 %
Kappa statistic 0.5139
Mean absolute error 0.2381
Root mean squared error 0.488 
Relative absolute error 57.5901 %
Root relative squared error 107.7131 %
Total Number of Instances 42 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 0.917 0.300 0.550 0.917 0.687 0.558 0.808 0.528 junk
 0.700 0.083 0.955 0.700 0.808 0.558 0.808 0.882 good
Weighted Avg. 0.762 0.145 0.839 0.762 0.773 0.558 0.808 0.781 

=== Confusion Matrix ===

 a b <-- classified as
 11 1 | a = junk
 9 21 | b = good

Multilayer Perceptron took a long time but didn’t produce any results?

Attribute Selected Classifier – J48(Dimensionality of training and test data is reduced by attribute selection before being passed on to a classifier.)

Time taken to build model: 1.41 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 34 80.9524 %
Incorrectly Classified Instances 8 19.0476 %
Kappa statistic 0.4815
Mean absolute error 0.2238
Root mean squared error 0.3805
Relative absolute error 54.1364 %
Root relative squared error 83.9928 %
Total Number of Instances 42 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 0.500 0.067 0.750 0.500 0.600 0.499 0.729 0.682 junk
 0.933 0.500 0.824 0.933 0.875 0.499 0.729 0.823 good
Weighted Avg. 0.810 0.376 0.803 0.810 0.796 0.499 0.729 0.783 

=== Confusion Matrix ===

 a b <-- classified as
 6 6 | a = junk
 2 28 | b = good

Discussion with Aaron about the upcoming epics for machine learning. I thin ka lot of this is going to be about classifying data well for subsequent learning

Phil 9.1.16

7:00 – 4:30 ASRC

Added Aaron and Don’s schedules to Wayne’s. Waiting on Shimei and Thom. Leaning to Oct 28 right now.
Added more support for section 1.4
Need to add some explorer support. This is really interesting: Rostrolateral Prefrontal Cortex and Individual Differences in Uncertainty-Driven Exploration.
Computational ants! https://medium.com/mit-media-lab/computational-ants-agent-based-visualization-with-od-matrices-fc7463d5c985#.2wteifa7j
Creating the config file from scanned physician. Having to fix more junk text. For some reason arff hates % and &. My regex should throw the items out but it fails for some reason. Ah, cut-and-paste error.
First results for curated data!

time taken to build model: 0.07 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 33 78.5714 %
Incorrectly Classified Instances 9 21.4286 %
Kappa statistic 0.5116
Mean absolute error 0.2143
Root mean squared error 0.4629
Relative absolute error 51.8311 %
Root relative squared error 102.1856 %
Total Number of Instances 42 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 0.750 0.200 0.600 0.750 0.667 0.519 0.757 0.540 junk
 0.800 0.250 0.889 0.800 0.842 0.519 0.810 0.876 good
Weighted Avg. 0.786 0.236 0.806 0.786 0.792 0.519 0.795 0.780 

=== Confusion Matrix ===

 a b <-- classified as
 9 3 | a = junk
 6 24 | b = good

Phil 8.31.16

7:00 – 5:00 ASRC

Put Wayne’s schedule for October and early November
Ping the rest of the Committee
- Aaron – done
- Don – done
- Shimei – done
- Thom – done
Onward with incorporating comments – added ‘fourth estate’ paragraph.
- I trust my favorite knife because I’ve used it before and I can feel it’s sharpness.
Working on building a corpus config file from my GoogleCSE results.
Need to add a To Arff menu selection and query.
- Query is running.
- Need a binary variable as to whether this is something we want to train on. Probably match plus high quality.

Phil 8.30.16

7:00 – 3:30 ASRC

Adding in Wayne’s comments.
Got the Corpus generating arff files for BagOfWords and TF-IDF.
Here’s the result for NaiveBayes on the first four chapters of Mobey Dick

Correctly Classified Instances 3 75 %
Incorrectly Classified Instances 1 25 %
Kappa statistic 0.6667
Mean absolute error 0.125 
Root mean squared error 0.3536
Relative absolute error 29.1667 %
Root relative squared error 71.4435 %
Total Number of Instances 4 

=== Detailed Accuracy By Class ===

 TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 1_-_Loomings
 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.250 3_-_The_Spouter_Inn
 1.000 0.333 0.500 1.000 0.667 0.577 0.833 0.500 2_-_The_Carpet_Bag
 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 4_-_The_Counterpane
Weighted Avg. 0.750 0.083 0.625 0.750 0.667 0.644 0.833 0.688 

=== Confusion Matrix ===

 a b c d <-- classified as
 1 0 0 0 | a = 1_-_Loomings
 0 0 1 0 | b = 3_-_The_Spouter_Inn
 0 0 1 0 | c = 2_-_The_Carpet_Bag
 0 0 0 1 | d = 4_-_The_Counterpane

This worked really well: weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4
And comparing Jack London stories to Edgar Allen Poe stories works with a corpus of six stories each and not so much with 3 stories each.

Phil 8.29.16

7:00 – 6:00 ASRC

Selective Use of News Cues: A Multiple-Motive Perspective on Information Selection in Social Media Environments – Quite close to the Explorer/Confirmer/Avoider study but using a custom(?) browsing interface that tracked the marking of news stories to read later. Subjects were primed for a task with motivations – accuracy, defense and impression. Added this to paragraph 2.9, where explorers are introduced.
Looked through Visual Complexity – Mapping Patterns of Information, and it doesn’t even mention navigation. Most information mapping efforts are actually graphing efforts. Added a paragraph in section 2.7
Added a TODO for groupthink/confirmation bias, etc.
Chat with Heath about AI.He’s looking to build a MUD agent and will probably wind up learning WEKA, etc. so a win, I think.
Working on getting the configurator to add string values.
Added to DocumentStatistics. Need to switch over to getSourceInfo() from getAddressStrings in the Configurator.
Meeting with Wayne about the proposal. One of the branches of conversation went into some research he did on library architecture. That’s been rattling around in my head.

We tend to talk about interface design where the scale is implicitly for the individual. The environment where these systems function is often thought of as an ecosystem, with the Darwinian perspective that goes along with that. But I think that such a perspective leads to ‘Survival of the Frictionlesss’, where the easiest thing to use wins and damn the larger consequences.

Reflecting on how the architecture and layout of libraries affected the information interactions of the patrons, I wonder whether we should be thinking about Information Space Architecture. Such a perspective means that the relationships between design at differing scales needs to be considered. In the real world, architecture can encompass everything from the chairs in a room to the landscaping around the building and how that building fits into the skyline.

I think that regarding information spaces as a designed continuum from the very small to very large is what my dissertation is about at its core. I want a park designed for people, not a wilderness, red in tooth and claw.

Phil 8.26.16

7:00 – 4:00 ASRC

Adding more model feedback
Something more to think about WRT Group Polarization models? Collective Memory and Spatial Sorting in Animal Groups
Need to be able to associate an @attribute key/value map with Labeled2Dmatrix rows so that we can compare different nominal values across a shared set of numeric columns. This may wind up being a derived class?
- Working on adding an array of key/value maps;
- Forgot to add the name to the @data section – oops!
- text is added to ARFF out. Should I add it to the xlsx outputs as well?
Here’s the initial run against the random test data within the class (L2D.arff).

=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes
Relation: testdata
Instances: 8
Attributes: 12
name
sv1
sv2
sv3
p1
p2
p3
p4
s1
s2
s3
s4
Test mode: split 66.0% train, remainder test

=== Classifier model (full training set) ===

Naive Bayes Classifier

Class
Attribute p1 p2 p3 p4 s1 s2 s3 s4
(0.13) (0.13) (0.13) (0.13) (0.13) (0.13) (0.13) (0.13)
=======================================================================
sv1
p4-sv1 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0
s2-sv1 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0
p2-sv1 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
s1-sv1 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0
[total] 4.0 5.0 4.0 5.0 5.0 5.0 4.0 4.0

sv2
p2-sv2 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
s4-sv2 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0
p1-sv2 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
s1-sv2 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0
[total] 5.0 5.0 4.0 4.0 5.0 4.0 4.0 5.0

sv3
p2-sv3 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0
p1-sv3 2.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
s4-sv3 1.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0
p3-sv3 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0
p4-sv3 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0
s2-sv3 1.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0
s1-sv3 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0
[total] 8.0 8.0 8.0 8.0 8.0 8.0 7.0 8.0

p1
mean 1 0 0 0 1 1 0 0
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

p2
mean 0 1 0 0 1 0 1 0
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

p3
mean 0 0 1 0 1 0 0 1
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

p4
mean 0 0 0 1 1 0 0 1
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

s1
mean 1 1 1 1 1 0 0 0
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

s2
mean 1 0 0 0 0 1 0 0
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

s3
mean 0 1 0 0 0 0 1 0
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1

s4
mean 0 0 1 1 0 0 0 1
std. dev. 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
weight sum 1 1 1 1 1 1 1 1
precision 1 1 1 1 1 1 1 1



Time taken to build model: 0 seconds

=== Evaluation on test split ===

Time taken to test model on training split: 0 seconds

=== Summary ===

Correctly Classified Instances 0 0 %
Incorrectly Classified Instances 3 100 %
Kappa statistic 0
Mean absolute error 0.2499
Root mean squared error 0.4675
Relative absolute error 108.2972 %
Root relative squared error 133.419 %
Total Number of Instances 3

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.000 0.333 0.000 0.000 0.000 0.000 ? ? p1
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.333 p2
0.000 0.333 0.000 0.000 0.000 0.000 ? ? p3
0.000 0.000 0.000 0.000 0.000 0.000 ? ? p4
0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.500 s1
0.000 0.000 0.000 0.000 0.000 0.000 1.000 1.000 s2
0.000 0.333 0.000 0.000 0.000 0.000 ? ? s3
0.000 0.000 0.000 0.000 0.000 0.000 ? ? s4
Weighted Avg. 0.000 0.000 0.000 0.000 0.000 0.000 0.500 0.611

=== Confusion Matrix ===

a b c d e f g h <-- classified as
0 0 0 0 0 0 0 0 | a = p1
0 0 0 0 0 0 1 0 | b = p2
0 0 0 0 0 0 0 0 | c = p3
0 0 0 0 0 0 0 0 | d = p4
0 0 1 0 0 0 0 0 | e = s1
1 0 0 0 0 0 0 0 | f = s2
0 0 0 0 0 0 0 0 | g = s3
0 0 0 0 0 0 0 0 | h = s4

Need to add text data from xml or from other(wrapper info? structured data? UI selections?) sources

Phil 8.25.16

7:00 – 3:30 ASRC

Paper
- More on contributions. Realized that I need a figure showing the relationships between individual and group behaviors.
- Found Driving a Wedge Between Evidence and Beliefs – How Online Ideological News Exposure Promotes Political Misperceptions. Nice look about awareness vs belief. The question is how this manifests. It could be that a person who believes false information may spend a lot of time looking at opposition information and as such creates an explorer pattern? Added to the paper archive.
- Added model feedback to section 5.4.6.2
Code
- Build class(s) that uses some of the CorpusBuilder (or just add to output?) codebase to
- Access webpages based on xml config file
- Read in, lemmatize , and build bag-of-words per page (configurable max). Done. Took out DF-ITF code and replaced it with BagOfWords in DocumentStatistics.
- Write out .arff file that includes the following elements
  - @method (TF-IDF, LSI, BOW)
  - @source (loomings, the carpet bag, the spouter inn, the counterpane)
  - @title (Moby-dick, Tarzan)
  - @author (Herman Melville, Edgar Rice Burroughs)
  - @words (nantucket,harpooneer,queequeg,landlord,euroclydon,bedford,lazarus,passenger,circumstance,civilized,water,thousand,about,awful,slowly,supernatural,reality,sensation,sixteen,awake,explain,savage,strand,curbstone,spouter,summer,northern,blackness,embark,tempestuous,expensive,sailor,purse,ocean,tomahawk,black,night,dream,order,follow,education,broad,stand,after,finish,world,money,where,possible,morning,light)
- So a line should look something like
  - LSI, chapter-1-loomings, Moby-dick, Herman Melville, 0,0,0,0,0,0,0,5,0,0,7,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,3,3,0,0,0,0,2,0,0,0,5,0,0,3,4,2,0,0,0
  - Updated LabledMatrix2D to generate arff files.

Phil 8.24.16

7:00 – 3:00 ASRC

Finished fact checking session
Starting contributions. Finished the first pass. That was quick!
Ordered my map of the internet

Weka Tutorials
- 1 – ARFF 101
- 2 – Data PreProcessing 101
- 3 – Classification 101
- 4 – Systematic Oversampling
Just stumbled across Deeplearning4j. Looks very nice.
- Attempting to set up

Phil 8.23.16

7:00 – 4:00 ASRC

Continuing to read The Sovereign Map. While thinking about the Twitter expert paper, I thought that maybe there were mapping projects of the Wikipedia, Schema.org or dmoz.org. I found this for Wikipedia.
xkcd maps
- Online Communities
- Online Communities 2 – and a really nice version
- IPV4 2006
Paper – continued work on fact-checking/crowdsourced data
Code
- Enable slider in fitnessTest – done
- Enable reading xml config files – done. Also setting the sliders from load
- Added Dom4j utils to JavaUtils2
- get started on WEKA – Starting with Emily’s intro. So far so good! Also ran a Naive Bayes classifier on the weather data set for Aaron to compare.

Phil 8.22.16

7:00 – 2:30 ASRC

Get the Porsche towed! Scheduled for 3:00
For tracking bike cases: http://www.phreakmonkey.com/2016/08/towl-telemetry-over-opportunistic-wifi.html
Paper
- Finishing up credibility. Skimming n the Wisdom of Experts vs. Crowds – Discovering Trustworthy Topical News in Microblogs. It’s really good. Need to drill down into this one. Page rank of user lists is used to determine credible experts.
- Starting on Crowdsourcing.
Code
- Changed fitnessTest() to work with adjusting statements based on similarity/dissimilarity between two agents.
- Added slider for similarity.
- Fixed some reloading bugs
- Found Emily’s WEKA blog

Phil 8.19.16

7:00 – 3:30 ASRC

Wrote up the action items from the discussion with Thom last night. Now that I have the committee’s initial input, I need to write up an email and re-distribute. Done.
Had a thought about the initial GP model. In the fitness test, look for beliefs that are more than ATTRACTION_THRESHOLD similar and be more like them. Possibly look for beliefs that are less than REPULSION_THRESHOLD similar and make the anti-belief more like them. If a statement exists in both belief and antibelief, delete the lowest ranked item, or choose randomly.
- Working through the logic in beliefMain. I’m just *slow* today.
- Think I got it. Had to write a method ‘rectifyBeliefs’ that goes in BaseBeliefCA and makes sure that beliefs and antibeliefs don’t overlap. And it’s late enough in the day that I don’t want to try it in the full sim.
Working through the fact-checking section
Submitted ACM, ICA and UMBC reimbursement requests.

viztales

Dimension reduction, State, Orientation, and Speed

Phil 9.9.16

Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration

Phil 9.8.16

Phil 9.7.16

Phil 9.6.16

Phil 9.1.16

Phil 8.31.16

Phil 8.30.16

Phil 8.29.16

Phil 8.26.16

Phil 8.25.16

Phil 8.24.16

Phil 8.23.16

Phil 8.22.16

Phil 8.19.16