7:00 – 3:30 ASRC
- Adding in Wayne’s comments.
- Got the Corpus generating ARFF files for BagOfWords and TF-IDF.
- Here’s the result for NaiveBayes on the first four chapters of Moby Dick:
    Correctly Classified Instances           3               75      %
    Incorrectly Classified Instances         1               25      %
    Kappa statistic                          0.6667
    Mean absolute error                      0.125
    Root mean squared error                  0.3536
    Relative absolute error                 29.1667 %
    Root relative squared error             71.4435 %
    Total Number of Instances                4

    === Detailed Accuracy By Class ===

                   TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                   1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     1_-_Loomings
                   0.000    0.000    0.000      0.000   0.000      0.000  0.500     0.250     3_-_The_Spouter_Inn
                   1.000    0.333    0.500      1.000   0.667      0.577  0.833     0.500     2_-_The_Carpet_Bag
                   1.000    0.000    1.000      1.000   1.000      1.000  1.000     1.000     4_-_The_Counterpane
    Weighted Avg.  0.750    0.083    0.625      0.750   0.667      0.644  0.833     0.688

    === Confusion Matrix ===

     a b c d   <-- classified as
     1 0 0 0 | a = 1_-_Loomings
     0 0 1 0 | b = 3_-_The_Spouter_Inn
     0 0 1 0 | c = 2_-_The_Carpet_Bag
     0 0 0 1 | d = 4_-_The_Counterpane
- This worked really well: weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4
- Comparing Jack London stories to Edgar Allan Poe stories works with a corpus of six stories each, but not so well with three stories each.
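
For reference, here is a rough sketch of what generating ARFF files for bag-of-words and TF-IDF features might look like. It is not the actual Corpus code: `tokenize`, `build_arff`, the `w_` attribute prefix, and the `tf * ln(N / df)` weighting are all assumptions made for illustration, and other TF-IDF variants are just as plausible.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_arff(docs, relation, weighting="bow"):
    """Return ARFF text for docs, a list of (label, text) pairs.

    weighting="bow" writes raw term counts; weighting="tfidf" writes
    count * ln(N / df), one common TF-IDF variant (an assumption here).
    Labels are assumed to contain no spaces or commas.
    """
    counts = [Counter(tokenize(text)) for _, text in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    labels = sorted({label for label, _ in docs})

    lines = [f"@relation {relation}", ""]
    # The w_ prefix keeps a token such as "class" from clashing
    # with the class attribute declared below.
    lines += [f"@attribute w_{t} numeric" for t in vocab]
    lines.append("@attribute class {" + ",".join(labels) + "}")
    lines += ["", "@data"]

    for (label, _), c in zip(docs, counts):
        if weighting == "tfidf":
            row = [f"{c[t] * math.log(n_docs / df[t]):.4f}" for t in vocab]
        else:
            row = [str(c[t]) for t in vocab]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("london", "the dog ran"), ("poe", "the raven")]
    print(build_arff(sample, "stories"))
    print(build_arff(sample, "stories_tfidf", weighting="tfidf"))
```

Writing the returned string to a `.arff` file should give something Weka can load, with one numeric attribute per vocabulary term and the nominal class attribute last.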
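
As a sanity check on the NaiveBayes output above, the accuracy and kappa statistic can be recomputed from the confusion matrix alone. This is a minimal sketch using the standard Cohen's kappa formula; the function name `accuracy_and_kappa` is made up for illustration and is not part of Weka.

```python
def accuracy_and_kappa(matrix):
    """Compute accuracy and Cohen's kappa from a square confusion matrix.

    Rows are actual classes, columns are predicted classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Confusion matrix from the Moby Dick run (rows a, b, c, d).
moby = [
    [1, 0, 0, 0],  # a = 1_-_Loomings
    [0, 0, 1, 0],  # b = 3_-_The_Spouter_Inn
    [0, 0, 1, 0],  # c = 2_-_The_Carpet_Bag
    [0, 0, 0, 1],  # d = 4_-_The_Counterpane
]

if __name__ == "__main__":
    accuracy, kappa = accuracy_and_kappa(moby)
    print(f"accuracy={accuracy:.2f} kappa={kappa:.4f}")
    # prints: accuracy=0.75 kappa=0.6667
```

With only four instances, the one misclassified chapter (The Spouter Inn predicted as The Carpet Bag) accounts for the entire 25% error, so these numbers say little about how the classifier would do on a larger corpus.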
