Category Archives: thesis

Phil 12.7.18

7:00 – 4:30 ASRC NASA/PhD

Analyzing Discourse and Text Complexity for Learning and Collaborating

Analyzing Discourse and Text Complexity for Learning and Collaborating

Author: Mihai Dascalu

Notes

  • …informational level, coherence is most frequently accounted for by: lexical chains (Morris and Hirst 1991; Barzilay and Elhadad 1997; Lapata and Barzilay 2005) (see 4.3.1 Semantic Distances and Lexical Chains), centering theory (Miltsakaki and Kukich 2000; Grosz et al. 1995) (see 4.2 Discourse Analysis and the Polyphonic Model) in which coherence is established via center continuation, or Latent Semantic Analysis (Foltz et al. 1993, 1998) (see 4.3.2 Semantic Similarity through Tagged LSA) used for measuring the cosine similarity between adjacent phrases
  • Among chat voices there are sequential and transversal relations, highlighting a specific point of view in a counterpointal way, as mentioned in previous work (Trausan-Matu and Rebedea 2009).
  • From a computational perspective, until recently, the goals of discourse analysis in existing approaches oriented towards conversation analysis were to detect topics and links (Adams and Martell 2008), dialog acts (Kontostathis et al. 2009), lexical chains (Dong 2006) or other complex relations (Rose et al. 2008) (see 3.1.3 CSCL Computational Approaches). The polyphonic model takes full advantage of term frequency – inverse document frequency Tf-Idf (Adams and Martell 2008; Schmidt and Stone), Latent Semantic Analysis (Schmidt and Stone; Dong 2006), Social Network Analysis (Dong 2006), Machine Learning (e.g., Naïve Bayes (Kontostathis et al. 2009), Support Vector Machines and Collin’s perceptron (Joshi and Rose 2007)), the TagHelper environment (Rose et al. 2008) and the semantic distances from the lexicalized ontology WordNet (Adams and Martell 2008; Dong 2006). The model starts from identifying words and patterns in utterances that are indicators of cohesion among them and, afterwards, performs an analysis based on the graph, similar to some extent to a social network, and on threads and their interactions.
  • Semantic Distances and Lexical Chains: an ontology consists of a set of concepts specific to a domain and of the relations between pairs of concepts. Starting from the representation of a domain, we can define various distance metrics between concepts based on the defined relationships among them and later on extract lexical chains, specific to a given text that consist of related/cohesive concepts spanning throughout a text fragment or the entire document.
    • Lexicalized Ontologies and Semantic Distances: One of the most commonly used resources for English sense relations in terms of lexicalized ontologies is the WordNet lexical database (Fellbaum 1998; Miller 1995, 2010) that consists of three separate databases, one for nouns, a different one for verbs, and a third one for adjectives and adverbs. WordNet groups words into sets of cognitively related words (synsets), thus describing a network of meaningfully inter-linked words and concepts.
    • Nevertheless, we must also present the limitations of WordNet and of semantic distances, with impact on the development of subsequent systems (see 6 PolyCAFe – Polyphonic Conversation Analysis and Feedback and 7 ReaderBench (1) – Cohesion-based Discourse Analysis and Dialogism): 1/ the focus only on common words, without covering any special domain vocabularies; 2/ reduced extensibility, as the serialized model makes it difficult to add new domain-specific concepts or relationships
    • Building the Disambiguation Graph: Lexical chaining derives from textual cohesion (Halliday and Hasan 1976) and involves the selection of related lexical items in a given text (e.g., starting from Figure 8, the following lexical chain could be generated if all words occur in the initial text fragment: “cheater, person, cause, cheat, deceiver, …”). In other words, the lexical cohesive structure of a text can be represented as lexical chaining that consists of sequences of words tied together by semantic relationships and that can span across the entire text or a subsection of it. (Ontology-based chaining formulas on page 63)
    • The types of semantic relations taken into consideration when linking two words are hypernymy, hyponymy, synonymy, antonymy, or whether the words are siblings by sharing a common hypernym. The weights associated with each relation vary according to the strength of the relation and the proximity of the two words in the text analyzed.
  • Semantic Similarity through Tagged LSA: Latent Semantic Analysis (LSA) (Deerwester et al. 1989; Deerwester et al. 1990; Dumais 2004; Landauer and Dumais 1997) is a natural language processing technique that starts from a vector-space representation of semantics highlighting the co-occurrence relations between terms and their containing documents, and then projects the terms into sets of concepts (semantic spaces) related to the initial texts. LSA builds the vector-space model, later on used also for evaluating similarity between terms and documents, now indirectly linked through concepts (Landauer et al. 1998a; Manning and Schütze 1999). Moreover, LSA can be considered a mathematical method for representing words’ and passages’ meaning by analyzing, in an unsupervised manner, a representative corpus of natural language texts. (A minimal LSA similarity sketch appears after this list.)
    • In terms of document size, semantically and topically coherent passages of approximately 50 to 100 words are the optimal units to be taken into consideration while building the initial matrix (Landauer and Dumais 2011).
      • This fits nicely with post size. Also a good design consideration for JuryRoom
    • Therefore, as a compromise among all the previous NLP-specific treatments, the latest version of the implemented tagged LSA model (Dascalu et al. 2013a; Dascalu et al. 2013b) uses lemmas plus their corresponding part-of-speech, after initial input cleaning and stop words elimination.
  • Topic Relatedness through Latent Dirichlet Allocation
    • Starting from the presumption that documents integrate multiple topics, each document can now be considered a random mixture of corpus-wide topics. In order to avoid confusion, an important aspect needs to be addressed: topics within LDA are latent classes, in which every word has a given probability, whereas topics that are identified within subsequently developed systems (A.S.A.P., Ch.A.M.P., PolyCAFe and ReaderBench) are key concepts from the text. Additionally, similar to LSA, LDA also uses the implicit assumption of the bag of words approach that the order of words doesn’t matter when extracting key concepts and similarities of concepts through co-occurrences within a large corpus.
    • Every topic contains a probability for every word, but after the inference phase a remarkable demarcation can be observed between salient or dominant concepts of a topic and all other vocabulary words. In other words, the goal of LDA is to reflect the thematic structure of a document or of a collection through hidden variables and to infer this hidden structure by using a posterior inference model (Blei et al. 2003)
    • There are inevitably estimation errors, which are more notable when addressing smaller documents or texts with a wider spread of concepts, as the mixture of topics becomes more uncertain
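
As a concrete reminder of how the LSA machinery in 4.3.2 works, here is a minimal sketch of my own using scikit-learn rather than the book’s tagged LSA: a tf-idf term-document matrix is reduced with truncated SVD and adjacent passages are compared by cosine similarity, the coherence measure mentioned in the first bullet. The toy passages and the choice of two latent dimensions are assumptions for illustration only.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    # toy "adjacent phrases"; real use would be coherent 50-100 word passages (see above)
    docs = ["the party enters the room and sees a troll",
            "the troll attacks the party in the room",
            "the cleric heals the fighter after the battle"]

    tfidf = TfidfVectorizer(stop_words='english')
    X = tfidf.fit_transform(docs)              # term-document matrix

    lsa = TruncatedSVD(n_components=2, random_state=0)
    Z = lsa.fit_transform(X)                   # passages projected into the latent space

    # cosine similarity between adjacent passages as a rough coherence signal
    print(cosine_similarity(Z[0:1], Z[1:2]))   # passages 1 and 2 share 'troll' and 'room'
    print(cosine_similarity(Z[1:2], Z[2:3]))   # passages 2 and 3 share no content words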

Phil 12.6.18

7:00 – 4:00 ASRC PhD/NASA

  • Looks like Aaron has added two users
  • Create a “coherence” matrix, where the threshold is based on an average of one or more previous cells. The version shown in the spreadsheet3 screenshot uses the tf-idf matrix as a source and checks to see if there are any non-zero values within an arbitrary span. If there are, then the target matrix (initialized with zeroes) is incremented by one over that span. This process iterates from a step of one (the default) up to the specified step size. As a result, the more contiguous the nonzero values are, the larger and more bell-curved the row sequences will be (a numpy reconstruction appears after this list).
  • Create a “details” sheet that has information about the database, query, parameters, etc. Done.
  • Set up a redirect so that users have to go through the IRB page if they come from outside the antibubbles site
  • It’s the End of News As We Know It (and Facebook Is Feeling Fine)
    • And as the platforms pumped headlines into your feed, they didn’t care whether the “news” was real. They didn’t want that responsibility or expense. Instead, they honed in on engagement—did you click or share, increasing value to advertisers?
      • Diversity (responsibility, expense), Stampede (engagement, share)
  • Finished Analyzing Discourse and Text Complexity for Learning and Collaborating, and created this entry for the notes.
  • Was looking at John Du Bois’ paper Towards a dialogic syntax, which looks really interesting, but seems like it might be more appropriate for spoken dialog. Instead, I think I’ll go to Claire Cardie’s presentation on chat argument analysis at UMD tomorrow and see if that has better alignment.
    • Argument Mining with Structured SVMs and RNNs
      • We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.
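
Here is a minimal numpy reconstruction of the “coherence” matrix from the first bullet above. It follows the written description (the non-zero-check variant) rather than the spreadsheet itself, so the details may differ: for each window size from 1 up to max_step, any span of a term row containing a nonzero tf-idf value gets those cells of the target matrix incremented by one, so contiguous runs accumulate larger, roughly bell-shaped row values.

    import numpy as np

    def coherence_matrix(tfidf_mat: np.ndarray, max_step: int = 3) -> np.ndarray:
        """For window sizes 1..max_step, increment every cell of a window whose
        source (tf-idf) values contain at least one nonzero entry."""
        rows, cols = tfidf_mat.shape
        target = np.zeros((rows, cols))
        for step in range(1, max_step + 1):
            for r in range(rows):
                for c in range(cols - step + 1):
                    if np.any(tfidf_mat[r, c:c + step] != 0):
                        target[r, c:c + step] += 1
        return target

    # toy data: one term spans three contiguous posts, another appears only once
    tfidf = np.array([[0.0, 0.3, 0.4, 0.2, 0.0],
                      [0.0, 0.0, 0.5, 0.0, 0.0]])
    print(coherence_matrix(tfidf, max_step=3))
    # the contiguous row accumulates a larger, more bell-shaped sequence than the isolated one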

Phil 12.5.18

7:00 – 4:30 ASRC PhD/NASA

Phil 12.4.18

7:00 – 8:00 (13 hrs) ASRC NASA/PhD

  • Put my discourse analysis findings here, so they don’t get lost.
  • Adding a bit more to my post that talks about inertial network behavior
  • Added xmlwriter, since Pandas can’t handle writing out dictionaries, though it can plot them just fine…
  • The test dungeon discourse sequence as a matrix. You can clearly see the three rooms in the top rows. Aaron and I agree that this is a cross-correlation signal processing problem (see the numpy sketch at the end of this entry). Next is to gather some real-world data (spreadsheet screenshot).
  • Discussion with Aaron about next steps with Antonio. Basically say that we’re booked through April, but can review and comment.
  • IEEE Talk by Hai “Helen” Li of Duke University:

     

     
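
Going back to the cross-correlation idea in the dungeon discourse matrix item above: treating each term as a time series over sequential posts, numpy’s correlate gives the offset at which two term series line up best. The toy counts below are stand-ins I made up; the real rows would come from the tf-idf matrix.

    import numpy as np

    # made-up per-post counts for two terms (rows of the real matrix)
    troll = np.array([0, 3, 4, 2, 0, 0, 0, 0, 0])
    orc   = np.array([0, 0, 0, 0, 0, 2, 4, 3, 0])

    xcorr = np.correlate(troll, orc, mode='full')   # slide one series past the other
    center = len(orc) - 1                           # index corresponding to zero offset
    shift = int(np.argmax(xcorr)) - center          # negative: 'troll' activity precedes 'orc'
    print("offset in posts:", shift)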

Phil 12.3.18

7:00 – 6:00 ASRC PhD

  • Reading Analyzing Discourse and Text Complexity for Learning and Collaborating, basically to find methods that show important word frequency varying over time.
  • Just in searching around, I also found a bunch of potentially useful resources. I’m emphasizing Python at the moment, because that’s the language I’m using at work right now.
    • 5agado has a bunch of nice articles on Medium, linked to code. In particular, there’s Conversation Analyzer – An Introduction, with associated code.
    • High frequency word entrainment in spoken dialogue
      • Cognitive theories of dialogue hold that entrainment, the automatic alignment between dialogue partners at many levels of linguistic representation, is key to facilitating both production and comprehension in dialogue. In this paper we examine novel types of entrainment in two corpora—Switchboard and the Columbia Games corpus. We examine entrainment in use of high-frequency words (the most common words in the corpus), and its association with dialogue naturalness and flow, as well as with task success. Our results show that such entrainment is predictive of the perceived naturalness of dialogues and is significantly correlated with task success; in overall interaction flow, higher degrees of entrainment are associated with more overlaps and fewer interruptions.
    • Looked some more at the Cornell Conversational Analysis Toolkit, but it seems focused on other conversation attributes, with more lexical analysis coming later
    • There is a github topic on discourse-analysis, of which John W. Du Bois’ rezonator project looks particularly interesting. Need to ask Wayne about how to reach out to someone like that.
      • Recently I’ve been interested in what happens when participants in conversation build off each other, reusing words, structures and other linguistic resources just used by a prior speaker. In dialogic syntax, as I call it, parallelism of structure across utterances foregrounds similarities in function, but also brings out differences. Participants notice even the subtlest contrasts in stance–epistemic, affective, illocutionary, and so on–generated by the resonance between juxtaposed utterances. The theories of dialogic syntax and stance are closely related, and I’m currently working on exploring this linkage–one more example of figuring out how language works on multiple levels simultaneously, uniting structure, meaning, cognition, and social interaction.
  • From Computational Propaganda: If You Make It Trend, You Make It True
    • As an example, searching for “Vitamin K shot” (a routine health intervention for newborns) returns almost entirely anti-vaccine propaganda; anti-vaccine conspiracists write prolific quantities of content about that keyword, actively selling the myth that the shot is harmful, causes cancer, causes SIDS. Searches for the phrase are sparse because medical authorities are not producing counter-content or fighting the SEO battle in response.
    • This is literally a use case where a mapping interface would show that something funny was going on in this belief space
  • Yuanyuan’s proposal defense
    • Surgical telementoring: a trainee performing the operation is monitored remotely by an expert.
    • These are physical models!
    • Manual coding
    • Tracks communication intention, not lexical content
    • Linear Mixed Model (a minimal statsmodels sketch appears after this list)
      • Linear mixed models are an extension of simple linear models to allow both fixed and random effects, and are particularly used when there is non-independence in the data, such as arises from a hierarchical structure. For example, students could be sampled from within classrooms, or patients from within doctors.
    • DiCoT: a methodology for applying Distributed Cognition to the design of team working systems <– might be worth looking at for dungeon teams
    • Note, a wireless headset mic is nice if there are remote participants and you need to move around the room
    • GLIMMPSE power analysis
  • Add list of publications to the dissertation?
  • Good meeting with Wayne. Brought him up to speed on antibubbles.com. We discussed CHI PLAY 2019 as a good next venue. We also went over what the iConference presentation might be. More as this develops, since it’s not all that clear. Certainly a larger emphasis on video. Also, it will be in the first batch of presentations.
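
Since the linear mixed model came up in the defense notes above, here is a minimal sketch of fitting one with statsmodels. The data frame and variable names are invented purely for illustration: students nested within classrooms, a fixed effect for study hours, and a random intercept per classroom.

    import pandas as pd
    import statsmodels.formula.api as smf

    # invented example: students nested within classrooms
    df = pd.DataFrame({
        "score":     [72, 81, 65, 90, 78, 85, 60, 88, 74, 79, 69, 91],
        "hours":     [2, 4, 1, 6, 3, 5, 1, 6, 2, 4, 1, 6],
        "classroom": ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"],
    })

    # fixed effect for hours, random intercept for each classroom
    model = smf.mixedlm("score ~ hours", df, groups=df["classroom"])
    result = model.fit()
    print(result.summary())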

Phil 11.30.18

7:00 – 3:00 ASRC NASA

  • Started Second Person, and learned about GURPS
  • Added a section on navigating belief places and spaces to the dissertation
  • It looks like I’m doing Computational Discourse Analysis, which has more to do with how the words in a discussion shift over time. Requested this chapter through ILL
  • Looking at Cornell Conversational Analysis Toolkit
  • More Grokking today so I don’t lose too much focus on understanding NNs
        • Important numpy rules:
          import numpy as np
          
          val = np.array([[0.6]])
          row = np.array([[-0.59, 0.75, -0.94,0.34 ]])
          col = np.array([[-0.59], [ 0.75], [-0.94], [ 0.34]])
          
          print ("np.dot({}, {}) = {}".format(val, row, np.dot(val, row)))
          print ("np.dot({}, {}) = {}".format(col, val, np.dot(col, val)))
          
          '''
          note the very different results:
          np.dot([[0.6]], [[-0.59  0.75 -0.94  0.34]]) = [[-0.354  0.45  -0.564  0.204]]
          np.dot([[-0.59], [ 0.75], [-0.94], [ 0.34]], [[0.6]]) = [[-0.354], [ 0.45 ], [-0.564], [ 0.204]]
          '''
        • So here’s the tricky bit that I don’t get yet
          # Multiply the values of the relu'd layer [[0, 0.517, 0, 0]] by the goal-output_layer [.61]
          weight_mat = np.dot(layer_1_col_array, layer_1_to_output_delta) # e.g. [[0], [0.31], [0], [0]]
          weights_layer_1_to_output_col_array += alpha * weight_mat # add the scaled deltas in
          
          # Multiply the streetlights [[1], [0], [1] times the relu2deriv'd input_to_layer_1_delta [[0, 0.45, 0, 0]]
          weight_mat = np.dot(input_layer_col_array, input_to_layer_1_delta) # e.g. [[0, 0.45, 0, 0], [0, 0, 0, 0], [0, 0.45, 0, 0]]
          weights_input_to_layer_1_array += alpha * weight_mat # add the scaled deltas in
        • It looks to me that as we work back from the output layer, we multiply our layer’s weights by the manipulated (relu’d, in this case) values for the last layer, and by the derivative in the next layer forward? I know that we are working out how to distribute the adjustment of the weights via something like the chain rule… (a minimal re-creation of the full update loop is sketched below)
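
To untangle that for myself, here is a minimal re-creation of the full two-layer update on the streetlights example from Grokking Deep Learning. The variable names are my own, but the structure matches the snippets above: the output delta is pushed back through the output weights and gated by the relu derivative (the chain rule step), and each weight matrix is then nudged by the outer product of its layer’s input and its layer’s delta.

    import numpy as np

    np.random.seed(1)

    def relu(x):
        return (x > 0) * x          # pass positive values, zero out the rest

    def relu2deriv(x):
        return x > 0                # 1 where relu let the signal through, else 0

    # streetlight patterns -> walk (1) / stop (0)
    streetlights = np.array([[1, 0, 1],
                             [0, 1, 1],
                             [0, 0, 1],
                             [1, 1, 1]])
    walk_vs_stop = np.array([[1, 1, 0, 0]]).T

    alpha, hidden_size = 0.2, 4
    weights_0_1 = 2 * np.random.random((3, hidden_size)) - 1
    weights_1_2 = 2 * np.random.random((hidden_size, 1)) - 1

    for iteration in range(60):
        for i in range(len(streetlights)):
            layer_0 = streetlights[i:i + 1]                # (1, 3)
            layer_1 = relu(np.dot(layer_0, weights_0_1))   # (1, hidden)
            layer_2 = np.dot(layer_1, weights_1_2)         # (1, 1)

            # output delta, then chain rule: push it back through the output
            # weights and gate by relu2deriv so only active nodes get blame
            layer_2_delta = walk_vs_stop[i:i + 1] - layer_2
            layer_1_delta = np.dot(layer_2_delta, weights_1_2.T) * relu2deriv(layer_1)

            # each update is an outer product: (layer input).T x (layer delta)
            weights_1_2 += alpha * np.dot(layer_1.T, layer_2_delta)
            weights_0_1 += alpha * np.dot(layer_0.T, layer_1_delta)

    # after training, predictions should approach the walk/stop targets
    for i in range(len(streetlights)):
        layer_1 = relu(np.dot(streetlights[i:i + 1], weights_0_1))
        print(streetlights[i], np.dot(layer_1, weights_1_2)[0, 0])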

       

Phil 11.29.18

7:00 – 4:30 ASRC PhD/NASA

    • Listening to repeat of America Abroad Sowing Chaos: Russia’s Disinformation Wars. My original notes are here
    • Finished World without End: The Delta Green Open Campaign Setting, by A. Scott Glancey
      • Overall, this describes the creation of the canon of the Delta Green playspace. The goal as described was to root the work in existing fiction (Lovecraft’s Cthulhu) and historical fact. This provides the core of the space that players can move out from or fill in. Play does not produce more canon, so it produces a trajectory that may have high influence for the actual players, but may not move beyond that. The article discusses Agent Angela as an example of a thumbnail sketch that has become a mythical character, independent of the work of the authors with respect to canon. My guess is that as the Agent Angela space became “stiffer”, it could also be shared more.
      • As a role-playing game, Delta Green’s narrative differs from the traditional narratives of literature, theater, and film because it offers only plot without characters to drive the story forward. It’s up to the role-players to provide the characters. Role-playing game settings are narratives not built around any specific protagonist, yet capable of accommodating multiple protagonists. Thus, role-playing games, particularly the classic paper-and-dice ones, are by their very nature vast narratives. (page 77)
      • During the designing of the Delta Green vast narrative it was decided that we would publish more open-ended source material than scenarios. Source material is usually built around an enemy of Delta Green with a particular agenda or set of goals, much like a traditional role-playing game scenario is set up, only without the framework of scenes and set pieces designed to channel the players through to a resolution of the scenario. The reason for emphasizing open ended source material over scenarios is that we were trying to encourage Keepers to design their own scenarios without pinning them down with too much canon. That is always a danger with creating a role-playing game background. You want to create a rich environment, but you don’t want to fill in so many details that there is nothing new for the players and Keepers to create with their own games. (Page 81)
      • If the players in a role-playing game campaign start to think that their characters are more disposable than the villain, they are going to feel marginalized. After all, whose story is this: theirs or a non-player character’s? The fastest way to alienate a group of players is to give them the impression that they are not the center of the story. If they are not the ones driving the action forward, then what’s the point in playing a role-playing game? They might as well be watching a movie if they cannot affect the pacing, action, and outcome of a story. (Page 83)
    • Going to create a bag of words collection for post subjects and posts that are not from the DM, and then plot the use of the words over time (by sequential post). I think that once stop words are removed, patterns might be visible.
      • Pulling out the words
      • Have the overall counts
      • Building the count mats
      • Stop words worked, needed to drop punctuation and caps
    • Yoast has an array that looks immediately usable:
      [ "a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did", "do", "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "it", "it's", "its", "itself", "let's", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "she", "she'd", "she'll", "she's", "should", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "we", "we'd", "we'll", "we're", "we've", "were", "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why", "why's", "with", "would", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves" ]
    • Good, progress. I’m using TF-IDF to determine the importance of each term in the timeline. That’s ok, but not great. Here’s a plot (room_terms); a minimal sketch of the pipeline appears after this list.
    • You can see the three rooms, but they don’t stand out all that well. Maybe a low-pass filter on top of this? Anyway, done for the day.
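
For reference, a minimal sketch of the pipeline behind that plot: vectorize the non-DM posts in sequential order with TF-IDF, then track each term’s weight down the post sequence. The toy posts are invented stand-ins; the real ones come out of the phpBB database.

    from sklearn.feature_extraction.text import TfidfVectorizer
    import pandas as pd

    # invented stand-ins for the non-DM posts, in sequential order
    posts = ["Asra runs from the troll",
             "Ping walks towards the troll",
             "Valen reasons with the troll",
             "Emmi examines the idol",
             "Mirek sneaks by the idol"]

    vec = TfidfVectorizer(stop_words='english')
    tfidf = vec.fit_transform(posts)       # rows = sequential posts, columns = terms

    terms = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
    df = pd.DataFrame(tfidf.toarray(), columns=terms)

    # term importance over the post sequence; this is what gets plotted
    print(df[["troll", "idol"]])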

 

Phil 11.28.18

7:00 – 4:00 ASRC PhD

    • Made so much progress yesterday that I’m not sure what to do next. Going to see if I can run queries against the DB in Python for a start, and then look at the Stanford tools.
      • installed pymysql (the module name is lowercase; the CamelCase PyMySQL on PyPI is the same package)
      • Piece of cake! Here’s the test code:
        import pymysql
        
        class forum_reader:
            connection: pymysql.connections.Connection
        
            def __init__(self, user_name: str, user_password: str, db_name: str):
                print("initializing")
                self.connection = pymysql.connect(host='localhost', user=user_name, password=user_password, db=db_name)
        
            def read_data(self, sql_str: str) -> str:
                with self.connection.cursor() as cursor:
                    cursor.execute(sql_str)
                    result = cursor.fetchall()
                    return "{}".format(result)
        
            def close(self):
                self.connection.close()
        if __name__ == '__main__':
            fr = forum_reader("some_user", "some_pswd", "some_db")
            print(fr.read_data("select topic_id, forum_id, topic_title from phpbb_topics"))
      • And here’s the result:
        initializing
        ((4, 14, 'SUBJECT: 3 Room Linear Dungeon Test 1'),)
      • Note that this is not an object db, which I prefer, but since this is a pre-existing schema, that’s what I’ll be doing. Going to look for a way to turn a query into an object anyway. But it turns out that you can do this:
        self.connection = pymysql.connect(
            host='localhost', user=user_name, password=user_password, db=db_name,
            cursorclass=pymysql.cursors.DictCursor)
    • Which returns the rows as a list of dicts (JSON-like objects):
        [{'topic_id': 4, 'forum_id': 14, 'topic_title': 'SUBJECT: 3 Room Linear Dungeon Test 1'}]
    • Built a MySQL view to get all the data back in one shot:
      CREATE or REPLACE VIEW post_view AS
      SELECT p.post_id, FROM_UNIXTIME(p.post_time) as post_time, p.topic_id, t.topic_title, t.forum_id, f.forum_name, u.username, p.poster_ip, p.post_subject, p.post_text
        FROM phpbb_posts p
        INNER JOIN phpbb.phpbb_forums f ON p.forum_id=f.forum_id
        INNER JOIN phpbb.phpbb_topics t ON p.topic_id=t.topic_id
        INNER JOIN phpbb.phpbb_users u ON p.poster_id=u.user_id;
    • And that works like a charm in the Python code:
      [{
      	'post_id': 4,
      	'post_time': datetime.datetime(2018, 11, 27, 16, 0, 27),
      	'topic_id': 4,
      	'topic_title': 'SUBJECT: 3 Room Linear Dungeon Test 1',
      	'forum_id': 14,
      	'forum_name': 'DB Test',
      	'username': 'dungeon_master1',
      	'poster_ip': '71.244.249.217',
      	'post_subject': 'SUBJECT: 3 Room Linear Dungeon Test 1',
      	'post_text': 'POST: dungeon_master1 says that you are about to take on a 3-room linear dungeon.'
      }]

       

  • Tricia Wang thick data <- add some discussion about this with respect to gathering RPG data
  • Spend some time Grokking as well. Need to nail down backpropagation. Not today
  • Long discussions with Aaron about the structure of TimeSeriesML. Including looking at FFTs for the initial analytics.
  • A2P/AIMS meeting
    • Terabytes of AIMS data?

Progress for today 🙂 (IDE screenshot)

Phil 11.27.18

7:00 – 5:00 ASRC PhD

  • Statistical physics of liquid brains
    • Liquid neural networks (or ”liquid brains”) are a widespread class of cognitive living networks characterised by a common feature: the agents (ants or immune cells, for example) move in space. Thus, no fixed, long-term agent-agent connections are maintained, in contrast with standard neural systems. How is this class of systems capable of displaying cognitive abilities, from learning to decision-making? In this paper, the collective dynamics, memory and learning properties of liquid brains is explored under the perspective of statistical physics. Using a comparative approach, we review the generic properties of three large classes of systems, namely: standard neural networks (”solid brains”), ant colonies and the immune system. It is shown that, despite their intrinsic physical differences, these systems share key properties with standard neural systems in terms of formal descriptions, but strongly depart in other ways. On one hand, the attractors found in liquid brains are not always based on connection weights but instead on population abundances. However, some liquid systems use fluctuations in ways similar to those found in cortical networks, suggesting a relevant role of criticality as a way of rapidly reacting to external signals.
  • Amazon is releasing a robot cloud dev environment with simulators:
    • AWS RoboMaker’s robotics simulation makes it easy to set up large-scale and parallel simulations with pre-built worlds, such as indoor rooms, retail stores, and racing tracks, so developers can test their applications on-demand and run multiple simulations in parallel. AWS RoboMaker’s fleet management integrates with AWS Greengrass and supports over-the-air (OTA) deployment of robotics applications from the development environment onto the robot. 
  • Working on the script generator. Here’s the initial output (a sketch of a generator that produces this format appears after this list):
    SUBJECT: dungeon_master1's introduction to the dungeon
    	POST: dungeon_master1 says that you are about to take on a 3-room linear dungeon.
    
    SUBJECT: dungeon_master1's introduction to room_0
    	 POST: dungeon_master1 says, The party now finds itself in room_0. There is a troll here.
    	 SUBJECT: Asra_Rogueplayer's move in room_0
    		 POST: Asra_Rogueplayer runs from the troll in room_0.
    	 SUBJECT: Ping_Clericplayer's move in room_0
    		 POST: Ping_Clericplayer walks towards the troll in room_0.
    	 SUBJECT: Valen_Fighterplayer's move in room_0
    		 POST: Valen_Fighterplayer reasons with the troll in room_0.
    	 SUBJECT: Emmi_MonkPlayer's move in room_0
    		 POST: Emmi_MonkPlayer walks towards the troll in room_0.
    	 SUBJECT: Avia_Bardplayer's move in room_0
    		 POST: Avia_Bardplayer casts a spell at the troll in room_0.
    	 SUBJECT: Mirek_Thiefplayer's move in room_0
    		 POST: Mirek_Thiefplayer casts a spell at the troll in room_0.
    	 SUBJECT: Lino_Magicplayer's move in room_0
    		 POST: Lino_Magicplayer casts a spell at the troll in room_0.
    SUBJECT: dungeon_master1's conclusion for room_0
    	 POST: dungeon_master1 says that you have triumphed in the challenge of room_0.
    
    SUBJECT: dungeon_master1's introduction to room_1
    	 POST: dungeon_master1 says, The party now finds itself in room_1. There is an idol here.
    	 SUBJECT: Asra_Rogueplayer's move in room_1
    		 POST: Asra_Rogueplayer knocks out the idol in room_1.
    	 SUBJECT: Ping_Clericplayer's move in room_1
    		 POST: Ping_Clericplayer walks towards the idol in room_1.
    	 SUBJECT: Valen_Fighterplayer's move in room_1
    		 POST: Valen_Fighterplayer casts a spell at the idol in room_1.
    	 SUBJECT: Emmi_MonkPlayer's move in room_1
    		 POST: Emmi_MonkPlayer examines the idol in room_1.
    	 SUBJECT: Avia_Bardplayer's move in room_1
    		 POST: Avia_Bardplayer sneaks by the idol in room_1.
    	 SUBJECT: Mirek_Thiefplayer's move in room_1
    		 POST: Mirek_Thiefplayer sneaks by the idol in room_1.
    	 SUBJECT: Lino_Magicplayer's move in room_1
    		 POST: Lino_Magicplayer runs from the idol in room_1.
    SUBJECT: dungeon_master1's conclusion for room_1
    	 POST: dungeon_master1 says that you have triumphed in the challenge of room_1.
    
    SUBJECT: dungeon_master1's introduction to room_2
    	 POST: dungeon_master1 says, The party now finds itself in room_2. There is an orc here.
    	 SUBJECT: Asra_Rogueplayer's move in room_2
    		 POST: Asra_Rogueplayer casts a spell at the orc in room_2.
    	 SUBJECT: Ping_Clericplayer's move in room_2
    		 POST: Ping_Clericplayer reasons with the orc in room_2.
    	 SUBJECT: Valen_Fighterplayer's move in room_2
    		 POST: Valen_Fighterplayer knocks out the orc in room_2.
    	 SUBJECT: Emmi_MonkPlayer's move in room_2
    		 POST: Emmi_MonkPlayer runs from the orc in room_2.
    	 SUBJECT: Avia_Bardplayer's move in room_2
    		 POST: Avia_Bardplayer walks towards the orc in room_2.
    	 SUBJECT: Mirek_Thiefplayer's move in room_2
    		 POST: Mirek_Thiefplayer distracts the orc in room_2.
    	 SUBJECT: Lino_Magicplayer's move in room_2
    		 POST: Lino_Magicplayer examines the orc in room_2.
    SUBJECT: dungeon_master1's conclusion for room_2
    	 POST: dungeon_master1 says that you have triumphed in the challenge of room_2.
    
    SUBJECT: dungeon_master1's conclusion
    	POST: dungeon_master1 says that you have triumphed in the challenge of the 3-room linear dungeon.
  • And here are the users. We’ll have to have multiple browsers running anonymous mode to have all these active simultaneously (users screenshot).
  • Data! (data.PNG screenshot)
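
For the record, a minimal sketch of what a generator for this format could look like. The player names, monsters and action verbs are read off the output above; the actual script in the AntibubblesDungeon project is certainly structured differently.

    import random

    DM = "dungeon_master1"
    PLAYERS = ["Asra_Rogueplayer", "Ping_Clericplayer", "Valen_Fighterplayer",
               "Emmi_MonkPlayer", "Avia_Bardplayer", "Mirek_Thiefplayer",
               "Lino_Magicplayer"]
    MONSTERS = ["a troll", "an idol", "an orc"]
    ACTIONS = ["runs from", "walks towards", "reasons with", "casts a spell at",
               "knocks out", "examines", "sneaks by", "distracts"]

    def generate_script(num_rooms: int = 3) -> str:
        lines = ["SUBJECT: {}'s introduction to the dungeon".format(DM),
                 "\tPOST: {} says that you are about to take on a {}-room linear dungeon.\n".format(DM, num_rooms)]
        for r in range(num_rooms):
            room = "room_{}".format(r)
            monster = MONSTERS[r % len(MONSTERS)]
            lines.append("SUBJECT: {}'s introduction to {}".format(DM, room))
            lines.append("\t POST: {} says, The party now finds itself in {}. There is {} here.".format(DM, room, monster))
            for player in PLAYERS:
                lines.append("\t SUBJECT: {}'s move in {}".format(player, room))
                lines.append("\t\t POST: {} {} the {} in {}.".format(player, random.choice(ACTIONS), monster.split()[-1], room))
            lines.append("SUBJECT: {}'s conclusion for {}".format(DM, room))
            lines.append("\t POST: {} says that you have triumphed in the challenge of {}.\n".format(DM, room))
        lines.append("SUBJECT: {}'s conclusion".format(DM))
        lines.append("\tPOST: {} says that you have triumphed in the challenge of the {}-room linear dungeon.".format(DM, num_rooms))
        return "\n".join(lines)

    if __name__ == '__main__':
        print(generate_script(3))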

Phil 11.26.18

7:00 – 5:00ASRC PhD

  • Had a thought that simulation plus diversity might be an effective way of increasing system resilience. This is based on the discussion of Apollo 13 in Normal Accidents
  • Start folding in content from simulation papers. Don’t worry about coherence yet
  • Start figuring out phpBB
    • Working on the IRB form – done
    • Set user creation to admin-approved – done
    • Create easily identifiable players
      • Asra Rogueplayer
      • Ping Clericplayer
      • Valen Fighterplayer
      • Emmi MonkPlayer
      • Avia Bardplayer
      • Mirek Thiefplayer
      • Lino Magicplayer
      • Daz Dmplayer
    • Some notes on play by post
    • Added Aaron as a founder. He’s set up the overall structure (dungeon screenshot).
    • Add easily identifiable content. Working. Set up the AntibubblesDungeon as a python project. I’m going to write a script generator that we will then use to paste in content. Then back up and download the database and run queries on it locally.

Phil 11.24.18

Semantics-Space-Time Cube. A Conceptual Framework for Systematic Analysis of Texts in Space and Time

  • We propose an approach to analyzing data in which texts are associated with spatial and temporal references with the aim to understand how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having the spatial and temporal references within these location and interval. Based on this structure, we systematically describe the space of analysis tasks on exploring the interrelationships among the three heterogeneous information facets, semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube. We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people’s reactions to social and natural events of different spatial and temporal scales.

Phil 11.23.18

8:00 – 3:00 ASRC PhD

  • A Map of Knowledge
    • Knowledge representation has gained in relevance as data from the ubiquitous digitization of behaviors amass and academia and industry seek methods to understand and reason about the information they encode. Success in this pursuit has emerged with data from natural language, where skip-grams and other linear connectionist models of distributed representation have surfaced scrutable relational structures which have also served as artifacts of anthropological interest. Natural language is, however, only a fraction of the big data deluge. Here we show that latent semantic structure, comprised of elements from digital records of our interactions, can be informed by behavioral data and that domain knowledge can be extracted from this structure through visualization and a novel mapping of the literal descriptions of elements onto this behaviorally informed representation. We use the course enrollment behaviors of 124,000 students at a public university to learn vector representations of its courses. From these behaviorally informed representations, a notable 88% of course attribute information were recovered (e.g., department and division), as well as 40% of course relationships constructed from prior domain knowledge and evaluated by analogy (e.g., Math 1B is to Math H1B as Physics 7B is to Physics H7B). To aid in interpretation of the learned structure, we create a semantic interpolation, translating course vectors to a bag-of-words of their respective catalog descriptions. We find that the representations learned from enrollments resolved course vectors to a level of semantic fidelity exceeding that of their catalog descriptions, depicting a vector space of high conceptual rationality. We end with a discussion of the possible mechanisms by which this knowledge structure may be informed and its implications for data science.
  • Set up phpBB and see how accessible the data is.
  • Found an error in the iConf paper standalone/complex/monolithic figure. Fixed for arXiv.
  • Set up the dissertation document in LaTex so that I can start putting things in it. Done! In subversion. Used the UMD template here: Thesis & Dissertation Filing, which is the same as the UMBC format listed here: Thesis & Dissertation

Phil 11.22.18

Listening to How CRISPR Gene Editing Is Changing the World, in which Jennifer Kahn discusses the concept of Fitness Cost: mutations (CRISPR or otherwise) often decrease the fitness of the modified organism. I’m thinking that this relates to the conflicting fitness mechanisms of diverse and monolithic systems. Diverse systems are resilient in the long run. Monolithic systems are effective in the short run. The stochastic interaction between those two time scales is what makes the problem of authoritarianism so hard.

Fitness cost is explicitly modeled here: Kinship, reciprocity and synergism in the evolution of social behaviour

  • There are two ways to model the genetic evolution of social behaviour. Population genetic models using personal fitness may be exact and of wide applicability, but they are often complex and assume very different forms for different kinds of social behaviour. The alternative, inclusive fitness models, achieves simplicity and clarity by attributing all fitness effects of a behaviour to an expanded fitness of the actor. For example, Hamilton’s rule states that an altruistic behaviour will be favoured when -c + rb > 0, where c is the fitness cost to the altruist, b is the benefit to its partner, and r is their relatedness. But inclusive fitness results are often inexact for interactions between kin, and they do not address phenomena such as reciprocity and synergistic effects that may either be confounded with kinship or operate in its absence. Here I develop a model the results of which may be expressed in terms of either personal or inclusive fitness, and which combines the advantages of both; it is general, exact, simple and empirically useful. Hamilton’s rule is shown to hold for reciprocity as well as kin selection. It fails because of synergistic effects, but this failure can be corrected through the use of coefficients of synergism, which are analogous to the coefficient of relatedness.

The spread of low-credibility content by social bots

  • The massive spread of digital misinformation has been identified as a major threat to democracies. Communication, cognitive, social, and computer scientists are studying the complex causes for the viral diffusion of misinformation, while online platforms are beginning to deploy countermeasures. Little systematic, data-based evidence has been published to guide these efforts. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during ten months in 2016 and 2017. We find evidence that social bots played a disproportionate role in spreading articles from low-credibility sources. Bots amplify such content in the early spreading moments, before an article goes viral. They also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, resharing content posted by bots. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

Using Machine Learning to map the field of Collective Intelligence research (cluster_enhance-width-1200 image)

  • As part of our new research programme we have used machine learning and literature search to map key trends in collective intelligence research. This helps us build on the existing body of knowledge on collective intelligence, as well as identify some of the gaps in research that can be addressed to advance the field.

Working on 810 meta-reviews today. Done-ish!