Category Archives: research

Phil 1.17.19

7:00 – 3:30 ASRC PhD, NASA

  • Lyrn.AI – Deep Learning Explained
  • Re-learning how to code in PHP, which is easier if you’ve been doing a lot of C++/Java and not so much if you’ve been doing Python. Anyway, I wrote a small class:
    class DbIO2 {
        protected $connection = NULL;
    
        function connect($db_hostname, $db_username, $db_password, $db_database){
            $toReturn = array();
            $this->connection = new mysqli($db_hostname, $db_username, $db_password, $db_database);
            if($this->connection->connect_error){
                $toReturn['connect_successful'] = false;
                $toReturn['connect_error'] = $this->connection->connect_error;
            } else {
                $toReturn['connect_successful'] = true;
            }
            return $toReturn;
        }
    
    
        function runQuery($query) {
            $toReturn = array();
            if($query == null){
                $toReturn['query_error'] = "query is empty";
                return $toReturn;
            }
            $result = $this->connection->query($query);
    
            if (!$result) {
                $toReturn['database_access'] = $this->connection->error;
                return $toReturn;
            }
    
            $numRows = $result->num_rows;
    
            for ($j = 0 ; $j < $numRows ; ++$j) {
                $result->data_seek($j);
                $row = $result->fetch_assoc();
                $toReturn[$j] = $row;
            }
            return $toReturn;
        }
    }
  • And exercised it
    require_once '../../phpFiles/ro_login.php';
    require_once '../libs/io2.php';
    
    $dbio = new DbIO2();
    
    $result = $dbio->connect($db_hostname, $db_username, $db_password, $db_database);
    
    printf ("%s\n",json_encode($result));
    
    $result = $dbio->runQuery("select * from post_view");
    
    foreach ($result as $row)
        printf ("%s\n", json_encode($row));
  • Which gave me some results
    {"connect_successful":true}
    {"post_id":"4","post_time":"2018-11-27 16:00:27","topic_id":"4","topic_title":"SUBJECT: 3 Room Linear Dungeon Test 1","forum_id":"14","forum_name":"DB Test","username":"dungeon_master1","poster_ip":"71.244.249.217","post_subject":"SUBJECT: 3 Room Linear Dungeon Test 1","post_text":"POST: dungeon_master1 says that you are about to take on a 3-room linear dungeon."}
    {"post_id":"5","post_time":"2018-11-27 16:09:12","topic_id":"4","topic_title":"SUBJECT: 3 Room Linear Dungeon Test 1","forum_id":"14","forum_name":"DB Test","username":"dungeon_master1","poster_ip":"71.244.249.217","post_subject":"SUBJECT: dungeon_master1's introduction to room_0","post_text":"POST: dungeon_master1 says, The party now finds itself in room_0. There is a troll here."}
    (repeat for another 200+ lines)
  • So I’m well on my way to being able to show the stories (both from the phpbb and slack) on the Antibubbles “stories” page
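  • A minimal Python sketch of how rows like the ones above might be turned into a “stories” page. This is an illustration only, not the site’s code: render_story is a hypothetical helper, the row keys are the ones visible in the post_view output, and the HTML layout is an assumption.

```python
from html import escape


def render_story(rows):
    """Render post_view-style rows (a list of dicts) as an HTML fragment."""
    parts = ['<section class="story">']
    for row in rows:
        # Escape every field: post text is user-supplied and may contain markup.
        subject = escape(row.get("post_subject", ""))
        author = escape(row.get("username", ""))
        when = escape(row.get("post_time", ""))
        text = escape(row.get("post_text", ""))
        parts.append("  <article>")
        parts.append(f"    <h3>{subject}</h3>")
        parts.append(f'    <p class="meta">{author} at {when}</p>')
        parts.append(f"    <p>{text}</p>")
        parts.append("  </article>")
    parts.append("</section>")
    return "\n".join(parts)
```

    Escaping on output keeps the stored posts untouched while making it safe to display text that players typed into the forum.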

4:00 – 5:00 Meeting with Don

Phil 1.16.19

7:00 – 5:00 ASRC NASA

  • Starting to take a deep look at Slack as another Antibubbles RPG dungeon. From yesterday’s post
    • You can download conversations as JSON files, and I’d need to build (or find) a dice bot.
    • Created Antibubbles.slack.com
    • Ok, getting at the data is trivial. An admin can just go to antibubbles.slack.com/services/export. You get a nice zip file that contains everything that you need to reconstruct users and conversations: slack
    • The data is pretty straightforward too. Here’s the JSON file that has my first post in test-dungeon-1:
      {
          "client_msg_id": "41744548-2c8c-4b7e-b01a-f7cba402a14e",
          "type": "message",
          "text": "SUBJECT: dungeon_master1's introduction to the dungeon\n\tPOST: dungeon_master1 says that you are about to take on a 3-room linear dungeon.",
          "user": "UFG26JUS3",
          "ts": "1547641117.000400"
      }

      So we have the dungeon (the directory/file), unique id for message and user, the text and a timestamp. I’m going to do a bit more reading and then look into getting the Chat & Slash App.

    • Looking at the Workspace Admin page. Trying to see where the IRB can be presented.
  • More work on getting the historical data put into a reasonable format. Put together a spreadsheet with the charts for all permutations of fundcode/project/contract for discussion tomorrow.
  • Updated the AI for social good proposal. Need to get the letter signed by myself and Aaron tomorrow.
  • Pytorch tutorial, with better variable names than usual
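  • A minimal sketch of reading the Slack export message shown earlier in this entry. Assumptions: each exported file holds a JSON array of message objects, and posts follow the “SUBJECT: ... / POST: ...” convention used in the test dungeon. The helper names (strip_label, parse_slack_message, load_messages) are hypothetical.

```python
import json
from datetime import datetime, timezone

# The message from the export, as the dict that json.loads would produce.
SAMPLE_MESSAGE = {
    "client_msg_id": "41744548-2c8c-4b7e-b01a-f7cba402a14e",
    "type": "message",
    "text": "SUBJECT: dungeon_master1's introduction to the dungeon\n\t"
            "POST: dungeon_master1 says that you are about to take on "
            "a 3-room linear dungeon.",
    "user": "UFG26JUS3",
    "ts": "1547641117.000400",
}


def strip_label(line, label):
    """Remove a leading label such as 'SUBJECT:' plus surrounding whitespace."""
    line = line.strip()
    return line[len(label):].strip() if line.startswith(label) else line


def parse_slack_message(msg):
    """Split one message into id, user, UTC time, subject and post text."""
    first_line, _, rest = msg.get("text", "").partition("\n")
    return {
        "id": msg.get("client_msg_id"),
        "user": msg.get("user"),
        # 'ts' is seconds since the Unix epoch, stored as a string.
        "time": datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc),
        "subject": strip_label(first_line, "SUBJECT:"),
        "post": strip_label(rest, "POST:"),
    }


def load_messages(json_text):
    """Parse a JSON array of messages (assumed layout of one export file)."""
    return [parse_slack_message(m) for m in json.loads(json_text)
            if m.get("type") == "message"]


parsed = parse_slack_message(SAMPLE_MESSAGE)
```

    The channel name is not in the message itself, so it would have to come from the directory the file was read from.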

Phil 1.15.19

7:00 – 3:00 ASRC NASA

  • Cool antibubbles thing: artboard 1
  • Also, I looked into a Slack version of Antibubbles. You can download conversations as JSON files, and I’d need to build (or find) a dice bot.
  • Fake News, Real Money: Ad Tech Platforms, Profit-Driven Hoaxes, and the Business of Journalism
    • Following the viral spread of hoax political news in the lead-up to the 2016 US presidential election, it’s been reported that at least some of the individuals publishing these stories made substantial sums of money—tens of thousands of US dollars—from their efforts. Whether or not such hoax stories are ultimately revealed to have had a persuasive impact on the electorate, they raise important normative questions about the underlying media infrastructures and industries—ad tech firms, programmatic advertising exchanges, etc.—that apparently created a lucrative incentive structure for “fake news” publishers. Legitimate ad-supported news organizations rely on the same infrastructure and industries for their livelihood. Thus, as traditional advertising subsidies for news have begun to collapse in the era of online advertising, it’s important to understand how attempts to deal with for-profit hoaxes might simultaneously impact legitimate news organizations. Through 20 interviews with stakeholders in online advertising, this study looks at how the programmatic advertising industry understands “fake news,” how it conceptualizes and grapples with the use of its tools by hoax publishers to generate revenue, and how its approach to the issue may ultimately contribute to reshaping the financial underpinnings of the digital journalism industry that depends on the same economic infrastructure.
  • The structured backbone of temporal social ties
    • In many data sets, information on the structure and temporality of a system coexists with noise and non-essential elements. In networked systems for instance, some edges might be non-essential or exist only by chance. Filtering them out and extracting a set of relevant connections is a non-trivial task. Moreover, methods put forward until now do not deal with time-resolved network data, which have become increasingly available. Here we develop a method for filtering temporal network data, by defining an adequate temporal null model that allows us to identify pairs of nodes having more interactions than expected given their activities: the significant ties. Moreover, our method can assign a significance to complex structures such as triads of simultaneous interactions, an impossible task for methods based on static representations. Our results hint at ways to represent temporal networks for use in data-driven models.
  • Brandon Rohrer – Data Science and Robots
  • Physical appt?
  • Working on getting the histories calculated and built
    • Best contracts are: contract 4 = 6, contract 5 = 9, contract 12 = 10, contract 18 = 140
    • Lots of discussion on how exactly to do this. I think at this point I’m waiting on Heath to pull some new data that I can then export to Excel and play with to see the best way of doing things

Phil 1.14.19

7:00 – 5:00 ASRC NASA

  • Artificial Intelligence in the Age of Neural Networks and Brain Computing
    • Artificial Intelligence in the Age of Neural Networks and Brain Computing demonstrates that existing disruptive implications and applications of AI is a development of the unique attributes of neural networks, mainly machine learning, distributed architectures, massive parallel processing, black-box inference, intrinsic nonlinearity and smart autonomous search engines. The book covers the major basic ideas of brain-like computing behind AI, provides a framework to deep learning, and launches novel and intriguing paradigms as future alternatives.
  • Sent Aaron Mannes the iConference and SASO papers
  • Work on text analytics
    • Extract data by groups, group, user and start looking at cross-correlations
      • Continued modifying post_analyzer.py
      • Commenting out TF-IDF and coherence for a while?
  • Registered for iConference
  • Renew passport!
  • Current thinking on the schema. db_diagram
  • Making progress on the python to write lineitems and prediction history entries
  • Meeting with Don
    • Got most of the paperwork in line and then went over the proposal. I need to make changes to the text based on Don’s suggestions

Phil 1.11.19

7:00 – 5:00 ASRC NASA

  • The Philosopher Redefining Equality (New Yorker profile of Elizabeth Anderson)
    • She takes great pleasure in arranging information in useful forms; if she weren’t a philosopher, she thinks, she’d like to be a mapmaker, or a curator of archeological displays in museums.
  • Trolling the U.S.: Q&A on Russian Interference in the 2016 Presidential Election
    • Ryan Boyd and researchers from Carnegie Mellon University and Microsoft Research analyzed Facebook ads and Twitter troll accounts run by Russia’s Internet Research Agency (IRA) to determine how people with differing political ideologies were targeted and pitted against each other through this “largely unsophisticated and low-budget” operation. To learn more about the study and its findings, we asked Boyd the following questions:
    • Boyd is an interesting guy. Here’s his twitter profile: Social/Personality Psychologist, Computational Social Scientist, and Occasional Software Developer.
  • Applied for an invite to the TF Dev summit
  • Work on text analytics?
    • Extract data by groups, group, user and start looking at cross-correlations
      • Started modifying post_analyzer.py
    • PHP “story” generator?
    • Updating IntelliJ
  • More DB work

Phil 12.17.18

7:00 – 4:30 ASRC NASA/PhD

  • Ted Radio Hour interview with Margaret Heffernan, who spoke about her book, Willful Blindness:
    • “Companies that have been studied for willful blindness can be asked questions like, are there issues at work that people are afraid to raise? And when academics have done studies like this of corporations in the United States, what they find is 85 percent of people say yes. Eighty-five percent of people know there’s a problem, but they won’t say anything. And when I duplicated the research in Europe, asking all the same questions, I found exactly the same number. And what’s really interesting is that when I go to companies in Switzerland, they tell me this is a uniquely Swiss problem. And when I go to Germany, they say, oh yes, this is the German disease. And when I go to companies in England they say, oh yeah, the British are really bad at this. And the truth is, this is a human problem. We’re all, under certain circumstances, willfully blind.”
    • I’ve been thinking about this a lot because when I say, well, why don’t people speak up? What I get is, oh, it’s the culture. And I think, well, what is the culture? The culture is the accumulation of everybody’s actions. And in many of the organizations I work with, change starts in very unexpected places because people just decide, I want to do this or I want to try this. And then they discover they don’t get shot. And then they discover that, actually, now, they’ve got a really exciting project. You know, I think the most dangerous thing in organizations is silence. It’s all those brains whizzing around full of observations and insight and ideas that are not being articulated.
    • I think that the 15% who do speak out are Nomads. They are misaligned with the culture, and as such 1) find it easier to see problems and solutions, and 2) are unable to not behave independently.
  • Bayesian Layers: A Module for Neural Network Uncertainty
    • We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with layers capturing uncertainty over weights (Bayesian neural nets), pre-activation units (dropout), activations (“stochastic output layers”), and the function itself (Gaussian processes). With reversible layers, one can also propagate uncertainty from input to output such as for flow-based distributions and constant-memory backpropagation. Bayesian Layers are a drop-in replacement for other layers, maintaining core features that one typically desires for experimentation. As demonstration, we fit a 10-billion parameter “Bayesian Transformer” on 512 TPUv2 cores, which replaces attention layers with their Bayesian counterpart.
  • Continuing with Normal Accidents
  • Nice interactive on disinformation on Twitter
  • The universal decay of collective memory and attention
    • Collective memory and attention are sustained by two channels: oral communication (communicative memory) and the physical recording of information (cultural memory). Here, we use data on the citation of academic articles and patents, and on the online attention received by songs, movies and biographies, to describe the temporal decay of the attention received by cultural products. We show that, once we isolate the temporal dimension of the decay, the attention received by cultural products decays following a universal biexponential function. We explain this universality by proposing a mathematical model based on communicative and cultural memory, which fits the data better than previously proposed log-normal and exponential models. Our results reveal that biographies remain in our communicative memory the longest (20–30 years) and music the shortest (about 5.6 years). These findings show that the average attention received by cultural products decays following a universal biexponential function.
  • Zach walkthrough
    • Yarn Workspaces
    • NextJS – Tools for developing React Apps – check the github repo to see, for example, how to roll your own web server
    • REACT hooks api
  • Got the basic recursion piece of the optimizer working right. Works for ints, floats, and strings:
    def cascading_step(self):
        self.cur_val = self.range_array[self.index]
        print("{} cur_val = {}".format(self.name, self.cur_val))
    
        child_complete = True
        if self.child:
            child_complete = self.child.cascading_step()
    
        if child_complete:
            self.index += 1
            if self.index >= len(self.range_array):
                self.index = 0
                return True
        return False
  • And here’s the first working test:
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_0
    step 0 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_1
    step 1 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_2
    step 2 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_0
    v1 cur_val = v1_3
    step 3 -----------
    v3 cur_val = v3_0
    v2 cur_val = v2_1
    v1 cur_val = v1_0
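  • A self-contained sketch of how cascading_step can be wrapped in a class and driven to enumerate every combination. The method body follows the snippet above (minus the print); the CascadingVar class name, the constructor, values() and enumerate_all() are assumptions added for illustration.

```python
import itertools


class CascadingVar:
    """One variable in a parent -> child chain that steps like an odometer."""

    def __init__(self, name, range_array, child=None):
        self.name = name
        self.range_array = range_array
        self.child = child
        self.index = 0
        self.cur_val = None

    def cascading_step(self):
        """Set cur_val, step the child, and advance only when the child wraps."""
        self.cur_val = self.range_array[self.index]
        child_complete = True
        if self.child:
            child_complete = self.child.cascading_step()
        if child_complete:
            self.index += 1
            if self.index >= len(self.range_array):
                self.index = 0
                return True  # this variable wrapped around
        return False

    def values(self):
        """Current values from this variable down through its children."""
        vals = [self.cur_val]
        if self.child:
            vals.extend(self.child.values())
        return vals


def enumerate_all(root):
    """Step the chain until the root wraps, collecting each combination."""
    combos = []
    done = False
    while not done:
        done = root.cascading_step()
        combos.append(tuple(root.values()))
    return combos


# Mixed types, as in the post: floats, strings and ints.
v1 = CascadingVar("v1", [0, 1])
v2 = CascadingVar("v2", ["a", "b"], child=v1)
v3 = CascadingVar("v3", [1.5], child=v2)
combos = enumerate_all(v3)
expected = list(itertools.product([1.5], ["a", "b"], [0, 1]))
```

    The innermost child changes fastest, so the combinations come out in the same order as itertools.product over the same ranges.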

     

Phil 12.13.18

7:00 – 4:00 ASRC PhD/NASA

  • BBC Business Daily on making decisions under uncertainty. In particular, David Tuckett (Scholar), professor and director of the Centre for the Study of Decision-Making Uncertainty at University College London talks about how we reduce our sense of uncertainty by telling ourselves stories that we can then align with. This reminds me of how conspiracy theories develop, in particular the remarkable storyline of QAnon.
  • More Normal Accident review
  • NYTimes on frictionless design being a problem
  • Dungeon processing – broke out three workbooks for queries with all players, no dm, and just the dm. Also need to write up some code that generates the story as HTML.
  • Backprop debugging. I think it works? class_error
  • Here’s the core of the forward (train) and backpropagation (learn) code:
    def train(self):
        # Forward pass: compute this layer's neurons from the source layer.
        if self.source is not None:
            src = self.source
            self.neuron_row_array = np.dot(src.neuron_row_array, src.weight_row_mat)
            if self.target is not None: # No activation function to output layer
                self.neuron_row_array = relu(self.neuron_row_array) # TODO: use passed-in activation function
            self.neuron_col_array = self.neuron_row_array.T
    
    def learn(self, alpha):
        # Backward pass: push this layer's delta back to the source layer,
        # then update the weights that connect the source to this layer.
        if self.source is not None:
            src = self.source
            delta_scalar = np.dot(self.delta, src.weight_col_mat)
            delta_threshold = relu2deriv(src.neuron_row_array) # TODO: use passed in derivative function
            src.delta = delta_scalar * delta_threshold
            mat = np.dot(src.neuron_col_array, self.delta)
            src.weight_row_mat += alpha * mat
            src.weight_col_mat = src.weight_row_mat.T
  • And here’s the evaluation:
    --------------evaluation
    input: [[1. 0. 1.]] = pred: 0.983 vs. actual:[1]
    input: [[0. 1. 1.]] = pred: 0.967 vs. actual:[1]
    input: [[0. 0. 1.]] = pred: -0.020 vs. actual:[0]
    input: [[1. 1. 1.]] = pred: 0.000 vs. actual:[0]
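  • One way to answer “I think it works?” is a gradient check: compare the backpropagated gradient with a finite-difference estimate. The sketch below is not the layer code above. It is a pure-Python stand-in for a 3-2-1 network, with hypothetical function names and hand-picked weights, and it assumes the error is 0.5 * (pred - goal)^2 with ReLU on the hidden layer only.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def relu2deriv(values):
    # Derivative of ReLU: 1 where the unit is active, 0 elsewhere.
    return [1.0 if v > 0 else 0.0 for v in values]


def forward(x, w01, w12):
    """w01[i][j]: input i -> hidden j; w12[j]: hidden j -> single output."""
    hidden = relu([sum(x[i] * w01[i][j] for i in range(len(x)))
                   for j in range(len(w12))])
    pred = sum(h * w for h, w in zip(hidden, w12))
    return hidden, pred


def gradients(x, goal, w01, w12):
    """Backpropagated gradients of the error 0.5 * (pred - goal) ** 2."""
    hidden, pred = forward(x, w01, w12)
    delta_out = pred - goal
    delta_hidden = [delta_out * w * d
                    for w, d in zip(w12, relu2deriv(hidden))]
    grad_w12 = [h * delta_out for h in hidden]
    grad_w01 = [[xi * dh for dh in delta_hidden] for xi in x]
    return grad_w01, grad_w12


def numeric_gradient(x, goal, w01, w12, i, j, eps=1e-6):
    """Central finite difference of the error with respect to w01[i][j]."""
    def error(shift):
        w = [row[:] for row in w01]  # copy so the caller's weights are untouched
        w[i][j] += shift
        _, pred = forward(x, w, w12)
        return 0.5 * (pred - goal) ** 2
    return (error(eps) - error(-eps)) / (2 * eps)


# Hand-picked example values (assumptions, not taken from the post).
x, goal = [1.0, 0.0, 1.0], 1.0
w01 = [[0.5, -0.2], [0.1, 0.3], [0.2, 0.4]]
w12 = [0.7, -0.3]
grad_w01, grad_w12 = gradients(x, goal, w01, w12)
max_diff = max(abs(grad_w01[i][j] - numeric_gradient(x, goal, w01, w12, i, j))
               for i in range(3) for j in range(2))
```

    If max_diff is tiny, the analytic and numeric gradients agree. A large value usually points at a transposed matrix or a delta computed from the wrong layer.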

Phil 12.12.18

7:00 – 4:30 ASRC NASA/PhD

  • Do a dungeon analytic with new posts and DM for Aaron – done!
  • Send email to Shimei for registration and meeting after grading is finished
  • Start review of Normal Accidents – started!
  • Debug NN code – in process. Very tricky figuring out the relationships between the layers in backpropagation
  • Sprint planning
  • NASA meeting
  • Talked to Zach about the tagging project. Looks good, but I wonder how much time we’ll have. Got a name though – TaggerML

Phil 12.11.18

7:00 – 4:30 ASRC PhD/NASA

mercator_projection

Somehow, this needs to get into a discussion of the trustworthiness of maps

  • I realized that we can hand-code these initial dungeons, learn a lot and make this a baseline part of the study. This means that we can compare human and machine data extraction for map making. My initial thoughts as to the sequence are:
    • Step 1: Finish running the initial dungeon
    • Step 2: researchers determine a set of common questions that would be appropriate for each room. Something like:
      • Who is the character?
      • Where is the character?
      • What is the character doing?
      • Why is the character doing this?
    • Each answer should also include a section of the text that the reader thinks answers that question. Once this has been worked out on paper, a simple survey website (simpler) can be built that automates this process and supports data collection at moderate scales.
    • Use answers to populate a “Trajectories” sheet in an xml file and build a map!
    • Step 3: Partially automate the extraction to give users a generated survey that lets them select the most likely answer/text for the who/where/what/why questions. Generate more maps!
    • Step 4: Full automation
  • Added these thoughts to the analysis section of the google doc
  • The 11th International Natural Language Generation Conference
    • The INLG conference is the main international forum for the presentation and discussion of all aspects of Natural Language Generation (NLG), including data-to-text, concept-to-text, text-to-text and vision-to-text approaches. Special topics of interest for the 2018 edition included:
      • Generating Text with Affect, Style and Personality,
      • Conversational Interfaces, Chatbots and NLG, and
      • Data-driven NLG (including the E2E Generation Challenge)
  • Back to grokking DNNs
    • Still building a SimpleLayer class that will take a set of neurons and create a weight array that will point to the next layer
    • array formatting issues. Tricky
    • I think I’m done enough to start debugging. Tomorrow
  • Sprint review

Phil 12.7.18

7:00 – 4:30 ASRC NASA/PhD

Phil 12.6.18

7:00 – 4:00 ASRC PhD/NASA

  • Looks like Aaron has added two users
  • Create a “coherence” matrix, where the threshold is based on an average of one or more previous cells. The version shown below uses the tf-idf matrix as a source and checks to see if there are any non-zero values within an arbitrary span. If there are, then the target matrix (initialized with zeroes) is incremented by one on that span. This process iterates from a step of one (the default), to the specified step size. As a result, the more contiguous nonzero values are, the larger and more bell-curved the row sequences will be: spreadsheet3
  • Create a “details” sheet that has information about the database, query, parameters, etc. Done.
  • Set up a redirect so that users have to go through the IRB page if they come from outside the antibubbles site
  • It’s the End of News As We Know It (and Facebook Is Feeling Fine)
    • And as the platforms pumped headlines into your feed, they didn’t care whether the “news” was real. They didn’t want that responsibility or expense. Instead, they honed in on engagement—did you click or share, increasing value to advertisers?
      • Diversity (responsibility, expense), Stampede (engagement, share)
  • Finished Analyzing Discourse and Text Complexity for Learning and Collaborating, and created this entry for the notes.
  • Was looking at John Du Bois’ paper Towards a dialogic syntax, which looks really interesting, but seems like it might be more appropriate for spoken dialog. Instead, I think I’ll go to Claire Cardie’s presentation on chat argument analysis at UMD tomorrow and see if that has better alignment.
    • Argument Mining with Structured SVMs and RNNs
      • We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizations, can enforce structure constraints (e.g., transitivity), and can express dependencies between adjacent relations and propositions. Our approaches outperform unstructured baselines in both web comments and argumentative essay datasets.
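  • A rough sketch of the “coherence” matrix described earlier in this entry. The description leaves some details open, so this is one reading of it, not the code behind the spreadsheet. It assumes the window slides one cell at a time along each row, and coherence_matrix and max_span are hypothetical names.

```python
def coherence_matrix(source, max_span=1):
    """Build a target matrix that counts, per cell, how many windows of
    width 1..max_span cover the cell and contain a non-zero source value."""
    target = [[0] * len(row) for row in source]
    for r, row in enumerate(source):
        for span in range(1, max_span + 1):
            # Slide a window of the current width along the row.
            for start in range(len(row) - span + 1):
                window = row[start:start + span]
                if any(v != 0 for v in window):
                    # Increment every target cell covered by this window.
                    for c in range(start, start + span):
                        target[r][c] += 1
    return target


isolated = coherence_matrix([[0, 1, 0, 0]], max_span=2)
adjacent = coherence_matrix([[0, 1, 1, 0]], max_span=2)
```

    Under this reading, a single non-zero tf-idf value spreads into a small bump, and adjacent non-zero values reinforce each other into a wider and taller one.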

Phil 12.5.18

7:00 – 4:30 ASRC PhD/NASA

Phil 11.29.18

7:00 – 4:30 ASRC PhD/NASA

    • Listening to repeat of America Abroad Sowing Chaos: Russia’s Disinformation Wars. My original notes are here
    • Finished World without End: The Delta Green Open Campaign Setting, by A. Scott Glancey
      • Overall, this describes the creation of the canon of the Delta Green playspace. The goal as described was to root the work in existing fiction (Lovecraft’s Cthulhu) and historical fact. This provides the core of the space that players can move out from or fill in. Play does not produce more canon, so it produces a trajectory that may have high influence for the actual players, but may not move beyond that. The article discusses Agent Angela, as an example of a thumbnail sketch that has become a mythical character, independent of the work of the authors with respect to canon. My guess is that as the Agent Angela space became “stiffer”, it could also be shared more.
      • As a role-playing game, Delta Green’s narrative differs from the traditional narratives of literature, theater, and film because it offers only plot without characters to drive the story forward. It’s up to the role-players to provide the characters. Role-playing game settings are narratives not built around any specific protagonist, yet capable of accommodating multiple protagonists. Thus, role-playing games, particularly the classic paper-and-dice ones, are by their very nature vast narratives. (page 77)
      • During the designing of the Delta Green vast narrative it was decided that we would publish more open-ended source material than scenarios. Source material is usually built around an enemy of Delta Green with a particular agenda or set of goals, much like a traditional role-playing game scenario is set up, only without the framework of scenes and set pieces designed to channel the players through to a resolution of the scenario. The reason for emphasizing open ended source material over scenarios is that we were trying to encourage Keepers to design their own scenarios without pinning them down with too much canon. That is always a danger with creating a role-playing game background. You want to create a rich environment, but you don’t want to fill in so many details that there is nothing new for the players and Keepers to create with their own games. (Page 81)
      • If the players in a role-playing game campaign start to think that their characters are more disposable than the villain, they are going to feel marginalized. After all, whose story is this – theirs or a non-player character’s? The fastest way to alienate a group of players is to give them the impression that they are not the center of the story. If they are not the ones driving the action forward, then what’s the point in playing a role-playing game? They might as well be watching a movie if they cannot affect the pacing, action, and outcome of a story. (Page 83)
    • Going to create a bag of words collection for post subjects and posts that are not from the DM, and then plot the use of the words over time (by sequential post). I think that once stop words are removed, patterns might be visible.
      • Pulling out the words
      • Have the overall counts
      • Building the count mats
      • Stop words worked, needed to drop punctuation and caps
    • Yoast has an array that looks immediately usable:
      [ "a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did", "do", "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "it", "it's", "its", "itself", "let's", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "she", "she'd", "she'll", "she's", "should", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "we", "we'd", "we'll", "we're", "we've", "were", "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why", "why's", "with", "would", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself", "yourselves" ]
    • Good progress. I’m using TF-IDF to determine the importance of the term in the timeline. That’s ok, but not great. Here’s a plot: room_terms
    • You can see the three rooms, but they don’t stand out all that well. Maybe a low-pass filter on top of this? Anyway, done for the day.
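    • A minimal sketch of the steps in this entry: drop case, punctuation and stop words, compute a per-post TF-IDF score for each term in post order, then smooth with a moving average as a simple low-pass filter. Assumptions: the stop-word set is a short excerpt standing in for the full Yoast list, the TF-IDF weighting (term frequency times log(N/df)) is one common variant, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Short excerpt for illustration; the full list above would be used in practice.
STOP_WORDS = {"a", "an", "and", "here", "in", "is", "of", "the", "there", "to"}


def tokenize(text):
    """Lower-case, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9_']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_timeline(posts):
    """Map each term to its TF-IDF score in every post, in post order."""
    counts = [Counter(tokenize(p)) for p in posts]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # number of posts containing each term
    timeline = {}
    for term, df in doc_freq.items():
        idf = math.log(len(posts) / df)
        timeline[term] = [(c[term] / max(1, sum(c.values()))) * idf
                          for c in counts]
    return timeline


def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the ends of the list."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


posts = ["The troll is in room_0.",
         "A troll attacks!",
         "There is an orc in room_1."]
timeline = tfidf_timeline(posts)
smoothed_orc = moving_average(timeline["orc"], window=3)
```

      A wider window smooths more but also blurs the boundaries between rooms, so the window size would need tuning against the actual post sequence.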