Monthly Archives: April 2018

Phil 4.30.18

7:00 – 4:30 ASRC MKT

  • Some new papers from ICLR 2018
  • Need to write up a quick post for communicating between Angular and a (PHP) server, with an optional IntelliJ configuration section
  • JuryRoom this morning and then GANs + Agents this afternoon?
  • Next steps for JuryRoom
    • Start up the AngularPro course
    • Set up PHP access to DB, returning JSON objects
  • Starting Agent/GAN project
    • Need to set up an ACM paper to start dumping things into – done.
    • Looking for a good source for Jack London. Gutenberg looks nice, but there is a no-scraping rule, so I guess, we’ll do this by hand…
    • We will need to check for redundant short stories
    • We will need to strip the front and back matter that pertains to project Gutenburg
      • *** START OF THIS PROJECT GUTENBERG EBOOK BROWN WOLF AND OTHER JACK ***
      • *** END OF THIS PROJECT GUTENBERG EBOOK BROWN WOLF AND OTHER JACK ***
  • Fika: Accessibility at the Intersection of Users and Data
    • Nice talk and followup discussion with Dr. Hernisa Kacorri, who’s combining machine learning and HCC
      • My research goal is to build technologies that address real-world problems by integrating data-driven methods and human-computer interaction. I am interested in investigating human needs and challenges that may benefit from advancements in artificial intelligence. My focus is both in building new models to address these challenges and in designing evaluation methodologies that assess their impact. Typically my research involves application of machine learning and analytics research to benefit people with disabilities, especially assistive technologies that model human communication and behavior such as sign language avatars and independent mobility for the blind.

Phil 4.27.18

7:00 – 4:00 ASRC MKT

  • Call Charlestown about getting last two years of payments – done. Left a message
  • Get parking from StubHub
  • I saw James Burnham’s interview on the Daly Show last night. Roughly, I think that his thoughts on how humans haven’t changed, but our norms and practices is true. Based on listening to him talk, I think he’s more focussed on the symptoms than the cause. The question that he doesn’t seem to be asking is “why did civilization emerge when it did?”, and “why does it seem to be breaking now?”
    Personally, I think it’s tied up with communication technology. Slow communication systems like writing, mail, and the printing press lead to civilization. Rapid, frictionless forms of communication from radio to social media disrupt this process by changing how we define, perceive and trust our neighbors. The nice thing is that if technology is the critical element, then technology can be adjusted. Not that it’s easier, but it’s probably easier than changing humans.
  • Continuing From I to We: Group Formation and Linguistic Adaption in an Online Xenophobic Forum. Done and posted in Phlog
  • Tweaking the Angular and PHP code.
  • I got the IntelliJ debugger to connect to the Apache PHP server! Here’s the final steps. Pay particular attention to the highlighted areas:
    • File->Settings->Languages & Frameworks->PHP->Debug Debug1
    • Validate: Debug2
  • Objects are now coming back in the same way, so no parsing on the Angular side
  • Sprint planning

Phil 4.26.18

Too much stuff posted yesterday, so I’m putting Kate Starbird’s new paper here:

  • Ecosystem or Echo-System? Exploring Content Sharing across Alternative Media Domains
    • This research examines the competing narratives about the role and function of Syria Civil Defence, a volunteer humanitarian organization popularly known as the White Helmets, working in war-torn Syria. Using a mixed-method approach based on seed data collected from Twitter, and then extending out to the websites cited in that data, we examine content sharing practices across distinct media domains that functioned to construct, shape, and propagate these narratives. We articulate a predominantly alternative media “echo-system” of websites that repeatedly share content about the White Helmets. Among other findings, our work reveals a small set of websites and authors generating content that is spread across diverse sites, drawing audiences from distinct communities into a shared narrative. This analysis also reveals the integration of government funded media and geopolitical think tanks as source content for anti-White Helmets narratives. More broadly, the analysis demonstrates the role of alternative newswire-like services in providing content for alternative media websites. Though additional work is needed to understand these patterns over time and across topics, this paper provides insight into the dynamics of this multi-layered media ecosystem.

7:00 – 5:00 ASRC MKT

  • Referencing for Aanton at 5:00
  • Call Charlestown about getting last two years of payments
  • Benjamin D. Horne, Sara Khedr, and Sibel Adali. “Sampling the News Producers: A Large News and Feature Data Set for the Study of the Complex Media Landscape” ICWSM 2018
  • Continuing From I to We: Group Formation and Linguistic Adaption in an Online Xenophobic Forum
  • Anchor-Free Correlated Topic Modeling
    • In topic modeling, identifiability of the topics is an essential issue. Many topic modeling approaches have been developed under the premise that each topic has an anchor word, which may be fragile in practice, because words and terms have multiple uses; yet it is commonly adopted because it enables identifiability guarantees. Remedies in the literature include using three- or higher-order word co-occurence statistics to come up with tensor factorization models, but identifiability still hinges on additional assumptions. In this work, we propose a new topic identification criterion using second order statistics of the words. The criterion is theoretically guaranteed to identify the underlying topics even when the anchor-word assumption is grossly violated. An algorithm based on alternating optimization, and an efficient primal-dual algorithm are proposed to handle the resulting identification problem. The former exhibits high performance and is completely parameter-free; the latter affords up to 200 times speedup relative to the former, but requires step-size tuning and a slight sacrifice in accuracy. A variety of real text copora are employed to showcase the effectiveness of the approach, where the proposed anchor-free method demonstrates substantial improvements compared to a number of anchor-word based approaches under various evaluation metrics.
  • Cleaning up the Angular/PHP example. Put on GitHub?

Phil 4.25.18

7:00 – 3:30 ASRC MKT

  • Google’s Workshop on AI/ML Research and Practice in India:
    Ganesh Ramakrishnan (IIT Bombay) presented research on human assisted machine learning.
  • From I to We: Group Formation and Linguistic Adaption in an Online Xenophobic Forum
    • Much of identity formation processes nowadays takes place online, indicating that intergroup differentiation may be found in online communities. This paper focuses on identity formation processes in an open online xenophobic, anti-immigrant, discussion forum. Open discussion forums provide an excellent opportunity to investigate open interactions that may reveal how identity is formed and how individual users are influenced by other users. Using computational text analysis and Linguistic Inquiry Word Count (LIWC), our results show that new users change from an individual identification to a group identification over time as indicated by a decrease in the use of “I” and increase in the use of “we”. The analyses also show increased use of “they” indicating intergroup differentiation. Moreover, the linguistic style of new users became more similar to that of the overall forum over time. Further, the emotional content decreased over time. The results indicate that new users on a forum create a collective identity with the other users and adapt to them linguistically.
    • Social influence is broadly defined as any change – emotional, behavioral, or attitudinal – that has its roots in others’ real or imagined presence (Allport, 1954). (pg 77)
    • Regardless of why an individual displays an observable behavioral change that is in line with group norms, social identification with a group is the basis for the change. (pg 77)
    • In social psychological terms, a group is defined as more than two people that share certain goals (Cartwright & Zander, 1968). (pg 77)
    • Processes of social identification, intergroup differentiation and social influence have to date not been studied in online forums. The aim of the present research is to fill this gap and provide information on how such processes can be studied through language used on the forum. (pg 78)
    • The popularity of social networking sites has increased immensely during the last decade. At the same time, offline socializing has shown a decline (Duggan & Smith, 2013). Now, much of the socializing actually takes place online (Ganda, 2014). In order to be part of an online community, the individual must socialize with other users. Through such socializing, individuals create self-representations (Enli & Thumim, 2012). Hence, the processes of identity formation, may to a large extent take place on the Internet in various online forums. (pg 78)
    • For instance, linguistic analyses of American Nazis have shown that use of third person plural pronouns (they, them, their) is the single best predictor of extreme attitudes (Pennebaker & Chung, 2008). (pg 79)
    • Because language can be seen as behavior (Fiedler, 2008), it may be possible to study processes of social influence through linguistic analysis. Thus, our second hypothesis is that the linguistic style of new users will become increasingly similar to the linguistic style of the overall forum over time (H2). (pg 79)
    • This indicates that the content of the posts in an online forum may also change over time as arguments become more fine-tuned and input from both supporting and contradicting members are integrated into an individual’s own beliefs. This is likely to result (linguistically) in an increase in indicators of cognitive complexity. Hence, we hypothesize that the content of the posts will change over time, such that indicators of complex thinking will increase (H3a). (pg 80)
      • I’m not sure what to think about this. I expect from dimension reduction, that as the group becomes more aligned, the overall complex thinking will reduce, and the outliers will leave, at least in the extreme of a stampede condition.
    • This result indicates that after having expressed negativity in the forum, the need for such expressions should decrease. Hence, we expect that the content of the posts will change such that indicators of negative emotions will decrease, over time (H3b). (pg 80)
    • the forum is presented as a “very liberal forum”, where people are able to express their opinions, whatever they may be. This “extreme liberal” idea implies that there is very little censorship the forum is presented as a “very liberal forum”, where people are able to express their opinions, whatever they may be. This “extreme liberal” idea implies that there is very little censorship, which has resulted in that the forum is highly xenophobic. Nonetheless, due to its liberal self-presentation, the xenophobic discussions are not unchallenged. For example, also anti-racist people join this forum in order to challenge individuals with xenophobic attitudes. This means that the forum is not likely to function as a pure echo chamber, because contradicting arguments must be met with own arguments. Hence, individuals will learn from more experienced users how to counter contradicting arguments in a convincing way. Hence, they are likely to incorporate new knowledge, embrace input and contribute to evolving ideas and arguments. (pg 81)
      • Open debate can lead to the highest level of polarization (M&D)
      • There isn’t diverse opinion. The conversation is polarized, with opponents pushing towards the opposite pole. The question I’d like to see answered is has extremism increased in the forum?
    • Natural language analyses of anonymous social media forums also circumvent social desirability biases that may be present in traditional self-rating research, which is a particular important concern in relation to issues related to outgroups (Maass, Salvi, Arcuri, & Semin, 1989; von Hippel, Sekaquaptewa, & Vargas, 1997, 2008). The to-be analyzed media uses “aliases”, yielding anonymity of the users and at the same time allow us to track individuals over time and analyze changes in communication patterns. (pg 81)
      • After seeing “Ready Player One”, I also wonder if the aliases themselves could be looked at using an embedding space built from the terms used by the users? Then you get distance measurements, t-sne projections, etc.
    • Linguistic Inquiry Word Count (LIWC; Pennebaker et al., 2007; Chung & Pennebaker, 2007; Pennebaker, 2011b; Pennebaker, Francis, & Booth, 2001) is a computerized text analysis program that computes a LIWC score, i.e., the percentage of various language categories relative to the number of total words (see also www.liwc.net). (pg 81)
      • LIWC2015 ($90) is the gold standard in computerized text analysis. Learn how the words we use in everyday language reveal our thoughts, feelings, personality, and motivations. Based on years of scientific research, LIWC2015 is more accurate, easier to use, and provides a broader range of social and psychological insights compared to earlier LIWC versions
    • Figure 1c shows words overrepresented in later posts, i.e. words where the usage of the words correlates positively with how long the users has been active on the forum. The words here typically lack emotional content and are indicators of higher complexity in language. Again, this analysis provides preliminary support for the idea that time on the forum is related to more complex thinking, and less emotionality.
      • WordCloud
    • The second hypothesis was that the linguistic style of new users would become increasingly similar to other users on the forum over time. This hypothesis is evaluated by first z-transforming each LIWC score, so that each has a mean value of zero and a standard deviation of one. Then we measure how each post differs from the standardized values by summing the absolute z-values over all 62 LIWC categories from 2007. Thus, low values on these deviation scores indicate that posts are more prototypical, or highly similar, to what other users write. These deviation scores are analyzed in the same way as for Hypothesis 1 (i.e., by correlating each user score with the number of days on the forum, and then t-testing whether the correlations are significantly different from zero). In support of the hypothesis, the results show an increase in similarity, as indicated by decreasing deviation scores (Figure 2). The mean correlation coefficient between this measure and time on the forum was -.0086, which is significant, t(11749) = -3.77, p < 0.001. (pg 85)
      • ForumAlignmentI think it is reasonable to consider this a measure of alignment
    • Because individuals form identities online and because we see this in the use of pronouns, we also expected to see tendencies of social influence and adaption. This effect was also found, such that individuals’ linguistic style became increasingly similar to other users’ linguistic style over time. Past research has shown that accommodation of communication style occurs automatically when people connect to people or groups they like (Giles & Ogay 2007; Ireland et al., 2011), but also that similarity in communicative style functions as cohesive glue within a group (Reid, Giles, & Harwood, 2005). (pg 86)
    • Still, the results could not confirm an increase in cognitive complexity. It is difficult to determine why this was not observed even though a general trend to conform to the linguistic style on the forum was observed. (pg 87)
      • This is what I would expect. As alignment increases, complexity, as expressed by higher dimensional thinking should decrease.
    • This idea would also be in line with previous research that has shown that expressing oneself decreases arousal (Garcia et al., 2016). Moreover, because the forum is not explicitly racist, individuals may have simply adapted to the social norms on the forum prescribing less negative emotional displays. Finally, a possible explanation for the decrease in negative emotional words might be that users who are very angry leave the forum, because of its non-racist focus, and end up in more hostile forums. An interesting finding that was not part of the hypotheses in the present research is that the third person plural category correlated positively with all four negative emotions categories, suggesting that people using for example ‘they’ express more negative emotions (pg 87)
    • In line with social identity theory (Tajfel & Turner, 1986), we also observe linguistic adaption to the group. Hence, our results indicate that processes of identity formation may take place online. (pg 87)
  • Me, My Echo Chamber, and I: Introspection on Social Media Polarization
    • Homophily — our tendency to surround ourselves with others who share our perspectives and opinions about the world — is both a part of human nature and an organizing principle underpinning many of our digital social networks. However, when it comes to politics or culture, homophily can amplify tribal mindsets and produce “echo chambers” that degrade the quality, safety, and diversity of discourse online. While several studies have empirically proven this point, few have explored how making users aware of the extent and nature of their political echo chambers influences their subsequent beliefs and actions. In this paper, we introduce Social Mirror, a social network visualization tool that enables a sample of Twitter users to explore the politically-active parts of their social network. We use Social Mirror to recruit Twitter users with a prior history of political discourse to a randomized experiment where we evaluate the effects of different treatments on participants’ i) beliefs about their network connections, ii) the political diversity of who they choose to follow, and iii) the political alignment of the URLs they choose to share. While we see no effects on average political alignment of shared URLs, we find that recommending accounts of the opposite political ideology to follow reduces participants’ beliefs in the political homogeneity of their network connections but still enhances their connection diversity one week after treatment. Conversely, participants who enhance their belief in the political homogeneity of their Twitter connections have less diverse network connections 2-3 weeks after treatment. We explore the implications of these disconnects between beliefs and actions on future efforts to promote healthier exchanges in our digital public spheres.
  • What We Read, What We Search: Media Attention and Public Attention Among 193 Countries
    • We investigate the alignment of international attention of news media organizations within 193 countries with the expressed international interests of the public within those same countries from March 7, 2016 to April 14, 2017. We collect fourteen months of longitudinal data of online news from Unfiltered News and web search volume data from Google Trends and build a multiplex network of media attention and public attention in order to study its structural and dynamic properties. Structurally, the media attention and the public attention are both similar and different depending on the resolution of the analysis. For example, we find that 63.2% of the country-specific media and the public pay attention to different countries, but local attention flow patterns, which are measured by network motifs, are very similar. We also show that there are strong regional similarities with both media and public attention that is only disrupted by significantly major worldwide incidents (e.g., Brexit). Using Granger causality, we show that there are a substantial number of countries where media attention and public attention are dissimilar by topical interest. Our findings show that the media and public attention toward specific countries are often at odds, indicating that the public within these countries may be ignoring their country-specific news outlets and seeking other online sources to address their media needs and desires.
  • “You are no Jack Kennedy”: On Media Selection of Highlights from Presidential Debates
    • Our findings indicate that there exist signals in the textual information that untrained humans do not find salient. In particular, highlights are locally distinct from the speaker’s previous turn, but are later echoed more by both the speaker and other participants (Conclusions)
      • This sounds like dimension reduction and alignment
  • Algorithms, bots, and political communication in the US 2016 election – The challenge of automated political communication for election law and administration
    • Philip N. Howard (Scholar)
    • Samuel C. Woolley (Scholar)
    • Ryan Calo (Scholar)
    • Political communication is the process of putting information, technology, and media in the service of power. Increasingly, political actors are automating such processes, through algorithms that obscure motives and authors yet reach immense networks of people through personal ties among friends and family. Not all political algorithms are used for manipulation and social control however. So what are the primary ways in which algorithmic political communication—organized by automated scripts on social media—may undermine elections in democracies? In the US context, what specific elements of communication policy or election law might regulate the behavior of such “bots,” or the political actors who employ them? First, we describe computational propaganda and define political bots as automated scripts designed to manipulate public opinion. Second, we illustrate how political bots have been used to manipulate public opinion and explain how algorithms are an important new domain of analysis for scholars of political communication. Finally, we demonstrate how political bots are likely to interfere with political communication in the United States by allowing surreptitious campaign coordination, illegally soliciting either contributions or votes, or violating rules on disclosure.
  • Ok, back to getting HTTPClient posts to play with PHP cross domain
  • Maybe I have to make a proxy?
    • Using the proxying support in webpack’s dev server we can highjack certain URLs and send them to a backend server. We do this by passing a file to --proxy-config
    • Well, that fixes the need to have all the server options set, but the post still doesn’t send data. But since this is the Right way to do things, here’s the steps:
    • To proxy localhost:4200/uli -> localhost:80/uli
      • Create a proxy.conf.json file in the same directory as package.json
        {
          "/uli": {
            "target": "http://localhost:80",
            "secure": false
          }
        }

        This will cause any explicit request to localhost:4200/uli to be mapped to localhost:80/uli and appear that they are coming from localhost:80/uli

      • Set the npm start command in the package.json file to read as
        "scripts": {
          "start": "ng serve --proxy-config proxy.conf.json",
          ...
        },

        Start with “npm start”, rather than “ng serve”

      • Call from Angular like this:
        this.http.post('http://localhost:4200/uli/script.php', payload, httpOptions)
      • Here’s the PHP code (script.php): it takes POST and GET input and feeds it back with some information about the source :
        function getBrowserInfo(){
             $browserData = array();
             $ip = htmlentities($_SERVER['REMOTE_ADDR']);
             $browser = htmlentities($_SERVER['HTTP_USER_AGENT']);
             $referrer = "No Referrer";
             if(isset($_SERVER['HTTP_REFERER'])) {
                 //do what you need to do here if it's set
                 $referrer = htmlentities($_SERVER['HTTP_REFERER']);         if($referrer == ""){
                     $referrer = "No Referrer";
                 }
             }
             $browserData["ipAddress"] = $ip;
             $browserData["browser"] = $browser;
             $browserData["referrer"] = $referrer;
             return $browserData;
         }
         function getPostInfo(){
             $postInfo = array();
             foreach($_POST as $key => $value) {
                if(strlen($value) < 10000) {               $postInfo[$key] = $value;           }else{               $postInfo[$key] = "string too long";           }       }       return $postInfo;   }   function getGetInfo(){       $getInfo = array();       foreach($_GET as $key => $value) {
                if(strlen($value) < 10000) {
                    $getInfo[$key] = $value;
                }else{
                    $getInfo[$key] = "string too long";
                }
            }
            return $getInfo;
        }
        
        /**************************** MAIN ********************/
        $toReturn = array();
        $toReturn['getPostInfo'] = getPostInfo();
        $toReturn['getGetInfo'] = getGetInfo();
        $toReturn['browserInfo'] = getBrowserInfo();
        $toReturn['time'] = date("h:i:sa");
        $jstr =  json_encode($toReturn);
        echo($jstr);
      • And it arrives at localhost:80/uli/script.php. The following is the javascript console of the Angular CLI code running on localhost:4200
        {getPostInfo: Array(0), getGetInfo: {…}, browserInfo: {…}, time: "05:17:16pm"}
        browserInfo:
        	browser:"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
        	ipAddress:"127.0.0.1"
        	referrer:"http://localhost:4200/"
        getGetInfo:
        	message:"{"title":"foo","body":"bar","userId":1}"
        getPostInfo:[]
        time:"05:17:16pm"
        
      • Got the pieces parsing in @Component and displaying, so the round trip is done. Wan’t expecting to wind up using GET, but until I can figure out what the deal is with POST, that’s what it’s going to be. Here are the two methods that send and then parse the message:
        doGet(event) {
          let payload = {
            title: 'foo',
            body: 'bar',
            userId: 1
          };
          let message = 'message='+encodeURIComponent(JSON.stringify(payload));
          let target = 'http://localhost:4200/uli/script.php?';
        
          //this.http.get(target+'title=\'my title\'&body=\'the body\'&userId=1')
          this.http.get(target+message)
            .subscribe((data) => {
              console.log('Got some data from backend ', data);
              this.extractMessage(data, "getGetInfo");
            }, (error) => {
              console.log('Error! ', error);
            });
        }
        
        extractMessage(obj, name: string){
          let item = obj[name];
          try {
            if (item) {
              let mstr = item.message;
              this.mobj = JSON.parse(mstr);
            }
          }catch(err){
            this.mobj = {};
            this.mobj["message"] = "Error extracting 'message' from ["+name+"]";
          }
          this.mkeys = Object.keys(this.mobj);
        }
      • And here’s the html code: html
      • Here’s a screenshot of everything working: PostGetTest

Phil 4.24.18

7:00 – 5:00 ASRC MKT

  • Aaron’s ot BoP today
  • Working on JuryRoom, particularly hooking up PHP to Angular
  • Here’s the hello world php app that’s working:
    <?php
    header('Access-Control-Allow-Origin: *');
    echo '{"message": "hello"}';
  • And here’s the Angular side:
    uploadFile(event) {
      const elem = event.target;
      if (elem.files.length > 0) {
        const f0 = elem.files[0];
        console.log(f0);
        const formData = new FormData();
        formData.append('file', f0);
    
        this.http.post('http://localhost/uploadImages/script.php', formData)
          .subscribe((data) => {
    
            const jsonResponse = data.json();
    
            // this.gallery.gotSomeDataFromTheBackend(jsonResponse.file);
    
            console.log('Got some data from backend ', data);
          }, (error) => {
            console.log('Error! ', error);
          });
      }
    }
  • Here’s how to connect to the deployment server for debugging (I hope!). From Importing settings from a server access (deployment) configurationDebugPhpServer
  • Can’t see the post info coming back, so I really need to get the debugger set up to talk to the server. Following these directions: Web Server Debug Validation Dialog. Here’s the dialog with some warnings to be corrected: EnablePhpDebug
  • Note that you HAVE TO RESTART APACHE for any php.ini changes to take
  • Had to Add XDebug Helper Chrome Extension. That helped with the php running in the browser, but not in the call to PHP from angular XDebugHelper
  • Works in Postman, but it doesn’t fire the debugger. Still, at least I know that the data can get to the php. Not sure if angular is sending it. Here’s the postman results: Postman
  • Here’s the debugger view. The data appears to be going up (formData), but it’s not coming back in the echo like it does in postman. I’ve played around with Content-type, and that doesn’t seem to help: Debugger
  • In the network view, we can see that the payload is there: Payload
  • So it must not be getting accepted in the PHP….

Phil 4.23.18

7:00 – 4:30 ASRC MKT/BD

  • Adding in comments to proposal. Done and in Sharepoint. The “red team” version was so full of old text that most of the comments were inapplicable.
  • Teaching Amy’s class. Went pretty well. Maybe the “rule of thirds” stuck?
  • Meeting with Wayne. Quick review

Phil 4.21.18

Today’s ride

“Writing is Thinking”—an annotated twitter thread

  • Another in the series of State, Orientation and Velocity. In this case discussing the differences between stories and maps:
  • It is really incredible the amount of pushback I see from companies, startups to big, about writing. In particular around the notion that writing is the antithesis of agile. Writing ossifies and cements decision or plans that should change, it is said. My view is that agility comes from planning. Without plans, activities are just brownian motion. And you can’t have plans, especially shared plans, without writing.

Ervin Staub

  • In The Roots of Goodness and Resistance to Evil, Ervin Staub draws on his extensive experiences in scholarship and intervention to illuminate the socializing experiences, education, and trainings that lead children and adults to become helpers/active bystanders and rescuers, acting to prevent violence and create peaceful and harmonious societies. The book collects Staub’s most important and influential articles and essays in the field together with newly written chapters, with wide-ranging examples of helping behaviors as well as discussions of why we should help and not harm others. He addresses many examples of such behaviors, from helping people in everyday physical or psychological distress, to active bystandership in response to harmful actions by youth toward their peers (bullying), to endangering one’s life to save someone in immediate danger, or rescuing intended victims of genocide.

Train Your Machine Learning Models on Google’s GPUs for Free — Forever

  • The first step is to download the notebook (or another notebook of your choice)

  • Then, head over to Google Colab, sign into your google account (or create one if you somehow made it this far through life without one)

Phil 4.20.18

7:00 – ASRC MKT

  • Executing gradient descent on the earth
    • But the important question is: how well does gradient descent perform on the actual earth?
    • This is nice, because it suggests that we can compare GD algorithms on recognizable and visualizable terrains. Terrain locations can have multiple visualizable factors, height and luminance could be additional dimensions
  • Minds is the anti-facebook that pays you for your time
    • In a refreshing change from Facebook, Twitter, Instagram, and the rest of the major platforms, Minds has also retained a strictly reverse-chronological timeline. The core of the Minds experience, though, is that users receive “tokens” when others interact with their posts, or simply by spending time on the platform.
  • Continuing along with the Angular/PHP tutorial here. Nicely, there is also a Git repo
    • Had to add some styling to get the upload button to show
    • The HttpModule is deprecated, but sticking with it for now
    • Will need to connect/verify PHP server within IntelliJ, described here.
    • How to connect Apache, to IntelliJ
  • Installing and Configuring XAMPP with PhpStorm IDE. Don’t forget about deployment path: deploy

Phil 4.19.18

8:00 – ASRC MKT/BD

    • Good discussion with Aaron about the agents navigating embedding space. This would be a great example of creating “more realistic” data from simulation that bridges the gap between simulation and human data. This becomes the basis for work producing text for inputs such as DHS input streams.
      • Get the embedding space from the Jack London corpora (crawl here)
      • Train a classifier that recognizes JL using the embedding vectors instead of the words. This allows for contextual closeness. Additionally, it might allow a corpus to be trained “at once” as a pattern in the embedding space using CNNs.
      • Train an NN(what type?) to produce sentences that contain words sent by agents that fool the classifier
      • Record the sentences as the trajectories
      • Reconstruct trajectories from the sentences and compare to the input
      • Some thoughts WRT generating Twitter data
        • Closely aligned agents can retweet (alignment measure?)
        • Less closely aligned agents can mention/respond, and also add their tweet
    • Handed off the proposal to Red Team. Still need to rework the Exec Summary. Nope. Doesn’t matter that the current exec summary does not comply with the requirements.
    • A dog with high social influence creates an adorable stampede:
    • Using Machine Learning to Replicate Chaotic Attractors and Calculate Lyapunov Exponents from Data
      • This is a paper that describes how ML can be used to predict the behavior of chaotic systems. An implication is that this technique could be used for early classification of nomadic/flocking/stampede behavior
    • Visualizing a Thinker’s Life
      • This paper presents a visualization framework that aids readers in understanding and analyzing the contents of medium-sized text collections that are typical for the opus of a single or few authors.We contribute several document-based visualization techniques to facilitate the exploration of the work of the German author Bazon Brock by depicting various aspects of its texts, such as the TextGenetics that shows the structure of the collection along with its chronology. The ConceptCircuit augments the TextGenetics with entities – persons and locations that were crucial to his work. All visualizations are sensitive to a wildcard-based phrase search that allows complex requests towards the author’s work. Further development, as well as expert reviews and discussions with the author Bazon Brock, focused on the assessment and comparison of visualizations based on automatic topic extraction against ones that are based on expert knowledge.

 

Phil 4.18.18

7:00 – 6:30 ASRC MKT/BD

  • Meeting with James Foulds. We talked about building an embedding space for a literature body (The works of Jack London, for example) that agents can then navigate across. At the same time, train an LSTM on the same corpora so that the ML system, when given the vector of terms from the embedding (with probabilities/similarities?), produce a line that could be from the work that incorporates those terms. This provides a much more realistic model of the agent output that could be used for mapping. Nice paper to continue the current work while JuryRoom comes up to speed.
  • Recurrent Neural Networks for Multivariate Time Series with Missing Values
    • Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRUD, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
  •  The fall of RNN / LSTM
    • We fell for Recurrent neural networks (RNN), Long-short term memory (LSTM), and all their variants. Now it is time to drop them!
  • JuryRoom
  • Back to proposal writing
  • Done with section 5! LaTex FTW!
  • Clean up Abstract, Exec Summary and Transformative Impact tomorrow

Phil 4.17.18

7:00 – ASRC MKT

  • Listening to an interview with Nial Ferguson this morning where he talks about how the Chinese IT model aligns more closely with developing countries because they have solved the payment problem. And the surveillance state apparatus comes along for free. A ML/AI trained in that population will provide even closer alignment and will feel more “native”.
  • A ML/AI trained in that population will feel more “native”, and increase the traction of the Chinese IT. The Chinese approach expands its footprint in the developing world because it feels better and solves problems.
  • This sets up a conflict between corporate systems in the US and EU and China? In sheer demographics that means that it’s more likely that the dominant ML/AI perspective would reflect the surveillance biases of the Chinese government.
  • Payment systems are Socio-cultural user interfaces
  • Submitted to SASO. Submission #32. Updated the ArXiv file too. ArXiv “forgets” all the attachments too, so the tarball approach is soooooo much nicer.
  • Alt text for screen readers using LaTex
    \documentclass{article}
    \usepackage{graphicx}
    \usepackage{pdfcomment}
    \pagestyle{empty}
    
    \begin{document}
    one two three
    
    \pdftooltip{\includegraphics{img.png}}{This is the ALT text}%
    
    four five six
    \end{document}

     

Phil 4.16.18

9:00 – ASRC MKT

  • Finished up and submitted the CI 2018 and also put up on ArXive. Probably 90 minutes total?
  • SASO deadlines got extended:
    • Abstract submission (extended)  April 23, 2018
    • Submission (extended) April 30, 2018
  • Some diversity injection: Report for America Supports Journalism Where Cutbacks Hit Hard
    • Report for America, a nonprofit organization modeled after AmeriCorps, aims to install 1,000 journalists in understaffed newsrooms by 2022. Now in its pilot stage, the initiative has placed three reporters in Appalachia. It has chosen nine more, from 740 applicants, to be deployed across the country in June.
  • An information-theoretic, all-scales approach to comparing networks
    • As network research becomes more sophisticated, it is more common than ever for researchers to find themselves not studying a single network but needing to analyze sets of networks. An important task when working with sets of networks is network comparison, developing a similarity or distance measure between networks so that meaningful comparisons can be drawn. The best means to accomplish this task remains an open area of research. Here we introduce a new measure to compare networks, the Portrait Divergence, that is mathematically principled, incorporates the topological characteristics of networks at all structural scales, and is general-purpose and applicable to all types of networks. An important feature of our measure that enables many of its useful properties is that it is based on a graph invariant, the network portrait. We test our measure on both synthetic graphs and real world networks taken from protein interaction data, neuroscience, and computational social science applications. The Portrait Divergence reveals important characteristics of multilayer and temporal networks extracted from data.

3:00 – 4:00 Fika

Phil 4.14.18

Text Embedding Models Contain Bias. Here’s Why That Matters.

  • It occurs to me that bias may be a way of measuring societal dimension reduction. Need to read this carefully.
  • Neural network models can be quite powerful, effectively helping to identify patterns and uncover structure in a variety of different tasks, from language translation to pathology to playing games. At the same time, neural models (as well as other kinds of machine learning models) can contain problematic biases in many forms. For example, classifiers trained to detect rude, disrespectful, or unreasonable comments may be more likely to flag the sentence “I am gay” than “I am straight” [1]; face classification models may not perform as well for women of color [2]; speech transcription may have higher error rates for African Americans than White Americans [3].

Visual Analytics for Explainable Deep Learning

  • Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions.

Submitted final version of the CI 2018 paper and also put a copy up on ArXive. Turns out that you can bundle everything into a tar file and upload once.

Phil 4.13.18

7:00 – ASRC MKT/BD

  • That Politico article on “news deserts” doesn’t really show what it claims to show
    • Its heart is in the right place, and the decline of local news really is a big threat to democratic governance.
  • Firing up the JuryRoom effort again
    • Unsurprisingly, there are updates
    • And a lot of fixing plugins. Big update
    • Ok, back to having PHP and MySQL working. Need to see how to integrate it with the Angular CLI
      • Updated CLI as per stackoverflow
        • In order to update the angular-cli package installed globally in your system, you need to run:

          npm uninstall -g angular-cli
          npm cache clean
          npm install -g @angular/cli@latest
          

          Depending on your system, you may need to prefix the above commands with sudo.

          Also, most likely you want to also update your local project version, because inside your project directory it will be selected with higher priority than the global one:

          rm -rf node_modules
          npm uninstall --save-dev angular-cli
          npm install --save-dev @angular/cli@latest
          npm install
          

          thanks grizzm0 for pointing this out on GitHub.

           

        • Updated my work environment too. Some PHP issues, and the Angular CLI wouldn’t update until I turned on the VPN. Duh.
      • Angular 4 + PHP: Setting Up Angular And Bootstrap – Part 2
    • Back to proposal writing