
Phil 8.16.18

7:00 – 4:30 ASRC MKT

  • R2D3 is an experiment in expressing statistical thinking with interactive design. Find us at @r2d3us.
  • Foundations of Temporal Text Networks
    • Davide Vega (Scholar)
    • Matteo Magnani (Scholar)
    • Three fundamental elements to understand human information networks are the individuals (actors) in the network, the information they exchange, that is often observable online as text content (emails, social media posts, etc.), and the time when these exchanges happen. An extremely large amount of research has addressed some of these aspects either in isolation or as combinations of two of them. There are also more and more works studying systems where all three elements are present, but typically using ad hoc models and algorithms that cannot be easily transferred to other contexts. To address this heterogeneity, in this article we present a simple, expressive and extensible model for temporal text networks, that we claim can be used as a common ground across different types of networks and analysis tasks, and we show how simple procedures to produce views of the model allow the direct application of analysis methods already developed in other domains, from traditional data mining to multilayer network mining.
      • Ok, I’ve been reading the paper and if I understand it correctly, it’s pretty straightforward and also clever. It relates a lot to the way that I do term-document matrices, and then extends the concept to include time, agents, and implicitly anything else you want. To illustrate, here’s a picture of a tensor-as-matrix (the tensorIn2D figure). The important thing to notice is that there are multiple dimensions represented in a single square matrix. We have:
        • agents
        • documents
        • terms
        • steps
      • This picture in particular is of an undirected adjacency matrix, but I think there are ways to handle in-degree and out-degree, probably most simply by keeping one matrix for in-degree and one for out-degree.
      • Because it’s a square matrix, we can calculate the number of steps between any pair of nodes, and the centrality, simply by repeatedly squaring the matrix and keeping track of the steps until the dominant eigenvector settles. We can also weight a node by multiplying that node’s row and column by a scalar. That changes the centrality, but not the connectivity (see the sketch at the end of this list).
      • If we want to see how time affects the development of the network, we can start with all the step nodes set to a zero weight, then add them in sequentially. This means, for example, that clustering could be performed on the nonzero nodes.
      • Some or all of the elements could be factorized using NMF, resulting in smaller, faster matrices.
      • Network embedding could be useful too. We get distances between nodes. And this looks really important: Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec
      • I think I can use any and all of the above methods on the network tensor I’m describing. This is very close to a mapping solution.
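      • A minimal numpy sketch of the ideas above (the node labels, adjacency values, and weights are invented for illustration): powers of the adjacency matrix give step counts, power iteration gives centrality, and scaling a node’s row and column re-weights it without changing which connections exist. Setting a step node’s scale to 0 effectively removes it, so time can be “played in” by raising those scales one step at a time.

        import numpy as np

        # Toy undirected adjacency matrix over mixed node types
        # (agents, documents, terms, steps), as in the tensor-as-matrix picture.
        labels = ["agent_a", "agent_b", "doc_1", "term_x", "step_0"]
        A = np.array([[0, 1, 1, 0, 1],
                      [1, 0, 1, 1, 0],
                      [1, 1, 0, 1, 1],
                      [0, 1, 1, 0, 0],
                      [1, 0, 1, 0, 0]], dtype=float)

        # Steps between nodes: the k-th power of A is nonzero wherever a k-step path
        # exists, so the first power that turns an entry nonzero is the path length.
        def shortest_steps(A):
            n = len(A)
            steps = np.full((n, n), np.inf)
            np.fill_diagonal(steps, 0)
            reach = np.eye(n)
            for k in range(1, n):
                reach = reach @ A
                steps[(reach > 0) & np.isinf(steps)] = k
            return steps

        # Eigenvector centrality by power iteration (multiply until the vector settles).
        def centrality(A, iters=200, tol=1e-9):
            v = np.ones(len(A)) / len(A)
            for _ in range(iters):
                nxt = A @ v
                nxt /= np.linalg.norm(nxt)
                if np.linalg.norm(nxt - v) < tol:
                    break
                v = nxt
            return v

        # Re-weight a node by scaling its row and column: connectivity (which entries
        # are nonzero) is unchanged, but centrality shifts. A scale of 0 drops the node.
        def weight_node(A, idx, scale):
            B = A.copy()
            B[idx, :] *= scale
            B[:, idx] *= scale
            return B

        print(shortest_steps(A))
        print(dict(zip(labels, centrality(A).round(3))))
        print(dict(zip(labels, centrality(weight_node(A, labels.index("step_0"), 0.0)).round(3))))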
  • The Shifting Discourse of the European Central Bank: Exploring Structural Space in Semantic Networks (cited by the above paper)
    • Convenient access to vast and untapped collections of documents generated by organizations is a valuable resource for research. These documents (e.g., Press releases, reports, speech transcriptions, etc.) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of such large document corpora does not come without challenges. Two of these challenges are 1) the need for appropriate automated methods for text mining and analysis and 2) the redundant and predictable nature of the formalized discourse contained in these collections of texts. Our article proposes an approach that performs well in overcoming these particular challenges for the analysis of documents related to the recent financial crisis. Using semantic network analysis and a combination of structural measures, we provide an approach that proves valuable for a more comprehensive analysis of large and complex semantic networks of formal discourse, such as the one of the European Central Bank (ECB). We find that identifying structural roles in the semantic network using centrality measures jointly reveals important discursive shifts in the goals of the ECB which would not be discovered under traditional text analysis approaches.
  • Comparative Document Analysis for Large Text Corpora
    • This paper presents a novel research problem, Comparative Document Analysis (CDA), that is, joint discovery of commonalities and differences between two individual documents (or two sets of documents) in a large text corpus. Given any pair of documents from a (background) document collection, CDA aims to automatically identify sets of quality phrases to summarize the commonalities of both documents and highlight the distinctions of each with respect to the other informatively and concisely. Our solution uses a general graph-based framework to derive novel measures on phrase semantic commonality and pairwise distinction, where the background corpus is used for computing phrase-document semantic relevance. We use the measures to guide the selection of sets of phrases by solving two joint optimization problems. A scalable iterative algorithm is developed to integrate the maximization of phrase commonality or distinction measure with the learning of phrase-document semantic relevance. Experiments on large text corpora from two different domains—scientific papers and news—demonstrate the effectiveness and robustness of the proposed framework on comparing documents. Analysis on a 10GB+ text corpus demonstrates the scalability of our method, whose computation time grows linearly as the corpus size increases. Our case study on comparing news articles published at different dates shows the power of the proposed method on comparing sets of documents.
  • Social and semantic coevolution in knowledge networks
    • Socio-semantic networks involve agents creating and processing information: communities of scientists, software developers, wiki contributors and webloggers are, among others, examples of such knowledge networks. We aim at demonstrating that the dynamics of these communities can be adequately described as the coevolution of a social and a socio-semantic network. More precisely, we will first introduce a theoretical framework based on a social network and a socio-semantic network, i.e. an epistemic network featuring agents, concepts and links between agents and between agents and concepts. Adopting a relevant empirical protocol, we will then describe the joint dynamics of social and socio-semantic structures, at both macroscopic and microscopic scales, emphasizing the remarkable stability of these macroscopic properties in spite of a vivid local, agent-based network dynamics.
  • Tensorflow 2.0 feedback request
    • Shortly, we will hold a series of public design reviews covering the planned changes. This process will clarify the features that will be part of TensorFlow 2.0, and allow the community to propose changes and voice concerns. Please join developers@tensorflow.org if you would like to see announcements of reviews and updates on process. We hope to gather user feedback on the planned changes once we release a preview version later this year.

Phil 8.8.18

7:00 – 4:00 ASRC MKT

  • Oh, look, a new Tensorflow (1.10). Time to break things. I like the BigTable integration though.
  • Learning Meaning in Natural Language Processing — A Discussion
    • Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread. Twitter is a great medium for having such a discussion: replying to any comment allows you to revive the debate from the most promising point when it’s stuck in a dead-end. Unfortunately, Twitter also makes the discussion very hard to read afterwards, so I made three entry points to explore this fascinating mega-thread:

      1. a summary of the discussion that you will find below,
      2. an interactive view to explore the trees of tweets, and
      3. a commented map to get an overview of the main points discussed.
  • The Current Best of Universal Word Embeddings and Sentence Embeddings
    • This post is thus a brief primer on the current state-of-the-art in Universal Word and Sentence Embeddings, detailing a few

      • strong/fast baselines: FastText, Bag-of-Words
      • state-of-the-art models: ELMo, Skip-Thoughts, Quick-Thoughts, InferSent, MILA/MSR’s General Purpose Sentence Representations & Google’s Universal Sentence Encoder.

      If you want some background on what happened before 2017 😀, I recommend the nice post on word embeddings that Sebastian wrote last year and his intro posts.

  • Treeverse is a browser extension for navigating burgeoning Twitter conversations.
  • Detecting computer-generated random responding in questionnaire-based data: A comparison of seven indices (a quick sketch of the two easiest indices is at the end of this list)
    • With the development of online data collection and instruments such as Amazon’s Mechanical Turk (MTurk), the appearance of malicious software that generates responses to surveys in order to earn money represents a major issue, for both economic and scientific reasons. Indeed, even if paying one respondent to complete one questionnaire represents a very small cost, the multiplication of botnets providing invalid response sets may ultimately reduce study validity while increasing research costs. Several techniques have been proposed thus far to detect problematic human response sets, but little research has been undertaken to test the extent to which they actually detect nonhuman response sets. Thus, we proposed to conduct an empirical comparison of these indices. Assuming that most botnet programs are based on random uniform distributions of responses, we present and compare seven indices in this study to detect nonhuman response sets. A sample of 1,967 human respondents was mixed with different percentages (i.e., from 5% to 50%) of simulated random response sets. Three of the seven indices (i.e., response coherence, Mahalanobis distance, and person–total correlation) appear to be the best estimators for detecting nonhuman response sets. Given that two of those indices—Mahalanobis distance and person–total correlation—are calculated easily, every researcher working with online questionnaires could use them to screen for the presence of such invalid data.
  • Continuing to work on SASO slides – close to done. Got a lot of adversarial herding FB examples from the House Permanent Committee on Intelligence. Need to add them to the slide. Sobering.
  • And this looks like a FANTASTIC ride out of Trento: ridewithgps.com/routes/27552411
  • Fixed the border menu so that it’s a toggle group
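  • Regarding the random-responding paper above: a minimal sketch of the two indices it calls easy to compute, Mahalanobis distance and person–total correlation, on hypothetical Likert-style data (all the numbers and thresholds here are invented):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical Likert data (1-5): 200 humans whose answers track the item
    # means, plus 20 uniform-random "bot" response sets appended at the end.
    item_profile = np.linspace(2.0, 4.5, 20)
    humans = np.clip(rng.normal(item_profile, 1.0, size=(200, 20)).round(), 1, 5)
    bots = rng.integers(1, 6, size=(20, 20)).astype(float)
    X = np.vstack([humans, bots])

    # Mahalanobis distance of each response vector from the sample centroid.
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    mahal = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

    # Person-total correlation: each respondent's answers vs. the item means.
    item_means = X.mean(axis=0)
    person_total = np.array([np.corrcoef(row, item_means)[0, 1] for row in X])

    # Invented thresholds: random responders tend to sit far from the centroid
    # and to correlate weakly with the item means.
    flagged = (mahal > np.percentile(mahal, 90)) | (person_total < 0.1)
    print(flagged[-20:])  # most of the 20 simulated bots should be flagged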

Phil 8.7.18

8:00 – ASRC MKT

  • Looking for discussion transcripts.
  • Podcasts
    • Do you get your heart broken by the Nationals, Wizards, Caps and Redskins every single year but you still come back for more? The DMV Sports Roundtable is the podcast for you – Washington’s sports teams from the fans’ perspective – and plenty of college coverage too.
    • Join UCB Theatre veterans Cody Lindquist & Charlie Todd as they welcome a panel of NYC’s most hilarious comedians, journalists, and politicians to chug two beers on stage and discuss the politics of the week. It’s like Meet The Press, but funnier and with more alcohol. Theme song by Tyler Walker.
    • Rasslin Roundtable: Wrestling podcast centered around the latest PPV
    • TSN 1290 Roundtable: Kevin Olszewski hosts the Donvito Roundtable, airing weekdays from 11am-1pm CT on TSN 1290 Winnipeg. Daily discussion about the Winnipeg Jets, the NHL, and whatever else is on his mind!
    • The Game Design Round Table: Focusing on both digital and tabletop gaming, The Game Design Round Table provides a forum for conversation about critical issues to game design.
    • Story Works Round Table: Before you can be a successful author, you have to write a great story. Each week, co-hosts, Alida Winternheimer, author and writing coach at Word Essential, Kathryn Arnold, emerging writer, & Robert Scanlon, author of the Blood Empire series, have conversations about the craft of writing fiction. They bring diverse experiences and talents to the table from both the traditional and indie worlds. Our goal is for each episode to be a fun, lively discussion of some aspect of story craft that enlightens, as well as entertains.
  • Some good pictures of bike-share graveyards in China from The Atlantic that would make good stampede images (set 1) (set 2). Caption: “Bicycles of various bike-sharing services are seen in Shanghai.”
  • Starting back on the SASO slides. Based on Wayne’s comments, I’m reworking the Stephens’ slide
    • Flashes of Insight: Whole-Brain Imaging of Neural Activity in the Zebrafish (video)(paper)(paper)

Phil 8.3.18

7:00 – 3:30 ASRC MKT

  • Slides and walkthrough – done!
  • Ramping up on SASO
  • Textricator is a tool for extracting text from computer-generated PDFs and generating structured data (CSV or JSON). If you have a bunch of PDFs with the same format (or one big, consistently formatted PDF) and you want to extract the data to CSV or JSON, Textricator can help! It can even work on OCR’ed documents!
  • LSTM links for getting back to things later
  • Who handles misinformation outbreaks?
    • Misinformation attacks— the deliberate and sustained creation and amplification of false information at scale — are a problem. Some of them start as jokes (the ever-present street sharks in disasters) or attempts to push an agenda (e.g. right-wing brigading); some are there to make money (the “Macedonian teens”), or part of ongoing attempts to destabilise countries including the US, UK and Canada (e.g. Russia’s Internet Research Agency using troll and bot amplification of divisive messages).

      Enough people are writing about why misinformation attacks happen, what they look like and what motivates attackers. Fewer people are actively countering attacks. Here are some of them, roughly categorised as:

      • Journalists and data scientists: Make misinformation visible
      • Platforms and governments: Reduce misinformation spread
      • Communities: directly engage misinformation
      • Adtech: Remove or reduce misinformation rewards

Phil 7.31.18

7:00 – 6:00 ASRC MKT

  • Thinking that I need to push the opinion dynamics part of the work forward: how heading differs from position, and why that matters.
  • Found a nice adversarial herding chart from The Economist (Brexit).
  • Why Do People Share Fake News? A Sociotechnical Model of Media Effects
    • Fact-checking sites reflect fundamental misunderstandings about how information circulates online, what function political information plays in social contexts, and how and why people change their political opinions. Fact-checking is in many ways a response to the rapidly changing norms and practices of journalism, news gathering, and public debate. In other words, fact-checking best resembles a movement for reform within journalism, particularly in a moment when many journalists and members of the public believe that news coverage of the 2016 election contributed to the loss of Hillary Clinton. However, fact-checking (and another frequently-proposed solution, media literacy) is ineffectual in many cases and, in other cases, may cause people to “double-down” on their incorrect beliefs, producing a backlash effect.
  • Epistemology in the Era of Fake News: An Exploration of Information Verification Behaviors among Social Networking Site Users
    • Fake news has recently garnered increased attention across the world. Digital collaboration technologies now enable individuals to share information at unprecedented rates to advance their own ideologies. Much of this sharing occurs via social networking sites (SNSs), whose members may choose to share information without consideration for its authenticity. This research advances our understanding of information verification behaviors among SNS users in the context of fake news. Grounded in literature on the epistemology of testimony and theoretical perspectives on trust, we develop a news verification behavior research model and test six hypotheses with a survey of active SNS users. The empirical results confirm the significance of all proposed hypotheses. Perceptions of news sharers’ network (perceived cognitive homogeneity, social tie variety, and trust), perceptions of news authors (fake news awareness and perceived media credibility), and innate intentions to share all influence information verification behaviors among SNS members. Theoretical implications, as well as implications for SNS users and designers, are presented in the light of these findings.
  • Working on plan diagram – done
  • Organizing PhD slides. I think I’m getting near finished
  • Walked through slides with Aaron. Need to practice the demo. A lot.

Phil 7.27.18

Ted Underwood

  • my research is as much about information science as literary criticism. I’m especially interested in applying machine learning to large digital collections
  • Git repo with code for upcoming book: Distant Horizons: Digital Evidence and Literary Change
  • Do topic models warp time?
    • The key observation I wanted to share is just that topic models produce a kind of curved space when applied to long timelines; if you’re measuring distances between individual topic distributions, it may not be safe to assume that your yardstick means the same thing at every point in time. This is not a reason for despair: there are lots of good ways to address the distortion. The mathematics of cosine distance tend to work better if you average the documents first, and then measure the cosine between the averages (or “centroids”).
  • The Historical Significance of Textual Distances
    • Measuring similarity is a basic task in information retrieval, and now often a building-block for more complex arguments about cultural change. But do measures of textual similarity and distance really correspond to evidence about cultural proximity and differentiation? To explore that question empirically, this paper compares textual and social measures of the similarities between genres of English-language fiction. Existing measures of textual similarity (cosine similarity on tf-idf vectors or topic vectors) are also compared to new strategies that use supervised learning to anchor textual measurement in a social context.
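  • A tiny illustration of the centroid point above, on made-up topic distributions: compare averaging pairwise document cosines with taking the cosine between the group centroids (everything here is synthetic, just to show the mechanics):

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(1)
    # Made-up topic distributions (rows sum to 1) for documents from two "years".
    year_a = rng.dirichlet(np.ones(25) * 0.5, size=100)
    year_b = rng.dirichlet(np.ones(25) * 0.5, size=100)

    # Averaging pairwise document similarities vs. the similarity of the averages.
    pairwise_mean = np.mean([cosine(a, b) for a in year_a for b in year_b])
    centroid_sim = cosine(year_a.mean(axis=0), year_b.mean(axis=0))
    print(pairwise_mean, centroid_sim)  # the centroid measure is the better-behaved yardstick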

7:00 – 8:00 ASRC MKT

  • Continued on slides. I think I have the basics. Need to start looking for pictures
  • Sent response to the SASO folks about who’s presenting what.

9:00 – ASRC IRAD

Phil 7.25.18

7:00 – 3:00 ASRC

  • Send out email with meeting time
  • Rather than excerpts from the talks, do a demo of the relevant bits with conclusions and implications. Get the laptop running all the pieces. That means Python and TF and all the other bits.
  • Submitted tuition expenses
  • Submitted Fall 2018 approval
  • Got SASO travel approval!
  • More DNN study
    • Finished CNNs
    • Working on embeddings and W2V. Thought I’d try it on the laptop, but Keras can’t find its backend and I’m getting other weird errors. One of the big ones was that I didn’t install Tk with Python. Here’s the answer from Stack Overflow (the python_fix screenshot).
    • And now we’re waiting a very long time for a TF ‘hello world’ to run… But it did! (Roughly the kind of thing sketched at the end of this list.)
    • Had to also install pydot and graphviz-2.38.msi. Then add the graphviz bin directory to the path.
    • But now everything runs on the laptop, which will help with the demos!
    • Skipped the GloVe and pre-trained embeddings. Ready to start on DNNs tomorrow.
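    • For reference, a minimal tf.keras ‘hello world’ of the sort used to check the install, run on random data (the shapes, sizes, and epoch count are arbitrary):

      import numpy as np
      from tensorflow import keras

      # Tiny random classification problem, just to confirm the install works.
      x = np.random.random((256, 10))
      y = (x.sum(axis=1) > 5.0).astype(int)

      model = keras.models.Sequential([
          keras.layers.Dense(16, activation='relu', input_shape=(10,)),
          keras.layers.Dense(1, activation='sigmoid'),
      ])
      model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
      model.fit(x, y, epochs=5, batch_size=32)
      print(model.predict(x[:3]))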

Phil 7.20.18

Listening to We Can’t Talk Anymore? Understanding the Structural Roots of Partisan Polarization and the Decline of Democratic Discourse in 21st Century America. Very Tajfel

  • David Peritz
  • Political polarization, accompanied by negative partisanship, is a striking feature of the current political landscape. Perhaps these trends were originally confined to politicians and the media, but we recently reached the point where the majority of Americans report they would consider it more objectionable if their children married across party lines than if they married someone of another faith. Where did this polarization come from? And what is it doing to American democracy, which is housed in institutions that were framed to encourage open deliberation, compromise and consensus formation? In this talk, Professor David Peritz will examine some of the deeper forces in the American economy, the public sphere and media, political institutions, and even moral psychology that best seem to account for the recent rise in popular polarization.

Sent out a Doodle to nail down the time for the PhD review

Went looking for something that talks about the cognitive load of TIT-FOR-TAT in the Iterated Prisoner’s Dilemma and couldn’t find anything. Did find this, though, which is kind of interesting: New tack wins prisoner’s dilemma. It’s a collective intelligence approach (a toy sketch of the recognition handshake follows the list below):

  • Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These, Jennings explained, were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume “master and slave” roles – one would sacrifice itself so the other could win repeatedly.
  • Nick Jennings
    • Professor Jennings is an internationally-recognized authority in the areas of artificial intelligence, autonomous systems, cybersecurity and agent-based computing. His research covers both the science and the engineering of intelligent systems. He has undertaken fundamental research on automated bargaining, mechanism design, trust and reputation, coalition formation, human-agent collectives and crowd sourcing. He has also pioneered the application of multi-agent technology; developing real-world systems in domains such as business process management, smart energy systems, sensor networks, disaster response, telecommunications, citizen science and defence.
  • Sarvapali D. (Gopal) Ramchurn
    • I am a Professor of Artificial Intelligence in the Agents, Interaction, and Complexity Group (AIC), in the department of Electronics and Computer Science, at the University of Southampton and Chief Scientist for North Star, an AI startup.  I am also the director of the newly created Centre for Machine Intelligence.  I am interested in the development of autonomous agents and multi-agent systems and their application to Cyber Physical Systems (CPS) such as smart energy systems, the Internet of Things (IoT), and disaster response. My research combines a number of techniques from Machine learning, AI, Game theory, and HCI.
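  • A toy sketch of that handshake, with the payoff matrix and the recognition sequence invented for illustration: colluding players open with a fixed signature; once two of them recognize each other, one defects forever (master) and the other cooperates forever (slave), while plain TIT FOR TAT just mirrors the opponent’s last move.

    # Payoffs: (my move, their move) -> (my score, their score)
    PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
              ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
    SIGNATURE = ['C', 'D', 'D', 'C', 'D']  # invented recognition sequence

    def tit_for_tat(my_hist, their_hist):
        return 'C' if not their_hist else their_hist[-1]

    def colluder(role):
        """role is 'master' or 'slave': play the signature, then look for it."""
        def play(my_hist, their_hist):
            t = len(my_hist)
            if t < len(SIGNATURE):
                return SIGNATURE[t]
            if their_hist[:len(SIGNATURE)] == SIGNATURE:   # partner recognized
                return 'D' if role == 'master' else 'C'
            return tit_for_tat(my_hist, their_hist)        # stranger: fall back
        return play

    def match(p1, p2, rounds=50):
        h1, h2, s1, s2 = [], [], 0, 0
        for _ in range(rounds):
            m1, m2 = p1(h1, h2), p2(h2, h1)
            r1, r2 = PAYOFF[(m1, m2)]
            h1.append(m1); h2.append(m2); s1 += r1; s2 += r2
        return s1, s2

    print(match(colluder('master'), colluder('slave')))  # master racks up points off the slave
    print(match(tit_for_tat, colluder('master')))        # against TFT it behaves roughly normally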

7:00 – 4:30 ASRC MKT

  • SASO Travel request
  • SASO Hotel – done! Aaaaand I booked for August rather than September. Sent a note to try and fix using their form. If nothing by COB try email.
  • Potential DME repair?
  • Starting Deep Learning with Keras. Done with chapter one
  • Two Seedbank LSTM text examples (a stripped-down sketch of the same idea is at the end of this list):
    • Generate Shakespeare using tf.keras
      • This notebook demonstrates how to generate text using an RNN with tf.keras and eager execution. This notebook is an end-to-end example. When you run it, it will download a dataset of Shakespeare’s writing. The notebook will then train a model, and use it to generate sample output.
    • CharRNN
      • This notebook will let you input a file containing the text you want your generator to mimic, train your model, see the results, and save it for future use all in one page.
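    • A stripped-down character-level sketch of what those notebooks do, in tf.keras. The corpus path, layer sizes, and sampling loop are placeholders, not the notebooks’ actual settings:

      import numpy as np
      from tensorflow import keras

      text = open('corpus.txt').read().lower()   # placeholder corpus file
      chars = sorted(set(text))
      c2i = {c: i for i, c in enumerate(chars)}
      seq_len = 40

      # Slice the text into (sequence, next-character) training pairs.
      X = np.array([[c2i[c] for c in text[i:i + seq_len]]
                    for i in range(len(text) - seq_len)])
      y = np.array([c2i[text[i + seq_len]] for i in range(len(text) - seq_len)])

      model = keras.models.Sequential([
          keras.layers.Embedding(len(chars), 64, input_length=seq_len),
          keras.layers.LSTM(128),
          keras.layers.Dense(len(chars), activation='softmax'),
      ])
      model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
      model.fit(X, y, batch_size=128, epochs=5)

      # Generate: feed a seed, repeatedly sample the predicted next character.
      out = text[:seq_len]
      for _ in range(200):
          probs = model.predict(np.array([[c2i[c] for c in out[-seq_len:]]]))[0]
          probs = probs / probs.sum()
          out += chars[np.random.choice(len(chars), p=probs)]
      print(out)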

 

Phil 7.19.18

7:00 – 3:00 ASRC MKT

  • More on augmented athletics: Pinarello Nytro electric road bike review.
  • WhatsApp Research Awards for Social Science and Misinformation ($50k – Applications are due by August 12, 2018, 11:59pm PST)
  • Setting up meeting with Don for 3:30 Tuesday the 24th. He also gave me some nice leads on potential people for Dance my PhD:
    • Dr. Linda Dusman
      • Linda Dusman’s compositions and sonic art explore the richness of contemporary life, from the personal to the political. Her work has been awarded by the International Alliance for Women in Music, Meet the Composer, the Swiss Women’s Music Forum, the American Composers Forum, the International Electroacoustic Music Festival of Sao Paulo, Brazil, the Ucross Foundation, and the State of Maryland in 2004, 2006, and 2011 (in both the Music: Composition and the Visual Arts: Media categories). In 2009 she was honored as a Mid- Atlantic Arts Foundation Fellow for a residency at the Virginia Center for the Creative Arts. She was invited to serve as composer in residence at the New England Conservatory’s Summer Institute for Contemporary Piano in 2003. In the fall of 2006 Dr. Dusman was a Visiting Professor at the Conservatorio di musica “G. Nicolini” in Piacenza, Italy, and while there also lectured at the Conservatorio di musica “G. Verdi” in Milano. She recently received a Maryland Innovation Initiative grant for her development of Octava, a real-time program note system (octavaonline.com).
    • Doug Hamby
      • A choreographer who specializes in works created in collaboration with dancers, composers, visual artists and engineers. Before coming to UMBC he performed in several New York dance companies including the Martha Graham Dance Company and Doug Hamby Dance. He is the co-artistic director of Baltimore Dance Project, a professional dance company in residence at UMBC. Hamby’s work has been presented in New York City at Lincoln Center Out-of-Doors, Riverside Dance Festival, New York International Fringe Festival and in Brooklyn’s Prospect Park. His work has also been seen at Fringe Festivals in Philadelphia, Edinburgh, Scotland and Vancouver, British Columbia, as well as in Alaska. He has received choreography awards from the National Endowment for the Arts, Maryland State Arts Council, New York State Council for the Arts, Arts Council of Montgomery County, and the Baltimore Mayor’s Advisory Committee on Arts and Culture. He has appeared on national television as a giant slice of American Cheese.
  • Sent out a note with dates and agenda to the committee for the PhD review thing. Thom can open up August 6th
  • Continuing extraction of seed terms for the sentence generation. And it looks like my tasking for next sprint will be to put together a nice framework for plugging in predictive-pattern systems like LSTMs and multi-layer perceptrons (one possible shape for that framework is sketched after the sprint-planning list below).
  • This seems to be working:
    agentRelationships GreenFlockSh_1
    	 sampleData 0.0
    		 cell cell_[4, 6]
    		 influences AGENT
    			 influence GreenFlockSh_0 val =  0.8778825396520958
    			 influence GreenFlockSh_2 val =  0.8859173062045552
    			 influence GreenFlockSh_3 val =  0.9390368569108515
    			 influence GreenFlockSh_4 val =  0.9774328763377834
    		 influences SOURCE
    			 influence UL_point val =  0.032906293611796644
  • Sprint planning
    • VP-613: Develop general TensorFlow/Keras NN format
      • LSTM
      • MLP
      • CNN
    • VP-616: SASO Preparation
      • Slides
      • Poster
      • Demo
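  • For VP-613, one possible shape for a common wrapper that the different network types could plug into. This is only an interface sketch under assumed names (SequencePredictor, MLPPredictor, LSTMPredictor), not the actual design:

    from tensorflow import keras

    class SequencePredictor:
        """Common wrapper: subclasses only supply the layer stack."""
        def __init__(self, input_shape, output_dim):
            self.model = keras.models.Sequential(self.build_layers(input_shape, output_dim))
            self.model.compile(optimizer='adam', loss='mse')

        def build_layers(self, input_shape, output_dim):
            raise NotImplementedError

        def train(self, X, y, epochs=10):
            return self.model.fit(X, y, epochs=epochs, verbose=0)

        def predict(self, X):
            return self.model.predict(X)

    class MLPPredictor(SequencePredictor):
        def build_layers(self, input_shape, output_dim):
            return [keras.layers.Dense(64, activation='relu', input_shape=input_shape),
                    keras.layers.Dense(output_dim)]

    class LSTMPredictor(SequencePredictor):
        def build_layers(self, input_shape, output_dim):
            return [keras.layers.LSTM(64, input_shape=input_shape),
                    keras.layers.Dense(output_dim)]

    # Usage: MLPPredictor((10,), 1) for flat features, LSTMPredictor((20, 10), 1) for sequences.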

 

Phil 7.18.18


“There was no collusion” … “Anyone involved in that meddling to justice.”

Premises for Data Science Magical Realism

  • What follows are some premises for data science magical realism stories based (very, very loosely) on experiences I’ve had or heard about — premises, that is, for stories about impossible, absurd, magical things happening to data scientists in ordinary data science situations. Enjoy!
  • More from David Masad

Program Synthesis in 2017-18

  • A high-level overview of the recent ideas and representative papers in program synthesis as of mid-2018.
  • Alex (Oleksandr) Polozov, a researcher in the Deep Procedural Intelligence group at Microsoft Research AI, Redmond. I work on neural program synthesis from input-output examples and natural language, intersections of machine learning and software engineering, and neuro-symbolic architectures. I am particularly interested in combining neural and symbolic techniques to tackle the next generation of AI problems, including program synthesis, planning, and reasoning.

UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction | SciPy 2018 | (video) (paper)

  • UMAP (Uniform Manifold Approximation and Projection) is a novel manifold learning technique for dimension reduction. UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data. The UMAP algorithm is competitive with t-SNE for visualization quality, and arguably preserves more of the global structure with superior run time performance. Furthermore, UMAP as described has no computational restrictions on embedding dimension, making it viable as a general purpose dimension reduction technique for machine learning.
  • This could be nice for building maps
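  • A minimal usage sketch with the umap-learn package (the input data here is a random placeholder, and the parameters are just the library defaults):

    import numpy as np
    import umap

    # Placeholder high-dimensional data, e.g. document or agent embeddings.
    X = np.random.random((500, 100))

    # Project to 2D for a map-like view; n_neighbors trades local vs. global structure.
    coords = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2).fit_transform(X)
    print(coords.shape)  # (500, 2)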

7:00 – 5:00 ASRC MKT

  • Progress on getting my keys back!
  • Got everyone’s response on the Doodle, but only 4 of the 5 line up…
  • Finish first pass through PhD review slides
  • Start SASO slides and poster?
  • Continue with exporting terms from the sim and importing them into python. One of the things that will matter is the tagging of the data with the seed terms from the sim as well as the cell name so that reconstructions can be compared for accuracy.
  • Added the cell location to each <sampleData> so that there can be some kind of tagging/ground truth about the maps we’re inferring.
  • Working on iterating through the etree hierarchy. I can now read in the file, parse it, and get the elements I’m looking for (a minimal parsing sketch is at the end of this list).
  • Tomorrow will be pulling the seed words out of the code in an ordered list. Generated sentences will need to be timestamped so that conversations can be reconstructed. That being said, it could be interesting to take seed words out of a generated sentence and add them to the embedding seed words. Something to think about.
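  • A minimal sketch of that etree pass. The XML layout is guessed from the sample output logged on 7.19.18 (agentRelationships / sampleData / influences / influence), so the element and attribute names are illustrative only:

    import xml.etree.ElementTree as ET

    # Illustrative layout, guessed from the logged sample output; not the real schema.
    xml_text = """
    <agentRelationships name="GreenFlockSh_1">
      <sampleData time="0.0" cell="cell_[4, 6]">
        <influences type="AGENT">
          <influence name="GreenFlockSh_0" val="0.8778825396520958"/>
          <influence name="GreenFlockSh_2" val="0.8859173062045552"/>
        </influences>
        <influences type="SOURCE">
          <influence name="UL_point" val="0.032906293611796644"/>
        </influences>
      </sampleData>
    </agentRelationships>
    """

    root = ET.fromstring(xml_text)
    for sample in root.iter('sampleData'):
        cell = sample.get('cell')        # ground-truth tag for checking reconstructed maps
        time = float(sample.get('time'))
        for group in sample.findall('influences'):
            for inf in group.findall('influence'):
                print(time, cell, group.get('type'), inf.get('name'), float(inf.get('val')))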

Phil 7.1.18

On vacation, but oddly enough, I’m back on my morning schedule, so here I am in Bormio, Italy at 4:30 am.

I forgot my HDMI adaptor for the laptop. Need to order one and have it delivered to Zurich – Hmmm. Can’t seem to get it delivered from Amazon to a hotel. Will have to buy in Zurich

Need to add Gamerfate to the lit review timeline to show where I started to get interested in the problem – tried it but didn’t like it. I’d have to redo the timeline and I’m not sure I have the excel file

Add vacation pictures to slides – done!

Some random thoughts

  • When using the belief space example of the table, note that if we sum up all the discussions about tables, we would be able to build a pretty good map of what matters to people with regards to tables
  • Manifold learning is what intelligent systems do as a way of determining relationships between things (see curse of dimensionality). As groups of individuals, we need to coordinate our manifold learning activities so that we can use the power of group cognition. When looking at how manifold learning schemes like t-SNE, and particularly embedding systems such as word2vec, create their own unique embeddings, it becomes clear that our machines are not yet engaged in group cognition, except in the simplest way of re-using trained networks and copied hyperparameters. This is very prone to stampedes.
  • In conversation at dinner, Mike M mentioned that he’d like a language app that can indicate the centrality of a term and order that list, so that it’s possible to learn a language in a “prioritized” way that can be context-dependent. I think that LMN with a few tweaks could do that.

Continuing The Evolution of Cooperation. A thing that strikes me is that once TIT FOR TAT successfully takes over, it becomes computationally cheaper to ALWAYS COOPERATE. That could evolve to become dominant and would be completely vulnerable to ALWAYS DEFECT.

Phil 6.11.18

7:00 – 6:00 ASRC MKT

  • More Bit by Bit. Reading the section on ethics. It strikes me that simulation could be a way to cut the PII Gordian Knot in some conditions. If a simulation can be developed that generates statistically similar data to the desired population, then the simulated data and the simulation code can be released to the research community. The dataset becomes infinite and adjustable, while the PII data can be held back. Machine learning systems trained on the simulated data can then be evaluated on the confidential data. The differences in how the ML systems classify real versus simulated data can also provide insight into the gaps in fidelity of the simulated data, driving ongoing improvements to the simulation, which could in turn be released to the community (a sketch of this evaluation loop is at the end of this list).
  • Continuing with the cleanup of the SASO paper. Mostly done, but some trimming of redundant bits and the “One Simple Trick” paragraph remains.
  • SASO travel link
    • Monday prices: SASO
  • Fika
    • Come up with 3-5 options for a finished state for the dissertation. It probably ranges from “pure theory” through “instance based on theory” to “a map generated by the system that matches the theory”
    • Once the SASO paper is in, set up a “wine and cheese” get together for the committee to go over the current work and discuss changes to the next phase
    • Start on a new IRB. Emphasize how everyone will have the same system to interact with, though their interactions will be different. Emphasize that the system has to allow open interaction to provide the best chance to realize theoretical results.
    • Will and I are on the hook for a Fika about LaTeX
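  • A sketch of the simulation-vs-confidential evaluation loop from the Bit by Bit note at the top of this list, using scikit-learn on placeholder arrays (the data, model, and metric are all stand-ins):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Stand-ins: a releasable simulated dataset and a held-back confidential one.
    X_sim, y_sim = np.random.random((1000, 20)), np.random.randint(0, 2, 1000)
    X_real, y_real = np.random.random((300, 20)), np.random.randint(0, 2, 300)

    # Researchers train only on the simulated data...
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_sim, y_sim)

    # ...and the data holder evaluates on the confidential data.
    sim_score = accuracy_score(y_sim, clf.predict(X_sim))
    real_score = accuracy_score(y_real, clf.predict(X_real))

    # The gap between the two scores is a signal about simulation fidelity,
    # which can feed back into improving the simulator.
    print(sim_score, real_score, sim_score - real_score)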

Phil 6.7.18

7:00 – 4:30 ASRC MKT

  • Che Dorval
  • Done with the whitepaper! Submitted! Yay! Add to ADP
  • The SLT meeting went well, apparently. Need to determine next steps
  • Back to Bit by Bit. Reading about mass collaboration. eBird looks very interesting. All kinds of social systems involved here.
    • Research
      • Deep Multi-Species Embedding
        • Understanding how species are distributed across landscapes over time is a fundamental question in biodiversity research. Unfortunately, most species distribution models only target a single species at a time, despite strong ecological evidence that species are not independently distributed. We propose Deep Multi-Species Embedding (DMSE), which jointly embeds vectors corresponding to multiple species as well as vectors representing environmental covariates into a common high-dimensional feature space via a deep neural network. Applied to bird observational data from the citizen science project eBird, we demonstrate how the DMSE model discovers inter-species relationships to outperform single-species distribution models (random forests and SVMs) as well as competing multi-label models. Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling. An important domain contribution of the DMSE model is the ability to discover and describe species interactions while simultaneously learning the shared habitat preferences among species. As an additional contribution, we provide a graphical embedding of hundreds of bird species in the Northeast US.
  • Start fixing “This One Simple Trick”
    • Highlighted all the specified changes. There are a lot of them!
    • Started working on figure 2, and realized (after about an hour of Illustrator work) that the figure is correct. I need to verify each comment before fixing it!
  • Researched NN anomaly detection. That work seems to have had its heyday in the ’90s, with more conventional (but computationally intensive) methods being preferred these days.
  • I also thought that Dr. Li’s model had a time-orthogonal component for prediction, but I don’t think that’s true. The NN is finding the frequency and bounds on its own.
  • Wrote up a paragraph expressing my concerns and sent to Aaron.