Transformers from Scratch

• Transformers are a very exciting family of machine learning architectures. Many good tutorials exist, but in the last few years transformers have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work. This post is an attempt to explain directly how modern transformers work, and why, without some of the historical baggage.

Dissertation

• Folding in Wayne’s edits
• Made the Arendt paragraph of velocity less reflective and more objective.
• TODO: Defend facts to opinion with examples of language, framing, what is interesting, etc. – done
• TODO: Heavy thoughts, light and frivolous, etc. We ascribe these, but they are not there – done
• TODO: We have a MASSIVE physical bias. Computers don’t. – done
• TODO: Computers and people must work together
• Title case all refs (Section, Table, etc) – done
• \texttt all urls (reddit, etc) – done
• search for and / or slashes
• Fix underlines as per here – done!
% for better underlining
\usepackage[outline]{contour}
\usepackage{ulem}
\normalem % use classical emph

\newcommand \myul[4]{%
\begingroup%
\renewcommand \ULdepth {#1}%
\renewcommand \ULthickness {#2}%
\contourlength{#3}%
\uline{\phantom{#4}}\llap{\contour{white}{#4}}%
\endgroup%
}
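A usage sketch for the macro above, assuming the four arguments are read as underline depth, underline thickness, contour width, and the text to underline. The lengths here are illustrative guesses, not the values used in the dissertation, and would need tuning per font:

```latex
% Assumes the preamble above (contour, ulem, and \myul) is already loaded.
% Arguments: {depth}{thickness}{contour width}{text}; all lengths are placeholders.
The underline breaks around descenders: \myul{1.75pt}{0.4pt}{1pt}{typography query}.
```

The white contour drawn around the text is what cuts the gaps in the rule, so this only looks right on a white page background.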

# Phil 1.17.20

An ant colony has memories that its individual members don’t have

• Like a brain, an ant colony operates without central control. Each is a set of interacting individuals, either neurons or ants, using simple chemical interactions that in the aggregate generate their behaviour. People use their brains to remember. Can ant colonies do that?

Optuna: An open source hyperparameter optimization framework to automate hyperparameter search

• Medium writeup. It looks like this is Bayesian, and is better than hyperopt?

• Dissertation
• Finished the intro, starting motivation
• NSOF Meeting with Isaac & Bruce
• Still looking at the optimal scenario to use the current simulators (running over a weekend) to generate data
• Data sets are used to train and evaluate, then progressively simplified until they can no longer recognize the source data. This will let us estimate the fidelity of the simulations we need.
• JuryRoom meeting. Looking into adding UX faculty. Meeting is expanding to 6:00 – 8:00

I got invited to the TF Dev conference!

The HKS Misinformation Review is a new format of peer-reviewed, scholarly publication. Content is produced and “fast-reviewed” by misinformation scientists and scholars, released under open access, and geared towards emphasizing real-world implications. All content is targeted towards a specialized audience of researchers, journalists, fact-checkers, educators, policy makers, and other practitioners working in the information, media, and platform landscape.

• For the essays, a length of 1,500 to 3,000 words (excluding footnotes and methodology appendix) is appropriate, but the HKS Misinformation Review will consider and publish longer articles. Authors of articles with more than 3,000 words should consult the journal’s editors before submission.

• Dissertation
• It looks like I fixed my LaTeX problems. I went to C:\Users\phil\AppData\Roaming\MiKTeX\2.9\tex\latex, and deleted the ifvtex folder. Re-ran, things installed, and all is better now
• Slides
• GOES
• Pinged Isaac about the idea of creating scenarios that incorporate the NASA simulators
• Meeting
• GSAW
• Slides
• Speakers presenting in a plenary session are scheduled to speak for 15 minutes, with five additional minutes allowed for questions and answers from the audience
• Our microphones work best when the antenna unit is clipped to a belt and the microphone is attached near the center of your chest.
• We are NOT providing network capabilities such as WiFi. If you require WiFi, you are responsible for purchasing it from the hotel and ensuring that it works for the presentation.
• Charts produced by the PC version of Microsoft PowerPoint 2013, 2016 or 365 are preferred
• In creating your slides, note that the presentation room is large and you should consider this in your selection of larger fonts, diagram size, etc. At a minimum, a 20-point font is recommended.
• GPT-2 – Maybe do something with Aaron today?

• Finishing touches on the dissertation. Need to lint the bibtex – done
• The work machine is not behaving. Had to move to Overleaf
• Call commonvision to schedule printing and binding – done
• Order some thumb drives – done
• Meeting with Don. Discovered that he’s a digital format guy. Discovered that the Lit Review was missing from the exec summary
• Corresponding with Thom. Hardcopy. Meeting still on Friday?

• Dissertation
• GOES
• New board is not showing up. Yay, it shows up if I remove the old board and put it in the old position
• Ordered a 1,000 watt power supply

On the Relationship between Self-Attention and Convolutional Layers

• Recent trends of incorporating attention mechanisms in vision have led researchers to reconsider the supremacy of convolutional layers as a primary building block. Beyond helping CNNs to handle long-range dependencies, Ramachandran et al. (2019) showed that attention can completely replace convolution and achieve state-of-the-art performance on vision tasks. This raises the question: do learned attention layers operate similarly to convolutional layers? This work provides evidence that attention layers can perform convolution and, indeed, they often learn to do so in practice. Specifically, we prove that a multi-head self-attention layer with sufficient number of heads is at least as powerful as any convolutional layer. Our numerical experiments then show that the phenomenon also occurs in practice, corroborating our analysis. Our code is publicly available.
• I’ve just started to think about how machines and humans could serve as different attention heads, which is why we concentrate into populations with shared features. Attention, given the right conditions, may be an emergent phenomenon. Need to look at Kauffman.

Dissertation

• More Foreword – done!
• Dedication – done
• Acknowledgements – started!
• Sometime between the end of the foreword and meeting with Aaron, move over to the new template

• Dissertation
• Stampedes are a form of runaway attention, and precision/recall aid that process
• Starting on the foreword. Using the Arab Spring and GamerGate as the framing
• 11:00 VOLPE Meeting
• Pursuing the resilience proposal was well received. Next, go up and meet with the folks?
• Install card – done! Passed the smoke test

Dissertation

• Fix H3a-c – look at the heatmaps to see if there is some way of showing cell visitation as trustworthy, low border cells as safe, and stampede conditions as untrustworthy. Otherwise, use DTW
• Helpful information on Excel Histograms

Nomad, flocking, and stampeding heatmaps

• A border/core ratio explains this nicely. When border dwell time (BDT) > 1, dangerous stampede. When BDT = 1, nomads. When BDT < 1, flocking.
• Updated the simulation results section. Now I need to update the conclusion hypothesis. – done!
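The BDT rule above can be written as a piecewise definition. This is a sketch that assumes BDT stands for the border/core dwell-time ratio. With measured data the middle case would presumably be a band around 1 rather than exact equality, and these notes do not say how wide that band is:

```latex
% Requires \usepackage{amsmath} for the cases environment and \text.
% Assumption: BDT is read as the border/core dwell-time ratio.
\[
\text{behavior} =
\begin{cases}
  \text{stampede (dangerous)} & \text{if } \mathrm{BDT} > 1,\\
  \text{nomad}                & \text{if } \mathrm{BDT} = 1,\\
  \text{flocking}             & \text{if } \mathrm{BDT} < 1.
\end{cases}
\]
```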

Got my graphics card!

• Dissertation
• Finishing discussion – done
• Rolling in TACJ from introduction – done
• Adding conclusions – done
• Fix H3a-c
• Reimbursement for fall – done
• Mission Drive meeting (need to get time for dissertation and GSAW prep)

• Dissertation
• Started the exec summary. I think the formatting is fine and it doesn’t show up in the TOC
• Started the discussion overview
• Fixed a bunch of orphan numbers, figure references and other formatting

• Dr. Yueh is Fellow in Economics at St Edmund Hall, Oxford University and Adjunct Professor of Economics at London Business School.
• Dissertation
• Adding more chapter summaries
• Maps – done
• Human Study – done
• Discussion
• Conclusions
• Long chat with Aaron M
• The front matter is your cover letter
• Search and replace et. al. -> et al., “. -> .”, and check all footnotes
• Exec summary can be done as a renumber after main doc

• Roger pointed me at ‘Most advanced, yet acceptable’: Typicality and novelty as joint predictors of aesthetic preference in industrial design
• Typicality and novelty have often been shown to be related to aesthetic preference of human artefacts. Since a typical product is rarely new and, conversely, a novel product will not often be designated as typical, the positive effects of both features seem incompatible. In three studies it was shown that typicality (operationalized as ‘goodness of example’) and novelty are jointly and equally effective in explaining the aesthetic preference of consumer products, but that they suppress each other’s effect. Direct correlations between both variables and aesthetic preference were not significant, but each relationship became highly significant when the influence of the other variable was partialed out. In Study 2, it was furthermore demonstrated that the expertise level of observers did not affect the relative contribution of novelty and typicality. It was finally shown (Study 3) that a more ‘objective’ measure of typicality, central tendency — operationalized as an exemplar’s average similarity to all other members of the category — yielded the same effect of typicality on aesthetic preference. In sum, all three studies showed that people prefer novel designs as long as the novelty does not affect typicality, or, phrased differently, they prefer typicality given that this is not to the detriment of novelty. Preferred are products with an optimal combination of both aspects.
• Trust is earned in the smallest of moments. It is earned not through heroic deeds, or even highly visible actions, but through paying attention, listening, and gestures of genuine care and connection. Brené Brown
• If we share group membership with others across a range of social settings it becomes more likely that the actors will face future exchanges with reversed roles (Resnick, 2002). Repeated interactions with stable identities also allow the trustor to accumulate knowledge about the trustee and to make better predictions about his behavior. Thus, by extrapolating from past behavior trust in future encounters can grow. The mechanics of trust: A framework for research and design
• Dissertation
• Adding more chapter summaries
• Simulation – done
• Adversarial Herding – done
• Maps
• Human Study
• Discussion
• Conclusions
• Read “I Just Google It”: Folk Theories of Distributed Discovery

• Diversity promotes collective intelligence in large groups but harms small ones
• Diverse groups are often said to be less susceptible to decision errors resulting from herding and polarization. Thus, the fact that many modern interactions happen in a digital world, where filter bubbles and homophily bring people together, is an alarming yet poorly understood phenomenon. But online interactions are also characterized by unprecedented scale, where thousands of individuals can exchange ideas simultaneously. Evidence in collective intelligence however suggests that small (rather than large) groups tend to do better in complex information environments. Here, we adopt the well-established framework of social learning theory (from the fields of ecology and cultural evolution) to explore the causal link between diversity and performance as a function of group size. In this pre-registered study, we experimentally manipulate both group diversity and group size, and measure individual and group performance in realistic geo-political judgements. We find that diversity hinders the performance of individuals in small groups, but improves it in large groups. Furthermore, aggregating opinions of modular crowds composed of small independent but homogeneous groups achieves better results than using non-modular diverse ones. The results are explained by greater conflict of opinion in diverse groups, which negatively impacts small (but not large) groups. The present work sheds light on the causal mechanisms underlying the success (or lack thereof) of diverse groups in digital environments, and suggests that diversity research can benefit from adopting a wider social learning perspective.
• “I Just Google It”: Folk Theories of Distributed Discovery
• A significant minority of people do not follow news regularly, and a growing number rely on distributed discovery (especially social media and search engines) to stay informed. Here, we analyze folk theories of news consumption. On the basis of an inductive analysis of 43 in-depth interviews with infrequent users of conventional news, we identify three complementary folk theories (“news finds me,” “the information is out there,” and “I don’t know what to believe”) that consumers draw on when making sense of their information environment. We show that the notion of folk theories help unpack the different, complementary, sometimes contradictory cultural resources people rely on as they navigate digital media and public affairs, and we argue that studying those who rarely engage directly with news media but do access information via social media and search provides a critical case study of the dynamics of an environment increasingly defined by platforms.
• Dissertation
• Working on Lit Review overview
• Fixed the margins for blockquotes by creating a more flexible changemargin command
\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist
• Which is used like this
\begin{changemargin}{1.5cm}{1.5cm}
They were one man, not thirty. For as the one ship that held them all; though it was put together of all contrasting things---oak, and maple, and pine wood; iron, and pitch, and hemp---yet all these ran into each other in the one concrete hull, which shot on its way, both balanced and directed by the long central keel; even so, all the individualities of the crew, this man’s valor, that man’s fear; guilt and guiltiness, all varieties were welded into oneness, and were all directed to that fatal goal which Ahab their one lord and keel did point to.
\end{changemargin}
• Fixed a bunch of things, including blockquotes
• Biological Basis – done
• Human Belief Spaces – done
• Dimension Reduction – done
• Orientation – done
• Velocity – done
• Social Influence Horizon – done
• Bones in a hut – started
• 1:00 Dentist