Phil 1.20.20

Transformers are a very exciting family of machine learning architectures. Many good tutorials exist, but in the last few years transformers have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work. This post is an attempt to explain directly how modern transformers work, and why, without some of the historical baggage.

Dissertation

Folding in Wayne’s edits
- Made the Arendt paragraph on velocity less reflective and more objective.
- TODO: Defend facts to opinion with examples of language, framing, what is interesting, etc. – done
  - Typoglycemia tool
- TODO: Heavy thoughts, light and frivolous, etc. We ascribe these, but they are not there – done
- TODO: We have a MASSIVE physical bias. Computers don’t. – done
- TODO: Computers and people must work together
- Title case all refs (Section, Table, etc.) – done
- \texttt all URLs (reddit, etc.) – done
- Search for and/or slashes

Fix underlines as per here – done!

% for better underlining
\usepackage[outline]{contour}
\usepackage{ulem}
\normalem % use classical emph

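% \myul{<depth>}{<thickness>}{<contour width>}{<text>}
% Underlines <text>; a white contour breaks the rule around descenders.
\newcommand \myul[4]{%
	\begingroup%
	\setlength{\ULdepth}{#1}% depth of the rule below the baseline (a ulem length)
	\renewcommand \ULthickness {#2}% thickness of the rule (a ulem macro)
	\contourlength{#3}% width of the white outline drawn around the text
	\uline{\phantom{#4}}\llap{\contour{white}{#4}}%
	\endgroup%
}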
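
A possible call, as a sketch only: the four arguments are the underline depth, the underline thickness, the contour width, and the text to underline. The lengths and the sample word below are illustrative assumptions, not values taken from the dissertation, and the white contour assumes a white page background.

% hypothetical values; tune the three lengths to the document font
\myul{2pt}{0.4pt}{1pt}{typoglycemia}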

viztales

Dimension reduction, State, Orientation, and Speed
