Phil 1.20.20

Transformers from Scratch

  • Transformers are a very exciting family of machine learning architectures. Many good tutorials exist, but in the last few years transformers have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work. This post is an attempt to explain directly how modern transformers work, and why, without some of the historical baggage.


  • Folding in Wayne’s edits
    • Made the Arendt paragraph on velocity less reflective and more objective.
    • TODO: Defend facts vs. opinion with examples of language, framing, what is interesting, etc. – done
    • TODO: Heavy thoughts, light and frivolous, etc. We ascribe these, but they are not there – done
    • TODO: We have a MASSIVE physical bias. Computers don’t. – done
    • TODO: Computers and people must work together
  • Title case all refs (Section, Table, etc) – done
  • \texttt all urls (reddit, etc) – done
  • search for and / or slashes
  • Fix underlines as per here – done!
    % for better underlining
    \normalem % use classical emph
    \newcommand \myul[4]{%
    	\renewcommand \ULdepth {#1}%
    	\renewcommand \ULthickness {#2}%
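    The snippet above is cut off after the second \renewcommand. A common completion of this kind of ulem-based underline macro (the remaining two arguments are assumed here to be a color and the text itself, and the \textcolor call assumes xcolor is loaded) looks like this sketch:

    \usepackage[normalem]{ulem}  % assumed; provides \uline, \ULdepth, \ULthickness
    \usepackage{xcolor}          % assumed; provides \textcolor
    \newcommand \myul[4]{%
    	\begingroup
    	\renewcommand \ULdepth {#1}%      rule depth below the baseline
    	\renewcommand \ULthickness {#2}%  rule thickness
    	\textcolor{#3}{\uline{#4}}%       colored underline of the text
    	\endgroup
    }

    Usage would then be along the lines of \myul{1.8pt}{0.8pt}{blue}{underlined text}; the \begingroup…\endgroup pair keeps the redefinitions local so each call can use different dimensions.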