Happy Pi Day, for all those irrational folks
Modern language models refute Chomsky’s approach to language
- The rise and success of large language models undermines virtually every strong claim for the innateness of language that has been proposed by generative linguistics. Modern machine learning has subverted and bypassed the entire theoretical framework of Chomsky’s approach, including its core claims to particular insights, principles, structures, and processes. I describe the sense in which modern language models implement genuine theories of language, including representations of syntactic and semantic structure. I highlight the relationship between contemporary models and prior approaches in linguistics, namely those based on gradient computations and memorized constructions. I also respond to several critiques of large language models, including claims that they can’t answer “why” questions, and skepticism that they are informative about real life acquisition. Most notably, large language models have attained remarkable success at discovering grammar without using any of the methods that some in linguistics insisted were necessary for a science of language to progress.
Alpaca: A Strong Open-Source Instruction-Following Model
- We introduce Alpaca 7B, a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. Alpaca behaves similarly to OpenAI’s text-davinci-003, while being surprisingly small and easy/cheap to reproduce (<$600).
- Sprint planning – done
- Generate a set of prompts for stories using Sun Tzu and Clausewitz and save them out in prompts – done
- Test the prompts in Narrative generator. Don’t forget about the later Babbage models
- Ran prompt1 using the Clausewitz context
- Run 100 stories each, cluster and label
- Write the code that will connect the sentences for each of the stories and see if they pass through clusters in meaningful ways. I still wonder if just looking for distances between sentence (or summary) vectors would make more sense. Something to evaluate.
- Another thought is to average the narrative trajectory using an adjustable window. Something to try
- If things look reasonable, write code that creates a network graph out of the connected clusters, lay them out in Gephi, and render them. This all needs to be done before the 20th!
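A rough sketch of the trajectory steps above, to make the options easier to compare. It assumes each story is already a list of sentence vectors plus a list of cluster labels in sentence order; the function names, the centered window, and the Source/Target/Weight edge-list CSV (a layout Gephi can import) are my own choices for illustration, not settled design.

```python
import csv
from collections import Counter

import numpy as np


def smooth_trajectory(vectors, window=3):
    """Average each sentence vector with its neighbours (centered window, clipped at the ends)."""
    vectors = np.asarray(vectors, dtype=float)
    half = window // 2
    smoothed = np.empty_like(vectors)
    for i in range(len(vectors)):
        lo, hi = max(0, i - half), min(len(vectors), i + half + 1)
        smoothed[i] = vectors[lo:hi].mean(axis=0)
    return smoothed


def step_distances(vectors):
    """Euclidean distance between each consecutive pair of vectors in a story."""
    vectors = np.asarray(vectors, dtype=float)
    return np.linalg.norm(np.diff(vectors, axis=0), axis=1)


def cluster_transitions(stories):
    """Count cluster-to-cluster moves; each story is a list of cluster labels in sentence order."""
    edges = Counter()
    for labels in stories:
        for src, dst in zip(labels, labels[1:]):
            if src != dst:  # staying inside a cluster is not a transition
                edges[(src, dst)] += 1
    return edges


def write_edge_list(edges, path):
    """Write the transition counts as a weighted edge list for import into Gephi."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (src, dst), weight in sorted(edges.items()):
            writer.writerow([src, dst, weight])
```

Comparing `step_distances` on the raw vectors against the smoothed ones for a few window sizes should show whether the averaging helps or just flattens everything.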
- GPT BD tagup
- I’d like to get the tweet text for the Excel export, but I need to not plot that in the boxplot. This seems to be the answer?
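One way this could work, assuming the data lives in a pandas DataFrame: keep the full frame for the spreadsheet and hand only the numeric columns to the boxplot. The column name `text` and the helper names are hypothetical, and this is a sketch rather than what the existing code does.

```python
import pandas as pd


def split_for_plot(df, text_columns=("text",)):
    """Return only numeric columns, dropping the named text columns first."""
    plot_df = df.drop(columns=list(text_columns), errors="ignore")
    return plot_df.select_dtypes(include="number")


def export_and_plot(df, xlsx_path, text_columns=("text",)):
    """Write everything to Excel, then boxplot only the numeric columns."""
    # The full frame, tweet text included, goes to the spreadsheet
    # (pandas needs an Excel writer such as openpyxl installed for this).
    df.to_excel(xlsx_path, index=False)
    # Only the numeric columns reach the boxplot (needs matplotlib installed).
    return split_for_plot(df, text_columns).boxplot()
```

The original DataFrame is left untouched, so the text column is still there for the export no matter what gets plotted.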