This seems like it might be important for the limits of what we want to do with LLMs. CoT doesn’t work outside of the training distribution, which I think is what we all thought, but there are some deep implications for models running in environments that are impossible to crawl (exploration, classified, proprietary) and that they have not been trained on. Those are much more likely to be outside the training distribution.
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
- Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providing answers (a.k.a., CoT reasoning), which often leads to the perception that they engage in deliberate inferential processes. However, some initial findings suggest that CoT reasoning may be more superficial than it appears, motivating us to explore further. In this paper, we study CoT reasoning via a data distribution lens and investigate if CoT reasoning reflects a structured inductive bias learned from in-distribution data, allowing the model to conditionally…
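To make the "data distribution lens" concrete for myself: a minimal sketch (not the paper's actual experimental setup; the lookup-table "model" and all names here are hypothetical) of how CoT-style performance can collapse on compositions never seen in training. A toy model that only memorizes reasoning traces for training-distribution compositions answers those perfectly and fails on the rest, which is roughly the pattern the paper is probing.

```python
# Minimal sketch (hypothetical, not the paper's setup): a toy "model" that
# memorizes answers for compositions of simple string transforms seen in
# training, then is probed on unseen compositions.
from itertools import permutations

TRANSFORMS = {
    "shift1": lambda s: "".join(chr((ord(c) - 97 + 1) % 26 + 97) for c in s),
    "reverse": lambda s: s[::-1],
    "double": lambda s: s + s,
}

def apply_chain(word, chain):
    """Ground truth: apply each named transform in order."""
    for name in chain:
        word = TRANSFORMS[name](word)
    return word

# "Training distribution": only some 2-step compositions are ever seen.
train_chains = [("shift1", "reverse"), ("reverse", "double")]
test_chains = list(permutations(TRANSFORMS, 2))  # includes unseen compositions

words = ["cat", "dog", "bird"]
memory = {}  # the toy model: a lookup table of (word, chain) -> answer
for chain in train_chains:
    for w in words:
        memory[(w, chain)] = apply_chain(w, chain)

def toy_model(word, chain):
    """Answers only what it has memorized; no actual reasoning over steps."""
    return memory.get((word, chain))

for chain in test_chains:
    correct = sum(toy_model(w, chain) == apply_chain(w, chain) for w in words)
    tag = "in-dist " if chain in train_chains else "out-dist"
    print(f"{tag} {chain}: {correct}/{len(words)} correct")
```

Running it, the in-distribution compositions score 3/3 and every unseen composition scores 0/3, which is the flavor of result I'd expect if CoT is mostly conditional pattern completion rather than general inference.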
Tasks
- Finish review of paper 599 – DONE. That was hard
- Download ATHENE proposals – done
- More pix of trailer, then put it back in the driveway. Forgot to take the pix. I do think I’ll hang onto the trailer for a while longer though. I’ll need to move things into storage
- Remove lines from under the deck – nope
- Lube stove switches – nope
- Start making a list of agents (Nomad Century, Gutenberg, Sentient Cell, Bomber Mafia, etc.) – nope
SBIRs
- Meeting with Aaron to rework stories – done
- Start on white paper for the gaming work. Include the Mirage paper, the Delusion Spiral paper, Cultural Fidelity in Large-Language Models, and the Generative Social Simulation paper – nope
