This is the key to making trustworthy models:
The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning
- If you reduce the parameter count in an LLM, it tends to lose recall of facts before it gets worse at learning from examples in the prompt. This holds for parameter count reductions via both pruning and using a smaller dense model.
- How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques — weight pruning and simply training a smaller or larger model, which we refer to as dense scaling — and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60–70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
The thing is that for sociology, the large pretrained (not finetuned) models will probably be best.
SBIRs
- Add a 3 point Research Council story – done
- 9:00 standup – done
- 1:00 Dr. Banerjee – done. Fun!
- 2:00 BMD – done. Did a slide walkthrough and got some action items
- 2:30 AI Ethics
- 3:00 AIMSS?
GPT Agents
- Thinking more about how to watch how the model changes under prompting. I think a ring-buffer prompt, where the oldest tokens drop off as new ones are added, makes the most sense (see the sketch after this list). I checked, and the Llama-2 models do come in pretrained and finetuned (chat) flavors.
- Put in a request for Llama-2 access – got it! That was quick. Yep pretrained and chat
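Here's a minimal sketch of the ring-buffer prompt idea, assuming a fixed token budget. The RingBufferPrompt class, the window size, and the whitespace "tokenizer" are illustrative stand-ins, not anything from the Llama-2 tooling:

from collections import deque

class RingBufferPrompt:
    """Fixed-size sliding window of tokens: oldest tokens fall off as new ones arrive."""
    def __init__(self, max_tokens=512):
        # A deque with maxlen automatically discards the oldest items once full
        self.tokens = deque(maxlen=max_tokens)

    def append_text(self, text):
        # Whitespace split stands in for a real tokenizer here
        self.tokens.extend(text.split())

    def as_prompt(self):
        # Current window contents, oldest surviving token first
        return ' '.join(self.tokens)

if __name__ == '__main__':
    buf = RingBufferPrompt(max_tokens=8)
    buf.append_text('one two three four five six')
    buf.append_text('seven eight nine ten')
    print(buf.as_prompt())  # -> three four five six seven eight nine ten

Feeding the current window back into the model after each append would be one way to watch how its behavior drifts as the prompt contents turn over.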

- My talk is back at its original time!
- The atproto SDK looks very nice!
from atproto import Client, models

def main():
    # Log in with a Bluesky handle and password
    client = Client()
    profile = client.login('my-handle', 'my-password')
    print('Welcome,', profile.display_name)

    # Post a message, then like the post that was just created
    response = client.send_post(text='Hello World from Python!')
    client.like(models.create_strong_ref(response))

if __name__ == '__main__':
    main()
