State media control influences large language models | Nature
- Millions of people around the world query large language models (LLMs) for
information. Although several studies have compellingly documented the persuasive
potential of these models, there is limited evidence of who or what influences the
models themselves, leading to a flurry of concerns about which companies and
governments build and regulate the models. Here we show through six studies that
government control of the media across the world already influences the output of
LLMs via their training data. We use a cross-national audit to show that LLMs exhibit
a stronger pro-government valence in the languages of countries with lower media
freedom than in those with higher media freedom. This result is correlational, so to
triangulate the specific mechanism of how state media control can influence LLMs,
we develop a multi-part case study on China’s media. We demonstrate that media
scripted and curated by the Chinese state appears in LLM training datasets. To evaluate
the plausible effect of this inclusion, we use an open-weight model to show that
additional pretraining on Chinese state-coordinated media generates more positive
answers to prompts about Chinese political institutions and leaders. We link this
phenomenon to commercial models through two audit studies demonstrating that
prompting models in Chinese generates more positive responses about China’s
institutions and leaders than do the same queries in English. The combination of
influence and persuasive potential across languages suggests the troubling conclusion
that states and powerful institutions have increased strategic incentives to leverage
media control in the hopes of shaping LLM output.
Tasks
- Chores – done
- Dishes – done
- Bills
- Ride up to Mike’s – done
- Suz at 2:15 – done
SBIRs
- Finish Q5 report – done
- 3:00 tagup meeting – cancelled
- 4:00 ADS – done
