Digital censorship in China is not a static issue; it’s a constantly adapting system. Recent research confirms that Chinese artificial intelligence (AI) models actively self-censor to a far greater extent than their American counterparts, revealing how the government maintains control over emerging technologies. A study by Stanford and Princeton University researchers compared the responses of four Chinese and five American large language models (LLMs) when asked 145 politically sensitive questions, repeated 100 times to ensure reliability.
Quantifiable Censorship
The results were stark: Chinese models refused to answer a significantly higher percentage of questions. For example, DeepSeek rejected 36% of prompts, while Baidu’s Ernie Bot refused 32%. In contrast, OpenAI’s GPT and Meta’s Llama had refusal rates below 3%. When Chinese models did respond, their answers were shorter and less accurate than those from American models. This isn’t merely a difference in training data; the censorship is deliberate.
The Source of Bias: Training Data vs. Intervention
Researchers investigated whether this bias stemmed from pre-training on censored Chinese internet data or from direct developer intervention. Jennifer Pan, a Stanford political science professor, explains that “given that the Chinese internet has already been censored for decades, there’s a lot of missing data.” However, even when tested in English—where training data would theoretically be more diverse—Chinese LLMs still exhibited more censorship, suggesting manual intervention plays a key role.
The Illusion of Honesty: Hallucinations and Lies
One challenge in studying AI censorship is distinguishing between outright lies and “hallucinations”—where the model fabricates information because it doesn’t know the answer. For example, when asked about Liu Xiaobo, a Chinese dissident, one model falsely claimed he was a Japanese scientist specializing in nuclear weapons. It’s unclear whether this was intentional misdirection or a result of missing data from its training set. Pan notes that less detectable censorship is often the most effective.
Extracting Hidden Instructions
Researchers are also developing methods to extract the hidden instructions that govern these models’ behavior. Alex Colville, studying AI propaganda at the China Media Project, found that prompts can force Alibaba’s Qwen to reveal its underlying guidelines. Qwen consistently admitted to being instructed to “focus on China’s achievements” and “avoid negative statements.” This subtle manipulation ensures that even when the model answers, it does so within pre-approved parameters.
The Race Against Time
The field of AI censorship research is still young, and faces significant hurdles: researchers risk losing access to models for asking too many sensitive questions, and advanced models demand considerable computational resources for testing. Most importantly, the rapid pace of model development means that any conclusions are likely to be outdated quickly.
The current focus on AI safety is skewed towards future risks, rather than the dangers already present in systems like those operating within China’s digital landscape.
The study underscores that AI censorship is not a theoretical concern, but an active practice. The findings highlight the need for further research into the methods used to manipulate these models and the broader implications for global information control.



























