It’s been a wild week and I need to share quick thoughts. I have to ask this question: did a newly created whale just trash a stargate?
All right, let me explain, though I’m writing this in haste. Over the past few days I have been in San Diego and before that Washington, DC, where the temperature was actually wintry. This was to work at three professional events. I’ve been giving talks about AI and higher education and also listening to same, among other things. I also had the great good fortune to receive an awesome award, then to give a speech about climate change. I’ve had some good conversations, done my best on several panels, stumbled into one invigorating shouting match, signed a lot of books, and, of course, networked like mad. I’ve also hosted two Future Trends Forums and written book chapters like mad for the next book. Did I mention teaching a seminar?
Meanwhile, politically, the United States has been in a chaotic whirl as the newly installed Trump administration fires off all kinds of executive orders, international statements, domestic ideas, and the first wave of immigrant deportations. I don’t usually touch on that topic in this newsletter (see my blog for more) but it’s relevant context for the topic as well as for my life.
Nevertheless, disregarding politics and my work schedule, the AI world decided to intrude by offering up some major developments. I’ll describe them here and share some future-oriented thoughts.
A caveat: these stories are happening now, in a hurry. This post is a hot take, a quick assembly of materials in mid-flight. Kinda like a whale leaping through a stargate:
Let’s start with the basics. Two things happened.
A group of companies led by OpenAI and supported by the new Trump administration announced Project Stargate, a massive build out of computing infrastructure to support more generative AI. Proponents used the figure of $500 billion.
A Chinese investment firm launched an open source AI called Deepseek, which rapidly raced up rankings, conquered app stores, and scared the heck out of Silicon Valley, with tech stocks falling in massive losses.
Let’s poke into each story then see how they connect and what they might portend for AI’s future.
Stargate (fans of the movie and/or tv series might appreciate the image above) would invest in a huge number of data centers and the power plants to keep them going. Besides OpenAI, investors and/or partners vowing to commit hefty amounts of money and/or technology include Arm, MGX, Microsoft, NVIDIA, Oracle, and SoftBank. There was political support from the new US president, as well as mockery from Elon Musk.
It’s enormously ambitious. The amount of money is immense. The key idea here is that continuing to grow and improve generative AI requires immense amounts of computer power. Trump’s support is evidence of the new administration backing American generative AI as a kind of industrial policy.
Then a Chinese project took off and knocked Stargate off its pedestal.
Deepseek is an AI company spun out of a hedge fund, High-Flyer (幻方). They made a big splash in Chinese AI in 2024, kicking off a price war while offering high quality tech. In December the company posted version 3 (V3) of its core AI online as open source under an MIT license, although there are conflicting views as to whether or not the published code truly constitutes an open source release. In early January the company launched an AI chat web application, based on V3, along with mobile device apps. Two new versions built on V3 then appeared as open source, R1 and R1-Zero. Deepseek developers also posted an open access technical paper to arXiv.
These efforts caught the attention of those of us tracking AI, but the iOS app becoming the most downloaded app in the Apple store really pushed Deepseek into the mainstream. (Note that the app store is a consumer thing, not a business one.) Reactions were generally very positive, with OpenAI’s CEO deeming the tech “impressive” and investor Marc Andreessen calling it “one of the most amazing and impressive breakthroughs I’ve ever seen.” The story of a (comparatively) little company successfully competing with giant companies hit the latter’s stock market values hard. One estimate saw a $1 trillion drop. Chip behemoth NVIDIA alone lost $600 billion in value. There were downstream effects as well, as uranium dropped (because of potential lost nuclear power plant builds).
In contrast to the industrial ambitions of Stargate, Deepseek produced V3 with far fewer resources: much less money and less computing hardware, although there are disagreements about precise quantities. One reason it could do so was that its technologists could build upon other open source AI. Another is that escalating American trade sanctions, notably restrictions on some powerful chips, forced the Chinese team to innovate on the (relative) cheap.
Where does this leave us? Where might AI go in the near future?
Geopolitics. Deepseek’s triumph plays a part in the deepening US-China cold war on several levels. For one, we can see this as a story of national competition and pride, with China as the clever David outfacing a Goliath America. More than a few people compared the Deepseek takeoff to the Sputnik shock.
We can easily imagine respective governments and companies throwing more resources into a rising generative AI arms race. Perhaps we’ll see a “code for your nation” attitude surface. We might also see trade barriers rise further still as each nation seeks to constrain the other’s AI development. For example, Anthropic’s CEO calls for the US and allies to cut off China’s chip access so as to create a US-led unipolar AI world order. Washington could go further and seek to block access to High-Flyer’s tools, as it has done with Huawei gear and Tiktok. Such a move would do better to focus on hardware than software, according to Noah Smith.
A second geopolitical dimension involves concerns about Chinese companies owning and exploiting user data. Beijing’s massive surveillance enterprise depends on this kind of data gathering. Anxieties about national ownership underpinned American opposition to Tiktok. Will some call for users to avoid Deepseek because they see it as accumulating data for Cold War 2.0 purposes? Australia’s science minister issued just such a warning, as have some British experts. At the same time, some non-Chinese users want to benefit from Deepseek; Matteo Wong has compared this to American divides over Tiktok, which set China hawks against content creators.
Note that the Deepseek web application censors some results in the ways of modern China: mentions of the 1989 Tiananmen Square massacre, criticisms of President Xi, and the like hit guardrails. Some questions yield official line responses.
I do wonder who’s behind the cyberattacks on Deepseek over the past few days.
Some technical details. Deepseek used GPUs with lower capacity than was otherwise available, due to American-led export controls. Those chips had half the power of what other countries could use, according to an MIT Technology Review article. Helpfully, CEO Liang Wenfeng snagged a stock of Nvidia A100 chips before restrictions went into effect.
Deepseek relies on reinforcement learning, which might inspire other projects not using that technique to do so now. Deepseek also uses chain of reasoning, where it starts to answer a question then reflects on its answer to produce a refinement, and so on. Chain of reasoning might continue to structure more AI offerings. Further, Deepseek relies on a mixture of experts approach, as the software creates internal “experts” to bounce ideas off of and to refine output. The mixture is uneven, with experts spawned and consulted only at certain times, reducing computational usage compared to having them always on.
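To make the "only some experts at certain times" idea a bit more concrete, here is a minimal sketch of sparse mixture-of-experts routing in plain Python. It is a generic top-k gating toy, not Deepseek's actual code: the names `route`, `make_expert`, and `gate_weights`, the eight experts, and the choice of `top_k=2` are all assumptions made up for illustration. A small router scores every expert for a given input, but only the top few actually run, which is where the compute saving comes from.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical 'expert': a toy function standing in for a sub-network."""
    return lambda x: [scale * v for v in x]


def route(x, gate_weights, experts, top_k=2):
    """Run only the top_k experts for input x and blend their outputs."""
    # Router score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Keep the top_k highest-scoring experts; the others are never called.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the chosen experts only
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


random.seed(0)
num_experts, dim = 8, 4  # illustrative sizes, not taken from any real model
experts = [make_expert(scale=i + 1) for i in range(num_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

output, chosen = route([0.5, -1.0, 2.0, 0.1], gate_weights, experts, top_k=2)
print(f"ran {len(chosen)} of {num_experts} experts: {sorted(chosen)}")
```

In this toy, six of the eight experts do no work at all for the given input. Real systems learn the router and the experts together during training; the sketch only shows the selection step.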
I am curious about how Deepseek approaches agentic AI. American AI firms are pushing this computing form very hard. Will this Chinese competitor’s next release showcase an agent function, or might Deepseek avoid that for now, splitting the AI world?
Troubling generative AI economics. The stock market’s reaction to Deepseek’s rocketing performance was brutal. Does it represent a turn away from the American AI model of massive scaling? If so, how do the big AI companies respond? Maybe some investors, having been clobbered, will withdraw or collapse. Perhaps, as critic Brian Merchant writes, we are seeing an enormous bubble pop.
Alternatively, if the big US AI companies can use cheaper technology and don’t fail in the process they can reduce their own costs and come out ahead in the medium term, as Ben Thompson writes.
On the flip side, what’s Deepseek’s business model? So far they’ve been innovating on the cheap and often making materials available for free. How much longer can they continue to be a fun scientific offshoot of a hedge fund? How might Beijing craft official policies and unofficial pressures as part of its AI industrial policy? Casey Newton ponders:
For the moment, DeepSeek doesn’t seem to have a business model to match its ambitions. For most big US AI labs, the (yet unrealized) business model is to develop the best service and to sell it at a profit. To date, DeepSeek has positioned itself as a kind of altruistic giveaway.
Open source success story. Yann LeCun asks us to focus on the Deepseek story as one of this particular way of sharing and building upon code:
To people who see the performance of DeepSeek and think: "China is surpassing the US in AI." You are reading this wrong. The correct reading is: "Open source models are surpassing proprietary ones."
DeepSeek has profited from open research and open source (e.g. PyTorch and Llama from Meta) They came up with new ideas and built them on top of other people's work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.
We could see AI costs and prices drop significantly. Microsoft’s CEO thinks so, but also expects resulting demand to soar.
Now, it is unfair to imply that American generative AI work is all closed-source and proprietary. Meta has had some success publishing some or most of its Llama software in open source. Llama is a bigger and more expensive thing than Deepseek, from what I can tell, and so it seems they are switching to a war footing in order to compete. At the same time Meta seems pleased that their open source route was a good one.
I am curious about tempo. My impression, unsupported by data, is that generative AI R&D is speeding up. There are more projects and ideas in the air. Perhaps Deepseek will shock the technology world into doing even more and more quickly. If that intuition is correct, doing AI safety work is going to become even harder.
On a related note, users can modify the open source versions of Deepseek to remove guardrails, including political ones.
Other players. The picture of Deepseek vs OpenAI (and Microsoft and Google) leaves out other important work. Anthropic’s Claude has been having some successes producing AI for less money and on small hardware bases. The open source AI world in general doesn’t get nearly the attention it should. For example, Hugging Face is starting to build a fully open version of Deepseek’s R1. It is:
an initiative to systematically reconstruct DeepSeek-R1’s data and training pipeline, validate its claims, and push the boundaries of open reasoning models. By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open-source community, and create a foundation for future models to leverage these techniques.
That’s one note I’d like to conclude on, tentatively - that Deepseek has shown groups other than the US AI giants can make and build on LLMs. As I’ve been forecasting for a few years, we might see medium-sized, then smaller-sized groups making and running their own. Perhaps this is ultimately moving in a more democratic direction.
This ended up being longer than I expected, as Substack warned me I broke a limit. More to come.
(Thanks to a lot of people for conversation, like Steven Kaye, Tom Lairson, Ruben Puentedura)
Good job, Bryan.
DeepSeek is indeed a wake-up call. It'll take a while for the implications to cascade through the infosphere. I agree with you that this is a good thing for smaller companies and countries. As a comparison case you might take a look at Nollywood, a name for the African film industry centered around Nigeria. I believe that it's now the third largest in the world as measured by annual sales. Read that again, third largest in the world. https://en.wikipedia.org/wiki/Cinema_of_Nigeria
The absolutely crucial point is that it leap-frogged ahead, jumping over the standard analog cinema based on celluloid film and chemical pigments. Oh, there were a few theaters in Africa from the colonial era, but not much. The technology in Nollywood is now all digital and the movies are produced for home consumption and showing in small venues on TV screens and monitors.
The barriers to entry are higher for AI, in both technological terms, and knowledge base. But there is a growing AI scene in sub-Saharan Africa.
On overall compute, there's a nuance pointed out by the ubiquitous Dread Pirate Marcus. The reduction is mainly during the training phase. That's where DeepSeek achieved its economies. The compute requirements for inference remain much the same. This is an important qualification, but I don't think it changes the overall picture.
Go go go Hugging Face ;)