DeepSeek — What does it mean?
The recent release of DeepSeek has caused a flurry of activity in the AI space, some impact in the stock markets, and has spawned about five million blog posts ranging from “It’s not a big deal” to “It’s the most important thing to happen in AI”.
My first instinct was not to add to the confusion with yet ANOTHER blog post about DeepSeek. But recently I have been answering questions from family and friends: “What is this DeepSeek thing? Should we care?”. I feel like I did a poor job of answering, rambling and somewhat disorganized. So I caught my breath, answered some customer questions on how IBM is treating DeepSeek, and thought about it some more. There is a need for some real-world perspective, and I need to be able to provide a coherent response for my friends and family.
What Happened?
DeepSeek recently announced the availability of its model. The announcement was notable for a few different reasons, which we will dive into later. It caused a huge ripple in the US stock market, a denial-of-service cyber attack, some political upheaval, and it came right on the heels of the Stargate announcement.
All of this activity and information ended up being spun by various media outlets in a variety of different ways. Most people that I’ve talked to have felt like they are not getting the “full story” from their typical news source. They feel like there is a side to the story that they are not receiving. Since they know I am an “AI nerd”, they ask me about my thoughts.
What I find incredibly interesting is the reaction of many people to this news. They seem to be questioning the stories, and beginning to wonder if this news is real and impactful to their lives, or just more “AI hype”. They are beginning to worry about job displacement, as well as many of the other typical fears about the maturation and spread of AI into the lives of ordinary people. I am beginning to sense more fear, and more apprehension, in the questions that I get from people. At some point, my industry is going to need to deal with the emotions and fears of people — and stop talking about the technical and business challenges of the technology. That seems like a good topic for a future blog….
The Geo-Political Impacts
I haven’t seen much written on the geopolitical impacts of the DeepSeek release. It was definitely a wake-up call for the AI community in North America and Europe. Many had thought that China’s AI efforts were far behind what we see in North America. Recent US restrictions on the export of specialized GPUs (which have a major impact on the speed and scale of AI model training) were supposed to have limited Chinese AI efforts. Instead, they had the opposite effect, spurring the Chinese to develop more efficient methods for building and training AI models. I’m sure this will not be the last time a political action leads to unexpected consequences.
From a geopolitical perspective, there now seem to be three main AI “camps”:
- There is the Chinese model, which now includes DeepSeek, in which AI development is a broader government-led effort that serves the interests of the state. China imposes stringent controls on content and information, which many see as censorship and propaganda, with broad authority over what data is shared and how it is shared.
- There is the American ecosystem, which is more laissez-faire: the government does not impose strong controls or oversight on private companies pushing AI forward under profit-motivated business models. This tends to foster competition and faster innovation, with fewer regulatory controls. It can also lead to some questionable outcomes, as competitors cut ethical corners in an effort to “be first”.
- The European model seems more aligned with the American model, in that private companies are pursuing AI advances in search of economic success. What differs is the amount of oversight and regulation: under the EU AI Act, companies are accountable for pervasive governance of their AI models. The EU has also imposed serious fines for violations of the GDPR, which safeguards the personal information and data of citizens in the European Union, and which also has an impact on AI.
It remains to be seen if this is a “Cold War/Sputnik” moment. Unlike the Cold War, the sides in the current technological race do not seem as clearly defined. Open-source models and AI tooling already cross those boundaries, and continue to be developed, which keeps the three approaches in a mixture of cooperation and competition. DeepSeek itself is “open source”, meaning that the code and approaches have been shared with the broader AI community.
The Technical Impacts
The announcement of DeepSeek has led to a number of interesting technical conversations in the AI community. Some are on the model building side of things, and others are on the “how you do it” side of things. If you’re not into the technical details of AI, skip the rest of this section, and focus on the business impacts of DeepSeek. If you’re interested in some of the technical information, read through the remainder of this section. I’ll highlight what I think are the most important technical details of DeepSeek. If you are an AI expert who likes to dive into the technical details, go and read through the links and decide for yourself, and let me know what you think.
The first thing mentioned in their paper is that this is a “Mixture of Experts (MoE)” type of model. MoE architectures have been used in the past, and have become standard practice for teams building LLMs, with some definite positives for organizations doing model creation. An MoE model routes input through an initial classifier that decides which expert handles a particular piece of training data. This effectively creates a small team of “expert” models, each highly effective within its own narrow knowledge domain. Compared with traditional MoE architectures, the DeepSeek MoE uses more finely focused experts, and also isolates some of these expert models as “shared” experts.
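The routing idea above can be sketched in a few lines of plain Python. This is a toy illustration, not DeepSeek's implementation: the "experts" here are simple functions and the gate scores are random numbers, whereas a real MoE layer uses learned gating and expert networks inside a transformer.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 4
TOP_K = 2  # route each input to its 2 highest-scoring experts

# Hypothetical experts: each transforms the input differently.
experts = [lambda x, i=i: x * (i + 1) for i in range(NUM_EXPERTS)]

# Hypothetical gating scores (in practice, the output of a learned layer).
gate_scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, scores):
    """Route input x to the top-k experts and mix their outputs."""
    probs = softmax(scores)
    # Pick the k experts with the highest gate probability.
    top = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # Renormalize over the selected experts and combine their outputs.
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)

print(moe_forward(1.0, gate_scores))
```

The payoff is that only `TOP_K` experts run for any given input, so the model can hold many parameters while spending only a fraction of the compute per token.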
When you ask a typical LLM a question, it will give you an answer. This is a “black box” operation: you have no idea what happened in transforming your question into an answer, or what the model was “thinking” (I use that term loosely). A “chain of thought” design makes an LLM explain itself. In the process, it breaks your request down and begins to look at different possible answers, then responds and tells you about the steps of its response. So a question about the best way to fish in Texas may lead to an answer that looks at where Texas is, determines the types of fishing in the state, and then goes down the paths of freshwater and saltwater fishing. As it goes, it explains its reasoning, so both you and the model can see where a mistake might have been made. The model has time to check its own answers as it goes along, and can try different paths. DeepSeek uses the Chain of Thought (CoT) design to help tune and improve the accuracy of the model.
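To make the difference concrete, here is a minimal sketch of how a chain-of-thought prompt differs from a plain one. The function names and wording are my own illustration, not DeepSeek's API; the point is simply that the prompt invites the model to expose its intermediate steps.

```python
def plain_prompt(question: str) -> str:
    """A direct question: the answer arrives as a black box."""
    return f"Question: {question}\nAnswer:"

def cot_prompt(question: str) -> str:
    """Ask for intermediate reasoning, so the steps can be inspected
    (and corrected) by both the user and the model itself."""
    return (
        f"Question: {question}\n"
        "Think through this step by step, explaining each step, "
        "then give a final answer.\n"
        "Reasoning:"
    )

print(cot_prompt("What is the best way to fish in Texas?"))
```

With the second prompt, the Texas-fishing example above would come back as a visible sequence (locate Texas, enumerate fishing types, split freshwater vs. saltwater) rather than a single opaque answer.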
DeepSeek also makes use of reinforcement learning, which is different from the more typical training done for LLMs. Most LLMs are trained in a teacher/student mode, where examples of correct question-and-answer pairs are presented to the model, and the model adjusts its response to look like the expected response. This is done over and over again, and the model begins to get the hang of providing the correct types of answers.
Reinforcement learning is a bit different. The model learns more like a human baby would: it makes decisions repeatedly in given scenarios, making hundreds of attempts to respond, and learns to perform a task through trial and error. It does this without any human supervision, deciding what works and what doesn’t based on some reward criteria.
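The trial-and-error loop can be sketched with a toy two-action problem. This is the simplest possible caricature (a "bandit" with a hypothetical reward function), nothing like the scale of RL used for an LLM, but it shows the core mechanic: no labeled answers, just repeated attempts and a reward signal that slowly shapes the behavior.

```python
import random

random.seed(42)

def reward(action: int) -> float:
    # Hypothetical environment: action 1 pays off more on average.
    return random.gauss(1.0 if action == 1 else 0.2, 0.1)

values = [0.0, 0.0]   # the agent's running estimate of each action's value
counts = [0, 0]
EPSILON = 0.1         # explore (try something random) 10% of the time

for _ in range(500):
    if random.random() < EPSILON:
        action = random.randrange(2)                    # explore
    else:
        action = max((0, 1), key=lambda a: values[a])   # exploit best guess
    r = reward(action)
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    values[action] += (r - values[action]) / counts[action]

print(values)
```

After a few hundred attempts the agent has "discovered" that action 1 is the better one, purely from the reward criteria, with no human telling it the right answer.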
The efficient use of model distillation is probably the most impactful thing I saw. Model distillation takes large models, with billions of parameters, and uses them to train smaller models, helping those smaller models achieve performance close to that of the original large model. The resulting model is smaller, more compact, and easier/cheaper to deploy. DeepSeek didn’t invent model distillation, but they did demonstrate that their particular distilled models can have accuracy very close to that of their “teacher” models, at a fraction of the size.
There are a lot of other technical innovations and approaches that the DeepSeek team used to train their model with roughly 10% of the power and processing needed for the usual LLM. Some of the modifications to how certain GPU components are used, like their DualPipe feature, are quite innovative. Much of the detail in the announcement paper on arXiv goes right over my head. It will be interesting to see if other LLM vendors can reproduce these results in their own environments. It would be the ultimate irony if the DeepSeek team let the model write up their announcement, and it just hallucinated all of the data and claims about its accuracy and costs.
The Business Impacts
The business impacts of DeepSeek have yet to be fully felt. While the short-term impacts on the markets have been newsworthy, the true impacts of DeepSeek may take more time to evolve. Their approach of using distilled models, and the focus on deploying smaller inferencing engines, will be welcome in many different environments. Getting AI “to the edge” (closer to users, and out of the data centers) has been a blocker for many of the more interesting AI scenarios that I have discussed with customers. A smaller model can be deployed closer to where it is needed, and may eliminate the need for an “always on” connection to the internet.
The cost component of DeepSeek is one of the more impactful aspects for business. Using LLMs can get expensive: many applications of this technology require multiple uses of an LLM to complete a task, and each use of a model’s inferencing engine involves a cost. If the costs of training and inference are reduced with DeepSeek, then the costs passed along to end users will also be reduced, and business cases that didn’t add up before may start to make sense. For example, a “digital copilot” application might have cost a couple of million dollars to develop and train. If those costs are reduced by 10x to $200,000, developing it may now make sense for an organization. The same holds true for operational costs: if inferencing is expected to run $2,000 a month per user, a 10x reduction would bring this down to $200. I might not spend two thousand dollars a month to make an employee more efficient and effective, but I might be willing to do it for $200 a month per employee.
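The back-of-the-envelope math above can be written out explicitly. All the dollar figures here are the illustrative numbers from the paragraph, not real quotes, and the "value created" figure is a hypothetical I've added to make the comparison concrete.

```python
def business_case_works(monthly_cost_per_user: float,
                        monthly_value_per_user: float) -> bool:
    """A tool pays off when the value it creates exceeds what it costs."""
    return monthly_value_per_user > monthly_cost_per_user

inference_cost = 2000.0              # per user per month, before efficiency gains
reduced_cost = inference_cost / 10   # a 10x reduction brings it to $200

value_created = 500.0  # hypothetical productivity value per user per month

print(business_case_works(inference_cost, value_created))  # False: $2000 > $500
print(business_case_works(reduced_cost, value_created))    # True:  $200 < $500
```

The same 10x factor flips the development-cost example too: a $2,000,000 build shrinks to $200,000, moving it from "hard to justify" to plausible for many organizations.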
The fact that the DeepSeek model has been made “open source” may make it more attractive to developers and businesses. Being open source means that other organizations and teams can distill their own models from DeepSeek, and they can then use those distilled models themselves, without having to reference DeepSeek. This also helps make certain business applications more attractive, since smaller models can be deployed behind your own firewall, within your own security boundary, and your data can remain secure.
The Reality
The reality here is that most people (including me) are guessing. The “gloom and doom” group will say this brings AGI closer to reality, and that now is the time to pull the plug on ALL AI activities. The “anti-China” crowd will claim that DeepSeek is too good, and that we should continue to limit ANY/ALL technology exports to China. The “AI Hype” gang will talk about how great this is and how it has the ability to do ANYTHING and EVERYTHING.
The reality is that we will need to wait and see what happens. Will it remain available to the general public? Will it be used in commercially successful solutions? Will the costs remain as low as they are today? These are all questions that have yet to be answered. My guess is that while this is a significant event, and may be a pivotal point in the history of AI, it will be forgotten by most people in 6 to 12 months. AI keeps coming up with new approaches, new innovations, and new applications. It is important for people to know and understand the impacts of something like this, without all of the fear and hype that typically gets attached to such events.
There is always something new in this field. If you’re not a fan of the latest trend in AI, just wait 6 months, something new will come up.