How to play with self-developed chips?

4 minute read 589 Views 10 Comment 1426 Share

Microsoft has been in the spotlight in the recent wave of artificial intelligence, from investing heavily in the acquisition of OpenAI to integrating ChatGPT into the Bing search engine, it has been at the forefront of the entire field's development. Just a few days ago, news emerged that Microsoft is collaborating with AMD to develop its own AI chips. The story has been full of twists and turns, and here we outline the general trajectory of Microsoft's self-developed AI chips.

Firstly, about half a month ago, media reports stated that Microsoft was developing its own chips for large language models (LLMs, currently the most cutting-edge AI technology and also the model technology behind ChatGPT), with the internal codename Athena. Then, on May 2nd, during the analyst call following AMD's release of its financial report for the first quarter of 2023, an analyst asked AMD about its views on internet cloud computing companies developing their own chips and whether there were plans to cooperate with related companies to develop semi-custom chips. AMD CEO Lisa Su stated that AMD currently has a very comprehensive IP library in the fields of CPUs, GPUs, FPGAs, and DPUs, and also has a strong semi-custom chip team, so the company plans to further invest in this field to cooperate with major customers. Two days later, Bloomberg reported that AMD is collaborating with Microsoft on AI chips, with Microsoft providing R&D support in AI for AMD, and AMD developing the Athena chip for Microsoft. After the report was released, AMD's stock price once rose by 6%. Following the Bloomberg report, a Microsoft spokesperson stated that AMD is an important partner for Microsoft, but the Athena chip is not currently being developed by AMD. However, Microsoft did not deny the reports of cooperation with AMD in the field of AI.

We believe that summarizing the existing reports, on the one hand, AMD's semi-custom chip field will be one of the key investment directions for companies in the future AI field, as large customers of AI applications (mainly internet technology giants) have a great interest in this field. On the other hand, although the Athena chip may not be directly developed by AMD, the possibility of Microsoft cooperating with AMD in the field of AI hardware is very high. At present, the most likely situation is that Microsoft is cooperating with AMD to develop a complete set of hardware solutions for accelerating AI large language models, which includes Microsoft's self-developed Athena chip and also includes AMD's CPUs and other chips. During the development process of the Athena chip, Microsoft is very likely to consider adding interfaces and optimizations related to AMD's chipset (and may even use some of AMD's IP), and AMD may also consider adding some semi-custom elements defined by Microsoft in the design of the cooperative hardware solution (such as data interfaces, storage bandwidth, optimizations for Microsoft's AI framework, etc.).

Finally, in terms of chip system integration, it would be a natural result if Microsoft uses AMD's highly experienced advanced packaging technology to integrate Athena and AMD's chips together, and in terms of upper-level software integration, it is expected that Microsoft and AMD will cooperate deeply to ensure that the entire AI system can run efficiently in the system.

Seeing the development here, it is hard not to sigh at the changes of time: 30 years ago, it was the deep cooperation between Microsoft and Intel in the Wintel alliance that ignited the rapid development of the entire PC market, and both Microsoft and Intel achieved rapid growth in the process, while AMD was a dispensable role in the market at that time, and there were even sayings that Intel kept AMD mainly to avoid triggering antitrust laws and being split up; but today, AMD's market value has surpassed Intel, and Microsoft has chosen to cooperate with AMD in the hottest field of AI. On the other hand, we believe that Microsoft and AMD's deep cooperation in the field of hardware and chips has also opened a new chapter in the history of self-developed chips by technology giants, that is, from emphasizing self-made chips to emphasizing cooperation with traditional chip companies - note that this cooperation is not just cooperation in the aspects of OEM or design services, but deep cooperation in the fields of design indicators, IP, software and hardware interfaces, etc.

The history of internet companies developing their own chips

Let's review the history of internet companies developing their own chips. The history of internet companies developing their own chips is almost synchronized with the AI boom that started in 2016. The rise of AI has had a decisive impact on the business of the internet. In the cloud, AI technology has greatly improved the core businesses of internet companies such as recommendation systems and advertising systems, and on the terminal, AI has also empowered many important computer vision and voice technologies. The companies that develop their own chips for AI-related businesses include almost all technology giants, including Google, Microsoft, Amazon, Alibaba, ByteDance, Baidu, etc. Looking at the starting point of developing their own chips, in the past, internet technology companies mainly considered two aspects when developing their own chips, namely cost and functionality.

From the perspective of cost, because AI computing requires a very large amount of computing power, the cost is also very high. From the supply chain perspective, Nvidia is the most mainstream cloud AI chip supplier, and the price of its GPUs is high on the one hand, and on the other hand, for technology giants, over-reliance on a single supplier also has supply chain risk costs (especially for Chinese internet giants, the risk of relying on Nvidia is even higher due to geopolitical influences and has a lot of uncertainty). Another aspect is that the power efficiency ratio of GPUs when running AI applications is not perfect. In fact, in cloud data center applications, a large part of the electricity cost is for paying for AI applications. Therefore, the main purpose of internet technology giants developing their own chips in the field of cloud AI chips is to reduce dependence on Nvidia on the one hand, and on the other hand, to achieve a better power efficiency ratio than Nvidia, so that when deployed on a large scale, from a comprehensive cost perspective, it can be lower than the cost of directly purchasing Nvidia's GPUs. In this regard, Google's TPU is a famous example. After several iterations, we see that the performance of Google TPU is usually similar to that of Nvidia's GPUs, but in terms of power efficiency ratio and other cost-affecting aspects, it can achieve better than Nvidia.

How do software simulation, hardware simulation, and prototype verification work

This chip hides the future of AI!

Grab the back power supply, the new trump card of chip manufacturing.