AI Model Battle: From Academic Innovation to Engineering Technology Competition

The "Battle of the Hundred Models" in the AI Field: From Academic Innovation to Engineering Technology

Last month, the AI industry erupted in an "Animal War."

On one side is Llama, launched by Meta. Thanks to its open-source nature, it is hugely popular with the developer community. After studying the Llama paper and source code, Japan's NEC quickly developed a Japanese-language version of ChatGPT, breaking the country's technical bottleneck in AI.

On the other side is a large model named Falcon. In May this year, Falcon-40B was released, edging out Llama to take the top spot on the open-source LLM leaderboard.

The leaderboard, produced by the open-source model community, provides a standard for measuring LLM capability, and its rankings have largely been a matter of Llama and Falcon taking turns at the top.

After Llama 2 launched, the Llama family reclaimed the lead; but in early September, Falcon released its 180B version and once again climbed higher in the rankings.

Interestingly, the developer of Falcon is the Technology Innovation Institute in Abu Dhabi, the capital of the UAE. Government officials have said they are in this game to upend the core players.

The day after the 180B release, the UAE's Minister of Artificial Intelligence was named one of the "100 Most Influential People in AI", alongside the "Godfather of AI" Geoffrey Hinton, OpenAI's Sam Altman, and Baidu founder Robin Li.

Today, the AI field has entered a free-for-all stage: any country or company with sufficient financial resources is, to varying degrees, building its own large language model. Within the Gulf alone there is more than one player: in August, Saudi Arabia purchased more than 3,000 H100 chips for its domestic universities to train LLMs.

An investor complained on social media: "Back then I looked down on the internet's business-model innovation, thinking it had no moat: the hundred-group-buying war, the hundred-car war, the hundred-livestream war. I never expected that in hard tech, large-model entrepreneurship would still turn into a hundred-model war..."

How did what was supposed to be hard, high-barrier technology become a race that anyone can join?

Transformer Changes the Game

Startups in the United States, tech giants in China, and oil tycoons in the Middle East have all been able to dive into the field of large models thanks to that famous paper: "Attention Is All You Need."

In 2017, eight computer scientists at Google unveiled the Transformer architecture in this paper. Now the third most cited paper in AI history, it became the catalyst for the current wave of AI enthusiasm.

Today's large models, including the globally sensational GPT series, are all built on the Transformer foundation.

Before this, "teaching machines to read" has been recognized as an academic challenge. Unlike image recognition, human reading not only focuses on the current words and sentences but also understands them in context. Early neural networks had inputs that were independent of each other, making it impossible to comprehend lengthy texts or even entire articles, leading to frequent translation errors.

In 2014, Ilya Sutskever, a computer scientist who later moved from Google to OpenAI, made a breakthrough: he applied recurrent neural networks (RNNs) to natural language, quickly pushing Google Translate's performance past its competitors.

The RNN introduced a "recurrent design": each unit takes in both the current input and the output of the previous step, giving the network the ability to "connect context". RNNs ignited research enthusiasm in academia, and Noam Shazeer, later an author of the Transformer paper, also studied them in depth.

However, researchers soon realized that RNNs have a serious flaw: the algorithm computes sequentially, which solves the context problem but is inefficient and struggles to handle large numbers of parameters.
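To make that concrete, here is a minimal toy sketch of a recurrent step (illustrative NumPy code with made-up sizes and random weights, not the code of any real system): the hidden state carries context forward, but each step depends on the previous one, so the sequence cannot be processed in parallel.

```python
import numpy as np

# Toy RNN cell: the hidden state h carries context from one step to the next.
# Sizes and weights below are illustrative assumptions, not a real model.
hidden_size, input_size, seq_len = 8, 4, 6
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # previous hidden -> hidden
b = np.zeros(hidden_size)

inputs = rng.normal(size=(seq_len, input_size))  # a short sequence of token vectors
h = np.zeros(hidden_size)                        # initial "memory"

# The loop is the bottleneck: step t cannot start until step t-1 has finished,
# so the work over the sequence cannot be parallelized across positions.
for t in range(seq_len):
    h = np.tanh(W_x @ inputs[t] + W_h @ h + b)

print(h)  # final hidden state: a compressed summary of the whole sequence
```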

The RNN's cumbersome design soon wore on Shazeer. So, starting in 2015, Shazeer and seven like-minded colleagues set out to develop a replacement for the RNN, and the result was the Transformer.

Compared to RNN, Transformer has two major innovations:

First, it replaces the recurrent design with positional encoding, enabling parallel computation. This dramatically improves training efficiency, makes it possible to process massive amounts of data, and pushed AI into the era of large models. Second, its attention mechanism further strengthens the ability to understand context.
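As a rough sketch of these two ideas (toy NumPy code following the sinusoidal positional-encoding and scaled dot-product attention formulas from "Attention Is All You Need"; the dimensions and random inputs are assumptions for illustration), every position is updated in a single matrix operation rather than step by step, and each token's new representation is a weighted mix of all tokens in the sequence:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position codes: mark each token's position so recurrence is unnecessary."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X):
    """Scaled dot-product self-attention over the whole sequence at once (no learned projections here)."""
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                 # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context positions
    return weights @ X                              # context-aware representations, computed in parallel

seq_len, d_model = 6, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))             # toy token embeddings
X = X + positional_encoding(seq_len, d_model)       # inject word order without recurrence
print(self_attention(X).shape)                      # (6, 16): all positions updated in one shot
```

In a real Transformer, X would first be projected into separate query, key, and value matrices and split across multiple attention heads; the toy version above keeps only the core computation.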

Because the Transformer solved so many problems at once, it gradually became the mainstream approach to natural language processing, to the point where it felt as if "had the Transformer not been born, NLP would have remained in endless night." Even Ilya Sutskever abandoned the RNN approach he had championed and switched to the Transformer.

In short, Transformers have turned large models from theoretical research into purely engineering problems.

In 2019, OpenAI developed GPT-2 based on Transformer, which amazed the academic community. In response, Google quickly launched a more powerful AI named Meena.

Compared with GPT-2, Meena contained no innovation in the underlying algorithm; it simply had 8.5 times the parameters and used 14 times the computing power. Noam Shazeer, an author of the Transformer paper, was stunned by this "brute-force stacking" approach and immediately wrote a memo titled "Meena Eats the World."

The emergence of Transformers has significantly slowed the pace of innovation in underlying algorithms within the academic community. Engineering elements such as data engineering, computing power scale, and model architecture have increasingly become key factors in the AI competition, allowing any tech company with a certain level of technical capability to develop large models.

Therefore, computer scientist Andrew Ng stated during his speech at Stanford University: "AI is a collection of tools, including supervised learning, unsupervised learning, reinforcement learning, and now generative artificial intelligence. All of these are general technologies, similar to other general technologies such as electricity and the internet."

Although OpenAI is still the benchmark for LLMs, semiconductor analysis firms believe that GPT-4's competitiveness comes mainly from engineering solutions: if it were open-sourced, any competitor could quickly replicate it.

Some analysts predict that it may not be long before other large tech companies are able to develop large models comparable to the performance of GPT-4.

Fragile Competitive Barriers

Currently, the "Hundred Model War" is no longer a rhetorical device, but an objective reality.

According to one report, by July this year the number of large models in China had reached 130, surpassing the 114 in the United States, and China's myths and legends are no longer enough for domestic tech companies to draw names from.

Beyond China and the United States, a number of wealthier countries have also achieved a preliminary "one country, one model": besides Japan and the UAE, there are Bhashini, led by the Indian government, and HyperClova X, developed by the South Korean internet company Naver.

The scene looks like a return to the dot-com bubble era, with capital of every kind pouring in wildly.

As mentioned earlier, the Transformer has turned large models into a purely engineering problem: with talent, funding, and computing power, anyone can produce one. But a low barrier to entry does not mean everyone can become a giant of the AI era.

The "Animal War" mentioned at the beginning of the article is a typical case: although Falcon has surpassed the llama in ranking, it is hard to say how much impact it has had on Meta.

As we all know, companies open source their research not only to share the technological dividend with society, but also to tap the wisdom of the crowd. As university professors, research institutions, and small and medium-sized enterprises keep using and improving Llama, Meta can fold those improvements back into its own products.

For open-source large models, an active developer community is its core competitiveness.

As early as 2015, when it set up its AI lab, Meta committed to the open-source route; and Zuckerberg, who made his fortune in social media, understands better than most the art of "cultivating community relations."

For example, in October, Meta held a special event called "AI Creator Incentive": developers using Llama 2 to solve social issues such as education and the environment have the opportunity to receive a $500,000 grant.

Today, Meta's Llama series has become a benchmark for open-source LLMs.

As of early October, 8 of the top 10 on one open-source LLM leaderboard were built on Llama 2, all using its open-source license. On that platform alone, more than 1,500 LLMs have adopted the Llama 2 license.

Of course, beating Llama on raw performance, as Falcon has done, is entirely possible; but most LLMs on the market still trail GPT-4 by a wide margin.

For example, recently, GPT-4 won first place in the AgentBench test with a score of 4.41. AgentBench is jointly launched by Tsinghua University and several universities in the United States to evaluate the reasoning and decision-making capabilities of LLMs in multi-dimensional open generation environments. The test content includes tasks from 8 different environments such as operating systems, databases, knowledge graphs, and card battles.

The results show that second-place Claude scored only 2.77 points, a sizeable gap. As for the much-hyped open-source LLMs, most scored around 1 point, less than a quarter of GPT-4's score.

Bear in mind that GPT-4 was released in March this year, so these results come after competitors worldwide have had more than half a year to catch up. What preserves the gap is OpenAI's team of top-tier scientists and the experience accumulated over years of LLM research, which keeps the company a step ahead.

In other words, the core competitiveness of a large model lies not in its parameters but in ecosystem building (open source) or pure reasoning capability (closed source).

As the open-source community becomes increasingly active, the performance of various LLMs may converge, since everyone is using similar model architectures and datasets.

A more immediate question: apart from Midjourney, it seems no large model has really managed to turn a profit.

Anchors of Value

In August this year, an article titled "OpenAI may go bankrupt by the end of 2024" attracted attention. The main point of the article can almost be summarized in one sentence: OpenAI is burning through cash too quickly.

The article notes that since it began building ChatGPT, OpenAI's losses have widened rapidly, reaching roughly $540 million in 2022 alone, and the company can only wait for its investors to foot the bill.

Although the article title is sensational, it also reveals the current situation of many large model providers: a severe imbalance between costs and revenues.

The high costs currently mean that only Nvidia is making big money from artificial intelligence, and possibly Broadcom as well.

According to estimates from consulting firms, Nvidia sold more than 300,000 H100s in the second quarter of this year. The H100 is an AI chip extraordinarily efficient at training AI, and tech companies and research institutions around the world are scrambling for it. Stacked together, those 300,000 H100s would weigh as much as 4.5 Boeing 747s.

Nvidia's results soared, with earnings up 854% year on year, stunning Wall Street. It is worth noting that the H100's price on the second-hand market has been bid up to $40,000 to $50,000, while its material cost is only about $3,000.

The high cost of computing power has, to some extent, become a drag on the industry's development. Some investment firms estimate that global tech companies will spend $200 billion a year on large-model infrastructure, while large models can generate at most $75 billion in annual revenue, leaving a gap of at least $125 billion.

In addition, with a few exceptions like Midjourney, most software companies have not yet figured out their profit models after investing huge costs. This is especially true for the two industry leaders, Microsoft and Adobe, whose steps have been somewhat unsteady.

GitHub Copilot, the AI code-generation tool Microsoft built with OpenAI, charges $10 a month, but because of infrastructure costs Microsoft actually loses about $20 per user per month, and heavy users can cost it as much as $80 a month. By extension, it is speculated that Microsoft 365 Copilot, priced at $30, may lose even more.

Similarly, Adobe, which has just launched its Firefly AI tool, quickly rolled out a credits system to keep heavy use from dragging the company into losses. Once users exceed their monthly allotment of credits, Adobe slows down the service.

Keep in mind that Microsoft and Adobe are software giants with clear business scenarios and huge numbers of paying users. Meanwhile, the main use case for most parameter-laden large models is still chat.

It is undeniable that without the emergence of OpenAI and ChatGPT, this AI revolution might not have happened at all; however, the value created by training large models is still debatable.

Moreover, as homogenization competition intensifies and the number of open-source models increases, the development space for pure large model suppliers may become more limited.

The success of the iPhone 4 did not come from the 45nm process A4 processor, but rather because it could play games like Plants vs. Zombies and Angry Birds.
