Generative AI Breaks The Data Center: Data Center Infrastructure And Operating Costs Projected To Increase To Over $76 Billion By 2028
Generative AI has become increasingly captivating, and artificial intelligence seems to hold limitless potential, but it places heavy demands on data processing performance and power consumption, which will inevitably drive rapid cost growth. This article summarizes ongoing research at Tirias Research projecting that generative AI data center server infrastructure and operating costs will exceed $76 billion by 2028, and that the cost and scale of generative AI will demand innovation in optimizing neural networks, likely pushing the computational load out from data centers to client devices such as PCs and smartphones.
With the launch of Large Language Models (LLMs) for Generative Artificial Intelligence (GenAI), the world has become both enamored and concerned with the potential for AI. The ability to hold a conversation, pass a test, develop a research paper, or write software code is a tremendous feat of AI, but it is only the beginning of what GenAI will be able to accomplish over the next few years. All this innovative capability comes at a high cost in terms of processing performance and power consumption. So, while the potential for AI may be limitless, physics and costs may ultimately set the boundaries.
Tirias Research forecasts that on the current course, generative AI data center server infrastructure plus operating costs will exceed $76 billion by 2028, with growth challenging the business models and profitability of emergent services such as search, content creation, and business automation incorporating GenAI. For perspective, this cost is more than twice the estimated annual operating cost of Amazon’s cloud service AWS, which today holds one third of the cloud infrastructure services market according to Tirias Research estimates. This forecast incorporates an aggressive 4X improvement in hardware compute performance, but this gain is overrun by a 50X increase in processing workloads, even with a rapid rate of innovation around inference algorithms and their efficiency. Neural Networks (NNs) designed to run at scale will be even more highly optimized and will continue to improve over time, which will increase each server’s capacity. However, this improvement is countered by increasing usage, more demanding use cases, and more sophisticated models with orders of magnitude more parameters. The cost and scale of GenAI will demand innovation in optimizing NNs and is likely to push the computational load out from data centers to client devices like PCs and smartphones.
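To make the scale of that gap concrete, here is a quick back-of-the-envelope calculation on the figures quoted above; the arithmetic is ours, not an output of the Tirias Research model.

```python
# Rough arithmetic on the figures above (our calculation, not the forecast model):
# a 50X growth in processing workload against a 4X improvement in per-server
# performance still implies a large net increase in required server capacity.
workload_growth = 50      # projected increase in processing workload by 2028
hardware_gain = 4         # assumed improvement in hardware compute performance
net_capacity_growth = workload_growth / hardware_gain
print(f"Server capacity must still grow ~{net_capacity_growth:.1f}X")   # ~12.5X
```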
For background, today, the vast majority of NN inferences are executed on servers accelerated by Graphics or Tensor Processing Units (GPUs or TPUs), which are designed to perform the parallel math of matrix calculations. Each accelerator applies thousands of coefficient “parameters” (whose analogue is a synapse) to each “node” (whose analogue is a neuron). Networks are arranged in layers, where each layer consists of thousands of nodes, and each node has thousands of connections to nodes in the prior and subsequent layers. In LLMs, these nodes ultimately map to tokens, or text language objects and symbols. The history of previously generated tokens, such as a prompt and the subsequent generated response, is then employed to assign probabilities and choose one from among the most likely next tokens.
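The mechanics above can be illustrated with a minimal sketch: one dense layer applying its parameters to a vector of node activations, followed by next-token selection from the resulting probabilities. The layer width, vocabulary size, and function names below are illustrative assumptions, far smaller than any production LLM, and not a description of a specific model.

```python
# Minimal sketch of the ideas above (illustrative sizes, not a real LLM):
# a dense layer applies coefficient "parameters" to "nodes", then the model
# assigns probabilities to candidate tokens and samples one of the most likely.
import numpy as np

rng = np.random.default_rng(0)

hidden, vocab = 1024, 8000                                 # illustrative layer width and vocabulary
W_layer = rng.standard_normal((hidden, hidden)) * 0.02     # one layer's coefficient "parameters"
W_out = rng.standard_normal((hidden, vocab)) * 0.02        # projection from nodes to token logits

def forward(node_activations: np.ndarray) -> np.ndarray:
    """One layer: every node receives contributions from every node in the prior layer."""
    h = np.maximum(W_layer.T @ node_activations, 0.0)       # the matrix math GPUs/TPUs accelerate
    return W_out.T @ h                                      # one logit per token in the vocabulary

def choose_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Assign probabilities and choose one from among the most likely next tokens."""
    p = np.exp((logits - logits.max()) / temperature)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

context = rng.standard_normal(hidden)                       # stand-in for the encoded prompt history
print(choose_next_token(forward(context)))
```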
The next wave of LLMs such as GPT-4 are being trained on massive data sets with a goal of creating neural networks estimated to exceed one trillion parameters. Today, executing a trained LLM often requires a single model to run across multiple accelerators and multiple servers, which drives costs up rapidly. Even smaller models, ranging from tens to hundreds of billions of parameters, can easily exceed the memory capacity and performance of powerful, cloud-based GPU or TPU accelerators designed with large amounts of memory to run the algorithms efficiently.
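A rough back-of-the-envelope calculation shows why. The one-trillion-parameter figure comes from the paragraph above; the bytes per parameter (FP16 weights) and the 80 GB of accelerator memory are illustrative assumptions for a current high-end data center GPU, not figures from the Tirias Research model.

```python
# Why trillion-parameter models spill across accelerators (rough sketch).
params = 1_000_000_000_000     # ~1 trillion parameters (the article's estimate for next-wave LLMs)
bytes_per_param = 2            # FP16 weights -- an assumption, not from the article
gpu_memory_gb = 80             # memory per high-end accelerator -- also an assumption

weights_gb = params * bytes_per_param / 1e9
min_accelerators = -(-weights_gb // gpu_memory_gb)         # ceiling division
print(f"~{weights_gb:,.0f} GB of weights -> at least {min_accelerators:.0f} accelerators, "
      "before counting activations, KV caches, or serving redundancy")
```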
To forecast the operating cost of GenAI, Tirias Research applies a Forecast Total Cost of Operations (FTCO) model of complex data center workloads on various hardware configurations. The FTCO model incorporates advances in technology, changes in end user demand, and changes to workloads like media streaming, cloud gaming, and machine learning (ML). In the case of GenAI, this means factoring in processing advances, which for the foreseeable future will continue to be driven by GPU accelerator technology; exponential increases in the data sets and the resulting number of parameters of trained NN models; improvements to model optimizations; and the insatiable demand for GenAI.
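To show the shape of such a calculation, here is a minimal sketch of the kind of arithmetic a total-cost-of-operations model performs: demand divided by per-server throughput gives a fleet size, which is multiplied by amortized capital and annual operating costs. The function and all of its input numbers are our own illustrative assumptions; this is not the Tirias Research FTCO model.

```python
# Minimal sketch of a total-cost-of-operations calculation (illustrative only;
# the structure and the placeholder numbers are assumptions, not the FTCO model).
def annual_fleet_cost(annual_requests: float,
                      requests_per_server_per_year: float,
                      server_price: float,
                      amortization_years: float = 3.0,
                      opex_per_server_per_year: float = 0.0) -> dict:
    """Servers needed to meet demand, plus amortized capital and operating cost per year."""
    servers = annual_requests / requests_per_server_per_year
    capex_per_year = servers * server_price / amortization_years
    opex_per_year = servers * opex_per_server_per_year
    return {"servers": round(servers), "annual_cost_usd": capex_per_year + opex_per_year}

# Hypothetical inputs purely to exercise the calculation:
print(annual_fleet_cost(annual_requests=1e12,              # e.g. GenAI queries per year
                        requests_per_server_per_year=5e8,   # per-server throughput
                        server_price=200_000,               # accelerated server, USD
                        opex_per_server_per_year=60_000))   # power, cooling, labor, USD
```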
First let’s address user demand. Today, GenAI is being used to generate text, software code, and images, along with emerging applications including video, sound, and 3D animation. In the future, these foundational capabilities will power increasingly sophisticated GenAI applications, including generating video entertainment, creating metaverses, teaching, and even generating processes for urban, industrial, and business applications. Today, OpenAI’s ChatGPT is rapidly approaching 2 billion monthly visitors, and Midjourney, the popular GenAI art community, has over 15 million users.
To forecast the demand, Tirias Research analyzed three foundational GenAI capabilities (text, imagery, and video) and segmented the emerging markets into ad-driven consumers, paid subscription users, and automated tasking. For text GenAI, demand for tokens, analogous to words or symbols, is forecast to exceed 10 trillion by the end of 2023, with over 400 million monthly active users concentrated in developed markets. By the end of 2028, the forecast estimates over 6 billion users, or about 90% smartphone market penetration, and over 1 quadrillion annual tokens, a 100X increase. For image GenAI, the increase is forecast to be significantly higher, at over 400X to more than 10 trillion images, driven by the emergence of video, which will require producing sequences of thematically and visually connected images using more sophisticated image generation tools and complex prompting loops.
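The implied growth rate of the token forecast can be read off directly; this is our arithmetic on the figures quoted above, not an output of the forecast model.

```python
# Implied annual growth from the token forecast above: ~10 trillion tokens at
# the end of 2023 rising roughly 100X to over 1 quadrillion by the end of 2028.
start_tokens, end_tokens, years = 10e12, 1e15, 5
cagr = (end_tokens / start_tokens) ** (1 / years) - 1
print(f"Implied compound annual growth of roughly {cagr:.0%} per year")   # ~151%
```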
Second, let’s address the computational workload. GenAI models are improving in efficiency as an unprecedented amount of academic and business knowledge pours into the field of machine learning (ML) and GenAI. The quality of GenAI imagery and tokens varies across segments and by factors such as resolution and model size, with paid usage assigned to higher quality outputs and a correspondingly higher utilization of data center compute resources. Projected workloads will combine demanding large models with more efficient, computationally optimized, smaller NNs. “The emergence of more efficient neural networks, trained by more sophisticated NNs, will be one of several forces that drive generative AI to more viable economics and lower environmental impact,” said Simon Solotko, Senior Analyst at Tirias Research and developer of the FTCO model. Massive parameter networks will be employed to rapidly train smaller networks able to run more cost-effectively and on distributed platforms including PCs, smartphones, vehicles, and mobile XR. HuggingFace recently demonstrated two new trained ChatGPT-like LLMs, the 30 billion parameter vicuna-30B and the 13 billion parameter vicuna-13B, built on Facebook’s LLaMA LLM framework and trained using ChatGPT user logs. This clever technique resulted in a ChatGPT-like LLM that can run on a single consumer device, with responses not dissimilar to those of the larger models that trained it. Highly optimized, or even simpler and more specialized models, are expected to reduce data center costs at scale, both by reducing model sizes in the cloud and by pushing the workload out of the cloud entirely, enabling distribution of GenAI applications to smartphones and PCs.
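The “large model teaches small model” approach described above is, in spirit, knowledge distillation: a smaller student network is trained to match the outputs of a larger teacher. The sketch below shows a generic distillation loss on token logits; it illustrates the general technique under our own assumptions, not the specific recipe used to train the Vicuna models, which were fine-tuned on conversation logs rather than on teacher logits.

```python
# Generic knowledge-distillation loss (illustrative; not the Vicuna recipe):
# the student is pushed toward the teacher's softened token distribution.
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = np.exp((logits - logits.max(axis=-1, keepdims=True)) / temperature)
    return z / z.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      temperature: float = 2.0) -> float:
    """Cross-entropy of the student against the teacher's softened distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

# Illustrative shapes: a batch of 4 token positions over an 8,000-token vocabulary.
rng = np.random.default_rng(1)
teacher = rng.standard_normal((4, 8000))
student = rng.standard_normal((4, 8000))
print(distillation_loss(student, teacher))
```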
Tirias Research forecasts 2028 data center power consumption of close to 4,250 megawatts, a 212X increase over 2023, at a total amortized server capital plus operational cost of over $76 billion in today’s dollars. This cost excludes the cost of the data center building structure but includes labor, power, cooling, ancillary hardware, and 3-year amortized server costs. The FTCO model is baselined on server benchmarks utilizing 10 Nvidia GPU accelerators with a peak power of just over 3,000 watts, and operating power at 50% average utilization at just over 60% of peak. “Using high density 10 GPU servers provided by data center innovator Krambu, Tirias Research is able to benchmark multiple open-source generative AI models to derive the computational demands of future, higher-parameter models,” continued Mr. Solotko. The forecast includes insights into GPU and TPU accelerator roadmaps over the next five years and uses these roadmaps to compute the workload that could be accomplished by each server in each of the use cases: text, imagery, and video. Perhaps the FTCO model’s biggest insight is that there is an equilibrium: as workloads become more complex and server performance improves by about 4X, server throughput per token or image remains relatively stable year over year.
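As a rough consistency check on these figures (our arithmetic, not part of the forecast model), the quoted per-server characteristics can be turned into an implied fleet size and annual energy use:

```python
# Rough consistency check on the figures above (our arithmetic, not the FTCO model).
server_peak_w = 3000        # ~10-GPU server, just over 3,000 W peak (from the article)
operating_fraction = 0.60   # operating power just over 60% of peak at 50% utilization (article)
fleet_power_mw = 4250       # forecast 2028 GenAI data center power consumption (article)

server_operating_w = server_peak_w * operating_fraction
implied_servers = fleet_power_mw * 1e6 / server_operating_w
annual_energy_twh = fleet_power_mw / 1e6 * 8760             # MW -> TW, times hours per year
print(f"~{server_operating_w:.0f} W per server, ~{implied_servers / 1e6:.1f} million servers, "
      f"~{annual_energy_twh:.1f} TWh per year (before cooling and ancillary overheads)")
```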
Data Center Total Cost of Ownership (TCO) for Generative AI through 2028
As demand for GenAI continues to grow exponentially, breakthroughs in processing or chip design seem like long bets with the slowing of Moore’s Law. There is no free lunch: consumers will demand better GenAI output, and that will counteract efficiency and performance gains. As consumer usage increases, costs will inevitably increase. Mr. Solotko concludes, “We are just starting to understand the data center economics of machine learning. By modeling the entire cycle of demand, processing, and cost, we can discover what ultimately will shift the workload and economics in favorable directions. Moving compute down to the edge and distributing it to clients like PCs, smartphones, and XR devices is on the critical path to lowering capital and operating costs.”
Companies began sounding the alarm about data center power consumption five years ago at the annual Hot Chips semiconductor technology conference, predicting that worldwide compute demand could exceed total worldwide electricity generation within a decade. That was prior to the rapid adoption of GenAI, which has the potential to grow compute demand at an even faster rate. Technology enhancements alone will not overcome the processing challenges presented by the adoption of GenAI. It will require changes in the way processing is performed, significant improvements in model optimization without a significant loss of accuracy, and new business models to cover the costs of what will still need to be processed in the cloud. These points will be covered in Part 2 of GenAI Breaks The Data Center: Moving GenAI To The Edge.
Translation:
郭陆蒙 (Guo Lumeng)
HVAC Engineer, 中讯邮电咨询设计院有限公司
DKV (DeepKnowledge Volunteer) elite member
Proofreading:
江秋健 (Jiang Qiujian)
Senior Manager, Solutions Department, 广东浩云长盛网络股份有限公司
DKV (DeepKnowledge Volunteer) elite member