Shifting Gears: Key considerations for the development of AI-based workloads

Hyperscalers began offering cloud services and solutions in the era of virtual machines, then moved into the infrastructure cloud era, which brought serverless and managed platform services. In recent years, however, artificial intelligence (AI) has become essential for hyperscalers to meet their customers' needs. This shift has led to new data centre trends and to the need for workload-optimised infrastructure.

Data centre trends

With the advent of digitalisation, the data centre market has seen significant new trends emerge. Customers across industries are adopting edge computing. Meanwhile, AI and machine learning (ML) have gained traction as businesses look for suitable use cases to drive innovation. Generative AI is one of the most prominent examples, with businesses actively exploring new applications for it.

Data centre operators are also placing a strong emphasis on sustainability. Organisations are striving to meet their environmental, social, and governance (ESG) goals, including carbon neutrality. In addition, operators are adapting their security and compliance measures to meet the business needs of their customers.

Workload-optimised infrastructure

Given the diverse use cases of their customers, data centre operators should be prepared to optimise their infrastructure for specific workloads.

Customers are increasingly using cloud provider-managed models for AI/ML workloads such as computer vision, deep learning and natural language processing. For modern workloads, operators are shifting from monolithic architectures to containerised and serverless architectures, and they should also be prepared for the arrival of Web3. Data centres are already well equipped to host traditional enterprise workloads, such as web systems, enterprise applications, data-processing products and even high-performance computing clusters.

Technological constraints

One of the biggest challenges of the AI transformation is the rising demand for computing power. Existing data centre designs and facilities are not equipped to support AI-based workloads. When running AI-based workloads, the demand for computing power has been observed to grow in direct proportion to the size of the model and the volume of data on which it is trained.
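To make that proportionality concrete, a widely cited rule of thumb for training dense transformer models (brought in here as an illustrative assumption, not a figure from the presentation) estimates total training compute C, in floating-point operations, from the parameter count N and the number of training tokens D. The model size and token count below are hypothetical values chosen only for the arithmetic.

```latex
C \;\approx\; 6\,N\,D \ \text{FLOPs}, \qquad
N = 7\times10^{9},\; D = 2\times10^{12}
\;\Rightarrow\;
C \approx 6 \times 7\times10^{9} \times 2\times10^{12} = 8.4\times10^{22}\ \text{FLOPs}
```

Because C is linear in both N and D, doubling either the model size or the training data roughly doubles the compute, and with it the power the hardware draws for a fixed training time.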

According to Moore's Law, the number of transistors on a chip doubles approximately every two years while the cost of producing the chip stays roughly constant. In the era of AI transformation, however, Moore's Law no longer seems to hold for data centres: the sudden surge in demand for computing power and performance has not been matched by lower production costs.
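Written as a formula, the doubling described above gives the transistor count T after t years, starting from an initial count T_0 (symbols introduced here for illustration):

```latex
T(t) = T_0 \cdot 2^{t/2}, \qquad T(10) = T_0 \cdot 2^{5} = 32\,T_0
```

In other words, a decade of Moore's Law yields roughly a 32-fold increase in transistor count at constant chip cost; the point made above is that AI compute demand has outgrown this curve, so operators cannot rely on it alone to keep costs flat.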

Key concerns

It is extremely difficult to forecast capacity for AI-based workloads. Even with the arrival of generative AI, customers are still struggling to identify concrete use cases for the technology. This, in turn, makes it difficult for hyperscalers to predict and provision the necessary capacity in different regions.

These high-performance computing workloads also demand substantial amounts of power. Conventional modern workloads run comfortably on racks drawing 5-10 kW, whereas AI-based workloads rely on powerful graphics processing units (GPUs), and racks of them can draw upwards of 50-60 kW.
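A quick worked comparison shows what these figures mean for a facility. Assuming a hypothetical hall with a fixed 1,000 kW (1 MW) budget for IT load, and taking 10 kW per conventional rack and 50 kW per GPU rack from the ranges above:

```latex
\begin{aligned}
\text{conventional:} \quad & \frac{1000\ \text{kW}}{10\ \text{kW/rack}} = 100\ \text{racks} \\
\text{AI (GPU):} \quad & \frac{1000\ \text{kW}}{50\ \text{kW/rack}} = 20\ \text{racks}
\end{aligned}
```

Across the quoted ranges, per-rack power density rises by a factor of between 50/10 = 5 and 60/5 = 12. Since nearly all of that electrical power ends up as heat, the heat each rack must shed rises by the same factor, which is why cooling becomes the next concern.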

Consequently, these power densities call for more advanced and efficient cooling methods for AI-based workloads. Hyperscalers must also keep to their ESG goals amid the rising demand for power. While most organisations have defined sustainability goals, only a small number are actually following through on them.

To run generative AI use cases, hyperscalers must uphold their customers' digital sovereignty requirements in every region. The evolution of the data centre space is also increasing the demand for IT skills. Data centre providers must establish benchmarks to support the development of AI-based workloads, so their IT teams need the skills to prepare, define and publish these benchmarks.

The way forward

To meet these high cooling requirements, data centre providers need to shift from air-based cooling to liquid cooling. Because a complete shift to liquid cooling poses significant challenges, providers around the world are opting for air-assisted cooling methods in the meantime, and other approaches are being explored for the same reason. Data centre operators should also embrace edge computing with local points of presence, with a focus on digital sovereignty. In addition, benchmarks and standards need to be established for data centre planning and design. Going forward, companies must emphasise cleaner energy sources in order to achieve their carbon neutrality goals.

Based on a presentation by Tushar Gupta, Lead Customer Engineer, Google Cloud