China Has Made A Supercomputer Work Without A GPU — Here's How
Saying no to Nvidia graphics cards and focusing on self-sufficiency instead is no easy feat, but China is doing this as the AI race rages. Looking to boost its domestic technology, the country has made significant strides in many areas. While producing a lithium-metal battery with double the energy density on a three-minute charge is certainly impressive, it pales in comparison to China's latest supercomputer.
Named LineShine, the supercomputer delivers up to 1.54 exaflops from its LX2 processors, which are built on the Armv9 architecture and pack 304 CPU cores each. Most supercomputers rely heavily on GPUs for parallel computing, but LineShine works without them and runs everything, from coordination to number crunching, on CPUs. So how does the supercomputer work without a GPU? It boils down to a workaround that replaces GPUs with a whole army of cores. LineShine houses 20,480 compute nodes, each containing two processors, which brings the tally to 40,960 processors and 12,451,840 cores.
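The core tally is simply the product of the node, processor, and core counts quoted above. A minimal sketch of that arithmetic follows; the constant and function names are illustrative and not taken from any LineShine documentation.

```python
# Figures as quoted in the article.
NODES = 20_480
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 304


def total_cores(nodes, processors_per_node, cores_per_processor):
    """Multiply out the node, processor, and core counts."""
    return nodes * processors_per_node * cores_per_processor


processors = NODES * PROCESSORS_PER_NODE
cores = total_cores(NODES, PROCESSORS_PER_NODE, CORES_PER_PROCESSOR)
print(f"{processors:,} processors, {cores:,} cores")
# prints: 40,960 processors, 12,451,840 cores
```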
Each core supports Arm's Scalable Matrix Extension (SME) and Scalable Vector Extension (SVE), which provide the extra push by accelerating the matrix and vector operations behind scientific computing and AI training. Every core also has separate 32 KB L1 instruction and data caches, and clusters of cores share 28.5 MB of L2 cache, which helps keep data flowing to the cores. Finally, the LQLink high-speed network, operating at 1.6 Tb/s per node, ties the nodes together, allowing LineShine to run its workloads without a GPU.
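To give a feel for what a matrix extension speeds up: Arm describes SME as built around outer-product instructions that accumulate into a tile of registers. The sketch below mimics that arithmetic pattern in plain Python. It is not SME code, it says nothing about how LineShine's software is actually written, and the function name is made up for illustration.

```python
def matmul_outer(a, b):
    """Multiply a (m x k) by b (k x n) as a sum of k outer products."""
    m, k, n = len(a), len(b), len(b[0])
    # Accumulator that stands in for a hardware tile register.
    acc = [[0.0] * n for _ in range(m)]
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # p-th column of a
        row = b[p]                         # p-th row of b
        # Add the outer product col x row into the accumulator.
        for i in range(m):
            for j in range(n):
                acc[i][j] += col[i] * row[j]
    return acc


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_outer(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each pass of the outer loop updates every element of the result at once, which is the kind of dense, regular work that dedicated matrix hardware is designed to handle in a single instruction.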
How does LineShine compare to standard supercomputers?
Though the design of the supercomputer LineShine is nothing short of a technological marvel, it's not without its flaws. Credit where it's due: the CPU-only design does eliminate any GPU memory limitations. However, as a CPU-only system, it's simply not as efficient when compared with heterogeneous CPU/GPU supercomputers. It may be effective overall, but it isn't well-equipped for high-density AI throughput.
The CPU-only supercomputer also loses some of its luster next to the likes of El Capitan. Whereas LineShine delivers up to 1.54 exaflops, El Capitan (which remains the fastest supercomputer in the world) outpaces it at 1.809 exaflops. LineShine's core count is impressive in isolation, but El Capitan reaches its higher figure with 11.3 million cores.
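The size of that gap is easy to work out. The sketch below takes both exaflop figures exactly as quoted and assumes they were measured on the same benchmark, which the figures alone do not confirm; the variable names are illustrative.

```python
# Performance figures as quoted in the article, in exaflops.
LINESHINE_EXAFLOPS = 1.54
EL_CAPITAN_EXAFLOPS = 1.809

ratio = EL_CAPITAN_EXAFLOPS / LINESHINE_EXAFLOPS
gap_percent = (ratio - 1) * 100
print(f"El Capitan is about {gap_percent:.0f}% faster ({ratio:.2f}x)")
# prints: El Capitan is about 17% faster (1.17x)
```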
In an era when Nvidia is announcing personal AI computers, LineShine could be a necessary compromise. It may not have the power or efficiency of Western supercomputers, but it makes up for that in sheer adaptability. It makes you wonder what kind of machines the Chinese National Supercomputing Center will be able to conjure up once it builds up a stock of domestically made GPUs.