Specifications of the BR100
This GPU is built on a 7nm process node and features 77 billion transistors (just 3 billion shy of NVIDIA’s H100). It is assembled using TSMC’s 2.5D CoWoS packaging. As for the memory, this monstrosity is fed by 64GB of HBM2e with a bandwidth of around 2.3TB/s. The chip size comes out to around 1074mm².
An Architectural Overview
As stated above, the GPU features an MCM design consisting of 2 chiplets, each powered by 16 SPCs (Streaming Processing Clusters). Every SPC consists of 16 EUs (Execution Units), and every 4 EUs form a Compute Unit (CU).
Chiplets: 2
SPCs: 2×16 = 32
EUs: 32×16 = 512
CUs: 512/4 = 128
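The unit counts above follow from simple multiplication, which a few lines of Python can double-check. This is only a sketch of the arithmetic: the per-level figures come from the article, while the constant and function names are made up for illustration and do not come from any Biren tooling.

```python
# Per-level figures as quoted in the article; names are illustrative only.
CHIPLETS = 2
SPCS_PER_CHIPLET = 16
EUS_PER_SPC = 16
EUS_PER_CU = 4


def br100_unit_counts():
    """Return the total number of each unit type across the whole GPU."""
    spcs = CHIPLETS * SPCS_PER_CHIPLET   # 2 x 16
    eus = spcs * EUS_PER_SPC             # 32 x 16
    cus = eus // EUS_PER_CU              # 4 EUs form one CU
    return {"chiplets": CHIPLETS, "spcs": spcs, "eus": eus, "cus": cus}


print(br100_unit_counts())
# {'chiplets': 2, 'spcs': 32, 'eus': 512, 'cus': 128}
```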
Inside each SPC, we find 16 EUs. Looking closer, each EU consists of 16 streaming processing cores, which together make up one V-core, plus one T-core (Tensor core). The V-core handles FP32, FP16, INT32, and INT16 computations.
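Scaling the per-EU figures up to the whole GPU gives a rough idea of the total core count. The sketch below assumes every one of the 512 EUs has the same layout (16 streaming processing cores and one T-core), which the article implies but does not state outright. The names are illustrative.

```python
# Assumption: all EUs are identical, as the article's description suggests.
TOTAL_EUS = 2 * 16 * 16          # chiplets x SPCs per chiplet x EUs per SPC
STREAMING_CORES_PER_EU = 16      # together these form one V-core
T_CORES_PER_EU = 1


def br100_core_totals(total_eus=TOTAL_EUS):
    """Return GPU-wide totals derived from the per-EU figures."""
    return {
        "streaming_cores": total_eus * STREAMING_CORES_PER_EU,
        "v_cores": total_eus,                  # one V-core per EU
        "t_cores": total_eus * T_CORES_PER_EU,
    }


print(br100_core_totals())
# {'streaming_cores': 8192, 'v_cores': 512, 't_cores': 512}
```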
BR100 vs A100
Compared to the last-gen Ampere-based A100, the BR100 is around 2.6x faster in select benchmarks. This goes to show how quickly China is accelerating in the GPU department. However, sorry to be a killjoy, but the Hopper-based H100 is around 2-3x faster in the same benchmarks. The H100’s Tensor cores can stretch that lead to around 30x in various tests.
General Use
The GPU is meant for China’s AI sector and is said to mimic human behavior with its enhanced AI performance. The aim is for China to be able to rely on its own technology.
Featured Image Credit: ferdibtk at Freepik