How Huawei’s UCM Software Boosts AI Memory Efficiency

In a move that could significantly alter China’s semiconductor landscape, Huawei has unveiled software designed to reduce AI systems’ reliance on high-bandwidth memory (HBM) chips.
At the 2025 Financial AI Reasoning Application Landing and Development Forum in Shanghai, the company introduced the Unified Cache Manager (UCM). The software reportedly boosts AI inference efficiency by intelligently managing memory resources, thereby reducing dependence on high-end foreign memory chips.
UCM operates by distributing data across different memory types, including HBM, standard dynamic random access memory (DRAM) and solid-state drives (SSDs), according to how quickly each piece of data must be accessed.
The hierarchical approach moves data that does not need the fastest access out of HBM and into the slower tiers, freeing scarce HBM capacity and improving overall system efficiency. According to Zhou Yuefeng, vice-president and head of Huawei’s data storage product line, UCM has reduced AI inference latency by up to 90% while increasing system throughput by as much as 22 times.
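The tiering idea can be illustrated with a minimal Python sketch. It is not Huawei’s implementation: the `Tier` class, the `place_blocks` function, the block names and all latency and capacity figures are hypothetical values chosen for illustration. The sketch assigns each block of data to the slowest tier that still meets its latency budget, so the fastest memory is kept for the data that needs it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    latency_us: float   # illustrative access latency, not a measured figure
    capacity_gb: float  # illustrative capacity, not a real system size


# Hypothetical three-tier hierarchy matching the memory types named in the article.
TIERS = [
    Tier("HBM", 0.05, 64.0),
    Tier("DRAM", 0.1, 512.0),
    Tier("SSD", 100.0, 4096.0),
]


def place_blocks(blocks, tiers=TIERS):
    """Map each (name, size_gb, latency_budget_us) block to a tier name.

    Each block goes to the slowest tier that meets its latency budget and
    still has room. Blocks with the strictest budgets are placed first so
    that looser blocks cannot take their space in the fast tiers.
    """
    free = {t.name: t.capacity_gb for t in tiers}
    slowest_first = sorted(tiers, key=lambda t: t.latency_us, reverse=True)
    placement = {}
    for name, size_gb, budget_us in sorted(blocks, key=lambda b: b[2]):
        for tier in slowest_first:
            if tier.latency_us <= budget_us and free[tier.name] >= size_gb:
                free[tier.name] -= size_gb
                placement[name] = tier.name
                break
        else:
            # No tier is both fast enough and large enough for this block.
            raise ValueError(f"no tier can hold {name!r} within {budget_us} us")
    return placement


if __name__ == "__main__":
    example = [
        ("active_context", 40.0, 0.05),       # must sit in the fastest tier
        ("recent_session", 200.0, 1.0),       # tolerates DRAM latency
        ("archived_history", 1000.0, 1000.0), # tolerates SSD latency
    ]
    for block, tier in place_blocks(example).items():
        print(f"{block} -> {tier}")
```

In this toy policy only the block with the tightest budget occupies HBM. A production system would also have to handle data moving between tiers as access patterns change, which the sketch leaves out.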
Implications for China’s chip industry amid US export restrictions
China’s domestic chip production has faced severe challenges as a result of US sanctions that restrict access to advanced HBM chips essential for AI workloads. The sanctions prohibit exports of cutting-edge HBM versions such as HBM2e, HBM3, HBM3e and HBM4 to Chinese entities.
The restrictions also apply to third-party manufacturers whose products rely on US technologies, which effectively curtails China’s procurement options from global suppliers such as SK Hynix, Samsung Electronics and Micron Technology.
Practical deployment and industry collaboration
UCM is more than a theoretical advance; it has undergone real-world testing in critical business scenarios such as customer voice analysis, marketing planning and office assistance applications at China UnionPay, a leading Chinese financial services provider.
These trials attest to the solution’s viability in accelerating AI inference processes and lowering operational costs in large-scale deployments.
Importantly, Huawei plans to open-source UCM in September 2025. The strategic move will likely encourage adoption by a wider array of developers and industry partners, fostering a collaborative ecosystem that extends beyond Huawei’s direct influence.
The software will first be released on Huawei’s ModelEngine community before expanding to mainstream AI inference engines and storage vendors, facilitating broad-based industry innovation and collaboration.
Long-term outlook and industry significance
While UCM offers a significant efficiency boost and a partial workaround for current hardware limitations, industry experts say it is not a full substitute for access to advanced HBM hardware. It does, however, show how Chinese technology firms are using software ingenuity to work around hardware shortages driven by geopolitical tensions.
The HBM market itself continues to grow rapidly, driven by the escalating demands of AI applications. Global revenue from HBM chips is expected to nearly double in 2025, with forecasts suggesting a rise to US$98 billion by 2030.
Huawei’s software innovation could position it and the broader Chinese industry to better tap into this expanding market, despite ongoing supply restrictions.
Huawei’s Unified Cache Manager represents a strategic advance that could ease China’s reliance on imported high-end HBM chips amid tightening US sanctions.
By optimising latency and throughput across various memory types, UCM allows for significant improvements in AI inference performance using less advanced hardware.
The upcoming open-source release is poised to reinforce China’s push towards self-sufficiency in AI hardware and software ecosystems, while enabling collaborative innovation across the telecommunications and technology sectors.

