Samsung is experimenting with integrating PCUs (programmable computing units) into its HBM (high-bandwidth memory) modules, and has shared some performance indicators.
What if compute capability were built directly into HBM memory? Samsung is working on exactly that, and will present an overview at the International Solid-State Circuits Conference. The approach integrates, in each memory bank, a PCU capable of executing operations in half-precision (FP16).
According to Samsung, adding these compute units requires no hardware or software modification on the host side. Their presence does, however, reduce the available die area: the PIM-equipped dies top out at 4 Gb, compared to 8 Gb for the latest-generation standard HBM dies.
Samsung split the difference. By combining four 4 Gb dies with four 8 Gb dies, it obtains 6 GB stacks.
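The capacity arithmetic behind that mixed stack can be checked with a quick calculation (a sketch using the die counts and per-die capacities cited in the article):

```python
# Per-stack capacity when mixing PIM dies (4 Gb each) with standard dies (8 Gb each),
# figures as reported in the article.
pim_dies, pim_gbit = 4, 4      # dies carrying PCUs, reduced to 4 Gb each
std_dies, std_gbit = 4, 8      # standard HBM2 dies, 8 Gb each

total_gbit = pim_dies * pim_gbit + std_dies * std_gbit  # 16 + 32 = 48 Gb
total_gbyte = total_gbit / 8                            # 48 Gb = 6 GB per stack
print(total_gbit, total_gbyte)  # 48 6.0
```

Four PIM dies and four standard dies together yield 48 Gb, i.e. the 6 GB per stack Samsung reports.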
Experiments are underway, notably with Argonne National Laboratory (United States), and are expected to wrap up by the end of the first half of 2021. In the meantime, Samsung has released some performance indicators, based on PCUs built on a 20 nm process.
At 2.4 Gbps per pin, the theoretical bandwidth exceeds 300 GB/s over HBM2's 1024-bit bus. In practice, with the Deep Speech 2 speech-recognition model on the LibriSpeech dataset, latency is reduced by nearly a factor of three compared to Samsung's HBM2 Aquabolt. Samsung also reports a marked improvement in the performance-per-watt ratio.
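The bandwidth figure follows directly from the per-pin rate and the bus width (a sketch using the numbers cited above; only the two inputs come from the article):

```python
# Theoretical HBM2 bandwidth: per-pin data rate multiplied by bus width.
pin_rate_gbps = 2.4        # Gbps per pin, as cited
bus_width_bits = 1024      # HBM2 interface width in bits

bandwidth_gbit_s = pin_rate_gbps * bus_width_bits   # 2457.6 Gb/s aggregate
bandwidth_gbyte_s = bandwidth_gbit_s / 8            # convert bits to bytes
print(bandwidth_gbyte_s)  # 307.2
```

307.2 GB/s is indeed just above the 300 GB/s threshold the article mentions.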
Samsung presents its technology under the name HBM-PIM (processing-in-memory); its researchers refer to it as FIMDRAM (function-in-memory DRAM).
Main illustration © Samsung