
Analysis of model parallelism applications on a 64-core RV64 Server CPU

Authors: Giulio Malenza, Adriano Marques Garcia, Robert Birke, Luca Benini, and Marco Aldinucci.

Abstract: Massive data-parallel workloads, driven by inference on large ML models, are pushing hardware vendors to develop efficient and cost-effective multi-core server CPUs. The RISC-V architecture plays a prominent role here thanks to its open, extensible, and energy-friendly ISA. Despite significant progress in recent years, finding efficient methods to run parallel applications on new architectures so as to fully harness their performance remains a challenge. In this study, we benchmark the inference of machine learning models on the SOPHON SG2042 SoC, the first server-grade CPU based on the RV64 ISA, comprising 64 cores arranged in a grid of 16 groups of 4 cores. Specifically, we aim to improve cache hit ratios, and thereby performance, through model parallelism: the model is split into submodels that are assigned to specific (groups of) cores and executed as a pipeline. We orchestrate execution with FastFlow, a low-level programming framework designed for multithreaded streaming applications. By comparing the results against standard multi-core inference and analyzing the effects of different submodel-to-core mapping strategies, we aim to provide a comprehensive understanding of how the model-parallel approach can maximize efficiency and utilization of hardware resources. In our experiments, model parallelism improved performance by up to 8.4x over native PyTorch parallelism.
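The core idea in the abstract — splitting a model into submodels that stream inputs through a pipeline of stages, each stage bound to its own (group of) cores — can be illustrated with a minimal, hypothetical Python sketch. The paper itself orchestrates this with FastFlow in C++ on the SG2042; here, three toy functions stand in for submodels, threads stand in for core groups, and queues carry items between stages:

```python
import queue
import threading

# Hypothetical illustration, NOT the paper's FastFlow code: a "model" is
# split into three toy "submodels", each run by its own pipeline stage.
# In the paper each stage would be pinned to a 4-core group of the SG2042;
# here the stages are plain Python threads passing items through queues.
SENTINEL = object()

def stage(fn, q_in, q_out):
    """Apply one submodel to every item streaming through the pipeline."""
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_out.put(SENTINEL)  # propagate end-of-stream to the next stage
            return
        q_out.put(fn(item))

# Three toy submodels standing in for consecutive slices of a network.
submodels = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

# One queue before each stage plus one collecting the final outputs.
queues = [queue.Queue() for _ in range(len(submodels) + 1)]
threads = [
    threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
    for i, fn in enumerate(submodels)
]
for t in threads:
    t.start()

# Stream a batch of inputs through the pipeline, then close the stream.
for x in range(4):
    queues[0].put(x)
queues[0].put(SENTINEL)

results = []
while (out := queues[-1].get()) is not SENTINEL:
    results.append(out)
for t in threads:
    t.join()

print(results)  # each input x ends up as (x + 1) * 2 - 3
```

Because each stage has a single worker and the queues are FIFO, output order matches input order; throughput comes from the stages working on different items concurrently, which is the same streaming pattern the paper exploits to keep each submodel's weights resident in its core group's cache.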

It can also be downloaded from the publisher's site: https://link.springer.com/article/10.1007/s10766-025-00802-6