Quadric introduced Chimera™, the first family of general-purpose neural processors (GPNPUs), a semiconductor intellectual property (IP) offering that blends the machine learning (ML) performance characteristics of a neural processing accelerator with the full C++ programmability of a modern digital signal processor (DSP).
Chimera GPNPUs provide one unified architecture for ML inference plus pre-and-post processing, greatly simplifying both system-on-chip (SoC) hardware design by the semiconductor developer today and subsequent software programming months and years later by application developers.
“Machine learning is infiltrating nearly all applications everywhere DSPs are traditionally used today for vision, audio, sound, communications, sensors, and so much more,” stated Veerbhan Kheterpal, co-founder and CEO of Quadric.
“Existing silicon solutions to the ML inference challenge have added accelerators as helper offload cores to existing DSPs or CPUs. The limitation of that approach is the clumsy way the programmer has to partition her code across the different cores in the system and then tune the interaction between those cores to get desired performance goals. The new Chimera GPNPU family creates a unified, single-core architecture for both ML inference and related conventional C++ processing of images, video, radar or other signals, eliminating multicore challenges.”
A significant advantage of this new architecture is that neural network graphs and C++ code are merged into a single software code stream. Only one tool chain is required for scalar, vector, and matrix computations. Memory bandwidth is optimized by a single unified compilation stack that helps result in significant power minimization.
“Silicon companies fear committing to multi-million-dollar silicon tapeouts reliant upon ML accelerator cores only to learn months later that data scientists working on fast-changing machine learning models have introduced new operators that cannot run on the existing fixed-function accelerator,” said Bryon Moyer, Sr. analyst at TechInsights. “A programmable architecture, if efficient enough, allows introduction of new operators through software rather than through a hardware change, effectively future-proofing the silicon.”
Scalable Performance: 512 MAC to 8K MAC
The QB series of the Chimera family of GPNPUs includes three cores:
- Chimera QB1 – 1 trillion operations per second (TOPS) machine learning, 64 giga operations per second (GOPs) DSP capability
- Chimera QB4 – 4 TOPS machine learning, 256 GOPs DSP
- Chimera QB16 – 16 TOPS machine learning, 1 TOPS DSP
Chimera cores can be targeted to any silicon foundry and any process technology. The entire family of QB Series GPNPUs can achieve 1 GHz operation in mainstream 16nm or 7nm processes using conventional standard cell flows and commonly available single-ported SRAM. For applications requiring even greater levels of performance two or more Chimera cores can be paired together.
High Convolution Efficiency + Custom Operator Support
The Chimera GPNPU architecture excels at convolution layers, the heart of convolutional neural networks (CNNs). Thus Quadric Chimera cores deliver ML inference performance similar to the efficiency of dedicated CNN offload engines but with full programmability. Unlike conventional accelerators that can only run a handful of predetermined ML operators, Chimera GPNPUs can run any ML operator. Custom operators can be added by the SW developer simply by writing a C++ kernel utilizing the Chimera Compute Library (CCL) application programming interface (API) then compiling that kernel using the Chimera Software Developers Toolkit (SDK).
Operators added with the Chimera SDK flow can be highly performant, utilizing the full performance of the Chimera GPNPU. Competing architecture solutions rely upon conventional CPUs or DSPs as the “fallback” programable solution for newly emergent ML operators, but those CPUs or DSP are 10x to 1000x lower performing than the accelerators they are paired with.
“Automobile market analysts have coined the term Range Anxiety to describe consumer wariness about purchasing a battery-powered automobile and getting stranded too far from scarce charging stations, or being stuck without a high-voltage, fast charging option. In the semiconductor world the term Operator Anxiety has come into vogue to describe the very real angst silicon companies fear in responding to evolving ML workloads,” noted Steve Roddy, Quadric’s chief marketing officer.
“Just as the EV car owner wants to use an 800V fast charger and avoid the slow overnight charging speed of a standard wall socket, a fully programmable Chimera GPNPU solves the Operator Anxiety problem with high-speed custom operator support, not slow CPU support, for new ML operators.”
Single Core – Not Multi-Core – Saves Power and Area
The benefits to the software developer of having a single architecture to program and fine-tune are obvious. Far more software developers are comfortable programming a single core than they are dealing with heterogenous multicore systems.
But a significant secondary benefit of combining C++ signal processing together with ML graph processing on a single Chimera GPNPU is the area and power savings of not requiring activation data (images, signals) to be shuffled back and forth between two or three (CPU, DSP, accelerator) processing engines. For legacy systems with three cores – and three associated memory subsystems used to buffer data transfers between cores – the area and related power savings realized by switching to a Chimera GPNPU solution can be substantial.
Proven in Silicon, Ready for Evaluation
The Chimera architecture has already been proven at-speed in silicon. Quadric is ready for immediate customer engagement by chip design teams looking to start an IP evaluation this fall or winter.