California-based Cerebras Systems has unveiled the Wafer Scale Engine (WSE-3), its latest artificial intelligence (AI) chip with a whopping four trillion transistors. It delivers twice the performance of its predecessor, the Cerebras WSE-2, which previously held the record for the fastest chip. Systems made using the WSE-3 will be able to fine-tune models with 70 billion parameters in just one day, a press release said.
AI models like GPT have taken the world by storm with their immense capabilities. However, tech companies know that AI models are still in their infancy and need further development to disrupt the market.
To get there, AI models need to be trained on larger datasets, which will require even bigger infrastructure. Chipmaker Nvidia has risen to great heights thanks to the demand for newer, bigger, and more powerful chips. Its commercially available offering, the H200, is used to train AI models and has 80 billion transistors. Still, with the WSE-3's four trillion transistors, Cerebras aims to top that count 50-fold.
Specifications of the CS-3
The WSE-3 is built on a 5 nm process and packs 900,000 cores optimized for AI data processing into the CS-3, the company's AI supercomputer. The chip carries 44 GB of on-chip SRAM, and the CS-3 can store up to 24 trillion parameters in a single logical memory space without partitioning or refactoring them. This is intended to "dramatically simplify" the training workflow and improve developer productivity, the press release said.
The external memory on the CS-3 can be scaled from 1.5 terabytes up to 1.2 petabytes, depending on the requirements of the AI model being trained. That headroom is meant to support models ten times larger than GPT-4 or Gemini. The company claims that training a one-trillion-parameter model on the CS-3 is as straightforward as training a one-billion-parameter model on GPU chips.
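As a quick sanity check on those figures, the sketch below derives the implied storage budget per parameter. The inputs are the article's own numbers; the 16-bytes-per-parameter comparison point is a common rule of thumb for mixed-precision Adam training, not a figure from the press release.

```python
# Back-of-envelope check on the memory figures quoted above. Inputs come
# from the article; the ~16 bytes/param baseline is a typical estimate for
# mixed-precision Adam training (fp16 weights and gradients plus fp32
# optimizer moments), assumed here for comparison only.

PB = 1e15

external_memory_max = 1.2 * PB   # largest quoted external memory
max_params = 24e12               # "24 trillion parameters" from the article

implied_bytes_per_param = external_memory_max / max_params
print(f"Implied storage per parameter: {implied_bytes_per_param:.0f} bytes")
# -> 50 bytes, comfortably above the ~16 bytes/param that plain
#    mixed-precision Adam training state would require.
```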
Where required, the CS-3 can be configured for enterprise or hyperscale needs. In a four-system configuration, it can fine-tune a 70-billion-parameter model in a single day, while a 2,048-system configuration could train the 70-billion-parameter Llama model from scratch in the same time.
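A rough plausibility check on that one-day claim is sketched below. The assumptions are ours, not Cerebras's: the standard ~6ND FLOPs estimate for training an N-parameter model on D tokens, a two-trillion-token run (the budget Meta reported for Llama 2 70B), and a per-system peak of 125 petaFLOPS, the figure implied by CG-3's 64 systems delivering eight exaFLOPS (see below).

```python
# Plausibility sketch: can 2,048 CS-3 systems train a 70B-parameter model
# in one day? Assumptions (ours): ~6*N*D training FLOPs, D = 2 trillion
# tokens, and 125 PFLOPS peak per system (8 exaFLOPS / 64 systems).

params = 70e9                               # N: model parameters
tokens = 2e12                               # D: assumed training tokens
flops_needed = 6 * params * tokens          # ~8.4e23 FLOPs

per_system_peak = 8e18 / 64                 # 125 PFLOPS per CS-3
cluster_peak = 2048 * per_system_peak       # ~2.56e20 FLOP/s aggregate
one_day = 86_400                            # seconds

required_utilization = flops_needed / (cluster_peak * one_day)
print(f"Required utilization: {required_utilization:.1%}")   # ~3.8%
```

Under these assumptions the cluster would only need a few percent of peak utilization to finish in a day, so the claim is arithmetically plausible.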
Where will the WSE-3 be used?
At a time when GPU power consumption doubles with every new generation, Cerebras says its latest chip delivers twice the performance of its predecessor without any increase in size or power consumption.
The AI-specific chip also requires 97 percent less code to train large language models (LLMs) than GPUs do. For instance, a standard implementation of a GPT-3-sized model was achieved with just 565 lines of code, the press release added.
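Taken together, those two numbers imply a GPU-side baseline of roughly 18,800 lines, as the one-line calculation below shows; the press release does not name the GPU implementation being compared against.

```python
# Arithmetic only: if 565 lines is a 97% reduction, the implied GPU
# baseline is 565 / (1 - 0.97) lines of code.
gpu_baseline_loc = 565 / (1 - 0.97)
print(f"Implied GPU baseline: ~{gpu_baseline_loc:,.0f} lines")  # ~18,833
```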
Cerebras plans to deploy the WSE-3 at facilities of its long-time collaborators, the Argonne National Laboratory and the Mayo Clinic, to further the research capabilities at these institutions.
Along with G42, its partner in deploying the Condor Galaxy 1 (CG-1) and Condor Galaxy 2 (CG-2) AI supercomputers in California, Cerebras has announced that it is now building Condor Galaxy 3 (CG-3), one of the largest AI supercomputers in the world. When ready, the CG-3 will consist of 64 CS-3 units and deliver eight exaFLOPS of AI computing prowess.
“Our strategic partnership with Cerebras has been instrumental in propelling innovation at G42, and will contribute to the acceleration of the AI revolution on a global scale,” added Kiril Evtimov, Group CTO of G42, in the press release.