The Memory Wall, first described in 1994, refers to the widening gap between processor speed and memory speed: as CPUs get faster, they spend a growing share of their time stalled while they wait for data. Traditional architectures mitigate this with hierarchical memory structures such as caches, but generative AI models like GPT-4, reportedly close to 2 trillion parameters, push these hierarchies past their limits.
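One common way to quantify the memory wall is the roofline model. It says a chip's attainable throughput is the lesser of two limits. The first is its compute peak. The second is its memory bandwidth multiplied by the workload's arithmetic intensity, measured in FLOPs performed per byte moved from memory. The sketch below is a minimal illustration of that generic model. The accelerator it uses is hypothetical, with a 1,000 TFLOPS peak and 3 TB/s of bandwidth, and it does not describe any particular chip.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      intensity: float) -> float:
    """Roofline bound on throughput, in TFLOPS.

    `intensity` is arithmetic intensity in FLOPs per byte moved from memory.
    Bandwidth in TB/s times FLOPs/byte gives TFLOPS directly,
    since 1e12 bytes/s * 1 FLOP/byte = 1e12 FLOP/s.
    """
    memory_bound = bandwidth_tb_s * intensity
    return min(peak_tflops, memory_bound)


def ridge_point(peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Intensity (FLOPs/byte) at which a workload stops being memory-bound."""
    return peak_tflops / bandwidth_tb_s


def utilization(peak_tflops: float, bandwidth_tb_s: float, intensity: float) -> float:
    """Fraction of the compute peak that the memory system allows the chip to reach."""
    return attainable_tflops(peak_tflops, bandwidth_tb_s, intensity) / peak_tflops


if __name__ == "__main__":
    PEAK_TFLOPS = 1000.0   # hypothetical accelerator compute peak
    BANDWIDTH_TB_S = 3.0   # hypothetical memory bandwidth
    print(f"ridge point: {ridge_point(PEAK_TFLOPS, BANDWIDTH_TB_S):.0f} FLOPs/byte")
    for intensity in (1.0, 10.0, 100.0, 1000.0):
        util = utilization(PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
        print(f"{intensity:>6.0f} FLOPs/byte -> {util:.1%} of peak")
```

Consider a batch-1 matrix-vector product with 16-bit weights. It performs about 2 FLOPs (a multiply and an add) for each 2-byte weight it loads, which works out to roughly 1 FLOP per byte. That places token-by-token generation in large language models far to the left of the ridge point, in memory-bound territory.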
Current hardware struggles to handle such massive data loads efficiently. Running GPT-4, for example, achieves just 3% efficiency: 97% of computing time is spent on data preparation rather than computation. This inefficiency drives enormous hardware investments. Inflection's supercomputer, for instance, is built from 22,000 Nvidia H100 GPUs and draws about 11 MW of power.
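Power, measured in MW, is a rate. Energy, measured in MWh, is that power sustained over time, so an 11 MW draw consumes 11 MWh every hour. The back-of-envelope sketch below uses the cluster figures above and treats the 11 figure as a power draw in megawatts. The page does not say whether that total covers only the GPUs or the whole facility, so the per-GPU value is an average over whatever the total includes.

```python
GPU_COUNT = 22_000          # from the text
CLUSTER_POWER_MW = 11.0     # from the text, read as a power draw in MW


def watts_per_gpu(power_mw: float, gpus: int) -> float:
    """Average power per GPU in watts.

    If the total also covers servers, networking, or cooling, that overhead
    is spread evenly across the GPUs in this average.
    """
    return power_mw * 1_000_000 / gpus


def energy_mwh(power_mw: float, hours: float) -> float:
    """Energy in MWh from a constant draw in MW sustained for `hours` hours."""
    return power_mw * hours


if __name__ == "__main__":
    print(f"{watts_per_gpu(CLUSTER_POWER_MW, GPU_COUNT):.0f} W per GPU on average")
    print(f"{energy_mwh(CLUSTER_POWER_MW, 1):.0f} MWh per hour of operation")
    print(f"{energy_mwh(CLUSTER_POWER_MW, 24 * 365):,.0f} MWh per year at full load")
```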
Jotunn introduces a new architecture designed to remove this bottleneck by feeding data continuously to its processing units. This raises efficiency above 50%, more than an order of magnitude beyond the 3% cited for current hardware.
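The sketch below takes the 3% and 50% efficiency figures at face value and shows what the difference means for hardware count. For a fixed sustained-throughput target, the number of chips scales inversely with efficiency. The chip peak (1,000 TFLOPS) and the target (10,000 TFLOPS) are hypothetical round numbers, not specs of any product. Both cases use the same peak rating, which is a simplifying assumption.

```python
import math


def sustained_tflops(peak_tflops: float, efficiency: float) -> float:
    """Throughput actually delivered: peak scaled by the fraction of time spent computing."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return peak_tflops * efficiency


def chips_needed(target_tflops: float, peak_tflops: float, efficiency: float) -> int:
    """Smallest number of chips whose combined sustained throughput meets the target."""
    return math.ceil(target_tflops / sustained_tflops(peak_tflops, efficiency))


if __name__ == "__main__":
    TARGET_TFLOPS = 10_000.0     # hypothetical sustained-throughput goal
    CHIP_PEAK_TFLOPS = 1_000.0   # hypothetical chip rating, same in both cases
    for label, eff in (("3% efficiency", 0.03), ("50% efficiency", 0.50)):
        n = chips_needed(TARGET_TFLOPS, CHIP_PEAK_TFLOPS, eff)
        print(f"{label}: {n} chips")
```

Under these assumptions, the chip count falls by a factor of 0.50 / 0.03, about 17. That is the practical meaning of moving from 3% to 50% efficiency.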
3.2 Petaflops for any AD/ADAS application. All completely CUDA-free.
1.6 Petaflops for any AD/ADAS application. All completely CUDA-free.
800 Teraflops for any AD/ADAS application. All completely CUDA-free.
HQ: 13 rue Jeanne Braconnier, Immeuble Le Pasteur, 92360 Meudon-La-Forêt, France
Asia: Taipei, Taiwan
Japan: Tokyo, Japan
Korea: Seoul, Korea
USA: San Diego, CA, USA
Unmatched Edge AI Performance.
Fully programmable
Algorithm agnostic
Host processor agnostic
RISC-V core to offload the host & run AI completely on-chip
Tyr 4
fp8: 1600 Tflops
fp16: 400 Tflops
Tyr 2
fp8: 800 Tflops
fp16: 200 Tflops
Tyr 4
fp8/int8: 50 Tflops
fp16/int16: 25 Tflops
fp32/int32: 12 Tflops
Tyr 2
fp8/int8: 25 Tflops
fp16/int16: 12 Tflops
fp32/int32: 6 Tflops
Close to theoretical efficiency
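The cards give peak figures only. The minimal sketch below turns a peak into a wall-clock estimate using three assumptions:

- **Efficiency:** a hypothetical 90% stands in for the near-theoretical efficiency claimed, since the page gives no number.
- **Workload:** the size is also hypothetical.
- **Figure set:** each Tyr part lists two sets of figures, and the page does not say what distinguishes them. The sketch uses only the first set (fp8 and fp16).

```python
# First set of peak figures listed for each Tyr part, in Tflops (from the page).
TYR_PEAK_TFLOPS = {
    "Tyr 4": {"fp8": 1600.0, "fp16": 400.0},
    "Tyr 2": {"fp8": 800.0, "fp16": 200.0},
}

ASSUMED_EFFICIENCY = 0.9  # hypothetical stand-in; the page gives no number


def seconds_for_workload(total_flops: float, peak_tflops: float,
                         efficiency: float = ASSUMED_EFFICIENCY) -> float:
    """Time to finish `total_flops` of work when running at `efficiency` of peak."""
    sustained_flops_per_s = peak_tflops * 1e12 * efficiency
    return total_flops / sustained_flops_per_s


if __name__ == "__main__":
    WORKLOAD_FLOPS = 1e13  # hypothetical workload size
    for part, peaks in TYR_PEAK_TFLOPS.items():
        for fmt, peak in peaks.items():
            ms = seconds_for_workload(WORKLOAD_FLOPS, peak) * 1e3
            print(f"{part} {fmt}: {ms:.2f} ms")
```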
Fully programmable
Algorithm agnostic
Host processor agnostic
RISC-V cores to offload the host & run AI completely on-chip
fp8: 3200 Tflops
fp16: 800 Tflops
fp8/int8: 100 Tflops
fp16/int16: 50 Tflops
fp32/int32: 25 Tflops
Close to theoretical efficiency
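The three headline figures near the top of the page are 3.2 Petaflops, 1.6 Petaflops, and 800 Teraflops. Converted to Petaflops (1 Petaflops = 1,000 Tflops), the fp8 peaks in the spec cards line up with these figures. That pairing is our reading of the numbers, not something the page states. The 3,200-Tflops card carries no product name here, so the sketch labels it generically.

```python
import math

# fp8 peak figures as listed in the spec cards on this page, in Tflops.
FP8_PEAK_TFLOPS = {
    "Tyr 4": 1600.0,
    "Tyr 2": 800.0,
    "unnamed card (fp8: 3200 Tflops)": 3200.0,  # no product name given on the page
}

# Headline figures: "3.2 Petaflops", "1.6 Petaflops", "800 Teraflops".
HEADLINE_PETAFLOPS = (3.2, 1.6, 0.8)


def tflops_to_petaflops(tflops: float) -> float:
    """Convert Tflops to Petaflops (1 Petaflops = 1,000 Tflops)."""
    return tflops / 1000.0


if __name__ == "__main__":
    for card, tflops in FP8_PEAK_TFLOPS.items():
        pf = tflops_to_petaflops(tflops)
        match = any(math.isclose(pf, h) for h in HEADLINE_PETAFLOPS)
        print(f"{card}: {pf:g} Petaflops fp8, matches a headline figure: {match}")
```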