Jotunn 8

The Ultimate
AI Chip

Tyr4

8 cores – 1600 Tflops

Tyr2

4 cores – 800 Tflops

THE "MEMORY WALL"

Why Generative AI Is Ready – But Hardware Isn't

The Memory Wall, first theorized in 1994, describes how advances in processor speed outpace advances in memory speed, leaving processors idle while they wait for data. Traditional architectures mitigate this with hierarchical memory structures, but Generative AI models such as GPT-4, with reportedly close to 2 trillion parameters, push these structures past their limits.

Current hardware struggles to handle such massive data loads efficiently. Running GPT-4, for example, yields just 3% compute efficiency, with 97% of the time spent moving and preparing data rather than computing. This inefficiency forces enormous hardware investments: Inflection's supercomputer, for instance, requires 22,000 Nvidia H100 GPUs and draws roughly 11 MW of power.
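A back-of-the-envelope sketch makes the imbalance concrete. The numbers below are illustrative assumptions (a ~1.8-trillion-parameter model decoded at batch size 1, with roughly H100-class bandwidth and fp8 peak), not measured figures:

# Back-of-the-envelope: the memory wall during batch-1 LLM decoding.
# All figures are approximate and illustrative, not measurements.
PARAMS = 1.8e12          # ~GPT-4-class parameter count (rumored; assumption)
HBM_BW = 3.35e12         # bytes/s, roughly H100-class HBM3 bandwidth
PEAK_FP8 = 1.979e15      # flops/s, roughly H100-class dense fp8 peak

weight_bytes = PARAMS * 1.0             # fp8 weights: 1 byte per parameter
flops_per_token = 2 * PARAMS            # ~2 flops per parameter per token

t_memory = weight_bytes / HBM_BW        # time just to stream the weights once
t_compute = flops_per_token / PEAK_FP8  # time if compute were the only limit

print(f"stream weights: {t_memory * 1e3:7.1f} ms/token")
print(f"pure compute:   {t_compute * 1e3:7.1f} ms/token")
print(f"utilization:    {t_compute / t_memory:7.2%}")
# The processor spends ~99% of its time waiting on memory; batching and
# parallelism recover some of this, but the imbalance is the memory wall.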

Jotunn introduces a new architecture that eliminates these memory bottlenecks, keeping the processing units continuously fed with data. This raises compute efficiency beyond 50%, making it vastly superior to current solutions.
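In hardware terms, the gap between 3% and 50% utilization compounds directly into cluster size. A one-line comparison (purely illustrative arithmetic; the 100-Pflops target and 2-Pflops peak per chip are assumptions chosen for round numbers):

# Chips needed to deliver a fixed amount of *effective* compute.
TARGET = 100e15          # 100 Pflops delivered (arbitrary illustrative target)
PEAK_PER_CHIP = 2e15     # 2 Pflops peak per chip (illustrative assumption)

for utilization in (0.03, 0.50):
    chips = TARGET / (PEAK_PER_CHIP * utilization)
    print(f"at {utilization:.0%} utilization: {chips:5.0f} chips")
# 3% -> ~1,667 chips; 50% -> 100 chips: roughly 16x less hardware,
# and correspondingly less power, for the same delivered throughput.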

Our AD/ADAS (Autonomous Driving / Advanced Driver-Assistance Systems) Offering

Tyr4

3.2 Petaflops for any AD/ADAS application. All completely CUDA-free.

Tyr2

1.6 Petaflops for any AD/ADAS application. All completely CUDA-free.

Tyr1

800 Teraflops for any AD/ADAS application. All completely CUDA-free.

Explore Tyr

Unmatched AI Performance at the Edge.

Flexibility

Fully programmable

Algorithm agnostic

Host processor agnostic

RISC-V core to offload the host & run AI completely on-chip

Memory

Capacity: HBM 36 GB

Throughput: HBM 1 TB/s

Performance

Tensorcore (dense)

Tyr4
fp8: 1600 Tflops
fp16: 400 Tflops

Tyr2
fp8: 800 Tflops
fp16: 200 Tflops

General Purpose

Tyr4
fp8/int8: 50 Tflops/Tops
fp16/int16: 25 Tflops/Tops
fp32/int32: 12 Tflops/Tops

Tyr2
fp8/int8: 25 Tflops/Tops
fp16/int16: 12 Tflops/Tops
fp32/int32: 6 Tflops/Tops

Close to theoretical peak efficiency
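One way to read these numbers together: from the listed Tyr4 figures (1600 Tflops dense fp8, 1 TB/s HBM) one can derive the roofline break-even point. This is a derived sketch, not a published specification:

# Roofline break-even for Tyr4, derived from the specs listed above.
PEAK_FP8 = 1600e12   # flops/s, dense tensorcore fp8 (from the spec list)
HBM_BW = 1e12        # bytes/s, HBM throughput (from the spec list)

# A workload must perform at least this many flops per byte fetched
# from HBM to be compute-bound rather than memory-bound:
breakeven = PEAK_FP8 / HBM_BW
print(f"break-even arithmetic intensity: {breakeven:.0f} flops/byte")  # 1600
# Workloads below this intensity are bandwidth-limited, which is why
# keeping data on-chip and feeding the cores continuously matters.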

Flexibility

Fully programmable

Algorithm agnostic

Host processor agnostic

RISC-V cores to offload the host & run AI completely on-chip.

Memory

Capacity: HBM 288 GB

Throughput: HBM 8 TB/s

Performance

Tensorcore (dense)

fp8: 3200 Tflops
fp16: 800 Tflops

General Purpose

fp8/int8: 100 Tflops/Tops
fp16/int16: 50 Tflops/Tops
fp32/int32: 25 Tflops/Tops

Close to theoretical peak efficiency
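Reading these Jotunn 8 specs together (a derived sketch using only the figures listed above; the 1.8-trillion-parameter model is an illustrative assumption, and KV cache and activations are ignored):

# What the listed Jotunn 8 memory specs imply for large models.
HBM_BYTES = 288e9       # 288 GB on-package HBM (from the spec list)
HBM_BW = 8e12           # bytes/s (from the spec list)
PEAK_FP8 = 3200e12      # flops/s, dense tensorcore fp8 (from the spec list)

# fp8 weights take 1 byte per parameter, so capacity alone allows:
print(f"up to ~{HBM_BYTES / 1e9:.0f}B fp8 parameters per chip")

# Chips needed just to hold the weights of a ~1.8T-parameter model
# (illustrative; KV cache and activations are ignored here):
model_params = 1.8e12
chips = -(-model_params // HBM_BYTES)   # ceiling division
print(f"~{chips:.0f} chips to hold a 1.8T-parameter model in fp8")

# Roofline break-even, as for Tyr above:
print(f"break-even intensity: {PEAK_FP8 / HBM_BW:.0f} flops/byte")  # 400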

Explore Jotunn 8

Introducing the World’s Most Efficient AI Inference Chip.