Jotunn 8

The Ultimate AI Chip

Where efficiency meets innovation

This is Jotunn 8

Introducing the World’s Most Efficient AI Inference Chip

In modern data centers, success means deploying trained models with blistering speed, minimal cost, and effortless scalability. Designing and operating inference systems requires balancing key factors such as high throughput, low latency, optimized power consumption, and sustainable infrastructure. Achieving optimal performance while maintaining cost and energy efficiency is critical to meeting the growing demand for large-scale, real-time AI services across a variety of applications.

Unlock the full potential of your AI investments with our high-performance inference solutions. Engineered for speed, efficiency, and scalability, our platform ensures your AI models deliver maximum impact—at lower operational costs and with a commitment to sustainability. Whether you’re scaling up deployments or optimizing existing infrastructure, we provide the technology and expertise to help you stay competitive and drive business growth.

This is not just faster inference. It’s a new foundation for AI at scale.

Ultra-low Latency

Critical for real-time applications like chatbots, fraud detection, and search.

Very High Throughput

Essential for high-demand services like recommendation engines or LLM APIs.

Cost Efficient

AI inference is often run at massive scale—reducing cost per inference is essential for business viability.

Power Efficient

Performance per watt. Power is a major operational expense and carbon footprint driver.

This is Jotunn 8

AI – Demystified and Delivered

In the world of AI data centers, speed, efficiency, and scale aren't optional; they're everything. Jotunn8, our ultra-high-performance inference chip, is built to deploy trained models with lightning-fast throughput, minimal cost, and maximum scalability. Designed around what matters most: performance, cost-efficiency, and sustainability. It delivers the power to run AI at scale, without compromise.

Jotunn 8 Outperforms the Market

[Benchmark chart: Llama3 405B inference performance]

Why it matters: Critical for real-time applications like chatbots, fraud detection, and search.

Different Models, Different Purposes – Same Hardware

Reasoning models, generative AI, and agentic AI are increasingly combined to build more capable and reliable systems. Generative AI provides flexibility and language fluency; reasoning models provide rigor and correctness; agentic frameworks provide autonomy and decision-making. The VSORA architecture enables smooth and easy integration of these algorithms on the same hardware, providing near-theory performance (a minimal sketch of how the three roles compose follows the table below).

Type | Key Role | Strengths | Weaknesses
Reasoning Models | Logical inference and problem-solving | Accuracy, consistency | Limited generalization, slow
LLMs / Generative AI | Natural language generation and understanding | Versatile, broad, creative | Can hallucinate, lacks deep reasoning
Agentic AI | Goal-directed, autonomous action | Independence, planning, coordination | Still experimental, hard to align and control
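To make that composition concrete, here is a minimal, hardware-agnostic Python sketch of a generate-verify-act loop. Every function in it is an illustrative stub, not a VSORA API: the generative model drafts, the reasoning model checks, and the agent acts only on a verified draft.

```python
# Minimal sketch of a generate -> verify -> act loop.
# All components are illustrative stubs, not a VSORA API.

def generate(prompt: str) -> str:
    """Generative model: drafts a candidate answer (stubbed)."""
    return f"draft answer for: {prompt}"

def verify(answer: str) -> bool:
    """Reasoning model: checks the draft for consistency (stubbed)."""
    return "draft" in answer  # stand-in for a real logical check

def act(answer: str) -> None:
    """Agentic layer: commits to an action once the draft passes."""
    print(f"executing plan based on: {answer}")

def agent_step(prompt: str, max_retries: int = 3) -> None:
    # Generative AI proposes, the reasoning model disposes,
    # and the agent acts only on a verified answer.
    for _ in range(max_retries):
        draft = generate(prompt)
        if verify(draft):
            act(draft)
            return
    raise RuntimeError("no verified answer within retry budget")

agent_step("route this support ticket")
```

The design point is that the agent never acts on an unverified draft; a few retries are traded for reliability, which is exactly the division of labor the table above describes.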
Cost Efficient

More Speed for the Buck

Why it matters: AI inference is often run at massive scale – reducing cost per inference is essential for business viability.
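As a back-of-the-envelope illustration of that viability math (every figure below is an assumed placeholder, not a Jotunn 8 datasheet or pricing value), cost per inference is roughly hourly operating cost divided by hourly throughput:

```python
# Illustrative cost-per-inference estimate; every number here is a
# placeholder assumption, not a Jotunn 8 datasheet or pricing value.
power_kw = 0.5                 # assumed board power draw
electricity_usd_per_kwh = 0.10 # assumed energy price
capex_usd_per_hour = 1.00      # assumed amortized hardware cost
tokens_per_second = 10_000     # assumed sustained throughput

usd_per_hour = power_kw * electricity_usd_per_kwh + capex_usd_per_hour
tokens_per_hour = tokens_per_second * 3600
usd_per_million_tokens = usd_per_hour / tokens_per_hour * 1e6
print(f"${usd_per_million_tokens:.3f} per million tokens")
```

The structure of the estimate, not the placeholder numbers, is the takeaway: at fixed power and capex, cost per inference falls in direct proportion to sustained throughput.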

Flexibility

Fully programmable

Algorithm agnostic

Host processor agnostic

RISC-V cores to offload the host and run AI completely on-chip.
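A runnable mock of that host/chip split is sketched below; the classes are illustrative stand-ins, not a VSORA API. The point it illustrates is that the host submits a request once and the on-chip cores drive the entire decode loop, with no per-token host round-trips.

```python
# Mock of the host/accelerator split described above. These classes are
# illustrative stand-ins, not a VSORA API.

class OnChipScheduler:
    """Stands in for firmware running on the chip's RISC-V cores."""
    def run_to_completion(self, prompt: str, max_tokens: int) -> str:
        tokens = []
        for i in range(max_tokens):   # the whole decode loop stays
            tokens.append(f"t{i}")    # on-chip: model step, sampling,
        return " ".join(tokens)       # KV-cache updates (all mocked)

class Host:
    """The host's only jobs: submit work and collect results."""
    def __init__(self) -> None:
        self.chip = OnChipScheduler()
    def infer(self, prompt: str) -> str:
        return self.chip.run_to_completion(prompt, max_tokens=4)

print(Host().infer("hello"))
```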

Memory

Capacity

HBM: 288 GB

Throughput

HBM: 8 TB/s

Performance

Tensor core (dense)

FP16: 800 Tflops
FP8: 3200 Tflops

General Purpose

FP32: 25 Tflops
FP16: 50 Tflops
FP8: 100 Tflops
Close-to-theory efficiency
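Those headline figures can be sanity-checked with first-order roofline arithmetic. The 70B-parameter model below is an assumed example (FP8 weights, single stream), not a published benchmark; only the bandwidth and Tflops numbers come from the specs above.

```python
# First-order, memory-bound decode estimate from the spec figures above.
# The 70B model size is an illustrative assumption; batch 1, FP8 weights.
hbm_bandwidth = 8e12   # 8 TB/s HBM throughput (spec above)
fp8_dense = 3200e12    # 3200 Tflops FP8 dense tensor (spec above)
weight_bytes = 70e9    # 70B parameters * 1 byte each (FP8)

# Each generated token must stream all weights from HBM at least once,
# so this is a lower bound on latency / upper bound on tokens per second.
time_per_token = weight_bytes / hbm_bandwidth
print(f"{time_per_token * 1e3:.2f} ms/token -> "
      f"{1 / time_per_token:.0f} tokens/s (single stream, upper bound)")

# Ridge point: below ~400 FLOPs/byte of arithmetic intensity the chip
# is bandwidth-bound, above it compute-bound.
print(f"ridge point: {fp8_dense / hbm_bandwidth:.0f} FLOPs/byte")
```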

Explore Jotunn 8
