Playing with the new NVIDIA DGX Spark

Authors: Param Bole; Rohan Bierneni (JAX Models & Performance, Google Cloud)


In the last few days, we’ve had the opportunity to try out the new NVIDIA DGX™ Spark device. New hardware is always fun, but what’s special about this little device is that it includes the NVIDIA GB10 Grace Blackwell Superchip. Imagine having a full “AI supercomputer” at your desk! Our friends at NVIDIA have somehow managed to pack in 128 GB of unified memory, a full Linux-based desktop with all the NVIDIA AI software you’d ever want, NVIDIA ConnectX networking so you can pair two of them as a cluster, and up to a petaFLOP of compute at FP4 precision. Think of all the prototyping and fine-tuning you’ve wanted to do but couldn’t justify the cluster hours for. Well, the fun starts now! You can iterate on this powerhouse and, with the right software, scale out to thousands of accelerators.

Unboxing

So what does it really look like, you ask? Does it have an HDMI connection? What kind of USB? We asked a friend to put together a little unboxing video to address these and that all-important question: “what’s in the box?”

Figure 1: Unboxing video for NVIDIA DGX Spark

First boot

The NVIDIA DGX Spark uses DGX OS, so the first boot is a familiar experience if you’ve used a Linux distribution like Ubuntu. Like most OS installs, there may be updates, so you may want to find something to do. Perhaps sudo make yourself a sandwich?

After that, you have a proper Linux desktop, complete with a browser and all the usual goodies (yes you can run nvidia-smi and nvcc).

Figure 2: NVIDIA DGX Spark desktop with some of the usual toys

But does it serve?

The documentation suggested we try serving a simple model with Ollama first, so we did. We’re never comfortable just running shell scripts straight from the web, so please review the installation script before you run it. For example, you might want to comment out the systemd service setup if you plan to use this device for other things.
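If you want to follow that advice in practice, a minimal pattern is to download the installer to a file first rather than piping it straight into a shell. (The URL below is Ollama’s documented install script at the time of writing; check their site for the current one.)

```shell
# Fetch the installer to a file instead of piping curl | sh,
# so you can read it before anything runs.
curl -fsSL https://ollama.com/install.sh -o install.sh

# Review it -- this is where you'd spot (and optionally comment out)
# things like the systemd service setup.
less install.sh

# Run it only once you're happy with what it does.
sh install.sh
```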

Ollama makes it easy to try out a model, so we started with Gemma 3. It took a while to download the 18 GB checkpoint, but ollama run <model> is all it takes. (We used the quantization-aware version here, but even the bf16 version felt quite snappy.)

ollama run gemma3:27b-it-qat

>>> /show info
  Model
    architecture        gemma3    
    parameters          27.4B     
    context length      131072    
    embedding length    5376      
    quantization        Q4_0      

  Capabilities
    completion    
    vision        

  Parameters
    temperature    1                  
    top_k          64                 
    top_p          0.95               
    stop           "<end_of_turn>"

>>> What is a quantization-aware model checkpoint?
A quantization-aware model checkpoint is a saved version of a neural network model that includes information *specifically* for enabling post-training quantization or 
quantization-aware training.  It's more than just the weights and biases; it contains extra data about the ranges of activations and weights observed during training, which is 
crucial for achieving better accuracy after quantization.

...
/bye
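Beyond the interactive REPL, Ollama also listens on a local HTTP API (port 11434 by default), so the same model can be queried from scripts. A minimal sketch, assuming the Ollama server is running and the model above has been pulled:

```shell
# Query the local Ollama server directly over HTTP.
# "stream": false returns one complete JSON response instead of
# a stream of partial tokens.
curl -s http://localhost:11434/api/generate -d '{
  "model": "gemma3:27b-it-qat",
  "prompt": "In one sentence, what is a quantization-aware checkpoint?",
  "stream": false
}'
```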

What about JAX?

If you haven’t heard of JAX before, it’s the open-source Python library that we use for AI and other high-performance array computing (read: it makes distributed computing easy). For this post, the most important aspect is probably that you can write code for a single accelerator and have it scale to thousands without having to worry about things like memory orchestration. So if you’re working on a little fine-tuning, you can develop your code on this one device and, when you’re ready, run the same code on a big cluster.
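To make that concrete, here’s a minimal sketch of what we mean: a jit-compiled loss function that runs unchanged whether JAX finds a CPU, the Spark’s GPU, or a sharded cluster. (The function and shapes here are just illustrative, not from this post’s experiments.)

```python
import jax
import jax.numpy as jnp

# A tiny training-style step: the same jit-compiled function runs
# unchanged on CPU, a single GPU like the DGX Spark, or a sharded
# cluster -- JAX picks the backend, you keep the code.
@jax.jit
def mse_step(params, x, y):
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

key = jax.random.key(0)
x = jax.random.normal(key, (128, 16))
params = jnp.zeros((16,))   # zero weights, so pred == 0 and loss == 1.0
y = jnp.ones((128,))

print(mse_step(params, x, y))   # runs on whatever backend JAX finds
print(jax.devices())            # lists the devices JAX discovered
```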

NVIDIA and Google have been working closely for years, so you can use JAX with NVIDIA GPUs, including the new NVIDIA DGX Spark. For this quick post, we started with a standard Ubuntu Docker container to try out a little hello-world JAX example.

docker run --rm -it --gpus all ubuntu:latest
root@397b2b17811e:/# apt update && apt install -y python3 python3-pip
...
# we're working as root in our container so we need to use system packages
pip install --break-system-packages "jax[cuda13]"

# let's try it out
JAX_PLATFORMS=cuda ENABLE_PJRT_COMPATIBILITY=true \
  python3 -c 'import jax; jax.print_environment_info()'
jax:    0.7.2
jaxlib: 0.7.2
numpy:  2.3.3
python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
device info: NVIDIA GB10-1, 1 local devices
process_count: 1
platform: uname_result(system='Linux', node='79e0b700b0f8', release='6.11.0-1016-nvidia', version='#16-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 21 16:52:46 UTC 2025', machine='aarch64')
JAX_PLATFORMS=cuda

...

exit

Notice that “device info” line? That’s how we know that JAX sees our accelerator correctly. A simple install and you’re ready to go!
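As one last sanity check after the install, a small computation confirms the work actually lands on the accelerator (the sizes here are arbitrary):

```python
import jax.numpy as jnp

# Do a matmul and confirm where the result lives. block_until_ready()
# forces execution, since JAX dispatches work asynchronously.
x = jnp.ones((1024, 1024))
y = (x @ x).block_until_ready()

print(y[0, 0])      # 1024.0 -- each entry is a sum of 1024 ones
print(y.devices())  # should report the CUDA device on the Spark
```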

Next steps

We have at least a dozen more things we want to try with open-source projects like NVIDIA NeMo, vLLM, MaxText, MaxDiffusion … but we wanted to get this post out to show you how much fun this new device is!
