Hands-On with the NVIDIA Jetson Xavier NX Developer Kit

Finally available in a bundle with baseboard, does NVIDIA's Volta-based edge AI acceleration machine deliver on its promises?


A little delayed, but worth the wait! The Jetson Xavier NX Developer Kit has landed. (📷: Gareth Halfacree)

Unveiled late last year, the Jetson Xavier NX is the latest entry in NVIDIA's deep-learning-accelerating Jetson family, described by the company as "the world's smallest supercomputer" and aimed squarely at edge AI implementations. The Developer Kit edition, which bundles the core system-on-module (SOM) board with an expansion baseboard, was originally due to launch in March this year, but a last-minute delay saw it slip to May; it launches today at $399. Does it deliver on its heady promise?

The Hardware

As previously announced, the Jetson Xavier NX is based on a system-on-chip that pairs six of NVIDIA's in-house Carmel ARMv8.2 general-purpose processing cores with a Volta-based graphics processing unit offering 384 CUDA cores and 48 Tensor cores. The latter are hardware units specifically tuned to accelerate deep learning workloads, added to NVIDIA's otherwise graphics-focused GPU designs.

The SOM is bundled with a baseboard offering several IO options, including a GPIO header. (📷: Gareth Halfacree)

For further acceleration, there's also a pair of NVIDIA's Deep Learning Accelerator (NVDLA) units. These are designed to run workloads alongside the GPU, pushing the overall claimed INT8 performance to 21 TOPS. There's 8GB of LPDDR4x memory but, in the Developer Kit edition reviewed here, no on-board storage.

The SOM itself is relatively compact, bar a sizeable heatsink and fan assembly screwed to the top, and most connectivity, apart from a microSD slot for storage, is on the baseboard. This includes four USB 3.1 ports and one USB 2.0 Micro-B port, a gigabit Ethernet port, one HDMI port and one DisplayPort, two MIPI CSI-2 camera connectors, and an M.2 Key E slot pre-populated with an AzureWave 2x2 802.11ac Wi-Fi and Bluetooth module.

The underside hides two M.2 slots, one pre-populated with a Wi-Fi/Bluetooth module and the other free for an SSD. (📷: Gareth Halfacree)

Flipping the board over reveals another storage option: an M.2 Key M slot for an optional NVMe SSD. Although not provided as part of the stock package, it's an upgrade well worth considering. In testing, the microSD card topped out at 89.5MB/s read and 19.1MB/s write, and a USB 3.0 SSD at 401MB/s read and 360MB/s write, while a Western Digital Black 250GB NVMe SSD in the M.2 slot sustained 3,086MB/s read and 1,599MB/s write.
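
The review doesn't say which tool produced these storage figures. For readers wanting a rough comparison of their own cards and drives, the following Python sketch times one sequential write pass and one read pass over a temporary file. The function name, file size, and block size are our own assumptions, and dedicated benchmarking tools will give more rigorous numbers.

```python
import os
import tempfile
import time


def sequential_throughput(directory, size_mb=64, block_kb=1024):
    """Return (write_mb_s, read_mb_s) for one sequential pass over a test file.

    A rough sketch only: the read pass will usually be served from the page
    cache unless caches are dropped first (as root:
    echo 3 > /proc/sys/vm/drop_caches), so it can overstate the device's speed.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Write pass: fsync so the timing includes flushing data to the device.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Read pass: read the file back using the same block size.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    total_mb = blocks * block_kb / 1024
    return total_mb / write_s, total_mb / read_s
```

Pointed at a hypothetical NVMe mount point, sequential_throughput("/mnt/nvme") returns one write and one read figure in MB/s; running it several times, dropping caches between runs, gives more trustworthy read numbers.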

CPU: 6-core up-to-1.9GHz NVIDIA Carmel ARMv8.2 with 6MB L2 and 4MB L3 cache
GPU: NVIDIA Volta with 384 CUDA Cores, 48 Tensor Cores
Accelerators: 2x NVIDIA Deep Learning Accelerators (NVDLA)
RAM: 8GB LPDDR4x
Storage: microSD (on module), M.2 Key M NVMe (on baseboard)
USB: 4x USB 3.1, 1x USB 2.0 Micro-B
Connectivity: Gigabit Ethernet, M.2 Key E 802.11ac 2x2 2.4/5GHz, Bluetooth 5.0
Display Outputs: HDMI, DisplayPort
Camera Inputs: 2x MIPI CSI-2
GPIO: 40-pin header (populated) with UART, SPI, I2C, I2S, PWM
Video Encode (H.264/H.265): 2x 4K30, 6x 1080p60, or 14x 1080p30
Video Decode (H.265): 2x 4K60, 4x 4K30, 12x 1080p60, or 32x 1080p30
Video Decode (H.264): 2x 4K30, 6x 1080p60, or 16x 1080p30

Cloud Native

The hardware's only part of the story, though. NVIDIA is pushing the Jetson Xavier NX as key to bringing what it calls "Cloud Native Computing" to the edge. Available across the Jetson range where specifications allow, NVIDIA's Cloud Native vision aims to turn machine learning developers on to containers, which separate applications from the operating system so that either can be updated independently and new applications can be rolled out quickly.

The multi-container "Cloud Native" demo is undeniably impressive. (📷: Gareth Halfacree)

To prove its point, NVIDIA provided a preconfigured demo of its Cloud Native platform in action. Launching it on the Jetson Xavier NX took a couple of minutes, a start-up delay the company says can be reduced with optimizations that weren't ready in time for the review. Once running, the demo loads four separate containers, each with its own workload: a pose-estimation network running on one video feed, a gaze-recognition network running on another, person detection running on a further four video feeds, and a live, local copy of the BERT natural language processing model.
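
For a sense of how a multi-container setup like this might be launched by hand, here's a minimal Python sketch that builds one `docker run` command per workload. The image names are hypothetical placeholders, since NVIDIA's demo containers aren't named here, and the sketch assumes JetPack's Docker setup with the NVIDIA container runtime, selected with `--runtime nvidia`. Camera devices and other options a real demo would need are omitted.

```python
import subprocess

# Hypothetical image names standing in for the four demo workloads;
# NVIDIA's actual demo containers are not named in the review.
WORKLOADS = {
    "pose": "example/pose-estimation:latest",
    "gaze": "example/gaze-recognition:latest",
    "people": "example/people-detection:latest",
    "bert": "example/bert-qa:latest",
}


def docker_run_args(name, image):
    """Build a `docker run` argument list for one detached, GPU-enabled container.

    `--runtime nvidia` selects the NVIDIA container runtime, which exposes
    the Jetson's GPU to the container (assumes JetPack's Docker setup).
    """
    return ["docker", "run", "-d", "--rm", "--name", name,
            "--runtime", "nvidia", image]


def launch_all(workloads, dry_run=True):
    """Start one container per workload; with dry_run, only return the commands."""
    commands = [docker_run_args(name, image) for name, image in workloads.items()]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling launch_all(WORKLOADS) returns the four commands without running anything; passing dry_run=False starts each container in the background, and any one image can later be updated or replaced without touching the host operating system or the other three.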

All four containerised applications run at full speed on the Jetson Xavier NX. (📷: Gareth Halfacree)

All four run simultaneously and, impressively, entirely smoothly. That's no mean feat for a device only marginally larger than a Raspberry Pi, and smaller still if you discount the baseboard.

With that sort of compute power under the hood, you'd expect the Jetson Xavier NX to be hungry, and you'd be right, relatively speaking. The device can be configured in two power envelopes: 10W and 15W. Measured at the wall, however, its peak draw during the demo in the 15W mode reached 24W. That figure covers the whole kit, including the baseboard, fan, attached peripherals, and power supply losses, rather than the module alone; it's far lower than a desktop computer and graphics card performing the same tasks, but high for an edge device.
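
On the board itself, power draw can be watched with `tegrastats`, the monitoring utility NVIDIA ships with its Linux for Tegra (L4T) software. As a hedged sketch, the following parses the per-rail power readings out of one line of its output. It assumes rails appear as `VDD_NAME current/average` in milliwatts, optionally suffixed with `mW`; rail names and formatting vary between software releases, so check against your own board's output.

```python
import re

# Assumed format: "VDD_<RAIL> <current>/<average>" in milliwatts, with an
# optional "mW" suffix on each number (this varies between L4T releases).
RAIL_PATTERN = re.compile(r"\b(VDD_\w+) (\d+)(?:mW)?/(\d+)(?:mW)?")


def parse_power_rails(line):
    """Return {rail: (current_mW, average_mW)} from one tegrastats output line."""
    return {name: (int(cur), int(avg))
            for name, cur, avg in RAIL_PATTERN.findall(line)}
```

Fed a line such as `VDD_IN 5120/4980 VDD_SOC 1430/1410` (illustrative numbers, not measurements), it returns `{"VDD_IN": (5120, 4980), "VDD_SOC": (1430, 1410)}`. On-board rail readings exclude power supply losses, so they should come in below a wall-socket measurement.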

This power is provided by a bundled international power adapter with a barrel-jack connector, and marks a departure from the Jetson Nano: while the Nano's barrel jack expects a 5V supply, the Jetson Xavier NX uses a 19V supply, so confuse the two at your peril.
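
A quick I = P / V calculation, using the 24W wall figure above, hints at why a higher-voltage supply suits a board this power-hungry: the same power at 19V needs roughly a quarter of the current it would at 5V, which is easier on connectors and cables. This is our illustration, not NVIDIA's stated rationale, and because the wall figure includes power supply losses the results are upper bounds.

```python
def supply_current(power_w, voltage_v):
    """Current (A) a DC supply must deliver for a given power: I = P / V."""
    return power_w / voltage_v


PEAK_WALL_W = 24.0  # peak figure measured at the wall in the review

# Upper bounds, since the wall figure also includes power supply losses.
at_19v = supply_current(PEAK_WALL_W, 19.0)  # roughly 1.26 A
at_5v = supply_current(PEAK_WALL_W, 5.0)    # 4.8 A
```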

The fan is whiny, but it does its job: The board remains relatively cool even under heavy load. (📷: Gareth Halfacree)

Thankfully, the cooling system isn't purely for show: despite drawing 24W at the wall, the Jetson Xavier NX stayed comfortably below 40°C (104°F), at the cost of an admittedly annoying whine from the fan.
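
Temperatures like these can also be checked from software. The following Python sketch reads the standard Linux thermal-zone interface under /sys/class/thermal, where each zone exposes a `type` name and a `temp` value in millidegrees Celsius. The zone names a Jetson reports aren't covered in the review, so the function simply returns whatever it finds; the `root` parameter is our addition so it can be pointed at a copy of the tree.

```python
import glob
import os


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_type: temp_celsius} from Linux thermal-zone sysfs entries.

    The kernel reports each zone's temperature in millidegrees Celsius.
    """
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                temps[name] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some zones can be unreadable or report no value
    return temps


def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32
```

Run on the board, read_thermal_zones() returns a dictionary mapping zone names to °C, and celsius_to_fahrenheit(40) gives the 104°F quoted above.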

Performance

The CPU side of the Jetson Xavier NX is, for its price, somewhat underwhelming. In synthetic single-threaded benchmarks it's only marginally faster than the Rockchip RK3399 found in much cheaper devices such as the Orange Pi 4B and Rock Pi N10.

You can improve matters considerably by switching power modes: picking the two-core 15W mode over the six-core 15W mode, which runs its two remaining cores at a higher clock speed, boosts performance in some single-threaded workloads by nearly 50 percent, at the cost of four of the six CPU cores.
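
Power modes on Jetson boards are switched with NVIDIA's `nvpmodel` tool: `sudo nvpmodel -m <id>` selects a mode and `nvpmodel -q` reports the current one. Mode IDs are defined in /etc/nvpmodel.conf and can differ between software releases, so rather than hard-coding an ID, the following sketch looks one up by name. It assumes mode headers of the form `< POWER_MODEL ID=0 NAME=MODE_15W_2CORE >`; treat both that format and the example mode names as assumptions to check against your own board's file.

```python
import re

# Assumed header format: "< POWER_MODEL ID=<n> NAME=<name> >".
MODE_PATTERN = re.compile(r"<\s*POWER_MODEL\s+ID=(\d+)\s+NAME=(\S+)\s*>")


def list_power_modes(conf_text):
    """Return {mode_id: mode_name} parsed from nvpmodel.conf contents."""
    return {int(i): name for i, name in MODE_PATTERN.findall(conf_text)}


def find_mode(modes, watts, cores):
    """Return the ID of the mode whose name has e.g. "15W" and "2CORE" tokens."""
    for mode_id, name in sorted(modes.items()):
        tokens = name.split("_")  # "MODE_15W_2CORE" -> ["MODE", "15W", "2CORE"]
        if f"{watts}W" in tokens and f"{cores}CORE" in tokens:
            return mode_id
    return None
```

Passing the contents of /etc/nvpmodel.conf to list_power_modes, then calling find_mode(modes, 15, 2), would return the ID to hand to `nvpmodel -m` for the two-core 15W mode, provided the file names it that way; None means no match.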

The Jetson Xavier NX can chew through object detection tasks with ease. (📷: Gareth Halfacree/Pexels)

For real-world workloads, the story's different: with memory bandwidth measured at 32,351MB/s read and 32,103MB/s write in 1MB blocks, more than four times that of the Orange or Rock Pis, memory-intensive workloads enjoy a major speed boost.
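
The review doesn't name its memory benchmark. As a crude, hypothetical stand-in, the following Python sketch times copies between two buffers in 1MB blocks and reports the best of several runs. Interpreter overhead, and the fact that a copy combines a read and a write, mean its result isn't directly comparable to separate read and write figures; a dedicated benchmarking tool should be used for real comparisons.

```python
import time


def copy_bandwidth_mb_s(total_mb=64, block_mb=1, repeats=3):
    """Crude memory-copy bandwidth in MB/s, copying in block_mb-sized chunks.

    Pure Python adds loop overhead, so treat the result as a lower bound.
    """
    block = block_mb * 1024 * 1024
    size = total_mb * 1024 * 1024
    src = memoryview(bytearray(size))
    dst = memoryview(bytearray(size))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for offset in range(0, size, block):
            # Memoryview slice assignment copies one block without a temporary.
            dst[offset:offset + block] = src[offset:offset + block]
        best = min(best, time.perf_counter() - start)
    return total_mb / best
```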

The CPU isn't the star of the show, though: the GPU and the NVDLAs are the reasons to buy a Jetson Xavier NX. NVIDIA claims that, combined, these offer 21 TOPS of compute performance at INT8 precision, below the 32 TOPS of the company's range-topping and twice-the-price Jetson AGX Xavier Developer Kit but streets ahead of the previous-generation Jetson TX2 it's designed to replace: by NVIDIA's reckoning, a tenfold performance boost in deep learning inference workloads for the same power envelope.
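
NVIDIA's headline figures can be put in proportion with a little arithmetic. The following sketch uses only the claimed peak numbers quoted here, treats "twice the price" as an exact 2:1 ratio, and assumes (our assumption, flagged in the code) that the 21 TOPS peak applies in the 15W mode. None of these are measured results.

```python
NX_TOPS = 21      # NVIDIA's claimed INT8 peak for the Jetson Xavier NX
AGX_TOPS = 32     # claimed peak for the Jetson AGX Xavier Developer Kit
PRICE_RATIO = 2.0  # "twice the price", treated as exactly 2:1 (approximation)
NX_POWER_W = 15   # assumption: the 21 TOPS peak applies in the 15W mode

share_of_agx = NX_TOPS / AGX_TOPS                            # about 0.66
tops_per_dollar_ratio = NX_TOPS / (AGX_TOPS / PRICE_RATIO)   # about 1.31
tops_per_watt = NX_TOPS / NX_POWER_W                         # about 1.4
```

By these rough numbers, the Jetson Xavier NX offers about two-thirds of the AGX Xavier's claimed throughput, roughly 1.3 times as many claimed TOPS per dollar, and around 1.4 TOPS per watt.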

The FaceNet network is no problem for the Volta GPU, and the false positives can't be blamed on NVIDIA's hardware. (📷: Gareth Halfacree/Pexels)

The actual performance you get will, naturally, depend on what you're doing. Running through some common workloads, loaded from the aftermarket 250GB NVMe SSD to keep storage performance from being a bottleneck, we had the MobileNetV1 object detection network running at an incredible 864 frames per second; switching to the YOLOv3-Tiny network dropped performance to 520 frames per second.

The same pose estimation network as used in the Cloud Native demo, but this time running as the sole task, sustained 237 frames per second; the demanding VGG-19 network was the slowest, but still managed a respectable 65 frames per second — more than enough for real-time use.
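
Throughput in frames per second converts directly into an average time per frame, which is easier to compare against a real-time budget. The following sketch does that conversion for the figures above, assuming a 30 frames-per-second target (our choice; the review doesn't define "real-time"). Average frame time isn't the same as end-to-end latency, particularly if frames are processed in batches.

```python
REAL_TIME_FPS = 30  # assumed real-time target; the review doesn't specify one

# Throughput figures reported in the review, in frames per second.
measured_fps = {
    "MobileNetV1": 864,
    "YOLOv3-Tiny": 520,
    "Pose estimation": 237,
    "VGG-19": 65,
}


def frame_time_ms(fps):
    """Average time per frame in milliseconds at a given throughput."""
    return 1000.0 / fps


budget_ms = frame_time_ms(REAL_TIME_FPS)  # about 33.3 ms per frame at 30 fps
for name, fps in measured_fps.items():
    ms = frame_time_ms(fps)
    print(f"{name}: {ms:.2f} ms/frame, {budget_ms / ms:.1f}x headroom at {REAL_TIME_FPS} fps")
```

Even VGG-19, at roughly 15.4ms per frame, fits about twice over into the 33.3ms budget of a 30 frames-per-second stream.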

Even at FP16 precision, the Inception network runs fast enough for real-time use. (📷: Gareth Halfacree)

To put that to the test, we ran the Inception and DetectNet object recognition networks at FP16 precision on a live feed from a Raspberry Pi Camera Module v2 connected to one of the board's two CSI-2 ports. Both stayed comfortably above 70 frames per second.
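
The review doesn't detail how frames were pulled from the camera. One common approach on Jetson boards, sketched here as an assumption rather than the reviewer's method, is an OpenCV capture fed by a GStreamer pipeline built around NVIDIA's `nvarguscamerasrc` element. The following function only assembles the pipeline string; the resolution and frame rate are illustrative defaults.

```python
def csi_camera_pipeline(sensor_id=0, width=1280, height=720, fps=30):
    """Build a GStreamer pipeline string for a CSI camera, ending in an appsink.

    nvarguscamerasrc captures through NVIDIA's Argus camera stack into NVMM
    (hardware) memory; nvvidconv copies frames into system memory so that
    videoconvert can produce the BGR frames OpenCV expects.
    """
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"format=NV12, framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )
```

With an OpenCV build that includes GStreamer support, the string can be passed as cv2.VideoCapture(csi_camera_pipeline(), cv2.CAP_GSTREAMER), with frames then handed to whichever network is being tested.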

Does It Deliver?

For newcomers to machine learning, the Jetson Xavier NX is overkill; pick up a Jetson Nano instead and enjoy the same core platform with easy progression to the bigger models in the future. For those who need the best performance available without moving to an x86 system with a discrete GPU, meanwhile, the Jetson AGX Xavier is still the model to beat.

The Developer Kit bundle includes the module, baseboard, Wi-Fi module, and a 19V PSU. (📷: Gareth Halfacree)

For anyone currently working with a Jetson TX2, the Jetson Xavier NX is a worthwhile upgrade. And for those interested in the promise of NVIDIA's Cloud Native infrastructure and containerized workloads, the $399 asking price is a sound investment rewarded with some truly impressive performance figures for its power draw — and with an easy path to commercialization through volume purchases of the module-only variant, minus baseboard.

For AI on the edge and an easy path to mass deployment, the Jetson Xavier NX is tough to beat. (📷: Gareth Halfacree)

More information on the Jetson Xavier NX can be found on the NVIDIA website, where orders are open now at $399 including the module, baseboard, Wi-Fi card, and power supply.


Gareth Halfacree

Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.