By Jeff Nelson (Google) and Will Hill (NVIDIA)
TL;DR: Speed up your pandas and scikit-learn machine learning workflows by up to an order of magnitude on Google Cloud’s Colab Enterprise using NVIDIA GPUs, with zero code rewrites: just a single extension-loading command.
To get hands-on, check out the learning path Accelerated Machine Learning with GPUs.
You wrote a Python script and tested it on a sample CSV. It worked. But when you ran it on the full 10GB dataset, it stalled: the progress bar crawled, or the kernel crashed with the dreaded “out of memory” error.
Analyzing data and training models on large datasets can be time-consuming on CPUs. You can speed up existing workflows by 50x or more without learning new APIs or rewriting code.
Google Cloud’s Colab Enterprise and NVIDIA CUDA-X™ open-source libraries make this possible. This post shows how to use GPU acceleration for pandas and scikit-learn workflows, often with zero code changes.
Tech Stack: Colab Enterprise + NVIDIA RAPIDS
To go fast, you need powerful infrastructure and efficient software.
Colab Enterprise is Google Cloud’s managed notebook environment. It combines Colab with enterprise-grade security and compliance. It integrates with BigQuery and Vertex AI. Most importantly, it gives you access to NVIDIA GPUs (like the L4 and A100) through Runtime Templates. These templates let you define a consistent environment for a team.
NVIDIA CUDA-X Data Science is a collection of open-source libraries that accelerate popular data science libraries and platforms on NVIDIA GPUs.
- NVIDIA cuDF accelerates popular data frame libraries like pandas, Polars, and Apache Spark.
- NVIDIA cuML accelerates scikit-learn, UMAP, and HDBSCAN.
1. Instant data processing speedups with cudf.pandas
Data preparation is a common bottleneck in ML pipelines. Loading, filtering, and joining millions of rows on a CPU does not parallelize efficiently.
With cudf.pandas, you can run existing and new pandas code on the GPU. It works by intercepting pandas calls: if an operation supports GPU acceleration, it runs on the GPU; if not, it gracefully falls back to the CPU, with the data frame automatically and efficiently shared between host and GPU memory.
Load the extension at the top of your notebook, before you import pandas. No other changes are required.
%load_ext cudf.pandas
import pandas as pd
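To make the interception-and-fallback behavior concrete, here is a toy sketch of the dispatch pattern in plain Python. This is an invented illustration, not cuDF’s actual implementation; every class and method name below is made up for clarity:

```python
class AcceleratedFrame:
    """Toy proxy illustrating cudf.pandas-style dispatch with CPU fallback.

    Invented sketch: the real cudf.pandas intercepts pandas calls
    transparently; none of these names come from the actual library.
    """

    GPU_SUPPORTED = {"groupby_mean", "merge"}  # operations with a GPU kernel

    def __init__(self, data):
        self.data = data
        self.log = []  # records where each operation ran

    def run(self, op_name, cpu_impl):
        if op_name in self.GPU_SUPPORTED:
            self.log.append((op_name, "gpu"))
            return cpu_impl(self.data)  # a real library would launch a CUDA kernel here
        # Graceful fallback: unsupported operations run on the CPU instead of failing
        self.log.append((op_name, "cpu"))
        return cpu_impl(self.data)


def groupby_mean(rows):
    """Group (key, value) pairs by key and average the values."""
    totals = {}
    for key, value in rows:
        count, total = totals.get(key, (0, 0))
        totals[key] = (count + 1, total + value)
    return {key: total / count for key, (count, total) in totals.items()}


frame = AcceleratedFrame([("a", 1), ("a", 3), ("b", 5)])
result = frame.run("groupby_mean", groupby_mean)   # dispatched to the "GPU" path
frame.run("custom_apply", lambda rows: len(rows))  # falls back to the "CPU" path
```

The same pattern explains why a non-vectorized `.apply` with arbitrary Python code may fall back to the CPU while standard aggregations stay on the GPU.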
Before vs. After
Standard CPU pandas:
import pandas as pd
# Takes minutes on large data
df = pd.read_parquet("large_dataset.parquet")
df = df.groupby("category").agg({"amount": "mean"})
GPU-accelerated pandas:
%load_ext cudf.pandas
import pandas as pd
# Takes seconds
df = pd.read_parquet("large_dataset.parquet")
df = df.groupby("category").agg({"amount": "mean"})
Benchmarks show this can deliver speedups of 150x or more for standard data operations compared to CPU execution.
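Speedups depend on your data and operations, so it is worth measuring on your own workload. A minimal stdlib timing helper (the function name is illustrative, not from any library) can be run on the same cell with and without the extension loaded:

```python
import time


def time_op(fn, *args, repeats=3):
    """Run fn several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Example: time an aggregation over synthetic data; substitute your own
# pandas pipeline to compare CPU-only and cudf.pandas-accelerated runs.
data = list(range(100_000))
baseline = time_op(lambda xs: sum(xs) / len(xs), data)
print(f"best of 3: {baseline:.6f}s")
```

Taking the best of several repeats reduces noise from warm-up effects such as JIT compilation and GPU memory allocation on the first call.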
2. Faster training with cuml.accel and scikit-learn
Once data is prepared, you need to train a model. Algorithms like Random Forest, Linear Regression, and t-SNE are staples of data science.
NVIDIA’s cuML library accelerates these algorithms by parallelizing training and inference execution on NVIDIA GPUs. Similar to cudf.pandas, use cuml.accel to accelerate scikit-learn functions on the NVIDIA GPU. Load the cuml extension prior to importing scikit-learn APIs:
%load_ext cuml.accel
from sklearn.ensemble import RandomForestRegressor
# This runs on the GPU automatically
model = RandomForestRegressor(n_estimators=100)
model.fit(X_train, y_train)
This acceleration lets you run experiments faster. You check results in seconds or minutes instead of hours. This rapid iteration loop allows you to run more experiments and improve your final model.
3. GPU-accelerated XGBoost
XGBoost is another high-performance machine learning library. XGBoost has native support for NVIDIA GPUs. Setting the device parameter to cuda enables GPU acceleration.
# Train on GPU
import xgboost as xgb

model = xgb.XGBRegressor(
    tree_method='hist',
    device='cuda',
    n_estimators=100
)
model.fit(X_train, y_train)
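If your notebook sometimes runs on a CPU-only runtime, you can build the estimator’s keyword arguments conditionally so the same code works in both environments. The helper function below is a hypothetical convenience, not part of XGBoost’s API; only `tree_method`, `device`, and `n_estimators` are real XGBoost parameters:

```python
def xgb_params(use_gpu: bool, n_estimators: int = 100) -> dict:
    """Build XGBoost keyword arguments for GPU or CPU training.

    Hypothetical helper: the function itself is not from the XGBoost
    library; tree_method/device/n_estimators are its real parameters.
    """
    return {
        "tree_method": "hist",                 # histogram-based tree building
        "device": "cuda" if use_gpu else "cpu",  # select the execution device
        "n_estimators": n_estimators,
    }


params = xgb_params(use_gpu=True)
# model = xgb.XGBRegressor(**params)  # then fit and predict as usual
```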
Find your bottlenecks
NVIDIA cuDF and cuML also provide profilers to help debug and isolate performance bottlenecks.
Use %%cudf.pandas.profile or %%cuml.accel.profile in a notebook cell to get a report on which operations ran on the GPU, which fell back to the CPU, and the execution time of each function.
%%cudf.pandas.profile
# Your existing data processing code here...
df.groupby("id").apply(complex_function)
The output shows if a specific line (like a complex, non-vectorized .apply function) forces a CPU fallback. This helps you identify where to optimize your code.
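The idea behind such a report can be illustrated with a toy profiler that runs each operation, records where it ran, and times it. This is an invented sketch of the concept; the real %%cudf.pandas.profile magic produces its own report format:

```python
import time


def profile(operations):
    """Toy per-operation profile: run each (name, backend, fn) and time it.

    Invented illustration of a GPU/CPU placement report; not the output
    of the actual cudf.pandas or cuml.accel profilers.
    """
    report = []
    for name, backend, fn in operations:
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        report.append((name, backend, elapsed))
    return report


# Hypothetical placements mirroring the fallback behavior described above
ops = [
    ("read_parquet", "gpu", lambda: sum(range(1000))),
    ("groupby.agg", "gpu", lambda: sorted(range(1000))),
    ("apply(complex_function)", "cpu", lambda: [x * x for x in range(1000)]),
]
for name, backend, elapsed in profile(ops):
    print(f"{name:<26} {backend:<4} {elapsed:.6f}s")
```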
Try it yourself
Speed is about staying in the flow. When code runs instantly, you can ask more questions of data and build better products.
We built a hands-on notebook that takes you through the process using the NYC Taxi dataset. You will set up a Colab Enterprise runtime, accelerate pandas data preparation, and train models with scikit-learn and XGBoost on Google Cloud with NVIDIA GPUs.
Get started today with the learning path, Accelerated Machine Learning with GPUs!