By Ryan Ye Min Thein and Joshua Broyde
Ryan Ye Min Thein is a Cloud Sales Engineer at Google Cloud specializing in the architecture of AI/ML solutions within the Life Sciences and emerging technology sectors. He is an innovation-driven specialist in agentic systems and automated intelligence, dedicated to bridging the gap between advanced computer science and high-impact industry applications.
Joshua Broyde is a Customer Engineer at Google Cloud specializing in AI/ML for Healthcare and Life Sciences. He works with HCLS and MedTech companies to architect, build, and bring enterprise AI and Generative AI systems to production.
The Problem: Infinite Space, Finite Resources
The biotechnology industry is navigating a fundamental phase shift. Every domain in modern biotech faces the same reality: the design space is effectively infinite ($10^{60}$ molecules), but resources are finite. This forces a painful “Fidelity vs. Scale” trade-off:
- The Scale Trap: Fast tools (like docking or simple sequence alignment) handle millions of candidates but are often too noisy to trust.
- The Fidelity Trap: Accurate physics-based simulations are the “gold standard,” but too slow and too expensive to run at scale.
This trade-off applies not just to small-molecule discovery, but also to discovering novel enzymes and optimizing antibody therapeutics.
What is one to do?
For decades, the industry response has been the “Funnel” model, a linear process of screening and discarding candidates. But funnels are inefficient: they are expensive and static, and discarding negative results rather than learning from them means the early stages of the cycle never improve. To break this cycle, pharma companies are moving to an active learning paradigm. However, this transition is difficult. In this blog, we will show the key components of an active learning loop and how to deploy them using Google Cloud Platform (GCP) resources.
The Solution: Active Learning
The core concept of active learning relies on a teacher-student architecture. Specifically:
The Student (The Proxy): A fast, lightweight ML model (e.g., XGBoost or a graph neural network) that acts as a digital scout. It can score millions of candidates in minutes but starts with low accuracy.
The Teacher (Ground Truth): The high-fidelity validator (wet lab or physics simulation) that tests only the most critical designs, feeding results back to train the Student.
The Strategist (Pareto Navigator): An optimization engine that selects a small, high-value subset of candidates for the Teacher to validate. Crucially, it doesn’t just pick the “best” candidates; it balances Exploitation (likely hits) with Exploration (high-uncertainty candidates) to map the boundaries of the design space.
By feeding both the Teacher’s successes and failures back to retrain the Student, the system gains fidelity, essentially “hallucinating less” and “learning more” with every cycle.
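To make the loop concrete, here is a deliberately toy sketch in Python. The “Teacher” is a stand-in oracle function, the “Student” is a nearest-neighbor proxy, and the Strategist is a simple exploit-plus-explore score; all names, the candidate pool, and the 0.5 exploration weight are illustrative, not the production system:

```python
import random

def teacher(x):
    # Stand-in for the expensive ground truth (wet lab / physics simulation)
    return -(x - 0.7) ** 2

def fit_student(labeled):
    # Toy 1-nearest-neighbor proxy: cheap to "train", cheap to score
    def predict(x):
        return min(labeled, key=lambda pair: abs(pair[0] - x))[1]
    def uncertainty(x):
        # Distance to the nearest labeled point stands in for model uncertainty
        return min(abs(lx - x) for lx, _ in labeled)
    return predict, uncertainty

random.seed(0)
pool = [random.random() for _ in range(10_000)]   # candidate design space
seed = random.sample(pool, 5)
pool = [x for x in pool if x not in set(seed)]
labeled = [(x, teacher(x)) for x in seed]         # initial Teacher labels

for cycle in range(5):
    predict, uncertainty = fit_student(labeled)
    # Strategist: balance exploitation (predicted score) and exploration (uncertainty)
    pool.sort(key=lambda x: predict(x) + 0.5 * uncertainty(x), reverse=True)
    batch, pool = pool[:20], pool[20:]            # small budget for the Teacher
    labeled += [(x, teacher(x)) for x in batch]   # feed results back to the Student

best_x, best_y = max(labeled, key=lambda pair: pair[1])
print(f"best candidate after {len(labeled)} Teacher calls: x={best_x:.3f}")
```

The key property is that the Teacher is called only 105 times against a pool of 10,000, yet the Strategist steers those calls toward both promising and poorly-mapped regions.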
The diagram below summarizes the process:
The “BioOps” Infrastructure
Implementing this loop requires more than just algorithms; it requires an infrastructure capable of “BioOps”, which operationalizes scientific software in a manner akin to DevOps. This means solving three engineering hurdles: Scale, Orchestration, and State. Here is how the Google Cloud architecture addresses them:
1. Scale: Breaking the Fidelity Barrier (Google Cloud Batch)
The primary bottleneck in Active Learning is latency. If the “Teacher” takes weeks to return results, the loop breaks. To solve this, we use Google Cloud Batch to enable “cloud bursting”. Instead of queuing jobs on a fixed on-prem cluster, the Batch environment sits idle (costing nothing) and then dynamically bursts to thousands of CPU cores and GPUs solely for the duration of the validation job. This massive parallelism compresses weeks of validation work into hours, turning ‘validation’ into ‘screening’.
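As a sketch of what such a burst looks like, the snippet below builds a minimal Cloud Batch job specification as a plain Python dict (the container image URI, machine type, GPU choice, and task counts are illustrative assumptions, not our production configuration). A spec like this can be serialized and submitted with the `google-cloud-batch` client or `gcloud batch jobs submit --config job.json`:

```python
import json

def fep_burst_job(n_ligands: int, parallelism: int) -> dict:
    """Build a minimal Cloud Batch job spec: one task per ligand, GPU-backed."""
    return {
        "taskGroups": [{
            "taskCount": n_ligands,         # one task per ligand
            "parallelism": parallelism,     # how wide the burst goes
            "taskSpec": {
                "runnables": [{
                    "container": {
                        # Hypothetical container wrapping the simulation workflow
                        "imageUri": "us-docker.pkg.dev/my-project/sim/fep-runner:latest",
                        # Each task selects its ligand via the built-in index env var
                        "commands": ["--ligand-index", "$BATCH_TASK_INDEX"],
                    }
                }],
                "maxRetryCount": 2,
            },
        }],
        "allocationPolicy": {
            "instances": [{
                "installGpuDrivers": True,
                "policy": {
                    "machineType": "g2-standard-8",
                    "accelerators": [{"type": "nvidia-l4", "count": 1}],
                },
            }]
        },
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }

job = fep_burst_job(n_ligands=1000, parallelism=200)
print(json.dumps(job, indent=2)[:200])
```

The point of the pattern: the fleet exists only while `taskCount` tasks are running, after which cost returns to zero.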
2. Orchestration: Vertex AI Pipelines and Vizier
We use Vertex AI Pipelines to orchestrate the complex dependency graph between the Generative AI, the Simulation Engine, and the Training Loop. The ‘brain’ of the operation is Vertex AI Vizier. Unlike a simple grid search, Vizier uses Bayesian optimization to navigate the Pareto Frontier, simultaneously solving for conflicting objectives (such as maximizing enzymatic activity while maintaining stability) without manual weighting.
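Vizier’s internals are Bayesian optimization, but the Pareto idea itself is easy to illustrate: a candidate stays on the frontier only if no other candidate beats it on every objective at once. A minimal sketch with toy scores (both objectives maximized; the numbers are invented):

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    candidates: list of (activity, stability) tuples; higher is better for both.
    """
    def dominated(p):
        # p is dominated if some other point is at least as good on both axes
        return any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in candidates)
    return [p for p in candidates if not dominated(p)]

scores = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0)]
front = pareto_front(scores)
# (2.0, 2.0) is dominated by (3.0, 3.0); every other point is a defensible trade-off
```

This is why no manual weighting is needed: the frontier records every defensible trade-off, and the Strategist samples along it rather than collapsing the objectives into one number.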
3. State: Learning from Failure (BigQuery)
In many workflows, “negative results” are discarded.
We instead centralize the entire chemical knowledge graph in BigQuery. It acts as the system’s “Long-Term Memory.” Every generated molecule, docking score, and failed simulation is stored here, allowing the Strategist to query the entire history (not just the hits) to make informed decisions. This turns the ecosystem from a “Sparse Reward” environment (finding the needle) into a “Dense Reward” environment where every experiment sharpens the model.
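To make the pattern concrete without a GCP project, here is the same idea against a local SQLite stand-in. The table name, columns, and rows are invented for illustration; in production this is a BigQuery dataset queried with the same kind of SQL:

```python
import sqlite3

# Local stand-in for the BigQuery "long-term memory" table
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        smiles        TEXT,
        cycle         INTEGER,
        docking_score REAL,
        fep_dg        REAL,   -- NULL when the simulation failed
        status        TEXT
    )
""")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        ("CCO",      1, -6.2, -7.1, "validated"),
        ("CCN",      1, -5.8, None, "failed_simulation"),
        ("c1ccccc1", 2, -7.4, -8.9, "validated"),
    ],
)

# The Strategist queries the *entire* history, failures included
best_hit = conn.execute(
    "SELECT smiles, fep_dg FROM results "
    "WHERE status = 'validated' ORDER BY fep_dg ASC LIMIT 1"
).fetchone()
failure_count = conn.execute(
    "SELECT COUNT(*) FROM results WHERE status != 'validated'"
).fetchone()[0]
```

Because failed simulations stay queryable alongside the hits, the proxy model can be retrained on the full label distribution rather than a survivor-biased slice.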
The active learning loop, as instantiated with GCP services, is shown below:
BioOps in Action — Automated Drug Discovery at Scale
To validate this, we introduce a reference implementation targeting a classic hard problem in Small Molecule Drug Discovery: Multi-Parameter Optimization (MPO). The goal: find drug candidates that balance potency, safety, and synthesizability, without human intervention.
The architecture of this is shown below:
Here are the steps in this approach:
Generation: A generative model (built on the Pocket2Mol architecture) hallucinates a library of 100,000+ candidates.
Filtering (The Rough Sort): We apply rapid scoring using RDKit (for physical properties), Gnina (for 3D docking), and TxGemma (for ADMET scoring) to filter out “nonsense” molecules immediately.
Simulation (The Ground Truth): The Vertex AI Strategist selects 1,000 diverse candidates for rigorous Free Energy Perturbation (FEP/GROMACS) analysis.
Proxy model: These ground-truth results flow into BigQuery and are used to train a proxy model (e.g., XGBoost), which then scores the remaining 99,000 molecules.
Closing the Loop: Crucially, we take the entire library result (1,000 high-fidelity FEP results and 99,000 proxy scores) and use it to fine-tune the generator (Pocket2Mol). This solves the “reward sparsity” problem. Instead of only learning from the ~1% of winners, the generative model receives a “dense reward” signal across the entire chemical space, preventing “mode collapse” and allowing it to propose chemically diverse, high-potency candidates in subsequent rounds.
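The “rough sort” in step 2 can be sketched as a set of cheap property gates. In the real pipeline the properties come from RDKit, Gnina, and TxGemma; here the values are hand-written, and the Lipinski-style cutoffs and docking threshold are illustrative:

```python
# Hypothetical precomputed properties; in the pipeline these would come from
# RDKit descriptors (mol_wt, logp, hbd, hba) and Gnina docking (dock).
candidates = [
    {"id": "m1", "mol_wt": 342.4, "logp": 2.1, "hbd": 1, "hba": 5,  "dock": -8.2},
    {"id": "m2", "mol_wt": 612.9, "logp": 6.3, "hbd": 4, "hba": 11, "dock": -9.0},
    {"id": "m3", "mol_wt": 287.3, "logp": 1.4, "hbd": 2, "hba": 4,  "dock": -5.1},
]

def passes_rough_sort(m, dock_cutoff=-6.0):
    # Lipinski-style property gates plus a docking-score gate (lower = better)
    return (m["mol_wt"] <= 500 and m["logp"] <= 5
            and m["hbd"] <= 5 and m["hba"] <= 10
            and m["dock"] <= dock_cutoff)

survivors = [m["id"] for m in candidates if passes_rough_sort(m)]
```

Gates like these cost microseconds per molecule, which is what makes it affordable to discard “nonsense” candidates before any expensive simulation runs.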
The Results
95% Faster Time-to-Insight: By bursting to thousands of GPUs, we completed a simulation workload that typically takes weeks in under 36 hours, proving that the shift from “finding” to “designing” is now the operational reality. The table below breaks this down:
While the above reference implementation focuses on small molecules, this architecture is model-agnostic and applicable across the life sciences.
Universal Pattern
Active Learning is an essential next frontier in the biopharma space. By using scalable proxy models to explore and high-fidelity simulations to validate, we drastically speed up complex multi-step workloads, transforming weeks of compute into hours and turning every experiment, successful or not, into fuel for the next iteration.
In the future, we envision a BioOps pipeline where the “Strategist” doesn’t just select molecules using fixed rules, but actively writes code to invent new search strategies in real-time. By moving from the “Funnel” to an “Active Learning Loop”, we stop relying on serendipity to find the needle in a haystack. We start building the engine to design the perfect needle.
The code we used for this work can be found here.
(We thank David Henderson for peer reviewing this piece prior to submission.)