Issues with Virtual Try-On stability: face distortion and incorrect sleeve rendering. Prompt for Try-on in Nano Banana

Hi everyone,

I’m building a production-grade virtual try-on bot where the goal is very strict:

  • The original photo of the person must remain unchanged (especially face, body proportions, skin, hands)

  • Only the clothing should change

I’ve tested Google Virtual Try-On via Vertex AI, and while results are sometimes visually acceptable, I’m facing critical stability issues that make it unusable for a real product:

  1. The photo quality deteriorates significantly, even after reducing the parameter to 0.

  2. Face and facial features are often altered
    Even with high-quality input photos, the model sometimes reshapes the face (eyes, nose, symmetry). This is a hard blocker — users immediately notice it.

  3. Incorrect handling of sleeves / arms
    If the original model photo has bare arms and the garment has long sleeves, the output often:

    • Keeps the arms bare

    • Or partially “cuts” the sleeves
      Instead of extending the garment correctly over the arms.

  4. Overall inconsistency between runs
    With similar inputs, results vary a lot. This makes it impossible to guarantee predictable output quality.

  5. In my experience with NanoBanana and NanoBanana Pro, I haven’t been able to achieve any real stability so far.
    The outputs are highly inconsistent: in many cases the model simply returns the original model image or the original garment image without applying any changes at all, and in other cases the garment is applied only partially or unpredictably.
    Because of this, I haven’t yet found a way to configure NanoBanana / NanoBanana Pro for reliable, repeatable virtual try-on results.

Does anyone here have real-world experience building a stable virtual try-on pipeline using Google Virtual Try-On without degrading the original image quality (especially face and body preservation)?

Additionally, has anyone managed to achieve highly consistent results with NanoBanana or NanoBanana Pro?
If so, are there any prompts, configurations, or processing strategies that significantly improve stability and prevent cases where the model either returns the original image unchanged or applies the garment inconsistently?

The issues you’re seeing are common with current virtual try-on models. They often alter faces or misplace sleeves because they aren’t fully deterministic. The most reliable way to improve results is to carefully preprocess images, provide clear separation between body and clothing, and use post-processing to preserve the original face and proportions. True consistency without intervention is still limited with NanoBanana or similar models.

Hi, I’d love to be able to provide some feedback! In order to do so, would you be able to share a few examples where you’re seeing these consistency issues with the Virtual Try-On model? Can you please also include an example of the parameter you’re referring to so that I can help take a look?

Hello Katie, thanks for your answer

I fixed the quality loss issue using upscaling. Facial distortion is minimal when the source photo is sufficiently high resolution, so that problem is essentially solved as well.

However, the issue with sleeves remains. I tested several examples where the original human photo had bare arms while the clothing items had long sleeves.

The Try-On model produced heavy distortions: in some cases it simply removed the sleeves altogether, and in others it generated artifacts such as a smooth, unnatural transition between the sleeve and the bare arm.

Thanks for providing these examples! Just to confirm, are you using the latest “virtual-try-on-001” model? I tested some of your examples using this notebook and saw improvements in maintaining clothing items with long sleeves. If that doesn’t work, feel free to let me know.

Hey Katie,
I’m working on a try-on app as well. I tried your example with virtual-try-on-001, but I got a 429 RESOURCE_EXHAUSTED error:
{'error': {'code': 429, 'message': 'Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: virtual-try-on-001. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.', 'status': 'RESOURCE_EXHAUSTED'}}

I followed the link to request a quota increase, but I couldn’t find “virtual-try-on-001.” I only found virtual-try-on-exp and virtual-try-on-preview.

Is there any way I can access this model?

Thank you

Hi Katie,
I have been using the Virtual Try-On API and wanted to know whether Google pushed an update recently, because I have noticed quite a bit of degradation in body proportions. Specifically, pretty much any body type I input results in a standard model body type (thin, tall, etc.). I wasn’t having this issue before. Is there a new parameter or something similar that would give more ‘honest’ outputs?

Hi, would you be able to try running the notebook again and let me know if that works for you? We had some changes roll out recently that should resolve this issue.

Hello, in order to better assist you, would it be possible to provide some examples of the model and product images you’re using?

Thank you Katie – the model is working now. I’ve been testing with different outfits and have a question about the behavior.

In the attached examples:

  • Business outfit: VTO changed both the clothing and the shoes (Image 1)

  • Summer outfit: VTO only changed the clothing, shoes stayed the same (Image 2)

Is this expected? Does VTO detect what’s in the garment image and apply everything visible, or is there a way to control which body region it modifies?

In my experience, it’s best to start with an image of a model that’s wearing the clothing item you’re attempting to replace. If you include multiple clothing items in a single image the model will attempt to replace all items. If you’d like to only replace a specific clothing item, you can use the following parameter in the RecontextImageConfig with the Gen AI SDK for Python:

 http_options=HttpOptions(extra_body={'parameters': {'productsToReplace' : ['shoes']}})

Thank you for the explanation. I now have a better understanding of how the model works. We tested this using both the Python Gen AI SDK with HttpOptions(extra_body=...) and the REST API directly.

However, we couldn’t find productsToReplace documented anywhere — not in the Virtual Try-On API reference, the VirtualTryOnModelParams, or the Generate Virtual Try-On Images guide. Could you share any documentation on this parameter and its accepted values?

Thanks!

You can set the parameter to one or more items worn by the individual in the person_image that you’d like to replace with a new item. The parameter is an array of string values like ["t-shirt", "shoes"]. Hope this helps!
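
For anyone calling the REST endpoint directly, here is a minimal sketch of how the request body might be assembled. The only thing confirmed in this thread is that productsToReplace is an array of strings placed under parameters (as in the extra_body example above). The instance field names (personImage, productImages, bytesBase64Encoded), the sampleCount parameter, and the helper name build_vto_request are assumptions for illustration, so check them against the current API reference before relying on them.

```python
import base64
import json


def build_vto_request(person_image: bytes, product_image: bytes,
                      products_to_replace: list[str]) -> dict:
    """Assemble a predict request body for virtual-try-on-001 (sketch only)."""

    def b64(data: bytes) -> str:
        return base64.b64encode(data).decode("ascii")

    return {
        "instances": [{
            # Assumed field names; verify against the Virtual Try-On API reference.
            "personImage": {"image": {"bytesBase64Encoded": b64(person_image)}},
            "productImages": [{"image": {"bytesBase64Encoded": b64(product_image)}}],
        }],
        "parameters": {
            "sampleCount": 1,  # assumed parameter name
            # Placement under "parameters" follows the extra_body example in this thread.
            "productsToReplace": products_to_replace,
        },
    }


# Placeholder bytes stand in for real image files.
body = build_vto_request(b"person-bytes", b"product-bytes", ["t-shirt", "shoes"])
print(json.dumps(body["parameters"]))
```

The resulting dict can be serialized with json.dumps and sent as the POST body with whatever HTTP client you already use.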

Hey Jikki_Jim
In my experience automating unstable GenAI flows, you cannot always “prompt” your way out of random failures (hallucinations, face distortion). You often need an architectural fix rather than a configuration fix.

Instead of tweaking parameters endlessly, have you considered wrapping your VTO call in a Self-Healing Loop with a separate validator?

The Logic:

  1. Generate: Call Vertex AI/NanoBanana.

  2. Validate (The Guardrail): Use a deterministic library (like dlib or a FaceNet embedding) to compare the Original Face vs. Output Face.

  3. Decision: If similarity_score < 0.95 → Discard and Retry immediately (with a slightly different seed or noise).
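
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tested production pipeline: generate and embed_face are hypothetical callables you would supply yourself (for example a wrapper around the VTO call and a dlib or FaceNet embedding function), cosine similarity is just one possible choice of similarity score, and the attempt number is passed as the seed.

```python
import math
from typing import Callable, Optional, Sequence

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_with_validation(
    generate: Callable[[int], bytes],           # hypothetical: seed -> output image bytes
    embed_face: Callable[[bytes], Embedding],   # hypothetical: image bytes -> face embedding
    original: bytes,
    threshold: float = 0.95,
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Return the first output whose face matches the original, or None."""
    reference = embed_face(original)
    for attempt in range(max_attempts):
        candidate = generate(attempt)  # a different seed on every retry
        score = cosine_similarity(reference, embed_face(candidate))
        if score >= threshold:
            return candidate  # validator passed
        # Otherwise discard the candidate and retry.
    return None  # every attempt failed validation


# Stub usage: seed 0 yields a distorted face, seed 1 a faithful one.
fake_embeddings = {b"orig": [1.0, 0.0], b"bad": [0.0, 1.0], b"good": [0.99, 0.01]}
result = generate_with_validation(
    generate=lambda seed: b"bad" if seed == 0 else b"good",
    embed_face=lambda image: fake_embeddings[image],
    original=b"orig",
)
print(result)
```

Returning None after max_attempts lets the caller decide what to do next (fall back to a face-paste post-processing step, show an error, and so on) instead of looping forever.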

I treat GenAI outputs as “untrusted” by default. I detailed this “Self-Healing” approach for data extraction in my latest post (available via my profile), but the architectural principle is exactly the same for image pipelines: **Automate the quality control, don’t just automate the generation.**

See you!

Hi everyone,

I am currently using the newly GA virtual-try-on-001 model for an e-commerce website. While the generative quality is great, I’m hitting a few production bottlenecks and would love to hear your empirical insights:

1. API vs AI Studio Latency: I’m seeing ~16-18s latency via the REST API in europe-west9, but the Vertex AI Studio UI feels noticeably faster. Does routing requests to us-central1 actually reduce inference time (better GPU availability?), or is this purely API Gateway/Cold Start overhead?

2. The Resolution/Compression “Sweet Spot”: I managed to drop my API latency to 14.4s by compressing input images to 150-300kb, but the fabric details degraded too much. What is your proven “sweet spot” (e.g., specific max width, JPEG quality) that balances fast inference with high-quality output?

3. Controlling Hallucinations (productsToReplace): The model sometimes alters non-targeted clothes (e.g., turning black pants into denim shorts when trying on a denim jacket). I’ve read about the productsToReplace parameter to constrain the generation. Could someone share a working JSON payload example of how to format this in a REST call?

4. 500s and 429s Stability: Even with Exponential Backoff, we still hit occasional 500 (Internal) and 429 (Resource Exhausted) errors. Is the 50 RPM soft limit a strict reality we just have to build queues around, or are you seeing better stability since the GA release?

5. The baseSteps Trade-off: Does increasing baseSteps from 32 to 50 actually help prevent structural hallucinations, or does it only refine micro-textures at the cost of 8-10 extra seconds?
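
For context on point 4, the kind of retry wrapper in question looks roughly like the sketch below. It is only an illustration of exponential backoff with full jitter, not the exact code running in production: send is a hypothetical callable that performs one request and returns (status_code, body), and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500}  # the two status codes mentioned in point 4


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],   # hypothetical: performs one request
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Retry on 429/500 with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Full jitter: wait a random time up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    return status, body  # still failing after the last attempt


# Stub usage: two failures, then success. Delays are recorded instead of slept.
responses = iter([(429, b""), (500, b""), (200, b"ok")])
delays = []
final = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(final[0], len(delays))
```

Jitter spreads retries out so that parallel workers don't all hit the quota again at the same moment, but it does not raise the quota itself, so a queue in front of the API may still be needed.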

Beyond these specific points, if you’ve discovered any other undocumented tricks, workarounds, or general best practices while running this model in production, I’d love to hear them!

Thanks in advance for any tips or payload examples you can share! Also, if I can help you, do not hesitate!

Hello, Katie
I would like to raise what I believe is currently the main issue with your technology.

The model very often places garments only within the areas that are already covered by clothing in the source image of the person. In other words, it seems to rely primarily on the originally covered regions as the placement area for the new garment.

This creates a significant limitation in practical use. For example, if I want to try on a long-sleeve item but the input photo shows the person wearing a sleeveless top, the result often truncates the sleeves instead of preserving their full length. Similarly, if the input image shows a person in a swimsuit and the target outfit is something larger (e.g., a sweater or a full outfit with pants), the model tends to fill only the swimsuit area, effectively recoloring it instead of generating the full garment. This is a very noticeable and critical error.

This becomes a critical issue in real product environments, as it is not realistic to expect users to upload photos that strictly meet specific requirements. In practice, users will always upload whatever photos are most convenient for them. As a result, these limitations directly degrade the user experience and can significantly undermine trust in the technology.

Could you clarify whether there are any ways to address this issue without introducing additional models? Also, are there any ongoing efforts to improve or resolve this limitation, and is it something that can realistically be fixed?

From my perspective, this seems like a relatively minor limitation for a company like Google to solve, and addressing it could significantly improve the overall performance and usability of the technology.

If necessary, I can send photo examples, but you can also find some earlier in this thread.

Thanks

Hi Katie,

Quick question on virtual-try-on-001 (Vertex AI).

The google-genai v2.3.0 SDK exposes a prompt field on RecontextImageSource, but the docstrings disagree:

  • Method (recontext_image): “prompt is behind an allowlist”
  • Field (RecontextImageSource.prompt): “Not supported for Virtual Try-On”

When I send a prompt I get 400 INVALID_ARGUMENT — “Invalid field: prompt”.

What’s the correct way to pass a natural-language refinement to virtual-try-on-001? I’d like to support short stylist edits like:

  • “Add green earrings.”
  • “The pants should fall down to the floor.”
  • “Make the sweater oversized.”
  • “Add a leather crossbody bag.”

Is it prompt on the instance, productImageConfig.productDescription per product, or something else entirely?

Thanks!