Mismatch between scikit-learn versions in the prebuilt training and prediction containers

I am currently using the following prebuilt container images in my pipeline:

CONTAINER_IMAGE=europe-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest
SERVING_IMAGE=europe-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest

I have only built decision tree and random forest models using a scikit-learn pipeline. However, when deploying the trained model to an endpoint, I get a version mismatch with errors like:

Trying to unpickle estimator OneHotEncoder from version 1.0.2 when using version 1.0

I was able to confirm this locally by overriding the entry point commands in the prebuilt training image using:

docker run -it --entrypoint /bin/bash europe-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest

And checking the scikit-learn version using:

python -c "import sklearn; print(sklearn.__version__)"

This confirms that the training image has version 1.0.2.
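
To make the comparison between the two images repeatable, something like the following could be run inside each container. This is only a rough sketch: the helper names (parse_release, versions_match, installed_sklearn_version) are made up for illustration, and it assumes plain numeric release versions such as 1.0 or 1.0.2, not dev or rc builds.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '1.0' or '1.0.2' into a tuple of ints, zero-padded to three parts.

    Assumption: the version is a plain numeric release (no 'dev' or 'rc' suffix).
    """
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def versions_match(training_version: str, serving_version: str) -> bool:
    """Return True only when both versions refer to the same release."""
    return parse_release(training_version) == parse_release(serving_version)


def installed_sklearn_version():
    """Return the installed scikit-learn version, or None if it is not installed."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Versions taken from the error message above: the model was pickled
    # with 1.0.2 (training image) and loaded with 1.0 (prediction image).
    print("installed:", installed_sklearn_version())
    print("match:", versions_match("1.0.2", "1.0"))
```

With the two versions from the error message, versions_match returns False, which is the same mismatch that the unpickling message reports.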

I will check versions prior to the latest, but is this an intended change? It is a breaking change for my code, especially when the training image is used with the prebuilt prediction container, which has scikit-learn 1.0. Also, shouldn't this be reflected in the tag, so that it is something like europe-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0-2:latest?

The main purpose of providing these containers, in my understanding, is that the training and prediction images are supposed to have matching versions, which results in less engineering setup.
