Dataflow Flex Template: 'No JVM Shared Library File' Error When Using Custom Docker Image

Hi,

I’m working on a Dataflow batch job that reads data from an SQL server over JDBC using the JayDeBeApi package. Since JayDeBeApi needs a JVM and the JDBC driver jar available on the worker nodes, I have to install Java and the other dependencies there. I decided to use a Dataflow Flex Template: I put all the required installation steps in the Dockerfile shown below, and then stored the Flex Template in Artifact Registry.

FROM gcr.io/dataflow-templates-base/python39-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY main.py .
COPY setup.py .

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

# Install apache-beam and other dependencies to launch the pipeline
RUN pip install apache-beam[gcp]==2.58.0

# Install required dependencies for Java setup
RUN apt-get update && apt-get install -y wget tar

# Download the Java runtime and JDBC driver jars from GCS
COPY download_jars.sh /tmp/download_jars.sh
RUN chmod +x /tmp/download_jars.sh && /tmp/download_jars.sh

# Set JAVA_HOME and update alternatives for Java
# ENV JAVA_HOME=/usr/lib/jvm/jdk
ENV JAVA_HOME=/usr/lib/jvm/jre1.8.0_351
ENV PATH=$JAVA_HOME/bin:$PATH

RUN update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jre1.8.0_351/bin/java" 1
RUN ln -s $JAVA_HOME/lib/amd64/server/libjvm.so /usr/lib/libjvm.so

# Verify the Java installation
RUN java -version
RUN echo "export JAVA_HOME=/usr/lib/jvm/jre1.8.0_351" >> /root/.bashrc
RUN pwd

ENTRYPOINT ["opt/google/dataflow/python_template_launcher"]

I’ve tested the JDBC connection separately, and it’s working fine.
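
For reference, the standalone test is roughly the following; the driver class, connection string, credentials, and jar path are placeholders for my real values:

# standalone_jdbc_test.py - rough sketch of the connection test (placeholder values)
import jaydebeapi

conn = jaydebeapi.connect(
    "com.microsoft.sqlserver.jdbc.SQLServerDriver",    # placeholder driver class
    "jdbc:sqlserver://<host>:1433;databaseName=<db>",  # placeholder connection URL
    ["<user>", "<password>"],                          # placeholder credentials
    "/tmp/jars/mssql-jdbc.jar",                        # placeholder path to the driver jar
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())
cursor.close()
conn.close()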

However, when I run the Dataflow job from this Flex Template, Java and the other dependencies don’t seem to be accessible inside the ParDo: the JDBC connection attempt fails on the workers with the "No JVM shared library file" error from the title. The DoFn is roughly the sketch below.
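
Here is a trimmed-down sketch of that DoFn; the query, connection details, and jar path are placeholders and the real code has more error handling:

# Rough sketch of the DoFn that fails on the workers (placeholder values)
import apache_beam as beam


class ReadFromSqlServerFn(beam.DoFn):
    def setup(self):
        # Imported here so it resolves on the worker; this is the point where
        # JPype looks for libjvm.so and the "No JVM shared library file" error appears.
        import jaydebeapi
        self._conn = jaydebeapi.connect(
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",    # placeholder driver class
            "jdbc:sqlserver://<host>:1433;databaseName=<db>",  # placeholder URL
            ["<user>", "<password>"],                          # placeholder credentials
            "/tmp/jars/mssql-jdbc.jar",                        # placeholder jar path
        )

    def process(self, query):
        cursor = self._conn.cursor()
        cursor.execute(query)
        for row in cursor.fetchall():
            yield row
        cursor.close()

    def teardown(self):
        self._conn.close()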

What could be causing this issue? Any insights would be greatly appreciated.

@ms4446