Hi,
I have deployed a custom model (built from a Docker image) to a Vertex AI endpoint.
When I try to get a prediction in Java with the following code:
import java.io.IOException;
import java.util.List;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictRequest;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.protobuf.ListValue;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;

private PredictResponse predict(String endpointId, String query, String project, String location) throws IOException {
    try (PredictionServiceClient serviceClient = getPredictionServiceClient()) {
        EndpointName endpointName = EndpointName.of(project, location, endpointId);
        // Parse the JSON query (a JSON array of instances) into protobuf Values.
        ListValue.Builder listValue = ListValue.newBuilder();
        JsonFormat.parser().merge(query, listValue);
        List<Value> instanceList = listValue.getValuesList();
        PredictRequest request = PredictRequest.newBuilder()
            .setEndpoint(endpointName.toString())
            .addAllInstances(instanceList)
            .build();
        return serviceClient.predict(request);
    }
}
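For reference, getPredictionServiceClient() follows the standard client-creation pattern, roughly like this (a sketch, not my exact helper; the regional host is a placeholder and must match the endpoint's actual location):

import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;

private PredictionServiceClient getPredictionServiceClient() throws IOException {
    // The API host is regional and must match the location the endpoint
    // was deployed in, e.g. "europe-west1-aiplatform.googleapis.com:443".
    PredictionServiceSettings settings = PredictionServiceSettings.newBuilder()
        .setEndpoint("us-central1-aiplatform.googleapis.com:443") // placeholder region
        .build();
    return PredictionServiceClient.create(settings);
}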
The call fails with an io.grpc.StatusRuntimeException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR.
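In case it helps, this is roughly how the failure can be inspected on the caller side (a sketch; with the GAX-based client the gRPC exception typically arrives wrapped in a com.google.api.gax.rpc.ApiException):

import com.google.api.gax.rpc.ApiException;
import io.grpc.StatusRuntimeException;

// (inside a method that declares throws IOException)
try {
    PredictResponse response = predict(endpointId, query, project, location);
} catch (ApiException e) {
    // GAX wraps the underlying gRPC failure; log both layers.
    System.err.println("GAX status code: " + e.getStatusCode());
    if (e.getCause() instanceof StatusRuntimeException) {
        StatusRuntimeException sre = (StatusRuntimeException) e.getCause();
        System.err.println("gRPC status: " + sre.getStatus());     // INTERNAL + description
        System.err.println("gRPC trailers: " + sre.getTrailers()); // server-sent metadata, if any
    }
}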
I see no errors when I check the logs in the Google Cloud Console; it looks like the request was processed without error.
The strange part is that everything works if:
- I call another Vertex AI endpoint (serving a simpler model, also from a Docker image)
- I use an HTTP REST request (with curl) to get the prediction from the original model instead of the Java code (see the curl sketch below)
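The working curl call is roughly this (a sketch with placeholder project/location/endpoint values; the instances payload is elided):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:predict" \
  -d '{"instances": [ ... ]}'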
Since the curl request works, the model and endpoint seem fine. And since the Java code works against another endpoint, the code seems correct.
Any idea how to debug/fix this?
