I am using Mistral Large (2407) on Vertex AI for inference. The Vertex page for this model (here) says it has a context length of 128k tokens, and the Mistral docs confirm this.
When I send a "large" request through (e.g. ~65k tokens), I get the following error:
```
{"object":"Error","message":"Prompt contains 65673 tokens, too large for model with 32768 maximum context length","type":"invalid_request_error","code":3051}
```
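For reference, you can sanity-check the prompt size locally with Mistral's open-source `mistral-common` tokenizer (`pip install mistral-common`) before sending the request. This is just a sketch; I'm assuming the v3 tokenizer matches what the Vertex endpoint uses for mistral-large@2407:

```python
# Count the tokens in the chat request locally before sending it.
# Assumption: the v3 tokenizer is the right one for mistral-large@2407.
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()

with open("large-text-file.txt", encoding="utf-8") as f:
    prompt = f.read()

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content=prompt)])
)
# If the assumption holds, this should land near the 65673 the API reports.
print(f"{len(tokenized.tokens)} tokens")
```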
The API only seems to accept a 32k context length. Here is a minimal curl command that reproduces the issue:
```
curl \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://europe-west4-aiplatform.googleapis.com/v1/projects/basebox-llm-api/locations/europe-west4/publishers/mistralai/models/mistral-large@2407:streamRawPredict \
  --data '{
    "model": "mistral-large",
    "messages": [
      {"role": "user", "content": '"$(jq -Rs . large-text-file.txt)"'}
    ]
  }'
```

(You'd have to supply your own large-text-file.txt.)
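In case anyone wants to reproduce this without wrestling with the shell quoting, here is roughly the same request in Python. A sketch under the same assumptions as the curl command above; reading the access token from a `GOOGLE_ACCESS_TOKEN` environment variable is my own convention, not anything Vertex-specific:

```python
# Same request as the curl command above, same project/region.
# First: export GOOGLE_ACCESS_TOKEN=$(gcloud auth print-access-token)
import os

import requests

ENDPOINT = (
    "https://europe-west4-aiplatform.googleapis.com/v1/"
    "projects/basebox-llm-api/locations/europe-west4/"
    "publishers/mistralai/models/mistral-large@2407:streamRawPredict"
)

with open("large-text-file.txt", encoding="utf-8") as f:
    prompt = f.read()

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {os.environ['GOOGLE_ACCESS_TOKEN']}"},
    json={
        "model": "mistral-large",
        "messages": [{"role": "user", "content": prompt}],
    },
)
# With a ~65k-token prompt this comes back with the 32768-context error above.
print(resp.status_code)
print(resp.text[:500])
```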
Has anyone come across this?