Ops Agent communication issue

Hello Google Cloud Community

I’m part of the infrastructure team working on installing and configuring the Ops Agent to collect and monitor metrics.
We installed it on several VMs with the following specification:

OS: Windows Server 2022 Datacenter 21H2
Machine type: n2-standard-4 (4 vCPUs, 16 GB memory)
CPU platform: Intel Cascade Lake
Architecture: x86/64

After installing the Ops Agent and applying its configuration, we get the following error.

Error:

2025-05-29T14:10:08.979-0400 error internal/queue_sender.go:57 Exporting failed. Dropping data. {“resource”: {}, “otelcol.component.id”: “googlecloud”, “otelcol.component.kind”: “exporter”, “otelcol.signal”: “metrics”, “error”: “rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"\nrpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx:443: i/o timeout"”, “dropped_items”: 1874}

go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1
	C:/Users/ContainerAdministrator/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.126.0/exporterhelper/internal/queue_sender.go:57
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*disabledBatcher[...]).Consume
	C:/Users/ContainerAdministrator/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.126.0/exporterhelper/internal/queuebatch/disabled_batcher.go:22
go.opentelemetry.io/collector/exporter/exporterhelper/internal/queuebatch.(*asyncQueue[...]).Start.func1
	C:/Users/ContainerAdministrator/go/pkg/mod/go.opentelemetry.io/collector/exporter@v0.126.0/exporterhelper/internal/queuebatch/async_queue.go:47

Note: I replaced the IP addresses with xx.xx.xx for security reasons.
We also see this error:
ConfigAgent Error main.go:88: context deadline exceeded
2025-05-02T03:02:44.5225-04:00 OSConfigAgent Error main.go:88: context deadline exceeded

After some troubleshooting sessions, we concluded that this could be a proxy or networking issue.
Do you have any idea what could cause this behavior?

Thanks in advance.

Hello,

Can you please share whether the problem is persistent or periodic and transient? Judging only by the shared log, the OTel exporter fails to connect to xx.xx.xx:443. Depending on what xx.xx.xx is, it can be a Cloud Logging API endpoint, an OTel Collector, or some other service, such as the proxy that you mentioned. Either way, it does not look like a problem with the Ops Agent but rather a network configuration or connectivity problem.
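To check from the VM whether a TCP connection to port 443 can be opened at all, something like the following minimal Python sketch may help. It only reproduces the "dial tcp ...:443" step from the log; it does not test TLS, authentication, or the Ops Agent itself. The function name can_dial and the host names in the usage comment are assumptions for illustration, since the real address is redacted in the log.

```python
import socket
import time


def can_dial(host, port=443, timeout=5.0):
    """Try one TCP connection and return (ok, detail).

    Hypothetical helper: it mirrors only the dial step that times out
    in the log, nothing more.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.2f}s"
    except OSError as exc:
        # OSError covers timeouts, refused connections, and DNS failures.
        return False, f"{type(exc).__name__}: {exc}"


# Example usage (host names are assumptions, replace with your endpoint):
# for host in ("monitoring.googleapis.com", "logging.googleapis.com"):
#     print(host, can_dial(host))
```

If this fails with a timeout while the same check succeeds from another network, that would point toward a firewall, routing, or proxy rule rather than the agent.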

I have the same error as the OP, and it’s persistent.
Based on the logs, everything points to a connectivity issue, but the connection can be established.
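One possible explanation, offered only as an assumption, is that the manual connection test and the agent do not use the same proxy settings. The short Python sketch below lists the common proxy environment variables visible to the process that runs it. The helper name proxy_settings is hypothetical, and a Windows service may see a different environment than an interactive session, so treat the output as a hint only.

```python
import os

# Conventional proxy variable names; whether the agent honors them in your
# setup is an assumption to verify against the Ops Agent documentation.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(env=None):
    """Return the proxy variables that are set and non-empty in env."""
    env = os.environ if env is None else env
    found = {}
    for name in PROXY_VARS:
        # Check both the upper-case and lower-case spellings.
        for key in (name, name.lower()):
            value = env.get(key)
            if value:
                found[name] = value
                break
    return found


# Example usage:
# print(proxy_settings())
```

An empty result in the session where the connection works, combined with a network that requires a proxy, would be worth mentioning in the reply.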

Hi @Burno, may I assume that you work together with the original reporter (@G72)? I am checking with the Ops Agent team which functions of the agent, if any, require access to the OS Config API; I am not sure whether the Ops Agent uses this API at all. While I wait for their response, can you please confirm that this API service is enabled on the project where your VM is provisioned? Please run the following command to check the service status, or use another method that you are familiar with:

gcloud services list --enabled --filter="NAME:osconfig.googleapis.com" --project=PROJECT_ID

Replace PROJECT_ID with the project ID of the project where your VM is provisioned. If you see the output:

Listed 0 items.

it means that this service is not enabled.

I’m not related to @G72; I just have the exact same error.

Yes, the API is enabled on my project (screenshot attached).

Hello @Burno and @G72. Since you are reporting different problems (although the root cause may be the same), please run the diagnostics tool and reply with the results in this forum. If you are concerned about sharing the results publicly, please let me know. I will send you a direct message, and you can reply to me there with the diagnostic results.