We are using Prometheus to scrape runtime metrics. Some metrics, such as server_nio, have interesting labels, and I’d like to know what they mean. Is there any documentation that explains these metrics in more detail?
Some of these metrics are used by Apigee itself to scale the runtime, as described in the doc: Scale and autoscale runtime services | Apigee | Google Cloud
What I want to understand is:
- What do these metrics actually mean?
- What factors are affected by them (target performance, proxy performance, etc.)?
- How can we tune and test the HPA behaviour based on these metrics?
The metrics are:
server_fault_count{source="apigee_errors"}
server_fault_count{source="policy_errors"}
server_fault_count{source="target_errors"}
server_heap{state="committed"}
server_heap{state="init"}
server_heap{state="max"}
server_heap{state="p_used"}
server_heap{state="used"}
server_nio{state="accepted_total"}
server_nio{state="accepted"}
server_nio{state="AX_FAILED_COUNT"}
server_nio{state="AX_SUCCESS_COUNT"}
server_nio{state="close_failed"}
server_nio{state="close_success"}
server_nio{state="conn_pending"}
server_nio{state="connected_total"}
server_nio{state="connected"}
server_nio{state="heap_committed"}
server_nio{state="heap_init"}
server_nio{state="heap_max"}
server_nio{state="heap_usage"}
server_nio{state="main_task_queue_depth"}
server_nio{state="main_task_wait_time"}
server_nio{state="max_conn"}
server_nio{state="max_mc_queue_size"}
server_nio{state="mc_queue_size"}
server_nio{state="MINT_FAILED_COUNT"}
server_nio{state="MINT_SUCCESS_COUNT"}
server_nio{state="netty_task_wait_time"}
server_nio{state="nio_task_queue_depth"}
server_nio{state="nio_task_wait_time"}
server_nio{state="non_heap_committed"}
server_nio{state="non_heap_init"}
server_nio{state="non_heap_max"}
server_nio{state="non_heap_usage"}
server_nio{state="pool_size"}
server_nio{state="servers"}
server_nio{state="timeouts"}
server_nio{state="TRACE_FAILED_COUNT"}
server_nio{state="TRACE_SUCCESS_COUNT"}
server_num_threads{}
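
In case it helps anyone reproduce the list above, here is a rough sketch of how the samples can be grouped by label when looking at a raw scrape. It assumes the Prometheus text exposition format; the function name `group_by_label` and all sample values are made up for illustration and are not taken from a real runtime. Real samples will likely carry additional labels (org, env, pod, and so on), which this sketch ignores, so samples that differ only in those labels would overwrite each other.

```python
import re
from collections import defaultdict

# Matches exposition lines such as: server_nio{state="accepted"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def group_by_label(text, label):
    """Return {metric_name: {label_value: float}} for samples that carry `label`."""
    grouped = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if label in labels:
            grouped[m.group("name")][labels[label]] = float(m.group("value"))
    return dict(grouped)


# Hypothetical scrape excerpt; the values are invented for illustration only.
SAMPLE = '''
# TYPE server_nio gauge
server_nio{state="accepted"} 12
server_nio{state="max_conn"} 100
server_heap{state="p_used"} 41.5
server_fault_count{source="target_errors"} 3
server_num_threads{} 250
'''

if __name__ == "__main__":
    for name, states in group_by_label(SAMPLE, "state").items():
        print(name, states)
```

Calling `group_by_label(SAMPLE, "source")` instead groups the server_fault_count samples by their source label.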