HealthMonitor not removing target server when 500 returned

    <HTTPTargetConnection>
        <Properties>
            <Property name="success.codes">1xx,2xx,3xx,400,403,404,429</Property>
        </Properties>
        <LoadBalancer>
            <Algorithm>RoundRobin</Algorithm>
            <Server name="node-1"/>
            <Server name="node-2"/>
            <MaxFailures>5</MaxFailures>
            <RetryEnabled>false</RetryEnabled>
        </LoadBalancer>
        <HealthMonitor>
            <IsEnabled>true</IsEnabled>
            <IntervalInSec>5</IntervalInSec>
            <HTTPMonitor>
                <Request>
                    <ConnectTimeoutInSec>5</ConnectTimeoutInSec>
                    <SocketReadTimeoutInSec>5</SocketReadTimeoutInSec>
                    <Port>9898</Port>
                    <Verb>GET</Verb>
                    <Path>/v1/health/readiness</Path>
                    <IncludeHealthCheckIdHeader>true</IncludeHealthCheckIdHeader>
                </Request>
                <SuccessResponse>
                    <ResponseCode>200</ResponseCode>
                </SuccessResponse>
            </HTTPMonitor>
        </HealthMonitor>
        <Path>v1</Path>
    </HTTPTargetConnection>

I deliberately broke node-2 so that it returns a 500 for /v1/health/readiness.
The response it returns:

Content-Type text/html; charset=utf-8
Content-Length 4806
Status 500

However, the Apigee MPs (message processors) keep sending traffic to it.
What am I missing?

Is it possible you did not wait long enough? Your configuration tells Apigee to remove a target server from rotation after 5 unhealthy responses (MaxFailures is 5), and the health check runs every 5 seconds (IntervalInSec is 5). So you'd need to wait up to 25+5 seconds to be certain that the health monitor in Apigee has received 5 unhealthy responses.
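To make that arithmetic concrete, here is a rough sketch in Python. It only reproduces the back-of-envelope estimate above (MaxFailures checks spaced IntervalInSec apart, plus one extra interval as a margin). The function name is made up for illustration, and this is not a documented Apigee formula.

```python
def max_wait_seconds(max_failures: int, interval_sec: int) -> int:
    """Rough upper bound, in seconds, before a broken target leaves rotation.

    Assumption: one failure is counted per health check, and the target is
    removed once max_failures failures have been counted.
    """
    # Time covered by max_failures health checks spaced interval_sec apart.
    failing_checks = max_failures * interval_sec
    # One extra interval as a margin, because the checks are not aligned
    # with the moment you start watching.
    margin = interval_sec
    return failing_checks + margin


# Settings from the question: MaxFailures=5, IntervalInSec=5.
print(max_wait_seconds(5, 5))  # 30
# Tighter settings: MaxFailures=1, IntervalInSec=1.
print(max_wait_seconds(1, 1))  # 2
```

If the health check itself hangs rather than returning a 500 right away, the connect and read timeouts (5 seconds each in your configuration) could add to this estimate.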

If you want a quicker reaction time, you can set MaxFailures to 1 and IntervalInSec to 1. That tells Apigee to react more quickly to a negative health check response.

<HTTPTargetConnection>
    <Properties>
      <!-- for entering fault state, not considered in health checks -->
      <Property name="success.codes">1xx,2xx,3xx,400,403,404,429</Property>
    </Properties>

    <LoadBalancer>
      <Algorithm>RoundRobin</Algorithm>
      <Server name="target1" />
      <Server name="target2" />
      <!-- Set this to 1 to react after one negative healthcheck response -->
      <MaxFailures>1</MaxFailures>
      <RetryEnabled>false</RetryEnabled>
      <!-- for implicit health monitoring -->
      <ServerUnhealthyResponse>
        <ResponseCode>503</ResponseCode>
        <!-- ... -->
      </ServerUnhealthyResponse>
    </LoadBalancer>

    <!-- for explicit health monitoring -->
    <HealthMonitor>
      <IsEnabled>true</IsEnabled>
      <IntervalInSec>1</IntervalInSec>
      <HTTPMonitor>
        <Request>
          <ConnectTimeoutInSec>5</ConnectTimeoutInSec>
          <SocketReadTimeoutInSec>5</SocketReadTimeoutInSec>
          <Verb>GET</Verb>
          <Path>/status</Path>
          <IncludeHealthCheckIdHeader>true</IncludeHealthCheckIdHeader>
        </Request>
        <SuccessResponse>
          <ResponseCode>200</ResponseCode>
        </SuccessResponse>
      </HTTPMonitor>
    </HealthMonitor>

</HTTPTargetConnection>