Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Sunday, August 4, 2019

Failures in Microservices

As microservices evolve into a tangled mess of synchronous and asynchronous flows with multi-level fanouts, it becomes important to think about failure and resiliency. Failure is pretty much a guaranteed outcome when the availability of the whole system is the product of the availabilities of all its downstream microservices and dependencies.
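To make the multiplication concrete, here is a minimal sketch. The 99.9% figure and the dependency counts are illustrative assumptions, not measurements, and the calculation assumes failures are independent and that every dependency sits on the critical path.

```python
def serial_availability(availabilities):
    """Availability of a request path that needs every dependency to succeed.

    Assumes failures are independent, which real systems only approximate.
    """
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Hypothetical example: every dependency is 99.9% available on its own.
for n in (1, 10, 30):
    print(n, round(serial_availability([0.999] * n), 4))
```

Under these assumptions, thirty dependencies at 99.9% each leave the overall path at roughly 97%.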

How does one systematically think about handling load, graceful degradation and load shedding in the face of impaired operation and sustained high load? Google's SRE books contain excellent high-level advice on handling load and addressing cascading failures. I have prepared an actionable summary of a couple of chapters dealing with resiliency to win in the face of failure. Follow the notes here to create rigor and governance around microservices frameworks and templates, enabling systematic resiliency through circuit breakers and autoscaling for sustainable scale-out of your System of Systems.

Different types of resources can be exhausted

Insufficient CPU > all requests become slower > various secondary effects

  1. Increased number of inflight requests
  2. Excessively long queue lengths
    - steady state rate of incoming requests > rate at which the server can process requests
  3. Thread starvation
  4. CPU or request starvation
  5. Missed RPC deadlines
  6. Reduced CPU caching benefits

Memory exhaustion - more in-flight requests consume more RAM for request, response, and RPC objects

  1. Containers dying due to the OOM killer
  2. A vicious cycle: an increased rate of GC in Java results in increased CPU usage, which slows requests further
  3. Reduction in app-level cache hit rates

Threads (Tomcat HTTP)

  1. Thread starvation can directly cause errors or lead to health check failures.
  2. If the server adds threads as needed, thread overhead can use too much RAM.
  3. In extreme cases, thread starvation can also cause you to run out of process IDs.

File descriptors

  • Running out of file descriptors can lead to the inability to initialize network connections, which in turn can cause health checks to fail.

Dependencies among resources  

  •   Resource exhaustion scenarios feed on one another
  •   DB Connections (Negative Indicator)

All of this can ultimately lead to service unavailability: resource exhaustion can crash servers, and the load they leave behind lands on the remaining servers, producing a snowball effect.

How To Prevent Server Overload

  1. Load test the server’s capacity limits
  2. Serve degraded results
  3. Instrument servers to reject requests when overloaded - fail early and cheaply
  4. Instrument higher-level systems to reject requests: at the reverse proxies, by limiting the volume of requests by criteria such as IP address; at the load balancers, by dropping requests when the service enters global overload; and at individual tasks
  5. Perform capacity planning
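As one way to picture limiting request volume by a criterion such as IP address, here is a minimal token-bucket sketch. The class names, the rate and capacity parameters, and the injectable `clock` are all assumptions made for illustration; a production reverse proxy would normally use its built-in rate limiting instead, and this version is neither thread-safe nor bounded in how many client keys it remembers.

```python
import time


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (for example, a source IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets = {}

    def allow(self, client_key):
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_key] = bucket
        return bucket.allow()
```

A caller would check `limiter.allow(ip)` before doing any real work and reject the request when it returns `False`, so that one noisy client cannot consume the capacity meant for everyone.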

Load Shedding

Detect When Load Shedding/ Graceful Degradation Should Kick In

  • Look at CPU usage, latency, queue length and number of threads used
  • Decide whether your service enters degraded mode automatically or whether manual intervention is necessary
  • Graceful degradation shouldn’t trigger very often
  • Monitor and alert when too many servers enter these modes
  • Design a way to quickly turn off complex graceful degradation when you run into emergent behavior


  • Per-task throttling based on CPU, memory, or queue length
    - Limit queue length. For a system with fairly steady traffic over time, it is better to have *small queue lengths* relative to the thread pool size, so that the server rejects requests early when it can’t sustain the rate of incoming requests.
  • Dynamically adjusting the number of in-flight task updates based on the volume of requests and available capacity
  •  Return 503 service unavailable to any incoming request when there are more than a given number of client requests in flight
  •  Changing the queuing method from FIFO to LIFO, or using the CoDel algorithm, can reduce load by removing requests that are unlikely to be worth processing
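The 503-on-too-many-in-flight idea from the list above can be sketched as follows. This is a minimal illustration, not a framework integration: `InFlightLimiter`, `handle`, and the `work` callable are hypothetical names, and the right value for `max_in_flight` would have to come from load testing.

```python
import threading


class InFlightLimiter:
    """Rejects work once `max_in_flight` requests are already being handled."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            return True

    def release(self):
        with self.lock:
            self.in_flight -= 1


def handle(limiter, work):
    """Returns an (HTTP status, body) pair; `work` is a hypothetical request handler."""
    if not limiter.try_acquire():
        # Fail early and cheaply instead of queueing work we cannot finish.
        return 503, "overloaded, try again later"
    try:
        return 200, work()
    finally:
        limiter.release()
```

Rejecting at the door keeps the cost of an overloaded request close to zero, which is the point of shedding load rather than queueing it.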

Graceful degradation

  • Decrease the amount of work or time needed by decreasing the quality of responses
  • What actions should be taken when the server is in degraded mode?
  • Should these strategies be implemented at every layer in the stack, or is a high-level choke point sufficient?
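A rough sketch of trading response quality for less work might look like the following. The function name, the `full_search` and `cheap_search` callables, the 0..1 `load` signal, and the 0.8 threshold are all illustrative assumptions; which signal to use and where to put the switch are exactly the open questions listed above.

```python
def search(query, load, full_search, cheap_search, degrade_above=0.8):
    """Serves the full result normally and a cheaper one under heavy load.

    `load` is a 0..1 utilization signal (CPU, queue fill, and so on).
    """
    if load > degrade_above:
        # Degraded mode: less work per request, lower-quality answer.
        return {"degraded": True, "results": cheap_search(query)}
    return {"degraded": False, "results": full_search(query)}
```

Tagging the response as degraded makes it straightforward to monitor and alert on how often this path is taken.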

Circuit Breaker Retry Advice

Retries can amplify the effects seen in server overload.

  1. Limit to 3 retries per request. Don’t retry a given request indefinitely.
  2. Impose a server-wide retry budget. When the retry budget is exceeded, don’t retry; just fail the request.
  3. Examine whether you need to perform retries at a given level. Prevent retry fanout.
  4. Separate retriable and nonretriable error conditions. Don’t retry permanent errors or malformed requests.
  5. Retry with exponential backoff and jitter.
  6. All retry behavior should be configurable, so that it can be turned off.
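The retry rules above can be combined in one small helper. This is a sketch under stated assumptions: `RetryableError`, `RetryBudget`, and `call_with_retries` are hypothetical names, the budget is a simple retries-to-requests ratio kept for the life of the process (a real budget would more likely be windowed per minute), and the backoff uses the "full jitter" variant.

```python
import random
import time


class RetryableError(Exception):
    """Transient failure (timeout, overload) that may succeed on retry."""


class RetryBudget:
    """Caps retries as a fraction of requests seen by this process."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self):
        if self.retries + 1 > self.requests * self.ratio:
            return False
        self.retries += 1
        return True


def call_with_retries(op, budget, max_retries=3, base=0.1, cap=2.0,
                      sleep=time.sleep, enabled=True):
    """Calls `op`, retrying only transient errors, within all the limits above."""
    budget.record_request()
    attempt = 0
    while True:
        try:
            return op()
        except RetryableError:
            # Any other exception type is treated as permanent and propagates.
            if not enabled or attempt >= max_retries or not budget.try_spend():
                raise
            # Full jitter: sleep a random time up to the exponential bound.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
            attempt += 1
```

Passing `enabled=False` turns retries off entirely, and sharing one `RetryBudget` across all callers in a process is what keeps a burst of failures from turning into a burst of retries.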

Implement Deadline Propagation

  • Pick a deadline
  • Each server or app instance should check the time left before the deadline at each stage, before attempting to perform any more work on the request
  • Each server in the request tree implements deadline propagation
  • Reduce the outgoing deadline by a few hundred milliseconds to account for network transit times
  • Set an upper bound for outgoing deadlines
  • Deadlines several orders of magnitude longer than the mean request latency are usually bad
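The deadline rules above can be sketched with a small object that travels with the request. The `Deadline` class, its method names, the 0.2 second transit margin, and the 10 second upper bound are illustrative assumptions; RPC frameworks such as gRPC provide their own deadline mechanisms, which this does not try to reproduce.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when no time is left to do further work on a request."""


class Deadline:
    """Absolute deadline carried along with a request."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.clock = clock
        self.expires_at = clock() + timeout_s

    def remaining(self):
        return self.expires_at - self.clock()

    def check(self):
        # Call this before each stage so expired requests stop consuming resources.
        if self.remaining() <= 0:
            raise DeadlineExceeded()

    def outgoing_timeout(self, margin_s=0.2, upper_bound_s=10.0):
        """Timeout for a downstream call: time left, minus a transit margin, capped."""
        self.check()
        return max(0.0, min(self.remaining() - margin_s, upper_bound_s))
```

A handler would call `deadline.check()` before each stage and pass `deadline.outgoing_timeout()` to every downstream call, so that each hop in the request tree sees a slightly shorter deadline than its caller.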

Multi-modal latency requests

For workloads with a mix of request types and latencies, allow only 25% of your threads to be occupied by any one client. This provides fairness under heavy load when a single client misbehaves.
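A per-client cap of this kind can be sketched as a counter in front of the worker pool. `PerClientThreadCap` and its parameters are hypothetical names for illustration; how a client is identified (API key, tenant, calling service) is left as an assumption.

```python
import threading
from collections import defaultdict


class PerClientThreadCap:
    """Lets any one client hold at most `share` of a fixed-size worker pool."""

    def __init__(self, pool_size, share=0.25):
        # Always allow at least one thread so small pools stay usable.
        self.limit = max(1, int(pool_size * share))
        self.in_use = defaultdict(int)
        self.lock = threading.Lock()

    def try_acquire(self, client_id):
        with self.lock:
            if self.in_use[client_id] >= self.limit:
                return False
            self.in_use[client_id] += 1
            return True

    def release(self, client_id):
        with self.lock:
            self.in_use[client_id] -= 1
```

With a pool of 8 threads and the default 25% share, a single client can occupy at most 2 threads, and further requests from that client are rejected until one of them finishes.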

Address Cascading failures

  • Process health checking is relevant to the cluster scheduler, whereas service health checking is relevant to the load balancer
  • Immediate steps to address cascading failures: increase resources, restart servers, drop traffic, enter degraded modes, eliminate batch load, eliminate bad traffic, and autoscale
