About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Friday, November 1, 2019

Spring RestTemplate Buyer Beware!

TL;DR Be wary of the default RestTemplate injected or manually configured in your existing application. You should leverage HTTP connection pooling for the RestTemplate, which may not be turned on by default. You can explicitly configure it with the code sample below. The pool defaults are also undersized; change them to a number appropriate to your environment. I set them to a maximum of 20 per route; tune per load. Also configure reaping of stale connections in the pool. For new code, consider WebClient instead of the RestTemplate, as the Spring docs advise as of Spring Framework 5.0.

TL;DR based on the multiple enterprise engagements … 

  • The default HTTP client connection pool settings must be changed before you deploy.
  • Don’t forget to set ConnectionRequestTimeout (defaults to infinity).
  • If possible, replace RestTemplate/HttpClient with WebClient; otherwise migrate to OkHttpClient, which has resolved most of the observed issues.
  • OkHttpClient has an excellent connection pool manager and good mechanisms for handling connection failures and timeouts.

For instance:
    // Won't configure the PoolingHttpClientConnectionManager
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }

    // WILL configure the PoolingHttpClientConnectionManager
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }

For this to work, you need to put the Apache HttpClient or the OkHttp library on the classpath.

Spring apps use org.springframework.web.client.RestTemplate as a synchronous client to perform HTTP requests. The default configuration of the RestTemplate doesn’t use a connection pool to send requests; it uses a SimpleClientHttpRequestFactory that wraps the standard JDK HttpURLConnection, opening and closing the connection. This is a problem. BasicHttpClientConnectionManager is suitable only for low-level, single-threaded use of a single connection.

Under load, RestTemplate client connections are capped at a small number per route (Apache HttpClient 4.x's pooling connection manager, for example, defaults to 2 per route and 20 in total). This *blows up under load*. If you see `HTTP status 500` for requests or slow responses, check the HTTPClient configuration and review the recommendations below.
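A rough back-of-the-envelope estimate shows why a small per-route cap hurts. The sketch below applies Little's Law (connections in use ≈ request rate × average latency). The class and method names are hypothetical, and the figures (2 connections per route, 200 ms latency, 100 requests per second) are illustrative assumptions, not measurements.

```java
// Hypothetical helper: rough connection-pool sizing via Little's Law.
final class PoolSizing {
    private PoolSizing() {}

    /** Upper bound on requests/second to one route for a given pool size and average latency. */
    static double maxThroughputPerSecond(int connectionsPerRoute, double avgLatencyMillis) {
        return connectionsPerRoute * 1000.0 / avgLatencyMillis;
    }

    /** Connections needed to sustain a target rate: ceil(rate * latency in seconds). */
    static int requiredConnections(double requestsPerSecond, double avgLatencyMillis) {
        return (int) Math.ceil(requestsPerSecond * avgLatencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        // 2 connections per route at 200 ms per call: at most 10 requests/second
        System.out.println(maxThroughputPerSecond(2, 200));  // prints 10.0
        // 100 requests/second at 200 ms per call needs about 20 connections
        System.out.println(requiredConnections(100, 200));   // prints 20
    }
}
```

Anything beyond that bound queues up waiting for a pooled connection, which is where the slow responses and timeouts come from. Treat the result as a starting point and verify it under real load.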

## Recommendations
- If you need connection pooling under RestTemplate, use a different implementation of ClientHttpRequestFactory that pools connections, for example `new RestTemplate(new HttpComponentsClientHttpRequestFactory())`.

- Use the `PoolingHttpClientConnectionManager` to get and manage a pool of connections shared by multiple threads. The defaults of the pooling connection manager are too small. You should bump up MaxTotal, DefaultMaxPerRoute and MaxPerRoute to at least 20, then tune per load.

- Maximize the utilization of the HTTP connection pool:
  - Implement a custom keep-alive strategy.
  - Configure connection eviction to detect idle and expired connections and close them.
  - Read this article on connection management: https://www.baeldung.com/httpclient-connection-management

    // Apache HttpClient 4.x: build a client backed by a pooling connection manager
    HttpClientConnectionManager poolingConnManager
      = new PoolingHttpClientConnectionManager();
    CloseableHttpClient client
      = HttpClients.custom().setConnectionManager(poolingConnManager)
          .build();

Also see https://bitbucket.org/asimio/resttemplate-troubleshooting-svc-2/src/master/src/main/java/com/asimio/api/demo/main/ResttemplateTroubleshootingSvc2Application.java
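Putting the recommendations together, a configuration might look roughly like the sketch below. It assumes Spring Boot 2.x with Apache HttpClient 4.5 on the classpath (HttpClient 5 uses different packages and builders). The class name, the timeout values, the total of 40 connections and the 30-second keep-alive fallback are illustrative assumptions to be tuned per load, not recommended values.

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class PooledRestTemplateConfig { // hypothetical class name

    @Bean
    public CloseableHttpClient pooledHttpClient() {
        PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
        pool.setDefaultMaxPerRoute(20); // 20 per route, as suggested in this post
        pool.setMaxTotal(40);           // assumed total across all routes

        RequestConfig requestConfig = RequestConfig.custom()
                .setConnectionRequestTimeout(2_000) // max wait for a connection from the pool (ms)
                .setConnectTimeout(2_000)           // max time to establish a connection (ms)
                .setSocketTimeout(5_000)            // max inactivity between data packets (ms)
                .build();

        return HttpClients.custom()
                .setConnectionManager(pool)
                .setDefaultRequestConfig(requestConfig)
                // Custom keep-alive: honor the server's Keep-Alive hint, else fall back to 30 s
                .setKeepAliveStrategy((response, context) -> {
                    long serverHint = DefaultConnectionKeepAliveStrategy.INSTANCE
                            .getKeepAliveDuration(response, context);
                    return serverHint > 0 ? serverHint : 30_000;
                })
                .evictExpiredConnections()                  // reap connections past their keep-alive
                .evictIdleConnections(30, TimeUnit.SECONDS) // reap connections idle for 30 s
                .build();
    }

    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder, CloseableHttpClient httpClient) {
        // Go through the builder so that registered customizers still apply
        return builder
                .requestFactory(() -> new HttpComponentsClientHttpRequestFactory(httpClient))
                .build();
    }
}
```

Setting ConnectionRequestTimeout explicitly matters most here: without it, a thread can wait indefinitely for a free connection once the pool is exhausted.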

As of 5.0, the non-blocking, reactive org.springframework.web.reactive.function.client.WebClient offers a modern alternative to the RestTemplate with efficient support for both sync and async, as well as streaming scenarios. Always use the respective builder (RestTemplateBuilder or WebClient.Builder) to create one or more RestTemplate or WebClient instances. Dependencies like spring-cloud-sleuth use the customizer or builder, respectively, to add additional features.

For greenfield apps, pick WebClient over RestTemplate. The Spring documentation notes:

**The RestTemplate will be deprecated in a future version and will not have major new features added going forward. See the WebClient section of the Spring Framework reference documentation for more details and example code.**
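As a minimal sketch of the builder approach with WebClient, assuming spring-boot-starter-webflux is on the classpath so that Spring Boot auto-configures a WebClient.Builder bean: the class name, base URL and path below are hypothetical placeholders.

```java
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class GreetingClient { // hypothetical class name

    private final WebClient webClient;

    // Inject the auto-configured builder so that customizers (e.g. tracing) are applied
    public GreetingClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("https://example.org").build(); // placeholder URL
    }

    // Non-blocking call: the request is sent when the Mono is subscribed to
    public Mono<String> greeting() {
        return webClient.get()
                .uri("/greeting") // placeholder path
                .retrieve()
                .bodyToMono(String.class);
    }

    // Synchronous use from blocking code: wait for the response
    public String greetingBlocking() {
        return greeting().block();
    }
}
```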

## Miscellaneous

  1. For slow requests or Gorouter latency, follow https://docs.pivotal.io/pivotalcf/2-5/adminguide/troubleshooting_slow_requests.html and watch Debugging the Cloud Foundry Routing Tier https://www.youtube.com/watch?v=U5GWgabsxXY
  2. If you encounter a customer experiencing an application performance issue (increased latency, decreased throughput or slow requests), try having them run this plugin against the app while it’s under load: https://github.com/cloudfoundry/cpu-entitlement-plugin
  3. If your application running on TAS is slow, performing poorly, or experiencing high latency and/or decreased throughput, follow the debug instructions here: https://community.pivotal.io/s/article/Application-running-on-TAS-is-slow-performing-poorly-experiencing-high-latency-and-or-decreased-throughput

Tuesday, October 29, 2019

Architecture & Services Review Template for 360 degree healthcheck of a Microservice

Do you want to review the health of your system of microservices? Do you need a checklist of things to look at as you evaluate the architecture and implementation? Take a look at this all-encompassing checklist for examining the production readiness and scale of your system of microservices.

  • Libraries
    • How many unused libraries are there?
    • Are there any libraries that could be replaced by features included with Spring?
  • Connection Pooling
    • How is concurrency handled ?
  • Latency
    • How long does the app take to start up?
    • Is there a meaningful difference in data transmission speed under a high load when using RSocket vs. HTTPS?
    • Is there a meaningful difference in data transmission speed when using a reactive tech stack vs. a traditional tech stack?
    • Are there any noticeable areas with inefficient HTTP calls?
    • What is the average response time for the app's network calls?
  • Memory/CPU
    • How much memory does the app use under a high load? Does it need JVM GC tuning?
    • How many threads does the app use under a high load?
    • What is the top constraint? (CPU, memory, disk, network)
  • Error/Exception Handling
    • How many exceptions does the app usually throw under a high load?
    • What is the mean time between failures?
    • How long does an outage usually last?
  • Code Complexity/Cleanliness
    • What is the highest level of cyclomatic complexity within the app?
    • How many unused classes are in the app?
    • How many unused methods are in the app?
    • Compliance with 15 Factors ?
    • High frequency of code change heat map
    • Sev 1 Production Incidents Review
  • Spring
    • Is there classpath dependency bloat?
    • Is an upgrade to Spring Boot 2.2 and its concomitant dependencies possible?
  • Resiliency
    • Are circuit breakers and HTTP clients configured correctly?
    • Are metrics from circuit breakers put in the firehose via Micrometer?
    • Failure Mode analysis.
  • Observability
    • Are applications logging at the right level?
    • Are applications emitting metrics at the right level?
    • Is spring-cloud-sleuth enabled for distributed traces?
    • Configure http healthchecks for the app in Cloud Foundry
  • Performance
    • Is application startup time acceptable? Can it be reduced?
    • Is autoscaling behavior understood in the context of downstream dependencies?
    • Policy for autoscaling up and down
  • Higher level Architecture Review

Sunday, October 27, 2019

How do you get Thread Dumps and Heap Dumps for Java applications running in Cloud Foundry?

You cannot! You have hit a classic pain point: the Java Buildpack uses a JRE and not a full JDK.

The issue is that you cannot cf ssh into the container in PCF and run the jcmd command to trigger a Java thread dump, because the JRE does not ship it. The classical way of diagnosing high CPU is to take three such thread dumps 30 seconds apart and look for threads that are stuck: ones that are not moving, are contending on locks, or are deadlocked. You pair this with CPU profiling information from the VM.
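The comparison step of that technique can be approximated from inside the JVM with the standard java.lang.management API, which is available on a JRE. The sketch below is a rough illustration, not a replacement for real thread dumps: the class and method names are hypothetical, and it simply reports threads whose stack trace was identical in every sample.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: samples all threads several times and reports the ones that never moved.
final class StuckThreadSampler {
    private StuckThreadSampler() {}

    /** Returns thread name -> last observed state for threads with an identical stack in every sample. */
    static Map<String, Thread.State> unchangedThreads(int samples, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long self = Thread.currentThread().getId();
        Map<Long, String> firstStack = new HashMap<>();
        Map<Long, ThreadInfo> candidates = new HashMap<>();

        for (int i = 0; i < samples; i++) {
            if (i > 0) {
                pause(intervalMillis);
            }
            Map<Long, ThreadInfo> stillUnchanged = new HashMap<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                long id = info.getThreadId();
                StackTraceElement[] frames = info.getStackTrace();
                if (id == self || frames.length == 0) {
                    continue; // skip the sampling thread and threads without Java frames
                }
                String stack = Arrays.toString(frames);
                if (i == 0) {
                    firstStack.put(id, stack);
                    stillUnchanged.put(id, info);
                } else if (candidates.containsKey(id) && stack.equals(firstStack.get(id))) {
                    stillUnchanged.put(id, info);
                }
            }
            candidates = stillUnchanged;
        }

        Map<String, Thread.State> result = new TreeMap<>();
        for (ThreadInfo info : candidates.values()) {
            result.put(info.getThreadName(), info.getThreadState());
        }
        return result;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }
}
```

Calling `StuckThreadSampler.unchangedThreads(3, 30_000)` mirrors the three-dumps-30-seconds-apart routine. Idle pool threads parked in WAITING will show up too, so the interesting entries are usually the BLOCKED ones and the RUNNABLE ones that never leave the same frame.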

Not being able to take a thread dump in PCF is frustrating. WAS/WebLogic had excellent support for getting these artifacts via must-gathers.

So what can you do?
The /threaddump actuator endpoint is not a real substitute, because it does not provide nearly as much information as a classical thread dump.

Again, this is a problem faced by anyone who wants to use the JDK tools in an app in PCF. For instance, suppose we want to run the javac command inside the app in PCF. We simply can't, due to the issue mentioned above.

OK, so what can be done?
- Create a one-time custom Java buildpack rebased on a full OpenJDK instead of a JRE, and restage the app with it. This is not sustainable in the long term.
- Trojan-horse the JDK tooling (jcmd, jmap and other command-line tools) into the app via a sidecar container or something like pcfshell https://github.com/tfynes-pivotal/pcfshell, or have the app carry the executables with it.
- Have the app itself expose a /threaddump endpoint via Spring Boot Actuator, although this seldom works if the app is dying due to OOM or high CPU.
- If the app is crashing due to an OOM, it writes out a histogram and a cause of failure. In such a case, (1) enable verbose GC logging to stdout so that you can collect and visualize the GC logs, and (2) configure a persistent volume bind so that the Java buildpack can write the heap dump to a persistent volume (see oom-killer and jre-docs). The existence of a single bound Volume Service will result in terminal heap dumps being written.
- Use flame graphs in PCF to debug high CPU. This requires some investigation.
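For the option where the app carries its own thread dump capability, the JVM's management API can produce a dump without any JDK command-line tools. The sketch below is a minimal illustration under that assumption: the class name is hypothetical, the output format is my own rather than jcmd's, and like the actuator endpoint it will not help if the JVM is too starved to run the code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a textual thread dump from inside the running JVM.
final class InProcessThreadDump {
    private InProcessThreadDump() {}

    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();

        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    out.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            out.append('\n');
            // Print every frame explicitly instead of relying on ThreadInfo.toString()
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }

        // Deadlock detection; fall back to monitor-only detection if synchronizers are unsupported
        long[] deadlocked = synchronizers
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        if (deadlocked != null) {
            out.append("Deadlocked thread ids:");
            for (long id : deadlocked) {
                out.append(' ').append(id);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Writing the result of `InProcessThreadDump.dump()` to stdout gets it into the application log stream, where it can be collected like any other log output.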