About Me

Rohit is an investor, startup advisor and an Application Modernization Scale Specialist working at Google.

Friday, June 26, 2015

Warden and Diego container preservation in Cloud Foundry

There are several use cases for preserving Warden containers in Cloud Foundry:

  1. Post-mortem analysis of a compromised or hacked container
  2. Problem Determination and Troubleshooting
  3. Audit and compliance
How you achieve this depends on whether you are running before Diego (BD) or after Diego (AD).

Before Diego

In this world, apps run in Warden containers on the DEA. A crashed container is garbage collected after the period set by dea_next.crash_lifetime_secs; by default it stays around on the DEA for about an hour before being removed. In addition to setting dea_next.crash_lifetime_secs, also set the container_grace_time parameter to 3600 in the Warden configuration file (warden.yml). The container_grace_time parameter defines how long a container is kept before it is deleted.

After Diego

Diego does not keep containers around after crashes. It destroys a container when the app in it "crashes", which can happen because:
  • the application exits
  • the health check fails (today this is a port check, but custom health checks are planned for the future)
  • the application exceeds its memory limit

Until CF explicitly supports keeping crashed containers around in Diego, there are some options for implementing post-mortem container forensics:

  1. Modify the Java buildpack's release Ruby script to upload the contents of /home/vcap to an S3-compatible blob store when the JVM process exits. Also add an OOM heap dump script that uploads the heap dump to S3 storage.
  2. Create a JVM shutdown hook, registered with Runtime.getRuntime().addShutdownHook(shutdownHook);, that introspects the app and its file system and copies them to a Riak S2 blob store.
  3. Use a Tomcat lifecycle listener (org.apache.catalina.LifecycleListener) to perform post-mortem activities. For a prototype, look at the current Java buildpack's ApplicationStartupFailureDetectingLifecycleListener.
  4. As part of the release phase, a postStop.sh script runs after the actual server has stopped or crashed. The script reports the death of the instance, along with instructions for downloading any relevant files from that instance, and provides a default grace period of 30 seconds before the Warden container is destroyed. This approach is implemented by the Cloud Foundry WebLogic buildpack.
  5. For the app in question, explicitly specify a start command that appends "; sleep 1d". The push command would look like this: cf push <app_name> -c "<original_command>; sleep 1d". This keeps the container around for a day after the process within it has exited. For a complete guide to troubleshooting CF app issues, take a look at 10-common-errors-when-pushing-apps-to-cloud-foundry.
  6. The easiest way to achieve container preservation without any extra work is to snapshot the DEA VM that contains the Warden container. To find the DEA VM that hosts the container in question, search the DEA logs for the app GUID. This lookup can be scripted in a log analytics engine like Splunk or ELK.
  7. For apps that are NOT running on the JVM, you can generate a raw binary dump (core dump) of the process memory by issuing kill -6 ${PID} (SIGABRT) or kill -11 ${PID} (SIGSEGV), provided core dumps are enabled for the process (for example via ulimit -c). These dumps will either need to be pulled manually or pushed from the Warden container by an async task scheduled by the buildpack of the app runtime. For a detailed discussion of various troubleshooting options with JVMs and operating systems, refer to this cookbook from Kevin Grigorenko.
  8. To download the contents of a running app's file directory from the Warden container, use the cf-download plugin, for example: cf download spring-music. Usage: cf download APP_NAME [PATH] [--overwrite] [--verbose] [--omit omitted_path] [-i instance]
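Option 2 above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the buildpack's mechanism: the class name PostMortemArchiver is hypothetical, the directory /home/vcap/app is an assumed default, and the upload to an S3-compatible or Riak S2 store is left as a comment because it needs a client library. The hook simply zips every file under the directory into a temp file when the JVM shuts down.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: zip an app directory when the JVM exits so the bits can be shipped off the container. */
class PostMortemArchiver {

    /** Zips every regular file under sourceDir into zipFile (paths relative to sourceDir); returns the file count. */
    static int archive(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Zip entry names always use forward slashes.
                String entryName = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    /** Registers a shutdown hook that archives sourceDir when the JVM shuts down. */
    static Thread registerHook(Path sourceDir, Path zipFile) {
        Thread hook = new Thread(() -> {
            try {
                int count = archive(sourceDir, zipFile);
                System.err.println("post-mortem: archived " + count + " files to " + zipFile);
                // Hypothetical next step: upload zipFile to an S3-compatible store such as Riak S2.
            } catch (IOException e) {
                System.err.println("post-mortem: archive failed: " + e);
            }
        }, "post-mortem-archiver");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws IOException {
        // /home/vcap/app is an assumed default; pass another directory as the first argument.
        Path appDir = Paths.get(args.length > 0 ? args[0] : "/home/vcap/app");
        Path target = Files.createTempFile("postmortem-", ".zip");
        registerHook(appDir, target);
        // ... the application would run here; the hook fires when the JVM exits ...
    }
}
```

Keep in mind that shutdown hooks run on normal exit and on signals such as SIGTERM or SIGINT, but not when the JVM is killed with SIGKILL, so a process killed abruptly (for example by the kernel OOM killer) never gets to run this hook. That is why option 1 pairs the exit-time upload with a separate OOM heap dump script.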

Tuesday, June 23, 2015

Session Persistence in Cloud Foundry

Let's start with the basics:

1. By default, the Tomcat instance is configured to store all sessions and their data in memory. Under certain circumstances it may be appropriate to persist the sessions and their data to a repository.

2. The Cloud Foundry Go Router maintains session affinity, so a user keeps hitting the container that holds their session. When the user logs out or the session expires, the user may be placed on a different container and a new session starts. If the container holding the session dies and session persistence is not configured, the user is kicked out and a new session begins on a different container. The following blog post from James Bayer is a good intro to CF session persistence.

There are a couple of issues relating to updating the __VCAP_ID__ cookie when a new session is created. Please make sure your CF release includes the following fixes:
  1. cloudfoundry/gorouter #76: Sticky Sessions lost on App Failure
  2. After app instances are recreated every request round robins to a different app instance 

3. If the session contains user state that cannot be rebuilt in a stateless fashion, I recommend persisting sessions in a database (like Postgres) or a NoSQL store (like Redis).

4. The Java buildpack already has built-in support for session persistence with Redis. To enable Redis-based session replication, bind a Redis service whose name, label, or tag contains session-replication as a substring.

5. Another option is the Spring Session project, which provides a session abstraction that works across app servers and across protocols such as REST and WebSocket.
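The binding rule in item 4 can be illustrated with a small predicate. This is a sketch under assumptions, not the buildpack's source: ServiceBinding is a hypothetical stand-in for one parsed entry of VCAP_SERVICES (only name, label and tags are modeled), and the case-sensitive substring check is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch of the "session-replication" binding rule; ServiceBinding is a hypothetical model, not a buildpack type. */
class SessionReplicationMatcher {

    static final String MARKER = "session-replication";

    /** Only the fields of a bound service that the rule inspects. */
    static final class ServiceBinding {
        final String name;
        final String label;
        final List<String> tags;

        ServiceBinding(String name, String label, List<String> tags) {
            this.name = name;
            this.label = label;
            this.tags = tags == null ? Collections.<String>emptyList() : tags;
        }
    }

    /** True if the name, the label, or any tag contains the marker as a substring (assumed case-sensitive). */
    static boolean enablesSessionReplication(ServiceBinding b) {
        if (containsMarker(b.name) || containsMarker(b.label)) {
            return true;
        }
        for (String tag : b.tags) {
            if (containsMarker(tag)) {
                return true;
            }
        }
        return false;
    }

    private static boolean containsMarker(String value) {
        return value != null && value.contains(MARKER);
    }

    public static void main(String[] args) {
        ServiceBinding byName = new ServiceBinding("my-session-replication-redis", "rediscloud", null);
        ServiceBinding byTag = new ServiceBinding("cache", "p-redis", Arrays.asList("redis", "session-replication"));
        ServiceBinding plain = new ServiceBinding("cache", "p-redis", Arrays.asList("redis"));
        System.out.println(enablesSessionReplication(byName)); // true
        System.out.println(enablesSessionReplication(byTag));  // true
        System.out.println(enablesSessionReplication(plain));  // false
    }
}
```

In practice this means you can opt a Redis instance in either by naming it (for example my-session-replication-redis) or by tagging it, without changing application code.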

Sticky sessions are an optimization, not a guarantee. However, if all the app instances are bound to a session persistence service, you can rely on the right session content being retrieved whichever instance serves the request.
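The difference between affinity alone and affinity plus a persistence service can be shown with a toy model. This is not how the Go Router is implemented: Router, AppInstance and cartAfterFailover are hypothetical names, an instance id stands in for the affinity cookie, and a shared ConcurrentHashMap stands in for an external store such as Redis. The model compares what a user sees after the instance holding their session dies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of sticky routing plus an optional shared session store; not the real Go Router. */
class StickySessionDemo {

    /** One app instance; its session map is either private to it or shared by all instances. */
    static final class AppInstance {
        final String id;
        final Map<String, Map<String, String>> sessions;
        boolean alive = true;

        AppInstance(String id, Map<String, Map<String, String>> sessions) {
            this.id = id;
            this.sessions = sessions;
        }
    }

    /** Routes to the instance named by the affinity value if it is alive, else round-robins over live instances. */
    static final class Router {
        final List<AppInstance> instances = new ArrayList<>();
        private int next = 0;

        AppInstance route(String affinityId) {
            for (AppInstance instance : instances) {
                if (instance.alive && instance.id.equals(affinityId)) {
                    return instance;
                }
            }
            for (int tries = 0; tries < instances.size(); tries++) {
                AppInstance candidate = instances.get(next++ % instances.size());
                if (candidate.alive) {
                    return candidate;
                }
            }
            throw new IllegalStateException("no live instances");
        }
    }

    /** Returns what the user's cart looks like after the instance holding their session dies. */
    static String cartAfterFailover(boolean sharedStore) {
        Map<String, Map<String, String>> shared = new ConcurrentHashMap<>();
        Router router = new Router();
        for (String id : new String[] {"instance-0", "instance-1"}) {
            Map<String, Map<String, String>> store =
                    sharedStore ? shared : new ConcurrentHashMap<String, Map<String, String>>();
            router.instances.add(new AppInstance(id, store));
        }

        // First request has no affinity value yet; the chosen instance creates the session.
        AppInstance first = router.route(null);
        first.sessions.computeIfAbsent("session-42", k -> new HashMap<>()).put("cart", "3 items");
        String affinity = first.id; // stands in for the affinity cookie the router relies on

        first.alive = false; // the container holding the session dies

        AppInstance second = router.route(affinity);
        Map<String, String> session = second.sessions.get("session-42");
        return session == null ? "<new empty session>" : session.get("cart");
    }

    public static void main(String[] args) {
        System.out.println("in-memory only: " + cartAfterFailover(false));
        System.out.println("shared store:   " + cartAfterFailover(true));
    }
}
```

Running main prints "<new empty session>" for the in-memory case and "3 items" for the shared-store case: affinity keeps the user on one instance while it lives, but only the shared store lets another instance pick the session up after a failure.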

Microservices Security at the Edge

There are two approaches to securing microservices at the edge:

1. Network-centric approach
Keep the private microservices on a private shared domain that is not routable externally, so they can only be reached by each other and by the public microservices. Implement inbound security by adding HAProxy as a layer 7 HTTP filter behind the public ELB. Configure egress security with PCF application security groups.

2. Application-centric approach
Use Spring Cloud Zuul with spring-security-oauth2 to secure microservices that are reverse proxied by Zuul. The API calls proxied by Zuul are protected with the OAuth2 protocol. Zuul-proxied APIs can be protected with any security mechanism, not just Spring Security; Spring Security simply makes it easier to protect resources with less boilerplate.

The software-based approach is explained in a series of articles on how to build an API Gateway with Spring Cloud to control authentication and access to the backend resources. Note that with the application-centric approach the service endpoints are not blocked; they are protected by the security scheme put in place with Spring. A clever attacker could still discover the endpoint of an internal service and bypass the API Gateway tier; however, because the resources themselves are protected, they would not be able to access anything without valid credentials.
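The point that internal endpoints are reachable but still protected can be illustrated with a toy model in plain Java, without Spring or Zuul. TokenValidator is a hypothetical stand-in for real OAuth2 token verification (for example token introspection or a JWT signature check), and the gateway and the OrderService are modeled as method calls rather than HTTP hops. The design choice it shows is that the backend checks the token itself instead of trusting that traffic came through the gateway.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Toy model: an edge gateway plus a backend that both enforce a bearer token. */
class EdgeSecurityDemo {

    /** Hypothetical stand-in for OAuth2 token verification; here it just checks a set of active tokens. */
    static final class TokenValidator {
        private final Set<String> activeTokens;

        TokenValidator(Set<String> activeTokens) {
            this.activeTokens = activeTokens;
        }

        boolean isValid(Map<String, String> headers) {
            String auth = headers.get("Authorization");
            if (auth == null || !auth.startsWith("Bearer ")) {
                return false;
            }
            return activeTokens.contains(auth.substring("Bearer ".length()));
        }
    }

    /** Internal service: reachable directly, but it still rejects requests without a valid token. */
    static final class OrderService {
        private final TokenValidator validator;

        OrderService(TokenValidator validator) {
            this.validator = validator;
        }

        int handle(Map<String, String> headers) {
            return validator.isValid(headers) ? 200 : 401;
        }
    }

    /** Edge gateway: rejects early, then relays the original headers (including the token) downstream. */
    static final class Gateway {
        private final TokenValidator validator;
        private final OrderService backend;

        Gateway(TokenValidator validator, OrderService backend) {
            this.validator = validator;
            this.backend = backend;
        }

        int handle(Map<String, String> headers) {
            if (!validator.isValid(headers)) {
                return 401;
            }
            return backend.handle(headers);
        }
    }

    public static void main(String[] args) {
        TokenValidator validator = new TokenValidator(Collections.singleton("abc123"));
        OrderService orders = new OrderService(validator);
        Gateway gateway = new Gateway(validator, orders);

        Map<String, String> withToken = Collections.singletonMap("Authorization", "Bearer abc123");
        Map<String, String> noToken = Collections.emptyMap();

        System.out.println("gateway, valid token:  " + gateway.handle(withToken)); // 200
        System.out.println("gateway, no token:     " + gateway.handle(noToken));   // 401
        System.out.println("direct call, no token: " + orders.handle(noToken));    // 401
    }
}
```

The last line is the scenario described above: a caller who bypasses the gateway reaches the service but still gets a 401.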