Choosing the right container orchestrator for scale-ups

Photo by Burst on Unsplash


Let me start with a few questions. Do you have proper load tests for your web apps? Are you deploying to production at least once a week? Do you have application-level monitoring that lets you quickly detect and resolve issues? Autoscaling? If you are like most businesses, you probably lack the time to implement all of this properly.

Yet when you ask the average engineer which container orchestrator they want to use, the answer is often Kubernetes, for understandable reasons. The Kubernetes ecosystem provides thousands of tools to work with, and its patterns are widely discussed and documented. However, for most applications running on AWS, I don't think Kubernetes is the Swiss Army knife that will solve their problems. In fact, it can hurt your business if chosen without proper consideration. Read on to find out when ECS is a better fit, and when Kubernetes might be the best option for you.

ECS has more focus

As the name implies, Amazon ECS is a container orchestrator: it starts and stops containers when requested, monitors their health, and scales out under load and back in when the load drops. In contrast, Kubernetes is a full container platform: you can extend it with custom control plane APIs, manage traffic flows, orchestrate deployments and even manage AWS infrastructure!
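
To make the "scale out under load" part concrete, here is a minimal sketch assuming you manage ECS from Python with boto3. ECS service autoscaling is configured through Application Auto Scaling, here with a target-tracking policy on average CPU. The helper names, the cluster and service names and the 60% target are hypothetical; only the request field names follow the Application Auto Scaling API. Nothing is sent to AWS unless you pass a real client to `apply`.

```python
from typing import Any, Dict

# Request-building sketch for ECS service autoscaling via Application Auto Scaling.
# Nothing here talks to AWS; apply() takes any object with the two boto3 methods.


def ecs_autoscaling_requests(
    cluster: str,
    service: str,
    min_tasks: int,
    max_tasks: int,
    target_cpu_percent: float,
) -> Dict[str, Dict[str, Any]]:
    """Return kwargs for register_scalable_target and put_scaling_policy."""
    if min_tasks < 0 or max_tasks < min_tasks:
        raise ValueError("need 0 <= min_tasks <= max_tasks")
    # ECS services are identified as service/<cluster-name>/<service-name>.
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",
        # Target tracking adds tasks when average CPU is above the target
        # and removes them when it is below.
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
        },
    }
    return {"target": target, "policy": policy}


def apply(client: Any, requests: Dict[str, Dict[str, Any]]) -> None:
    """Send the requests, e.g. with boto3.client("application-autoscaling")."""
    # The scalable target must exist before a policy can reference it.
    client.register_scalable_target(**requests["target"])
    client.put_scaling_policy(**requests["policy"])
```

Usage would look like `apply(boto3.client("application-autoscaling"), ecs_autoscaling_requests("web-cluster", "web", 2, 10, 60.0))`, after which ECS keeps the hypothetical `web` service between 2 and 10 tasks, aiming for roughly 60% average CPU. That is about all the scaling configuration a service needs.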

I think the beauty of ECS lies in this simplicity. ECS even has a fully managed control plane, so you never have to upgrade it to a new version the way you must with EKS. Everything else, such as network traffic control, can be handled by specialized AWS services instead.

ECS on Fargate abstracts the hard part

ECS can run on self-managed hosts (EC2) or on AWS-managed hosts (Fargate). Fargate is one of the main reasons I like ECS. Admittedly, ECS on Fargate costs more per vCPU and GB of memory. But in return, both the control plane AND the nodes running your containers are someone else's problem. This includes host autoscaling, container placement strategies, optimizing host packing density, host termination handling and more. You ONLY have to worry about your containers!
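
To illustrate how little you declare on Fargate, here is a sketch that builds the arguments for the ECS `register_task_definition` call (as exposed by boto3) for a single-container task. There are no instance types or host settings, only a task-level CPU and memory size. The `fargate_task_definition` helper, image, port and role ARN are hypothetical, and the size table only covers the smaller Fargate sizes as this sketch assumes them; check the ECS documentation for the full, current list.

```python
from typing import Any, Dict, List

# ASSUMED subset of the Fargate (Linux) task-size table: CPU units -> valid memory (MiB).
# Larger sizes exist; verify against the current ECS documentation.
FARGATE_MEMORY_MIB: Dict[int, List[int]] = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def fargate_task_definition(
    family: str, image: str, cpu: int, memory: int, port: int, execution_role_arn: str
) -> Dict[str, Any]:
    """Build register_task_definition kwargs for a single-container Fargate task."""
    if memory not in FARGATE_MEMORY_MIB.get(cpu, []):
        raise ValueError(f"{cpu} CPU units / {memory} MiB is not a size in this table")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        # Fargate tasks must use awsvpc networking (each task gets its own ENI).
        "networkMode": "awsvpc",
        # Task-level size: the only capacity decision you make on Fargate.
        "cpu": str(cpu),
        "memory": str(memory),
        # Role used by ECS to pull private ECR images and ship logs.
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
            }
        ],
    }
```

You would pass the result to `boto3.client("ecs").register_task_definition(**...)`. Compare this with EC2-backed capacity, where you would also choose instance types, build or pick an AMI, and size an Auto Scaling group.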

I am a big fan of Amazon ECS on Fargate for small-to-medium-sized businesses running web apps or batch jobs. Here is why:

  • You can teach a small DevOps team to maintain an ECS cluster on Fargate while keeping development velocity high on the application itself.
    Asking a small DevOps team to maintain its own Kubernetes cluster while also working on the application is madness. You'll want a platform team to abstract that Kubernetes complexity away for you.

  • The additional infrastructure cost of Fargate is often overestimated (see the diagram below). The documentation says Fargate can be up to 40% more expensive than an on-demand EC2 instance. Sounds like a lot, right? In practice, it is not. Realistic savings from moving web services to self-managed nodes are around 15-20%*, roughly $250 a month for a 40-vCPU cluster. If running self-managed nodes costs your engineers just a few extra hours per month, Fargate has already earned its premium! And I have yet to see an EKS version upgrade that takes less than a few days of work and planning.

  • You can start creating value fast and optimize later! Because you have spent so little time setting up Fargate, there is little 'wasted' time if you decide on a different way of running containers in the future.

    Diagram: how the cost benefits of self-managed nodes are overestimated. Most businesses save 10% at best.
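
A back-of-the-envelope check of the "$250 a month for 40 vCPUs" figure, as a sketch. The prices are ASSUMED Fargate on-demand list prices (Linux/x86, us-east-1) and may be out of date; the 2 GB of memory per vCPU and the $100/hour loaded engineer cost are hypothetical inputs, not numbers from this post.

```python
# Back-of-the-envelope check of the "~$250/month for 40 vCPUs" figure.
HOURS_PER_MONTH = 730  # convention used on AWS pricing pages

# ASSUMED Fargate on-demand list prices (Linux/x86, us-east-1); check current pricing.
PRICE_PER_VCPU_HOUR = 0.04048
PRICE_PER_GB_HOUR = 0.004445


def fargate_monthly_cost(vcpu: float, memory_gb: float) -> float:
    """On-demand Fargate cost for capacity that runs all month."""
    hourly = vcpu * PRICE_PER_VCPU_HOUR + memory_gb * PRICE_PER_GB_HOUR
    return hourly * HOURS_PER_MONTH


def break_even_hours(monthly_saving: float, engineer_hourly_cost: float) -> float:
    """Engineering hours per month after which self-managed nodes stop paying off."""
    return monthly_saving / engineer_hourly_cost


if __name__ == "__main__":
    cost = fargate_monthly_cost(vcpu=40, memory_gb=80)  # assumed 2 GB per vCPU
    for saving_rate in (0.15, 0.20):
        saving = cost * saving_rate
        hours = break_even_hours(saving, engineer_hourly_cost=100.0)  # hypothetical rate
        print(f"{saving_rate:.0%}: save ${saving:,.0f}/month, break-even after {hours:.1f} h")
```

Under these assumptions the cluster costs about $1,442 a month on Fargate, so a 15-20% saving is roughly $216-$288, which is used up by two to three engineering hours a month.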

💡
*The infrastructure cost benefits of self-managed nodes (EKS/ECS) are overestimated for three reasons. First, Fargate's own cost optimizations, such as Fargate Spot and Compute Savings Plans, are often ignored. Second, EC2 instances are rarely 100% filled with containers, so you need more vCPUs on EC2 than on Fargate for the same workload. Third, EC2 cost optimization takes time, so its full potential is often not reached. A more detailed analysis will follow in a future blog post.
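
The second reason can be put into numbers with a deliberately simplified model: you pay for every EC2 vCPU, but only the packed share does useful work, so idle capacity raises the effective price per used vCPU. The model ignores memory, discounts on either side and operational cost; the 40% premium is the documentation's upper bound quoted above, and the utilization levels are hypothetical.

```python
# Simplified model of the second reason: EC2 capacity you pay for but can't fill.
# Prices are normalised so one EC2 vCPU costs 1.0.


def saving_vs_fargate(fargate_premium: float, ec2_utilization: float) -> float:
    """Fraction of the Fargate bill saved by running the same tasks on EC2.

    fargate_premium: how much more Fargate costs per vCPU (0.40 = 40% more).
    ec2_utilization: share of paid-for EC2 vCPUs actually used by tasks, in (0, 1].
    """
    if not 0 < ec2_utilization <= 1:
        raise ValueError("ec2_utilization must be in (0, 1]")
    ec2_cost_per_used_vcpu = 1.0 / ec2_utilization  # idle capacity inflates the price
    fargate_cost_per_vcpu = 1.0 + fargate_premium   # Fargate bills the task size, no hosts
    return 1.0 - ec2_cost_per_used_vcpu / fargate_cost_per_vcpu


def break_even_utilization(fargate_premium: float) -> float:
    """EC2 packing level below which EC2 costs more than Fargate."""
    return 1.0 / (1.0 + fargate_premium)


if __name__ == "__main__":
    for utilization in (1.0, 0.9, 0.8, 0.75):
        print(f"{utilization:.0%} packed: saving {saving_vs_fargate(0.40, utilization):.1%}")
    print(f"break-even packing: {break_even_utilization(0.40):.1%}")
```

With a 40% list-price premium, a perfectly packed EC2 fleet saves about 29%. At 80% packing the saving drops to about 11%, at 75% to under 5%, and below roughly 71% packing EC2 becomes the more expensive option.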

ECS has limitations

Of course, there are several cases where ECS falls short. For example:

  • The placement engine is less advanced than the Kubernetes scheduler, resulting in less efficiently packed clusters when running on EC2, and in slower scaling. If you need containers to spin up FAST, Fargate is less than ideal (for now...).

  • Fargate doesn't cache container images, so if you start a LOT of containers (e.g. 1,000 per hour), you pay for downloading the image every time. Seekable OCI (SOCI) makes this less impactful, but it can still add up. Fargate also has some additional start-up time.

  • Running ECS on-prem is not trivial.
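
To see how uncached pulls "add up", here is a simple volume model. The 1,000 task starts per hour come from the example above. The 0.5 GB image size is hypothetical, and the $0.045/GB rate is an ASSUMED price in the range of NAT gateway data processing charges at the time of writing. Which charge actually applies depends on your network path (VPC endpoints are priced differently), so substitute your own rate. The model also ignores the partial, lazy loading that SOCI enables.

```python
# How uncached image pulls add up on Fargate (simple volume model).
HOURS_PER_MONTH = 730


def monthly_pull_gb(tasks_per_hour: float, image_size_gb: float) -> float:
    """GB downloaded per month if every task start pulls the full image."""
    return tasks_per_hour * image_size_gb * HOURS_PER_MONTH


def monthly_pull_cost(tasks_per_hour: float, image_size_gb: float, price_per_gb: float) -> float:
    """Cost of that volume at whatever per-GB rate applies to your network path."""
    return monthly_pull_gb(tasks_per_hour, image_size_gb) * price_per_gb


if __name__ == "__main__":
    # 1,000 starts/hour from the text; 0.5 GB image and $0.045/GB are ASSUMPTIONS.
    gb = monthly_pull_gb(1000, 0.5)
    cost = monthly_pull_cost(1000, 0.5, 0.045)
    print(f"{gb:,.0f} GB/month, about ${cost:,.0f}/month")
```

Even before pricing, that is about 365,000 GB of image data a month, which is why slimming images or reusing running tasks matters more on Fargate than on EC2 hosts with a warm image cache.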

To get a sense of ECS's limitations and whether they affect your use case, I suggest checking out the public AWS containers roadmap on GitHub. There you'll find common limitations and requests to address them.

Bonus Tip: The role of a platform team with ECS

A platform team can often abstract the complexities of the cloud and container infrastructure away from developers. For a more complex orchestrator like Kubernetes, a platform team is mandatory; with ECS on Fargate, much less so. This matters because small to medium-sized businesses rarely have large platform teams. Moreover, the platform engineers they do have are already busy dealing with the complexity of the cloud itself: networking and DNS, cloud compliance, security, CI/CD and much more. If you also ask them to maintain container clusters, other improvements will suffer. They could be spending that time on improving deployment velocity, testing, monitoring or cost, or on setting up a data analysis environment to make better use of your company's data!

Conclusion

If you want to run containers and not manage a whole application platform, ECS is a great choice. You can focus on getting to production fast and running at significant scale without problems. While more limited in functionality, ECS on Fargate outperforms any other container orchestrator on AWS in terms of development speed.

With Fargate, you pay extra per vCPU and GB of memory. In return, it is simple enough to be managed by a DevOps team, so your platform team has more time to focus on improving security, reliability and velocity rather than on operations. And ECS on Fargate can be optimized when you need it, not earlier.