Hello World with Containers on AWS

Jul 19, 2023 -- Posted by : dwarren

This is a walk-through of setting up a containerized application on AWS, written for people who come from a more conventional systems administration background - particularly one centered around Windows servers. Like anything in IT there are a lot of ways to accomplish the same thing, and I can't say this is the best way, but I've seen reddit posts asking for a working configuration - so here's a working configuration.

Introduction

Before we get to the how, I'd like to explain why building something out in AWS is different than building it on-premises:

It's Not About Servers

When they first get access to AWS, many people go into the console and look for how to create servers. That's how the on-premises world works - if the company needs to install BlivetPro™ you provision a BlivetPro™ server. It's absolutely possible to use AWS that way, but hand-crafting individual servers doesn't let AWS do what it does best - scaling things up and down.

Making Servers Repeatable

For AWS to be able to scale things up and down, it has to be able to create a new server at a moment's notice.

The "old school" method is to create a golden image, which is the same idea as a VMware OVA or an exported Hyper-V machine. When AWS needs a new server it just launches one from that image. Since containers solve the same problem as golden images with additional benefits, we decided to jump directly into containers - but from the infrastructure point of view they largely accomplish the same task.

Making Everything Repeatable

AWS doesn't just let you build servers by template - you can create anything in AWS by template. Think of it like having an instruction guide on how to set up an application, except you can just upload the instruction guide to AWS and it does all the work. The idea is called Infrastructure as Code, but that doesn't mean you have to learn how to program. It's just a file with a list of the things you would otherwise make by hand.

The Old Way Still Works

There are still cases where you have to install a BlivetPro™ server, because that's the only way you can install it. Microsoft wants you to use AADDS because it's a managed service, but you can still buy an Azure virtual server and promote it to an AD controller - and sometimes that's the right play. In the same way, AWS hasn't taken anything away - they're just offering more options.

Back to Hello World

In this article we go through a template that describes a basic web application that runs one container. The key points are:

  •   A container running the application code
  •   A load balancer in front to dole out requests
  •   Auto-scaling to launch more containers if needed
  •   Two Availability Zones for redundancy

The format I'm using for this article is called CloudFormation, which is built into AWS and can be accessed from the console. There are other tools like Terraform and CDK which are also perfectly valid and perhaps superior, but I think CloudFormation is the easiest to explain.

The rest of this article goes through the CloudFormation template and explains what each piece is for. Every step could be done by hand in the AWS Console, and everything created by the template is visible there afterward. You can also open CloudFormation in the console and watch the event log as the template creates resources.

You can download the full template here.

Parameter Section

CloudFormation templates take a set of input parameters so you can use the same template for different things. Think of them like arguments you pass to a PowerShell command. These parameter names follow our own conventions - there is nothing special about them.

These sections just define the names of the variables - the actual values are passed in when you deploy the template. Each section can also define a default value for the parameter, which is used if the parameter isn't passed in.
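For orientation, here is roughly the overall shape of the template file - every parameter fragment shown below lives under the top-level Parameters key, and the resources later in the article live under Resources. The contents here are trimmed down just to show the layout:

AWSTemplateFormatVersion: '2010-09-09'
Description: Hello World web application
Parameters:
  ApplicationCode:
    Type: String
    Description: Code for application used in object names
Resources:
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer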

Application Code

At Teaglu we give every application a code for identification. This gets encoded into resource names so when you see resources in the console you can immediately know what application they are related to - otherwise you'll see a server named I-21514346 and have no clue what it's doing. This isn't a part of AWS guidance as far as I know - it's just something we do to keep things neat.

  ApplicationCode:
    Type: String
    Description: Code for application used in object names

DNS Stuff

The application has to be available on the internet under some DNS name. For this template we assume the zone containing that DNS name is hosted in the AWS service Route53. You don't have to keep your DNS hosted in AWS, but it makes things a lot easier because AWS can handle certificates for you. Renewing SSL certificates isn't anyone's favorite administrative task.

  ApplicationDnsZone:
    Type: String
    Description: DNS zone ID to place application record
  ApplicationDnsName:
    Type: String
    Description: Full DNS name to use for application record

Networking Stuff

The load balancer and containers running the application have to exist in a network to talk to users and each other. In the recommended AWS setup, there are public subnets and private subnets - public subnets are available to the internet while private subnets are not. This template also uses two Availability Zones, so that if one AZ has a failure the application will switch over to the other one.

This template assumes the application is being created in a pre-existing network environment, so these variables allow that environment to be passed in. It also assumes you have set up a NAT Gateway so servers in the private subnets can reach the internet.

These values are just the ID numbers of the networking components, so that AWS knows what networks to connect things to:

  VPC:
    Type: AWS::EC2::VPC::Id
    Description: VPC for deployment
  PublicSubnetA:
    Type: AWS::EC2::Subnet::Id
    Description: Public subnet in Availability Zone 1
  PublicSubnetB:
    Type: AWS::EC2::Subnet::Id
    Description: Public subnet in Availability Zone 2
  PrivateSubnetA:
    Type: AWS::EC2::Subnet::Id
    Description: Private subnet in Availability Zone 1
  PrivateSubnetB:
    Type: AWS::EC2::Subnet::Id
    Description: Private subnet in Availability Zone 2

Container Stuff

This section covers some basic parameters of the container running the application. The first parameter specifies the repository where the container image we want to run is stored. That can be AWS's own registry service ECR, Docker Hub, or some other registry provided by a vendor.

  ApplicationUrl:
    Type: String
    Description: Container registry and path for application

The registry login specifies how to log into the registry to pull the image, and is used if the image doesn't reside in AWS ECR. To use this, create a secret in AWS Secrets Manager containing the username and password keys as described here, then use the ARN of the secret as the parameter value. If the container image were stored in Amazon's own registry service ECR, then instead of using this parameter you would allow ECR access using AWS roles.

  RegistryLogin:
    Type: String
    Description: Contains secret manager reference for registry
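The secret itself is just a JSON document with username and password keys. As a sketch, creating it from the CLI looks something like this - the secret name and credentials here are placeholders:

aws secretsmanager create-secret \
        --name dev-deploy \
        --secret-string '{"username": "registry-user", "password": "registry-password"}'

The ARN in the output is the value you would pass in as RegistryLogin.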

The load balancer and the ECS controller will request this URL periodically - every 30 seconds in this template - to determine if the container is healthy. Containers aren't placed into service until they are marked healthy, and if they change to unhealthy they will be taken out of service and replaced.

You'll need to look at the application documentation to find an endpoint that can be checked. It doesn't have to be meant for status checks - it just has to return a 200 code when the container is ready for traffic, and something else when it isn't. Sometimes you can just use the login page.

  ApplicationHealthUrl:
    Type: String
    Default: /status
    Description: Relative URL for ALB health check

This parameter sets the maximum number of container instances that ECS will create to meet demand. While not strictly necessary, unless you have an unlimited budget you want to set a maximum count on all AWS resources. AWS is perfectly capable of scaling up to thousands of instances to meet a spike in demand, but that doesn't mean that you're capable of paying that bill.

  ApplicationMaxCount:
    Type: Number
    Default: 5
    Description: Max number of container instances

Container Version

This parameter controls the version of the container to launch, and is at the center of how updates work. When you want to update the container version, you change this value and redeploy the template. ECS will create new containers running the new version, make sure they are healthy, change the load balancer to point to the new containers, then shut down the old containers. This style of update is known as Blue / Green deployment - the 2-second explanation is that it brings up copies of the new version and "kicks the tires" before swapping them in.

  ApplicationVersion:
    Type: String
    Description: Container version of application
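As a sketch of what an update looks like in practice, you re-run the deploy command (shown in full at the end of this article) with the new version - aws cloudformation deploy keeps the previous values for any parameters you don't override, so only the version has to change. The version number here is made up:

aws cloudformation deploy \
        --template-file 1web-2az-ec2.yaml \
        --stack-name skeleton \
        --parameter-overrides ApplicationVersion=1.3.2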

Environment Variables

Most containers are set up to take environment variables to configure things like database connections or passwords. This varies wildly based on the container you're using. These two items are particular to our in-house configuration library, but I've left them in place as an example of how to pass through environment variables.

  ApplicationConfiguration:
    Type: String
    Description: Configuration variable for application
    Default: ""
  ApplicationSecrets:
    Type: String
    Description: Secrets variable for application
    Default: ""

ECS Image

A server has to run an actual operating system, but luckily Amazon handles that for us by providing an OS image that's already set up to work as a container "worker node". This lets you deploy EC2 instances for ECS without ever having to touch Linux or a command line. This parameter isn't meant to be changed - it's just a mechanism to pick the recommended EC2 image.

  LatestAmiId:
    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: '/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id'
    Description: AMI to use for EC2 instances

EC2 Sizing

These two variables set the instance type and maximum number of EC2 instances. Just like with the number of containers, you always want to set a maximum number of instances to make sure you won't get an enormous AWS bill. This template defaults to t3a.medium, which is a burstable type and not normally used for production, but it will keep your bill down while playing around. Unless you particularly need Intel, AMD instance types are slightly cheaper.

  Ec2PoolInstanceType:
    Type: String
    Default: t3a.medium
    Description: Instance type to use for EC2 Pool
  Ec2PoolMaxSize:
    Type: Number
    Default: 5
    Description: Max number of instances in EC2 Pool

Resources Section

The parameters section defined things passed in to the template; the resources section is where we define the actual resources to be created. Each of these sections corresponds to something you could create manually through the console - this reference describes each resource type and what values need to be passed to each one.

Proxy Stuff

On AWS you normally use an application load balancer (ALB) to spread the incoming load across one or more containers. Application Load Balancers are a specific type of load balancer that's meant for HTTP traffic, instead of something that just forwards TCP connections like a firewall doing destination NAT. In an on-premises installation the function of an ALB would be performed by software like HAproxy or NGINX, or in large installations something like an F5 BIG-IP.

Declaring the Load Balancer

This section creates the load balancer instance and assigns it to our public subnets so it can receive internet connections. In a normal production setup, you rarely see something in a public subnet that isn't some form of load balancer.

The section !Join ['-', [!Ref ApplicationCode, lb]] is just combining the application code with the string "lb". So if the application is named smurf-website the load balancer will end up being named smurf-website-lb. You'll see this pattern throughout the template, so that when you see things in the AWS console you can tell what they're used for.

  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: !Join ['-', [!Ref ApplicationCode, lb]]
      Type: application
      IpAddressType: ipv4
      Scheme: internet-facing
      SecurityGroups:
        - !Ref LoadBalancerSecurityGroup
      Subnets:
        - !Ref PublicSubnetA
        - !Ref PublicSubnetB

Setting the DNS Name

This section creates the DNS record that points to the load balancer. Application Load Balancers don't have static IPs - they have a static DNS name that looks like skeleton-lb-464432167434.us-east-1.elb.amazonaws.com. You obviously don't want to use that name, so this section adds the name that we want as an alias for the long name. The end effect is similar to a CNAME record, but handled internally.

  ApplicationDns:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref ApplicationDnsZone
      Name: !Ref ApplicationDnsName
      Type: A
      AliasTarget:
        DNSName: !GetAtt "ApplicationLoadBalancer.DNSName"
        HostedZoneId: !GetAtt 'ApplicationLoadBalancer.CanonicalHostedZoneID'

Setting up the SSL Certificate

This section creates the SSL certificate for the domain name and stores it in Certificate Manager. ACM provides free SSL certificates, but more importantly it will automatically renew the certificate for you. This section defines the certificate we want:

  ApplicationCertificate:
    Type: AWS::CertificateManager::Certificate
    Properties:
      DomainName: !Ref ApplicationDnsName
      ValidationMethod: DNS
      DomainValidationOptions:
        - DomainName: !Ref ApplicationDnsName
          HostedZoneId: !Ref ApplicationDnsZone

ACM creates certificates the same way as Let's Encrypt - it has to create a DNS record to prove ownership of the domain. So if you don't have the DNS zone in Route53 this won't work, and you'll have to install certificates by hand.

Load Balancer Target Group

A target group is just what it sounds like - a set of places the load balancer can send connections. Container instances in ECS will register with the target group, and once the load balancer can prove the containers are healthy it will start sending them traffic.

  ApplicationTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: !Join ['-', [!Ref ApplicationCode, tg]]
      Protocol: HTTP
      TargetType: ip
      ProtocolVersion: HTTP1
      Port: 8080
      HealthCheckEnabled: true
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: !Ref ApplicationHealthUrl
      HealthCheckPort: traffic-port
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 3
      UnhealthyThresholdCount: 2
      VpcId: !Ref VPC
      TargetGroupAttributes:
        - Key: load_balancing.algorithm.type
          Value: round_robin
        - Key: stickiness.enabled
          Value: true
        - Key: stickiness.type
          Value: app_cookie
        - Key: stickiness.app_cookie.cookie_name
          Value: AFLSID
        - Key: stickiness.app_cookie.duration_seconds
          Value: 86400

Pay attention to the stickiness settings - these control how the load balancer routes requests if there is more than one container available. If you enable stickiness then users connected to a container will be reconnected to the same container if possible. If your containers store sessions internally then this will prevent users from constantly losing their sessions. If your application doesn't use sessions, or keeps them in a database, then you don't need this.

Load Balancer Listeners

In this section we declare a port 80 listener for the load balancer. We don't want actual traffic to be sent over unencrypted port 80, so this listener doesn't do anything except redirect the request to port 443. In an on-premises setup this would normally be a redirect line in your Apache / NGINX / HAproxy configuration.

  ApplicationListener80:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Protocol: HTTP
      Port: 80
      DefaultActions:
        - Type: redirect
          RedirectConfig:
            Protocol: "HTTPS"
            Port: 443
            Host: "#{host}"
            Path: "/#{path}"
            Query: "#{query}"
            StatusCode: "HTTP_301"

The actual work happens in the encrypted port 443 listener. This is the glue record that connects together the load balancer, listener, certificate, and target group:

  ApplicationListener443:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Protocol: HTTPS
      Port: 443
      SslPolicy: ELBSecurityPolicy-TLS13-1-2-2021-06
      Certificates:
        - CertificateArn: !Ref ApplicationCertificate
      DefaultActions:
        - Type: forward
          ForwardConfig:
            TargetGroups:
              - TargetGroupArn: !Ref ApplicationTargetGroup
                Weight: 100

The SSL policy named ELBSecurityPolicy-TLS13-1-2-2021-06 determines which TLS versions and ciphers the load balancer will accept, so if you have problems with older browsers or API callers you may need to adjust it. For example, if you have Windows 2012 servers that need to connect, you will need to leave some older protocols enabled.
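For example, swapping in one of Amazon's older predefined policies - this one still accepts TLS 1.0 and 1.1 - is a one-line change in the 443 listener:

      SslPolicy: ELBSecurityPolicy-2016-08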

Security Groups

Nearly every resource in AWS comes with a security group, which is basically a tiny firewall surrounding the resource. While security groups can include IPs directly, they commonly reference other security groups - a reference like that automatically expands to include the IP address of any resource that has the referenced security group attached. This lets you reference things like containers that don't have fixed IP addresses.

Load Balancer Security Group

We want the load balancer to be able to receive traffic from anywhere, so this security group allows inbound traffic to ports 80 and 443. It also allows outbound traffic to port 8080 so it can talk to the containers.

  LoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Join ['-', [!Ref ApplicationCode, lb]]
      GroupDescription: !Join ['-', [!Ref ApplicationCode, lb]]
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          CidrIp: 0.0.0.0/0

Application Security Group

The application security group controls traffic from the application container itself - your running code. This version only allows traffic in on port 8080 from the load balancer. Outgoing access is left open so the application can reach out to whatever it wants, but this could easily be limited to specific traffic - for example, to a database instance.

  ApplicationSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Join ['-', [!Ref ApplicationCode, application]]
      GroupDescription: !Join ['-', [!Ref ApplicationCode, application]]
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 8080
          ToPort: 8080
          SourceSecurityGroupId: !Ref LoadBalancerSecurityGroup
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: 0.0.0.0/0

Using one security group in the definition of another security group is a little strange coming from traditional firewalls, but it's common in AWS. You can think of this pattern as expanding in place to the IP addresses of anything else that has that security group attached. Or to put it another way, this will allow incoming traffic from anything that has the LoadBalancerSecurityGroup security group attached to it.

EC2 Security Group

The EC2 security group controls traffic from the EC2 container hosts. It currently allows no incoming access, and outgoing access to anywhere. Outgoing is left open so the SSM agent can register, allowing you to access the host that way for debugging.

  Ec2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Join ['-', [!Ref ApplicationCode, ec2]]
      GroupDescription: !Join ['-', [!Ref ApplicationCode, ec2]]
      VpcId: !Ref VPC
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: 0.0.0.0/0

IAM Roles

AWS resources that run code usually have an associated IAM role, which gives the resource permission to make calls to AWS. Windows services have a logon account that controls what the service can do - this is the same principle. We want to limit each thing to the smallest set of privileges we can, per the Principle of Least Privilege. Of course in practice you end up starting out pretty broad, then [theoretically] tighten things up.

Execution Role

This role is assumed by the ECS agent running on the EC2 instances, which only needs to be able to pull images and read a few secrets. We want to use managed policies whenever possible, because Amazon keeps them updated with any rights needed - AWS has pre-built policies for a lot of common tasks.

  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [!Ref ApplicationCode, taskexec]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ecs-tasks.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
      Policies:
        - PolicyName: !Join ['-', [!Ref ApplicationCode, taskexec]]
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - secretsmanager:GetSecretValue
                Resource: !Ref RegistryLogin

Application Role

The application role is assumed by your actual containers, so you may need to adjust this so your container can do the things it needs to do. In this case the configuration library pulls information from AppConfig and Secret Manager, so the role only includes those rights. If your application needed to use other AWS services - for example pull data from certain S3 buckets - you could include them in the role as well.

  ApplicationRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [!Ref ApplicationCode, app]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ecs-tasks.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: !Join ['-', [!Ref ApplicationCode, app]]
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - appconfig:StartConfigurationSession
                  - appconfig:GetLatestConfiguration
                  - secretsmanager:GetSecretValue
                Resource: "*"

Note that this role allows access to any AppConfig or Secrets Manager document - one way to tighten up the rights would be to allow access only to the specific records the application needs.
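As a sketch of that tightening, you could replace the wildcard with specific ARNs - the secret naming pattern here is an assumption for illustration, not something the template enforces:

              - Effect: Allow
                Action:
                  - secretsmanager:GetSecretValue
                # Hypothetical convention: secrets are named after the application code
                Resource: !Sub "arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:${ApplicationCode}-*"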

EC2 Instance Role

The EC2 instance role is assigned to the EC2 instance. It has to include the STS AssumeRole rights so the ECS container agent can assume the roles it needs. The two managed policies at the bottom allow SSM Session Manager to function, letting you open a console session on the hosts if needed for debugging.

  Ec2InstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [!Ref ApplicationCode, ec2]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

Auto-Scaling Role

The auto-scaling role is used by the container auto-scaling function of ECS, so it needs rights to update services and change the number of required instances.

  AutoScalingRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [!Ref ApplicationCode, autoscale]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - application-autoscaling.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: !Join ['-', [!Ref ApplicationCode, autoscale]]
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - ecs:UpdateService
                  - ecs:DescribeServices
                  - application-autoscaling:*
                  - cloudwatch:DescribeAlarms
                  - cloudwatch:GetMetricStatistics
                Resource: "*"

Logging Setup

This section creates a dedicated CloudWatch log group, which will collect the output from your container instances. You normally want a separate log group for each application so the logs don't get jumbled up with everything else. Be sure to specify a retention period - CloudWatch log storage can get out of hand otherwise.

  CloudWatchLogsGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Ref ApplicationCode
      RetentionInDays: 14
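Once containers are running, a quick way to watch the log group is the tail command in version 2 of the AWS CLI - "skeleton" here being the application code used in the deploy script at the end of the article:

aws logs tail skeleton --follow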

Container Definition

The task definition sets up every container that will be launched. This is where you could adjust the CPU and memory allocation of your containers, or use a parameter to make that adjustable without changing the template.

  ApplicationDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: application
          Image: !Sub "${ApplicationUrl}:${ApplicationVersion}"
          RepositoryCredentials:
            CredentialsParameter: !Ref RegistryLogin
          PortMappings:
            - ContainerPort: 8080
              Protocol: tcp
          Environment:
            - Name: CONFIGURATION
              Value: !Ref ApplicationConfiguration
            - Name: SECRETS
              Value: !Ref ApplicationSecrets
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref CloudWatchLogsGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: application
      Family: !Join ['-', [!Ref ApplicationCode, application]]
      NetworkMode: awsvpc
      TaskRoleArn: !Ref ApplicationRole
      ExecutionRoleArn: !Ref ExecutionRole
      Cpu: 256
      Memory: 1024
      RuntimePlatform:
        CpuArchitecture: X86_64
        OperatingSystemFamily: LINUX

EC2 Stuff

AWS Fargate is far easier than setting up EC2 instances to back your cluster, but it costs about 30% more than running your own EC2 instances. At the end of the day the savings were worth the extra complexity for us, but you can make your own choice. This template uses an EC2 auto-scaling group instead of Fargate.

EC2 Image Setup

The instance profile attaches the IAM role to new EC2 instances. I'm not sure why this isn't just part of the launch template - it just isn't.

  Ec2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Join ['-', [!Ref ApplicationCode, ec2]]
      Roles:
        - !Ref Ec2InstanceRole

The launch template tells EC2 how to create a new instance when one is needed. The UserData section tells the ECS agent on the new instance which cluster to register with, and the Ec2PoolInstanceType parameter sets what kind of instances are created.

  Ec2Launch1:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Join ['-', [!Ref ApplicationCode, 1]]
      LaunchTemplateData:
        ImageId: !Ref LatestAmiId
        InstanceType: !Ref Ec2PoolInstanceType
        SecurityGroupIds:
          - !GetAtt Ec2SecurityGroup.GroupId
        IamInstanceProfile:
          Arn: !GetAtt Ec2InstanceProfile.Arn
        PrivateDnsNameOptions:
          EnableResourceNameDnsAAAARecord: false
          EnableResourceNameDnsARecord: false
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash -xe
            echo ECS_CLUSTER=${Cluster} >> /etc/ecs/ecs.config

EC2 Auto-Scaling Group

An EC2 auto-scaling group manages a fleet of EC2 instances, allowing the fleet to grow and shrink as needed. Even if you are only ever going to have one server, it's still a good idea to use an auto-scaling group in case the server dies or the Availability Zone goes offline - the auto-scaling group will detect that and create a replacement automatically. Neither of those is a routine occurrence, but they do happen.

  Ec2ScalingGroup1:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Join ['-', [!Ref ApplicationCode, 1]]
      HealthCheckGracePeriod: 60
      LaunchTemplate:
        LaunchTemplateId: !Ref Ec2Launch1
        Version: !GetAtt Ec2Launch1.LatestVersionNumber
      VPCZoneIdentifier:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
      NewInstancesProtectedFromScaleIn: true
      MaxSize: !Ref Ec2PoolMaxSize
      MinSize: 0
      DesiredCapacity: 1
      Cooldown: 60
      Tags:
        - Key: Name
          Value: !Join ['-', [!Ref ApplicationCode, autoscale, 1]]
          PropagateAtLaunch: "true"

EC2 ECS Capacity Provider

ECS uses Capacity Providers to supply the servers that run containers. This capacity provider links back to the EC2 auto-scaling group we created above, and offers its server capacity to ECS.

Pay attention to the TargetCapacity setting, which tells ECS what percentage of the EC2 capacity should be in use. Here we're specifying 100%, meaning we want to fill all of our existing EC2 capacity before creating new EC2 instances. If you had multiple types of containers running, you might lower that value so containers can be created quickly without waiting for a new EC2 host to be provisioned - container startup is nearly instantaneous, while EC2 can take several minutes to provision a host.

  Ec2CapacityProvider1:
    Type: AWS::ECS::CapacityProvider
    Properties:
      Name: !Join ['-', [!Ref ApplicationCode, 1]]
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref Ec2ScalingGroup1
        ManagedScaling:
          Status: ENABLED
          TargetCapacity: 100
          MaximumScalingStepSize: 1
          InstanceWarmupPeriod: 60
        # Bug https://github.com/aws/aws-cdk/issues/14732
        ManagedTerminationProtection: DISABLED

Cluster Definition

AWS Elastic Container Service is built around the idea of a Cluster, which is a set of resources. At small scale a cluster will normally correspond to an application, but they can be used many ways. At our scale a better name for a Cluster would have been Service Set - think of the set of related Exchange services, or the Backup Exec services.

Declaring the Cluster

This section declares the cluster and its name.

  Cluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Ref ApplicationCode

Capacity Providers

We have to associate the capacity providers we created before with this cluster, which is done by this association record. This is a simple example - there are many ways to configure capacity, including tiered use of multiple pools and use of spot instances.

  ClusterCapacityProviderAssociation:
    Type: AWS::ECS::ClusterCapacityProviderAssociations
    Properties:
      Cluster: !Ref Cluster
      CapacityProviders:
        - !Ref Ec2CapacityProvider1
      DefaultCapacityProviderStrategy:
        - CapacityProvider: !Ref Ec2CapacityProvider1
          Base: 1
          Weight: 1

The Service

The service record defines a task we want to keep running, the same as a Windows service. This section links the load balancer, container definition, capacity provider, and networking configuration together.

Note that in this case we have to explicitly define some dependencies, or CloudFormation will create dependent resources in the wrong order. Normally CloudFormation understands dependencies, but in this particular case it does not.

This section also controls the strategy the ECS controller uses for container placement. In our case the application is a Java Tomcat application, and a Java virtual machine will - by design - usually grow to consume the entire amount of memory it's allocated. That means memory will normally be the constraint on container placement, so we want to pack containers onto hosts based on memory.

  ApplicationService:
    Type: AWS::ECS::Service
    DependsOn:
      - ApplicationLoadBalancer
      - ApplicationListener443
      - ClusterCapacityProviderAssociation

    Properties:
      ServiceName: application
      Cluster: !Ref Cluster
      DesiredCount: 1
      PlacementStrategies:
        - Field: MEMORY
          Type: binpack
      DeploymentController:
        Type: ECS
      TaskDefinition: !Ref ApplicationDefinition
      LoadBalancers:
        - ContainerName: application
          ContainerPort: 8080
          TargetGroupArn: !Ref ApplicationTargetGroup
      NetworkConfiguration:
        AwsvpcConfiguration:
          SecurityGroups:
            - !Ref ApplicationSecurityGroup
          Subnets:
            - !Ref PrivateSubnetA
            - !Ref PrivateSubnetB

Auto-scaling Target

The auto-scaling target is a glue record that tells Application Auto Scaling what value to adjust - in this case the number of containers the service should run - and within what bounds.

  ApplicationAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: !Ref ApplicationMaxCount
      MinCapacity: 1
      ResourceId: !Join ['/', [ service, !Ref ApplicationCode, !GetAtt ApplicationService.Name ]]
      RoleARN: !GetAtt AutoScalingRole.Arn
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

Container Auto-Scaling

The container auto-scaling policy controls how many containers the service scales to. This is different from the EC2 auto-scaling, which adjusts the number of EC2 instances needed to meet the requirements of ECS - through the capacity provider, the container auto-scaling ends up driving the EC2 auto-scaling.

In this case our application is a Java Tomcat application. In normal operation a Java virtual machine will grow to use all the memory it has been allocated, so memory use is not a good indication of the load on the container - we use CPU load instead. Here we are targeting 50% CPU utilization: if utilization is lower than that we scale down, and if it's higher we scale up. The cooldown settings slow the process down so the target isn't constantly scaling up and down.

If this were, for example, a PHP application running on FPM, then memory use might be a better indicator of server load - FPM doesn't run more than one request at a time per process, and scales the number of processes up and down. You will probably have to play with these settings while testing your application; a memory-based variant is sketched after the policy below.

  ApplicationAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Join ['-', [ !Ref ApplicationCode, application, autoscale ]]
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ApplicationAutoScalingTarget
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 50
        ScaleInCooldown: 300
        ScaleOutCooldown: 300
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
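If you did want to track memory instead - the PHP-FPM case mentioned above - only the predefined metric changes. A minimal sketch keeping the same target and cooldowns:

      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 50
        ScaleInCooldown: 300
        ScaleOutCooldown: 300
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageMemoryUtilization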

Gotchas

There are a few places where AWS services either don't behave as expected, or fail in unexpected ways if you don't enable an obscure feature. These are the gotchas that I have found so far.

VPC Trunking / ENI Quota

Normally each container gets a unique IP address, which is done by adding a virtual NIC or "ENI" to the EC2 instance. The number of NICs allowed on an EC2 instance is surprisingly small - for example, the table on this page shows that an m6a.large instance can only have 3 network interfaces. Since the EC2 instance itself uses one of those, that leaves room for only two containers on the instance.

To get around this problem you have to enable a feature called awsVpcTrunking at the account level, which lets an instance carry many more container network interfaces by multiplexing them over a trunk interface. The procedure to enable this is described here, and there are a number of limitations and requirements which are documented here.
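If you're comfortable in the CLI, setting the account-wide default is a single call - run it with credentials that have rights to change ECS account settings:

aws ecs put-account-setting-default \
        --name awsVpcTrunking \
        --value enabled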

Deploying

There are lots of automated ways to deploy a CloudFormation template, but to keep it simple we can just put the command into a shell script. This assumes you have the AWS CLI installed and have logged in with enough rights to create everything.

The AWS CLI is available for Linux, Windows, and Mac here, so you can stay on your chosen operating system. You can also download the PowerShell module here if PowerShell is your soup du jour, although I haven't used it myself.

Tags

Tags are not required by AWS, but are commonly used to assign costs to projects - in our case this assigns the development environment and the application code "skeleton". Many companies have specific policies about tagging - for example you might be required to tag all resources with a cost code so accounting can assign those costs back to a budget item.

Sample Script

This is a Linux shell script to deploy a CloudFormation template to a stack - to use a BAT file you'd have to take out the bash stuff and change the line continuation marks to a caret (^). Obviously you'll want to replace the parameter overrides with values that match your environment.

#!/bin/sh

set -eu

aws cloudformation deploy \
        --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
        --template-file 1web-2az-ec2.yaml \
        --stack-name skeleton \
        --parameter-overrides \
                ApplicationCode=skeleton \
                ApplicationUrl=registry.teaglu.com/apps/mqgateway \
                RegistryLogin=arn:aws:secretsmanager:us-east-1:178310642712:secret:dev-deploy-7ef5af \
                ApplicationVersion=1.3.1 \
                ApplicationDnsZone=Z03833892UY7BXR41UC8W \
                ApplicationDnsName=skeleton.omniglobalprotodynamics.com \
                VPC=vpc-00c00210b9f8c6039 \
                PrivateSubnetA=subnet-093d25366586b3e6b \
                PrivateSubnetB=subnet-0aaf47bc7c20a78d8 \
                PublicSubnetA=subnet-0c7920621fc69ff3a \
                PublicSubnetB=subnet-0186761a610eb7af2 \
                ApplicationConfiguration=aws://appconfig/skeleton/config/dev \
                Ec2PoolInstanceType=t3a.small \
                Ec2PoolMaxSize=2 \
        --tags \
                teaglu:env=dev \
                teaglu:app=skeleton

Each time you run the script, it will attempt to deploy the template again and change any resources to match the file. This is the normal way of adjusting things like instance sizes or container versions.

Shutting it All Down

CloudFormation keeps a reference to all the resources it creates, so if you want to get rid of the entire stack and its resources you can just delete it from the CloudFormation console. When developing templates it's not unusual to repeatedly create and destroy stacks. CloudFormation can be slower than you'd expect at deploying resources, particularly things that are global like IAM roles. Whenever I'm working on templates I try to have a TV nearby, or maybe Fall Guy.
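Deleting a stack from the console is equivalent to this CLI call, using the stack name from the deploy script above:

aws cloudformation delete-stack --stack-name skeleton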

Share and Enjoy

The template linked in this article is released as CC0, so use it or any parts of it however you want.
