Saving Money with NAT Instances

Jul 31, 2023 -- Posted by : dwarren

This is a walk-through of setting up redundant NAT instances in AWS to replace NAT Gateway. There have been some other fantastic articles on this topic, including this excellent one by Ben Whaley at Chime. This guide is targeted at people from a more conventional system administration background, and assumes a bare minimum of AWS design experience.

Background

The Problem

In a normal on-premises environment, each resource would have an internal address on a single network adapter. To make a service available to the internet at large, you would set up some form of NAT that redirects incoming traffic to an internal resource. If there is no static NAT in place, then the firewall would just use a default address to provide internet access - in a larger installation it might use a pool of addresses. This is also how Azure works.

In AWS, an Internet Gateway can only do static NAT. A subnet associated with an Internet Gateway expects everything to have a public IP address assigned - this is referred to as a Public Subnet. Subnets without an associated Internet Gateway are referred to as Private Subnets.

Most of your resources don't need to be publicly available, and don't need a public IP address. Even items meant to be publicly accessible are usually hidden behind some form of AWS load balancer, so only the load balancer needs a public IP address. In practice, the only things that normally live in public subnets are load balancers.

So in these Private Subnets, where does the default route point to? By default - nowhere, meaning these resources won't have any internet access at all. While these resources may not need to be publicly available, most of them need to access the public internet for some reason.

To solve this problem, you have to run something in the public subnet that can proxy traffic for everything in your private subnets, and point the default route in each private subnet to that something. Of course we don't want availability zones depending on each other, so we need one of these somethings in every availability zone.

Some Options

Running Everything Public

One option is to run all your services in a public subnet, and just ignore the somewhat bizarre public/private distinction. This does work up to a point, but you will run into service quotas on the number of IPs fairly quickly. If you're never going to go above a few machines, this might be a valid way of operating.

Note: On 7/28/2023 AWS announced an additional per-IP charge even when IPs are attached to a running instance.

Using NAT Gateway

AWS provides a managed service named NAT Gateway, which does everything we need. Why doesn't everyone use that then? It's preposterously expensive. NAT Gateway has a base price of $0.045/hour, which in a standard 30-day month comes to $32.40 per NAT Gateway per month. In a two-AZ redundant configuration that comes to $64.80/mo for NAT services.

For production or just companies with larger budgets, the true cost of NAT Gateway is the processing fee of $0.045 per GB of data processed. The base cost of outgoing data from US East 1 is $0.09/GB, so NAT Gateway adds an additional 50% to their outgoing traffic cost. For a large site that can be a significant cost.

For those of us operating on shoe-strings and trying to stretch our AWS Activate credits, this essentially makes an AWS VPC cost $64.80/mo just to idle. If you follow AWS guidance and create lots of separate accounts, this can add up to a large cost to accomplish very little.

Using NAT Instances

A single EC2 instance can do the job of NAT Gateway using nothing but IPTables. This lets you get rid of the per-GB processing charge, and an EC2 instance type with reservation can be purchased for about half the cost of NAT Gateway. A c7g.medium instance with a 1-year no-upfront reservation is $17.45/mo, so in a two-AZ redundant configuration your "idle cost" is reduced to $34.90/mo, and the processing charge is completely removed.

This is the option we're discussing in this article.
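
To make the comparison concrete, here is a small sketch of the arithmetic using the prices quoted above. The function and constant names are made up for illustration, and the prices are the mid-2023 us-east-1 figures from this article, so treat them as assumptions rather than current rates. The normal $0.09/GB data transfer charge applies in both cases and is left out.

```javascript
// Prices quoted in the article (us-east-1, mid-2023) - assumptions, not live rates.
const NAT_GATEWAY_HOURLY = 0.045;    // $/hour per NAT Gateway
const NAT_GATEWAY_PER_GB = 0.045;    // $/GB processed by NAT Gateway
const NAT_INSTANCE_MONTHLY = 17.45;  // $/month, c7g.medium, 1-year no-upfront
const HOURS_PER_MONTH = 24 * 30;     // the "standard 30-day month" used above

// Work in whole cents so floating-point noise doesn't leak into the totals.
const toCents = (dollars) => Math.round(dollars * 100);

// Monthly NAT Gateway cost: hourly base per zone plus the per-GB processing fee.
function natGatewayMonthlyCents(zones, gbProcessed) {
  const base = NAT_GATEWAY_HOURLY * HOURS_PER_MONTH * zones;
  const processing = NAT_GATEWAY_PER_GB * gbProcessed;
  return toCents(base + processing);
}

// Monthly NAT instance cost: one reserved instance per zone, no per-GB fee.
function natInstanceMonthlyCents(zones) {
  return toCents(NAT_INSTANCE_MONTHLY * zones);
}

console.log('NAT Gateway, 2 AZ, idle ($):', natGatewayMonthlyCents(2, 0) / 100);
console.log('NAT instances, 2 AZ ($):', natInstanceMonthlyCents(2) / 100);
console.log('NAT Gateway, 2 AZ, 1000 GB ($):', natGatewayMonthlyCents(2, 1000) / 100);
```

The gap widens with traffic, because the NAT instance total doesn't change with the number of gigabytes processed.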

Sharing NAT Providers

You would think you can share either kind of NAT provider - either the managed NAT Gateway or your own NAT instances. That's what VPC Peering is for, right? Unfortunately the fine print of VPC Peering tells us that peerings are always non-transitive, and that minor detail means you can't use a router in another VPC.

It is possible to share NAT resources using the more complicated AWS Transit Gateway. That may be a valid path for other reasons, but Transit Gateway also has a per-hour and per-GB cost that's comparable with NAT Gateway. If you're using NAT Gateway that means you now have two per-GB charges piling up on top of each other, and if you're using NAT instances that means you're back to the very per-GB rate we're trying to avoid.

Replacing NAT Gateway

A Baseline - What Does NAT Gateway Do?

To look at our solution for replacing NAT Gateway, we go through each function provided by the NAT Gateway managed service, and look at whether it's possible using our own implementation.

NAT for Outgoing Connections

This is the easy part - we just create a Linux instance, enable IP forwarding, and add an outgoing IPTables "masquerade" rule. The fact that this only takes a few commands will help us down the line, because we don't have to create a Golden Image - we can just include the commands in the user data section and use a base Linux image.

  We can do this.

Automatic Infrastructure Replacement

If something underneath NAT Gateway has a problem it will be automatically replaced - that's part of the managed service offering. We can replicate this function by using an EC2 Auto-Scaling Group set to target one active instance - this is usually called a "Scaling Group of One".

  We can do this.

Automatic Infrastructure Scaling

If your resources need to send an enormous amount of data, NAT Gateway will automatically scale up the underlying infrastructure to handle it. While it's certainly possible to pick larger instance types, our solution doesn't address doing this automatically.

  We're not doing this right now.

Preserving State during Infrastructure Events

NAT Gateway is able to preserve the state of outgoing TCP connections during scaling events, presumably using technology similar to what is used in active-active firewall pairs. I tip my hat to AWS engineers for pulling off this feat, but for our solution we're just going to rely on the higher application layers to retry things.

  We're not doing this right now, and probably never will.

High-Level Design

For a given VPC, we want one NAT instance in each availability zone. While a NAT instance in one availability zone can service subnets in another, this isn't a redundant configuration. Additionally, while data transfer between availability zones is cheap - it isn't free.

Making sure we always have a running instance is easy enough using the Auto-Scaling Group of One trick.

The routes need to follow the instances as they come and go, so they can't really be specified in a template. For that functionality we create a Lambda function that scans the available instances, selects one NAT instance to use for each availability zone, and updates the routes to use that NAT instance. We assign a specific tag to each NAT instance, so that the Lambda function can find them.

As it turns out there are a few housekeeping items that are difficult to do from a CloudFormation template - specifically disabling the source/destination filter and assigning elastic IP addresses. So our Lambda function takes care of those items as well.

The final piece is an EventBridge rule that calls our Lambda function every time something changes with one of our NAT instances.

CloudFormation Template

In this section we go through a CloudFormation template to set up NAT Instances for two availability zones. This could also be done using CDK or Terraform, which are both arguably more complete tools. I chose to use CloudFormation both because I think it's more understandable, and because we don't have Terraform in place yet.

You can download the full template here.

Parameters Section

VPC Code

At Teaglu we assign every VPC a code. This gets encoded into infrastructure names so that - for example - in the EC2 control panel a NAT instance isn't just immediately identifiable as a NAT instance, it's obvious which VPC it serves.

  VPCCode:
    Type: String
    Description: Code for VPC

Network Stuff

These parameters identify the VPC and public subnets we'll be serving. We don't actually need the private subnets to be specified - the Lambda function will find the correct private subnets.

  VPC:
    Type: AWS::EC2::VPC::Id
    Description: VPC for deployment
  PublicSubnetA:
    Type: AWS::EC2::Subnet::Id
    Description: Public subnet in Availability Zone 1
  PublicSubnetB:
    Type: AWS::EC2::Subnet::Id
    Description: Public subnet in Availability Zone 2

Container AMI

This parameter identifies the Linux AMI we're going to be using for NAT instances. It isn't meant to be overridden - it's simply a method to retrieve the latest AMI identifier from SSM instead of hard-coding a version. Since ARM instances are cheaper than x64 instances and we have no reason not to use ARM, we're using ARM images and Graviton instance types.

  LatestAmiId:
    Type: 'AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>'
    Default: '/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2'
    Description: AMI to use for EC2 instances

Instance Type

A NAT instance doesn't really have to do anything except forward packets using IPTables. In this case we've picked the smallest non-burstable instance type of the latest Graviton generation. We wouldn't want to use a burstable type because it might not have CPU credits or host resources available to forward packets when it needs to.

  Ec2InstanceType:
    Type: String
    Default: c7g.medium
    Description: Instance type to use for EC2 Instances

SNS Notification Topic

The Lambda publishes an event to an SNS topic every time it makes a change. For now we just have the SNS topic set to a notification topic that lets operations know something happened - in case there's a complaint about a network event. While this solution is resilient, we can still have momentary outages while the system replaces a NAT instance.

  NotifySns:
    Type: String
    Description: SNS Topic to send notifications

Elastic IP Mapping

This parameter is a JSON map from availability zone to elastic IP. If the Lambda replaces a NAT instance, it will check this mapping to see if it should assign a particular elastic IP to the instance so that outgoing connections come from the same IP address. If you don't need to do this, you can just pass {}.

My personal preference is to have a set of persistent IPs used for outbound connections, and set the reverse DNS to something under our domain name - for example nat-ue1-1b.aws.teaglu.net. You can't set reverse DNS names without making the forward DNS name match, so this isn't easy to do in a template, particularly when the DNS zone used isn't in the same account.

  ElasticIpMap:
    Type: String
    Description: Elastic IP Mapping
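
As a rough sketch of how the Lambda might read this parameter: the function below parses the JSON string from the EIPMAP environment variable into a plain object keyed by availability zone. The name parseEipMap is hypothetical, and the check for an "eipalloc-" prefix assumes the map values are Elastic IP allocation IDs, which is how the walk-through's own code uses them when it associates an address.

```javascript
// Hypothetical helper: turn the EIPMAP string into { zone: allocationId }.
// Assumption: values are allocation IDs such as "eipalloc-0123456789abcdef0".
function parseEipMap(raw) {
  // Treat a missing or blank value the same as passing {}
  if (!raw || raw.trim() === '') {
    return {};
  }

  const parsed = JSON.parse(raw);
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('EIPMAP must be a JSON object');
  }

  const result = {};
  for (const [zone, allocationId] of Object.entries(parsed)) {
    // Fail loudly on bad entries instead of silently skipping the EIP
    if (typeof allocationId !== 'string' || !allocationId.startsWith('eipalloc-')) {
      throw new Error(`EIPMAP entry for ${zone} is not an allocation ID`);
    }
    result[zone] = allocationId;
  }
  return result;
}

console.log(parseEipMap('{"us-east-1a": "eipalloc-0123", "us-east-1b": "eipalloc-0456"}'));
```

In the handler this would be called along the lines of parseEipMap(process.env.EIPMAP), so a malformed map fails once at startup rather than during a route change.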

Resources Section

EC2 Security Group

The security group for our NAT instance allows all traffic outbound so it can forward connections anywhere, and allows all traffic inbound so it can forward traffic from anywhere.

  Ec2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupName: !Join ['-', [nat, !Ref VPCCode]]
      GroupDescription: !Join ['-', [nat, !Ref VPCCode]]
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: -1
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: -1
          CidrIp: 0.0.0.0/0

EC2 Instance Role

The EC2 instance role includes the SSM managed policies, so that you can use SSM to connect to the instances if there's a problem.

  Ec2InstanceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [nat, !Ref VPCCode, ec2]]
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/AmazonSSMManagedEC2InstanceDefaultPolicy

Defining the NAT Instance

The instance profile and launch template define how to launch a new NAT instance. This NAT instance isn't based on an image we made and doesn't include any software other than the base Linux install, so we have to use the User Data to set it up to NAT traffic:

  Ec2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      InstanceProfileName: !Join ['-', [nat, !Ref VPCCode]]
      Roles:
        - !Ref Ec2InstanceRole

  Ec2Launch:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Join ['-', [nat, !Ref VPCCode]]
      LaunchTemplateData:
        ImageId: !Ref LatestAmiId
        InstanceType: !Ref Ec2InstanceType
        IamInstanceProfile:
          Arn: !GetAtt Ec2InstanceProfile.Arn
        NetworkInterfaces:
          - DeviceIndex: 0
            AssociatePublicIpAddress: true
            Groups:
              - !GetAtt Ec2SecurityGroup.GroupId
        PrivateDnsNameOptions:
          EnableResourceNameDnsAAAARecord: false
          EnableResourceNameDnsARecord: false
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash -xe
            iptables -A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT
            iptables -A INPUT -i eth0 -j DROP
            iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
            sysctl net.ipv4.ip_forward=1

The User Data script runs when the instance first boots; by default cloud-init does not run it again on later reboots. To go through the commands one at a time:

| Command | Purpose |
|---|---|
| #!/bin/bash -xe | Tells the server to run this set of commands using bash. The -x flag echoes each command as it runs, and -e stops the script at the first failure. |
| iptables -A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT | Allow return traffic for connections initiated by the server itself. |
| iptables -A INPUT -i eth0 -j DROP | Don't allow any incoming connections to the server. Because the server needs a wide-open security group to function, we have to make sure that the server doesn't accidentally expose any service itself. |
| iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE | Use source NAT to forward any connections not sent to the server itself. |
| sysctl net.ipv4.ip_forward=1 | Turn on IP forwarding, to allow the server to operate as a router. |

Scaling Groups of One

Instead of creating an instance directly in each public subnet, we create an auto-scaling group in each public subnet with a target of one instance. This means that if our instance is killed or becomes unhealthy, it will be automatically replaced. While it would be possible to use a single auto-scaling group designed to spread across availability zones, that seemed "unnecessarily convoluted".

We want the Lambda and EventBridge rule to be in place before creating the NAT instances, so we create an explicit dependency using the DependsOn construct. Without this, the first time the stack is deployed the Lambda might not be in place to catch the initial state change of the EC2 instances.

  Ec2ScalingGroupA:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - Lambda
      - EventBridgeRule
    Properties:
      AutoScalingGroupName: !Join ['-', [nat, !Ref VPCCode, a]]
      HealthCheckGracePeriod: 60
      LaunchTemplate:
        LaunchTemplateId: !Ref Ec2Launch
        Version: !GetAtt Ec2Launch.LatestVersionNumber
      VPCZoneIdentifier:
        - !Ref PublicSubnetA
      MaxSize: 1
      MinSize: 1
      DesiredCapacity: 1
      Tags:
        - Key: Name
          Value: !Join ['-', [nat, !Ref VPCCode, a]]
          PropagateAtLaunch: "true"
        - Key: teaglu:app
          Value: nat
          PropagateAtLaunch: "true"

  Ec2ScalingGroupB:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      - Lambda
      - EventBridgeRule
    Properties:
      AutoScalingGroupName: !Join ['-', [nat, !Ref VPCCode, b]]
      HealthCheckGracePeriod: 60
      LaunchTemplate:
        LaunchTemplateId: !Ref Ec2Launch
        Version: !GetAtt Ec2Launch.LatestVersionNumber
      VPCZoneIdentifier:
        - !Ref PublicSubnetB
      MaxSize: 1
      MinSize: 1
      DesiredCapacity: 1
      Tags:
        - Key: Name
          Value: !Join ['-', [nat, !Ref VPCCode, b]]
          PropagateAtLaunch: "true"
        - Key: teaglu:app
          Value: nat
          PropagateAtLaunch: "true"

Lambda IAM Role

To use a Lambda function, we have to define an IAM role that has the rights to read the current state of instances and routing tables, and the rights to update routing tables and any other housekeeping issues.

  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Join ['-', [nat, !Ref VPCCode, lambda]]
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
            - sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEC2ReadOnlyAccess
      Policies:
        - PolicyName: LoggingPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: "*"
        - PolicyName: UpdatePolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - ec2:CreateRoute
                  - ec2:DeleteRoute
                  - ec2:ModifyNetworkInterfaceAttribute
                  - ec2:AssociateAddress
                Resource: "*"
        - PolicyName: NotifyPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - sns:Publish
                Resource: "*"

The Lambda Function

We can create the Lambda function in the template as well, either by pointing to the code or including it in-line. In this example we're just pointing to an S3 bucket for the actual code.

  Lambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Join ['-', [nat, !Ref VPCCode, updater]]
      Handler: index.handler
      Architectures:
        - arm64
      Runtime: nodejs18.x
      Role: !GetAtt LambdaRole.Arn
      Environment:
        Variables:
          EIPMAP: !Ref ElasticIpMap
          NOTIFY_TOPIC: !Ref NotifySns
          REGION: !Ref "AWS::Region"
          VPC: !Ref VPC
      MemorySize: 128
      ReservedConcurrentExecutions: 1
      Timeout: 20
      Code:
        S3Bucket: teaglu-lambda
        S3Key: nat-updater.zip

An EventBridge Rule to Fire the Lambda

Instead of continuously running our Lambda function to check for changes, we create an EventBridge rule that fires whenever an EC2 instance starts or stops. There isn't a way to filter this to only our NAT instances, so it will be left up to the Lambda function to determine if the instance that triggers the rule is of interest.

  EventBridgeRule:
    Type: AWS::Events::Rule
    Properties:
      Name: !Join ['-', [nat, !Ref VPCCode, instance, change ]]
      EventBusName: default
      EventPattern:
        source:
          - aws.ec2
        detail-type:
          - EC2 Instance State-change Notification
      State: ENABLED
      Targets:
        - Arn: !GetAtt Lambda.Arn
          Id: 'execute-route-lambda'
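
Since the rule fires for every instance in the account, the handler can do a cheap check on the event before scanning anything. The sketch below assumes the standard shape of the "EC2 Instance State-change Notification" event (a detail object with instance-id and state); the function names and the choice of which states trigger a rescan are illustrative assumptions, not taken from the actual function. The event carries no tags, so deciding whether the instance is a NAT instance still requires the scan itself.

```javascript
// States after which the set of usable NAT instances may have changed.
// Assumption: "pending" is skipped because the instance can't forward yet.
const RESCAN_STATES = ['running', 'shutting-down', 'stopping', 'stopped', 'terminated'];

// True if the event looks like an EC2 instance state-change notification.
function isInstanceStateChange(event) {
  return Boolean(event)
    && event.source === 'aws.ec2'
    && event['detail-type'] === 'EC2 Instance State-change Notification'
    && typeof event.detail === 'object'
    && event.detail !== null
    && typeof event.detail['instance-id'] === 'string';
}

// Decide whether this event is worth a VPC rescan.
function shouldRescan(event) {
  if (!isInstanceStateChange(event)) {
    return false;
  }
  return RESCAN_STATES.includes(event.detail.state);
}

const sampleEvent = {
  source: 'aws.ec2',
  'detail-type': 'EC2 Instance State-change Notification',
  detail: { 'instance-id': 'i-0123456789abcdef0', state: 'running' }
};
console.log(shouldRescan(sampleEvent));
```

Because the scan is meant to be stateless and idempotent, rescanning on an irrelevant event is harmless; a pre-check like this only saves a few API calls.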

Allowing the EventBridge Rule to Invoke the Lambda

This final step grants permission for the EventBridge rule to call the Lambda function. This is normally done automatically by the console, but since we're using a template we have to do this step by hand:

  EventBridgeLambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt Lambda.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt EventBridgeRule.Arn

Lambda Function

When our NAT instances are created or destroyed, they won't automatically be added to or taken out of routing tables - for that we have to catch EventBridge events and make the changes in a Lambda function. We chose to write the function in Node/TypeScript because it's easy to adjust and appears to be the lingua franca of Lambda functions, but any supported language will work.

The full function is available here. This is a walk-through of the function logic:

Getting a NAT Instance for Each Availability Zone

Our function gathers a list of NAT instances in the VPC we care about, and what availability zone each of them is running in. Its return value is a mapping keyed on availability zone, where each availability zone has a single NAT instance selected. We take care that even if two instances are out there, we select the same NAT instance every time - otherwise our result could flip back and forth every time the function is executed.

We find the NAT instances by the instance state being running and the presence of a particular tag - in our case the tag teaglu:app will be set to the value nat. You can change this to suit your environment and tagging strategy. In our case we use the teaglu:app tag to separate expenses by product or application, and using nat fits right in by creating a pseudo-application named "nat".

The application also gathers whether or not an Elastic IP is connected to the NAT instance, so that can be fixed later on. The instances will be assigned a random IP allocation when they are created, but just to be good neighbors we keep elastic IPs with a reverse DNS set under our domain name.

If the function sees anything weird about the NAT instance - for example it having more than one network interface - it logs the issue but ignores the NAT instance. If anything doesn't match the conditions the function was built to handle, the function should ignore it.

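// AWS SDK for JavaScript v3 imports used by the functions in this walk-through
import {
  EC2Client,
  DescribeInstancesCommand,
  DescribeAddressesCommand,
  DescribeRouteTablesCommand,
  ModifyNetworkInterfaceAttributeCommand,
  AssociateAddressCommand
} from "@aws-sdk/client-ec2";
import { SNSClient } from "@aws-sdk/client-sns";

const getNatInstances= async (ec2Client, vpcId, notifyEvents) => {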
  // Map of NAT instances by AZ to return
  var natInstances= {};

  // List of instance IDs to query for EIPs
  var instanceIds= [];

  // Map of NAT instances by instance ID for updating them with EIPs
  var instanceIndex= {};
  
  // Get a list of running EC2 instances in our VPC with the NAT application tag
  const ec2ListResponse= await ec2Client.send(new DescribeInstancesCommand(
    {
      Filters: [
        {
          Name: "vpc-id",
          Values: [ vpcId ]
        },
        {
          Name: "instance-state-name",
          Values: ['running']
        },
        {
          Name: 'tag:teaglu:app',
          Values: ['nat']
        }
      ]
    }));

  // I don't know why everything comes back under a reservation, but it does
  for (var reservationNo= 0; reservationNo < ec2ListResponse.Reservations.length; reservationNo++) {
    const reservation= ec2ListResponse.Reservations[reservationNo];

    for (var instanceNo= 0; instanceNo < reservation.Instances.length; instanceNo++) {
      var instance= reservation.Instances[instanceNo];

      var networkInterfaceId= null;

      var ignore= false;
      var sourceDestCheck= false;

      // Loop through attached NICs      
      for (var nicNo= 0; nicNo < instance.NetworkInterfaces.length; nicNo++) {
        var nic= instance.NetworkInterfaces[nicNo];
        
        if (!networkInterfaceId) {
          // Save this for later so we can fix it if need be.  Cloudformation
          // doesn't have a way to set this properly so it's our job
          sourceDestCheck= nic.SourceDestCheck;

          networkInterfaceId= nic.NetworkInterfaceId;
        } else {
          // We're not set up to properly deal with anything that has more
          // than one network interface, so ignore the instance.
          ignore= true;
          
          notifyEvents.push({
            event: 'natInstanceMisconfigured',
            instanceId: instance.InstanceId,
            message: 'Multiple Network Interfaces Found'
          });
        }
      }

      var availabilityZone= instance.Placement.AvailabilityZone;

      if (availabilityZone in natInstances) {
        // Only keep the one with the lowest instance ID.  The record stores
        // the ID under InstanceId, so compare against that property.
        if (natInstances[availabilityZone].InstanceId < instance.InstanceId) {
          ignore= true;
        }
        
        notifyEvents.push({
          event: 'multipleNatInstances',
          instanceId: instance.InstanceId,
          message: 'Multiple NAT Instances Found'
        });
      }

      if (!ignore) {
        const record= {
          InstanceId: instance.InstanceId,
          PrivateIpAddress: instance.PrivateIpAddress,
          NetworkInterfaceId: networkInterfaceId,
          SourceDestCheck: sourceDestCheck,
          AllocationId: null
        };

        // Add to return map        
        natInstances[availabilityZone]= record;
        
        // Add to structures for EIP hunting
        instanceIndex[instance.InstanceId]= record;
        instanceIds.push(instance.InstanceId);
      }
    }
  }

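  // With no NAT instances selected there is nothing to look up
  if (instanceIds.length === 0) {
    return natInstances;
  }

  // Look up all the EIP records to see if any are assigned to our instances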
  const eipResponse= await ec2Client.send(new DescribeAddressesCommand({
    Filters: [
      {
        Name: "instance-id",
        Values: instanceIds
      }
    ]
  }));

  // Go through EIP results and attach to instances  
  for (var addressNo= 0; addressNo < eipResponse.Addresses.length; addressNo++) {
    var address= eipResponse.Addresses[addressNo];
    
    if (address.InstanceId in instanceIndex) {
      instanceIndex[address.InstanceId].AllocationId= address.AllocationId;
    }
  }

  return natInstances;
};
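
The "same instance every time" rule is easier to see in isolation. The sketch below pulls just that selection step out into a pure function over a simplified, hypothetical record shape ({ instanceId, availabilityZone }); it is not part of the actual function, only an illustration of the lowest-instance-ID tie-break.

```javascript
// Pick one NAT instance per availability zone, always the lowest instance ID.
// The input shape is a simplified stand-in for the DescribeInstances output.
function pickNatPerZone(instances) {
  const selected = {};
  for (const instance of instances) {
    const current = selected[instance.availabilityZone];
    // String comparison is enough: we only need a stable ordering, not a
    // meaningful one, so the result can't flap between runs.
    if (!current || instance.instanceId < current.instanceId) {
      selected[instance.availabilityZone] = instance;
    }
  }
  return selected;
}

const candidates = [
  { instanceId: 'i-0bbb', availabilityZone: 'us-east-1a' },
  { instanceId: 'i-0aaa', availabilityZone: 'us-east-1a' },
  { instanceId: 'i-0ccc', availabilityZone: 'us-east-1b' }
];

// The selection doesn't depend on the order the API returns instances in.
console.log(pickNatPerZone(candidates)['us-east-1a'].instanceId);
console.log(pickNatPerZone([...candidates].reverse())['us-east-1a'].instanceId);
```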

Reading the Route Tables to see What's What

The job of this function is to read through every route table in a given VPC that we might need to adjust. Each route table returned doesn't include the full list of routes, it just includes where the current default route points. Like the previous function, if we see anything in the default route that we don't recognize, we want to log it but otherwise do nothing. For instance, if the default route pointed to a transit gateway, then the route table is not something this function should mess with.

// Get a list of route tables we want to evaluate
const getRouteTables= async (ec2Client, vpcId, subnets, notifyEvents) => {
  var routeTables= [];
  
  const response= await ec2Client.send(new DescribeRouteTablesCommand(
    {
      Filters: [
        {
          Name: "vpc-id",
          Values: [ vpcId ]
        }
      ]
    }));
  
  for (var tableNo= 0; tableNo < response.RouteTables.length; tableNo++) {
    var table= response.RouteTables[tableNo];

    // Build a list of availability zones the route table applies to.  Private
    // subnets normally only apply to one zone so they can point to a local
    // NAT instance / gateway, but maybe there's a case.
    var availabilityZones= [];
    for (var associationNo= 0; associationNo < table.Associations.length; associationNo++) {
      var association= table.Associations[associationNo];
      
      if (association.SubnetId in subnets) {
        var availabilityZone= subnets[association.SubnetId].AvailabilityZone;
        
        if (!availabilityZones.includes(availabilityZone)) {
          availabilityZones.push(availabilityZone);
        }
      }
    }
    
    // If no NAT instance is available in the same availability zone, then
    // later code will select the first one.  This makes that code be
    // deterministic.
    availabilityZones.sort();

    // The default route ENI
    var defaultRoute= null;
    
    // We query for subnets by whether IPs are automatically assigned, but
    // I've seen that set both ways.  If we see an IGW by definition this is
    // a public route table.
    var isPublic= false;
    
    // If a default route exists that has something we're not built to deal
    // with like a Transit gateway, then we don't want to change this route
    // table.  We're looking for no default gateway or an ENI to a NAT
    // instance
    var isOther= false;

    // Loop through routes to find the default gateway  
    for (var routeNo= 0; routeNo < table.Routes.length; routeNo++) {
      var route= table.Routes[routeNo];
      if (route.DestinationCidrBlock === '0.0.0.0/0') {
        if (route.GatewayId) {
          isPublic= true;
        } else if (route.EgressOnlyInternetGatewayId) {
          isOther= true;
        } else if (route.NatGatewayId) {
          isOther= true;
        } else if (route.TransitGatewayId) {
          isOther= true;
        } else if (route.LocalGatewayId) {
          isOther= true;
        } else if (route.CarrierGatewayId) {
          isOther= true;
        } else if (route.VpcPeeringConnectionId) {
          isOther= true;
        } else if (route.NetworkInterfaceId) {
          defaultRoute= route.NetworkInterfaceId;
        }
      }
    }
    
    if (!isPublic && !isOther && (availabilityZones.length > 0)) {
      routeTables.push({
        RouteTableId: table.RouteTableId,
        AvailabilityZones: availabilityZones,
        NetworkInterfaceId: defaultRoute
      });
    }
    
    if (isOther) {
      // We're not expecting to find these, so notify about it
      notifyEvents.push({
        event: 'otherDefaultRoute',
        instanceId: table.RouteTableId,
        message: 'Non-handled Default Route Found'
      });
    }
  }
  
  return routeTables;
};
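
The chain of checks on the default route can also be read as a small classification step. The sketch below restates it as a table-driven pure function; the name classifyDefaultRoute and the result labels are made up for illustration. It differs from the function above in one deliberate way: a default route whose target isn't recognized at all is reported as "other" rather than treated as empty, on the principle that anything unexpected should be left alone.

```javascript
// Route target fields that mean something else owns this default route.
const UNMANAGED_TARGETS = [
  'EgressOnlyInternetGatewayId',
  'NatGatewayId',
  'TransitGatewayId',
  'LocalGatewayId',
  'CarrierGatewayId',
  'VpcPeeringConnectionId'
];

// Classify where a route table's IPv4 default route points.
function classifyDefaultRoute(routes) {
  const route = routes.find((r) => r.DestinationCidrBlock === '0.0.0.0/0');
  if (!route) {
    return { kind: 'none', eni: null };         // no default route yet
  }
  if (route.GatewayId) {
    return { kind: 'public', eni: null };       // gateway target: public table
  }
  if (UNMANAGED_TARGETS.some((field) => route[field])) {
    return { kind: 'other', eni: null };        // managed by something else
  }
  if (route.NetworkInterfaceId) {
    return { kind: 'natInstance', eni: route.NetworkInterfaceId };
  }
  return { kind: 'other', eni: null };          // unrecognized: hands off
}

console.log(classifyDefaultRoute([
  { DestinationCidrBlock: '10.0.0.0/16', GatewayId: 'local' },
  { DestinationCidrBlock: '0.0.0.0/0', NetworkInterfaceId: 'eni-0123' }
]));
```

Only tables classified as "none" or "natInstance" would be candidates for a route update.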

Putting it All Together

This function scans a VPC and adjusts it to meet our goals. We want this function to be stateless - it should only look at the current state of things, then update the configuration to match the desired state of things. It should also be idempotent - running it more times than needed shouldn't change the outcome.

In addition to changing the default routes, it also does a few housekeeping things that are just impractical to do from the CloudFormation template:

The first housekeeping task is disabling source/destination filtering. By default all EC2 instances are restricted from sending from IP addresses they don't own, and receiving packets sent to an IP they don't own. Normally this is a good thing because it prevents instances circumventing security groups, but for this specific function it prevents the exact functionality we're trying to create.

The second housekeeping task is assigning the correct elastic IP address. If you don't care about your outgoing IP address, you may want to remove that section of code.
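
One way to check the idempotency claim is to separate planning from doing. The sketch below is a simplified, hypothetical model of the two housekeeping tasks: planHousekeeping only looks at the current state and returns a list of action descriptors, and applyPlan simulates their effect in memory. Neither name comes from the actual function, and no AWS calls are made.

```javascript
// Work out which housekeeping actions the current state still needs.
function planHousekeeping(natInstances, eipMap) {
  const actions = [];
  for (const [zone, instance] of Object.entries(natInstances)) {
    if (instance.SourceDestCheck) {
      actions.push({ type: 'disableSourceDestCheck', instanceId: instance.InstanceId });
    }
    if (!instance.AllocationId && eipMap[zone]) {
      actions.push({
        type: 'associateAddress',
        instanceId: instance.InstanceId,
        allocationId: eipMap[zone]
      });
    }
  }
  return actions;
}

// Simulation only: return a copy of the state with the actions applied.
function applyPlan(natInstances, actions) {
  const next = JSON.parse(JSON.stringify(natInstances));
  for (const action of actions) {
    const instance = Object.values(next).find((i) => i.InstanceId === action.instanceId);
    if (action.type === 'disableSourceDestCheck') {
      instance.SourceDestCheck = false;
    } else if (action.type === 'associateAddress') {
      instance.AllocationId = action.allocationId;
    }
  }
  return next;
}

const initialState = {
  'us-east-1a': { InstanceId: 'i-0aaa', SourceDestCheck: true, AllocationId: null }
};
const sampleEipMap = { 'us-east-1a': 'eipalloc-0123' };

const firstPlan = planHousekeeping(initialState, sampleEipMap);
const secondPlan = planHousekeeping(applyPlan(initialState, firstPlan), sampleEipMap);
console.log(firstPlan.length, secondPlan.length);
```

The second plan comes back empty because every action is derived from what is still wrong with the current state, which is the property that makes repeated invocations safe.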

// Scan a VPC to see if the nat instances are set up correctly, and fix the
// route tables if they aren't correct
export const scanVpc = async (region, vpcId, eipMap, notifyTopic) => {
  // Create client objects for EC2 and SNS
  const ec2Client= new EC2Client({ region: region });
  const snsClient= new SNSClient({ region: region });

  // A list of events to notify admins about
  var notifyEvents= [];

  // Gather information
  const natInstances= await getNatInstances(ec2Client, vpcId, notifyEvents);
  const subnets= await getSubnets(ec2Client, vpcId);
  
  const routeTables= await getRouteTables(ec2Client, vpcId, subnets, notifyEvents);

  // A list of actions to take
  var actions= [];

  // Loop through the selected NAT instances for housekeeping issues
  for (var availabilityZone in natInstances) {
    var instance= natInstances[availabilityZone];

    // Make sure that source/destination check is disabled
    if (instance.SourceDestCheck) {
      actions.push({
        event: {
          event: 'disableSourceDestCheck',
          instanceId: instance.InstanceId,
          availabilityZone: availabilityZone
        },
        
        command: new ModifyNetworkInterfaceAttributeCommand({
          NetworkInterfaceId: instance.NetworkInterfaceId,
          SourceDestCheck: {
            // The API expects a boolean here, not the string "false"
            Value: false
          }
        })
      });
    }

    // If it doesn't have an EIP assigned, see if it's supposed to have one
    if (!instance.AllocationId) {
      const allocationId= eipMap[availabilityZone];
      if (allocationId) {
        actions.push({
          event: {
            event: 'associateAddress',
            instanceId: instance.InstanceId,
            allocationId: allocationId
          },
          
          command: new AssociateAddressCommand({
            AllocationId: allocationId,
            InstanceId: instance.InstanceId
          })
        });
      }
    }
  }

  // Loop through each table and look for adjustments  
  for (var tableNo= 0; tableNo < routeTables.length; tableNo++) {
    var table= routeTables[tableNo];

    // The current NAT instance
    var currentEni= table.NetworkInterfaceId;
    
    // The NAT instance we want
    var desiredEni= null;
    var desiredAvailabilityZone= null;

    // This is a weird case that probably won't show up.  If the routing
    // table is applied to subnets in multiple availability zones, and
    // we've already selected a NAT instance in a zone other than the
    // first one, we want to leave it be.  There's not really a right
    // answer and we don't want it to flap back and forth
    if (currentEni) {
      for (var zoneNo= 0; zoneNo < table.AvailabilityZones.length; zoneNo++) {
        var availabilityZone= table.AvailabilityZones[zoneNo];
        
        if (availabilityZone in natInstances) {
          if (natInstances[availabilityZone].NetworkInterfaceId === currentEni) {
            desiredEni= currentEni;
            desiredAvailabilityZone= availabilityZone;
          }
        }
      }
    }
    
    // Call out cross-zone routes in the title so hopefully somebody notices
    var isCrossZone= false;
    
    if (!desiredEni) {
      // If there's nothing already selected, use the first zone.
      var availabilityZone= table.AvailabilityZones[0];
      if (availabilityZone in natInstances) {
        desiredEni= natInstances[availabilityZone].NetworkInterfaceId;
        desiredAvailabilityZone= availabilityZone;
      } else {
        // There's no NAT instance in the same availability zone, so pick one
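        // Sort the zones so the choice below doesn't depend on the order
        // the instances happened to be returned in
        var keys= Object.keys(natInstances).sort();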
        if (keys.length > 0) {
          // We don't want to always pick the first one, but it needs to be
          // deterministic so we don't flap.  This parses the hex part of
          // the route table ID (after the "rtb-" prefix) as a number and
          // takes it mod the number of options.
          var index= parseInt(table.RouteTableId.substring(4), 16) % keys.length;
          
          desiredEni= natInstances[keys[index]].NetworkInterfaceId;
          desiredAvailabilityZone= keys[index];
          
          // Flag this for attention, because this could run up inter-AZ
          // charges
          isCrossZone= true;
        }
      }
    }
    
    if (currentEni !== desiredEni) {
      if (currentEni) {
        // Remove the current default route
        actions.push({
          event: {
            event: 'removeDefaultRoute',
            routeTable: table.RouteTableId,
            sourceZones: table.AvailabilityZones
          },

          command: new DeleteRouteCommand({
            RouteTableId: table.RouteTableId,
            DestinationCidrBlock: '0.0.0.0/0'
          })
        });
      }
      
      if (desiredEni) {
        // Add the desired default route
        actions.push({
          event: {
            event: 'createDefaultRoute',
            routeTable: table.RouteTableId,
            crossZone: isCrossZone,
            sourceZones: table.AvailabilityZones,
            destinationZone: desiredAvailabilityZone
          },
          
          command: new CreateRouteCommand({
            RouteTableId: table.RouteTableId,
            DestinationCidrBlock: '0.0.0.0/0',
            NetworkInterfaceId: desiredEni
          })
        });
      }
    }
  }

  var error= false;
  // Loop through actions performing each one, but stop if there's an error
  for (var actionNo= 0; !error && (actionNo < actions.length); actionNo++) {
    var action= actions[actionNo];
    
    var event= action.event;
    
    try {
      await ec2Client.send(action.command);
      event.success= true;
    } catch (actionException) {
      event.success= false;
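      // Error.message isn't enumerable, so JSON.stringify would drop it.
      // Copy the fields we care about into a plain object instead.
      event.exception= {
        name: actionException.name,
        message: actionException.message
      };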
      
      error= true;
    }
    
    notifyEvents.push(event);
  }
    
  if (notifyEvents.length > 0) {
    await snsClient.send(
      new PublishCommand({
        Subject: !error ? 'NAT Instance Updates' : 'NAT Instance Updates Failed',
        Message: JSON.stringify({
          event: 'natGatewayUpdates',
          success: !error,
          events: notifyEvents
        }),
        TopicArn: notifyTopic
      }));
  }

  return {
    success: !error,
    events: notifyEvents
  };
};
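
The cross-zone fallback is the least obvious part of the scan, so here it is pulled out as a standalone sketch. The function name pickFallbackZone is hypothetical. This version also differs from the code above in two ways, both of which are my assumptions rather than the original behaviour: it sorts the zone names first so the result doesn't depend on input order, and it uses BigInt instead of parseInt because the hex part of a 17-digit route table ID doesn't fit exactly in a JavaScript number.

```javascript
// Hypothetical standalone version of the fallback choice: map a route
// table ID onto one of the available zones in a repeatable way.
function pickFallbackZone(routeTableId, zones) {
  if (zones.length === 0) {
    return null; // no NAT instances anywhere, nothing to pick
  }

  // Sort a copy so the caller's ordering doesn't affect the result
  const sorted = [...zones].sort();

  // Strip the "rtb-" prefix and treat the rest as a hex number
  const hex = routeTableId.replace(/^rtb-/, '');
  const index = Number(BigInt('0x' + hex) % BigInt(sorted.length));

  return sorted[index];
}

// The route table IDs here are shortened, made-up examples
console.log(pickFallbackZone('rtb-0a', ['us-east-1b', 'us-east-1a'])); // prints "us-east-1a"
console.log(pickFallbackZone('rtb-0b', ['us-east-1b', 'us-east-1a'])); // prints "us-east-1b"
```

Because the choice depends only on the route table ID and the set of zones, repeated scans pick the same NAT instance, and different route tables tend to spread across the surviving instances instead of all piling onto the first one.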

The Lambda Entry Point

The final function is exported from the Lambda function as handler - this is where code execution starts when the Lambda function is called. The Lambda is normally started by an EventBridge rule, but we also want to allow someone to just pick the Lambda and press Run if they want to - that's very useful when debugging.

EventBridge doesn't have a way to filter EC2 instances by tag - every time any EC2 instance changes state we're going to be called. The main job of the handler is to figure out whether we need to scan or not. If it sees what looks like the signature of an EventBridge event, it looks at the tags of the instance to see if we're interested. If the signature doesn't match, it assumes it's a manual run.

// Main entry routine
export const handler = async (event) => {
  var ignore= false;

  // If this is an EventBridge rule, pull the tags to see if we should
  // care about it.
  if (event.detail && event.detail['instance-id']) {
    var instanceTags= await getInstanceTags(REGION, event.detail['instance-id']);
    if (instanceTags['teaglu:app'] !== 'nat') {
      ignore= true;
    }
  }

  if (!ignore) {  
    return await scanVpc(REGION, VPC, EIPMAP, NOTIFY_TOPIC);
  } else {
    return {};
  }
};
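
For reference, this is roughly what the signature check is looking at. The sample below follows the shape of an EC2 Instance State-change Notification, with made-up IDs and most fields left out. The helper instanceIdFromEvent is a hypothetical, slightly stricter version of the check in the handler - it also looks at the source field, which the handler above doesn't do.

```javascript
// Trimmed example of the event EventBridge delivers when an instance
// changes state.  The instance ID and region are made up.
const sampleEvent = {
  version: '0',
  'detail-type': 'EC2 Instance State-change Notification',
  source: 'aws.ec2',
  region: 'us-east-1',
  detail: {
    'instance-id': 'i-0123456789abcdef0',
    state: 'running'
  }
};

// Hypothetical helper: return the instance ID if this looks like an EC2
// event from EventBridge, or null if it looks like a manual run.
function instanceIdFromEvent(event) {
  if (event
      && event.source === 'aws.ec2'
      && event.detail
      && typeof event.detail['instance-id'] === 'string') {
    return event.detail['instance-id'];
  }
  return null;
}

console.log(instanceIdFromEvent(sampleEvent)); // prints "i-0123456789abcdef0"
console.log(instanceIdFromEvent({}));          // prints "null"
```

A manual run from the console normally passes an empty or unrelated test payload, so it falls through to the null case and triggers a full scan.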

Share and Enjoy

The template and function linked in this article are released as CC0, so use them or any part of them however you want.

© 2024 Teaglu, LLC