Today I launched a CloudFormation template containing an EBS volume that I wanted encrypted with an AWS Key Management Service (KMS) Customer Managed Key (CMK).
The resource in the template looked like this:
"EncryptedVolume": {
  "Type": "AWS::EC2::Volume",
  "DeletionPolicy": "Snapshot",
  "Properties": {
    "AvailabilityZone": "eu-west-1a",
    "Encrypted": true,
    "KmsKeyId": "arn:aws:kms:eu-west-1:123456780123:key/blah-blah",
    "Size": 64,
    "VolumeType": "gp2"
  }
}
I created the stack containing this resource and waited for the resource to create.
It seemed to take a while, and when a CloudFormation resource takes a long time to create, you can bet your bottom dollar that it's not a good thing. Half an hour later, my stack entered the ROLLBACK_COMPLETE state. My only hints:
- EncryptedVolume: Volume vol-12345678abcd1234 is still creating
- CREATE_FAILED: Resource creation cancelled
- The following resource(s) failed to create: EncryptedVolume. Rollback requested by user.
What happened? It wasn't clear. CloudTrail showed that the ec2:CreateVolume call had definitely fired from CloudFormation, and a volume ID had even been returned. But searching for that ID in the EC2 -> Volumes console turned up nothing. It was as if the volume didn't exist.
I've found in the past that AWS services can act really weird when they're asked to use a KMS key that they don't have permission to use. For example, if you ask an Auto Scaling group to launch instances with boot volumes encrypted by an AWS KMS CMK, you're gonna have a bad time unless you configure your key policy properly. But that case at least is relatively well documented in troubleshooting documentation that you can find by searching for the instance launch failure.
This time around, I didn't even have an error to work with. But while searching for something completely unrelated (volume deletion policies, actually), I happily and quite accidentally found some relevant-looking information in the AWS Cloud Development Kit (CDK) EC2 Volume docs.
Turns out that for EC2 to create an EBS Volume encrypted with a CMK, the principal creating the volume has to have permission to call the following policy actions (note the conditions required to lock these permissions down using the principle of least privilege):
{
  ...
  "Action": [
    "kms:DescribeKey",
    "kms:GenerateDataKeyWithoutPlaintext"
  ],
  "Condition": {
    "StringEquals": {
      "kms:ViaService": "ec2.<aws-region>.amazonaws.com",
      "kms:CallerAccount": "<aws-account-id>"
    }
  }
  ...
}
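To make the placeholders concrete, here is a minimal Python sketch that fills in the fragment above for a given region and account. The function name is hypothetical, and the Effect, Principal and Resource values are my assumptions, since the fragment elides them. I've used the spelling kms:GenerateDataKeyWithoutPlaintext from the KMS API reference (IAM matches action names case-insensitively, so either casing is accepted).

```python
import json


def ebs_key_policy_statement(region, account_id):
    """Build a key policy statement for creating CMK-encrypted EBS volumes.

    Hypothetical helper: only Action and Condition come from the fragment above.
    """
    return {
        # Assumed values; the original fragment elides these fields.
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
        "Resource": "*",
        "Action": [
            "kms:DescribeKey",
            "kms:GenerateDataKeyWithoutPlaintext",
        ],
        "Condition": {
            "StringEquals": {
                # Only allow key use when the request arrives via EC2 in this region...
                "kms:ViaService": f"ec2.{region}.amazonaws.com",
                # ...and the caller belongs to this account.
                "kms:CallerAccount": account_id,
            }
        },
    }


if __name__ == "__main__":
    # Print JSON that could be pasted into the key policy's Statement list.
    print(json.dumps(ebs_key_policy_statement("eu-west-1", "123456789012"), indent=2))
```

Generating the statement this way also sidesteps hand-editing mistakes such as a stray trailing comma, because json.dumps always emits valid JSON.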
I definitely had the IAM permissions in place to do this, as I was running as a user with the AdministratorAccess policy attached. However, I remembered that in order to allow IAM to work on a KMS CMK, the CMK's key policy had to be configured to grant those actions to the root principal of the account that the key was created in.
That said, I'd created my KMS CMK via the CDK, which I expected to generate a sensible, out-of-the-box key policy allowing the key to be used by resources via IAM.
So I checked the key policy on the CMK. It turns out CDK had created the following:
{
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::123456789012:root"
  },
  "Action": [
    "kms:Create*",
    "kms:Describe*",
    "kms:Enable*",
    "kms:List*",
    "kms:Put*",
    "kms:Update*",
    "kms:Revoke*",
    "kms:Disable*",
    "kms:Get*",
    "kms:Delete*",
    "kms:ScheduleKeyDeletion",
    "kms:CancelKeyDeletion",
    "kms:GenerateDataKey",
    "kms:TagResource",
    "kms:UntagResource"
  ],
  "Resource": "*"
}
Sure enough, the key policy was missing the kms:GenerateDataKeyWithoutPlaintext action (kms:GenerateDataKey on its own doesn't cover it, and kms:DescribeKey was already covered by kms:Describe*). I followed the documentation I had found and added the relevant policy action, with the conditions shown earlier in this post. After that, recreating the stack gave me a freshly minted, encrypted volume. Problem solved.
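Spotting a gap like this by eye is easy to get wrong, so here is a rough Python sketch of the comparison I did by hand. The function name is hypothetical, and it is deliberately simplistic: it only looks at the Action lists of Allow statements and ignores principals, conditions and Deny statements. It leans on fnmatch for the * wildcard and lowercases everything because IAM action names are case-insensitive.

```python
from fnmatch import fnmatchcase

# The two actions needed to create a CMK-encrypted EBS volume.
REQUIRED_ACTIONS = ["kms:DescribeKey", "kms:GenerateDataKeyWithoutPlaintext"]

# The Action list from the CDK-generated key policy statement shown above.
cdk_statement = {
    "Effect": "Allow",
    "Action": [
        "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*",
        "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*",
        "kms:Get*", "kms:Delete*", "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion", "kms:GenerateDataKey",
        "kms:TagResource", "kms:UntagResource",
    ],
}


def missing_actions(statements, required=REQUIRED_ACTIONS):
    """Return the required actions that no Allow statement's Action list matches."""
    granted = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        # "Action" may be a single string or a list of strings.
        granted.extend([actions] if isinstance(actions, str) else actions)
    # Compare case-insensitively; "*" in a granted pattern matches any characters.
    return [
        need for need in required
        if not any(fnmatchcase(need.lower(), have.lower()) for have in granted)
    ]


print(missing_actions([cdk_statement]))
# prints ['kms:GenerateDataKeyWithoutPlaintext']
```

A real check would also need to evaluate the statement's Principal and Condition, so treat an empty result as "worth a closer look" rather than proof that the key policy is right.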
I'm hoping that in the future, AWS can make this fail faster (rather than sitting in CREATE_IN_PROGRESS for half an hour) and report the error better (even a searchable string would help!).