AWS Cloud-Init lab

Cloud-init in AWS

This explores how metadata and cloud-init user-data work in AWS.

Work in your normal groups on your groupXY-web virtual machine - but everyone in the group can do these exercises at the same time if you like.

Log in to your AWS instance

Get a console onto your AWS virtual machine and log in so that you have a shell like this:

ubuntu@ip-10-30-0-91:~$

Examine metadata

Enter the following command:

curl -v http://169.254.169.254/

The response should include:

HTTP/1.1 401 Unauthorized

This means you first need to get a token. Enter these commands:

TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

echo $TOKEN

The output should be a long random string. This is your session token, valid for 21600 seconds (6 hours) as requested in the header, and it authenticates your subsequent metadata requests.

Now you can repeat the query, but supplying the token in a special header:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/

The response should include a list of API versions, starting like this:

1.0
2007-01-19
2007-03-01
2007-08-29
...
latest

We’ll use “latest” to get the most up-to-date API.

See what metadata is available:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

The response should start like this:

ami-id
ami-launch-index
ami-manifest-path
...

These are the different types of metadata available. For example, you can find the hostname of this instance:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/hostname; echo

(The echo at the end adds a line break; otherwise the response and the next shell prompt appear on the same line.)
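The token-then-query pattern used above can be wrapped in a pair of shell functions. This is only a minimal sketch: the names IMDS, imds_token and imds_get are made up for this lab and are not part of any AWS tool. It assumes curl is installed, as in the earlier commands.

```shell
# Base URL of the instance metadata service, as used throughout this lab.
IMDS=http://169.254.169.254

# imds_token: request a session token valid for 6 hours (21600 seconds).
imds_token() {
    curl -s -X PUT "$IMDS/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

# imds_get PATH: fetch one item below /latest/, e.g. meta-data/hostname.
# -f makes curl exit non-zero on HTTP errors such as 401 or 404;
# the trailing echo adds the line break only when the fetch succeeded.
imds_get() {
    curl -sf -H "X-aws-ec2-metadata-token: $(imds_token)" "$IMDS/latest/$1" && echo
}
```

After pasting these definitions into your shell, `imds_get meta-data/hostname` should print the same hostname as before, and `imds_get meta-data/` should list the available items. Fetching a fresh token on every call is wasteful but keeps the sketch short; a token can be reused until its TTL expires.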

Cloud-init collects this metadata and turns it into structured “instance-data”. You can read it like this:

cat /run/cloud-init/instance-data.json

And there is a tool to extract individual items, e.g.

cloud-init query v1.cloud_name
cloud-init query v1.region

Cloud-init user-data

Cloud-init user-data is retrieved from a specific URL:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/user-data

However, when you run this, the response you are likely to get is “404 - Not Found”. This is because you haven’t set any user-data on this instance.

Find your instance in the AWS console, select it, then select Actions > Instance Settings > Edit user data.

If your instance is running, the console will tell you that user data can’t be edited while the instance is running, so stop it first (Instance state > Stop instance). Then go back to editing user data.

Create some new user data in the box marked “Modify user data as text”:

#cloud-config
packages:
- apache2

Then click Save, start your instance again, and connect to its console.

Try again to fetch user-data:

TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/user-data; echo

You should see the user-data that you created. You can also check whether the task of installing apache2 at boot was performed:

cloud-init status --wait
grep apache2 /var/log/cloud-init.log
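The log shows what cloud-init attempted; it can also help to ask the package manager directly. The following is a small sketch, assuming a Debian or Ubuntu image where dpkg-query is available. The function name pkg_installed is made up for illustration.

```shell
# pkg_installed NAME: succeed only if dpkg reports the package as fully installed.
pkg_installed() {
    status=$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}

if pkg_installed apache2; then
    echo "apache2 is installed"
else
    echo "apache2 is not installed"
fi
```

If this reports that apache2 is not installed, compare it with what the grep on the log file showed, to see whether cloud-init acted on the package list at this boot.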

You should find a copy of the user-data here:

cat /var/lib/cloud/instance/user-data.txt

OPTIONAL: SSH keys via instance metadata

The AWS console “instance connect” allows you to log in to your instance without supplying any password or SSH key. How does it do it?

There is some magic involved, which you don’t need to understand, but it’s interesting to see.

AWS holds a key pair for instance connect and never exposes the private key to you. The public key is exposed at a specific URL as part of instance metadata. For example, for the “ubuntu” user it’s:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/managed-ssh-keys/active-keys/ubuntu/

(If that doesn’t work, fetch a TOKEN as before and try again.)

The magic is how AWS has customized the VM image that you booted from to trust this key and allow logins.

Look at how sshd is running:

ps auxwww | grep ssh

You should see that it has been invoked with some extra flags:

# The output from "ps" should include the following on one long line:
/usr/sbin/sshd -D \
  -o AuthorizedKeysCommand /usr/share/ec2-instance-connect/eic_run_authorized_keys %u %f \
  -o AuthorizedKeysCommandUser ec2-instance-connect

These come from an override file:

cat /lib/systemd/system/ssh.service.d/ec2-instance-connect.conf 

This means that, as well as looking for public keys in the usual places such as /home/ubuntu/.ssh/authorized_keys, sshd also runs that script at login time to pick up additional authorized keys.

You can inspect that script. It in turn runs /usr/share/ec2-instance-connect/eic_curl_authorized_keys, a longer script that ultimately just runs a curl command like the one you ran above.

As a result, EC2 can control which public keys are allowed to log in to the instance, and change them frequently for security reasons, without having to upload any files to your instance.

OPTIONAL: role-based tokens via instance metadata

The metadata mechanism is also used to provide tokens to EC2 instances, so you can now understand how the “aws” CLI tool picked up its token. (Note that this is not part of cloud-init.)

List the role:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/; echo

You should see the role assigned to your instance, e.g. groupXY-awscli.

You can fetch a token by naming the role at the end of the URL path:

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/groupXY-awscli

The returned token is time-limited and can be used to authenticate subsequent calls to the AWS public API.

The logic to fetch tokens this way is built into the botocore Python library, which the aws CLI uses. Applications you write would typically use a library like this too.
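To make this concrete, the sketch below does by hand roughly what such a library does for you: it pulls three fields out of the JSON response and exports them as the environment variables that the AWS CLI and SDKs read. The helper names json_field and load_role_credentials are made up for this lab. The sed-based extraction is deliberately naive and assumes that the response has string fields named AccessKeyId, SecretAccessKey and Token, with no escaped quotes in their values.

```shell
# json_field NAME: print the string value of field NAME from JSON on stdin.
# Naive on purpose: assumes the value contains no escaped double quotes.
json_field() {
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# load_role_credentials JSON: export the credential fields from the JSON text
# given as the first argument, under the names the AWS CLI and SDKs look for.
load_role_credentials() {
    AWS_ACCESS_KEY_ID=$(printf '%s\n' "$1" | json_field AccessKeyId)
    AWS_SECRET_ACCESS_KEY=$(printf '%s\n' "$1" | json_field SecretAccessKey)
    AWS_SESSION_TOKEN=$(printf '%s\n' "$1" | json_field Token)
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}

# On the instance, using the role name listed by the earlier command:
# load_role_credentials "$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
#     http://169.254.169.254/latest/meta-data/iam/security-credentials/groupXY-awscli)"
# aws sts get-caller-identity
```

The JSON is passed as an argument rather than piped in, because a function at the end of a pipeline usually runs in a subshell and its exported variables would be lost. On an instance with a role attached you do not need any of this, since the aws CLI fetches the credentials itself.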

References