Get started with AWS and Python

When you start for the first (or even second) time with AWS, it is a bit tricky to get your head around all the nuts and bolts that need to be connected together. If, on top of this, you try to work with AWS in Beijing from outside China, the web GUI makes your work even harder because of slowness or outright timeouts.

This script sets up a full set of resources for you (VPC, route table, security group, subnet, internet gateway, instance, with the relevant associations and attachments) for easy testing or bootstrapping of your infrastructure.

It is mostly meant as a testing help, so it does not handle all the possible options, but I find it invaluable to get started. You just need the AWS basics (an account, credentials and a keypair), and it will do the rest for you. You need to provide a tag name (defaults to 'roles') and a value; all resources will be created and located via this tag, to allow for easy spawning and tearing down.
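
For the curious, the core trick is simply to tag everything at creation time and to filter on that tag afterwards. Here is a minimal sketch of the idea with boto3 (hypothetical and heavily trimmed; the actual script handles many more resource types plus the dry-run logic):

import boto3

# Create a VPC and tag it, so that it can be found back later by tag alone.
session = boto3.session.Session(profile_name='default')
ec2 = session.resource('ec2')

vpc = ec2.create_vpc(CidrBlock='10.0.42.0/28')
vpc.create_tags(Tags=[{'Key': 'roles', 'Value': 'testing'}])

# Tear-down can then locate all resources via the tag, without any state file.
for found in ec2.vpcs.filter(Filters=[{'Name': 'tag:roles', 'Values': ['testing']}]):
    print(found.id)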

usage: fullspawn.py3 [-h] [--tag TAG] [--up | --down] [--wet | --dry]
                     [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--cidr CIDR]
                     [--ami AMI] [--keypair KEYPAIR] [--profile PROFILE]
                     [--instance INSTANCE]
                     role

Spawns a full AWS self-contained infrastructure.

positional arguments:
  role                 Tag value used for marking and fetching resources.

optional arguments:
  -h, --help           show this help message and exit
  --tag TAG, -t TAG    Tag name used for marking and fetching resources.
                       (default: roles)
  --up, -u             Creates a full infra. (default: up)
  --down, -d           Destroys a full infra. (default: up)
  --wet, -w            Actually performs the action. (default: dry)
  --dry                Only shows what would be done, not doing anything.
                       (default: dry)
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                       Verbosity level. (default: WARNING)
  --cidr CIDR          The network range for the VPC, in CIDR notation. For
                       example, 10.0.0.0/16 (default: 10.0.42.0/28)
  --ami AMI            The AMI id for your instance. (default: ami-33734044)
  --keypair KEYPAIR    A keypair aws knows about. (default: yourkey)
  --profile PROFILE    Profile to use for credentials. Will use AWS_PROFILE
                       environment variable if set. (default: default)
  --instance INSTANCE  Instance type. (default: t2.micro)

For instance:

# Let's see what would happen when creating a full infra...
./fullspawn.py3 -t tag --up --dry testing
# Looks good, let's do it.
./fullspawn.py3 -t tag --up --wet testing
# Oops, this was a stupid tag name.
./fullspawn.py3 -t tag --down --wet testing

You probably want to have a look at some variables inside the script, which set a few defaults that might not be relevant for you. I am thinking of the AMI (AMI), the keypair (KEYPAIR) and the ingress rules (INGRESS), all defined before the argparse calls.
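
For illustration, they look something like this (the values are the shipped defaults; the exact INGRESS structure is my assumption here, check the script for the real format):

# Defaults defined near the top of fullspawn.py3, before the argparse calls.
AMI = 'ami-33734044'   # default AMI id, adapt to your region
KEYPAIR = 'yourkey'    # name of a keypair AWS already knows about
INGRESS = [            # ports opened in the security group (illustrative)
    ('tcp', 22),       # ssh
    ('tcp', 80),       # http
]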

The code is available on github.

Enjoy!

Vagrant, hostmanager, VirtualBox and AWS

No need to present Vagrant if you are reading this post. Hostmanager is a plugin which manages /etc/hosts on the host and the guests, to have common FQDNs and simplify connections to your boxes. You can for instance set up hostmanager to make sure that your web development VM is always accessible at web.local, or make sure that all machines in your virtual cluster can talk to each other with names like dev1.example.com, dev2.example.com and so on, no matter which IP addresses DHCP decided to hand out.

This is all well and good, but there are a few issues. For instance, with VirtualBox and DHCP, hostmanager only finds 127.0.0.1 as the IP for your VMs, which is not handy, to say the least.

If you use AWS, you would like the guests to use the private IP while the host uses the public IP, which is not possible by default.

Luckily, you can write your own IP resolver. The one I give here, which can also be found on github, solves the following issues:

  • Linux guests only (probably other Unices as well to be honest, but I cannot guarantee that hostname -I works on all flavours)
  • DHCP and VirtualBox
  • public/private IP with AWS
$cached_addresses = {}
# There is a bug when using virtualbox/dhcp which makes hostmanager not find
# the proper IP, only the loop one: https://github.com/smdahlen/vagrant-hostmanager/issues/86
# The following custom resolver (for linux guests) is a good workaround.
# Furthermore it handles aws private/public IP.

# A limitation (feature?) is that hostmanager only looks at the current provider.
# This means that if you `up` an aws vm, then a virtualbox vm, all aws ips
# will disappear from your host /etc/hosts.
# To prevent this, apply this patch to your hostmanager plugin (1.6.1), probably
# at $HOME/.vagrant.d/gems/gems or (hopefully) wait for newer versions.
# https://github.com/smdahlen/vagrant-hostmanager/pull/169
$ip_resolver = proc do |vm, resolving_vm|
  # For aws, we should use private IP on the guests, public IP on the host
  if vm.provider_name == :aws
    if resolving_vm.nil?
      used_name = vm.name.to_s + '--host'
    else
      used_name = vm.name.to_s + '--guest'
    end
  else
    used_name = vm.name.to_s
  end

  if $cached_addresses[used_name].nil?
    if hostname = (vm.ssh_info && vm.ssh_info[:host])

      # getting aws guest ip *for the host*, we want the public IP in that case.
      if vm.provider_name == :aws and resolving_vm.nil?
        vm.communicate.execute('curl http://169.254.169.254/latest/meta-data/public-ipv4') do |type, pubip|
          $cached_addresses[used_name] = pubip
        end
      else

        vm.communicate.execute('uname -o') do |type, uname|
          unless uname.downcase.include?('linux')
            warn("Guest for #{vm.name} (#{vm.provider_name}) is not Linux, hostmanager might not find an IP.")
          end
        end

        vm.communicate.execute('hostname --all-ip-addresses') do |type, hostname_i|
          # much easier (but less fun) to work in ruby than sed'ing or perl'ing from shell

          allips = hostname_i.strip.split(' ')
          if vm.provider_name == :virtualbox
            # 10.0.2.15 is the default virtualbox IP in NAT mode.
            allips = allips.select { |x| x != '10.0.2.15'}
          end

          if allips.empty?
            warn("Trying to find out ip for #{vm.name} (#{vm.provider_name}), found none useable: #{allips}.")
          else
            if allips.size > 1
              warn("Trying to find out ip for #{vm.name} (#{vm.provider_name}), found too many: #{allips} and I cannot choose cleverly. Will select the first one.")
            end
            $cached_addresses[used_name] = allips[0]
          end
        end
      end
    end
  end
  $cached_addresses[used_name]
end

Just put this code in your Vagrantfile, outside the Vagrant.configure block, and you can use it by assigning it like this:

config.hostmanager.ip_resolver = $ip_resolver

On a side note, and as explained in the comments, there is a limitation (feature?) in that hostmanager only looks at the current provider. This means that if you up an AWS VM, then a VirtualBox VM, all AWS IPs will disappear from your host /etc/hosts.

To prevent this, apply this patch to your hostmanager plugin (1.6.1), probably at $HOME/.vagrant.d/gems/gems, or (hopefully) wait for newer versions.
The patch itself is ridiculously tiny:

diff --git a/lib/vagrant-hostmanager/hosts_file/updater.rb b/lib/vagrant-hostmanager/hosts_file/updater.rb
index 9514508..ef469bf 100644
--- a/lib/vagrant-hostmanager/hosts_file/updater.rb
+++ b/lib/vagrant-hostmanager/hosts_file/updater.rb
@@ -82,8 +82,8 @@ module VagrantPlugins
 
         def update_content(file_content, resolving_machine, include_id)
           id = include_id ? " id: #{read_or_create_id}" : ""
-          header = "## vagrant-hostmanager-start#{id}\n"
-          footer = "## vagrant-hostmanager-end\n"
+          header = "## vagrant-hostmanager-start-#{@provider}#{id}\n"
+          footer = "## vagrant-hostmanager-end-#{@provider}\n"
           body = get_machines
             .map { |machine| get_hosts_file_entry(machine, resolving_machine) }
             .join

Restart MongoDB with not enough nodes in a replica set

The context could be a virtualised cluster where a hypervisor suddenly went down. Two of your Mongo replicas are unavailable; only one is left, which then of course drops back to being a read-only secondary.

You want to have this server running alone for a while, while the others come back online, as you decide that a potential small inconsistency is better than not running for a few hours. The thing is that this last server will complain that the rest of the set is not available. To get it started again, you just need to make it forget about the rest of the set.

  1. Switch the service off
    service mongodb stop
  2. Remove the replSet line from your /etc/mongodb.conf
  3. Restart the service
    service mongodb start

    Mongo will complain:

    mongod started without --replSet yet 1 documents are present in local.system.replset
     [initandlisten] ** Restart with --replSet unless you are doing maintenance and no other clients are connected.
     [initandlisten] ** The TTL collection monitor will not start because of this.
     [initandlisten] ** For more info see http://dochub.mongodb.org/core/ttlcollections
  4. Remove the offending document in system.replset from the mongo shell:
    // will give you one document back
    db.system.replset.find()
    // remove all documents (there is only one)
    db.system.replset.remove({})
    // check resultset is empty
    db.system.replset.find()
    
  5. Restart mongo
    service mongodb stop
    service mongodb start
  6. Once the other nodes are up again, add the replSet line back to /etc/mongodb.conf and restart the service.

New puppet apt module, now with better hiera!

The puppet apt module from puppetlabs works great, but has one big issue. You can define sources (and keys, settings and PPAs) from hiera, but only your most specific definition will be used by the module, as only the default priority lookup is done. This means a lot of copy and paste if you want to manage apt settings across multiple environments or server roles. This is known, but will not be fixed as it is apparently by design.

Well, this design did not suit me, so I forked the puppetlabs module, updated it to enable a proper hiera_hash lookup, and published it to the puppet forge. There is no other difference from the original, but it does simplify my life a lot. Now, if you define multiple sources in your hierarchy, for instance at datacenter level:

apt::sources:
  'localRepo':
    location: 'https://repo.example.com/debian'
    release: '%{::lsbdistcodename}'
    repos: 'main contrib non-free'

and at server level:

apt::sources:
  'puppetlabs':
    location: 'http://apt.puppetlabs.com'
    repos: 'main'
    key:
      id: '47B320EB4C7C375AA9DAE1A01054B7A24BD6EC30'
      server: 'pgp.mit.edu'

you will nicely have both sources in sources.list.d, instead of only the one defined at server level.
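
The difference between the default priority lookup and hiera_hash is easy to picture in Python terms (a rough model only, ignoring deep-merge subtleties):

# Rough model of the two hiera lookup strategies for the example above.
common = {'localRepo': {'location': 'https://repo.example.com/debian'}}
node = {'puppetlabs': {'location': 'http://apt.puppetlabs.com'}}

priority = node                # default lookup: the most specific level wins
merged = dict(common, **node)  # hiera_hash: every level contributes

print(sorted(priority))  # ['puppetlabs']
print(sorted(merged))    # ['localRepo', 'puppetlabs']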

You can find the source on github, and download the module from the puppet forge. Installing it is as simple as:

puppet module install lomignet-apt

Puppet error messages and solutions

This is a collection of error messages I got while setting up a puppet infrastructure and testing modules, as well as their reasons and solutions.

Failed to load library ‘msgpack’

Message

On an agent:

Debug: Failed to load library 'msgpack' for feature 'msgpack'
Debug: Puppet::Network::Format[msgpack]: feature msgpack is missing

on the master:

Debug: Puppet::Network::Format[msgpack]: feature msgpack is missing
Debug: file_metadata supports formats: pson b64_zlib_yaml yaml raw

Context

This happens when you run puppet agent, for instance.

Reason

msgpack is an efficient serialisation format. Puppet uses it (experimentally) when communicating between master and agent. This format requires a gem which, if not installed, will give this debug message. This is completely harmless, it just pollutes your logs.

Fix

Just install the msgpack ruby gem. Depending on your system, you can:

#debian based:
apt-get install ruby-msgpack
#generic
gem install msgpack

This immediately removes the debug messages. To actually use msgpack, you need to add the following line to the [main] or [agent] section of puppet.conf:

preferred_serialization_format = msgpack

Could not retrieve information from environment

Message

Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve information from environment $yourenvironment source(s) puppet://localhost/pluginfacts

Context

Puppet agent run.

Reason

If no module has a facts.d folder, puppet will throw this error. This is an actual bug in puppet, at least in version 3.7.3.

Fix

Option 1: just discard it. This is shown as an error, but it has no impact and the run will carry on uninterrupted.

Option 2: actually create a facts.d folder in one of your modules.

Could not find data item classes in any Hiera data file

Message

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item classes in any Hiera data file and no default supplied on node aws-nl-puppetmaster.dp.webpower.io

Context

Your puppet tests were all working fine from vagrant. You just installed a puppet master, and the first agent run gives you this error.

Reason

Check your hiera.yaml file (in /etc/puppet/hiera.yaml, /etc/puppetlabs/puppet/hiera.yaml, or wherever hiera_config points in your puppet.conf). There is a :datadir section, telling hiera where to find its data. If the path there is absolute, it should directly point to the directory. If it is relative, it only works under vagrant, as it is then resolved against puppet's working directory.

Fix

Many options are possible.

  • Use a common absolute path everywhere.
  • Put the directory, maybe via a link, in its default location.
  • Puppet can interpolate variables when reading datadir, so if your issue is due to different environments, you could use a path like /var/hieradata/%{::environment}/configuration (see the sketch below).
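
For reference, a minimal hiera.yaml along those lines could look like this (hiera 1.x syntax; the paths and hierarchy are examples to adapt):

---
:backends:
  - yaml
:yaml:
  :datadir: '/var/hieradata/%{::environment}/configuration'
:hierarchy:
  - "node/%{::fqdn}"
  - common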

Note that if you change hiera.yaml, you need to restart the puppet master, as hiera.yaml is only read at startup.

No such file or directory @ dir_s_rmdir

Message

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node $nodename: No such file or directory @ dir_s_rmdir - /var/puppet/hiera/node/$nodename.yaml20150812-5415-1802nxn.lock

Context

Puppet agent run

Reason

  1. The puppet master tries to create a file in a directory it does not own, and thus has no permission.
  2. Puppet tries to create a file or directory while the parent does not exist.
  3. The partition where puppet tries to create a lock file is full.

Fix

  1. With the path given in the example error message:
    chown -R puppet:puppet /var/puppet/hiera
  2. Make sure the parent is created in the manifest as well.
  3. Free some space on the full partition.

This ‘if’ statement is not productive

Message

 This 'if' statement is not productive.

followed by some more explanation depending on the context.

Context

Puppet agent run

Reason

Puppet does not want to leave me alone, and pretends to know better than I do. I might want to have an if (false) {…} or if (condition) {empty block} for whatever reason, but no, puppet very violently and rudely bails out. There is a bug discussion about it, as well as a fix to change the wording, but the behaviour will stay.

Fix

Comment out whatever puppet does not like.

sslv3 alert certificate revoked or certificate verify failed

Message

SSL_connect returned=1 errno=0 state=SSLv3 read server session ticket A: sslv3 alert certificate revoked

or

Wrapped exception:
SSL_connect returned=1 errno=0 state=unknown state: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: puppetmaster.example.com]

Context

Puppet agent run

Reason

You probably revoked or cleaned certificates on the puppet master, but did not inform the agent about it. Or maybe you are now pointing to a new puppetmaster.

Fix

You can fix this by cleaning the agent as well:

sudo rm -r /etc/puppet/ssl
sudo rm -r /var/lib/puppet/ssl

Easily simulating connection timeouts

I needed an easy way to simulate a timeout when connecting to a REST API. As part of the flow of an application I am working on, I need to send events to our data platform, and blocking the production flow 'just' to send an event in case of a timeout is not ideal, so I needed a way to test this.

I know there are a few options:

  • Connecting to a 'well known' timing-out URL, such as google.com:81, but this is very antisocial
  • Adding my own firewall rule to DROP connections, but this is a lot of work (yes, I am very, very lazy and I would need to look up the iptables syntax)
  • Connecting to a non-routable IP, like 10.255.255.1 or 10.0.0.0

All those options are fine (except the first one, which although technically valid is very rude and not guaranteed to keep working), but they all give indefinite, non-configurable timeouts.

I thus wrote a small python script, without dependencies, which just listens on a port and makes the connection wait a configurable number of seconds before either closing the connection or returning a valid HTTP response.

Its usage is very simple:

usage: timeout.py [-h] [--http] [--port PORT] [--timeout TIMEOUT]

Timeout Server.

optional arguments:
  -h, --help            show this help message and exit
  --http, -w            if true return a valid http 204 response.
  --port PORT, -p PORT  Port to listen to. Default 7000.
  --timeout TIMEOUT, -t TIMEOUT
                        Timeout in seconds before answering/closing. Default
                        5.

For instance, to wait 2 seconds before giving an http answer:

./timeout.py -w -t2

This would give you the following output if a client connects to it:

./timeout.py -w -t2
Listening, waiting for connection...
Connected! Timing out after 2 seconds...
Processing complete.
Returning http 204 response.
Closing connection.

Listening, waiting for connection...

This is the full script, which you can find on github as well:

#!/usr/bin/env python
import argparse
import socket
import time


# Make the TimeoutServer a bit more user friendly by giving 3 options:
# --http/-w to return a valid http response
# --port/-p to define the port to listen to (7000)
# --timeout/-t to define the timeout delay (5)

parser = argparse.ArgumentParser(description='Timeout Server.')
parser.add_argument('--http', '-w', default=False, dest='http', action='store_true',
                    help='if true return a valid http 204 response.')
parser.add_argument('--port', '-p', type=int, default=7000, dest='port',
                    help='Port to listen to. Default 7000.')
parser.add_argument('--timeout', '-t', type=int, default=5, dest='timeout',
                    help='Timeout in seconds before answering/closing. Default 5.')
args = parser.parse_args()


# Creates a standard socket and listen to incoming connections
# See https://docs.python.org/2/howto/sockets.html for more info
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # allow quick restarts
s.bind(('127.0.0.1', args.port))
s.listen(5)  # See doc for the explanation of 5. This is a usual value.

while True:
    print("Listening, waiting for connection...")
    (clientsocket, address) = s.accept()
    print("Connected! Timing out after {} seconds...".format(args.timeout))
    time.sleep(args.timeout)
    print('Processing complete.')

    if args.http:
        print("Returning http 204 response.")
        clientsocket.send(
            'HTTP/1.1 204 No Content\r\n'  # 204 is the reason-less success
            # 'Date: {0}\r\n'.format(time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()))
            'Server: Timeout-Server\r\n'
            'Connection: close\r\n\r\n'  # signals no more data will be sent
        )

    print("Closing connection.\n")
    clientsocket.close()
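
To check the behaviour from the client side, a quick sketch like this (plain sockets, assuming the server runs locally on the default port) shows how long the connection actually waited:

import socket
import time

# Connect to the timeout server and measure how long the answer takes.
start = time.time()
conn = socket.create_connection(('127.0.0.1', 7000), timeout=30)
data = conn.recv(1024)  # blocks until the server answers or closes
print('waited {:.1f}s, received: {!r}'.format(time.time() - start, data))
conn.close()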

Puppet and virtual resources tutorial to manage user accounts

Virtual resources are a very powerful but not well understood feature of puppet. I will explain here what they are and why they are useful, using the management of users in puppet as an example.

By default, in puppet, a resource may be declared only once. The typical example where this hurts is when a user needs to be created on, for instance, both the database and the web servers. This user can only be defined once, not once in the database class and once in the webserver class.

If you define this user as a virtual resource instead, you can declare them in multiple places without issue. The caveat is that, as the name suggests, the user is then virtual only, and is not actually created on the server. Some extra work is needed to create (realise, in puppet-speak) the user.

Data structure and definitions

Jump to the next section if you want to go directly to the meat of the post. I still want to detail the data structure here for better visualisation.

The full example can be found on github. The goal is to be able to define users with the following criteria and assumptions:

  • User definition is centralised in one place (typically common.yaml). A user defined in hiera is not automatically created on any server; they must be explicitly required.
  • A user might be 'normal' or have sudo rights. Sudo rights mean that they can do whatever they wish, passwordless. There is no finer granularity.
  • A user might be normal on one server, sudo on another, absent on others. This can be defined anywhere in the hiera hierarchy.

As good practice dictates, all of this can be done via hiera. A user can be defined like so, with simple basic properties:

accounts::config::users:
  name:
    # List of roles the user belongs to. Not necessarily matched to linux groups
    # They will be used in user::config::{normal,super} in node yaml files to
    # decide which users are present on a server, and which ones have sudo allowed.
    # Note that all users are part of 'all' groups
    roles: ['warrior', 'priest', 'orc']
    # default: bash
    shell: "/bin/zsh"
    # already hashed password.
    # https://thisdataguy.com/2014/06/10/understand-and-generate-unix-passwords
    # python -c 'import crypt; print crypt.crypt("passwerd", "$6$some_random_salt")'
    # empty/absent means no login via password allowed (other means possible)
    pass: '$6$pepper$P9Wt3.3Uqh9UZbvz5/6UPtHqa4KE/2aeyeXbKm0mpv36Z5aCBv0OQEZ1e.aKcPR6RBYvQIa/ToAfdUX6HjEOL1'
    # A PUBLIC rsa key.
    # Empty/absent means no key login allowed (other means possible)
    sshkey: 'a valid public ssh key string'

Roles here have no direct Linux counterpart, and have nothing to do with Linux groups. They are only an easy way to manage users inside hiera. You can for instance say that all system administrators belong to the role sysops, and grant sudo to the sysops role everywhere in one go.

Roles can be added at will, and are just a string tag. Role names will be used later to actually select and create users.

To actually have users created on a server, roles must be added to two specific configuration arrays, depending on whether a role must have sudo rights or not. Note that all values added to these arrays are merged along the hierarchy, meaning that you can add users to specific servers in the node definition.

For instance, if in common.yaml we have:

accounts::config::sudo: ['sysadmin']
accounts::config::normal: ['data']

and in a specific node definition (say, a mongo server) we have:

accounts::config::sudo: ['data']
accounts::config::normal: ['deployer']

  • all sysadmin users will be everywhere, with sudo
  • all data users will be everywhere, without sudo
  • all data users will have the extra sudo rights on the mongo server
  • all deployer users will be on the mongo server only, without sudo
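
In Python terms, the merge along the hierarchy behaves roughly like concatenating the arrays from each level (a simplified model of hiera's array merging):

# Simplified model of the merged role arrays for the example above.
common_sudo, common_normal = ['sysadmin'], ['data']
node_sudo, node_normal = ['data'], ['deployer']

effective_sudo = common_sudo + node_sudo        # ['sysadmin', 'data']
effective_normal = common_normal + node_normal  # ['data', 'deployer']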

Very well, but to the point, please!

So, why do we have a problem that cannot be solved by usual resources?

  • I want the user definition to be done in one place (i.e. one class) only
  • I would like to avoid manipulating data outside puppet (i.e. not in a ruby library)
  • If a user ends up being both normal and sudo on a server, declaring them twice will not be possible

How does this work?

Look at the normal.pp manifest. Unfortunately, the sudo.pp manifest duplicates it almost exactly; the reason is ordering and duplication of the definition of the roles resource, but this is a detail.

Looking at the file, here are the interesting parts. First, accounts::normal::virtual:

class accounts::normal { 
  ...
  define virtual() {...}
  create_resources('@accounts::normal::virtual', $users)
  ...
}

This defines a virtual resource (note the @ in front of the resource name on the create_resources line), which is called for each and every element of $users. Note that as it is a virtual resource, the users will not actually be created (yet).

The second parameter to create_resources() needs to be a hash. Keys will be resource titles, attributes will be resource parameters. Luckily, this is exactly how we defined users in hiera!

This resource actually does not do much: it just calls the actual user-creating resource, accounts::virtual, which you use as you would call any other puppet resource:

resource_name { title: attribute_key => attribute_value }

This is how the resource is realised. As said above, declaring a virtual resource (virtual users in our case) does not automatically create the user. By calling it directly, the user is finally created:

accounts::virtual{$title:
 pass   => $pass,
 shell  => $shell,
 sshkey => $sshkey,
 sudo   => false
}

Note the conditional statement just before:

unless defined (Accounts::Virtual[$title]) { ... }

In my design, there is no specific sudoer resource. The sudoers file is managed as part of the user resource. This means that if a user is found twice, once as normal and once as sudo, the same user resource could be declared twice. As the sudo users are managed before the normal users, we can check whether the user has already been defined. If that is the case, the resource will not be called a second time.

This is all well and good, but how is the accounts::normal::virtual resource called? Via another resource, of course! This is what roles (accounts::normal::roles) does:

define roles($type) { ... }
create_resources('accounts::normal::roles', $normal)

Notice the difference in create_resources? There is no @ prefix in the resource name this time. This means that this resource is called directly, with $normal as parameter, and is not virtual.

Note the $normal parameter. It is just some fudging to translate an array (the list of roles to create as normal users) into a hash, which is what create_resources() requires, as sketched below.
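
In Python terms, the fudge presumably looks like this (the exact value structure is my guess, based on the $type parameter of the roles resource):

# Hypothetical translation of the role array into the hash that
# create_resources() expects: title => parameters.
roles = ['data', 'deployer']
normal = {role: {'type': 'normal'} for role in roles}
# => {'data': {'type': 'normal'}, 'deployer': {'type': 'normal'}}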

Inside accounts::normal::roles, we find the nicely named spaceship operator. Its role is to realise a bunch of resources, but only a subset of them: you can indeed give it a filter parameter. In our case (forgetting the 'all' conditional, which is just fudging to handle a non-explicit group), you can see it used to filter on roles:

 Accounts::Normal::Virtual <| roles == $title |>

What this says is simply that we realise the Accounts::Normal::Virtual resources, but only for the users having the value $title in their roles array.

To sum up, here is what happens, in pseudo-code:

  • for each role as $role (done directly in a class)
    • for each user as $user (done in the role resource)
      • apply the resource virtual user (done in the virtual user resource)
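
If it helps, here is the same flow as a small, purely illustrative Python analogy (this is not how puppet works internally; the names simply mirror the resources above):

# 'users' mirrors accounts::config::users; the role lists mirror
# accounts::config::{sudo,normal}.
users = {
    'alice': {'roles': ['sysadmin'], 'shell': '/bin/zsh'},
    'bob': {'roles': ['data', 'deployer'], 'shell': '/bin/bash'},
}
sudo_roles = ['sysadmin']
normal_roles = ['data', 'deployer']

defined = set()  # mimics puppet's defined() check

def realize(role, sudo):
    # mimics Accounts::Normal::Virtual <| roles == $title |>
    for name, props in users.items():
        if role in props['roles'] and name not in defined:
            defined.add(name)
            print('creating user {} (sudo={})'.format(name, sudo))

for role in sudo_roles:    # sudo roles are handled first...
    realize(role, sudo=True)
for role in normal_roles:  # ...so normal roles skip already defined users
    realize(role, sudo=False)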

Easy, no?