--no options with argparse and python

Ruby has this very nice feature when you define options with optparse:

opts.on('--[no-]flag', "Set flag.") do |p|
    options.persistPost=p
end

which allows you to have the --flag and --no-flag options for free. Python does not have this, but there are three options to get around it.

The verbose way

Just define 2 options.

  parser.add_argument(
    '--flag',
    dest='flag',
    action='store_true',
    help='Set flag',
  )
  parser.add_argument(
    '--no-flag',
    dest='flag',
    action='store_false',
    help='Unset flag',
  )
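
One thing worth adding is an explicit default, so the value is well defined when neither option is given. A small sketch, assuming the parser and the two arguments defined just above:

# Value used when neither --flag nor --no-flag is given.
parser.set_defaults(flag=False)

print(parser.parse_args([]).flag)                       # False
print(parser.parse_args(['--flag']).flag)               # True
print(parser.parse_args(['--flag', '--no-flag']).flag)  # False, the last option wins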

Custom action

You can give a custom action to the action parameter of add_argument. This custom action can look at the actual option given and act accordingly.

  parser.add_argument(
    '--flag', '--no-flag',
    dest='flag',
    action=BooleanAction,
    help='Set flag',
  )

BooleanAction is just a tiny six-line class, defined as follows:

class BooleanAction(argparse.Action):
    def __init__(self, option_strings, dest, nargs=None, **kwargs):
        super(BooleanAction, self).__init__(option_strings, dest, nargs=0, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        setattr(namespace, self.dest, False if option_string.startswith('--no') else True)

As you can see, it just looks at the name of the flag, and if it starts with --no, the destination will be set to False.
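
One detail to keep in mind: if neither option is given, the destination simply keeps the add_argument default, which is None unless you pass default= explicitly. A quick check, assuming the parser from the snippet above:

print(parser.parse_args([]).flag)             # None, no default was set
print(parser.parse_args(['--no-flag']).flag)  # False
print(parser.parse_args(['--flag']).flag)     # True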

Custom parser

Create your own parser with an extra method, which can then automagically add the --no- option for you.
First define your own parser:

class BoolArgParse(argparse.ArgumentParser):
    def add_bool_arguments(self, *args, **kw):
        grp = self.add_mutually_exclusive_group()
        # add --flag
        grp.add_argument(*args, action='store_true', **kw)
        nohelp = 'no ' + kw['help']
        del kw['help']
        # add --no-flag
        grp.add_argument('--no-' + args[0][2:], *args[1:], action='store_false', help=nohelp, **kw)

Then use it:

parser = BoolArgParse()
parser.add_bool_arguments('--flag', dest='flag', help='set flag.')
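
Because --flag and --no-flag end up in a mutually exclusive group, argparse itself rejects giving both on one command line, which is why the comparison below notes that the flag cannot be repeated. A quick check, assuming the parser defined just above:

print(parser.parse_args(['--flag']).flag)     # True
print(parser.parse_args(['--no-flag']).flag)  # False
# parser.parse_args(['--flag', '--no-flag'])  # argparse exits: not allowed with argument --flag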

Comparison

I do not want to call these plus and minus points, as not all use cases want the same features, but here is how they compare:

  • Verbose way:
    • More lines of code (you need to define two flags),
    • More verbose help output,
    • Easy (no extra class),
    • The same parameter can be given multiple times, the last one wins (e.g. --flag --no-flag).
  • Custom action:
    • Fewer lines of code,
    • Terse help (only one line of help),
    • The same parameter can be given multiple times, the last one wins (e.g. --flag --no-flag).
  • Custom parser:
    • The most lines of code,
    • Verbose but grouped help,
    • The same flag cannot be repeated.

Get started with AWS and Python

When you start for the first (or even second) time with AWS, it is a bit tricky to get your head around all the bits and bolts that need to be connected together. If, on top of this, you try to work with AWS in Beijing from outside China, the web GUI makes your work even harder because of slowness or even timeouts.

This script sets up a full set of resources for you (VPC, route table, security group, subnet, internet gateway, and an instance, with the relevant associations and attachments) for easy testing or bootstrapping of your infrastructure.

It is mostly meant as a testing help, so it does not handle every possible option, but I find it invaluable to get started. You just need the AWS basics, and it will do the rest for you. You need to provide a tag name (defaults to ‘roles’) and value, and all resources will be created and located via this tag, to allow for easy spawning and tearing down.
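
To give an idea of what this tag-based lookup means in practice, here is a minimal boto3 sketch (not the script itself, which may use another library; the profile name and tag value are just examples):

import boto3

# Example values: profile 'default', tag roles=testing.
session = boto3.Session(profile_name='default')
ec2 = session.resource('ec2')

# Every resource created by the script carries the tag, so finding
# the VPCs (or subnets, instances, ...) is a simple filter.
for vpc in ec2.vpcs.filter(Filters=[{'Name': 'tag:roles', 'Values': ['testing']}]):
    print(vpc.id)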

usage: fullspawn.py3 [-h] [--tag TAG] [--up | --down] [--wet | --dry]
                     [--log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--cidr CIDR]
                     [--ami AMI] [--keypair KEYPAIR] [--profile PROFILE]
                     [--instance INSTANCE]
                     role

Spawns a full AWS self-contained infrastructure.

positional arguments:
  role                 Tag value used for marking and fetching resources.

optional arguments:
  -h, --help           show this help message and exit
  --tag TAG, -t TAG    Tag name used for marking and fetching resources.
                       (default: roles)
  --up, -u             Creates a full infra. (default: up)
  --down, -d           Destroys a full infra. (default: up)
  --wet, -w            Actually performs the action. (default: dry)
  --dry                Only shows what would be done, not doing anything.
                       (default: dry)
  --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                       Verbosity level. (default: WARNING)
  --cidr CIDR          The network range for the VPC, in CIDR notation. For
                       example, 10.0.0.0/16 (default: 10.0.42.0/28)
  --ami AMI            The AMI id for your instance. (default: ami-33734044)
  --keypair KEYPAIR    A keypair aws knows about. (default: yourkey)
  --profile PROFILE    Profile to use for credentials. Will use AWS_PROFILE
                       environment variable if set. (default: default)
  --instance INSTANCE  Instance type. (default: t2.micro)

For instance:

# Let's see what would happen when creating a full infra...
./fullspawn.py3 -t tag --up --dry testing
# Looks good, let's do it.
./fullspawn.py3 -t tag --up --wet testing
# oops, this was a stupid tag name
./fullspawn.py3 -t tag --down --wet testing

You probably want to have a look at some variables inside the script, which set a few defaults that might not be relevant for you. I am thinking about the AMI (AMI), the keypair (KEYPAIR) and the ingress rules (INGRESS), all defined before the argparse calls.
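
For reference, those defaults could look roughly like this (the AMI and KEYPAIR values match the help output above; the INGRESS structure is purely hypothetical, check the actual script):

# Defaults defined before the argparse calls (illustrative only).
AMI = 'ami-33734044'   # default AMI id for the instance
KEYPAIR = 'yourkey'    # a keypair AWS already knows about
INGRESS = [
    # (protocol, port, allowed CIDR) - hypothetical structure
    ('tcp', 22, '0.0.0.0/0'),
]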

The code is available on github.

Enjoy!

Vagrant, hostmanager, virtualbox and aws

No need to present Vagrant if you read this post. Hostmanager is a plugin which manages /etc/hosts on the host and the guests, to have common fqdns and simplify connections to your box. You can for instance set up hostmanager to make sure that your web development virtual is always accessible at web.local, or make sure that all machines in your virtual cluster can talk to each other with names like dev1.example.com, dev2.example.com and so on, no matter what IP address DHCP decided to hand out.

This is all well and good, but there are a few issues. For instance, with virtualbox and dhcp, hostmanager only finds 127.0.0.1 as the IP for your virtuals, which is not handy to say the least.

If you use aws, you would like the guests to use the private IP while the host uses the public IP, which is not possible by default.

Luckily, you can write your own IP resolver. The one I give here, which can also be found on github, solves the following issues:

  • Only for Linux guests (probably other unices as well to be honest, but I cannot guarantee that hostname -I works on all flavours)
  • dhcp and virtualbox
  • public/private IP with aws

$cached_addresses = {}
# There is a bug when using virtualbox/dhcp which makes hostmanager not find
# the proper IP, only the loop one: https://github.com/smdahlen/vagrant-hostmanager/issues/86
# The following custom resolver (for linux guests) is a good workaround.
# Furthermore it handles aws private/public IP.

# A limitation (feature?) is that hostmanager only looks at the current provider.
# This means that if you `up` an aws vm, then a virtualbox vm, all aws ips
# will disappear from your host /etc/hosts.
# To prevent this, apply this patch to your hostmanager plugin (1.6.1), probably
# at $HOME/.vagrant.d/gems/gems or (hopefully) wait for newer versions.
# https://github.com/smdahlen/vagrant-hostmanager/pull/169
$ip_resolver = proc do |vm, resolving_vm|
  # For aws, we should use private IP on the guests, public IP on the host
  if vm.provider_name == :aws
    if resolving_vm.nil?
      used_name = vm.name.to_s + '--host'
    else
      used_name = vm.name.to_s + '--guest'
    end
  else
    used_name = vm.name.to_s
  end

  if $cached_addresses[used_name].nil?
    if hostname = (vm.ssh_info && vm.ssh_info[:host])

      # getting aws guest ip *for the host*, we want the public IP in that case.
      if vm.provider_name == :aws and resolving_vm.nil?
        vm.communicate.execute('curl http://169.254.169.254/latest/meta-data/public-ipv4') do |type, pubip|
          $cached_addresses[used_name] = pubip
        end
      else

        vm.communicate.execute('uname -o') do |type, uname|
          unless uname.downcase.include?('linux')
            warn("Guest for #{vm.name} (#{vm.provider_name}) is not Linux, hostmanager might not find an IP.")
          end
        end

        vm.communicate.execute('hostname --all-ip-addresses') do |type, hostname_i|
          # much easier (but less fun) to work in ruby than sed'ing or perl'ing from shell

          allips = hostname_i.strip().split(' ')
          if vm.provider_name == :virtualbox
            # 10.0.2.15 is the default virtualbox IP in NAT mode.
            allips = allips.select { |x| x != '10.0.2.15'}
          end

          if allips.size() == 0
            warn("Trying to find out ip for #{vm.name} (#{vm.provider_name}), found none useable: #{allips}.")
          else
            if allips.size() > 1
              warn("Trying to find out ip for #{vm.name} (#{vm.provider_name}), found too many: #{allips} and I cannot choose cleverly. Will select the first one.")
            end
            $cached_addresses[used_name] = allips[0]
          end
        end
      end
    end
  end
  $cached_addresses[used_name]
end

Just put this code in your Vagrantfile, outside the Vagrant.configure block, and use it by assigning it this way:

config.hostmanager.ip_resolver = $ip_resolver

On a side note, and as explained in the comment, there is a limitation (feature?) in that hostmanager only looks at the current provider. This means that if you up an aws vm, then a virtualbox vm, all aws ips will disappear from your host /etc/hosts.

To prevent this, apply this patch to your hostmanager plugin (1.6.1), probably located at $HOME/.vagrant.d/gems/gems, or (hopefully) wait for newer versions.
The patch itself is ridiculously tiny:

diff --git a/lib/vagrant-hostmanager/hosts_file/updater.rb b/lib/vagrant-hostmanager/hosts_file/updater.rb
index 9514508..ef469bf 100644
--- a/lib/vagrant-hostmanager/hosts_file/updater.rb
+++ b/lib/vagrant-hostmanager/hosts_file/updater.rb
@@ -82,8 +82,8 @@ module VagrantPlugins
 
         def update_content(file_content, resolving_machine, include_id)
           id = include_id ? " id: #{read_or_create_id}" : ""
-          header = "## vagrant-hostmanager-start#{id}\n"
-          footer = "## vagrant-hostmanager-end\n"
+          header = "## vagrant-hostmanager-start-#{@provider}#{id}\n"
+          footer = "## vagrant-hostmanager-end-#{@provider}\n"
           body = get_machines
             .map { |machine| get_hosts_file_entry(machine, resolving_machine) }
             .join

Restart MongoDB with not enough nodes in a replica set

The context could be a virtualised cluster where a hypervisor suddenly went down. Two of your Mongo replicas are unavailable, only one is left, which then of course drops back to being a secondary and is read-only.

You want to have this server running alone until the others come back online, as you decide that a potential small inconsistency is better than not running at all for a few hours. The thing is that this last server will complain that the rest of the set is not available. To get it going again, you just need to make it forget about the rest of the set.

  1. Switch the service off
    service mongodb stop
  2. Remove the line replSet from your /etc/mongodb.conf
  3. Restart the service
    service mongodb start

    Mongo will complain:

    mongod started without --replSet yet 1 documents are present in local.system.replset
     [initandlisten] ** Restart with --replSet unless you are doing maintenance and no other clients are connected.
     [initandlisten] ** The TTL collection monitor will not start because of this.
     [initandlisten] ** For more info see http://dochub.mongodb.org/core/ttlcollections
  4. Remove the offending document in system.replset from the mongo shell (a Python alternative is sketched after this list)
    // will give you one document back
    db.system.replset.find()
    // remove all documents (there is only one)
    db.system.replset.remove({})
    // check resultset is empty
    db.system.replset.find()
    
  5. Restart mongo
    service mongodb stop
    service mongodb start
  6. Once the other nodes are up, add the replSet line back to /etc/mongodb.conf and restart the service.
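
If you prefer Python to the mongo shell for step 4, a minimal pymongo sketch (assuming pymongo 3.x and mongod running on the default port) would look like this:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
replset = client['local']['system.replset']

print(replset.find_one())  # the single offending document
replset.delete_many({})    # remove all documents (there is only one)
print(replset.find_one())  # should now print None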

New puppet apt module, now with better hiera!

The puppet apt module from puppetlabs works great, but has one big issue. You can define sources (and keys, settings and ppas) from hiera, but only your most specific definition will be used by the module, as only the default priority lookup is done. This means a lot of cut & paste if you want to manage apt settings across multiple environments or server roles. This is known, but will not be fixed as it is apparently by design.

Well, this design did not fit me, so I forked the puppetlabs module, updated it to enable a proper hiera_hash lookup, and published it to the puppet forge. There is no other difference from the original, but it does simplify my life a lot. Now if you define multiple sources in your hierarchy, for instance at datacenter level:

apt::sources:
  'localRepo':
    location: 'https://repo.example.com/debian'
    release: '%{::lsbdistcodename}'
    repos: 'main contrib non-free'

and at server level:

apt::sources:
  'puppetlabs':
    location: 'http://apt.puppetlabs.com'
    repos: 'main'
    key:
      id: '47B320EB4C7C375AA9DAE1A01054B7A24BD6EC30'
      server: 'pgp.mit.edu'

you will nicely have both sources in sources.list.d, instead of only having the one defined at server level.

You can find the source on github, and can download the module from the puppet forge. Installing it is as simple as:

puppet module install lomignet-apt

Puppet error messages and solutions

This is a collection of error messages I got while setting up a puppet infrastructure and testing modules, as well as their reasons and solutions.

Failed to load library ‘msgpack’

Message

On an agent:

Debug: Failed to load library 'msgpack' for feature 'msgpack'
Debug: Puppet::Network::Format[msgpack]: feature msgpack is missing

on the master:

Debug: Puppet::Network::Format[msgpack]: feature msgpack is missing
Debug: file_metadata supports formats: pson b64_zlib_yaml yaml raw

Context

This happens when you run puppet agent, for instance.

Reason

msgpack is an efficient serialisation format. Puppet uses it (experimentally) when communicating between master and agent. This format requires a gem which, if not installed, will give this debug message. This is completely harmless, it just pollutes your logs.

Fix

Just install the msgpack ruby gem. Depending on your system, you can

#debian based:
apt-get install ruby-msgpack
#generic
gem install msgpack

This immediately removes the debug messages. To actually use msgpack, you need to add the following line to the [main] or [agent] section of puppet.conf:

preferred_serialization_format = msgpack

Could not retrieve information from environment

Message

Error: /File[/var/lib/puppet/facts.d]: Could not evaluate: Could not retrieve information from environment $yourenvironment source(s) puppet://localhost/pluginfacts

Context

Puppet agent run.

Reason

If no module has a facts.d folder, puppet will throw this error. This is an actual bug in puppet, at least version 3.7.3.

Fix

Option 1: Just ignore it. It is shown as an error, but it has no impact and the run will carry on uninterrupted.

Option 2: actually create a facts.d folder in a module.

Could not find data item classes in any Hiera data file

Message

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item classes in any Hiera data file and no default supplied on node aws-nl-puppetmaster.dp.webpower.io

Context

Your puppet tests were all working fine from vagrant. You just installed a puppet master and the first agent run gives you this error.

Reason

Check your hiera.yaml file (in /etc/puppet/hiera.yaml, /etc/puppetlabs/puppet/hiera.yaml, or wherever hiera_config in your puppet.conf points to). There is a :datadir section telling puppet where to find the hiera data. If the path there is absolute, it should directly point to the directory. If it is relative, it works only under vagrant, as it is resolved against puppet's working_dir.

Fix

Many options are possible.

  • Use a common absolute path everywhere.
  • Put the directory, maybe via a link, in its default location.
  • Puppet can interpolate variables when reading datadir, so if your issue is due to different environments, you could use a path like
    “/var/hieradata/%{::environment}/configuration”

Note that if you change hiera.yaml, you need to reload the puppet master as hiera.yaml is only read at startup.

No such file or directory @ dir_s_rmdir

Message

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node $nodename: No such file or directory @ dir_s_rmdir - /var/puppet/hiera/node/$nodename.yaml20150812-5415-1802nxn.lock

Context

Puppet agent run

Reason

  1. Puppet master tries to create a file in a directory it does not own, and thus has no permission.
  2. Puppet tries to create a file or directory while its parent does not exist.
  3. The partition where puppet tries to create a lock file is full.

Fix

  1. With the path given in the example error message:
    chown -R puppet:puppet /var/puppet/hiera
  2. Make sure the parent is created in the manifest as well

This ‘if’ statement is not productive

Message

 This 'if' statement is not productive.

followed by some more explanation depending on the context.

Context

Puppet agent run

Reason

Puppet does not want to leave me alone, and pretends to know better than me. I might want to have an if (false) {…} or if (condition) {empty block} for whatever reason, but no, puppet very violently and rudely bails out. There is a bug discussion about it, as well as a fix to change the wording, but the behaviour will stay.

Fix

Comment out what puppet does not like.

sslv3 alert certificate revoked or certificate verify failed

Message

SSL_connect returned=1 errno=0 state=SSLv3 read server session ticket A: sslv3 alert certificate revoked

or

Wrapped exception:
SSL_connect returned=1 errno=0 state=unknown state: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: puppetmaster.example.com]

Context

Puppet agent run

Reason

You probably revoked or cleaned certificates on the puppet master, but did not inform the agent about it. Or maybe you are now pointing to a new puppet master.

Fix

You can fix this by cleaning the agent as well:

sudo rm -r /etc/puppet/ssl
sudo rm -r /var/lib/puppet/ssl

Easily simulating connection timeouts

I needed an easy way to simulate timeouts when connecting to a REST API. As part of the flow of an application I am working on, I need to send events to our data platform, and blocking the production flow ‘just’ to send an event in case of a timeout is not ideal, so I needed a way to test this.

I know there are a few options:

  • Connecting to a ‘well known’ timing out url, such as google.com:81, but this is very antisocial
  • Adding my own firewall rule to DROP connections, but this is a lot of work (yes, I am very very lazy and I would need to look up the iptables syntax)
  • Connecting to a non routable IP, like 10.255.255.1 or 10.0.0.0

All those options are fine (except the first one, which although technically valid is very rude and not guaranteed to keep working), but they all give indefinite, non-configurable timeouts.

I thus wrote a small python script, without dependencies, which just listens on a port and makes the connection wait a configurable number of seconds before either closing the connection or returning a valid HTTP response.

Its usage is very simple:

usage: timeout.py [-h] [--http] [--port PORT] [--timeout TIMEOUT]

Timeout Server.

optional arguments:
  -h, --help            show this help message and exit
  --http, -w            if true return a valid http 204 response.
  --port PORT, -p PORT  Port to listen to. Default 7000.
  --timeout TIMEOUT, -t TIMEOUT
                        Timeout in seconds before answering/closing. Default
                        5.

For instance, to wait 2 seconds before giving an http answer:

./timeout.py -w -t2

This would give you the following output when a client connects to it:

./timeout.py -w -t2
Listening, waiting for connection...
Connected! Timing out after 2 seconds...
Processing complete.
Returning http 204 response.
Closing connection.

Listening, waiting for connection...
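
To exercise the timeout handling of your own code, point it at the listening port with a client timeout shorter than the server delay. For instance, a quick check with the requests library (assuming the defaults used above):

import requests

try:
    # The server above sleeps for 2 seconds, so a 1 second timeout trips.
    requests.get('http://127.0.0.1:7000', timeout=1)
except requests.exceptions.Timeout:
    print('Request timed out, as expected.')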

This is the full script, which you can find on github as well:

#!/usr/bin/env python
import argparse
import socket
import time


# Make the TimeoutServer a bit more user friendly by giving 3 options:
# --http/-w to return a valid http response
# --port/-p to define the port to listen to (7000)
# --timeout/-t to define the timeout delay (5)

parser = argparse.ArgumentParser(description='Timeout Server.')
parser.add_argument('--http', '-w', default=False, dest='http', action='store_true',
                    help='if true return a valid http 204 response.')
parser.add_argument('--port', '-p', type=int, default=7000, dest='port',
                    help='Port to listen to. Default 7000.')
parser.add_argument('--timeout', '-t', type=int, default=5, dest='timeout',
                    help='Timeout in seconds before answering/closing. Default 5.')
args = parser.parse_args()


# Creates a standard socket and listen to incoming connections
# See https://docs.python.org/2/howto/sockets.html for more info
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('127.0.0.1', args.port))
s.listen(5)  # See doc for the explanation of 5. This is a usual value.

while True:
    print("Listening, waiting for connection...")
    (clientsocket, address) = s.accept()
    print("Connected! Timing out after {} seconds...".format(args.timeout))
    time.sleep(args.timeout)
    print('Processing complete.')

    if args.http:
        print("Returning http 204 response.")
        clientsocket.send(
            'HTTP/1.1 204 No Content\n'
            #'Date: {0}\n'.format(time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime())
            'Server: Timeout-Server\n'
            'Connection: close\n\n'  # signals that no more data will be sent
        )

    print("Closing connection.\n")
    clientsocket.close()
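
One small caveat: the script is written for Python 2 (note the Python 2 socket HOWTO link). Under Python 3, socket.send() expects bytes, so the response would need to be encoded, roughly like this:

        # Python 3 variant of the response block above: sockets want bytes, not str.
        response = (
            'HTTP/1.1 204 No Content\n'
            'Server: Timeout-Server\n'
            'Connection: close\n\n'
        )
        clientsocket.sendall(response.encode('ascii'))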