Pentaho Kettle: how to remotely execute a job with a file repository

Pentaho/kettle background

Kettle (now known as PDI) is a great ETL tool: open source, with a paid enterprise edition if you need extra support or plugins.

One great feature is the ability to remotely execute one of your jobs for testing, without having to deploy anything. This is done via the Carte server (part of PDI), which is basically a service listening on a port to which you send your jobs.
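For reference, Carte is started with a hostname and a port; the port and the default cluster/cluster credentials below are assumptions to adapt to your setup. The sketch prints the commands by default (set DRY_RUN=0 to actually run them):

```shell
#!/bin/sh
# Dry-run sketch: launch Carte and check it answers.
# Port 8081 and the cluster/cluster credentials are assumptions.
DRY_RUN="${DRY_RUN:-1}"     # set to 0 to really run the commands
CARTE_HOST="localhost"
CARTE_PORT="8081"

CMDS=""
# run: records the command, and executes it only when DRY_RUN=0
run() { CMDS="${CMDS}$*
"; [ "$DRY_RUN" = "0" ] && "$@"; }

# carte.sh ships in the PDI installation directory
run sh carte.sh "$CARTE_HOST" "$CARTE_PORT"
# Carte serves a status page under /kettle/status/
run curl -u cluster:cluster "http://$CARTE_HOST:$CARTE_PORT/kettle/status/"
printf '%s' "$CMDS"
```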

Carte background

Carte works very well when you are using a database repository, but you will run into issues when you use a file repository. The reason is that when you run a job remotely, Kettle needs to bundle all the relevant jobs and transformations to send them over. This is not always possible; an obvious example is when some job names are parametrised.

There is still a way to deal with this. Carte’s behaviour is to use the jobs/transformations sent by Kettle, or to fall back to the ones it can find locally if the repository names match.

The solution

The solution is then quite logical: copy your current repository over to the Carte server, set it up with the same name as your local repository, and you are good to go.

This is a bit painful to do manually, so here is the job I wrote to do it automatically from inside Pentaho. It makes very few assumptions, except that you can copy files to your Carte server with scp (you thus need SSH access).

The flow is as follows:

  1. Delete existing compressed local repository if any
  2. Compress local repository
  3. Delete remote compressed repository if any
  4. Copy over compressed local repository
  5. Uncompress remote compressed repository
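The five steps above map naturally onto shell commands. Everything concrete in this sketch (user, host, paths, zip name) is a placeholder of my choosing, not something mandated by the job; it prints the commands by default, and setting DRY_RUN=0 would actually execute them:

```shell
#!/bin/sh
# Dry-run sketch of the five steps; all names below are placeholders.
DRY_RUN="${DRY_RUN:-1}"             # set to 0 to really run the commands
SSH_USER="etluser"
ETL_HOST="etl.example.com"
REPO_DIR="$HOME/kettle-repo"        # local file repository
ZIP_NAME="repo.zip"
TARGET_DIR="/opt/carte/repos"       # target directory on the Carte box

CMDS=""
# run: records the command, and executes it only when DRY_RUN=0
run() { CMDS="${CMDS}$*
"; [ "$DRY_RUN" = "0" ] && "$@"; }

run rm -f "/tmp/$ZIP_NAME"                               # 1. delete old local zip
run zip -r "/tmp/$ZIP_NAME" "$REPO_DIR"                  # 2. compress local repository
run ssh "$SSH_USER@$ETL_HOST" "rm -f /tmp/$ZIP_NAME"     # 3. delete old remote zip
run scp "/tmp/$ZIP_NAME" "$SSH_USER@$ETL_HOST:/tmp/"     # 4. copy the zip over
run ssh "$SSH_USER@$ETL_HOST" "unzip -o /tmp/$ZIP_NAME -d $TARGET_DIR/$SSH_USER"  # 5. uncompress remotely
printf '%s' "$CMDS"
```

The unzip target includes the user name, which is what later allows several users to share one Carte server without overwriting each other's repository.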

You can see this in the following picture (the red arrows show the inside of the transformations):

Copy a local repository

To be generic, a few configuration values must be added to your kettle.properties file. They set up the remote server name, your username, and various paths. The following is an example with comments for all fields.

# Hostname of your ETL server where Carte runs
# Name of your SSH user
# Use one of ssh.password or ssh.keypath + ssh.keypass
# Password of your SSH user, leave empty if none
# Where your private key is on your local machine
# If your private key is password protected, add it here.
# If not, leave it empty
# Where your repo sits
# Where to compress the repository locally
# What the name of your compressed repository is
# (can be anything, this is irrelevant but having
# it here keeps consistency)
# Where to uncompress the zip file. This setup
# allows multiple users and the final directory
# will be ${ssh.etltargetdir}/${ssh.etltargetuserrepo}
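Put together, the fragment might look like this. Only the ssh.password, ssh.keypath, ssh.keypass, ssh.etltargetdir and ssh.etltargetuserrepo names come from the comments above; the other key names and every value are made-up placeholders to adapt to your own setup:

```properties
# Hypothetical kettle.properties fragment -- keys not named in the
# comments above, and all values, are placeholders.
ssh.host=etl.example.com
ssh.user=etluser
ssh.password=
ssh.keypath=/home/etluser/.ssh/id_rsa
ssh.keypass=
repo.localdir=/home/etluser/kettle-repo
repo.zipdir=/tmp
repo.zipname=repo.zip
ssh.etltargetdir=/opt/carte/repos
ssh.etltargetuserrepo=etluser
```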


This job assumes that you have SSH access. If that is the case, you can use the job as is, but there is one thing you might want to update.

I assumed that a key is used for SSH, but a password might be all you need. If so, update the two SSH steps and the copy step accordingly by unticking ‘use private key’.

That’s all, folks

This job should be quite easy to use. Do not hesitate to comment if you have questions.

Sadly I cannot attach a zip file to this post, and after some over-enthusiastic cleaning I lost the example file completely. I hope the description given in this post is enough.

Official fix

It looks like this workaround is no longer needed, thanks to bug fix PDI-13774, available since version 5.4.0 GA.

