Pentaho Kettle: how to remotely execute a job with a file repository

Pentaho/kettle background

Kettle (now known as PDI) is a great ETL tool: open source, with a paid enterprise edition if you need extra support or plugins.

One great feature is the ability to remotely execute one of your jobs for testing, without having to deploy anything. This is done via the Carte server (part of PDI), which is basically a service listening on a port to which you send your jobs.
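For reference, Carte is started with a hostname and a port; the port and the default cluster/cluster credentials below are assumptions to adapt to your setup. The sketch prints the commands by default (set DRY_RUN=0 to actually run them):

```shell
#!/bin/sh
# Dry-run sketch: launch Carte and check it answers.
# Port 8081 and the cluster/cluster credentials are assumptions.
DRY_RUN="${DRY_RUN:-1}"     # set to 0 to really run the commands
CARTE_HOST="localhost"
CARTE_PORT="8081"

CMDS=""
# run: records the command, and executes it only when DRY_RUN=0
run() { CMDS="${CMDS}$*
"; [ "$DRY_RUN" = "0" ] && "$@"; }

# carte.sh ships in the PDI installation directory
run sh carte.sh "$CARTE_HOST" "$CARTE_PORT"
# Carte serves a status page under /kettle/status/
run curl -u cluster:cluster "http://$CARTE_HOST:$CARTE_PORT/kettle/status/"
printf '%s' "$CMDS"
```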

Carte background

Carte works very well when you are using a database repository, but you will run into issues when you use a file repository. The reason is that when you run a job remotely, Kettle needs to bundle all the relevant jobs and transformations to send them over. This is not always possible; an obvious example is when some job names are parametrised.

There is still a way to deal with this. Carte’s behaviour is to use the jobs/transformations sent by Kettle, or to fall back to the ones it can find locally if the repository names match.

The solution

The solution is then quite logical: copy your current repository over to the Carte server, set it up with the same name as your local repository, and you are good to go.

This is a bit painful to do manually, so here is the job I wrote to do it automatically from inside Pentaho. It makes very few assumptions, except that you can copy files to your Carte server with scp (you thus need SSH access).

The flow is as follows:

  1. Delete existing compressed local repository if any
  2. Compress local repository
  3. Delete remote compressed repository if any
  4. Copy over compressed local repository
  5. Uncompress remote compressed repository
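The five steps above map naturally onto shell commands. Everything concrete in this sketch (user, host, paths, zip name) is a placeholder of my choosing, not something mandated by the job; it prints the commands by default, and setting DRY_RUN=0 would actually execute them:

```shell
#!/bin/sh
# Dry-run sketch of the five steps; all names below are placeholders.
DRY_RUN="${DRY_RUN:-1}"             # set to 0 to really run the commands
SSH_USER="etluser"
ETL_HOST="etl.example.com"
REPO_DIR="$HOME/kettle-repo"        # local file repository
ZIP_NAME="repo.zip"
TARGET_DIR="/opt/carte/repos"       # target directory on the Carte box

CMDS=""
# run: records the command, and executes it only when DRY_RUN=0
run() { CMDS="${CMDS}$*
"; [ "$DRY_RUN" = "0" ] && "$@"; }

run rm -f "/tmp/$ZIP_NAME"                               # 1. delete old local zip
run zip -r "/tmp/$ZIP_NAME" "$REPO_DIR"                  # 2. compress local repository
run ssh "$SSH_USER@$ETL_HOST" "rm -f /tmp/$ZIP_NAME"     # 3. delete old remote zip
run scp "/tmp/$ZIP_NAME" "$SSH_USER@$ETL_HOST:/tmp/"     # 4. copy the zip over
run ssh "$SSH_USER@$ETL_HOST" "unzip -o /tmp/$ZIP_NAME -d $TARGET_DIR/$SSH_USER"  # 5. uncompress remotely
printf '%s' "$CMDS"
```

The unzip target includes the user name, which is what later allows several users to share one Carte server without overwriting each other's repository.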

You can see this in the following picture (the red arrows show the inside of the transformations):

Copy a local repository

To be generic, a few configuration values must be added to your kettle.properties file. They set up the remote server name, your username, and various paths. The following is an example with comments for all fields.

# Hostname of your ETL server where Carte runs
# Name of your SSH user
# Use one of ssh.password or ssh.keypath + ssh.keypass
# Password of your SSH user, leave empty if none
# Where your private key is on your local machine
# If your private key is password protected, add it here.
# If not, leave it empty
# Where your repo sits
# Where to compress the repository locally
# What the name of your compressed repository is
# (can be anything, this is irrelevant but having
# it here keeps consistency)
# Where to uncompress the zip file. This setup
# allows multiple users and the final directory
# will be ${ssh.etltargetdir}/${ssh.etltargetuserrepo}
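Put together, the fragment might look like this. Only the ssh.password, ssh.keypath, ssh.keypass, ssh.etltargetdir and ssh.etltargetuserrepo names come from the comments above; the other key names and every value are made-up placeholders to adapt to your own setup:

```properties
# Hypothetical kettle.properties fragment -- keys not named in the
# comments above, and all values, are placeholders.
ssh.host=etl.example.com
ssh.user=etluser
ssh.password=
ssh.keypath=/home/etluser/.ssh/id_rsa
ssh.keypass=
repo.localdir=/home/etluser/kettle-repo
repo.zipdir=/tmp
repo.zipname=repo.zip
ssh.etltargetdir=/opt/carte/repos
ssh.etltargetuserrepo=etluser
```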


This job assumes that you have SSH access. If that is the case, you can use the job as is, but there is one thing you might want to update.

I assumed that a key is used for SSH, but a password might be all you need. If so, update the two SSH steps and the copy step accordingly by unticking ‘use private key’.

That’s all, folks

This job should be quite easy to use. Do not hesitate to comment if you have questions.

Sadly I cannot attach a zip file to this post, and after some over-enthusiastic cleaning I lost the example file completely. I hope the description given in this post is enough.

Official fix

It looks like this workaround is no longer needed, thanks to bug fix PDI-13774, available since version 5.4.0 GA.

