Clean Pentaho shared connections from transformations and jobs

Pentaho has this nice shared.xml file, which can be found in your $HOME/.kettle directory. You can define all your connections there, which in theory avoids duplicating connection definitions across all your jobs and gives you a single place to update your connections when needed.

The sad reality is that each time you save a job or a transformation, the connections are still embedded in the job or transformation file, effectively duplicating them. If you somehow remove the connection details from your job or transformation, the ones from shared.xml will be used, which is what we want.

This ‘somehow’ can easily be achieved by the following snippet:

find . -type f -print0 | xargs -0 perl -0 -p -i -e 's/\s*<connection>\s*<.*?<\/connection>\s*$//smg'

We run it regularly on our codebase to keep it clean, and it has always worked as expected.
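
Since the one-liner edits every file under the current directory in place, a more cautious variant may be useful. The sketch below is only an illustration under a few assumptions: the helper names list_embedded_connections and clean_kettle_connections are made up for this post, the search is limited to the usual Kettle extensions (.ktr and .kjb), and the sample transformation is invented and far shorter than a real one. The substitution itself is unchanged. Note that the pattern requires a child tag right after <connection>, so a step that merely refers to a connection by name should be left alone.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Pentaho.

# Dry run: print the Kettle files under a directory that still embed a
# connection definition, without modifying anything.
list_embedded_connections() {
  find "${1:-.}" -type f \( -name '*.ktr' -o -name '*.kjb' \) -exec \
    perl -0 -ne 'print "$ARGV\n" if /<connection>\s*</' {} +
}

# Same substitution as the one-liner above, restricted to *.ktr and *.kjb.
clean_kettle_connections() {
  find "${1:-.}" -type f \( -name '*.ktr' -o -name '*.kjb' \) -exec \
    perl -0 -p -i -e 's/\s*<connection>\s*<.*?<\/connection>\s*$//smg' {} +
}

# Demo on a throwaway file with made-up content.
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.ktr" <<'EOF'
<transformation>
  <connection>
    <name>dwh</name>
    <server>db.example.com</server>
  </connection>
  <step>
    <name>read</name>
    <connection>dwh</connection>
  </step>
</transformation>
EOF

before=$(list_embedded_connections "$demo_dir")   # files needing cleanup
clean_kettle_connections "$demo_dir"
after=$(list_embedded_connections "$demo_dir")    # should now be empty
cat "$demo_dir/sample.ktr"
```

In this demo the embedded definition block is removed, while the step's <connection>dwh</connection> reference stays, so the step would pick up the dwh connection from shared.xml. Running the dry-run helper first, and committing before cleaning, keeps the in-place edit easy to review and revert.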
