Clean Pentaho shared connections from transformations and jobs

Pentaho has a handy shared.xml file, located in your $HOME/.kettle directory. Once you use it, you can define all your connections there. In theory this avoids duplicating connection definitions in every job and gives you a single place to update them when needed.

The sad reality is that every time you save a job or transformation, the connections are still embedded in it, effectively duplicating them. If you somehow remove the connection details from the job or transformation file, the definition from shared.xml is used instead, which is what we want.

This ‘somehow’ can easily be achieved by the following snippet:

find . -type f \( -name '*.ktr' -o -name '*.kjb' \) -print0 | xargs -0 perl -0 -p -i -e 's/\s*<connection>\s*<.*?<\/connection>\s*$//smg'

We run it regularly on our codebase to keep it clean, and it has always worked as expected.

4 thoughts on “Clean Pentaho shared connections from transformations and jobs”

  1. Good answer, Verónica. Do you think the investment being made in the SETI project that Verónica mentions is fair? By the way, I want to welcome the participants from Colexio Los Sauces de Vigo and send greetings to Ramón.

  2. hi,
    thanks so much for clearly stating this problem. I came across it recently and this is the only place I can find that acknowledges it, though I think that is partly because this is one of those hard-to-search-for topics … searching for ‘pentaho shared data connections save’ or similar returns so much other stuff.
    Anyway, I was wondering whether you’re aware of any fix for this, or even a bug/feature tracking entry somewhere? I’m upgrading Pentaho DI v4.2.1 (!) enterprise to v7.0 community, and have successfully used a PowerShell script to break several large repository exports of jobs into hundreds of individual files, one per job/transformation. I’ll need to remove the saved shared connection definitions as well, to keep them out of those files and to keep source control clean.
    cheers, Rod
