Pentaho Data Integration (Kettle) has this nice shared.xml file, which lives in your $HOME/.kettle directory. You can define all your connections there, which in theory prevents duplicating connection definitions across all your jobs and transformations and gives you a single place to update your connections when needed.
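For reference, this sketch writes a minimal shared.xml. Kettle keeps each shared connection as a `<connection>` element under a `<sharedobjects>` root, and jobs and transformations refer to it by its `<name>`. The connection name `dwh` and all its values are placeholders, and a file saved by Spoon typically contains more elements and stores the password in an obfuscated `Encrypted …` form. The file goes to a temporary directory rather than your real `$HOME/.kettle`, so nothing gets overwritten.

```shell
#!/bin/sh
# Write a sample shared.xml into a throwaway directory instead of the real
# $HOME/.kettle, so no existing file is overwritten.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.kettle"

# One shared connection named "dwh"; every value below is a placeholder.
cat > "$demo_home/.kettle/shared.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<sharedobjects>
  <connection>
    <name>dwh</name>
    <server>db.example.com</server>
    <type>POSTGRESQL</type>
    <access>Native</access>
    <database>dwh</database>
    <port>5432</port>
    <username>etl</username>
    <password>changeme</password>
  </connection>
</sharedobjects>
EOF

echo "Sample written to $demo_home/.kettle/shared.xml"
```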
The sad reality is that each time you save a job or a transformation, the connection definitions still get embedded in the job (.kjb) or transformation (.ktr) file, effectively duplicating them. If you somehow remove these connection details from your job or transformation, the definition from shared.xml will be used instead, which is what we want.
This ‘somehow’ can easily be achieved with the following snippet, run from the root of your codebase:
find . -type f -print0 | xargs -0 perl -0 -p -i -e 's/\s*<connection>\s*<.*?<\/connection>\s*$//smg'
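We run it regularly on our codebase to keep it clean, and it has always worked as expected. Because the pattern requires another tag right after <connection>, it only strips full connection definitions; references inside steps and job entries, such as <connection>my_db</connection>, are left untouched, which is how Kettle still knows which shared connection to use.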