I needed to permanently remove a data disk from Kudu. In my case, this disk was getting way too many IOs and I needed Kudu to stop writing to it. This post explains how to do this safely.
Sanity checks
First, you need to make sure that there are no tables with a replication factor of 1. If by bad luck some tablets of such a table are on the disk you will remove, the table would become unavailable. Note that the user running this command must be in the superuser_acl
list of Kudu (replace of course ${kudu_master_host} with the real hostname).
kudu cluster ksck ${kudu_master_host} | grep '| 1 |' | cut -d' ' -f2
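If you want a scriptable version of the same check, here is a small sketch that just counts RF 1 tables (same ksck output as above; the threshold logic is my own addition):
# Count tables whose replication factor is 1; anything above 0 means
# the disk should not be removed yet.
rf1_count=$(kudu cluster ksck ${kudu_master_host} | grep -c '| 1 |')
if [ "${rf1_count}" -gt 0 ]; then
  echo "Found ${rf1_count} table(s) with replication factor 1, aborting."
fi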
If there are tables listed, you need to either:
- DROP them, or
- recreate them with a higher replication factor (a sketch follows this list). You cannot change the replication factor of an existing table.
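If your Kudu tables are managed through Impala (an assumption on my part; the table name, schema and partitioning below are made up for illustration), recreating with a higher replication factor boils down to a CTAS with kudu.num_tablet_replicas set to 3:
# Hypothetical example: recreate my_table as my_table_rf3 with RF 3,
# then drop and rename the old table as needed.
impala-shell -q "
  CREATE TABLE my_table_rf3
  PRIMARY KEY (id)
  PARTITION BY HASH (id) PARTITIONS 8
  STORED AS KUDU
  TBLPROPERTIES ('kudu.num_tablet_replicas' = '3')
  AS SELECT * FROM my_table;
"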
Technically, there are other options, but they are trickier:
- I could use
kudu tablet change_config move_replica
to move the tablets of all tables with RF 1 from e.g. server 1 to server 2, then remove the directory on server 1, rebalance, then rinse and repeat from server 2 to server 3 and so on (a sketch follows this list). Note that you can only move tablets between servers, not between disks, so it can take a while if you have many servers.
- I could move the data directory from one disk to another disk, as Kudu does not use whole disks but only subdirectories on them. As all other disks already had Kudu data directories in my case, this would have meant that one disk would receive twice as many IOs.
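For reference, a minimal sketch of such a move, assuming you have already looked up the tablet id and the tablet server UUIDs (the angle-bracket values are placeholders):
# List the tablet servers and their UUIDs.
kudu tserver list ${kudu_master_host}
# Move one replica of a given tablet from one tablet server to another.
kudu tablet change_config move_replica ${kudu_master_host} <tablet_id> <from_ts_uuid> <to_ts_uuid>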
Start a rebalance. After this the data will be properly spread and, more importantly, we know that a rebalance can complete on this cluster.
kudu cluster rebalance ${kudu_master_host}
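If you first want to see what the rebalancer would do without moving anything, recent Kudu versions support a report-only mode (an optional extra check, not strictly part of the procedure):
# Only report on the balance of the cluster, do not move any replica.
kudu cluster rebalance ${kudu_master_host} --report_only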
Stop Kudu on the node you are working on.
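How you stop the tablet server depends on how your cluster is managed; with plain systemd-managed packages (an assumption, adjust to your deployment, e.g. Cloudera Manager) it could look like:
# Assumption: the tablet server runs as a systemd service named kudu-tserver.
sudo systemctl stop kudu-tserver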
Remove a disk
Remove the path of the directory you want to retire from fs_data_dirs in the tablet server configuration.
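Where fs_data_dirs lives depends on your deployment; with package-based installs it is typically a gflag file such as /etc/kudu/conf/kudu-tserver.gflagfile (the path and directory names below are illustrative):
# Before: four data directories, /data/3/kudu is the disk being retired.
--fs_wal_dir=/data/0/kudu/wal
--fs_data_dirs=/data/0/kudu,/data/1/kudu,/data/2/kudu,/data/3/kudu
# After: the retired directory is simply removed from the list.
--fs_data_dirs=/data/0/kudu,/data/1/kudu,/data/2/kudu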
While Kudu is still stopped, tell the tablet server whose configuration you just changed that there is now one disk fewer:
sudo -u kudu kudu fs update_dirs --force --fs_wal_dir=[your wal directory] --fs_data_dirs=[comma separated list of remaining directories]
Restart Kudu. Data will be automatically rebalanced.
Congrats! Move on to your next node once all tablets are happy, i.e. once
kudu cluster ksck ${kudu_master_host}
does not return any error.
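If you do not want to re-run that check by hand, a small loop can wait until ksck is happy again (a convenience sketch; ksck exits non-zero while it still finds issues):
# Re-run ksck every 30 seconds until it exits cleanly.
until kudu cluster ksck ${kudu_master_host} > /dev/null 2>&1; do
  echo "ksck still reports issues, waiting..."
  sleep 30
done
echo "All tablets are healthy."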