Flume played a nasty trick on us recently. The files on a spooldir source were not processed, ending up filling up our disk. Looking a bit at the symptoms, 2 effects were obvious:
- A lot of canary files appeared, with names like ‘flume-spooldir-perm-check-983522283290612181.canary’
- Flume.log was swamped with java.io.IOException: Not a data file.
The canary files are created as a part of permission checking of a spooldir, as can be seen on github. The only thing is that they are supposed to be deleted afterwards, as they only are there to see if flume can write on the directory. In our case they were not deleted, because creating a new empty file can be done on a full disk, but writing it needs free space. The check thus errored out before deletion. Those files can safely be deleted.
The “java.io.IOException: Not a data file” exception was due to the presence of a temporary directory holding metadata for processing. This directory is controlled by the trackerDir directive in the definition of the spooldir source in flume.conf (by default .flumespool in the spooldir). We ended up having empty metadata files, which then did not have the 2 bytes that avro (we are using an avro sink) expected to see. There is actually nothing wrong at all with the actual data file, only with the metadatafile. The solution is thus to delete .flumespool and the issue resolved itself (after releasing a bit of space from the disk, of course.)
The root cause of all this was that we used the default deletePolicy of never in flume.conf, meaning that once a file is processed, it is renamed but not removed. We do have a job to remove those processed files, but it failed for while, thus letting the disk fill up. We now delete files directly after processing (deletePolicy: immediate).
Edit: I sent a flume bug and patch, at FLUME-2361. Hopefully this will be merged at some point soon.