With a little code, however, you can extend most FileOutputFormat
subclasses to avoid committing these files to HDFS when the tasks complete. You need to override two methods from the OutputFormat:
- getRecordWriter - Wrap the writer to track that something was output via the Context.write(K, V) method
- getOutputCommitter - Extend the OutputCommitter to override the needsTaskCommit(Context) method
Here's an example for extending SequenceFileOutputFormat: