
In part 1 of the big data file formats we reviewed Parquet vs Avro. It was apparent from the start that the two file formats were built for different things. Avro is clearly a complex row structured file format used in communication and transactions, where schema is king and nested structures are no problem. Parquet on the other hand has risen to the top with the popularity of Spark, is columnar based storage and is well suited to structured and tabular type data. But, lest the annals of inter-webs call us uncouth and forgetful, we must add ORC file format to the list.
Read more