Update: Check out my new Parquet post.
Recently, while delving into (and burying myself alive in) AWS Glue and PySpark, I ran across a file format that was new to me: Apache Parquet.
It promised to be the unicorn of data formats. I've not been disappointed yet.
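If you've never touched it, here's a minimal sketch (not from the original post) of what reading and writing Parquet looks like in PySpark. The file paths and app name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Read a CSV file, then write it back out as Parquet (columnar, compressed).
df = spark.read.csv("input_data.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("output_data.parquet")

# Reading it back preserves the schema without re-inferring types.
parquet_df = spark.read.parquet("output_data.parquet")
parquet_df.printSchema()
```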
One of the biggest hurdles I've found when teaching myself any sort of SQL/Python/data wrangling skill is finding usable, real-life data to work with. Data that I can actually use to try to answer questions.
Hmmm… What to use… What to use? When I want to explore data quickly and with the least amount of pain, the first problem I face is where to start. There are a million approaches, and I'm usually thinking long-term: ease of maintenance, the surrounding platform, and so on.