Update: Check out my new Parquet post.
Recently, while delving into (and burying myself alive in) AWS Glue and PySpark, I ran across a file format that was new to me: Apache Parquet.
It promised to be the unicorn of data formats. I've not been disappointed yet.
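If you've never touched it, here's a minimal sketch (not from the original post) of what reading and writing Parquet looks like in PySpark. The file paths and app name are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

# Read a CSV file, then write it back out as Parquet (columnar, compressed).
df = spark.read.csv("input_data.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("output_data.parquet")

# Reading it back preserves the schema without re-inferring types.
parquet_df = spark.read.parquet("output_data.parquet")
parquet_df.printSchema()
```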
One of the biggest hurdles I've found when teaching myself any sort of SQL/Python/data wrangling skill is finding usable, real-life data to work with. Data that I can actually use to try to answer questions.
Hmmm… What to use… What to use? When I want to explore data quickly and with the least amount of pain, the first problem I face is where to start. There are a million approaches, and I'm usually thinking long-term: ease of maintenance, the surrounding platform, and so on.