I recently did a little project to find out what makes a company tick, using Python and the Twitter API. It had to be done quickly, in like a day, and didn’t need to be overly complicated.

One of the biggest hurdles I’ve found when teaching myself any sort of SQL/Python/Data Wrangling skills is finding usable, real-life data to work with. Data that I can actually attempt to answer questions with.

Yeah, Web Scraping is super easy in Python, just pip install beautifulsoup4 and away you go. Not.

It wouldn’t be the first time. The story is usually the same: lots of people, contractors, software installation, months of ETL work, months of database work, testing, testing, and more testing. And then it arrives, a beautiful spiffy Enterprise Data Warehouse with all its facts and dimensions in all their Kimball glory.

Some of the most underused yet powerful functions in T-SQL are window functions. They are powerful because they let you run a calculation over a window of rows you specify, with that window moving along as the calculation works through your data.
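To make that a bit more concrete, here is a minimal sketch of a running total per group. It is not T-SQL: I'm assuming Python's built-in sqlite3 module and a SQLite build recent enough (3.25+) to support window functions, and the `sales` table and its rows are made up purely for illustration. The `OVER (PARTITION BY ... ORDER BY ...)` clause is written in standard SQL, so the same idea should carry over to T-SQL.

```python
import sqlite3

# In-memory SQLite database with a tiny, made-up sales table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 10), ("east", 2, 20), ("east", 3, 5),
     ("west", 1, 7), ("west", 2, 3)],
)

# Running total per region: the window is every row in the same region
# from the first day up to and including the current row.
rows = conn.execute(
    """
    SELECT region, day, amount,
           SUM(amount) OVER (
               PARTITION BY region
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row comes back with its own running total, which is the difference from a plain GROUP BY: the window gives you the aggregate without collapsing the detail rows.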

Hmmm…. What to use… What to use? When I want to explore data quickly and with the least amount of pain, the first problem I face is where do I start. There are a million approaches and I’m usually thinking long-term, ease of maintenance, surrounding platform, etc etc.

Seriously… it’s called a non-lookup, dude. Probably one of the most annoying situations I’ve come across when working on Enterprise Data Warehouse (EDW) teams/projects is the non-lookup problem.