Reading Excel (.xlsx) Files with Polars

I make it my duty in life to never have to open an Excel file (.xlsx); if I do, I feel like I made a critical error in my career trajectory. But I recently had no choice but to open (or try to open) an Excel file on a Mac to look at some sample data from a client.

The file was large and messy, including extra junk header records; it was opening strangely in Numbers for Mac, and even trying to convert to CSV or TSV was being a bugger.

That’s when I decided to use Polars to see if I could read the Excel file and then dump it to CSV for better data munging. It worked great. Here is the code snippet for all those others looking to skip headers with Polars while reading an Excel file.

One side note: you will probably get the error below right away and need to pip install the package it names before Polars can read Excel files.

Please install using the command `pip install fastexcel`.
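If you would rather check up front than wait for that error, a small pre-flight check along these lines should do it. This is only a sketch: the helper name `missing_packages` is made up for illustration and is not part of Polars; it relies only on `importlib.util.find_spec` from the standard library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that are not importable here."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if find_spec(name) is None]


# Hypothetical pre-flight check before calling pl.read_excel.
missing = missing_packages(["polars", "fastexcel"])
if missing:
    print("Install first: pip install " + " ".join(missing))
```

If nothing is printed, both packages are visible to the interpreter you are running.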

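import polars as pl

# header_row is zero-based: rows before index 2 are skipped, row 2 holds the column names
df = pl.read_excel("sample.xlsx", read_options={"header_row": 2})

# limit(10) writes only the first 10 rows; drop it to dump the whole sheet
df.limit(10).write_csv("sample.csv")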
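To make the `header_row` option a little less magic, here is a plain-Python sketch of what I understand it to mean: a zero-based index where every row before it is thrown away and the row at that index becomes the column names. The function `apply_header_row` and the sample rows are invented for illustration; this is not how Polars or fastexcel implement it internally.

```python
def apply_header_row(rows, header_row):
    """Drop every row before `header_row` and use that row as column names."""
    header = rows[header_row]
    # Everything after the header row is treated as data.
    records = [dict(zip(header, row)) for row in rows[header_row + 1:]]
    return header, records


# Made-up sheet contents: two junk rows, then the real header, then data.
sheet = [
    ["Client export", None, None],
    ["Generated by some tool", None, None],
    ["id", "name", "amount"],
    [1, "widget", 9.5],
    [2, "gadget", 12.0],
]

header, records = apply_header_row(sheet, header_row=2)
print(header)
print(records[0])
```

So with two junk rows on top, `header_row=2` is the value that lands on the real header. If your file has a different number of junk rows, adjust the index to match.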

Polars, what a beautiful thing: the ability to write one-liners with plenty of extra options to do strange things like skip junk rows. It's truly a gem and an under-appreciated tool.