Is Python OOP the Devil? Or Savior?
Nothing will raise the hackles of pale, hairy programmers who’ve been stuck in their mom’s basement for a decade quite like bringing up OOP (Object-Oriented Programming), especially in the context of Python. It’s like two fattened calves prepared for slaughter: sharpen your knives and take your place, because it’s time to feast upon the boiling cauldron of emotions simmering away in the interwebs.
Programmers and programming.
Recently I found myself much amused by the illustrious Matt Martin, a Staff Engineer who’s active on LinkedIn. He shares real-life takes on Data Engineering, Golang, Python, and sometimes Rust. I always sit up and take notice when I run across a fellow Data person willing to stick their neck out and say things that are true, but probably unpopular.
Anyways, Matt posted something most people know in their heart of hearts is generally true.
The man is on his way to the top if he keeps posting truths, but I digress.
The truth about Python Classes and OOP.
You don’t have to be much of a soothsayer or rub that ol’ crystal ball too hard these days to see the writing on the wall when it comes to OOP. You can smell the change in the wind, and it isn’t really that surprising. Like anything in programming, popular modi operandi rise and fall over the decades as fashions change.
In the Age of ThePrimeagen, like it or not, it seems OOP is finally showing some cracks in the armor in favor of imperative, procedural, and functional coding. The age of the slacks-wearing .NET developer is coming to an end, with tools like Golang and Rust starting to gain momentum.
The case for OOP hasn’t been helped by Python either; in fact, over the long haul, Python programmers using OOP have probably done it a great deal of damage. Let me explain.
Python Classes and OOP have been overused and misused by a legion of mediocre, mid-level Python developers to the point that it’s patently obvious to the casual observer that things have gone too far. Take this comment from John Crickett, for example.
Does what Matt says about Python Classes being useless hold true? Why, yes. If you apply the 80/20 rule, then he is correct: Python Classes are dead, and it’s good advice for new and up-and-coming basement dwellers rattling endlessly away on their keyboards.
Don’t do it. Don’t reach for it. You will be sorry.
The unfortunate part is just how many classes, courses, videos, and tutorials introduce you to programming through OOP. The side effect is that when you are new and don’t know any better, you think you’re being sophisticated and reach for OOP because you want to prove something about yourself. The only thing you end up proving is that you’re a lemming.
Can OOP in Python be useful sometimes? Well, of course. In 20% of the use cases (and that’s being generous), sure, you might find something that fits the bill.
I myself (who, in full disclosure, disdains every Python Class I run across) have written a few over lo these many years. There were instances of so-called utility work where classes, including ideas like inheritance, totally made sense. They were few and far between, but they existed.
The key, even in those circumstances, was not to overdo it, which is tempting even in a valid use case. Keep it small and to the point.
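To make "small and to the point" concrete, here is a hypothetical sketch of the kind of utility class that can earn its keep. The names `Sink`, `MemorySink`, and `CsvSink` are made up for illustration, not taken from any real codebase; the idea is one abstract method, two tiny subclasses, and nothing else.

```python
import csv
from abc import ABC, abstractmethod
from typing import Iterable


class Sink(ABC):
    """Hypothetical base class: one responsibility, one abstract method."""

    @abstractmethod
    def write(self, rows: Iterable[dict]) -> int:
        """Write rows and return how many were written."""


class MemorySink(Sink):
    """Collects rows in a list; handy as a stand-in during tests."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: Iterable[dict]) -> int:
        before = len(self.rows)
        self.rows.extend(rows)
        return len(self.rows) - before


class CsvSink(Sink):
    """Writes rows to a CSV file, using the first row's keys as the header."""

    def __init__(self, path: str) -> None:
        self.path = path

    def write(self, rows: Iterable[dict]) -> int:
        materialized = list(rows)
        if not materialized:
            return 0
        # newline="" is what the csv module expects when writing to a file
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(materialized[0]))
            writer.writeheader()
            writer.writerows(materialized)
        return len(materialized)
```

A pipeline function can accept any `Sink`, and a test can pass in a `MemorySink` without touching disk. If a hierarchy like this starts sprouting a third layer or a pile of configuration flags, that is usually the sign it has been overdone.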
The end of OOP (with Python or whatever)?
We can only hope. The last thing we need is another generation of know-it-alls polluting our AI and LLM overlords and accidentally teaching those destroyers of codebases to do more of the same. It’s a self-fulfilling prophecy.
AI trained on the brain waves of decades of OOP zealots pushing a never-ending stream of Git commits filled with useless Python Classes. Lord save us.
But… Why? Why are classes bad in your opinion? This article doesn’t answer it.
I agree with the sentiment but it bothers me that there is no justification.
For anyone looking for a justification as for why classes are bad in Data Engineering:
a) Typically each job execution runs your ETL pipeline exactly once. There’s no need to create several instances of your ETLProcessor class, or whatever you called it. If you want to run the same pipeline for, say, several disjoint date ranges in parallel, you will typically do so via your scheduler (Airflow, Step Functions…).
b) Typically your ETL pipeline doesn’t need to hold state. In OOP, a class exists to hold data for each instance created from it. That is useful for software with many entities being acted upon, like a game. But in data engineering you just have a single ETL transformation job running in isolation, without chaotic or unpredictable interactions with other components. The complexity of data engineering lies elsewhere.
Because of the points above, using object-oriented programming instead of functional programming in data engineering just introduces boilerplate and unneeded complexity. Thus it’s simply a bad practice. Don’t do it.
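To illustrate the boilerplate argument, here is a minimal sketch of the same toy pipeline written both ways. The `ETLProcessor` class and the `extract`/`transform`/`load`/`run_pipeline` functions are hypothetical: `extract` returns hard-coded rows standing in for a real source, and `load` just returns a row count standing in for a real write.

```python
from datetime import date


# Class-based version: state lives on self, but it is set once and used once.
class ETLProcessor:
    def __init__(self, start: date, end: date) -> None:
        self.start = start
        self.end = end
        self.rows: list[dict] = []

    def extract(self) -> None:
        # Hypothetical stand-in for reading from a real source
        self.rows = [{"day": self.start, "amount": 10}, {"day": self.end, "amount": -3}]

    def transform(self) -> None:
        self.rows = [r for r in self.rows if r["amount"] > 0]

    def load(self) -> int:
        # Hypothetical stand-in for writing to a real target
        return len(self.rows)

    def run(self) -> int:
        self.extract()
        self.transform()
        return self.load()


# Function-based version: same steps, data passed explicitly, no instance needed.
def extract(start: date, end: date) -> list[dict]:
    return [{"day": start, "amount": 10}, {"day": end, "amount": -3}]


def transform(rows: list[dict]) -> list[dict]:
    return [r for r in rows if r["amount"] > 0]


def load(rows: list[dict]) -> int:
    return len(rows)


def run_pipeline(start: date, end: date) -> int:
    return load(transform(extract(start, end)))
```

Both versions load one row for the same date range, but the function version makes every input explicit. That fits point (a): the scheduler supplies `start` and `end` for each run, and a parallel backfill is just several calls with different arguments.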
Point out to any “clean coder” who says otherwise that the famous “Gang of Four” book was written in the 1990s by four software engineers working in C++ and Smalltalk, not by Data Engineers developing ETL pipelines in today’s ecosystem using Python.