ml pipelines
Building Machine Learning (ML) pipelines with big data is hard enough, and it doesn’t take much of a curve ball to make it a nightmare. Most of what you will read online are tutorials on how to take a few CSV files and run them through some sklearn package. If you are lucky, you might find some “big data” ML stories on Medium where someone uses Spark to crunch a bunch of JSON, Parquet, or CSV files at scale of 10 to a few hundred gigabytes of data. Usually they are simplistic and ambiguous. Unfortunately that isn’t how it works in the real world.
Read more