Good ole’ string slicing. That’s one thing that never changes in Data Engineering: working with strings. You would think we would all get to grow up some day and do the complicated stuff, but apparently you can’t outrun your past. I blame this mostly on the data and old-school companies. Plain text and flat files are still incredibly popular and common for storing and exporting data between systems. Hence string work comes upon us all like some terrible overlord. The ones you should fear the most are fixed-width files. I ran into a problem recently where PySpark was surprisingly terrible at processing fixed-width files and “string slicing.” It got me wondering … is it me or you?