For any Data Engineer who has worked on AWS for any length of time, one task always seems to come up: manipulating files in S3. Working with files in a bucket on AWS is something I’ve had to do for years, and there is always something: listing files, moving files, copying files, checking for files, getting the last modified file, checking file sizes, downloading files. It pretty much never ends.
Luckily, AWS provides a few tools to make these tasks easy: its handy CLI for command-line work and the trusty boto3 Python package. I want to give an introduction to the common commands Data Engineers run with both the AWS CLI and boto3 to perform these tasks. We will then compare the two, covering which tool to use in our pipelines and the pros and cons of each.
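To make that list of chores concrete, here is a minimal sketch of two of them with boto3: listing files under a prefix and finding the last modified one. The helper names (iter_objects, latest_object, print_latest) and the bucket and prefix in the usage note are made up for illustration. The boto3 calls themselves (get_paginator with list_objects_v2, and the Contents, Key, Size and LastModified fields) are the standard ones, but treat this as a starting point rather than the one right way to do it.

```python
from typing import Any, Dict, Iterator, Optional


def iter_objects(client: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield the metadata dict for every object under a prefix (hypothetical helper)."""
    # list_objects_v2 returns at most 1,000 keys per call, so use a paginator.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page that has no matching objects.
        yield from page.get("Contents", [])


def latest_object(client: Any, bucket: str, prefix: str = "") -> Optional[Dict[str, Any]]:
    """Return the most recently modified object, or None if nothing matches."""
    return max(
        iter_objects(client, bucket, prefix),
        key=lambda obj: obj["LastModified"],
        default=None,
    )


def print_latest(bucket: str, prefix: str = "") -> None:
    """Print key, size and timestamp of the newest object (needs AWS credentials)."""
    import boto3  # third-party package: pip install boto3

    s3 = boto3.client("s3")
    newest = latest_object(s3, bucket, prefix)
    if newest is None:
        print("no objects found")
    else:
        print(newest["Key"], newest["Size"], newest["LastModified"])
```

With credentials configured, something like print_latest("my-example-bucket", "incoming/") would print the newest file under that prefix (both names are placeholders). The rough CLI counterpart for the listing half is aws s3 ls s3://my-example-bucket/incoming/ --recursive.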