Frequently used interaction patterns with AWS.
To create a new bucket, use
`aws s3 mb s3://bucketname` (the `s3://` prefix is required).
To add a subfolder to a bucket, create an empty object whose key ends in a slash:
`aws s3api put-object --bucket bucketname --key foldername/`
There are multiple ways to access your AWS account. I store config and credential files in
`~/.aws`, as discussed here. AWS access methods find these files automatically, so I don't have to worry about that.
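For reference, a `~/.aws/config` with two profiles might look like this (profile names and region are placeholders, not my actual setup):

```ini
[default]
region = eu-central-1

[profile work]
region = eu-central-1
```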
What I do have to worry about is choosing the appropriate profile, depending on which AWS account I want to interact with (e.g. my personal one or one for work). This differs across libraries, so I cover each case below.
Developed by people at Dask,
`s3fs` is built on top of
`botocore` and provides a convenient way to interact with S3. It can read and – I think – write data, but there are easier ways to do that, and I use the library mainly to navigate buckets and list their contents.
To choose a profile other than the default, pass its name when creating the filesystem object.
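`s3fs.S3FileSystem` accepts a `profile` argument for this; a minimal sketch, assuming a profile named `work` exists in `~/.aws/config` (profile and bucket names are placeholders):

```python
import s3fs

# "work" is a placeholder profile name from ~/.aws/config.
fs = s3fs.S3FileSystem(profile="work")

# Navigate a bucket and list its top-level content
# ("bucketname" is a placeholder).
fs.ls("bucketname")
```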
# Read and write directly from S3
Pandas can read and write files to and from S3 directly if you provide the file name as an S3 URI (e.g. `s3://bucketname/filename.csv`).
Pandas uses the default profile to access S3. Recent versions of Pandas also accept a
`storage_options` parameter that can be used to provide, among other things, a profile name.
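A sketch of both variants; bucket, file, and profile names below are placeholders:

```python
import pandas as pd

# Default profile: just pass an S3 URI as the file name.
df = pd.read_csv("s3://bucketname/data.csv")

# A specific profile via storage_options (handed through to s3fs).
df = pd.read_csv(
    "s3://bucketname/data.csv", storage_options={"profile": "work"}
)
df.to_csv(
    "s3://bucketname/out.csv", index=False, storage_options={"profile": "work"}
)
```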
This works well for simple jobs, but in a large project, passing the profile information to each read and write call is cumbersome and ugly.
Simple improvement using `functools.partial`
`functools.partial` provides a simple solution, as it allows me to create custom functions with a frozen `storage_options` argument.
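A sketch of what this might look like (the profile name is a placeholder):

```python
from functools import partial

import pandas as pd

# Freeze the storage options once ("work" is a placeholder profile name)...
STORAGE_OPTIONS = {"profile": "work"}
read_csv_s3 = partial(pd.read_csv, storage_options=STORAGE_OPTIONS)
to_csv_s3 = partial(pd.DataFrame.to_csv, storage_options=STORAGE_OPTIONS)

# ...then call the frozen versions without repeating the profile:
# df = read_csv_s3("s3://bucketname/data.csv")
# to_csv_s3(df, "s3://bucketname/out.csv", index=False)
```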
More flexible solution with a custom function
Often, I test projects on my Mac and run the full code on a virtual machine. In this case, I need a way to automatically provide the correct profile name on each machine.
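One way to sketch this, assuming the machines can be told apart by hostname (hostnames and profile names are placeholder assumptions, not my actual setup):

```python
import socket


def get_storage_options():
    """Return the storage options appropriate for the current machine.

    The hostname prefix and profile names are placeholders.
    """
    if socket.gethostname().startswith("macbook"):
        return {"profile": "personal"}
    return {"profile": "work"}


# Each call site then passes the result explicitly:
# df = pd.read_csv(
#     "s3://bucketname/data.csv", storage_options=get_storage_options()
# )
```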
The above is not ideal, as it requires cumbersome unpacking of the return value at every call site. Maybe using a decorator is better.
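A decorator version might look like this (a sketch; the hostname prefix and profile names are placeholder assumptions):

```python
import functools
import socket


def with_storage_options(func):
    """Inject machine-appropriate storage_options unless the caller overrides them."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Placeholder logic: pick the profile from the hostname.
        if socket.gethostname().startswith("macbook"):
            profile = "personal"
        else:
            profile = "work"
        kwargs.setdefault("storage_options", {"profile": profile})
        return func(*args, **kwargs)

    return wrapper


# Usage: wrap the Pandas readers once, then call them as usual.
# read_csv = with_storage_options(pd.read_csv)
# df = read_csv("s3://bucketname/data.csv")
```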
A new library from AWS Labs for Pandas interaction with a number of AWS services. It looks very promising, but I haven't had a use for it thus far.