AWS
Frequently used interaction patterns with AWS.
CLI
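To create a new bucket, use
aws s3 mb s3://bucketname
To add a subfolder to a bucket, use
aws s3api put-object --bucket bucketname --key foldername/
S3 has no real folders. The trailing slash in the key is what makes the S3 console display the empty object foldername/ as a folder.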
Setup
There are multiple ways to access your AWS account. I store config and credential files in ~/.aws, as discussed here. The AWS CLI and libraries built on botocore (such as boto3 and s3fs) find these files automatically, so I don't have to worry about that. What I do have to worry about is choosing the appropriate profile for the AWS account I want to interact with (e.g. my personal one or one for work). How to do this differs between libraries, so I cover it for each one below.
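For reference, ~/.aws/credentials is an INI-style file with one section per profile (in ~/.aws/config, non-default profiles are written as [profile name] instead). The minimal sketch below uses only the standard library's configparser to list the profiles in such a file. The example contents, profile names, placeholder keys, and the helper names list_profiles and profiles_on_disk are all made up for illustration.

```python
import configparser
from pathlib import Path
from typing import List

# Made-up example of ~/.aws/credentials: one INI section per profile.
# The keys are placeholders, not real credentials.
EXAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAEXAMPLEDEFAULT
aws_secret_access_key = example-secret-default

[work]
aws_access_key_id = AKIAEXAMPLEWORK
aws_secret_access_key = example-secret-work
"""


def list_profiles(text: str) -> List[str]:
    """Return the profile names (section headers) defined in credentials text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.sections()


def profiles_on_disk() -> List[str]:
    """List profiles in the real ~/.aws/credentials, or [] if it doesn't exist."""
    path = Path.home() / ".aws" / "credentials"
    return list_profiles(path.read_text()) if path.exists() else []
```

The names returned are the values that the profile options of the libraries discussed in this post expect.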
s3fs
Built by people from the Dask project, s3fs sits on top of botocore and provides a convenient, file-system-like way to interact with S3. It can read and write data, but there are easier ways to do that, so I mainly use it to navigate buckets and list their contents.
Navigate buckets
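A minimal sketch of the kind of navigation s3fs allows. find (like ls and glob) is a method of s3fs.S3FileSystem, inherited from fsspec. The helpers find_files and object_sizes are hypothetical; they take the filesystem object as a parameter, so they can be tried out with a stand-in object instead of real S3 access.

```python
from typing import Dict, List

# Hypothetical helpers: `fs` is expected to be an s3fs.S3FileSystem
# (or any other fsspec filesystem offering the same find method).


def find_files(fs, bucket: str, suffix: str) -> List[str]:
    """Return every object path under `bucket` that ends with `suffix`.

    fs.find lists objects recursively, similar to `aws s3 ls --recursive`.
    """
    return [path for path in fs.find(bucket) if path.endswith(suffix)]


def object_sizes(fs, bucket: str) -> Dict[str, int]:
    """Map each object path under `bucket` to its size in bytes."""
    # detail=True makes find return {path: info_dict}; each info dict has "size".
    return {path: info["size"] for path, info in fs.find(bucket, detail=True).items()}
```

With a real filesystem, e.g. fs = s3fs.S3FileSystem(), fs.ls("my-bucket") lists the top level of a bucket, fs.glob("my-bucket/**/*.csv") does shell-style pattern matching, and find_files(fs, "my-bucket", ".csv") returns the CSV paths as a plain list. The bucket name is a placeholder.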
To choose a profile other than the default, pass its name to S3FileSystem via the profile argument.
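A minimal sketch of a small factory function. make_s3 and the profile name used below are hypothetical; the profile argument of s3fs.S3FileSystem is the s3fs option for picking a profile from ~/.aws. The import sits inside the function only so the helper can be defined on machines where s3fs is not installed.

```python
from typing import Optional


def make_s3(profile: Optional[str] = None):
    """Return an s3fs.S3FileSystem that authenticates with `profile`.

    profile=None leaves the choice to the usual credential lookup,
    which normally means the default profile.
    """
    import s3fs  # imported here so this module loads even without s3fs installed

    return s3fs.S3FileSystem(profile=profile)
```

Usage, with a hypothetical profile and bucket: fs = make_s3("work") followed by fs.ls("my-bucket").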
Read and write directly from Pandas
Pandas can read and write files to and from S3 directly if you provide the file name as s3://<bucket>/<filename> (this requires s3fs to be installed). By default, Pandas uses the default profile to access S3. Since version 1.2, Pandas read and write functions have a storage_options parameter that is passed on to s3fs and can be used to provide, among other things, a profile name.
Basics
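A minimal sketch of the basic pattern, assuming s3fs is installed (Pandas hands s3:// paths to it). The bucket name, profile name, and the helpers s3_path, read_default, read_with_profile and write_with_profile are hypothetical; read_csv, to_csv and their storage_options parameter are Pandas API.

```python
import pandas as pd

BUCKET = "my-bucket"  # hypothetical bucket name
PROFILE = "work"  # hypothetical profile name from ~/.aws


def s3_path(filename: str, bucket: str = BUCKET) -> str:
    """Build the s3://<bucket>/<filename> URL that Pandas understands."""
    return f"s3://{bucket}/{filename}"


def read_default(filename: str) -> pd.DataFrame:
    # No storage_options: credentials come from the default profile.
    return pd.read_csv(s3_path(filename))


def read_with_profile(filename: str, profile: str = PROFILE) -> pd.DataFrame:
    # storage_options is passed on to s3fs, here selecting a named profile.
    return pd.read_csv(s3_path(filename), storage_options={"profile": profile})


def write_with_profile(df: pd.DataFrame, filename: str, profile: str = PROFILE) -> None:
    # Writing works the same way; index=False just avoids an extra index column.
    df.to_csv(s3_path(filename), index=False, storage_options={"profile": profile})
```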
This works well for simple jobs, but in a large project, passing the profile information to each read and write call is cumbersome and ugly.
Simple improvement using functools.partial
functools.partial provides a simple solution, as it allows me to create a custom function with a frozen storage_options argument.
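A minimal sketch of that idea; the names read_csv_s3 and to_csv_s3 and the profile name are hypothetical. Wrapping the unbound method DataFrame.to_csv means the DataFrame becomes the first positional argument.

```python
from functools import partial

import pandas as pd

S3_OPTIONS = {"profile": "work"}  # hypothetical profile name from ~/.aws

# Freeze storage_options once; every call through these names uses the profile.
read_csv_s3 = partial(pd.read_csv, storage_options=S3_OPTIONS)
to_csv_s3 = partial(pd.DataFrame.to_csv, storage_options=S3_OPTIONS)
```

Usage: df = read_csv_s3("s3://my-bucket/data.csv") and to_csv_s3(df, "s3://my-bucket/out.csv", index=False). Because keyword arguments given at call time override those frozen in a partial, a single call can still pass its own storage_options.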
More flexible solution with custom function
Often, I run projects on my Mac for testing and on a virtual machine to run the full code. In this case, I need a way to automatically provide the correct profile name for whichever machine the code is running on.
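A minimal sketch of one way to do this, not necessarily the approach I originally used: detect the Mac via platform.system(), which returns "Darwin" on macOS, and return keyword arguments for Pandas. The function name s3_kwargs and the profile name are hypothetical, and the sketch assumes the VM should simply use its default credentials.

```python
import platform
from typing import Any, Dict, Optional

LOCAL_PROFILE = "personal"  # hypothetical profile used when testing on the Mac


def s3_kwargs(system: Optional[str] = None) -> Dict[str, Any]:
    """Return extra keyword arguments for Pandas' S3 reads and writes.

    On macOS use a named profile; anywhere else (e.g. the VM) return {}
    so that Pandas/s3fs fall back to the default credentials.
    `system` can be passed explicitly, e.g. for testing.
    """
    system = system or platform.system()
    if system == "Darwin":
        return {"storage_options": {"profile": LOCAL_PROFILE}}
    return {}
```

Usage: df = pd.read_csv("s3://my-bucket/data.csv", **s3_kwargs()). The ** unpacking has to be repeated at every call site.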
This approach is not ideal, as it requires cumbersome unpacking of the return value at every call. A decorator might be a better solution.
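A minimal sketch of the decorator idea, assuming the Mac is detected via platform.system() returning "Darwin" and that the VM uses its default credentials. The names with_s3_profile, default_storage_options and the profile name are hypothetical. The decorator fills in storage_options only when the caller has not passed one, so explicit arguments still win.

```python
import functools
import platform
from typing import Any, Callable, Dict, Optional

LOCAL_PROFILE = "personal"  # hypothetical profile used when testing on the Mac


def default_storage_options(system: Optional[str] = None) -> Dict[str, Any]:
    """Named profile on macOS, nothing elsewhere (default credentials)."""
    system = system or platform.system()
    return {"profile": LOCAL_PROFILE} if system == "Darwin" else {}


def with_s3_profile(func: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator: add storage_options unless the caller already passed it."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        options = default_storage_options()
        if options:
            # setdefault leaves an explicitly passed storage_options untouched.
            kwargs.setdefault("storage_options", options)
        return func(*args, **kwargs)

    return wrapper
```

Usage: read_csv = with_s3_profile(pd.read_csv), then df = read_csv("s3://my-bucket/data.csv") with no unpacking at the call site. The same decorator can be placed with @with_s3_profile on your own loading and saving functions, as long as they accept a storage_options keyword.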
awswrangler
A new library from AWS Labs for using Pandas with a number of AWS services. It looks very promising, but I haven't had any use for it thus far.