Download a CSV file from S3 and create a pandas.DataFrame
How to download a .csv file from Amazon Web Services S3 and create a pandas.DataFrame using Python 3 and boto3.
Import libraries
import boto3
import pandas as pd
import io
(If they are not installed, run: pip3 install boto3 pandas)
Set region and credentials
First, we need the region where the bucket is located and the account credentials.
- You can find the region in the URL when you preview the desired bucket: https://s3.console.aws.amazon.com/s3/buckets/vperezb/?region=us-east-1
- In this case: region=us-east-1
- Copy the access key ID and secret access key from https://console.aws.amazon.com/iam/home?#/security_credential
- Security Credentials -> Access keys (access key ID and secret access key) -> Create New Access Key -> Show Access Key
Using account credentials isn't good practice, as they grant full access to your AWS resources: http://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html?icmpid=docs_iam_console
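A safer option is to keep the keys out of the script entirely and let boto3 read them from a named profile in ~/.aws/credentials. A minimal sketch, assuming a profile called my-s3-user (the profile name is just an illustration, not something from this post):

import boto3

# Credentials are read from the [my-s3-user] section of ~/.aws/credentials,
# so no keys appear in the source code.
session = boto3.Session(profile_name='my-s3-user', region_name='us-east-1')
s3c = session.client('s3')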
REGION = 'us-east-1'
ACCESS_KEY_ID = 'paste_here_your_key_id'
SECRET_ACCESS_KEY = 'paste_here_your_secret_access_key'
Select file location in AWS S3
BUCKET_NAME = 'vperezb'
KEY = 'path/in/s3/namefile.txt' # file path in S3
Caution: the path does not include the leading /.
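For example, for an object stored at s3://vperezb/path/in/s3/namefile.txt:

KEY = 'path/in/s3/namefile.txt'    # correct
KEY = '/path/in/s3/namefile.txt'   # wrong: the leading / becomes part of the key and the object is not found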
Download the file from S3
# Create an S3 client with the region and credentials defined above
s3c = boto3.client(
    's3',
    region_name=REGION,
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=SECRET_ACCESS_KEY
)
# Fetch the object and load its bytes into a pandas DataFrame
obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
df = pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')
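If the bucket or key doesn't exist (or the credentials lack permission), get_object raises a botocore.exceptions.ClientError. A minimal sketch of catching it, added here on top of the snippet above:

from botocore.exceptions import ClientError

try:
    obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
    df = pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')
except ClientError as e:
    # e.response['Error']['Code'] is e.g. 'NoSuchKey', 'NoSuchBucket' or 'AccessDenied'
    print('Could not download the file:', e)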
Print dataframe
df
 | name | sex | city | country | age | job |
---|---|---|---|---|---|---|
0 | Bob | M | Los Angeles | USA | 40 | Actor Extraordinaire |
1 | Joe | M | New York | USA | 35 | Policeman |
All together
import boto3
import pandas as pd
import io
REGION = 'us-east-1'
ACCESS_KEY_ID = 'paste_here_your_key_id'
SECRET_ACCESS_KEY = 'paste_here_your_secret_access_key'
BUCKET_NAME = 'vperezb'
KEY = 'path/in/s3/namefile.txt' # file path in S3
s3c = boto3.client(
    's3',
    region_name=REGION,
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=SECRET_ACCESS_KEY
)
obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
df = pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')
df
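If you load several files, the same steps fit naturally into a small helper function. A sketch under the same assumptions as above (the function name read_csv_from_s3 is just an illustration):

import io
import boto3
import pandas as pd

def read_csv_from_s3(bucket, key, region, access_key_id, secret_access_key):
    """Download a CSV object from S3 and return it as a pandas DataFrame."""
    s3c = boto3.client(
        's3',
        region_name=region,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key
    )
    obj = s3c.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')

df = read_csv_from_s3(
    'vperezb',
    'path/in/s3/namefile.txt',
    'us-east-1',
    'paste_here_your_key_id',
    'paste_here_your_secret_access_key'
)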
Written on December 26, 2017