When I started launching my Elastic MapReduce (EMR) jobs from within an Elastic Beanstalk EC2 instance, I was stymied by the error message:
Terminated with errors Access denied checking streaming input path: s3://bucket/key
I first opened up permissions on my S3 bucket and file, but that didn't work. I then explicitly set the IAM role for EMR and assigned a policy for full read/write rights to S3. That also did not work.
When I talked with Amazon Web Services technical support, they pointed out that I was making requests using temporary credentials. EMR does not support temporary credentials, so the actual request was being performed by something with no authority to access any resources.
I solved the problem by explicitly setting my credentials (AWS access key, secret access key) in my job creation code. Since I am using mrjob, this was a matter of:
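    from mrjob.emr import EMRJobRunner

    # Pass long-term keys explicitly so mrjob does not fall back to boto's defaults
    runner = EMRJobRunner(
        aws_access_key_id='xxxxx',
        aws_secret_access_key='xxxxxxx',
        ...)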
Previously, I had not set the access keys explicitly, so mrjob fell back to whatever keys boto was using, which were apparently temporary credentials passed in via Elastic Beanstalk.
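For anyone who would rather not hardcode the keys, here is a minimal sketch of one way to load them and guard against handing temporary credentials to EMR by accident. It is not part of mrjob or boto. The environment variable names EMR_ACCESS_KEY_ID and EMR_SECRET_ACCESS_KEY and both helper functions are hypothetical, made up for illustration. The check relies on the assumption that long-term IAM user keys start with "AKIA" while temporary ones start with "ASIA".

```python
import os

# Assumption: long-term IAM user access keys start with "AKIA", while
# temporary (STS / instance role) access keys start with "ASIA".
TEMPORARY_KEY_PREFIX = "ASIA"


def looks_temporary(access_key_id, session_token=None):
    """Return True if the credentials appear to be temporary ones."""
    return bool(session_token) or access_key_id.startswith(TEMPORARY_KEY_PREFIX)


def explicit_emr_credentials(env=os.environ):
    """Build the credential keyword arguments for EMRJobRunner.

    EMR_ACCESS_KEY_ID and EMR_SECRET_ACCESS_KEY are hypothetical variable
    names chosen for this sketch; use whatever your deployment provides.
    """
    key_id = env["EMR_ACCESS_KEY_ID"]
    secret = env["EMR_SECRET_ACCESS_KEY"]
    if looks_temporary(key_id):
        # Fail early instead of getting "Access denied" from EMR later.
        raise ValueError("refusing to pass temporary credentials to EMR")
    return {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
```

The returned dictionary can be unpacked into the EMRJobRunner call in place of the two hardcoded keyword arguments, alongside whatever other options the job needs.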