
Friday, January 31, 2014

Dateinfer Updated

The dateinfer package has been updated to version 0.20.

You can get it from PyPI or GitHub, or install it with pip.

Tuesday, January 14, 2014

Release of dateinfer: v0.1.1

I have released the first version of a new Python library called dateinfer. Given a list of example date strings, dateinfer makes a "best guess" at the date format they share. For example:

>>> import dateinfer
>>> dateinfer.infer(['Mon Jan 13 09:52:52 MST 2014', 'Tue Jan 21 15:30:00 EST 2014'])
'%a %b %d %H:%M:%S %Z %Y'
>>>
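
An inferred format string can be passed straight to the standard library's datetime.strptime to parse the dates. The sketch below is only an illustration: the format string is hard-coded as an assumption rather than obtained from dateinfer, the helper name parse_all is hypothetical, and the time zone field is left out because strptime's handling of %Z depends on the platform.

```python
from datetime import datetime

# Assumed format string for illustration; in practice this would be the
# value returned by dateinfer.infer(...). The time zone field is omitted
# on purpose, since strptime's %Z support varies by platform.
inferred_format = '%a %b %d %H:%M:%S %Y'


def parse_all(date_strings, fmt):
    """Parse every string in date_strings using the strptime format fmt."""
    return [datetime.strptime(s, fmt) for s in date_strings]


parsed = parse_all(
    ['Mon Jan 13 09:52:52 2014', 'Tue Jan 21 15:30:00 2014'],
    inferred_format,
)
print(parsed[0].isoformat())  # 2014-01-13T09:52:52
```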

The library is available through PyPI and is hosted on GitHub.

Wednesday, January 8, 2014

Access denied checking streaming input path

When I started launching my Elastic MapReduce (EMR) jobs from within an Elastic Beanstalk EC2 instance, I was stymied by the error message:

Terminated with errors Access denied checking streaming input path: s3://bucket/key

I first opened up permissions on my S3 bucket and file, but that didn't work. I then explicitly set the IAM role for EMR and assigned a policy for full read/write rights to S3. That also did not work.

When I talked to Amazon Web Services technical support, they noted that I was making requests using temporary credentials. EMR does not support temporary credentials, so the actual request was being performed by something with no authority to access any resources.

I solved the problem by explicitly setting my credentials (AWS access key ID and secret access key) in my job creation code. Since I am using mrjob, this was a matter of:

from mrjob.emr import EMRJobRunner

runner = EMRJobRunner(
    aws_access_key_id='xxxxx',
    aws_secret_access_key='xxxxxxx',
    ...)

Before, I was not explicitly setting the access keys, so mrjob fell back to whatever keys boto was using, which were apparently temporary credentials passed in via Elastic Beanstalk.
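
To avoid hard-coding the keys, one option is to read them from the environment and refuse anything that looks temporary before handing them to the runner. This is a rough sketch, not part of mrjob: the helper name explicit_emr_credentials is hypothetical, and it assumes that the keys live in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, that temporary key IDs begin with "ASIA", and that a set AWS_SESSION_TOKEN also signals temporary credentials.

```python
import os


def explicit_emr_credentials(env=None):
    """Return keyword arguments holding long-term AWS keys read from env.

    env is any mapping of variable names to values; it defaults to
    os.environ. Raises RuntimeError if the keys are missing or look
    like temporary credentials.
    """
    env = os.environ if env is None else env
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not key_id or not secret:
        raise RuntimeError(
            "Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first")
    # Assumption: temporary (STS) access key IDs start with "ASIA", and a
    # session token in the environment also indicates temporary credentials.
    if key_id.startswith("ASIA") or env.get("AWS_SESSION_TOKEN"):
        raise RuntimeError("These look like temporary credentials")
    return {"aws_access_key_id": key_id, "aws_secret_access_key": secret}


# Hypothetical usage with mrjob (not executed here); any other runner
# options would be passed alongside the credentials:
#   from mrjob.emr import EMRJobRunner
#   runner = EMRJobRunner(**explicit_emr_credentials())
```

Failing early this way surfaces the problem in the launching code instead of as an "Access denied" error from EMR.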