
How to mix Django, Uploadify, and S3Boto Storage Backend?


I'm doing fairly big file uploads on Django. File size is generally 10MB-100MB.

I'm on Heroku and I've been hitting the request timeout of 30 seconds.

The Beginning

In order to get around the limit, Heroku's recommendation is to upload from the browser DIRECTLY to S3.

Amazon documents this by showing you how to write an HTML form to perform the upload.

Since I'm on Django, rather than write the HTML by hand, I'm using django-uploadify-s3 (example). This provides me with an SWF object, wrapped in JS, that performs the actual upload.

This part is working fine! Hooray!

The Problem

The problem is in tying that data back to my Django model in a sane way. Right now the data comes back as a simple URL string, pointing to the file's location.

However, I was previously using S3 Boto from django-storages to manage all of my files as FileFields, backed by the delightful S3BotoStorageFile.

To reiterate, S3 Boto is working great in isolation, Uploadify is working great in isolation, the problem is in putting the two together.

My understanding is that the only way to populate the FileField is by providing both the filename AND the file content. When you're uploading files from the browser to Django, this is no problem, as Django has the file content in a buffer and can do whatever it likes with it. However, when doing direct-to-S3 uploads like me, Django only receives the file name and URL, not the binary data, so I can't properly populate the FieldFile.

Cry For Help

Anyone know a graceful way to use S3Boto's FileField in conjunction with direct-to-S3 uploading?

Else, what's the best way to manage an S3 file just based on its URL? Including setting expiration, key id, etc.
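A sketch of one approach (my own, not from the question): with django-storages' S3BotoStorage, the FileField column persists only the bucket-relative key name, so you can attach an already-uploaded S3 object by assigning its key path, without ever giving Django the file content. The helper names here are hypothetical.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as in the boto era

def s3_url_to_key_name(s3_url):
    """Reduce the URL Uploadify posts back into a bucket-relative key name."""
    return urlparse(s3_url).path.lstrip('/')

def attach_uploaded_file(instance, field_name, s3_url):
    # With S3BotoStorage the FileField stores only the key name, so
    # assigning the name is enough; no file content ever touches Django.
    setattr(instance, field_name, s3_url_to_key_name(s3_url))
```

Saving the model afterwards stores the key name, and the storage backend resolves it against the bucket for `.url`, expiration, and so on.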

Many thanks!

Source: (StackOverflow)

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using the S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (redundancy, load balancing) to access the same set of uploaded media.

Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.

Performance on media-rich pages is not very good, and when a page whose thumbnails need to be generated for the first time is accessed, the requests even time out.

I understand that this is due to sorl thumbnail checking/downloading the original image from S3 (which could be quite large and high resolution), and rendering/checking/uploading the thumbnail.

What would you suggest is the best solution to this setup?

I have seen suggestions of storing a local copy of files in addition to the S3 copy (not so great when a couple of servers are being used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.

Are there any other suggestions or better ways of approaching this?
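One mitigation I can sketch (an assumption on my part, not something the question provides): render every thumbnail geometry right after upload, from a post_save signal or a background task, so no page request ever has to pull the original from S3. `pregenerate_thumbnails` and the geometry list are hypothetical; the injectable `get_thumbnail` default is sorl-thumbnail's real entry point.

```python
# Example geometries matching typical thumb/medium sizes; adjust to taste.
THUMBNAIL_GEOMETRIES = ('102x102', '700x525')

def pregenerate_thumbnails(image_file, geometries=THUMBNAIL_GEOMETRIES,
                           get_thumbnail=None):
    if get_thumbnail is None:
        from sorl.thumbnail import get_thumbnail  # sorl's cached-thumbnail API
    # Rendering here warms sorl's key-value store, so the template tag later
    # finds a ready thumbnail instead of downloading the original from S3.
    return [get_thumbnail(image_file, geometry, crop='center').url
            for geometry in geometries]
```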

Source: (StackOverflow)

Downloading files from S3 recursively using boto in Python

I have a bucket in S3 which has a deep directory structure. I wish I could download them all at once. My files look like this:

foo/bar/1
...
foo/bar/100

Are there any ways to download these files recursively from the s3 bucket using boto lib in python?
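A hedged sketch of one way to do it with boto 2: `Bucket.list()` already recurses, because S3 "directories" are just key-name prefixes. The helper names below are mine.

```python
import os

def key_to_local_path(key_name, dest_root):
    """Map an S3 key like 'foo/bar/1' onto a path under dest_root."""
    return os.path.join(dest_root, *key_name.split('/'))

def download_bucket(bucket, dest_root, prefix=''):
    # bucket.list() yields every key under the prefix, however deep.
    for key in bucket.list(prefix=prefix):
        local_path = key_to_local_path(key.name, dest_root)
        dirname = os.path.dirname(local_path)
        if dirname and not os.path.isdir(dirname):
            os.makedirs(dirname)  # recreate the directory structure locally
        key.get_contents_to_filename(local_path)
```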

Thanks in advance.

Source: (StackOverflow)

How do I test a module that depends on boto and an Amazon AWS service?

I'm writing a very small Python ORM around boto.dynamodb.layer2. I would like to write tests for it, but I don't want the tests to actually communicate with AWS, as this would require complicated setup, credentials, network access, etc.

Since I plan to open source the module, including credentials in the source seems like a bad idea since I will get charged for usage, and including credentials in the environment is a pain.

Coupling my tests to the network seems like a bad idea, as it makes the tests run slower, or may cause tests to fail due to network errors or throttling. My goal is not to test boto's DynamoDB interface, or AWS. I just want to test my own code.

I plan to use unittest2 to write the tests and mock to mock out the parts of boto that hit the network, but I've never done this before, so my question boils down to these:

  1. Am I going about this the right way?
  2. Has anyone else done this?
  3. Are there any particular points in the boto.dynamodb interface that would be best to mock out?
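A minimal sketch of the mocking pattern, under the assumption that a hypothetical `save_item` stands in for the ORM code under test; only the `mock` usage itself is real API. The idea is to patch at the Layer2 boundary and assert on the calls, so no test ever reaches the network.

```python
try:
    from unittest import mock  # Python 3 stdlib
except ImportError:
    import mock                # standalone `mock` package on Python 2

def save_item(layer2, table_name, item_attrs):
    """Toy stand-in for ORM code that writes through boto's Layer2."""
    table = layer2.get_table(table_name)
    return table.new_item(hash_key=item_attrs['id'], attrs=item_attrs)

def test_save_item_hits_the_right_table():
    layer2 = mock.Mock()  # stands in for boto.dynamodb.layer2.Layer2
    save_item(layer2, 'people', {'id': 'u1', 'name': 'Ada'})
    layer2.get_table.assert_called_once_with('people')
```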

Source: (StackOverflow)

How to auto assign public ip to EC2 instance with boto

I have to start a new machine with ec2.run_instances in a given subnet, but also have a public IP auto-assigned (not a fixed Elastic IP).

When one starts a new machine from Amazon's web EC2 Manager via Request Instance (Instance Details), there is a check-box called Auto-assign Public IP. See it highlighted in the screenshot:

Request Instance wizard

How can I achieve that check-box functionality with boto?
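As I understand boto 2 (2.9+), that check-box maps to a flag on the instance's primary network interface specification, which you have to describe yourself; a sketch, with `run_with_public_ip` being my own wrapper name:

```python
def run_with_public_ip(conn, ami_id, subnet_id, security_group_ids, **kwargs):
    from boto.ec2.networkinterface import (NetworkInterfaceSpecification,
                                           NetworkInterfaceCollection)
    # The flag lives on the interface, so subnet and groups move here too;
    # don't also pass subnet_id/security_group_ids to run_instances itself.
    interface = NetworkInterfaceSpecification(
        subnet_id=subnet_id,
        groups=security_group_ids,
        associate_public_ip_address=True,
        device_index=0)
    return conn.run_instances(
        ami_id,
        network_interfaces=NetworkInterfaceCollection(interface),
        **kwargs)
```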

Source: (StackOverflow)

Boto EC2: Create an instance with tags

Is there a way with the boto python API to specify tags when creating an instance? I'm trying to avoid having to create an instance, fetch it and then add tags. It would be much easier to have the instance either pre-configured to have certain tags or to specify tags when I execute the following command:

        ec2_conn.run_instances(ami_name, security_groups=[security_group],
                               instance_type=instance_type_name,
                               key_name=key_pair_name,
                               user_data=user_data)
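As far as I know, boto 2's run_instances has no tag argument, so the closest thing is a wrapper that launches and tags from one call site; a hedged sketch (the wrapper name is mine):

```python
def run_instances_with_tags(ec2_conn, ami_name, tags, **run_kwargs):
    # Launch first, then tag every instance in the reservation in one
    # create_tags call; tags is a plain {key: value} dict.
    reservation = ec2_conn.run_instances(ami_name, **run_kwargs)
    ec2_conn.create_tags([i.id for i in reservation.instances], tags)
    return reservation
```

There is still a tiny window where the instance exists untagged, but it avoids the fetch-then-tag round trip in your own code.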

Source: (StackOverflow)

How to change metadata on an object in Amazon S3

If you have already uploaded an object to an Amazon S3 bucket, how do you change the metadata using the API? It is possible to do this in the AWS Management Console, but it is not clear how it could be done programmatically. Specifically, I'm using the boto API in Python, and from reading the source it is clear that key.set_metadata only works before the object is created, as it just affects a local dictionary.
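The usual API-level answer, sketched under the assumption of boto 2 (`update_metadata` is my own name): S3 object metadata is immutable, so you copy the key onto itself with replacement metadata, which happens server-side.

```python
def update_metadata(bucket, key_name, new_metadata):
    key = bucket.lookup(key_name)
    # Key.copy with metadata set performs a server-side copy that REPLACES
    # the metadata; note it replaces all of it, so include anything (like
    # Content-Type) you want to keep. Objects over 5 GB need multipart copy.
    return key.copy(bucket.name, key_name, metadata=new_metadata,
                    preserve_acl=True)
```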

Source: (StackOverflow)

Why are no Amazon S3 authentication handlers ready?

I have my $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY environment variables set properly, and I run this code:

import boto
conn = boto.connect_s3()

and get this error:

boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler']

What's happening? I don't know where to start debugging.

It seems boto isn't taking the values from my environment variables. If I pass in the key id and secret key as arguments to the connection constructor, this works fine.
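A common cause is that the variables are set in the shell but never exported into the process that runs Python, so boto's environment auth handler finds nothing in os.environ. A small guard (my own sketch, not boto API) makes that failure explicit:

```python
import os

def s3_connection():
    # boto only sees variables exported into this process's environment;
    # a shell variable that wasn't `export`ed never reaches os.environ.
    for var in ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'):
        if var not in os.environ:
            raise RuntimeError('%s is not visible to this process' % var)
    import boto
    return boto.connect_s3()
```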

Source: (StackOverflow)

Upload image available at public URL to S3 using boto

I'm working in a Python web environment and I can simply upload a file from the filesystem to S3 using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web (say

Should I somehow download the image to the filesystem, and then upload it to S3 using boto as usual, then delete the image?

What would be ideal is if there is a way to get boto's key.set_contents_from_file or some other command that would accept a URL and nicely stream the image to S3 without having to explicitly download a file copy to my server.

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)
        return "Success?"
    except Exception, e:
        return e

Using set_contents_from_file, as above, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a No such file or directory error. The boto storage documentation leaves off at uploading local files and does not mention uploading files stored remotely.
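A sketch of the workaround I'd expect (an assumption, not from boto's docs): set_contents_from_file wants a seekable object with tell(), which is exactly why the raw URL string fails; reading the response body and using set_contents_from_string sidesteps that. Fine for images; not for huge files, since the whole body is buffered in RAM.

```python
def upload_from_url(bucket, key_name, url):
    import urllib2                 # Python 2, matching the question's code
    from boto.s3.key import Key
    data = urllib2.urlopen(url).read()  # fetch into memory, never to disk
    k = Key(bucket)
    k.key = key_name
    k.set_contents_from_string(data)    # strings don't need tell()/seek()
```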

Source: (StackOverflow)

How can I copy files bigger than 5 GB in Amazon S3?

Amazon S3 REST API documentation says there's a size limit of 5 GB for upload in a PUT operation. Files bigger than that have to be uploaded using multipart. Fine.

However, what I need in essence is to rename files that might be bigger than that. As far as I know there's no rename or move operation, therefore I have to copy the file to the new location and delete the old one. How exactly is that done with files bigger than 5 GB? Do I have to do a multipart upload from the bucket to itself? In that case, how does splitting the file into parts work?

From reading boto's source it doesn't seem like it does anything like this automatically for files bigger than 5 GB. Is there any built-in support that I missed?
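As I understand it you do have to drive this yourself; a sketch using boto 2's MultiPartUpload.copy_part_from_key, which copies byte ranges of the source server-side, so nothing is downloaded or re-uploaded. The 1 GB part size is an arbitrary choice of mine (parts other than the last must be at least 5 MB).

```python
def part_ranges(size, part_size):
    """Split [0, size) into inclusive (start, end) byte ranges."""
    ranges = []
    start = 0
    while start < size:
        end = min(start + part_size, size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

def multipart_copy(dst_bucket, src_bucket_name, src_key_name, dst_key_name,
                   size, part_size=1024 ** 3):
    # "Upload Part - Copy" has no 5 GB cap: each part copies a byte range
    # of the source into the destination upload, entirely on S3's side.
    mp = dst_bucket.initiate_multipart_upload(dst_key_name)
    for part_num, (start, end) in enumerate(part_ranges(size, part_size), 1):
        mp.copy_part_from_key(src_bucket_name, src_key_name, part_num,
                              start, end)
    mp.complete_upload()
```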

Source: (StackOverflow)

git aws.push: No module named boto

I'm trying to follow the tutorial: deploy Django on AWS Elastic Beanstalk

When I'm doing Step 6's substep 5:

git aws.push

I get an ImportError message:

(tryhasinenv)Lee-Jamess-MacBook-Pro:tryhasin h0925473$ git aws.push
Traceback (most recent call last):
  File ".git/AWSDevTools/aws.elasticbeanstalk.push", line 21, in <module>
    from aws.dev_tools import * 
  File "/Users/h0925473/tryhasin_root/tryhasin/.git/AWSDevTools/aws/", line 5, in <module>
    import boto
ImportError: No module named boto

I have no idea what to do. Can somebody tell me what's wrong?

Source: (StackOverflow)

How to store data in GCS while accessing it from GAE and 'GCE' locally

There's a GAE project using GCS to store/retrieve files. These files also need to be read by code that will run on GCE (it needs C++ libraries, and therefore can't run on GAE).

In production, deployed on the actual GAE > GCS < GCE, this setup works fine. However, testing and developing locally is a different story that I'm trying to figure out.

As recommended, I'm running GAE's dev_appserver with GoogleAppEngineCloudStorageClient to access the (simulated) GCS. Files are put in the local blobstore. Great for testing GAE.

Since there is no GCE SDK to run a VM locally, whenever I refer to the local 'GCE' it's just my local development machine running Linux. On the local GCE side I'm just using the default boto library with a Python 2.x runtime to interface with the C++ code and retrieve files from GCS. However, in development, these files are inaccessible from boto because they're stored in the dev_appserver's blobstore.

Is there a way to properly connect the local GAE and GCE to a local GCS?

For now, I gave up on the local GCS part and tried using the real GCS. The GCE part with boto is easy. The GAE part can also use the real GCS instead of the local blobstore by passing an access_token:


According to the docs:

access_token: you can get one by running 'gsutil -d ls' and copying the
  string after 'Bearer'.

That token works for a limited amount of time, so that's not ideal. Is there a way to set a more permanent access_token?
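One longer-lived option (an assumption on my side, not from the question): let `gsutil config` write a refresh token into a `~/.boto` file once; boto then mints fresh access tokens from it automatically instead of you pasting a short-lived Bearer token.

```ini
; ~/.boto, as written by `gsutil config`
[Credentials]
gs_oauth2_refresh_token = <your refresh token>
```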

Source: (StackOverflow)

boto issue with IAM role

I'm trying to use AWS' recently announced "IAM roles for EC2" feature, which lets security credentials automatically get delivered to EC2 instances. (see

I've set up an instance with an IAM role as described. I can also get (seemingly) proper access key / credentials with curl.

However, boto fails to do a simple call like "get_all_buckets", even though I've turned on ALL S3 permissions for the role.

The error I get is "The AWS Access Key Id you provided does not exist in our records"

However, the access key listed in the error matches the one I get from curl.

Here is the failing script, run on an EC2 instance with an IAM role attached that gives all S3 permissions:

import urllib2
import ast
from boto.s3.connection import S3Connection

# fetch the role's temporary credentials from the instance metadata service
url = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'
role = urllib2.urlopen(url).read()  # GET on the base path returns the role name
resp = ast.literal_eval(urllib2.urlopen(url + role).read())

print "access:" + resp['AccessKeyId']
print "secret:" + resp['SecretAccessKey']
conn = S3Connection(resp['AccessKeyId'], resp['SecretAccessKey'])
rs = conn.get_all_buckets()
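For what it's worth, my understanding of the error: temporary role credentials are only valid together with their session token, so passing just the key pair yields exactly this "does not exist in our records" response. boto 2.5.1+ can read all three values from the metadata service itself; a sketch (the function name is mine):

```python
def s3_from_instance_role():
    from boto.s3.connection import S3Connection
    # With no explicit keys, boto falls back to the instance's IAM role and
    # picks up AccessKeyId, SecretAccessKey AND the session token together.
    return S3Connection()
    # Alternatively, pass security_token=resp['Token'] alongside the key pair.
```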

Source: (StackOverflow)

Using Amazon s3 boto library, how can I get the URL of a saved key?

I am saving a key to a bucket with:

    key = bucket.new_key(fileName)
    key.set_metadata('Content-Type', 'image/jpeg')

After the save is successful, how can I access the URL of the newly created file?
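Two sketches of what I'd try, hedged (the helper names are mine): build the plain public-style URL yourself if the ACL allows anonymous reads, or have boto sign a temporary URL for private objects.

```python
def public_url(bucket_name, key_name):
    # Works only if the object/bucket ACL permits anonymous reads.
    return 'https://%s.s3.amazonaws.com/%s' % (bucket_name, key_name)

def signed_url(key, expires_in=3600):
    # boto signs a query-string URL valid for expires_in seconds.
    return key.generate_url(expires_in)
```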

Source: (StackOverflow)

Using django-storages and the s3boto backend, How do I add caching info to request headers for an image so browser will cache image?

I am using the s3boto backend, not the s3 backend.

In the django-storages docs it says to specify the AWS_HEADERS variable on your file:

AWS_HEADERS (optional)

If you’d like to set headers sent with each file of the storage:

# see
AWS_HEADERS = {
    'Expires': 'Thu, 15 Apr 2010 20:00:00 GMT',
    'Cache-Control': 'max-age=86400',
}

This is not working for me.

Here is my model:

class Photo(models.Model):
    """
    docstring for Photo
    represents a single photo.. a photo can have many things associated to it like
    a project, a portfolio, etc...
    """

    def image_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/original/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)

    def thumb_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/thumb/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)

    def medium_upload_to(instance, filename):
        today = datetime.datetime.today()
        return 'user_uploads/%s/%s/%s/%s/%s/%s/medium/%s' % (instance.owner.username, today.year, today.month, today.day, today.hour, today.minute, filename)

    owner = models.ForeignKey(User)
    # take out soon
    projects = models.ManyToManyField('Project', through='Connection', blank=True)
    image = models.ImageField(upload_to=image_upload_to)
    thumb = ThumbnailerImageField(upload_to=thumb_upload_to, resize_source=dict(size=(102,102), crop='center'),)
    medium = ThumbnailerImageField(upload_to=medium_upload_to, resize_source=dict(size=(700,525),))
    title = models.CharField(blank=True, max_length=300)
    caption = models.TextField(blank=True)
    can_view_full_res = models.BooleanField(default=False)
    is_portfolio = models.BooleanField(default=False)
    created_time = models.DateTimeField(blank=False, auto_now_add=True)
    disabled = models.DateTimeField(blank=True, null=True, auto_now_add=False)
    cost = models.FloatField(default=0)
    rating = models.IntegerField(default=0)
    mature_content = models.BooleanField(default=False)
    objects = ViewableManager()

    def get_absolute_url(self):
        return "/m/photo/%i/" % self.id

    def get_prev_by_time(self):
        get_prev = Photo.objects.order_by('-created_time').filter(created_time__lt=self.created_time)
        try:
            return get_prev[0]
        except IndexError:
            return None

    def get_next_by_time(self):
        get_next = Photo.objects.order_by('created_time').filter(created_time__gt=self.created_time)
        try:
            return get_next[0]
        except IndexError:
            return None

    def __unicode__(self):
        return self.title

This is what is on my template where I have the image...

<img class='shadow' src='{{ object.medium.url }}'>

Here are the request and response headers:

Request URL:
Request Method:GET
Status Code:200 OK
Request Headers
GET /user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=s%2ByKsWDxrDJbyeVHd%2BDS3JlByts%3D&Expires=1332529522&AWSAccessKeyId=MY_ACCESS_KEYID HTTP/1.1
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11
Accept: */*
Referer: http://localhost:8000/m/photo/1/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Query String Parameters
Response Headers
HTTP/1.1 200 OK
x-amz-id-2: wOWRRDi5TItAdiYSPf8X4z4I4v5/Zu8XLhwlxmZa8w8w1Jph8WQkenihVJI/ZKnV
x-amz-request-id: THE_X_AMZ_REQUEST_ID
Date: Fri, 23 Mar 2012 18:05:24 GMT
Cache-Control: max-age=86400
Last-Modified: Fri, 23 Mar 2012 09:00:13 GMT
ETag: "6e34e718a349e0bf9e4aefc1afad3d7d"
Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 91600
Server: AmazonS3

When I paste the path to the image into the address bar it WILL cache the image and give me a 304... Here are those request and response headers:

Request URL:
Request Method:GET
Status Code:304 Not Modified
Request Headers
GET /user_uploads/travismillward/2012/3/23/3/0/medium/_0677866898.jpg?Signature=evsDZiw3QGsjPacG4CHn6Ji2dDA%3D&Expires=1332528782&AWSAccessKeyId=MY_ACCESS_KEYID HTTP/1.1
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://localhost:8000/m/photo/1/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
If-None-Match: "6e34e718a349e0bf9e4aefc1afad3d7d"
If-Modified-Since: Fri, 23 Mar 2012 09:00:13 GMT
Query String Parameters
Response Headers
HTTP/1.1 304 Not Modified
x-amz-id-2: LfdHa10SdWnx4UH1rc62NfUDeiNVGRzBX73CR+6GDrXJgv9zo+vyQ9A3HCr1YLVa
x-amz-request-id: THE_X_AMZ_REQUEST_ID
Date: Fri, 23 Mar 2012 18:01:16 GMT
Last-Modified: Fri, 23 Mar 2012 09:00:13 GMT
ETag: "6e34e718a349e0bf9e4aefc1afad3d7d"
Server: AmazonS3

Source: (StackOverflow)