We'll soon have to set up a Django web application that handles file uploads for a competition; the submission period will run for about two weeks. Some of the media may be as large as 200 MB, because we'll also be accepting video submissions. I've personally never had to deal with managing large uploads, but I'd like to configure the app so it's ready for the load during the submission period.
AFAIK, if I use one of the stock Django application setups WebFaction provides, every request gets forwarded to Django; Django streams the incoming chunks to a temporary file, and while this is happening an Apache thread is kept busy. I feel a bit queasy having expensive threads handle simple uploads. I don't have a complete understanding of how Apache's processes and threads work: specifically, how a fixed pool of child processes and threads on a single instance handles a mix of requests, some needing no special handler and others requiring expensive mod_wsgi or mod_python handlers. My current understanding is that if an Apache instance has requests that must be handled via mod_wsgi, a Python interpreter gets spawned in each process (adding to the memory footprint all the modules that eventually get loaded, plus application data); perhaps not immediately, but certainly the first time a request destined for my Django application lands.
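For what it's worth, the usual way to keep the interpreter footprint predictable is mod_wsgi's daemon mode, which runs Django in a fixed set of dedicated processes instead of inside every Apache worker. A sketch, assuming a standard mod_wsgi setup; the names and paths below are placeholders, not WebFaction-specific values:

```apache
# Hypothetical sketch: run Django in mod_wsgi daemon mode so a fixed,
# known number of Python processes exists, separate from Apache's
# general request workers. "myapp" and the paths are placeholders.
WSGIDaemonProcess myapp processes=2 threads=15 maximum-requests=1000
WSGIProcessGroup myapp
WSGIScriptAlias / /home/myuser/webapps/myapp/wsgi.py

# Static and media files are served by Apache directly and never
# touch the Python interpreter.
Alias /media/ /home/myuser/webapps/myapp/media/
```

With this layout, only the two daemon processes carry the interpreter and module memory cost, no matter how many Apache workers exist.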
From what I've figured out, if I want to go green and have specialized, inexpensive threads handle file uploads, I'd have to set up a separate web server instance that does just that and also acts as a fast proxy in front of my Apache instance serving Django. That way I can control and balance between requests that handle uploads and those that serve my dynamic pages.
So I've been looking into running a custom nginx instance with the nginx upload module. I'm assuming the upload module doesn't add much overhead, which means nginx can handle the uploads cheaply and notify my Django instance of the uploaded file paths once the files have been fully received. The flip side is that I have no idea how to distinguish uploads originating from authenticated users, so I run the risk of anyone being able to generate lots of garbage upload requests if they chose to.
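To make the setup concrete, here is a sketch of how the upload module is typically wired up, using its documented directives; the paths, port, and field names are placeholder assumptions:

```nginx
# Hypothetical sketch: nginx receives the upload body, writes it to disk,
# then makes an internal callback request to Django with the file path.
location /upload/ {
    upload_pass /upload/done/;           # internal request sent once the file is complete
    upload_store /home/myuser/tmp/uploads 1;
    upload_set_form_field $upload_field_name.path "$upload_tmp_path";
    upload_pass_form_field "^csrfmiddlewaretoken$|^title$";
    upload_cleanup 400 404 499 500-505;  # delete temp files on failed callbacks
}

location /upload/done/ {
    proxy_pass http://127.0.0.1:8000;    # the Apache/mod_wsgi instance serving Django
}
```

The Django view behind `/upload/done/` never sees the file body, only a small form post containing the temp-file path, so the expensive Python thread is tied up for milliseconds rather than for the whole transfer.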
My question is... did I get everything right, and am I overdoing it? Perhaps I'm missing something and my existing Apache instance can be tuned to handle uploads cheaply, close to what I'd get with the nginx reverse proxy. And if a reverse proxy, be it a separate nginx or httpd instance, really is the only way, how can I validate incoming upload requests to make sure they come from an authenticated Django user?
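One common pattern for exactly this (a sketch, not anything WebFaction- or nginx-specific): have Django issue a short-lived HMAC-signed token to the authenticated user, embed it in the upload form, and verify it in the lightweight Django view that receives the upload callback, after the file body has already been handled by the front server. All names below are hypothetical:

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-your-django-SECRET_KEY"  # placeholder secret

def make_upload_token(username, ttl=3600):
    """Issue a token tied to a user that expires after `ttl` seconds."""
    expires = str(int(time.time()) + ttl)
    sig = hmac.new(SECRET_KEY, ("%s:%s" % (username, expires)).encode(),
                   hashlib.sha256).hexdigest()
    return "%s:%s:%s" % (username, expires, sig)

def check_upload_token(token):
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, ("%s:%s" % (username, expires)).encode(),
                        hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected) and int(expires) > time.time():
        return username
    return None
```

A forged or tampered token fails verification, so anonymous garbage uploads can be rejected in the callback view and their temp files deleted, even though the transfer itself never touched Python.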
Thanks a lot and sorry for the long-winded post!
This question is marked "community wiki".
You can't authenticate your users without firing up a Python interpreter. So if you are trying to avoid starting a Python interpreter by having the uploads handled by Nginx, you give up the user check.
I would suggest trying some test uploads directly through your Django application first. If a problem actually shows up, then try to fix it.
answered Nov 18 '10 at 05:52
The upload form I construct using Django is just going to have