
Using NGINX’s X-Accel With Remote URLs

Ewen explains how to tackle serving up remote files from an S3 bucket to permitted users via a Django app using NGINX.

We’ve recently been working on a project here at Media Suite using Django served up by NGINX. We have files stored remotely in an S3 bucket and need end users to be able to download them after verifying they have permission. The metadata for the files is stored in the Django database.

I felt like this was a reasonably common problem, but it took a bit of configuring to get it working in a way I was happy with, so I thought I’d share in the hope that others find it useful.

The solution below assumes Django, but it should be easily adaptable to any other server backend.

Options

A direct link to the S3 bucket object wasn’t possible, as we need to check the user’s permissions within Django first. A direct link is also a problem if you would like to perform any other actions (such as logging the download).

As I saw it, there were three options:

  • Generate a pre-signed S3 object URL that expires within a short time and redirect the user once you’ve checked permissions etc. (sketched just below this list).
  • Download the file to your server in the background first, and then forward it onto them.
  • Proxy the download.
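
For reference, generating a pre-signed URL (the first option) is straightforward with boto3. A minimal sketch, assuming boto3 is installed and configured with credentials – the bucket and key names are placeholders:

import boto3

s3 = boto3.client('s3')
# Pre-signed GET URL that expires after 60 seconds (placeholder bucket/key).
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'somefile.png'},
    ExpiresIn=60,
)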

Originally I tried using the pre-signed object URL, but you lose control over the exact headers sent back to the user. S3 does allow you to configure some things here but, as the metadata was stored within the Django database, it wasn’t going to work for us. Specifically, I wanted to use the Content-Disposition header to set the filename, as well as to force the browser to download the file (rather than opening it directly, depending on the mime-type). It also means the link is exposed to the user, which could be a security risk if the URL expiry isn’t short enough.

Downloading the file in the background and then sending it to the user would have worked pretty well, and we would have had complete control over the response headers. However, it requires Django to be involved in transferring potentially big files (which isn’t ideal). It may also require temporarily storing the file locally (streaming it through would likely be possible, but I didn’t explore this option).
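
For illustration, a streamed version of that fallback might look something like the following. This is an untested sketch, assuming the requests library; the url and file_name values are hypothetical stand-ins for a database lookup:

import requests
from django.http import StreamingHttpResponse

def download(request, id):
    # Hypothetical values: in practice these would be looked up from the
    # database after checking permissions (as in the full example below).
    url = 'https://example.com/somefile.png'
    file_name = 'somefile.png'
    remote = requests.get(url, stream=True)
    # Stream the remote body through in chunks rather than buffering it all.
    response = StreamingHttpResponse(
        remote.iter_content(chunk_size=8192),
        content_type=remote.headers.get('Content-Type', 'application/octet-stream'),
    )
    response['Content-Disposition'] = 'attachment; filename="{}"'.format(file_name)
    return response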

The above option was my fallback position; however, I was keen to see if there was a way to proxy the downloads through NGINX.

X-Accel

NGINX has a mechanism (called X-Accel) to hand off a direct download, which works well for local files. From their documentation:

X-accel allows for internal redirection to a location determined by a header returned from a backend.

This allows you to handle authentication, logging or whatever else you please in your backend, and then have NGINX serve the contents from the redirected location to the end user, freeing up the backend to handle other requests. This feature is commonly known as X-Sendfile.

In a nutshell, this allows the backend to hand off the actual transmission of the file to NGINX, while still getting an initial hook for protecting it or logging the download. It works by the backend – Django in our case – returning the location in a response header; NGINX intercepts that and serves the file.

Here is the simplest example, which would serve up /var/www/files/somefile.png.

Django view:

from django.http import HttpResponse

def simple_download(request):
    response = HttpResponse()
    # Hand the file off to NGINX via an internal redirect.
    response['X-Accel-Redirect'] = '/file_download/somefile.png'
    return response

NGINX configuration:

location /file_download {
    internal;
    alias /var/www/files;
}

This works well for local files, or where you’re doing a simple proxy_pass to another server, but arbitrary remote URLs require something a bit different.

Using X-Accel for Remote URLs

Using this excellent post as a starting point, I found a way to pass the full URL to NGINX. The approach works by passing the protocol, host, and path of the URL through the X-Accel-Redirect header. The parts are then rebuilt by NGINX and used with proxy_pass.

Note: Just to be clear, this code assumes the remote_url in the File model is not one that a user would ever be able to set. Otherwise, any random file on the internet could appear to come from your site. In our actual code we use the AWS django-storages backend to create a pre-signed S3 object URL with a short expiry, and only store the S3 reference in the File model.
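
A rough sketch of that setup, assuming django-storages’ S3Boto3Storage backend with query-string signing enabled (the document field name here is hypothetical):

# settings.py (assumed django-storages configuration)
AWS_QUERYSTRING_AUTH = True    # sign generated URLs (this is the default)
AWS_QUERYSTRING_EXPIRE = 60    # expiry of the signed URL, in seconds

# In the view: with the S3 backend, .url returns a pre-signed URL.
url = file.document.url  # 'document' is a hypothetical FileField on File

Back to the proxying approach – here is the full Django view: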

from urllib.parse import urlparse

from django.http import HttpResponse
from django.shortcuts import get_object_or_404

from .models import File


def download(request, id):
    file = get_object_or_404(File, id=id)
    # Check permissions, do logging, whatever.
    url = file.remote_url
    file_name = file.file_name
    protocol = urlparse(url).scheme
    # Let NGINX handle it
    response = HttpResponse()
    response['X-Accel-Redirect'] = '/file_download/' + protocol + '/' + url.replace(protocol + '://', '')
    response['Content-Disposition'] = 'attachment; filename="{}"'.format(file_name)
    return response

And the corresponding NGINX configuration:

location ~ ^/file_download/(.*?)/(.*?)/(.*) {
    # Only allow internal redirects
    internal;

    # How to resolve remote URLs; you may want to update this depending
    # on your setup. In our case it's inside a Docker container with
    # dnsmasq running.
    resolver 127.0.0.1 ipv6=off;

    # Extract the remote URL parts
    set $download_protocol $1;
    set $download_host $2;
    set $download_path $3;

    # Reconstruct the remote URL
    set $download_url $download_protocol://$download_host/$download_path;

    # Headers for the remote server; unset Authorization and Cookie for security reasons.
    proxy_set_header Host $download_host;
    proxy_set_header Authorization '';
    proxy_set_header Cookie '';

    # Headers for the response. By using $upstream_http_... here we can inject
    # other headers from Django; proxy_hide_header ensures the header from the
    # remote server isn't passed through.
    proxy_hide_header Content-Disposition;
    add_header Content-Disposition $upstream_http_content_disposition;

    # Stops the local disk from being written to (just forwards data through)
    proxy_max_temp_file_size 0;

    # Proxy the remote file through to the client
    proxy_pass $download_url$is_args$args;
}

This works well. We were able to run our permission and logging code, as well as set a Content-Disposition response header, from our Django code. At this point I was feeling pretty happy with myself, but there is a problem if the remote server returns a redirect.

Handling Remote Redirects

If the remote server returns a redirect, it is passed straight through to the user (and the user’s browser redirects directly to the remote resource) rather than being handled by NGINX. S3 will occasionally return redirects, especially if you haven’t specified the correct AWS Region.

Thanks to Stack Overflow I found a solution to this:

location ~ ^/file_download/(.*?)/(.*?)/(.*) {
    # Only allow internal redirects
    internal;

    # ... as above

    # Proxy the remote file through to the client
    proxy_pass $download_url$is_args$args;

    # Intercept redirects from the remote server and handle them internally
    proxy_intercept_errors on;
    error_page 301 302 307 = @handle_redirect;
}

location @handle_redirect {
    resolver 127.0.0.1 ipv6=off;

    # Follow the redirect by proxying the remote server's Location header
    set $saved_redirect_location '$upstream_http_location';
    proxy_pass $saved_redirect_location;
}

By handling the relevant 3xx HTTP codes, we can deal with the redirects within NGINX.

Summary

So there we have it: a mechanism for directly proxying remote URLs. I was pretty happy with it in the end, although it took a fair amount of work to get it configured correctly. I found using Wireshark to check the exact headers really useful, as I initially misunderstood some of the NGINX proxy_ directives.

If you want to see a simple app using this, check out the repository. You can run it in a Docker container and try it out.

Banner image: Photo by Erol Ahmed on Unsplash 
