hatestheinternet

I has a bucket

Now that we've got everything set up and our VPS is running a little better, it's time to throw in the last little bit of faster. See, right now, we're still on the hook for serving our content assets. Sure, our server's on a good connection, but what could get these tubby beasts out there faster than serving them right out of S3? Exactly.

Most of the posts I read while researching this project involved using modules or arcane shell scripts to push and rsync and do all manner of things that would have been perfectly acceptable in the past, but this is the future and we have FUSE (Filesystem in Userspace). This means, basically, that we're going to mount that S3 bucket directly in our server's file system and it'll merrily go about its business without being any the wiser.

Before this can happen we obviously need a bucket to mount, so log in to your AWS account and create one, making sure to enable Static Website Hosting. Also remember that, since S3 is going to be handling the requests itself, it wants our bucket's name to match the hostname we're using; otherwise it's not going to work.
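If you'd rather skip the console clicking, the AWS CLI can do both of those steps too. A quick sketch, assuming your content hostname is content.yourdomain.com (swap in your own) and the CLI already has credentials configured:

aws s3 mb s3://content.yourdomain.com
aws s3 website s3://content.yourdomain.com/ --index-document index.html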

After you've done this, the initial population is pretty easy: You can wait until we're done to mount and cp -a your content, or transfer it using something like CrossFTP. Either way, since this is also going to be accessible through our base Apache host, you're going to want to keep the .htaccess Drupal drops in the files/ directory.
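If you go the cp -a route, the copy just happens through the mount once it's up. A rough sketch, using the same paths as the mount example further down and assuming you stashed the original content in a hypothetical files.local/ directory first:

cp -a /var/www/html/sites/yourdomain.com/files.local/. \
    /var/www/html/sites/yourdomain.com/files/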

Now that we has our bucket, let's get FUSE going. Obviously, you'll need to check that your server supports it, but that's pretty easy: Does /dev/fuse exist? Sometimes it's not active by default, so you'll need to dig through the management side of things to turn it on, but it's most likely there somewhere. You should probably also make a note of the permissions, since these tend to vary wildly as well.
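Checking is literally a one-liner, and the permissions on the device tell you who's allowed to play with it:

ls -l /dev/fuse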

If you jumped ahead of me and started googling what we're trying to do, odds are you've discovered s3fs. My advice to you is, unless you're a complete and utter masochist, forget you ever heard about s3fs: It's slow, it's unreliable, and it's just generally a pain in the ass. Instead, look for something called riofs (or, for our purposes here, my fork of riofs).

Rio is much faster, considerably more reliable, and although it doesn't support fstab very well, I consider it fire-and-forget: I start it in /etc/rc.local and the mount has never disappeared or hung or given me anything even remotely resembling a problem. The main difference between my fork and upstream is that, instead of libmagic, I check the file's extension against /etc/mime.types. This gives us finer-grained control, as the Content-Type set here is what S3's static web server sends with its response.
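If you're curious where those Content-Types come from, /etc/mime.types is just a plain text map of type to extensions, so you can see exactly what any given upload will be tagged as (output varies a bit by distro):

grep -w jpg /etc/mime.types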

After checking to make sure you have FUSE's development headers and libevent installed, check out my fork and simply:

sh autogen.sh
./configure --with-mimetypes --without-libmagic && \
    make && \
    make install

If you had to manually install libevent and configure complains it's missing, point PKG_CONFIG_PATH at /usr/local/lib/pkgconfig before ./configure in the command above and you should be golden. You may need to tinker around to get the right development packages installed, but you really don't need that many.
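For the record, that just means prefixing the variable onto the configure line, something like this (assuming libevent landed under /usr/local):

PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./configure --with-mimetypes --without-libmagic && \
    make && \
    make install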

Once you're up and running, open up your favourite text editor and check out /usr/local/etc/riofs.conf.xml, specifically the UID/GID children of the filesystem node. These should match the user and group your Apache server runs as if you're going to be running rio as somebody else (like root).
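If you're not sure what those numbers should be, just ask the system. The account name varies by distro (www-data on Debian and friends, apache on the Red Hat side), so adjust accordingly:

id www-data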

Also, if you look under the S3 node, you'll find another one of my additions: cache_control. The value of this string is set as the Cache-Control header on objects we write through the mount, since S3's web server won't add one on its own. You should be fine with public, max-age=31536000 here.

After you've pasted the AWS key and secret for the IAM user you made specifically for this (right... Right?), it's time to give it a test:

riofs -o nonempty,allow_other \
    content.yourdomain.com \
    /var/www/html/sites/yourdomain.com/files

The nonempty option tells riofs not to care if there's already something in that directory (and since my site repo has the default files/ .htaccess, there will always be something there), while allow_other tells FUSE to let other users make use of this mount, which is useful since I start it from /etc/rc.local and it runs as root.
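For the curious, my rc.local entry is nothing fancier than the test command above with a full path on the binary, dropped in before the exit 0 line if your distro's rc.local has one. A sketch, assuming the default /usr/local install prefix:

/usr/local/bin/riofs -o nonempty,allow_other \
    content.yourdomain.com \
    /var/www/html/sites/yourdomain.com/files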

After making a quick stop by your DNS zone file to CNAME your content host's entry to the bucket's website endpoint (which you can find through the AWS console), you should now be able to see and access the contents of the S3 bucket both over the web and via your local file system.
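A quick sanity check on both fronts never hurts. The hostname and cat.jpg are just placeholders from the examples above, and the Cache-Control line will only show up on objects that went in through rio:

df -h /var/www/html/sites/yourdomain.com/files
dig +short content.yourdomain.com
curl -sI http://content.yourdomain.com/cat.jpg | egrep -i 'cache-control|content-type'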

If you're running Drupal, here's some magic you might also want to put in the redirection rules found under Static Website Hosting in the bucket's properties to make your image renditions work again:

<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>styles/</KeyPrefixEquals>
      <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
    </Condition>
    <Redirect>
      <HostName>yourdomain.com</HostName>
      <ReplaceKeyPrefixWith>sites/yourdomain.com/files/styles/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>307</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>

TL;DR: Any object whose key starts with styles/ that S3 can't find gets bounced back to Drupal so it can generate the derivative.
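You can watch the rule do its thing with curl: ask the content host for a derivative that doesn't exist yet and you should get the 307 pointing back at your Drupal host (hostnames and the thumbnail style are placeholders again):

curl -sI http://content.yourdomain.com/styles/thumbnail/cat.jpg | egrep -i '^(http|location)'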

Also, you'll probably need to add the following snippet to your bucket policy to make static website hosting work at all:

{
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": {
            "AWS": "*"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::content.yourdomain.com/*"
    }]
}
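If you're already living in a terminal, the same policy can go in through the CLI: save the JSON above to a file (bucket-policy.json is just a name I picked) and feed it to s3api:

aws s3api put-bucket-policy \
    --bucket content.yourdomain.com \
    --policy file://bucket-policy.json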