Mount S3 Buckets On Ubuntu Server With S3QL

Ubuntu Senary Drive icon by Milos Mirkovic

This post will show you how to mount an S3 bucket on your Ubuntu Server for unlimited pay-as-you-go storage that works just like a locally mounted drive. There are a handful of open source projects out there for mounting S3 buckets on Linux. Here we’re highlighting the S3QL project created by Nikolaus Rath.

At the time of writing S3QL is the most actively developed project of the bunch, with code commits as recent as the last few days. It’s easy to install and set up, has great documentation and, even though it’s beta software, comes with a solid set of features including data de-duplication, dynamic sizing, encryption and a focus on performance.

Database icon by Barry Mieny

Before getting started it’s helpful to understand a bit about how S3QL stores files. It works by taking the contents of your files and splitting them up into individual blocks. Each block is then stored inside your S3 bucket. Internally S3QL uses a SQLite database to keep track of how everything is stored so your data can be retrieved quickly and efficiently. Common operations like renaming, moving and copying files don’t even hit the network because they’re all cataloged in the database.

S3QL includes built-in support for data de-duplication. As new blocks of a file are created it checks whether any of them are identical to existing blocks it’s keeping track of in its database. If they match, it simply links the new block to the existing block on S3, so it never sends duplicate data over the network. Similarly, when files are changed it only transports the individual blocks that changed, which is a big performance win over re-sending entire large files when only a few small pieces have changed.
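
For instance, once you have a bucket mounted (covered later in this post) you can watch de-duplication work by copying the same file in twice and checking the file system statistics. A quick sketch, assuming a hypothetical backup.tar and the /mnt/cloud-drive mount point used below:

cp ~/backup.tar /mnt/cloud-drive/copy1.tar
cp ~/backup.tar /mnt/cloud-drive/copy2.tar
s3qlstat /mnt/cloud-drive

The s3qlstat output reports the total data size alongside the size after de-duplication; the second copy should add almost nothing to the de-duplicated total.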

While there are some great benefits to the approach S3QL takes, there are also tradeoffs. Because S3QL splits everything into blocks it tracks with a database, it isn’t designed for sharing buckets with other S3 clients. Accessing an S3QL-managed bucket with an S3 client like the AWS Management Console or tools like ForkLift and ExpanDrive will only reveal the individual blocks; only S3QL knows how to re-assemble those blocks back into your files. Along the same lines, buckets mounted with S3QL are designed to be mounted in one place at a time, so it’s not a solution for sharing data between two servers using S3QL simultaneously. If sharing is a must-have feature, consider the Dropbox approach instead.
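
For example, pointing a generic client such as s3cmd at an S3QL-managed bucket (a sketch; the exact object names are internal to S3QL) shows only opaque objects like s3ql_metadata, s3ql_passphrase and a long run of s3ql_data_* blocks rather than your actual files:

s3cmd ls s3://your-bucket-name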

With that in mind, S3QL is a great solution provided your goals are in line with its features. Let’s start mounting S3 buckets!

Installing S3QL

S3QL packages are available for Ubuntu 10.04 (Lucid) and newer.

Install python-software-properties

sudo aptitude install python-software-properties

Add S3QL package repositories

sudo add-apt-repository ppa:nikratio/s3ql
sudo add-apt-repository ppa:ubuntu-rogerbinns/apsw

Note: as Nikolaus points out in the comments below, the ppa:ubuntu-rogerbinns/apsw PPA is incompatible with recent S3QL versions and must not be added in that case.

Update aptitude and install s3ql

sudo aptitude update
sudo aptitude install s3ql

Set Up Your Authinfo File

Before you have S3QL create a new bucket, it’s a good idea to decide on a bucket name and set up an authinfo file that it’ll use to pick up your AWS Access Key ID, AWS Secret Access Key and a password you set to encrypt your data. The authinfo file isn’t required, but it will save you some typing when you’re managing your buckets.

Create the .s3ql directory and authinfo file

mkdir ~/.s3ql
vi ~/.s3ql/authinfo

Add these two lines to the authinfo file

backend s3 machine any login AWSAccessKeyID password AWSSecretAccessKey
storage-url s3://your-bucket-name password yourEncryptionPassword

Be sure to replace AWSAccessKeyID, AWSSecretAccessKey, your-bucket-name and yourEncryptionPassword in the lines above. Log in to your AWS account to grab your Access Key ID and Secret Access Key. You can set the encryption password to anything you want (aim for at least 10 characters).
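
Note: S3QL 1.5 and newer renamed this file to authinfo2 and switched to an INI-style format with section headers (see Ray Lance’s and mp3foley’s comments at the end of this post, plus the S3QL documentation). A rough equivalent for those versions, using the same placeholders (the section names are arbitrary), would be:

[s3]
storage-url: s3://
backend-login: AWSAccessKeyID
backend-password: AWSSecretAccessKey

[bucket]
storage-url: s3://your-bucket-name
fs-passphrase: yourEncryptionPassword

Save it as ~/.s3ql/authinfo2 and give it the same permissions shown next.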

Set secure permissions on the authinfo file

chmod 600 ~/.s3ql/authinfo

Create a New File System and S3 Bucket

Use mkfs.s3ql to create a new S3QL file system and bucket

mkfs.s3ql s3://your-bucket-name --s3-location us-west-1

Depending upon where you’re located you may want to change the S3 location from the default EU to something closer to you. Here we’re setting it to us-west-1. You can read more about the different options available in the mkfs.s3ql documentation.

Mount Your S3 Bucket

The next step is to create the mount point for your bucket and make sure the user you’re logged in as has the correct permissions to access it.

Set up /mnt/cloud-drive as the mount point and set appropriate ownership (replace username with your login)

sudo mkdir /mnt/cloud-drive
sudo chown username /mnt/cloud-drive

Now you’re ready to mount your bucket to the new mount point!

Use mount.s3ql to mount your bucket

mount.s3ql --cachesize 204800 s3://your-bucket-name /mnt/cloud-drive

The --cachesize 204800 option sets the cache size to 200MB instead of the 100MB default. You can read more about the different options available in the mount.s3ql documentation.
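
mount.s3ql exposes other tuning flags as well. For instance, if CPU time is cheaper for you than bandwidth, you can choose the compression algorithm applied to blocks before upload. A sketch based on the mount.s3ql documentation of this era (lzma is the default):

mount.s3ql --cachesize 204800 --compress bzip2 s3://your-bucket-name /mnt/cloud-drive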

Now that your bucket is mounted you can begin using it to store data! If you run into errors you can always run fsck.s3ql to help resolve them.
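
For example, after an unclean shutdown you would run it against the storage URL while the bucket is un-mounted:

fsck.s3ql s3://your-bucket-name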

Un-mounting is just as easy as mounting.

Use umount.s3ql to un-mount the bucket

umount.s3ql /mnt/cloud-drive
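
Note that umount.s3ql blocks until all cached data has been written out to S3, so it can take a while after heavy writes. Per the umount.s3ql documentation, the --lazy flag detaches the mount point immediately and finishes the upload in the background:

umount.s3ql --lazy /mnt/cloud-drive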

Mount Your Bucket Automatically On Boot

Nikolaus created a handy Upstart script to automatically mount your bucket on boot.

description	"S3QL Backup File System"
author		"Nikolaus Rath <Nikolaus@rath.org>"
 
# This assumes that eth0 provides your internet connection
start on (filesystem and net-device-up IFACE=eth0)
stop on runlevel [016]
 
# Fill in your bucket name and mount point
env BUCKET="s3://your-bucket-name"
env MOUNTPOINT="/mnt/your-mount-point"
 
expect stop
 
script
    # Redirect stdout and stderr into the system log
    DIR=$(mktemp -d)
    mkfifo "$DIR/LOG_FIFO"
    logger -t s3ql -p local0.info < "$DIR/LOG_FIFO" &
    exec > "$DIR/LOG_FIFO"
    exec 2>&1
    rm -rf "$DIR"
 
    # Check and mount file system
    fsck.s3ql --batch "$BUCKET"
    exec mount.s3ql --upstart --allow-other "$BUCKET" "$MOUNTPOINT"
end script
 
pre-stop script
    umount.s3ql "$MOUNTPOINT"
end script

To use this script create a new file in your /etc/init directory.

sudo vi /etc/init/s3qlmount.conf

Copy the contents of the script to the new file. Then make sure to fill in your bucket name, mount point and network interface (eth0, eth1, etc.) with the correct values.

You’ll also need to make sure you’ve set up your authinfo file as described above so S3QL can authenticate correctly. Finally, you’ll need to copy the .s3ql folder so it’s also under /root/, which is where the startup script will look for the authinfo file.

Put a copy of the .s3ql directory under /root/ and adjust permissions

sudo cp -R ~/.s3ql /root/
sudo chown -R root:root /root/.s3ql

When it’s all set up, your S3 bucket will be mounted for you automatically the next time your system boots!
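
After a reboot you can verify that everything came up, assuming the s3qlmount job name and mount point used above:

df -h /mnt/your-mount-point
sudo status s3qlmount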

That should be enough to get you started! Big thanks to Nikolaus Rath for sharing his software with the world! For more info, check out the official documentation, FAQ and Google Group.

  • http://pulse.yahoo.com/_2JKCOV26NQSZCXZ2PTKYPHBN7Y awakenedlion

    If the server crashes can the bucket be mounted to another server or the encryption can only be decrypted on the server that wrote the files to the bucket?

    Also when uploading large files from a remote location, does it have to be uploaded to the local server and then the local server directing it to S3 or it will directly upload to the S3 bucket? The reason I am asking is to know how bandwidth is being used on the local server side. I am trying to do this to free up bandwidth from remote uploads.

  • http://rawberg.com/blog David Feinberg

    I haven’t tested the first scenario so I can’t advise you there. Regarding the second question, yes bandwidth will be used when the server uploads data to S3. I hope you find a good solution for the use case you’re trying to support!

  • Ray Lance

    Where can I find out more about the ver 1.5 switch to authinfo2 and the apparently associated MissingSectionHeaderError: File contains no section headers?

  • http://www.omerp.net/2011/12/06/install-dropbox-on-ubuntu-server-10-11/ Install Dropbox On Ubuntu Server (10 & 11) | Open Mind

    [...] on your headless Ubuntu Server and link it up to your Dropbox account. Unlike the process of mounting an S3 bucket we looked at before the Dropbox approach is a much better solution for sharing files. If you’re [...]

  • Rick

    I can't seem to get the authinfo file to work properly on Ubuntu 10.04. When I run "mount.s3ql --cachesize 204800 s3://your-bucket-name /mnt/cloud-drive" it asks me to enter the backend login info. When I enter that info, the drive mounts and I can send items to s3 no problem. However, when I apply the .conf script the drive fails to mount and I have this error in the fsck log file:

    2012-01-15 21:32:13.909 [2008] MainThread: [root] Uncaught top-level exception. Traceback (most recent call last):
    File '/usr/bin/fsck.s3ql', line 9, in ()
    load_entry_point('s3ql==1.8.1', 'console_scripts', 'fsck.s3ql')()
    File '/usr/lib/pymodules/python2.6/s3ql/fsck.py', line 1065, in main(args=['--batch', 's3://myriad_backup'])
    bucket = get_bucket(options)
    File '/usr/lib/pymodules/python2.6/s3ql/backends/common.py', line 1094, in get_bucket(options=Namespace(authfile='/root/.s3ql/authinfo2', batc…>, quiet=False, storage_url='s3://myriad_backup'), plain=False)
    return get_bucket_factory(options, plain)()
    File '/usr/lib/pymodules/python2.6/s3ql/backends/common.py', line 1122, in get_bucket_factory(options=Namespace(authfile='/root/.s3ql/authinfo2', batc…>, quiet=False, storage_url='s3://myriad_backup'), plain=False)
    config.read(options.authfile)
    File '/usr/lib/python2.6/ConfigParser.py', line 286, in read(self=, filenames=['/root/.s3ql/authinfo2'])
    self._read(fp, filename)
    File '/usr/lib/python2.6/ConfigParser.py', line 482, in _read(self=, fp=, fpname='/root/.s3ql/authinfo2')
    raise MissingSectionHeaderError(fpname, lineno, line)

    Exception: MissingSectionHeaderError: File contains no section headers.
    file: /root/.s3ql/authinfo2, line: 1
    'backend s3 machine any login XXXXXXX password XXXXXXX\n'

    I have substituted XXX's for password & secret word. I also created an "authinfo2" file as well as an "authinfo". Any idea what I might be doing wrong??

  • mp3foley

    I just had this problem also, looks like the format of the authinfo file has changed and is now called authinfo2.
    http://www.rath.org/s3ql-docs/authinfo.html

  • http://rawberg.com/blog David Feinberg

    Thanks for pointing out that update mp3foley!

  • Nikratio

    Please note that the ppa:ubuntu-rogerbinns/apsw ppa is actually incompatible with recent S3QL versions and must *not* be added.

  • Nikratio

    The bucket can be mounted on another server.

    An S3QL bucket appears as a local file system on the computer that mounted it. Uploading directly from a remote location into the bucket is therefore not possible.

  • Anonymous

    Well written article. Clean and Clear….Thanks.

  • Kampret

    how to use it after through the setup ? in my command line I do cd /mnt/s3 but the content only lost+found file. what is that mean ? thanks