Amazon S3 Extension

The Amazon S3 extension writes data to Amazon S3 cloud storage rather than to physical media. It is intended to replace the store action, but you can also use it alongside the store action if you'd prefer to back up your data in more than one place. This extension must be run after the stage action.

The underlying functionality relies on the AWS CLI toolset. Before you use this extension, you need to set up your Amazon S3 account and configure the AWS CLI as detailed in Amazon's setup guide. The extension assumes that the backup is being executed as root, and switches over to the configured backup user to run the aws program. So, make sure you configure the AWS CLI tools as the backup user and not as root. (This is different from the amazons3 sync tool, which executes AWS CLI commands as the same user that is running the tool.)
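For instance, assuming the backup user is named backup and sudo is available (both assumptions for the sake of the example), you could configure the AWS CLI credentials for that user by running this as root:

sudo -u backup aws configure

The aws configure command prompts for the access key, secret key, and default region, and stores them under the backup user's home directory.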

You can use whichever Amazon-supported authentication mechanism you would like when setting up connectivity for the AWS CLI. It's best to set up a separate user in the IAM Console rather than using your main administrative user.

You probably want to lock down this user so that it can take only backup-related actions in the AWS infrastructure. One option is to apply the AmazonS3FullAccess policy, which grants full access to the S3 infrastructure. If you would like to lock down the user even further, this appears to be the minimum set of permissions required for Cedar Backup, written as a JSON policy statement:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListObjects",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::my-bucket",
                "arn:aws:s3:::my-bucket/*"
            ]
        }
    ]
}

In the Resource section, be sure to list the name of your S3 bucket instead of my-bucket.
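One way to attach a policy like this is with the AWS CLI itself. As a sketch, assuming the policy above has been saved to policy.json and your IAM user is named backup (the user and policy names here are illustrative):

aws iam put-user-policy --user-name backup --policy-name CedarBackupS3 --policy-document file://policy.json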

When using physical media via the standard store action, there is an implicit limit to the size of a backup, since a backup must fit on a single disc. Because Amazon S3 involves no physical media, no such limit exists for Amazon S3 backups. This leaves open the possibility that Cedar Backup might construct an unexpectedly large backup that the administrator is not aware of. Over time, this might become expensive, either in terms of network bandwidth or in terms of Amazon S3 storage and I/O charges. To mitigate this risk, set a reasonable maximum size using the configuration elements shown below. If the backup fails because it exceeds the limit, you have a chance to review what made the backup larger than you expected, and you can either correct the problem (e.g. remove a large temporary directory that got inadvertently included in the backup) or change the configuration to account for the new "normal" maximum size.
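To illustrate, a sketch of a size-limited configuration might look like this (the limit values are illustrative; the elements themselves are documented in detail below):

<amazons3>
   <s3_bucket>example.com-backup/staging</s3_bucket>
   <full_size_limit>10 GB</full_size_limit>
   <incr_size_limit>2 GB</incr_size_limit>
</amazons3>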

You can optionally configure Cedar Backup to encrypt data before sending it to S3. To do that, provide a complete command line using the ${input} and ${output} variables to represent the original input file and the encrypted output file. This command will be executed as the backup user.

For instance, you can use something like this with GPG:

/usr/bin/gpg -c --no-use-agent --batch --yes --passphrase-file /home/backup/.passphrase -o ${output} ${input}
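Because this command will eventually run as the backup user, it can be worth testing it in that context first. As a rough sketch, assuming the backup user is named backup and the passphrase file above is in place, you could encrypt a small test file like this:

sudo -u backup /usr/bin/gpg -c --no-use-agent --batch --yes --passphrase-file /home/backup/.passphrase -o /tmp/test.gpg /etc/hostname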

The GPG mechanism depends on a strong passphrase for security. One way to generate a strong passphrase is to use your system's random number generator, e.g.:

dd if=/dev/urandom count=20 bs=1 | xxd -ps

(See StackExchange for more details about that advice.) If you decide to use encryption, make sure you save off the passphrase in a safe place, so you can get at your backup data later if you need to. And obviously, make sure to set permissions on the passphrase file so it can only be read by the backup user.
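For instance, assuming the passphrase file location and backup user used in the examples above, you could lock down the file like this:

chown backup:backup /home/backup/.passphrase
chmod 600 /home/backup/.passphrase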

To enable this extension, add the following section to the Cedar Backup configuration file:

<extensions>
   <action>
      <name>amazons3</name>
      <module>CedarBackup2.extend.amazons3</module>
      <function>executeAction</function>
      <index>201</index> <!-- just after stage -->
   </action>
</extensions>
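Once the extension is enabled, the amazons3 action can be invoked by name just like the standard actions. For example, to stage and then upload in a single run:

cback stage amazons3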

This extension relies on the options and staging configuration sections in the standard Cedar Backup configuration file, and also requires its own amazons3 configuration section. This is an example configuration section with encryption disabled:

<amazons3>
   <s3_bucket>example.com-backup/staging</s3_bucket>
</amazons3>

The following elements are part of the Amazon S3 configuration section:

warn_midnite

Whether to generate warnings for crossing midnite.

This field indicates whether warnings should be generated if the Amazon S3 operation has to cross a midnite boundary in order to find data to write to the cloud. For instance, a warning would be generated if valid data was only found in the day before or day after the current day.

Configuration for some users is such that the amazons3 operation will always cross a midnite boundary, so they will not care about this warning. Other users will expect to never cross a boundary, and want to be notified that something strange might have happened.

This field is optional. If it doesn't exist, then N will be assumed.

Restrictions: Must be a boolean (Y or N).

s3_bucket

The name of the Amazon S3 bucket that data will be written to.

This field configures the S3 bucket that your data will be written to. In S3, bucket names are global, shared across all S3 users. For uniqueness, you would typically use the name of your domain followed by some suffix, such as example.com-backup. If you want, you can also specify a subdirectory within the bucket, such as example.com-backup/staging.

Restrictions: Must be non-empty.

encrypt

Command used to encrypt backup data before upload to S3.

If this field is provided, then data will be encrypted before it is uploaded to Amazon S3. You must provide the entire command used to encrypt a file, including the ${input} and ${output} variables. An example GPG command is shown above, but you can use any mechanism you choose. The command will be run as the configured backup user.

Restrictions: If provided, must be non-empty.

full_size_limit

Maximum size of a full backup.

If this field is provided, then a size limit will be applied to full backups. If the total size of the selected staging directory is greater than the limit, then the backup will fail.

You can enter this value in two different forms. It can either be a simple number, in which case the value is assumed to be in bytes; or it can be a number followed by a unit (KB, MB, GB).

Valid examples are 10240, 250 MB or 1.1 GB.

Restrictions: Must be a value as described above, greater than zero.

incr_size_limit

Maximum size of an incremental backup.

If this field is provided, then a size limit will be applied to incremental backups. If the total size of the selected staging directory is greater than the limit, then the backup will fail.

You can enter this value in two different forms. It can either be a simple number, in which case the value is assumed to be in bytes; or it can be a number followed by a unit (KB, MB, GB).

Valid examples are 10240, 250 MB or 1.1 GB.

Restrictions: Must be a value as described above, greater than zero.
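Putting these elements together, a complete amazons3 section with warnings, encryption, and size limits enabled might look something like this (the bucket name, passphrase path, and limit values are illustrative, and the element order follows the reference above):

<amazons3>
   <warn_midnite>Y</warn_midnite>
   <s3_bucket>example.com-backup/staging</s3_bucket>
   <encrypt>/usr/bin/gpg -c --no-use-agent --batch --yes --passphrase-file /home/backup/.passphrase -o ${output} ${input}</encrypt>
   <full_size_limit>10 GB</full_size_limit>
   <incr_size_limit>2 GB</incr_size_limit>
</amazons3>

After a backup completes, you can confirm that the data reached the bucket using the AWS CLI, for example:

aws s3 ls s3://example.com-backup/staging/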