The cback-amazons3-sync command

Introduction

The cback-amazons3-sync tool is used for synchronizing entire directories of files up to an Amazon S3 cloud storage bucket, outside of the normal Cedar Backup process.

This might be a good option for some types of data, as long as you understand the limitations around retrieving previous versions of objects that get modified or deleted as part of a sync. S3 does support versioning, but it won't be quite as easy to get at those previous versions as with an explicit incremental backup like cback provides. Cedar Backup does not provide any tooling that would help you retrieve previous versions.

The underlying functionality relies on the AWS CLI toolset. Before you use this extension, you need to set up your Amazon S3 account and configure AWS CLI as detailed in Amazon's setup guide. The aws command will be executed as the same user that is executing the cback-amazons3-sync command, so make sure you configure it as the proper user. (This is different from the amazons3 extension, which is designed to execute as root and switches over to the configured backup user to execute AWS CLI commands.)

Permissions

You can use whichever Amazon-supported authentication mechanism you would like when setting up connectivity for the AWS CLI. It's best to set up a separate user in the IAM Console rather than using your main administrative user.

You probably want to lock down this user so that it can only take backup-related actions in the AWS infrastructure. One option is to apply the AmazonS3FullAccess policy, which grants full access to the S3 infrastructure. If you would like to lock down the user even further, this appears to be the minimum set of permissions required for the aws s3 sync action, written as a JSON policy statement:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket",
                "arn:aws:s3:::your-bucket/*"
            ]
        }
    ]
}

In the Resource section, be sure to list the name of your S3 bucket instead of your-bucket.
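
To avoid editing the bucket name by hand in two places, the policy above can be generated. The following is a minimal sketch: the helper name build_sync_policy is hypothetical and not part of Cedar Backup, and the bucket name example.com-backup is borrowed from the usage example later in this section.

```python
import json


def build_sync_policy(bucket_name):
    """Return the minimal sync policy shown above for one bucket.

    Hypothetical helper for illustration; not part of Cedar Backup.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:DeleteObject",
                ],
                # The bucket itself, plus every object inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into the IAM Console.
    print(json.dumps(build_sync_policy("example.com-backup"), indent=4))
```

The Resource list keeps both entries from the policy above: one for the bucket itself and one for the objects inside it.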

Syntax

The cback-amazons3-sync command has the following syntax:

 Usage: cback-amazons3-sync [switches] sourceDir s3bucketUrl

 Cedar Backup Amazon S3 sync tool.

 This Cedar Backup utility synchronizes a local directory to an Amazon S3
 bucket.  After the sync is complete, a validation step is taken.  An
 error is reported if the contents of the bucket do not match the
 source directory, or if the indicated size for any file differs.
 This tool is a wrapper over the AWS CLI command-line tool.

 The following arguments are required:

   sourceDir            The local source directory on disk (must exist)
   s3BucketUrl          The URL to the target Amazon S3 bucket

 The following switches are accepted:

   -h, --help           Display this usage/help listing
   -V, --version        Display version information
   -b, --verbose        Print verbose output as well as logging to disk
   -q, --quiet          Run quietly (display no output to the screen)
   -l, --logfile        Path to logfile (default: /var/log/cback.log)
   -o, --owner          Logfile ownership, user:group (default: root:adm)
   -m, --mode           Octal logfile permissions mode (default: 640)
   -O, --output         Record some sub-command (i.e. aws) output to the log
   -d, --debug          Write debugging information to the log (implies --output)
   -s, --stack          Dump Python stack trace instead of swallowing exceptions
   -D, --diagnostics    Print runtime diagnostics to the screen and exit
   -v, --verifyOnly     Only verify the S3 bucket contents, do not make changes
   -w, --ignoreWarnings Ignore warnings about problematic filename encodings

 Typical usage would be something like:

   cback-amazons3-sync /home/myuser s3://example.com-backup/myuser

 This will sync the contents of /home/myuser into the indicated bucket.
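
If you want to drive the tool from a script, for instance from a scheduled job, a thin wrapper may be enough. The sketch below only assembles the command line from the switches listed above and runs it. The function names are hypothetical, and the exit status is returned as-is because this section does not document what particular values mean.

```python
import subprocess


def build_sync_command(source_dir, bucket_url, verify_only=False, ignore_warnings=False):
    """Assemble the argument list for cback-amazons3-sync (hypothetical helper)."""
    command = ["cback-amazons3-sync"]
    if verify_only:
        command.append("--verifyOnly")
    if ignore_warnings:
        command.append("--ignoreWarnings")
    # The two required arguments come last: sourceDir, then s3BucketUrl.
    command.extend([source_dir, bucket_url])
    return command


def run_sync(source_dir, bucket_url, **options):
    """Run the tool as the current user and return its exit status."""
    command = build_sync_command(source_dir, bucket_url, **options)
    return subprocess.run(command, check=False).returncode
```

For example, run_sync("/home/myuser", "s3://example.com-backup/myuser") runs the same command as the typical usage shown above. Passing an argument list rather than a shell string avoids quoting problems with directory names that contain spaces.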
         

Switches

-h, --help

Display usage/help listing.

-V, --version

Display version information.

-b, --verbose

Print verbose output to the screen as well as writing to the logfile. When this option is enabled, most information that would normally be written to the logfile will also be written to the screen.

-q, --quiet

Run quietly (display no output to the screen).

-l, --logfile

Specify the path to an alternate logfile. The default logfile is /var/log/cback.log.

-o, --owner

Specify the ownership of the logfile, in the form user:group. The default ownership is root:adm, to match the Debian standard for most logfiles. This value will only be used when creating a new logfile. If the logfile already exists when the cback-amazons3-sync command is executed, it will retain its existing ownership and mode. Only user and group names may be used, not numeric uid and gid values.
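
The user:group rule can be illustrated with a small check. This is a sketch of the rule as described above, not the tool's own parsing code. The function name parse_owner is hypothetical, and treating an all-digit value as a numeric id is an assumption.

```python
def parse_owner(owner_text):
    """Split 'user:group' into names, rejecting numeric ids (illustrative only)."""
    user, separator, group = owner_text.partition(":")
    if not separator or not user or not group:
        raise ValueError("owner must be given as user:group")
    # Assumption: an all-digit value is a uid or gid, which is not accepted.
    if user.isdigit() or group.isdigit():
        raise ValueError("use user and group names, not numeric uid and gid values")
    return user, group
```

With this sketch, parse_owner("root:adm") yields the pair ("root", "adm"), while a value such as "0:4" is rejected.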

-m, --mode

Specify the permissions for the logfile, using the numeric mode as in chmod(1). The default mode is 0640 (-rw-r-----). This value will only be used when creating a new logfile. If the logfile already exists when the cback-amazons3-sync command is executed, it will retain its existing ownership and mode.
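
To check what a numeric mode means before passing it to --mode, you can convert it to the familiar ls-style string. The sketch below uses only the Python standard library. The function name describe_mode is hypothetical.

```python
import stat


def describe_mode(mode_text):
    """Convert an octal mode string such as '640' into an ls-style string."""
    mode = int(mode_text, 8)
    # S_IFREG marks a regular file, which gives the leading '-' in the result.
    return stat.filemode(stat.S_IFREG | mode)


if __name__ == "__main__":
    print(describe_mode("640"))  # prints -rw-r-----
```

The default of 640 therefore gives the owner read and write access, the group read access, and everyone else no access.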

-O, --output

Record some sub-command output to the logfile. When this option is enabled, all output from system commands will be logged. This might be useful for debugging or just for reference.

-d, --debug

Write debugging information to the logfile. This option produces a high volume of output, and would generally only be needed when debugging a problem. This option implies the --output option, as well.

-s, --stack

Dump a Python stack trace instead of swallowing exceptions. This forces Cedar Backup to dump the entire Python stack trace associated with an error, rather than just propagating the last message it received back up to the user interface. Under some circumstances, this is useful information to include along with a bug report.

-D, --diagnostics

Display runtime diagnostic information and then exit. This diagnostic information is often useful when filing a bug report.

-v, --verifyOnly

Only verify the S3 bucket contents against the directory on disk. Do not make any changes to the S3 bucket or transfer any files. This is intended as a quick check to see whether the sync is up-to-date.

Although no files are transferred, the tool will still execute the source filename encoding check, discussed below along with --ignoreWarnings.
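
The usage text describes the validation as a comparison of bucket contents and file sizes against the source directory. The sketch below shows one way such a comparison could work. It is not the tool's implementation: the function names are hypothetical, and the remote listing is assumed to be available already as a mapping from object key to size in bytes.

```python
import os


def local_sizes(source_dir):
    """Map each file's path relative to source_dir (with '/' separators) to its size."""
    sizes = {}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, source_dir).replace(os.sep, "/")
            sizes[key] = os.path.getsize(path)
    return sizes


def compare_sizes(local, remote):
    """Return a list of human-readable differences between two key-to-size mappings."""
    problems = []
    for key in sorted(set(local) | set(remote)):
        if key not in remote:
            problems.append(f"missing from bucket: {key}")
        elif key not in local:
            problems.append(f"missing from source directory: {key}")
        elif local[key] != remote[key]:
            problems.append(f"size differs: {key} ({local[key]} != {remote[key]})")
    return problems
```

An empty result means the two listings agree. Comparing sizes only is quick, but it cannot detect a file whose content changed while its size stayed the same.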

-w, --ignoreWarnings

The AWS CLI S3 sync process is very picky about filename encoding. Files that the Linux filesystem handles with no problems can cause problems in S3 if the filename cannot be encoded properly in your configured locale. As of this writing, filenames like this will cause the sync process to abort without transferring all files as expected.

To avoid confusion, the cback-amazons3-sync tool tries to guess which files in the source directory will cause problems, and refuses to execute the AWS CLI S3 sync if any problematic files exist. If you'd rather proceed anyway, use --ignoreWarnings.

If problematic files are found, then you have basically two options: either correct your locale (e.g. if you have set LANG=C) or rename the file so it can be encoded properly in your locale. The error messages will tell you the expected encoding (from your locale) and the actual detected encoding for the filename.
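
To find candidate files yourself before running a sync, you can test whether each raw filename decodes cleanly in a given encoding. The sketch below is an assumption about what such a check might look like. The text above does not describe how the tool makes its guess, and the function names here are hypothetical.

```python
import os


def find_unencodable_names(raw_names, encoding):
    """Return the raw filenames (bytes) that do not decode cleanly in encoding."""
    problems = []
    for raw_name in raw_names:
        try:
            raw_name.decode(encoding)
        except UnicodeDecodeError:
            problems.append(raw_name)
    return problems


def scan_directory(source_dir, encoding):
    """Walk source_dir using byte paths and return the full paths of problem files."""
    problems = []
    # Walking a bytes path makes os.walk yield the names as raw bytes.
    for root, _dirs, files in os.walk(os.fsencode(source_dir)):
        for raw_name in find_unencodable_names(files, encoding):
            problems.append(os.path.join(root, raw_name))
    return problems
```

For the encoding argument, one reasonable choice is the value reported by locale.getpreferredencoding(False). A filename stored as UTF-8 with a non-ASCII character fails this check under "ascii" but passes under "utf-8", which matches the two remedies above: change the locale or rename the file.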