Periodic Cleanup, Checking, etc. with cron and at

Because MH utilities are individual programs -- instead of a monolithic mail program like mail(1) -- you can run "batch" processes to manipulate your messages for you. Most UNIX systems have one or both of the programs at(1) and cron(8). These utilities execute other programs at the times and dates you choose.

For example, rmm doesn't remove messages; it renames them and leaves the "deleted" files in your mail folder. The .xmhcache files that xmh and exmh leave in folders can take a lot of disk space; you might want to remove those files from folders that haven't been used in a long time. Maybe you want to run a script like autoinc in the middle of the night, when the system isn't busy but you also aren't logged in. Periodically-run programs can do those things and many others. This section introduces the techniques and gives some examples.

Caution

Before we get started, here are things you should be aware of.

There's a risk in running a non-interactive program when you aren't there to supervise. If something goes wrong, the program could cause serious trouble in your mail folders -- and in non-mail directories, too. If you don't have experience with at(1) and cron(8), be careful! Start by running programs that don't remove or modify files.

Even if your job doesn't remove or modify anything, there's another problem. A "background" job can change your current folder or message if it isn't designed to avoid that. If you don't realize that the background job may have changed your current folder, your current message, or the contents of a sequence, that can cause real confusion -- whether you're in the middle of an interactive MH session or you leave MH and come back later. The Section Multiple MH Sessions has tips.
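
One way to keep a background job from disturbing your interactive session is to give the job its own MH context file. The msgs_clean script later in this section does that by setting the MHCONTEXT environment variable. Here's a minimal sketch of the same idea as a shell function. The function name with_private_context and the temporary filename are made up for this example; adapt them to your setup:

```shell
#!/bin/sh
# with_private_context COMMAND [ARGS...]
# Run COMMAND with MHCONTEXT pointing at a throwaway context file,
# then remove that file. (Hypothetical helper, not part of MH.)
with_private_context() {
    MHCONTEXT=${TMPDIR:-/tmp}/mhctx.$$   # same naming idea as /tmp/clean$$
    export MHCONTEXT
    "$@"                                 # run the job's command
    rc=$?                                # remember its exit status
    rm -f "$MHCONTEXT"                   # clean up the private context
    unset MHCONTEXT
    return $rc
}
```

A job that calls, say, with_private_context scan +drafts leaves the current folder in your usual context file alone. The current message is normally stored in the folder itself, though, so a private context doesn't protect that.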

It's a good idea to test your setup on a "dummy" account that no one uses. Or, at least, make a separate MH directory and run your jobs there. See the Section A Test Mail Setup.

One other problem that might affect you: if your shell reads a setup file whenever a new shell starts, that setup file can affect your cron or at job. That's because cron and at start a shell to read the command lines you give them. For example, C shell users' .cshrc files will be read. But C shell .login files and Bourne shell .profile files won't be read because those files are only read when a login shell starts. For more information, check a good book on UNIX shells. (The chapter of UNIX Power Tools about setup files shows ways to customize which parts of the startup files are read by at and cron jobs -- and to work around bugs.)

Starting cron Jobs

cron(8) is a daemon program that runs every minute, checking for users who have jobs to do. On some systems, each user may have a personal crontab file. On other systems, personal crontab files may be disabled for some or all users -- or there may be only one system-wide crontab. Each line in crontab describes a job that cron should run at certain times and dates. The syntax varies a little, system-to-system; check your online crontab or cron manual page for details. Here's an example file:

    53 0 * * *      /bin/sh $HOME/.lib/at_cron/deltempfiles
    # mhmail WON'T SEND AN EMPTY MESSAGE; GREAT WHEN calendar SAYS NOTHING:
    6 6 * * 1-5     /bin/calendar | /usr/local/mh/mhmail $USER -s "calendar Output"
    
In general, it's a good idea to use absolute pathnames (starting with a slash, /) unless you know where they aren't needed. The shell's command search path may not have all the directories (like your personal bin) that you're accustomed to using in a login shell. The current directory for personal crontabs is usually the home directory, but that isn't always true.

The first five fields on a line, separated by spaces, are:

minutes hours day-of-month month day-of-week

Each of those can be a list (like 3,6,9) or a range (like 1-5). An asterisk (*) in a field matches all values. A hash mark (#) in the first column starts a comment.
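
To make the matching rules concrete, here's a minimal sketch of a shell function that tests one crontab field against one value. The name field_matches is made up for this example. It handles only an asterisk, plain numbers, comma-separated lists, and ranges; any other syntax your system's cron accepts (check its manual page) is outside what this sketch covers:

```shell
#!/bin/sh
# field_matches SPEC VALUE
# Succeed if the crontab field SPEC (like "*", "3,6,9" or "1-5")
# matches the number VALUE. (Hypothetical helper for illustration.)
field_matches() {
    spec=$1
    value=$2
    [ "$spec" = "*" ] && return 0        # asterisk matches every value
    rest=$spec,                          # trailing comma simplifies the loop
    while [ -n "$rest" ]; do
        item=${rest%%,*}                 # text before the first comma
        rest=${rest#*,}                  # text after the first comma
        case $item in
        *-*)                             # a range like 1-5
            first=${item%-*}
            last=${item#*-}
            [ "$value" -ge "$first" ] && [ "$value" -le "$last" ] && return 0 ;;
        *)                               # a single number
            [ "$value" -eq "$item" ] && return 0 ;;
        esac
    done
    return 1                             # nothing matched
}
```

With this function, field_matches 1-5 3 succeeds and field_matches 3,6,9 7 fails.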

So, in the example crontab above, the first line runs my deltempfiles script with the Bourne shell every day at 0053 hours (12:53 a.m.). The /bin/sh isn't needed on systems that can execute scripts directly (the Section How Does Your System Execute Files? tells how to find out if yours does). The second line is a comment. The third line runs at 6:06 a.m. every weekday morning (Monday is "day 1" on my system's cron).

To make entries in your crontab, your system may have a command named crontab(1). Note that many crontab(1) programs are perfectly happy to delete your entire crontab file without confirmation; it's a good idea to save a backup copy of complex files.

Starting at Jobs

The at(1) command varies more, system-to-system, than cron(8) does. Some versions of at let you run jobs periodically. Some let you give fancy date and time specifications; others don't. Check your manual page.

The basic operation of at is simple. On its command line, give the time and an optional date when you want the job to run. (If you don't give the date, the job will run at the next scheduled time, in the next 24 hours.) Then at reads a list of command lines to run from its standard input. So, if you have a file named atjob that looks like this:

    /bin/sh $HOME/.lib/at_cron/deltempfiles
    echo "Deleted temp files from at -- check folders" | /usr/ucb/mail ehuser
    
You can submit it to run at 1:35 a.m. tonight with the following command:

    % at 0135 < atjob
    
(The < is the shell's operator that redirects standard input to come from a file.) That job will run just once. If you want it to run every night, have at resubmit the job. To do that, add sleep and at commands to the end of the file. For example, to run atjob at 1:35 every night, make the file:

    /bin/sh $HOME/.lib/at_cron/deltempfiles
    echo "Deleted temp files from at -- check folders" | /usr/ucb/mail ehuser
    sleep 60
    /usr/bin/at 0135 < atjob
    
The sleep 60 command waits 60 seconds, so the minute in which the job started has passed before at runs again. Without it, a job that finishes quickly could be resubmitted for 1:35 today and run twice in the same day. at runs jobs from the same current directory where you submitted the job, so < atjob doesn't need an absolute pathname.

Note about Times

It's a bad idea to schedule jobs at the top or bottom of an hour, like midnight or 2:30 a.m. Lots of people tend to choose those times; the system can get very slow if 300 cron and at jobs all start at midnight. Choose a time when the system isn't likely to be too busy -- weekends or the middle of the night -- and make the time odd, like 3:24 a.m.

Output and Errors

Depending on your version of cron and at, you may get error messages from your jobs in email. You may get both the standard output and standard error, only the stderr, or nothing.

If your system doesn't show you the errors and output you want to see, redirect the output of jobs to a file or a mail message.
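
For example, here's a minimal sketch of a wrapper that appends a command's standard output and standard error to a log file, along with the time and the exit status. The function name run_logged and the log filename are made up for this example:

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Append COMMAND's stdout and stderr to LOGFILE, with a header line
# and the exit status. (Hypothetical helper for illustration.)
run_logged() {
    log=$1
    shift
    {
        echo "=== `date`: $*"            # when the job ran, and what it was
        "$@"                             # the job itself
        echo "=== exit status: $?"
    } >> "$log" 2>&1                     # 2>&1 sends stderr to the log too
}

# Possible use inside a job script:
#   run_logged $HOME/.lib/at_cron/log /bin/sh $HOME/.lib/at_cron/deltempfiles
```

The same >> file 2>&1 redirection can also go directly on a crontab line or in an at job file, because the Bourne shell reads those command lines.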

List Old Drafts

Here's a simple job that sends you a list of any old messages in your draft folder. Put the following lines in a file named olddrafts. Change the PATH to contain the MH binaries directory (with scan and mhmail) on your system. Set the address that mhmail mails to:

    #!/bin/sh
    PATH=/usr/local/mh:$PATH
    MHCONTEXT= 
    export PATH MHCONTEXT
    scan +drafts 2>/dev/null | mhmail youraddress -subj "Old drafts in +drafts"
    
The Bourne shell operator 2>/dev/null throws away errors like "no messages in +drafts" so they won't be mailed. mhmail doesn't send a message if there's no text on its standard input.

Remove Messages from rmmer

Here are two versions of a small script file named rm_msgs. Run every night, it deletes messages that were removed with rmmer more than four days ago. If your system has the xargs command, use the second version; it's more efficient.

    #!/bin/sh
    cd /u/ehuser/Mail || exit
    find `find . -type d -name DELETE -print` -type f -mtime +4 -exec rm -f {} \;
    
Here's the xargs version:

    #!/bin/sh
    cd /u/ehuser/Mail || exit
    find `find . -type d -name DELETE -print` -type f -mtime +4 -print | xargs rm -f
    
The first command changes to your MH Mail directory and (important!) exits if the cd command fails. Otherwise, your find job could start deleting in some other directory. The second command runs two find jobs: one to find the DELETE folders, and one to find messages in those folders that were last modified more than four days ago. (If you have experience with the standard UNIX find(1), you might be wondering why I had to use the nested finds. It's because predicates like -name 'DELETE/[1-9]*' don't work: -name is compared only with the last component of each pathname, so a pattern with a slash in it never matches.)

I used `find -type d ...` to get a list of folders, instead of using `folders -fast -recurse`, because there might be some other users' folders or read-only folders in the output of the folders command.
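
Before letting a job like rm_msgs delete anything, it can help to run a version that only lists what it would remove. Here's a minimal sketch; the function name list_old_deleted is made up for this example, and it assumes folder names without spaces. It also guards against one hazard of the nested find: if there are no DELETE folders, the inner find prints nothing, and some versions of find (GNU find, for one) then search the current directory instead of complaining:

```shell
#!/bin/sh
# list_old_deleted MAILDIR [DAYS]
# Print (don't remove) files in DELETE folders under MAILDIR that were
# last modified more than DAYS days ago; DAYS defaults to 4.
# The body runs in a subshell, so the caller's directory doesn't change.
list_old_deleted() (
    maildir=$1
    days=${2:-4}
    cd "$maildir" || exit 1              # never search the wrong directory
    dirs=`find . -type d -name DELETE -print`
    [ -n "$dirs" ] || exit 0             # no DELETE folders: nothing to do
    find $dirs -type f -mtime +"$days" -print
)
```

When the list looks right, the output could be piped to xargs rm -f, as in the second version of rm_msgs.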

Cleaning Up Old Messages

The Section msg: `While You Were Out' Messages with comp shows a setup for telephone messages that leaves a folder which needs to be cleaned out periodically. This script removes copies from the msgs folder that are more than one week old. Of course, you can adapt this script to do other automated cleanup. This is a file named msgs_clean. Put the following lines in it, making sure to use backquotes (`), not single quotes ('):

    #!/bin/sh
    MHCONTEXT=/tmp/clean$$
    export MHCONTEXT
    /usr/local/mh/rmm `/usr/local/mh/pick -before -7 -list +msgs`
    rm -f /tmp/clean$$
    

Check for Folder Changes

Here's a job that compares the list of folders you had before (yesterday, a week ago, last month...) to the current list of folders. The simplest version of this job just compares a list of folder names. I use it to catch folders that were created accidentally by mistyping a folder name in the fcc: header field and sending the message with push (which creates folders without asking).

    umask 77                                # MAKE LIST OF FOLDERS PRIVATE
    folders=/tmp/FOLDERS$$                  # CURRENT LIST OF FOLDERS
    lastfdrs=/u/jpeek/.lib/last_folders     # PREVIOUS $folders LIST
    folders -fast -recurse > $folders       # GET LIST OF FOLDER NAMES NOW
    diff $lastfdrs $folders ||              # COMPARE TO PREVIOUS LIST...
       cp $folders $lastfdrs                # ...COPY IF THERE WERE CHANGES
    rm -f $folders                          # CLEAN UP
    
The || operator tells the shell to run cp (to update the previous list of folders) only if the diff command returns a non-zero status (because the files were different). The || isn't required; you could simply run cp every time.
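
One case the job above doesn't handle quietly is its first run: the previous list doesn't exist yet, so diff complains about the missing file before cp creates it. Here's a minimal sketch of the compare-and-update step as a function that saves the list silently the first time. The function name report_changes is made up for this example:

```shell
#!/bin/sh
# report_changes NEW PREVIOUS
# Print the differences between the list in PREVIOUS and the list in NEW,
# then save NEW as PREVIOUS. (Hypothetical helper for illustration.)
report_changes() {
    new=$1
    prev=$2
    if [ ! -f "$prev" ]; then
        cp "$new" "$prev"                # first run: nothing to compare yet
        return
    fi
    diff "$prev" "$new" ||               # diff prints any changes...
        cp "$new" "$prev"                # ...and the saved list is updated
}
```

In the job above, the diff and cp lines could be replaced by report_changes $folders $lastfdrs.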

You could also make this job show differences in the summary of folders' contents (the number of messages, the current message, etc.). To do that, don't use the -fast option. Then, instead of a list of folder names, folders will give one-line summaries and the diff will show the old and new summary lines for each folder that changed. You'll see which folders are active, filling up quickly, and so on. This is handy if you use slocal or procmail to drop new mail into folders that you could forget to read.