Backing Up Your Blog Content Using HTTrack

Backing Up Your Blog Content Using HTTrack

I’m pretty strict on making sure I have my data backed up in numerous places and my blog content is no different. I would hate to lose all these years of babbling. In this post I cover how I back up this blog, and this will apply to any blog engine or indeed any website.

This blog is hosted on WordPress.com and I trust the guys at ‘Automatic’ to keep my data safe, but accidents do happen. Ideally I want an up to date backup of this blog together with any images used. Personally I?m not too concerned about having it in a WordPress format but rather actually prefer having the raw content that I could use to recreate the blog elsewhere.

The HTTrack Tool

The tool I use is HTTrack (http://www.httrack.com) which is a web site copying tool. It essentially re-creates a working copy of the site on the local disk (which you can navigate in a browser too). The tool has many features and includes command line interface which makes it easy to run via a scheduled task. The various command line switches are documented here http://www.httrack.com/html/fcguide.html , but I use this simple command below:

C:\HTTrack\httrack.exe http://richhewlett.com -O c:\TargetFolderPathHere

For interest the –O switch tells HTTrack to output the site to disk and hence produce a copy.

You can create a Windows Scheduled Task that periodically runs this command line and you have an automated backup. I personally go a bit further and wrap this command into a Windows PowerShell script. This script creates new folder each time with the current date and implements error handling which writes to the system eventlog.

This script is an example only and comes with no guarantees that it will work for you without modification:

#===============================================
# Backup blog to disk using HTTRACK
# (Created by Rich Hewlett, see blog at RichHewlett.com)
#==============================================
clear-host
write-output "---------------------Script Start--------------------"
write-output " HTTrack Site Backup Script"
write-output "-------------------------------------------------------"

# set file paths
$timestamp = Get-Date -format yyyy_MMM_dd_HHmmss
$TargetFolderPath="F:\MyBlogBackUp\$timestamp"
$HTTrackPath="C:\HTTrack\httrack.exe"

write-output "Backup target path is $TargetFolderPath"
write-output "HTTrack is at $HTTrackPath"

# set error action preference so errors don't stop and the trycatch
# kicks in to handle gracefully
$erroractionpreference = "Continue"

try
{
    write-output "Creating output folder $TargetFolderPath ..."
    New-Item $TargetFolderPath -type directory

    write-output "Download data ..."
    invoke-expression "$HTTrackPath http://MyBlog.com -O $TargetFolderPath"
    write-output "Done with downloading."    

    write-eventlog -LogName "Network" -Source "HTTrack" -EventId 1 -Message "Downloaded blog for backup"

}
catch
{
    # error occurred so lets report it
    write-output "ERROR OCCURRED DURING SCRIPT " $error

    # write an event to the event log
    write-output "Writing FAIL to EventLog"
    write-eventlog -LogName "Network" -Source "HTTrack" -EventId 1 -Message "Download blog for backup FAILED during execution. $error" -EntryType Error
}

write-output "------------------------------------Script end------------------------------------"

I run this job monthly via a Windows scheduled task.

UPDATE: A working Powershell script can be found on my GitHub site.

Using WordPress Export Feature

If you run a WordPress blog you can also do an export via the Dashboard which will export all site content (including comments) which is useful. In addition to the above raw HTML backup above I also use the export tool periodically manually (and therefore infrequently). For more information on this feature check out http://en.support.wordpress.com/export/