Friday, December 3, 2010

Carbonite - Workarounds to back up executables, videos and other files

Carbonite is the online backup service I've ultimately decided to use to protect my data off-site against those proverbial theft/fire/flood events. Unfortunately, a Windows Home Server box, as awesome as it is, just isn't off-site (so it would be destroyed or stolen along with everything else).

For me, Carbonite does a great job out of the box. It backs up documents, photos and music; program and web application source files (ASCX, ASPX, CS, CONFIG, C, CC, CPP, H, HTML, CSS, JS, PHP, Python, Ruby, Scala, ASM, eqn, JED), ironically including Python's compiled .pyc files, which are almost useless to me; program data files (SQL, YAML, XML, XSD, in/out); text files; PDFs; Mercurial repositories (the .d and .i files, for example); and compressed and encrypted files (Zip, 7z, TrueCrypt, AxCrypt). In short, the vast majority of the important stuff.


However, Carbonite states that it does not automatically back up executable files, video files, or many other kinds of files.

To back these up, you have to go into the relevant folder manually, right-click and select the backup option (thankfully, pressing Ctrl+A to select everything and then backing up does work for all the files, just not for folders).


Why it's a problem
The core problem is simple: your complete backup is no longer automatic, which means human error starts to creep in.

I'm a computer science student. I have dozens of different projects and assignments completed over the last 4 years, and I know that if I come back to them in 5 years, having their executable forms lying around will make it much easier to remember the context of each application.

These are primarily .exe, .o, .jar, .dev and other miscellaneous file formats. To be clear, Carbonite does back up the source forms, as detailed above. I just want the whole package, because things like compilers and IDEs can become misplaced or hard to find over time. Stuff disappears from the internet all the time; in case you missed it, GeoCities, one of the major original content portals on the internet, closed recently.


Workarounds
As I said, Carbonite backs up .zip files by default. That suggests a relatively simple workaround: put each important file inside a .zip file. Easier said than done?

We'll need some programmatic way of creating these archives (otherwise it's literally back to the Windows GUI). 7-Zip provides exactly that; here are some useful examples:

http://dotnetperls.com/7-zip-examples

7za.exe is the standalone command-line version; this one worked for me (though YMMV):
http://downloads.sourceforge.net/sevenzip/7za452.zip?use_mirror=puzzle

Like many programmers (e.g. Rasmus Lerdorf, the father of PHP: http://itc.conversationsnetwork.org/shows/detail3298.html), I see myself as pragmatic, so I'm just going to do something simple and easy for me.

My backup strategy for these files is basically to run something like the following on the Windows command line, which creates one big .zip file containing every .exe in the current folder and its subfolders:

C:\Users\Peter>7za.exe a -r -tzip myfiles.zip *.exe


Since this is fundamentally a third-tier backup for me, I'm satisfied with it, even though recovering the data will take a little longer and it's possible I've missed something (or that Carbonite changes its program's rules, though I reckon they aren't looking to start a war, since they'd only lose customers).

Again, being lazy (though in the spirit of automation and removing human error), I'd rather not type that into the command line every time (and what if I forget something?), so let's turn it into a batch file called backup_via_carbonite.bat:

GOTO EndComment
This BAT-file zips up executable files,
web site favicons, development files, DLLs,
installers, compilers and other miscellaneous
files so they are backed up by Carbonite.

Please run it through YOUR OWN TESTING if
you plan to use it as part of your backup
strategy!

Notes: - * is a wildcard meaning match all
- "a" means add to (or create) an archive
- -tzip makes the archive a .zip file
- the -r recurses through the entire
folder structure
- myfiles.zip is the name of the
resulting .zip archive
- the -x!Downloads\* excludes files in
the Downloads folder, same for AppData
- 7za.exe must be on the PATH (or
use its full path)

Written by Peter Schmidt
03/DEC/2010
:EndComment
REM Start from the user profile folder so the relative
REM paths work even when launched from the Startup
REM folder or Task Scheduler.
CD /D "%USERPROFILE%"
7za.exe a -r -tzip myfiles.zip *.a *.bak *.cab *.com *.dev *.dll *.exe *.ico *.ini *.jar *.lib *.msi *.o *.win -x!Downloads\* -x!AppData\*
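For anyone who would rather not depend on 7za.exe, here is a rough Python sketch of what the batch file above does, using only the standard library's zipfile module. It is an illustration, not a tested replacement: the function name zip_by_extension is hypothetical, extensions are assumed to match case-insensitively, and the exclusions are assumed to mean the top-level Downloads and AppData folders under the chosen root. It also writes a fresh archive on every run.

```python
# Rough Python sketch of backup_via_carbonite.bat (standard library only).
# Assumptions, not from the original batch file: zip_by_extension is a
# hypothetical name, extensions match case-insensitively, and the exclusions
# apply to top-level folders directly under `root`.
import os
import zipfile

EXTENSIONS = {".a", ".bak", ".cab", ".com", ".dev", ".dll", ".exe",
              ".ico", ".ini", ".jar", ".lib", ".msi", ".o", ".win"}
EXCLUDED_DIRS = {"Downloads", "AppData"}


def zip_by_extension(root, archive_path, extensions=EXTENSIONS,
                     excluded_dirs=EXCLUDED_DIRS):
    """Write a fresh .zip at archive_path containing every file under root
    whose extension is in `extensions`; return the stored relative paths."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            if dirpath == root:
                # Prune excluded top-level folders so os.walk never enters
                # them (intended to mirror -x!Downloads\* and -x!AppData\*).
                dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
            for name in filenames:
                if os.path.splitext(name)[1].lower() in extensions:
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, root)
                    zf.write(full, arcname=rel)  # keep the folder structure
                    added.append(rel)
    return sorted(added)
```

For example, zip_by_extension(os.path.expanduser("~"), "myfiles.zip") would roughly correspond to running the batch file from the user profile folder.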

Now I can just double-click backup_via_carbonite.bat in Windows, put it in my Startup folder, or add it to Task Scheduler, so it runs as close to automatically as I'd like.

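My understanding is that .zip is not a solid compression format (each file in the archive is compressed separately), so it should be more resilient to small amounts of data corruption: one flipped bit should corrupt just one of the files rather than, on average, half of them, unless it happens to land in the archive's central directory.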
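The non-solid claim is easy to check with a small experiment. This Python sketch builds a two-entry .zip in memory, flips one bit inside the first entry's data and then tries to read both entries back. The file names and contents are made up, and the entries are stored uncompressed so the payload can be located with a plain byte search. With deflate-compressed entries the damaged entry may fail with a decompression error rather than a CRC error, but each entry is still its own stream, so the other entry is unaffected.

```python
# Sketch: damaging one entry of a .zip leaves the other entries readable.
# Entries are stored uncompressed (ZIP_STORED) purely so the payload is easy
# to find; the names and payloads are invented for the demo.
import io
import zipfile


def corrupt_one_entry_demo():
    """Build a two-entry zip in memory, flip one bit in the first entry's
    data, and report which entries still read back correctly."""
    payloads = {"a.txt": b"first file payload " * 10,
                "b.txt": b"second file payload " * 10}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, data in payloads.items():
            zf.writestr(name, data)

    raw = bytearray(buf.getvalue())
    offset = raw.find(payloads["a.txt"])  # start of a.txt's stored bytes
    raw[offset] ^= 0x01                   # flip a single bit

    results = {}
    with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
        for name, expected in payloads.items():
            try:
                results[name] = zf.read(name) == expected
            except zipfile.BadZipFile:  # raised on a CRC-32 mismatch
                results[name] = False
    return results


print(corrupt_one_entry_demo())  # {'a.txt': False, 'b.txt': True}
```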

I did also mention video files. That's another set of extensions, easy to add but hard to discover. Here's a start:

GOTO EndComment
This bat-file zips up some video files
so they are backed up by Carbonite

Written by Peter Schmidt
03/DEC/2010
:EndComment
7za.exe a -r -tzip myvideos.zip *.flv *.mpeg *.mpg *.mp4 *.m4v *.qt *.wmv
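To help with the "hard to discover" part, a short Python sketch can tally the extensions that actually occur under a folder, largest total size first, so unfamiliar video (or other) formats stand out. The helper name tally_extensions and its output layout are hypothetical; it only reads file names and sizes and changes nothing on disk.

```python
# Sketch: tally which file extensions actually occur under a folder, so
# formats worth adding to the batch file are easier to spot.
# tally_extensions is a hypothetical helper; it only reads names and sizes.
import os
from collections import Counter


def tally_extensions(root):
    """Return [(extension, file_count, total_bytes), ...] for all files under
    root, sorted by total size (largest first), then by extension."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanished or can't be read
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += size
    return sorted(((ext, counts[ext], sizes[ext]) for ext in counts),
                  key=lambda row: (-row[2], row[0]))
```

Pointing it at the folder (or folders) that hold your videos and printing each row gives a quick list of extensions to consider adding to the video batch file.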


Of course, anyone with a sizable video library will know this won't scale: it will consume a ridiculous amount of space and, well... just be unwieldy. On many connections the file might never finish uploading (or might change too frequently to ever catch up), resulting in no backup at all!
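To put rough numbers on "never completely uploaded": upload time is simply the size in bits divided by the sustained upload speed. The sizes and the 1 Mbit/s speed below are illustrative assumptions, not measurements, and real throughput is usually below the nominal line speed.

```python
# Back-of-the-envelope upload time for a single archive.
# The sizes and speeds used below are hypothetical examples.


def upload_hours(size_bytes, upload_mbps):
    """Hours to upload size_bytes at a sustained upload_mbps megabits/s."""
    bits = size_bytes * 8
    seconds = bits / (upload_mbps * 1_000_000)
    return seconds / 3600


# 330 MB at 1 Mbit/s: 330e6 * 8 / 1e6 = 2,640 s, about 0.73 hours.
print(round(upload_hours(330 * 10**6, 1.0), 2))       # 0.73
# A hypothetical 100 GB video archive at 1 Mbit/s: roughly 9.3 days.
print(round(upload_hours(100 * 10**9, 1.0) / 24, 1))  # 9.3
```

If that archive is regenerated more often than it can finish uploading, the upload never completes.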

I don't have a solution, except to say that Windows 7 provides the lovely libraries feature. Try gathering all your videos into one library so you can use the manual Ctrl+A backup described above. If you store your videos in separate folders or across multiple drives, you'll need to add each folder or drive to the library, unless you've got a more creative workaround. Good luck thinking =)


Final note: ironically, because Carbonite stores older versions of files (including the .zip file, which will now be regenerated automatically on a schedule or whenever I restart Windows), this solution will end up costing them significantly more storage space, bandwidth and time than if they had simply given me, the informed paying customer, a convenient option to back up what I wanted.

Eventually I might get around to splitting this into separate archives for separate folders, but I'm probably too pragmatic, with too much other stuff to get on with. 330MB is not too bad a zip file to upload dozens of times over.
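Splitting the archive by folder might look something like this Python sketch: one .zip per top-level folder instead of one big myfiles.zip, so a change in one project should only change that project's archive. The name zip_per_folder and the "<folder>.zip" naming scheme are hypothetical, the extension list and excluded folders are copied from backup_via_carbonite.bat, and loose files sitting directly in the root folder are not covered. Whether this actually saves upload time depends on how Carbonite detects changed files, which this sketch assumes rather than verifies.

```python
# Sketch: one .zip per top-level folder instead of one big myfiles.zip.
# zip_per_folder and the "<folder>.zip" naming are hypothetical; the extension
# list and excluded folders are copied from backup_via_carbonite.bat.
import os
import zipfile

EXTENSIONS = {".a", ".bak", ".cab", ".com", ".dev", ".dll", ".exe",
              ".ico", ".ini", ".jar", ".lib", ".msi", ".o", ".win"}
EXCLUDED_DIRS = {"Downloads", "AppData"}


def zip_per_folder(root, out_dir, extensions=EXTENSIONS,
                   excluded_dirs=EXCLUDED_DIRS):
    """Write out_dir/<folder>.zip for each top-level folder under root that
    contains matching files; return {folder: number_of_files_archived}.
    Loose files sitting directly in root are not archived."""
    os.makedirs(out_dir, exist_ok=True)
    with os.scandir(root) as it:
        folders = sorted(e.name for e in it
                         if e.is_dir() and e.name not in excluded_dirs)
    summary = {}
    for folder in folders:
        matches = []
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, folder)):
            for name in filenames:
                if os.path.splitext(name)[1].lower() in extensions:
                    matches.append(os.path.join(dirpath, name))
        if not matches:
            continue  # nothing worth archiving in this folder
        archive = os.path.join(out_dir, folder + ".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for full in sorted(matches):  # stable order between runs
                zf.write(full, arcname=os.path.relpath(full, root))
        summary[folder] = len(matches)
    return summary
```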