Setting Git to Automatically Back Up My Files
The Whys and Wherefores
If you woke up to find your house burning, what would you grab as you ran out? Okay, yes, your children and significant other, but besides those?
As a data scientist, programmer, and lawyer, obviously I’d grab my computer, the caretaker of my personal information and thoughts, my gateway to education and opportunities.
After all, most of us in the digital age have no valuable papers, stocks, or bonds lying around (hey, that’s just how I imagine life 40 years ago, okay). If you’ve followed Marie Kondo’s advice, your box of mementos might be trim and small too.
Relying mainly on one piece of hardware allows unprecedented amounts of flexibility. That freedom has given rise to all of us (kind-of) digital nomads. Still, we can do better.
What if the computer dies, or gets lost or stolen? What if you’re traveling across borders and don’t want to lug a heavy computer around? And what if you’re aware that the US government claims the right to copy your hard drive (yes, US citizens too) every time you cross the border? Clearly they usually don’t do that, but still.
My computer isn’t what’s most valuable to me. The information on it is, and that’s what future-proofing is all about: the ability to hop machines as time passes.
So why not give myself even more flexibility by automatically backing up that information? Ultimately I’d like to create a remote development environment, accessible from any netbook, but making my information accessible wherever I am, and safe from local loss, is a great start.
With this system in place, I guess if my house burns down, my hands will be empty? What about my stash of chocolate chip cookies??
Whatever, let’s get started.
Finding git files
First things first: what do I even have lying around on my computer? Going to my Documents folder in my shell, I enter:
find . -name '.git' -type d
This command delivers a long list of Git repos, many more than I thought I had! Happily, I only see two (mostly empty) nested git folders, whew! I delete these folders.
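If the list is too long to eyeball, the nested ones can be picked out automatically. Here is a rough sketch of how that might look; the function names list_repos and nested_repos are my own invention, and it assumes no folder names contain newlines.

```shell
# Print the working directory of every Git repo under the given root.
list_repos() {
    # -prune stops find from descending into each .git directory itself.
    find "$1" -name .git -type d -prune | sed 's|/\.git$||' | sort
}

# Print only the repos that sit inside another repo from the same list.
nested_repos() {
    list_repos "$1" | awk '
        { repos[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++)
                for (j = 1; j <= NR; j++)
                    # repos[i] is nested if another repo path plus "/" is its prefix.
                    if (i != j && index(repos[i], repos[j] "/") == 1) {
                        print repos[i]
                        break
                    }
        }'
}
```

From inside Documents, running nested_repos . should print just the repos that live inside another repo, which are the ones worth a second look.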
Further Organizing Git: A Test
However, I kind of have a problem. I have a lot of git-tracked projects already underneath my Documents folder, and in the root of my Documents folder I have files I’d like to track.
As a result, I have two options. I can remove all the git folders at the “leaves” of my directory structure, and then make a parent git folder with submodules, under Documents. Or, I can stray into forbidden territory and try to nest regular git folders in a safe way.
Why is nesting git folders a bad idea? This can cause issues, as the outer folder is also trying to track the inner folder. Git submodules are certainly the way to go if possible, and I’d definitely use them if starting from scratch.
For instance, this Stack Overflow thread warns that git clean -dfx in the outer repo will remove the nested repo altogether! I don’t plan on running that command, but let’s try it out, just in case.
To test, I mkdir gittest and cd into it, then create a subgitfolder. I run git init in both. In gittest, I create .gitignore, containing only: */*. This tells git to ignore everything inside subdirectories, like the subgitfolder I just made, theoretically preventing them from being cleaned.
In both folders, I create test.txt. I git add and git commit in both folders, and now comes the real test. In gittest, I run git clean -f, which removes untracked files, and also git clean -dfx.
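Ah, good: the subgitfolder and its file are still there. (Strictly speaking, -x tells git clean to disregard .gitignore, so the folder survives because git clean refuses to delete a nested repo unless -f is given twice.) As a result, I have confidence that I can make Git ignore nested Git repos safely, even if this isn’t best practice. And so I can track files in Documents, without moving them or my existing Git repositories.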
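For anyone who wants to repeat the experiment without typing it all out, here is a minimal sketch that rebuilds it in a throwaway directory. The function name nested_clean_test and the test identity are made up for illustration, and it assumes git and mktemp are installed.

```shell
# Rebuild the gittest experiment in a temporary directory and report
# whether the nested repo survives both clean commands.
nested_clean_test() (
    set -e
    work=$(mktemp -d)
    mkdir -p "$work/gittest/subgitfolder"
    cd "$work/gittest"

    # Throwaway identity so the commits work on any machine.
    export GIT_AUTHOR_NAME=test GIT_AUTHOR_EMAIL=test@example.com
    export GIT_COMMITTER_NAME=test GIT_COMMITTER_EMAIL=test@example.com

    git init -q .
    git init -q subgitfolder
    printf '*/*\n' > .gitignore
    echo outer > test.txt
    echo inner > subgitfolder/test.txt

    git add .gitignore test.txt
    git commit -q -m "outer commit"
    (cd subgitfolder && git add test.txt && git commit -q -m "inner commit")

    # The two commands under test, run from the outer repo.
    git clean -f >/dev/null 2>&1
    git clean -dfx >/dev/null 2>&1

    if [ -f subgitfolder/test.txt ] && [ -d subgitfolder/.git ]; then
        echo "nested repo survived"
    else
        echo "nested repo was removed"
    fi
    rm -rf "$work"
)
```

Calling nested_clean_test prints one line with the verdict and cleans up after itself.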
Setting up a new Git folder
All right, I git init in my Documents folder, and set up the Git remote. Bitbucket and GitHub both offer free private repos.
Next, I can either set up .gitignore as explained here, like this:
# Ignore everything at the top level of Documents.
/*
# But not this file.
!/To-Do.org
# And not these directories, or anything inside them.
!/Writing/
!/Courses/
# And keep tracking .gitignore itself.
!/.gitignore
Or I can set up my .gitignore the other way around, like this:
# Ignore this file.
Boring-File.
# Ignore this directory
BoringDirectory/*
Either way, I make sure that this Git repo will ignore all folders with Git repos inside them.
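It is easy to get ignore rules subtly wrong, so it can help to ask Git directly which paths it would skip. The sketch below builds a scratch repo with a whitelist-style .gitignore (anchored with a leading slash, which is my own variant and an assumption) and checks a few sample paths with git check-ignore. The file and folder names other than To-Do.org, Writing, and Courses are hypothetical; SomeRepo stands in for a folder that holds its own Git repo.

```shell
# Build a scratch repo with a whitelist-style .gitignore and report,
# for a handful of sample paths, whether Git would ignore them.
check_whitelist() (
    set -e
    work=$(mktemp -d)
    cd "$work"
    git init -q .

    # Sample layout; the paths must exist for directory rules to apply.
    mkdir -p Writing Courses SomeRepo
    touch To-Do.org Random.txt Writing/draft.txt Courses/notes.txt SomeRepo/code.py

    cat > .gitignore <<'EOF'
/*
!/.gitignore
!/To-Do.org
!/Writing/
!/Courses/
EOF

    for path in To-Do.org Writing/draft.txt Courses/notes.txt SomeRepo/code.py Random.txt; do
        # check-ignore exits 0 when the path is ignored, 1 when it is not.
        if git check-ignore -q "$path"; then
            echo "ignored: $path"
        else
            echo "not ignored: $path"
        fi
    done
    rm -rf "$work"
)
```

Running check_whitelist prints one line per path. The same git check-ignore call works in the real Documents repo for any path I am unsure about.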
Setting up the Shell Script
Now that we’re all set up with Git, we need to write a shell script, and then set this script to run automatically.
We could use this command, find . -name '.git' -type d, to find all the .git folders, then add all their parent directories to an array, and for each one cd into the directory and run git commit and git push.
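That multi-repo approach might look roughly like the sketch below. It is not what I ended up using. The function name backup_all_repos is hypothetical, the loop streams the find output instead of building an array (plain sh has no arrays), and it assumes each repo's remote, if any, is called origin.

```shell
# Commit and push every Git repo found under the given root directory.
backup_all_repos() {
    root="$1"
    find "$root" -name .git -type d -prune | while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        (
            cd "$repo" || exit 1
            # Stage new, changed, and deleted files.
            git add -A
            # Commit only when something is actually staged.
            if ! git diff --cached --quiet; then
                git commit -q -m "Automatic backup"
            fi
            # Push only when a remote named origin exists.
            if git remote | grep -qx origin; then
                git push -q origin HEAD
            fi
        ) </dev/null   # keep git from swallowing the list of repos on stdin
    done
}
```

A call such as backup_all_repos /path/to/myDocuments would then walk every repo in one go. Skipping the commit when nothing is staged keeps the log free of "nothing to commit" noise.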
As I’ve centralized the files I want to track automatically, it’s simpler for me to do this instead:
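#!/bin/zsh
# Log a timestamp so each run is easy to find in the log file.
date
# Stop here if the folder is missing, so git never runs in the wrong place.
cd /path/to/myDocuments/ || exit 1
pwd
# Stage everything that changed, then commit and push it.
git add .
git commit -m "Automatic backup"
git push origin master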
In the future, I can use this guide or this guide as references if I want to upgrade my shell script.
Finally, we need to make this script executable: we need to give our computer permission to run it. You can google this, or run something like chmod 0700 backupGit.sh, which gives you, as owner, permission to read, write, and run it, and gives nobody else any access.
Scheduling with Cron
Cron is a Unix utility for running commands on a schedule. Like most Unix tools, it does one thing and does it well.
To start using it, run crontab -e on your command line. This will open your crontab file for editing, in vi by default. Type : (the colon) to get a command prompt at the bottom of the screen, then save and exit with wq or x, or quit without saving with q!.
See this great little page for more.
And as this guide to backing up to Dropbox explains, your crontab job should look something like:
@hourly ID=backupgitrepos nice -n 19 /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1
This line makes Cron run your task every hour (although I used @midnight instead) and appends the output, errors included, to a text file.
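If struggling with vim is too much for you, put the little crontab script above into a txt file, then run crontab <filename> on the command line. Be aware that this replaces your whole crontab, so the file needs to include any jobs you already have.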
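To add the job without touching the editor and without wiping out existing jobs, one option is to pipe the current crontab through a small filter. This is only a sketch: add_cron_line is a name I made up, and it relies on crontab - reading the new table from standard input, which the common cron implementations support.

```shell
# Read a crontab on stdin and print it with the given job line added,
# unless an identical line is already there.
add_cron_line() {
    line="$1"
    existing=$(cat)
    if printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
        # Already present: pass the crontab through unchanged.
        printf '%s\n' "$existing"
    elif [ -z "$existing" ]; then
        # Empty crontab: the new job is the whole table.
        printf '%s\n' "$line"
    else
        printf '%s\n%s\n' "$existing" "$line"
    fi
}

# Usage (commented out because it really does change your crontab):
# crontab -l 2>/dev/null \
#   | add_cron_line '@hourly /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1' \
#   | crontab -
```

Because the filter checks for an identical line first, running it twice does not schedule the job twice.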
Debugging Cron
You can check if Cron has your job with crontab -l, which will show all tasks.
You can also put something like
*/1 * * * * ID=backupgitrepos nice -n 19 /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1
in your crontab, which will run the task every minute. You can then check your log file to see if Cron is successful.
If you get the Permission denied (publickey) error, you may need to give the shell the SSH keys for your GitHub/Bitbucket repo. If you don’t have those set up, their tutorials are pretty clear.
And then http://serverfault.com/a/236437 can guide you through installing Funtoo’s keychain (‘brew install keychain’ on OSX) and then adding a few lines to your bash/zsh profiles, and the backupGit.sh script. That way Cron will have the keys as well, and can run the job without you.
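As a rough idea of what those extra lines do: keychain saves the running ssh-agent's details to a small file, and the backup script sources that file so Cron's bare environment can find the agent. The sketch below assumes keychain's default file location of ~/.keychain/<hostname>-sh as I understand it; check your own ~/.keychain folder, since the exact name may differ. The function name load_keychain_env is hypothetical.

```shell
# Load the ssh-agent settings that keychain saved, if the file exists.
# Assumption: keychain wrote them to ~/.keychain/<hostname>-sh.
load_keychain_env() {
    envfile="$HOME/.keychain/$(uname -n)-sh"
    if [ -r "$envfile" ]; then
        # The file sets and exports SSH_AUTH_SOCK (and SSH_AGENT_PID).
        . "$envfile"
    else
        echo "no keychain file at $envfile" >&2
        return 1
    fi
}
```

Calling load_keychain_env near the top of backupGit.sh, before git push, would let the push reuse the keys you unlocked at login.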
And that’s it!
All important documents will now be safely stored, automatically and securely. Naturally I’ve encrypted private documents as necessary, as we shouldn’t store private information unencrypted, and Org-Mode makes that easy.