Setting Git to Automatically Back Up My Files

1. The Why's and Wherefore's
2. Finding git files
3. Further Organizing Git: A Test
4. Setting up a new Git folder
5. Setting up the Shell Script
6. Scheduling with Cron
7. Debugging Cron
8. And that's it!

The Why’s and Wherefore’s

If you woke up to find your house burning, what would you grab as you ran out? Okay, yes, your children and significant other, but besides those?

As a data scientist, programmer, and lawyer, obviously I’d grab my computer, the caretaker of my personal information and thoughts, my gateway to education and opportunities.

After all, most of us in the digital age have no valuable papers, stocks, or bonds lying around (hey, that’s just how I imagine life 40 years ago, okay). If you’ve followed Marie Kondo’s advice, your box of mementos might be trim and small too.

Relying mainly on one piece of hardware gives us unprecedented flexibility, and that freedom has turned many of us into (kind-of) digital nomads. Still, we can do better.

What if the computer dies, or gets lost or stolen? What if you’re traveling across borders and don’t want to lug a heavy computer around? And what about the fact that the US government claims the right to copy your hard drive (yes, even if you’re a US citizen) every time you cross the border? Granted, they usually don’t, but still.

My computer isn’t what’s most valuable to me. The information on it is, and that’s what future-proofing is all about: the ability to hop machines as time passes.

So why not give myself even more flexibility by automatically backing up that information? Ultimately I’d like to create a remote development environment, accessible from any netbook, but making my information accessible wherever I am, and safe from local loss, is a great start.

With this system in place, I guess if my house burns down, my hands will be empty? What about my stash of chocolate chip cookies??

Whatever, let’s get started.

Finding git files

First things first: what do I even have lying around on my computer? In my shell, I go to my Documents folder and enter find . -name '.git' -type d, which lists every .git folder (and so every Git repo) underneath it.

This command delivers a long list of Git repos, many more than I thought I had! Happily, I only see two (mostly empty) git folders nested inside other repos, whew! I delete those two.
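If you’d rather not eyeball that list for nested repos, here’s a rough POSIX sh sketch that runs the same find and flags any repo whose parent folder already belongs to another repository. The list_repos name is my own invention, and it assumes the folder you start from isn’t itself inside a Git repo yet (otherwise every repo would show up as nested).

```shell
# list_repos is a hypothetical helper (POSIX sh, so it also works in zsh).
# It prints every Git repo under a folder and flags the nested ones.
# Assumption: the starting folder is not itself inside a Git repo.
list_repos() {
  # -prune stops find from descending into each .git folder it matches.
  find "${1:-.}" -name '.git' -type d -prune | while IFS= read -r gitdir; do
    repo=$(dirname "$gitdir")
    # If the repo's parent folder belongs to a work tree, the repo is nested.
    if [ "$(git -C "$repo/.." rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]; then
      echo "nested: $repo"
    else
      echo "repo:   $repo"
    fi
  done
}

# Example: list_repos ~/Documents
```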

Further Organizing Git: A Test

However, I kind of have a problem. I have a lot of git-tracked projects already underneath my Documents folder, and in the root of my Documents folder I have files I’d like to track.

As a result, I have two options. I can remove all the git folders at the “leaves” of my directory structure, and then make a parent git folder with submodules, under Documents. Or, I can stray into forbidden territory and try to nest regular git folders in a safe way.

Why is nesting git folders a bad idea? Because the outer repo also tries to track the inner folder’s files, which can cause issues. Git submodules are certainly the way to go if possible, and I’d definitely use them if starting from scratch.

For instance, this Stack Overflow thread warns that git clean -dfx in the outer repo will remove the nested repo altogether! I don’t plan on running that command, but let’s try it out, just in case.

To test, I mkdir gittest and cd into it, then create a folder called subgitfolder inside it. I run git init in both. In gittest, I create a .gitignore containing only */*. This tells git to ignore the contents of every subdirectory, like the subgitfolder I just made, theoretically preventing them from being cleaned.

In both folders, I create test.txt, then git add and git commit it. Now comes the real test: in gittest, I run git clean -f, which deletes untracked files, and then git clean -dfx, which also deletes untracked directories and ignored files.
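Here’s the same experiment as a small POSIX sh function, in case you want to replay it. The function name, the scratch folder it takes as an argument, and the throwaway commit identity are all assumptions I’ve added so it runs on a fresh machine; I’d only point it at an empty temporary folder.

```shell
# nested_clean_test is a hypothetical helper that replays the experiment in
# the scratch folder given as $1. The commit identity is a placeholder.
nested_clean_test() {
  outer="$1/gittest"
  inner="$outer/subgitfolder"
  mkdir -p "$inner"
  git -C "$outer" init -q
  git -C "$inner" init -q
  # Ignore the contents of every subdirectory, including subgitfolder.
  printf '%s\n' '*/*' > "$outer/.gitignore"
  echo "outer" > "$outer/test.txt"
  echo "inner" > "$inner/test.txt"
  git -C "$outer" add .gitignore test.txt
  git -C "$outer" -c user.name=Test -c user.email=test@example.com commit -q -m "Outer test"
  git -C "$inner" add test.txt
  git -C "$inner" -c user.name=Test -c user.email=test@example.com commit -q -m "Inner test"
  # The two cleans from the experiment, both run in the outer repo.
  git -C "$outer" clean -f
  git -C "$outer" clean -dfx
}

# Example: nested_clean_test "$(mktemp -d)"
```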

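Ah, good: the subgitfolder and its file are still there. So I’m confident I can keep nested Git repos safe from the outer repo, even if this isn’t best practice, and track files in Documents without moving them or my existing Git repositories. One caveat on why it worked: -x makes git clean disregard .gitignore, so what actually protected subgitfolder is that, per the git clean documentation, Git won’t delete an untracked nested repository unless -f is given twice (git clean -dffx).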

Setting up a new Git folder

All right, I git init in my Documents folder and set up the Git remote. Bitbucket has free private repos, and GitHub now offers them for free too.
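In commands, that step looks roughly like this. init_backup_repo is a hypothetical helper, and the remote URL is a placeholder for whatever SSH URL Bitbucket or GitHub shows you for your new repo.

```shell
# init_backup_repo is a hypothetical helper.
# $1 = folder to back up, $2 = remote URL (placeholder; use the one your host shows).
init_backup_repo() {
  git -C "$1" init -q || return 1
  # Add "origin", or repoint it if the folder already has one.
  git -C "$1" remote add origin "$2" 2>/dev/null ||
    git -C "$1" remote set-url origin "$2"
}

# Example (hypothetical URL):
#   init_backup_repo ~/Documents git@bitbucket.org:yourname/documents.git
# After the first commit, `git push -u origin master` links the branch to the remote.
```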

Next, I can either set up .gitignore as explained here, like this:

# Ignore everything at the top level of Documents.
/*

# But not this file, or the .gitignore itself.
!/To-Do.org
!/.gitignore

# And not these directories, including everything inside them.
!/Writing/
!/Courses/
# Files moved around inside these directories stay tracked,
# because /* only matches entries at the top level.

Or I can set up my .gitignore the other way around, like this:

# Ignore this file.
Boring-File

# Ignore this directory
BoringDirectory/*
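Whichever style you pick, Git can tell you how it will treat a given path: git check-ignore exits with status 0 when a path is ignored, and git check-ignore -v prints the .gitignore line responsible. The is_ignored and show_ignore_status helpers below are hypothetical conveniences around it.

```shell
# Hypothetical helper: succeeds (exit status 0) if Git would ignore the path.
# $1 = repository folder, $2 = path relative to that folder.
is_ignored() {
  git -C "$1" check-ignore -q -- "$2"
}

# Hypothetical helper: print "ignored" or "not ignored" for each path after $1.
show_ignore_status() {
  repo=$1
  shift
  for p in "$@"; do
    if is_ignored "$repo" "$p"; then
      echo "ignored: $p"
    else
      echo "not ignored: $p"
    fi
  done
}

# Example: show_ignore_status ~/Documents To-Do.org Writing/draft.txt
# To see which rule matched: git -C ~/Documents check-ignore -v Writing/draft.txt
```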

Either way, I make sure that this Git repo will ignore all folders with Git repos inside them.
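One way to make sure of that is to let a script write the ignore lines for you. This sketch (ignore_nested_repos is a hypothetical name) finds every .git below the folder you give it, other than the folder’s own, and appends an anchored /path/to/repo/ entry to that folder’s .gitignore unless the exact line is already there. It doesn’t escape glob characters, so folder names containing *, ? or [ would need hand-editing.

```shell
# Hypothetical helper: add every Git repo nested under $1 to $1/.gitignore,
# one anchored "/path/to/repo/" line each, skipping lines already present.
ignore_nested_repos() {
  docs=${1%/}
  # Match .git folders and .git files (used by worktrees and submodules).
  find "$docs" -name .git -prune | while IFS= read -r gitpath; do
    # Skip the Documents repo's own .git.
    [ "$gitpath" = "$docs/.git" ] && continue
    rel=${gitpath#"$docs"/}     # e.g. Projects/site/.git
    entry="/${rel%/.git}/"      # e.g. /Projects/site/
    grep -qxF -- "$entry" "$docs/.gitignore" 2>/dev/null ||
      printf '%s\n' "$entry" >> "$docs/.gitignore"
  done
}

# Example: ignore_nested_repos ~/Documents
# Run it before the first `git add .`, and again after cloning new repos into Documents.
```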

Setting up the Shell Script

Now that we’re all set up with Git, we need to write a shell script, and then set this script to run automatically.

We could use the same command, find . -name '.git' -type d, to find all the .git folders, collect their parent directories in an array, and then cd into each one and run git add, git commit, and git push.
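For anyone whose repos are scattered rather than centralized, a sketch of that loop might look like the following. backup_all_repos is hypothetical; it uses git -C instead of cd, so no array is needed, and it assumes each repo’s remote is named origin and that pushing the current branch there is what you want.

```shell
# Hypothetical helper: commit and push every Git repo found under $1.
# Assumes each repo's remote is "origin" and pushes the current branch (HEAD).
backup_all_repos() {
  find "${1:-.}" -name '.git' -type d -prune | while IFS= read -r gitdir; do
    repo=$(dirname "$gitdir")
    git -C "$repo" add -A
    # Commit only when something is actually staged.
    git -C "$repo" diff --cached --quiet ||
      git -C "$repo" commit -q -m "Automatic backup"
    # Push only repos that have at least one commit and an "origin" remote.
    if git -C "$repo" rev-parse -q --verify HEAD >/dev/null &&
       git -C "$repo" remote get-url origin >/dev/null 2>&1; then
      git -C "$repo" push -q origin HEAD
    fi
  done
}

# Example: backup_all_repos ~/Documents
```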

As I’ve centralized the files I want to track automatically, it’s simpler for me to do this instead:

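#!/bin/zsh

# Log a timestamp, so each run is easy to spot in the log file.
date
# Stop here if the folder is missing, rather than running Git somewhere else.
cd /path/to/myDocuments/ || exit 1
pwd
# Stage new, modified, and deleted files.
git add .
# Only commit when something actually changed (git commit fails otherwise).
git diff --cached --quiet || git commit -m "Automatic backup"
git push origin master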

In the future, I can use this guide or this guide as references if I want to upgrade my shell script.

Finally, we need to make this script executable, i.e. give our computer permission to run it. You can google this, or just run something like chmod 0700 backupGit.sh, which gives you, the owner, permission to read, write, and run it, and gives everyone else no access at all.
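If you’d like to double-check the permissions, here’s a tiny sketch; make_private_executable is just a hypothetical wrapper around chmod and the shell’s -x test.

```shell
# make_private_executable is a hypothetical wrapper around chmod.
make_private_executable() {
  # 0700 = read, write and execute for the owner; no access for group or others.
  chmod 0700 "$1" || return 1
  # Succeed only if the current user can now execute the file.
  [ -x "$1" ]
}

# Example: make_private_executable /path/to/script/backupGit.sh
# Then run /path/to/script/backupGit.sh once by hand before giving it to cron.
```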

Scheduling with Cron

Cron is a classic Unix utility: a job scheduler that does one thing (running commands on a schedule) and does it well.

To start, run crontab -e on your command line. This opens your crontab file for editing, usually in vi unless your VISUAL or EDITOR environment variable names another editor. In vi, press i to start typing and Esc when you’re done; then type :wq and press Enter to save and exit, or :q! to exit without saving. See this great little page for more.

And as this guide to backing up to Dropbox explains, your crontab entry should look something like: @hourly ID=backupgitrepos nice -n 19 /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1

This entry makes cron run your script every hour (I used @midnight instead, for a daily run). The nice -n 19 part gives it the lowest CPU priority, and >> ... 2>&1 appends both its normal output and its errors to the log file.

If struggling with vi is too much for you, put the crontab line above into a text file, then run crontab <filename> on the command line. Be aware that this replaces your entire existing crontab with the contents of that file.
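Since crontab <filename> swaps in the whole file, adding a line without clobbering existing jobs takes a read, append, write-back round trip: crontab -l prints the current table, and crontab - installs one from standard input. with_cron_line is a hypothetical helper that does the append step and skips the line if it’s already there.

```shell
# Hypothetical helper: read a crontab on stdin, print it back unchanged,
# and append $1 unless an identical line is already present.
with_cron_line() {
  line=$1
  existing=$(cat)
  # Re-emit the current entries (if any).
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # Append the new entry only if it isn't there yet.
  printf '%s\n' "$existing" | grep -qxF -- "$line" || printf '%s\n' "$line"
}

# Example (not run here; single quotes keep >> and 2>&1 literal):
#   crontab -l 2>/dev/null |
#     with_cron_line '@midnight ID=backupgitrepos nice -n 19 /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1' |
#     crontab -
```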

Debugging Cron

You can check if Cron has your job with crontab -l, which will show all tasks.

You can also temporarily use something like */1 * * * * ID=backupgitrepos nice -n 19 /path/to/script/backupGit.sh >> /path/to/script/backupLog.txt 2>&1, which runs the task every minute. Then check your log file to see whether the job ran successfully.
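A common reason a script works by hand but fails under cron is that cron runs jobs with a much sparser environment: typically a short PATH and none of your shell’s SSH agent variables. This sketch (run_like_cron is a hypothetical name) approximates that with env -i. The /usr/bin:/bin PATH is an assumption, a common cron default, so check what your cron actually uses; anything installed outside those folders won’t be found.

```shell
# Hypothetical helper: run a command with a stripped-down, cron-like environment.
# Assumption: your cron uses PATH=/usr/bin:/bin; adjust if yours differs.
run_like_cron() {
  env -i HOME="$HOME" LOGNAME="${LOGNAME:-$(id -un)}" PATH=/usr/bin:/bin "$@"
}

# Example: run_like_cron /path/to/script/backupGit.sh
```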

If your log shows a Permission denied (publickey) error, the script probably can’t reach the SSH keys for your GitHub/Bitbucket repo when cron runs it. If you don’t have SSH keys set up at all, their tutorials are pretty clear.

http://serverfault.com/a/236437 can then guide you through installing Funtoo’s keychain (brew install keychain on OS X) and adding a few lines to your bash/zsh profile and to the backupGit.sh script. That way cron will have the keys as well, and can run the job without you.
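For reference, the backupGit.sh side usually boils down to sourcing the environment file keychain leaves in ~/.keychain. The sketch below assumes that file is named after the output of uname -n with a -sh suffix; run ls ~/.keychain to confirm the name on your machine. load_keychain_env is a hypothetical name.

```shell
# Hypothetical helper: load the SSH agent settings keychain saved for this host.
# Assumption: the file is ~/.keychain/<uname -n>-sh; check `ls ~/.keychain`.
load_keychain_env() {
  kc_file="$HOME/.keychain/$(uname -n)-sh"
  if [ -f "$kc_file" ]; then
    # The file sets and exports SSH_AUTH_SOCK (and usually SSH_AGENT_PID).
    . "$kc_file"
  else
    echo "no keychain file at $kc_file" >&2
    return 1
  fi
}

# Example: call load_keychain_env near the top of backupGit.sh, before git push.
```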

And that’s it!

All important documents will now be safely stored, automatically and securely. Naturally I’ve encrypted private documents as necessary, as we shouldn’t store private information unencrypted, and Org-Mode makes that easy.
