Disclaimer: It's hard to imagine a scenario where it would make sense to do this in real life, vs using git or some other version control program.

This long-winded tutorial is purely for fun and education. By implementing a rough file tracker, we can get a pretty comprehensive look at what can be done with bash shell scripting on Unix-like systems. If you're new to bash and Unix, and you learn by doing, read on...

We interact with computers and software in many different ways.  We click, double click, drag, drop, slide, type, talk, etc...

For building software or systems administration, it's essential that you are comfortable interacting with a computer through the command line interface, a powerful tool, also referred to as “the shell”.

A shell is a special kind of program: one that takes commands typed on a user's keyboard and passes them to the operating system, which then executes them. The commands are themselves programs. By doing just this, and by leveraging the ingenuity that went into the design and conventions of popular shell programs, we can support some powerful use cases.

While we won't go into all the depths of shell wizardry that can be obtained, we will cover the most common shell usage needed to manage, deploy and build applications on Unix-like operating systems.  

Because there are so many variants of these operating systems, we’ll refer to them collectively as “*nix”.

*nix systems are among the most popular in the world. They lie at the heart of our Macs, iPhones, iPads and Chrome devices, and for good reason. *nix OSes, which include Linux, Unix and BSD-based systems, are built on a popular concept in systems development: small pieces of software that do one thing really well, combined with the power to chain them together easily to accomplish larger tasks. This is a theme that will frequently pop up in development, and you could literally spend a lifetime figuring out how to better align your work with this one simple concept, i.e. interoperability.

‌‌*nix programs are prime examples of this. Through the shell, you can run programs and send the output to more programs, and so on and so on. And with the bash shell, you have a suite of tools to control flows, run conditional steps, loop through items, store and transform program output as variables, and much more.

You can think of each program as having three separate channels for data to flow through: STDIN (Standard Input), STDOUT (Standard Output), and STDERR (Standard Error). This is true of all programs running on *nix. All commands can be described as doing the following: data is sent in through STDIN, the program does some stuff, and data comes out through STDOUT, or through STDERR if things didn't go so well.
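To make the three channels a bit more concrete, here is a small sketch. It assumes a throwaway directory from mktemp and a made-up path, /no/such/dir, that shouldn't exist on your machine. The ls, wc and echo commands it uses are all covered later in this tutorial.

```shell
# Work in a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)

# ls writes its listing to STDOUT and its complaints to STDERR.
# ">" sends STDOUT to one file, "2>" sends STDERR to another.
# ls exits with an error here because one path is missing; "|| true" ignores that.
ls / /no/such/dir > "$scratch/out.txt" 2> "$scratch/err.txt" || true

# STDIN works the other way: "<" feeds a file into a program's input.
out_lines=$(wc -l < "$scratch/out.txt")
err_lines=$(wc -l < "$scratch/err.txt")

echo "lines that went to STDOUT: $out_lines"
echo "lines that went to STDERR: $err_lines"
```

The listing of "/" lands in one file and the error about the missing directory lands in the other, even though both would normally show up mixed together on your screen.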

Tutorial - A Shell Based Version Control System‌‌

Let’s get started!  We are going to write a really simple version control program.

Version control programs keep track of all the edits you make to files. Usually they're used to manage an application code base but you can use them to track just about anything.

As mentioned, we're creating a version control program as a fun way to explore the command line via bash.  If you do need version control you'd be best served by one of the popular open source tools, git, etc...  And if you do want to build your own, this would be a task for a programming environment that's more suited for full-blown application development.

While there are a few popular shell programs (sh, bash, csh, tcsh, ksh), we are going to use bash. Bash has become very popular with developers and system admins alike. In fact, bash itself is considered a programming environment, and bash scripts can facilitate all types of things, from making sure your system prompts are configured the way you'd like when you log in, to deploying an enterprise app to a cluster of many different servers.

Open up your terminal application and enter your first command.

enter: pwd

What just happened?  

We entered the name of a program called pwd, the shell took the keyboard input on STDIN, found the program, ran it and outputted its results to STDOUT, printing them to the screen.

pwd stands for Print Working Directory. In *nix, what we may have previously called folders are called directories. They are simply buckets for files. Directories can have directories inside them, and so on and so on. pwd is extremely useful and you'll use it a lot. Particularly with *nix, if you forget where you are when you run certain programs, you can mess things up quite a bit and have a bad afternoon. So run pwd often, just to be safe :)

OK, back to directories.  /Users/rpastore is my home directory.  Whenever I log into my machine that's where it will start me off.

When looking at a path, a "/" indicates the level in the directory hierarchy, where on the left side of "/" is one higher than what's on the right, and so on and so on.

Let's go a few levels up and look around!

cd ../../
pwd


cd is another very popular command, which changes our directory. The input you give to cd is a path. You can give it a complete path, such as, /home/marvin/projects/Q_36_Immodium_Space_Modulator, or a relative path like projects or ../logs.  In our case, we gave it two of "../", each representing one level to go up.

‌‌So two levels up and we are at the highest level on our machine. The path here is simply "/".  Let’s look around some more.

ls

This is the top level organizational structure of a *nix file system. You should acquaint yourself with what typically goes in here in detail, but here's a quick overview of some of the most important top-level directories that are common across most *nix distributions.

  • bin - primary system executables. These are the core set of programs that allow most of the things on the system to happen, coincidentally this is where ls and pwd live!
  • etc - this is where many different configuration files live.
  • var - semi-temporary files, including system logs, etc...
  • tmp - temporary files of all types, periodically cleaned out.‌‌
  • boot - files necessary to boot the OS
  • home - home directories of all users in the system except for the root user (the all-powerful administrator).
  • sbin - more executables, mainly stuff that is geared toward system admin.
  • usr - various files that are for system-wide use, man pages, libraries, more executables, etc..
  • dev - device files. *nix treats everything like a file.  Mounting particular devices, like cdrom, results in files in this directory for the OS to interact with (pipe data to, read data from, etc…).

OK, let's go back to where we came from, using a shortcut:

cd -


cd with a "-" after it will take you to the last directory you were in. Additionally, cd without any input, from wherever you are, will take you to your home directory.

‌‌For our versioning system, let's create some files and mess around with them. First we need a place to keep them all so we don't make a mess out of our home directory.  Let's create a new directory and cd into it.

mkdir versioning
cd versioning

mkdir does what you'd think, makes a new directory.  You can even give it multiple names as input and it will make all of them. ie: "mkdir music pics movies".

Now that we're inside, let's make some files.  First we'll create a basic text file. There are a few different text editor programs that are available from the shell, the most common, and my personal favorite, is vi.  There is a bit to learn there so we'll use an easier one for the sake of staying on track.‌‌

To create our files we'll use nano, which is not as powerful as vi but very approachable and easy to use for quick editing.  It's also available in most systems so you can rely on it being in just about any environment you'll be working in.  Just make sure nobody sees you using it or you'll lose your hacker creds. :)

nano some_text


Here you can type freely.  Enter some text, it can be anything.  Along the bottom of the screen you'll see the program's menu, where options like ^X mean press the Ctrl key plus “x” to exit.  

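Let's exit the text file with Ctrl + x. Since we typed something, nano will ask whether to save the changes: press y, then Enter to accept the file name.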

ls

‌‌Now we see our new text file. Let's try ls with some more options.

ls -al

This is another very common task, so much so that you'll probably use ls -al more than you do ls by itself. The "-al" passes two command line options, a and l, to the program. "a" means include any files whose names start with ".", of which there can be a lot.

‌‌Starting filenames with “.” is a common way for other programs to store data in files without cluttering up directories, they are hidden by default.  We'll actually use this in our versioning program.

The "l" means to print the long format, which gives us various details on each file. It also prints a "total" line at the very top, which is the number of disk blocks used by the listed files (not bytes).

Perhaps the most awesome *nix program available is man. Typing man followed by the name of a program gives you its man pages, which document how the program is used. You can spend lots of time just looking around and seeing what’s in there. It's a bit of a rabbit hole but fun nonetheless.

OK, now we have a directory with a file with some text inside of it. We want to keep track of changes to that file so we can look back at how the file looked at a given time or how it changed over time.

What will we need to do?

Take some time to think about it and write down your steps.  It doesn't need to be technical, just conceptual. What things would need to happen?  Feel free to assume everything we'll need will be at our disposal somehow from the vast greatness of *nix systems and bash scripting capabilities.

Here's what I came up with:

  • Periodically check the file contents
  • If a file's contents are different from the last recorded revision, create a new snapshot of the file.
  • If a file's contents are not different, do nothing.
  • Store all file snapshots somewhere so they can be accessed later for future use.

The “periodic checking” part is something best left towards the end.  Assuming this can be easily done via a few ways on *nix systems, it’s better we get all the other parts working separately first, since they're more unknown. But also,  this will be easier to test as we’ll be working with isolated pieces.

Once they all work, we’ll just need to wrap them in some functionality to perform the “periodic” part.

On to our first step in the problem, checking revisions. Given that we only have one file with text, we don't really have any revisions with which to compare yet.   It would be tempting to muddy our code with functionality to check for this edge case but let's do something else.  

Going back to the *nix principle of many small pieces of software each doing a great job at one thing: whatever that one thing is, you should incorporate some type of barrier to entry. If you don't, your code gets muddled with the handling of edge cases, which in short means more opportunities to create bugs.

Initialization Script

So, let's create our barrier to entry (the input), an initialization script!‌‌

We have a directory where we want to start some type of version control. Let's create a solution where a user has to run one script to initialize the directory for all future use.

We'll need to decide how we want to store versions.  What we'll need to store are the contents of the version and the date and time it was saved.  The OS will actually do a lot of this for us.  All files have their date and time of last save/update stored, we'll just need to retrieve that info.

Also, we'll be putting many versions of the same file in a single directory so we'll need a way to make the file names unique.  Appending a system-generated timestamp to the filename should do it, something like SampleFile__1377272601.  *nix timestamps are integer representations of the number of seconds since 1970-01-01, what is called the epoch.

Aside from being a character from the Matrix movie, an epoch refers to the origin point of a chronological period. In early versions of Unix there were system limitations that only allowed representations of time spans of up to 829 days. For the sake of getting more time they chose an epoch that was relatively recent, 1971-01-01. Shortly after, though, new advances allowed for 136-year time representations. With the new allowance of time, the epoch was rounded down for neatness' sake to 1970-01-01.

Ok, back to our init script.

date +%s


This is the number of seconds since the epoch. What we'll need to do is: for every file in our directory, create a copy of it with a timestamp in the name. We don't want to muddy up our directory with all these copies, so let's create a hidden directory to store them. If you’ll remember, ls -al let us see files that begin with ".". These aren't displayed by a plain ls, so they're considered hidden files. The same goes for directory names.
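Here's a quick sketch of the hidden-directory idea. It runs in a throwaway directory from mktemp, and visible_file is a made-up name used only for illustration.

```shell
# Throwaway playground so we don't touch the tutorial directory.
scratch=$(mktemp -d)
cd "$scratch"

# One hidden directory and one ordinary file (touch creates an empty file).
mkdir .versions
touch visible_file

# A plain ls skips names that start with "."; ls -a includes them.
plain=$(ls)
all=$(ls -a)

echo "ls shows:"
echo "$plain"
echo "ls -a shows:"
echo "$all"
```

The plain listing only mentions visible_file, while the -a listing also shows .versions (plus "." and "..", which stand for the current and parent directories).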

In *nix, directories conceptually store files but in reality directories are files themselves.  Back to the notion of chain-ability, *nix treats just about everything like a file.  For instance you can write data to a socket which is then sent out over a network in the same way you would write data to a file that is stored on disk.  A directory is just a file.  You can even try and open a directory in nano to see what's in there.

First we'll need a new file wherein we can put our script.  Create a new file called initialize.sh and inside it put the command to create a new directory called, ".versions".  Save it and exit.  Hint: mkdir .versions

This is our script.  Let's see if it runs.

# don't worry if this doesn't work yet :)
./initialize.sh

Notice here we prefix “./” to the script's file name. *nix has a concept of PATH, where PATH is a list of directories that contain executable files (AKA programs, scripts, applications, executables, etc.). This allows us to run things like ls or mkdir without actually supplying the full path to those programs. To let bash know that we are trying to run the local initialize.sh file and not looking for an initialize.sh file somewhere in our PATH, we use “./”.
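If you're curious what your PATH looks like, here's a small sketch. It uses tr to swap colons for newlines and command -v to ask the shell where it would find a program. The exact directories printed will differ from machine to machine.

```shell
# PATH is a colon-separated list of directories; print one per line.
echo "$PATH" | tr ':' '\n'

# command -v reports the file the shell would run for a given name.
ls_location=$(command -v ls)
echo "ls lives at: $ls_location"
```

Since our versioning directory isn't in that list, typing initialize.sh on its own wouldn't find our script, which is why the "./" is needed.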

‌‌Also you may notice, this didn’t run, why not?

*nix has a way of declaring which files can be run as programs, called "executables". *nix applies permissions to all files, and associates those permissions with a concept of users and groups. This is pretty powerful and allows many users on a single system to operate without messing up important system files or each other's stuff. We'll get more into this in a bit, but essentially you can grant types of permissions to a file, which include permission to write, read and execute. And you can grant these permissions to three distinct categories of people: file owner, file group owner and everyone else.

An executable is just a file with the executable permission applied to it.  So let's give our initialization script that permission, we'll use the chmod (man chmod to read more about it) program, which changes the modes/permission for a given file.

Here we simply add the executable permission by passing "+x" and the filename as arguments. Before you run it, check out the current permissions so you can see how they change.

ls -al
chmod +x initialize.sh
ls -al

Now let's try and run our script again.‌  This time it should work!‌

./initialize.sh
ls -al

If it worked (it should), we now have a ".versions" directory!

The next thing we’ll have to do is create the first version of all of our files.  We have a small problem though, we already created our .versions directory so if we run our script again during development/testing it will try to create the same directory, we'll see output like mkdir: .versions: File exists.  Not the end of the world but it's a bit sloppy.  What are our choices?

We can:

1: check to see if directory exists before we create it

I don't really like this one because I don't think a user should run initialize more than once. Perhaps later we can turn this script into something a bit more multi-purposed that’s run like "./versioning --action=initialize". But let's not get ahead of ourselves.

2: Add some logic to the script that can prompt the user to see if they want to wipe out the versions directory and start with clean slate.

I like this because it's a nice feature, but maybe it's also not the right time. Let's get the basics working first and remember this for later on.

3: Not worry about it for now!

I like this one! It only affects development and we are the only ones working on it.  So as long as we can deal with seeing the “file exists” message it’s OK.  If we had teammates who would be confused by the message, it would be a different story.
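For reference, here's roughly what option 1 could look like if we change our minds later. This is only a sketch in a throwaway directory, and it borrows the if syntax that gets explained further along in this tutorial.

```shell
# Throwaway playground.
scratch=$(mktemp -d)
cd "$scratch"

# -d is true when the path exists and is a directory.
if [ -d .versions ]; then
    echo ".versions already exists, leaving it alone"
else
    mkdir .versions
    echo "created .versions"
fi

# A shorter route: mkdir -p stays quiet if the directory is already there.
mkdir -p .versions
```

Either form avoids the "File exists" message, at the cost of hiding the fact that initialize was run twice.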

Let's go with option 3 and get back to the task at hand.  We now need a way to create a copy of all of the files in our directory into the .versions directory.  Let's again break the problem into steps:

  1. We need a list of all files in the directory.
  2. We need to consume our list in a way that gets us strictly the filename so we can pipe (we'll learn how to do that don't worry) that into our next step.
  3. For each file, create a copy in the .versions directory.

One thing I find useful in all types of programming is to be able to test out small pieces outside the context of your program. Let's get all the above concepts working alone in the shell prompt before we combine them in our initialize.sh script.

For the first part, we know how to get a list of files in a directory with ls -al, but the shell has a much simpler mechanism for operating on files: wildcards. For example, we can run a command on all files in a directory with the wildcard character, *. Note that * skips names that start with ".", so hidden files and directories are left out. Let's use the "file" command as an example. file is a great utility to figure out what's in one or more files at a quick glance.


file *

‌‌You can also use wildcards to match other patterns.  For example, if you want to only operate on the html files within a directory, you could do so with "*.html".  This is pretty powerful.  A step further would be to use regular expressions, which are a much more comprehensive pattern matching tool, but definitely overkill here.
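A small sketch of wildcard matching, using made-up file names in a throwaway directory. It uses echo, which simply prints whatever the shell hands it, so we can see exactly what each pattern expands to.

```shell
# Throwaway playground with a few empty files (names are just examples).
scratch=$(mktemp -d)
cd "$scratch"
touch index.html about.html notes.txt .hidden_file

# The shell replaces each pattern with the matching names before echo runs.
all_matches=$(echo *)
html_matches=$(echo *.html)

echo "*      expands to: $all_matches"
echo "*.html expands to: $html_matches"
```

Notice that .hidden_file doesn't show up in either list, since * skips names starting with ".".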

Now we have our list.  Only,  we don't want to make versions of our initialize.sh script.  Should we find a way to filter it out of "*"?  That seems a bit fraught with peril as we'll eventually have many different file types and extensions that will be versioned, those can potentially mess up our filter and create bugs. I think a better approach is to take all of our versioning executables and put them in a safe place that won't be in the way.  A very common convention is to create a ".bin" or sometimes a "scripts" dir for this.  

Let's do that, then move our initialize.sh script.

mkdir .bin
mv initialize.sh .bin/

The mv command moves files.  Moving a file to a directory puts that file in the directory.  Note above, the trailing slash is not needed but this is a good practice.  mv is a destructive command which means you cannot undo it. If you accidentally move a file to another file that wasn't a directory you'd be overwriting that second file, losing its contents forever (unless you have version control of course!).  By adding the trailing slash, if you accidentally put another file as the target location, you'll just get a message to tell you the target is not a directory, a much better outcome than losing data.
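Here's a sketch of that trailing-slash safety net in action. The file names keep_me and incoming are made up, and everything happens in a throwaway directory. The exact error text varies between systems, so the sketch hides it and only looks at whether mv succeeded.

```shell
# Throwaway playground with two ordinary files.
scratch=$(mktemp -d)
cd "$scratch"
echo "precious data" > keep_me
echo "new stuff" > incoming

# "keep_me/" says "I expect a directory". keep_me is a file, so mv refuses.
if mv incoming keep_me/ 2> /dev/null; then
    echo "moved"
else
    echo "mv refused, keep_me is untouched"
fi

kept=$(cat keep_me)
echo "keep_me still contains: $kept"
```

Without the trailing slash, the same command would have quietly replaced keep_me with the contents of incoming.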


So let's make sure things are still working after our invocation of mv, now we have a slightly different way to run our script since the path has changed.‌‌

.bin/initialize.sh

OK, we've moved our initialize script out of the way and now we can safely loop through all files in our directory. Bash, along with all other shell programs, has mechanisms for scripting, which include ways to loop over lists of items and operate on each item. With bash this is a for loop. Let's loop over the files in our directory and print out their names.

for f in *; do echo "A file here called, ${f}.."; done

Now we see the loop in action. We're using a program called echo, which simply writes its arguments to STDOUT.

Looking a bit closer, we have 3 statements in this above snippet. They are separated by ";", which tells bash that that is the end of one statement to be executed.  In the context of a script, you could put each statement on a separate line and bash will execute them in order. The ";" essentially allows us to combine statements onto one line.  Where you use them depends on preference.

Broken down, here’s what each part says.‌‌

for f in *
This means take a list, the contents of "*" which will match all files in our current directory, and loop over each item.  This sets up the next part which is a command that we can run for each item as we are looping.
do echo "A file here called, ${f}.."
do means to invoke the following command, which is echoing some text to STDOUT.  The text contains ${f}, which is a variable holding the current (mid-loop) file’s name.  "for f" sets up that variable.  By adding the "$" in front we are then telling bash we wish to access the variable, "f", and not just print the alphabetic character, “f”.

Wrapping the f in curly braces allows us to use that variable in strings of text in a safe way. We could have done without the curly braces here, but in general it's easier to always use them than to remember when they are needed and perhaps forget, causing bugs.

done
done tells bash that what we want to do inside our loop is complete.  It also tells bash it can now execute the code.


To recap, for starts the interpretation of a statement, and done tells bash we're ready to run the entirety of it in the context of the loop.
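On the curly braces mentioned above, here's a tiny sketch of a case where they do matter. The variable value and the "_backup" suffix are made up for illustration.

```shell
f="report"

# Without braces, bash looks for a variable named "f_backup".
# That variable is unset, so this expands to an empty string.
without_braces="$f_backup"

# With braces, the name ends at "}" and "_backup" is plain text.
with_braces="${f}_backup"

echo "without braces: '$without_braces'"
echo "with braces:    '$with_braces'"
```

The first echo prints an empty pair of quotes, the second prints 'report_backup'. This is exactly the situation we'll be in when we glue "__" and a timestamp onto a file name.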

Now let’s change the printing-of-names part to what we really need: copying the files to our .versions directory.  

We'll use the cp command, which copies files. cp can be invoked like "cp hmm .versions/", which would just copy the file into .versions, or it could be used to copy a file and give it a new name, like "cp hmm .versions/ahh". We'll use the latter in conjunction with our timestamp-generating shell command from earlier, date +%s, to create a new and unique filename.


for f in *; do cp $f .versions/${f}__`date +%s`; done


Here we just replaced:

do echo "A file here called, ${f}.."

with:

do cp $f .versions/${f}__`date +%s`


This copies the file, represented by $f, to the .versions directory with the name generated by ${f}__`date +%s`. In that naming expression we are using string interpolation, i.e. we are taking a string of text and using syntax to replace parts of that text with something else: in our case, both the contents of a variable and the output of another command.

Notice the two ` characters surrounding the date +%s,  "back-ticks". They are used for command substitution, which is like calling a command from within a command, often using the output of the inner command as part of the outer command.  

Calling date within a command to help build a filename is a pretty common practice.  It’s useful for things like rotating log files to manage their size, later finding production issues on a given date, ie: production.log.2013-03-10.
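Bash also has a second spelling for command substitution, $(...), which does the same job as back-ticks and is a bit easier to read and nest. Here's a sketch of both, building a dated log name like the one above. The log name is only an illustration and no file is actually created.

```shell
# Back-ticks and $(...) both run the inner command and substitute its output.
old_style=`date +%Y-%m-%d`
new_style=$(date +%Y-%m-%d)

# Hypothetical log name built from today's date, for illustration only.
log_name="production.log.${new_style}"

echo "back-ticks gave: $old_style"
echo "\$(...) gave:     $new_style"
echo "log name:        $log_name"
```

Both variables end up holding the same date, so which spelling you use is mostly a matter of taste.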

OK, now we know this piece works.  So open up .bin/initialize.sh and add the complete for loop snippet directly below the first command.  While we're at it, let's add one more line below that to let us know, explicitly, that all commands have finished.‌

echo "done";

Let’s give it a try and run it in entirety.  See below for troubleshooting.
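One caveat with the loop as written: an unquoted $f gets split wherever it contains a space, so a file called "my notes" would reach cp as two separate names, "my" and "notes". Wrapping the variables in double quotes avoids that. Here's a sketch in a throwaway directory, with a made-up file name, showing the quoted form.

```shell
# Throwaway playground with a file whose name contains a space.
scratch=$(mktemp -d)
cd "$scratch"
mkdir .versions
echo "hello" > "my notes"

for f in *; do
    # Quoted, "$f" stays a single argument even though it contains a space.
    cp "$f" ".versions/${f}__$(date +%s)"
done

saved=$(ls .versions)
echo "saved as: $saved"
```

If your files never have spaces in their names, the unquoted version works fine, but quoting is a cheap habit that saves a lot of head scratching later.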

Checking for Revisions


We have initialize.sh working, now we need the other parts, checking file contents against the revision history, etc…  Let's start a new script for that part, called check_for_versions.sh.‌‌

What will go in this script? Take a minute and write some rough steps

You guessed it!  We need to only write a new  .versions file if the current file has changed from the most recent .versions file.  In programming logic, “we need to conditionally write the file if the data in the source files doesn't equal the data in the last known revision file”.

This means a few things:

  • We need to conditionally write a file based on something, so we need an "if" statement.
  • We need to determine if the data in one file equals the data in another.
  • We need a way to find the last known revision.

Luckily, these are all common tasks, some handled within bash itself and some with the help of popular *nix commands.

Before we start coding, it's sometimes useful to give ourselves a good starting point: something similar that already works. The for loop in initialize.sh is a great starter because it does a lot of what we need for the 'diff' check: looping, creating files, etc. So let's add that loop statement to the file and see if it works.

This code alone will create more versions every time you run it but we'll just have to do some small changes to make it do what we want.  

for f in *; do cp $f .versions/${f}__`date +%s`; done

Since we now have more to add we'll want to break this into multiple lines for the sake of readability. We can lose the semi-colons since there will be lines to separate the statements, though we don’t have to. The general rule of thumb with all code is a line should never exceed the length where another developer would have to start scrolling or have a hard time reading for some other reason.  Even if you're on a super wide monitor, your teammates may be looking at your code on their laptops.  If you keep your lines relatively short and concise you should be all good.

for f in *
    do cp $f .versions/${f}__`date +%s`
done
echo "done"

‌‌‌‌

We need to only create the version file if there is a difference between the current file and the latest version.  We need a few things here: the "if", the "diff" and the grabbing of the last known revision.

A good practice is to start from the inside out, meaning we get all the pieces working separately before we put them together, starting with the retrieval of the last known revision since that is needed for the diff.

We know ls lists files. There is also a very handy option for sorting by most recently edited, "-t". The problem now is that the output is still not in an easy format for picking a single file from the results. There are a few ways we can do this. Luckily, ls has an option, "-1", that forces the output into one entry per line and doesn't include anything but the filename.

ls -t1 .versions

This will get us a sorted list, but we just want the first line. To get the first or last lines of output like this there are two handy commands, head and tail. head has an option, "-n", that lets us specify the number of lines to show from the beginning of the output. Here we just want one.

head -n 1

But how do we get the output of ls into head?‌‌

Piping!

This is one of the most powerful shell features: the ability to pipe the STDOUT of one command into the STDIN of another command, all with a single character, "|". Beauty!

ls -1t .versions | head -n1
# hmm__1376154098

Here we pipe the results of our listing to head, which grabs the first item. This almost completes our need to get the last known edit of a file for our next step, which is determining if there is a difference.

‌‌Right now we are listing all the files in the versions directory but what we really need is to list only those version files that relate to the source file we are checking for diffs against.

grep!

grep is a frequently-used tool for many.  It's absolutely essential, from small program development to large systems debugging.  grep takes lines as input and prints those lines to STDOUT that match a given pattern.  Of course the pattern can have wildcards or other powerful matching expressions.

Back to the task at hand.  Remember, we are in the context of looping through all of our files, and for each, getting the last known revision and checking for a difference.  Let's use grep to limit the output of "ls -1t .versions".   We are already piping to "head -n1", let's just add one pipe in between and grep for our filename, which if you'll remember is stored in an in-loop variable.

ls -1t .versions | grep "hmm__" | head -n1

Notice in our script, "hmm__" will be replaced by a variable that represents the filename while mid-loop. For now we'll hard code a name in there to see if it works by itself.  Here, "hmm__" is the filename plus the "__" we used to separate the filename from the version/timestamp.  ‌‌‌‌
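One thing worth knowing about that pattern: grep matches it anywhere in the line, so "hmm__" would also pick up versions of a file called "ohmm". Starting the pattern with "^" pins it to the beginning of the line. Here's a sketch using printf to stand in for the ls output. The version names, including "ohmm", are made up for illustration.

```shell
# Pretend this is the output of "ls -1t .versions" (names are made up).
listing=$(printf '%s\n' "hmm__1376154098" "ohmm__1376154050" "hmm__1376150000")

# A plain pattern matches anywhere in the line...
loose=$(echo "$listing" | grep "hmm__")

# ...while "^" only matches at the start of the line.
strict=$(echo "$listing" | grep "^hmm__")

echo "loose match:"
echo "$loose"
echo "strict match:"
echo "$strict"
```

The loose match returns all three lines, the strict match returns only the two that really belong to hmm.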

Let's keep things a little more clean and put the output of this in a variable.

last_edit=$(ls -1t .versions | grep hmm__ | head -n1)
echo $last_edit

Be careful with bash variables. A space before or after the "=" would have broken this line. It's kind of a pain, and you need to always remember it.
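A tiny sketch of the rule, with a made-up variable name. The broken form is left as a comment so the snippet still runs.

```shell
# Correct: no spaces around "=".
greeting="hello"
echo "$greeting"

# With spaces, bash treats "greeting" as the name of a command to run,
# with "=" and "hello" as its arguments, and fails with "command not found":
#   greeting = "hello"
```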

Now that we can obtain our last-saved version file, we need to see if there is a difference between that file and the current file in our loop. For a second, let's continue to forget about our loop and the context of the larger script and focus on the single statement that gets us our difference (or lack thereof).

You may be noticing a pattern here, *nix has utilities for just about everything you need! So of course, there is a command called diff, which simply takes two files and prints out any difference between them.

Here is an example of one file with a single line saying "hello" and another with a single line that says something else.‌‌

diff file1 file2
#1c1
#< hello
#---
#> i'm not the same


If there were no difference, the output of diff would be empty.  Here's another very handy way to see if the output of a shell command is empty, piping STDOUT to wc.  wc stands for word count, and like everything else in *nix has some great options, including the ability to count lines. So if the output of diff has no lines, there is no difference. If it has any lines, there is a difference. And no matter how small, every difference gets a version file.  Here's an example:

diff file1 file2 | wc -l
#4
diff file1 copy_of_file1 | wc -l
#0

So let's put the diff count in a variable as well.

diff_lines=$(diff .versions/$last_edit hmm | wc -l)
echo $diff_lines
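Counting lines works fine for our purposes, but for what it's worth there is another route: diff also reports its verdict through its exit status, 0 when the files match and 1 when they differ, and an if statement can test that directly. Here's a sketch with throwaway files. The if syntax is explained a little further on.

```shell
# Three throwaway files: two identical, one different.
scratch=$(mktemp -d)
echo "hello" > "$scratch/file1"
echo "hello" > "$scratch/copy_of_file1"
echo "i'm not the same" > "$scratch/file2"

# "if" runs the command and looks at its exit status.
# We send diff's output to /dev/null because only the status matters here.
if diff "$scratch/file1" "$scratch/copy_of_file1" > /dev/null; then
    same_result="identical"
else
    same_result="different"
fi

if diff "$scratch/file1" "$scratch/file2" > /dev/null; then
    changed_result="identical"
else
    changed_result="different"
fi

echo "file1 vs copy_of_file1: $same_result"
echo "file1 vs file2:         $changed_result"
```

We'll stick with the wc -l approach in the script since it lets us print how many lines changed.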


Now we're almost there.  The only missing step is to conditionally create the copy in the versions directory based on the output of the diff statement, what we're holding in $diff_lines.  Just like other programming environments, bash has an if statement that can wrap code that’s only meant to run if certain conditions exist.

Bash’s conditional syntax is a bit tricky, as there are a lot of different ways to do the same thing. If you find yourself writing a lot of shell scripts you should get to know it well. The basic outline is:

if [ $diff_lines -eq "1" ]
    then echo "hey"
fi

In the first line, if starts the conditional.  The square brackets, [], hold the expression that will evaluate to true or false.  then precedes the functionality that will run if the expression evaluates to true.  fi basically tells the bash interpreter we're done with our conditional statement.
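Here's a slightly fuller sketch that adds an else branch, using a hard-coded number in place of the real diff count. -eq compares two values as integers.

```shell
# Stand-in for the real diff count; the value 4 is just an example.
diff_lines=4

if [ "$diff_lines" -eq 0 ]; then
    verdict="no change"
else
    verdict="changed"
fi

echo "$verdict"
```

Note the spaces just inside the square brackets. Like the "=" rule for variables, bash is picky here: "[" is really a command, and it needs spaces around its arguments.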


Let's put it all together!


Here, we'll replace any hard-coded values from previous examples with the variables created by the for loop. We'll also add a bunch of printf statements that will tell us what's going on while the script is running.  printf is better than echo and is generally more portable as it complies with standards across different *nix systems.  After each statement, you'll also see "\n" at the end, which just means “insert a newline character here”.

In raw text files, newlines are actually a character or character sequence that marks the end of a line. They are hidden in most text tools by default. Configuring them to be visible can sometimes be helpful if you’re trying to debug problems where newlines could be having an adverse effect.
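If you want to see a newline for yourself, one option is od, a standard utility that dumps its input byte by byte. With the -c option it prints each character and shows the newline as \n. A minimal sketch:

```shell
# Pipe two letters and a newline into od and capture the dump.
dump=$(printf "hi\n" | od -c)
echo "$dump"
```

The dump shows h, i and then \n as three separate bytes, alongside some offset numbers in the left column.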

So here it is, check_for_versions.sh in its entirety:


printf "Beginning check for duplicates\n"
for f in *
    do
        last_edit=$(ls -1t .versions | grep ${f}__ | head -n1)
        printf "Checking last versioned file of $f...\n"

        diff_lines=$(diff .versions/$last_edit $f | wc -l)
        printf "Found ${diff_lines} line differences...\n"

        if [ $diff_lines != "0" ]
            then cp $f .versions/${f}__`date +%s`
            printf "Created new version file for $f...\n"
        fi
    done
printf "done\n"

‌‌

Periodic Checking For Versions

‌‌

Finally, we reach the last thing we need: the ability to run this periodically to get all our edit history. *nix systems have a utility called cron for scheduling tasks. It's very handy and powerful, but for our purposes, let’s use something a bit more explicitly under our control.

A useful way to implement a program that should just run until it is stopped is to use a "while" loop. A while loop keeps running for as long as a certain condition is true. If we hard-code that condition to true, the expression being checked always evaluates to true, so the program will run until you stop it.

Try this...

while false; do echo "this won't do anything"; done;

Now try:

while true; do echo "whoa"; done;

‌‌

You can exit the loop with "Ctrl + c" (hold down control and hit c).

This is a bit over the top, as it’s constantly checking and wasting the machine's resources. So let's only do this every 3 seconds. To do that we'll add just one more command, "sleep", which makes the process wait for a given number of seconds before continuing. This is much less expensive from a machine/resource perspective.

Try this...

while true; do sleep 3; echo "ahh, this is more relaxing"; done;
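
The condition doesn't have to be hard-coded. As a sketch, here is a loop that stops on its own after three passes. The checks_left counter is just a name picked for this example.

```shell
checks_left=3

# The loop ends once the counter reaches zero.
while [ "$checks_left" -gt 0 ]
    do
        echo "checking... $checks_left to go"
        checks_left=$((checks_left - 1))
        sleep 1
    done

echo "finished"
```

Our versioning script wants the opposite behavior (run until told to stop), which is why it sticks with while true.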

So let's wrap our entire script in a while loop...  See if you can get it to work before copy/pasting the below.

‌‌

while true
    do
        printf "Beginning check for duplicates\n"
        for f in *
            do
                last_edit=$(ls -1t .versions | grep ${f}__ | head -n1)
                printf "Checking last versioned file of $f...\n"
   
                diff_lines=$(diff .versions/$last_edit $f | wc -l)
                printf "Found ${diff_lines} line differences...\n"

                if [ $diff_lines != "0" ]
                    then cp $f .versions/${f}__`date +%s`
                    printf "Created new version file for $f...\n"
                fi
            done
        printf "done\n"
        sleep 3
    done

‌‌‌‌

Ok, now this works. It runs every 3 seconds but has a major drawback: it never stops running, so how can we go on to edit our files!?

Foreground and Background Processes

‌‌

We could open another shell window but that would give our program a pretty lame user experience.

Luckily, *nix has a concept of foreground and background processes. Right now, we're running this in the foreground, meaning it's attached directly to our current shell session. Let’s run this in the background, so we can keep it running, yet not in our face, leaving our shell open to do other things like edit the files and see our versions successfully start to pile up. To run a process in the background you just need to add an & to the end of the command.
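
Here is a tiny sketch of a background job you can try before running the real script that way. The temp file and the variable names are made up for the demo, and the wait at the end is only there so the demo can show the result.

```shell
log_file=$(mktemp)

# The trailing & starts the command in the background and returns right away.
( sleep 1; echo "background work finished" > "$log_file" ) &

# $! holds the process ID of the most recent background command.
background_pid=$!
echo "started process $background_pid, the shell is free for other work"

# wait pauses until that process exits.
wait "$background_pid"
result=$(cat "$log_file")
echo "$result"
```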

‌‌

Output Redirection

‌‌

We want things to run in the background but we also don't want to lose track of what our program is doing, so let's direct all the existing output (STDOUT, the results of printf statements) into a file.

‌‌

Remember, earlier we talked about how shell commands have 3 basic channels or file descriptors for data: STDIN, STDOUT and STDERR.  There is a bash syntax we can use to redirect these channels accordingly.  The 3 file descriptors have integer representations we can use to access them and just about everything in *nix behaves like a file.

0 - STDIN

1 - STDOUT

2 - STDERR

We can use > to write output into a file, replacing whatever the file held before, and >> to append output to the end of a file. Here are some examples:

‌
echo "hello" > hello.txt
ls
cat hello.txt
‌‌‌‌

In addition to redirecting STDOUT into a file, we can append data to a file as well.

echo " world" >> hello.txt
cat hello.txt
#hello
# world

‌‌‌‌

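Here we send STDERR to a file. Since there are no errors, errors.txt is empty. The message itself still shows up on screen, because echo writes it to STDOUT, which we did not redirect.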

‌‌

echo "no errors here" 2> errors.txt
cat errors.txt

‌‌

Here we do the same as above, only this time let’s cause an error.

‌‌

diff non_existent_file another_non_existent_file 2> errors.txt
cat errors.txt
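
To see the two channels side by side, here is a sketch with a made-up function, noisy, that writes one line to each. The scratch directory and file names are also just for the demo.

```shell
out_dir=$(mktemp -d)

# Hypothetical command that writes one line to each output channel.
noisy() {
    echo "normal output"
    echo "something went wrong" >&2
}

# > (short for 1>) captures STDOUT, 2> captures STDERR.
noisy > "$out_dir/out.txt" 2> "$out_dir/errors.txt"

stdout_text=$(cat "$out_dir/out.txt")
stderr_text=$(cat "$out_dir/errors.txt")
echo "STDOUT file: $stdout_text"
echo "STDERR file: $stderr_text"
```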

‌‌‌‌

A common thing to do is to combine STDOUT and STDERR into a single file handle. This is what we'll do for our script: output everything into a log file. The 2>&1 at the end of the command means "send STDERR (2) to wherever STDOUT (1) is currently going". Note: This is not a best practice for application logging.

.bin/check_for_versions.sh > .logs/versioning.log 2>&1

‌‌

This appears to do nothing, because all output is now going into the versioning.log file instead of our screen. Let it sit for a few seconds, then "Ctrl + c" out of it.  Let’s see all our data in the log file now.

cat .logs/versioning.log
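
One detail that trips people up is that the order of the redirections matters, because the shell applies them left to right. Here is a sketch, using a made-up function called both and a scratch directory, that compares the two orderings.

```shell
out_dir=$(mktemp -d)

# Hypothetical command that writes one line to each output channel.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# STDOUT goes to the file first, then STDERR is pointed at the same place.
both > "$out_dir/combined.log" 2>&1
combined_lines=$(wc -l < "$out_dir/combined.log" | tr -d ' ')

# Reversed: STDERR is pointed at the screen (where STDOUT was going at that
# moment), and only afterwards is STDOUT sent to the file.
both 2>&1 > "$out_dir/partial.log"
partial_lines=$(wc -l < "$out_dir/partial.log" | tr -d ' ')

echo "combined.log has $combined_lines lines, partial.log has $partial_lines"
```

With the reversed order, the "to stderr" line still lands on your screen, which is why the log file ends up one line short.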

‌‌

We now have our output going someplace reasonably responsible.  Let's run this thing in the background and really see our versioning system in action!  To do this, we just need the & at the very end of our command.

.bin/check_for_versions.sh > .logs/versioning.log 2>&1 &
[1] 13948

‌‌

IMPORTANT: Notice the output. The number in square brackets is the job number within this shell session, and the second number is the process ID (PID) of the process we've dispatched in the background. Keep track of the PID, as we'll need it to stop our process later on. We'll also discuss a way to automatically track this number. Let's see how many version files we have. Edit the file, hmm, save, then check the count of version files again.

ls -al .versions | wc -l

‌‌

Looks like it is working but we forgot something important!

What happens if we add new files?  Create a file called “foo” and put some text in it.  In our log file you may now see an entry that looks like:

Beginning check for duplicates
Checking last versioned file of foo...
diff: .versions/foo: No such file or directory
Found        0 line differences...

‌‌

We initialize at the beginning, but we forgot to initialize the first versions of new files!  No biggie, what can we do to fix this?  Take a look at your check_for_edits.sh file.  We just want to make sure that for every file we think that we're versioning, we have at least one file in the .versions dir.

last_edit=$(ls -1t .versions | grep ${f}__ | head -n1)

So if last_edit is empty, let's create our first version file.  In a line directly below this one, let's put:

if [ "$last_edit" == "" ]
    then cp $f .versions/${f}__`date +%s`
    printf "Created first version file for $f...\n"
fi
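
Put together, the per-file logic could look something like the sketch below. It is only one way to arrange things. The version_file function and the scratch directory are made up for the demo, -z is another way to test for an empty string, and the early return keeps a brand-new file from also being diffed against a version that doesn't exist yet.

```shell
# Hypothetical helper: snapshot one file into .versions when needed.
version_file() {
    f=$1
    last_edit=$(ls -1t .versions | grep "${f}__" | head -n1)

    # -z is true when the string is empty, meaning no version exists yet.
    if [ -z "$last_edit" ]
        then cp "$f" ".versions/${f}__$(date +%s)"
        printf "Created first version file for %s...\n" "$f"
        return
    fi

    diff_lines=$(diff ".versions/$last_edit" "$f" | wc -l | tr -d ' ')
    if [ "$diff_lines" != "0" ]
        then cp "$f" ".versions/${f}__$(date +%s)"
        printf "Created new version file for %s...\n" "$f"
    fi
}

# Try it out in a scratch directory.
cd "$(mktemp -d)" || exit 1
mkdir .versions
echo "first draft" > foo
version_file foo    # no versions yet, so the first one is created
version_file foo    # nothing changed, so no new version
version_count=$(ls .versions | wc -l | tr -d ' ')
echo "versions of foo: $version_count"
```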

‌‌

Notice this time we put $last_edit in quotes.  This matters whenever a variable could potentially be empty: without the quotes, an empty variable simply vanishes from the expression, the test turns into [ == "" ], and you get a confusing error such as "unary operator expected".

We made a new change, so we need to stop our current background process and restart it.  Let's use the kill command, which sends signals to processes.  There are a few different signals and, in most cases, your program can catch these signals and do something in response.  In the case of -9/KILL there is nothing it can do; it must exit.  Give kill your process number from above.  In case you lost it, you can find it again with "jobs -p".

kill -9 13948 # use your process number here
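
For the curious, here is a sketch of what catching a signal looks like. It uses the shell's trap command with the TERM signal, which is what kill sends when you don't pass -9. The worker loop, the scratch directory and the file names are all made up for the demo.

```shell
workdir=$(mktemp -d)

# Hypothetical worker: loops until it receives TERM, then tidies up and exits.
(
    trap 'echo "caught TERM, cleaning up" > "$workdir/exit.log"; exit 0' TERM
    while true; do sleep 1; done
) &
worker_pid=$!

sleep 1                     # give the worker a moment to install its trap
kill -TERM "$worker_pid"    # a polite request that the trap gets to handle
wait "$worker_pid"

trap_output=$(cat "$workdir/exit.log")
echo "$trap_output"
```

Had we sent -9 instead, the trap would never have run and exit.log would not exist. Our versioning script has nothing to clean up, so -9 is fine for it.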

‌‌

Now restart.

‌‌

.bin/check_for_versions.sh > .logs/versioning.log 2>&1 &

‌‌

We should be able to create new files and versions will be created for them. Give it a shot.

‌‌

ls -al .versions | wc -l

‌‌

Create a new file with some text and save...

‌‌

ls -al .versions | wc -l

‌‌

Congratulations!

‌‌

You have created a basic version control program using the *nix shell.  Hope you liked the tutorial!

Where to Go From Here? To keep learning, maybe try some of these:

‌‌

Our script only works in one location!  Make use of the *nix PATH environment variable and put our “.bin” contents in a better location.  Our scripts will then need to be made aware of where they are running from.  They should still create a .versions directory wherever they are called from, but they will need some help to make the paths all work correctly.

Our method for starting and stopping the script is a bit heavy and hard to remember.  Make a start.sh and a stop.sh: start.sh starts the process in the background and stores the process ID (PID) in a file, and stop.sh uses that file to kill the process.  This is a very common method for managing background processes.  HINT: you can use backticks and a common program for reading file contents to STDOUT.

This program currently only works in a single directory. What if our directory has subdirectories inside it, then subdirectories inside those!?  Add support for recursively tracking files.

‌‌