Unit and functional testing git with RSpec

If you have ever written a Ruby application that interacts with git, you are probably already aware of the pains of testing such behavior. As if checking that git is installed at the proper version and catching all the crazy typos were not enough, git's algorithm for calculating commit SHAs makes functional testing quite cumbersome!

Unit testing git with RSpec

Unit testing git with RSpec has always been relatively easy. As you may recall, unit tests are all about message sending. Unit tests answer the question:

Did I send the correct message to the system?

So in this context:

Did I run the "correct" git command?

Please note that unit tests do not answer the question:

Did the git command run successfully?

Consider the following class, which fetches new commits if they are present and clones the repository if it does not exist on disk:

class GitFetcher
  def initialize(url, revision = 'master')
    @url = url
    @revision = revision
  end

  def git_clone
    git("clone #{@url}")
  end

  def git_reset
    git("fetch --all")
    git("reset --hard #{target_revision}")
  end

  def target_revision
    git("rev-parse #{@revision}")
  end

  private

  def git(command)
    shellout!("git #{command}")
  end
end

It is a fairly common practice to wrap the git call in a private method as shown above, making it very easy to unit test the code:

describe GitFetcher do
  describe '#git_clone' do
    let(:url) { 'https://fake.repo.git' }

    subject { described_class.new(url) }

    before { allow(subject).to receive(:git) }

    it 'clones the remote repository' do
      expect(subject).to receive(:git).with("clone #{url}").once
      subject.git_clone
    end
  end
end

And this process is repeatable for any of the git calls in your class. However, unit testing does not actually invoke git. You could have completely passing unit tests, but if the git binary is not present on the system, the application will fail to execute. You could also have a syntax error or invalid flag that unit tests would happily accept (since it is just string matching), but functional tests would raise an exception.
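The shellout! helper that GitFetcher calls is not shown in this post. A minimal sketch, assuming it is a thin wrapper around Ruby's standard Open3 library (the CommandFailed error class is a hypothetical name used only for illustration), might look like the following. It fails loudly when the command exits non-zero or the binary is missing, which is exactly the kind of problem a string-matching unit test cannot catch:

```ruby
require 'open3'

# Hypothetical error class, named here for illustration only.
class CommandFailed < StandardError; end

# Run a command and return its combined stdout/stderr with surrounding
# whitespace stripped. Raises CommandFailed if the command exits non-zero.
def shellout!(command)
  output, status = Open3.capture2e(command)
  raise CommandFailed, "`#{command}` failed: #{output}" unless status.success?
  output.strip
rescue Errno::ENOENT => e
  # Raised when the binary itself (for example, git) is not on the PATH.
  raise CommandFailed, e.message
end
```

With a helper like this, a typo such as git("clnoe #{@url}") would pass the unit test above but raise as soon as a functional test actually executed it.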


Functional testing git with RSpec

Functional testing git with RSpec is a bit more difficult. Since functional tests actually execute and modify the state of the system, we must be especially careful.

Functional tests provide the answer to our earlier question:

Did the git command run successfully?

Functional testing with git would involve:

  1. Creating a fake git remote
  2. Including some tags and branches
  3. Pushing some commits

We could perform this action once, package the resulting item as a "fixture" and utilize the fixture in our tests. This, however, becomes problematic for a few reasons. What happens if the fixture accidentally becomes tainted due to developer error? How do we reproduce the fixture? What if we need to test another "thing"?

For the utmost flexibility, we should generate our fake git remotes and revisions on-the-fly at runtime. Here is a tiny Ruby helper that I have used in a few projects like Omnibus and Berkshelf:

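require 'fileutils'

module GitHelpers
  def remote_git_repo(name, options = {})
    # file:// URLs need an absolute path, and a relative one would stop
    # resolving once we chdir into the scratch directory below.
    path = File.expand_path(File.join("spec/tmp/git_remotes", name))
    remote_url = "file://#{path}"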

    FileUtils.mkdir_p(path)

    Dir.chdir(path) do
      git %|init --bare|
      git %|config core.sharedrepository 1|
      git %|config receive.denyNonFastforwards true|
      git %|config receive.denyCurrentBranch ignore|
    end

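    # git_scratch is not shown here; it is assumed to return the path to an
    # empty scratch directory in which to build commits.
    Dir.chdir(git_scratch) do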
      # Create a bogus file
      File.open('file', 'w') { |f| f.write('hello') }

      git %|init .|
      git %|add .|
      git %|commit -am "Initial commit for #{name}..."|
      git %|remote add origin "#{remote_url}"|
      git %|push origin master|

      options[:tags].each do |tag|
        File.open('tag', 'w') { |f| f.write(tag) }
        git %|add tag|
        git %|commit -am "Create tag #{tag}"|
        git %|tag "#{tag}"|
        git %|push origin "#{tag}"|
      end if options[:tags]

      options[:branches].each do |branch|
        git %|checkout -b #{branch} master|
        File.open('branch', 'w') { |f| f.write(branch) }
        git %|add branch|
        git %|commit -am "Create branch #{branch}"|
        git %|push origin "#{branch}"|
        git %|checkout master|
      end if options[:branches]
    end

    path
  end

  def git(command)
    shellout!("git #{command}")
  end
end

This helper encapsulates the logic required to create a remote git repository. To eliminate the network and bandwidth constraints, the remote git repository is actually created locally on the target system, in a different directory. When you git push, the remote destination is actually just a different place on the file system. Kinda cool :).

To gain access to the remote_git_repo and git helper methods in our specs, we include this helper module in RSpec:

RSpec.configure do |config|
  config.include(GitHelpers)
end

Now that we have the ability to generate an on-the-fly remote git repository, we can clone the remote repository and assert that the value for HEAD is correct:

describe GitFetcher do
  describe '#git_clone' do
    let(:repo) { remote_git_repo('fake') }

    subject { described_class.new(repo) }

    it 'clones the repository' do
      subject.git_clone
      expect(git('rev-parse HEAD')).to eq('52c72c4a5cca61b15399ff5b53402ce715ez7146')
    end
  end
end

If you execute this functional test a few times, you will see that the resulting git SHA for HEAD changes on each run! Git is actually incredibly difficult to test functionally because of the algorithm used to generate the unique SHA for each commit. Unlike a simple directory SHA, which could be easily reproduced, git takes into account the following:

  • The author's name
  • The author's email
  • The author date
  • The committer's name
  • The committer's email
  • The commit date
  • The commit message
  • The SHA of the directory tree (plus the SHAs of any parent commits)

So, even though the commit messages and file contents were exactly the same, git is generating a different, unique SHA for each of our commits due to the changing timestamp (in this case).
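To see why the timestamp alone is enough to change the SHA, it helps to know that a commit SHA is the SHA-1 of a small text object built from the fields listed above. The following is a rough sketch of that calculation, not git's actual implementation. The commit_sha helper and its arguments are hypothetical, and it only handles a parentless commit (such as an initial commit) with a fixed +0000 timezone:

```ruby
require 'digest'

# SHA-1 of git's well-known empty tree object.
EMPTY_TREE = '4b825dc642cb6eb9a060e54bf8d69288fbee4904'

# Hypothetical helper: hash a parentless commit object the way git does.
# The timestamp is given in seconds since the epoch.
def commit_sha(tree:, name:, email:, timestamp:, message:)
  body = "tree #{tree}\n" \
         "author #{name} <#{email}> #{timestamp} +0000\n" \
         "committer #{name} <#{email}> #{timestamp} +0000\n" \
         "\n" \
         "#{message}\n"

  # Git prefixes every object with its type and byte size before hashing.
  Digest::SHA1.hexdigest("commit #{body.bytesize}\0#{body}")
end

fields = { tree: EMPTY_TREE, name: 'my name', email: 'me@example.com', message: 'Initial commit' }

first  = commit_sha(**fields, timestamp: 680227200)
second = commit_sha(**fields, timestamp: 680227201) # one second later
again  = commit_sha(**fields, timestamp: 680227200)

puts first == second # false: a one-second difference yields a new SHA
puts first == again  # true: identical inputs always yield the same SHA
```

Every field feeds the same hash, so pinning all of them is the only way to get a reproducible SHA.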

Most of these attributes are "fakable", but it requires a bit of magic. In addition to having the same directory structure and commit message, the following environment variables must match exactly:

GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
GIT_AUTHOR_DATE
GIT_COMMITTER_NAME
GIT_COMMITTER_EMAIL
GIT_COMMITTER_DATE

The _DATE fields are actually named inappropriately as they are more timestamps than dates. Git is also particular about the format of these timestamps. One format it accepts is:

Tue Jul 23 00:00:00 1991 +0000

Thankfully, Ruby makes generating this format easy:

Time.at(680227200).utc.strftime('%c %z')
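In Ruby, %c expands to the equivalent of '%a %b %e %H:%M:%S %Y', so the format can also be spelled out explicitly if you prefer not to rely on the shorthand. A small sketch (the git_timestamp helper name is made up for this example):

```ruby
# Hypothetical helper: format seconds-since-epoch as a UTC timestamp in the
# "Tue Jul 23 00:00:00 1991 +0000" style shown above.
def git_timestamp(epoch)
  Time.at(epoch).utc.strftime('%a %b %e %H:%M:%S %Y %z')
end

puts git_timestamp(680227200)
# => Tue Jul 23 00:00:00 1991 +0000

# The explicit format and the '%c %z' shorthand produce the same string.
puts git_timestamp(680227200) == Time.at(680227200).utc.strftime('%c %z')
# => true
```

Calling .utc before formatting matters: without it, the result would depend on the timezone of the machine running the tests.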

In your tests, you will want to set a "static" time for all commits to ensure the SHAs are generated consistently. The format of the remaining values does not matter, but they must be set exactly the same for each commit. It is also important that you forge the author and committer values. By default, git pulls these values from your local git configuration, but they are likely to differ on your coworkers' machines or in the continuous integration environment. Forcing these attributes to be the same in all environments will help ensure a consistently passing test suite. Here is the expanded git call, in which we forge the git environment:

module GitHelpers
  # ...

  #
  # Execute the given git command, forging author, committer, and timestamps to
  # ensure consistent SHAs.
  #
  # @param [String] command
  #
  def git(command)
    original_env = ENV.to_hash

    time = Time.at(680227200).utc.strftime('%c %z')

    ENV['GIT_AUTHOR_NAME']     = 'my name'
    ENV['GIT_AUTHOR_EMAIL']    = 'me@example.com'
    ENV['GIT_AUTHOR_DATE']     = time
    ENV['GIT_COMMITTER_NAME']  = 'my other name'
    ENV['GIT_COMMITTER_EMAIL'] = 'me2@example.com'
    ENV['GIT_COMMITTER_DATE']  = time

    shellout!("git #{command}")
  ensure
    ENV.replace(original_env)
  end
end

If you are not familiar with Ruby, this method:

  1. Saves the current environment (ENV) to a variable
  2. Generates a timestamp in the correct format
  3. Forges the git-related environment variables
  4. Executes the git command
  5. Restores the original environment

In this example, I used the same timestamp twice, but you could also use different timestamps for the author and commit dates. However, the timestamps must be consistent for each call. If you change any of these values in the future, the git-generated SHAs will also all change!
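As an alternative to mutating and restoring the global ENV, the forged values can be handed only to the child process. This is a sketch, not the helper used in Omnibus or Berkshelf. It assumes commands are run through Ruby's standard Open3 library, and the FORGED_GIT_ENV constant and shellout_with_forged_env! method are names made up for illustration:

```ruby
require 'open3'

FORGED_TIME = Time.at(680227200).utc.strftime('%c %z')

# Hypothetical constant: the same values the helper above writes into ENV.
FORGED_GIT_ENV = {
  'GIT_AUTHOR_NAME'     => 'my name',
  'GIT_AUTHOR_EMAIL'    => 'me@example.com',
  'GIT_AUTHOR_DATE'     => FORGED_TIME,
  'GIT_COMMITTER_NAME'  => 'my other name',
  'GIT_COMMITTER_EMAIL' => 'me2@example.com',
  'GIT_COMMITTER_DATE'  => FORGED_TIME
}.freeze

# Hypothetical helper: run a command with the forged variables merged into
# the child's environment only. The parent process's ENV is never touched.
def shellout_with_forged_env!(command)
  output, status = Open3.capture2e(FORGED_GIT_ENV, command)
  raise "`#{command}` failed: #{output}" unless status.success?
  output.strip
end

def git(command)
  shellout_with_forged_env!("git #{command}")
end
```

Because nothing global is modified, there is no ensure block to get wrong, and the forged values cannot leak into other code if the command raises.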

Now that our helper uses the enhanced git method, git will always generate the same SHA, provided the same directory tree and commit message. In my experience, the easiest way to get the correct SHA is to run the test with bogus data, look at the spec failure, and then copy the correct SHA into place.

describe GitFetcher do
  describe '#git_reset' do
    subject { described_class.new(remote, version) }

    let(:revision) { git('rev-parse HEAD') }

    context 'when the version is a tag' do
      let(:version)  { 'v1.2.3' }
      let(:remote)   { remote_git_repo('zlib', tags: [version]) }

      it 'parses the tag' do
        subject.git_reset
        expect(revision).to eq('53c72c4abcc961b153996f5b5f402ce715e47146')
      end
    end

    context 'when the version is a branch' do
      let(:version) { 'sethvargo/magic_ponies' }
      let(:remote)  { remote_git_repo('zlib', branches: [version]) }

      it 'parses the branch' do
        subject.git_reset
        expect(revision).to eq('171a1aec35ac0a050f8dccd9c9ef4609b1d8d8ea')
      end
    end

    context 'when the version is a ref' do
      let(:version) { '45ded6d' }
      let(:remote)  { remote_git_repo('zlib') }

      it 'parses the ref' do
        subject.git_reset
        expect(revision).to eq('45ded6d3b1a35d66ed866b2c3eb418426e6382b0')
      end
    end
  end
end

I hope you enjoyed this post on unit and functional testing with git. At the time of this writing, I have not been able to figure out how to forge a specific revision in a git repository.

Do you have another tip, trick, idea, or suggestion? Please leave a comment below!

About Seth

Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.