Clint Shank's Blog

Git And Continuous Integration

posted 13 Nov 2008
Subversion is the de facto version control system of the day, but Git is the rising star.  More and more people are using Git, but I'm a bit concerned about its effects on Continuous Integration (CI).

First some background...

Subversion follows a centralized repository model.  So do CVS and many others.  There's one server that everybody commits to.  Git can follow suit or be configured in a distributed fashion.  In a fully distributed model, each developer has a private copy of the repository.  Mercurial is another example of a distributed version control system.

My concern is really with the distributed model.  One of the appeals of this model, and particularly of having your own private repository, is the ability to experiment and check in/commit at a much finer-grained rate than with a centralized model.  With a centralized model, you need to be more careful with your commits.  Otherwise, you could break the build for everybody else.  Consequently, you commit less frequently.

But what's better in the context of CI?  I have to be honest and admit that I've never used the distributed model on a project, so I'm being theoretical here.  My gut feel is that with a distributed model, integration will be less frequent.  I believe people will check in to private repositories more often, but will push those changes to the main integration branch less frequently than people committing straight to the main branch in a centralized model.  In a centralized model, integration is in your face.  You can't commit without thinking about it.  In a distributed model, you can get carried away in your own little world.  And we've learned in the agile community that we should be integrating early and often, haven't we?

I compare the distributed model with multiple repositories to a centralized model with multiple branches.  If you can accept this analogy, you can probably see how integration would be less frequent.  I fear with Git that people will adopt more of a distributed model and thus, CI will suffer.  This is pure speculation on my part, and I'm interested to see how things play out.

So my point here is to be mindful of using lots of repositories with Git and of the potential negative consequences for CI.

For some more comments on using Git and CI, particularly on larger teams, see this.





1. bartman left...
13 Nov 2008 8:31 pm :: http://www.jukie.net/~bart/

Here is my take on the whole distributed versus centralized approach.

At first you would think that people integrate more often in a centralized revision control system. This may be true for simple changes. It fails when you consider larger blocks of code, or bugs that have taken weeks to debug and solve. In these cases developers have been working in isolation on a particular problem and have been completely disconnected. Since their large chunk of code or bug fix is not completely finished yet they end up developing out of tree so that they "don't break the build", as you say. Eventually they commit.

I personally prefer to make small commits that capture my progress even though it may not be perfect. This way if a particular chunk of code does not work out, I can revert it back to something I did an hour ago.

The bonus with a distributed system (where each time a developer checks out they get a branch) over a centralized system is that you are permitted to make "micro commits" without breaking the build.


2. Chris Read left...
14 Nov 2008 4:56 am :: http://www.chris-read.net

Greetings...

Just using a distributed SCM does not stop you from having a central "source of truth" repository that your CI runs off. We use Mercurial for the development of Cruise and have a central repository that Cruise runs off and developers keep in sync with.

Chris


3. Jakub Narebski left...
14 Nov 2008 12:32 pm

First, nothing prevents each developer from having CI enabled on his/her repository, as a pre-commit hook. But I think a better solution would be to have some "build" repository with CI, which would refuse a push if it fails continuous integration tests...
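The "build" repository idea could be sketched as a Git pre-receive hook, which Git runs on the receiving side and which rejects the push when it exits non-zero. What follows is only a minimal sketch, not anyone's actual setup: the function names are hypothetical, and `make test` is an assumed stand-in for whatever command the project uses to build and test.

```python
import subprocess
import sys
import tempfile


def parse_updates(lines):
    """Parse pre-receive input lines of the form '<old-id> <new-id> <ref>'."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def build_passes(commit, command=("make", "test")):
    """Export the pushed commit to a temp dir and run the test command there.

    The command is an assumption; substitute the project's real build step.
    """
    with tempfile.TemporaryDirectory() as workdir:
        archive = subprocess.run(["git", "archive", commit], capture_output=True)
        if archive.returncode != 0:
            return False
        untar = subprocess.run(
            ["tar", "-x", "-f", "-", "-C", workdir], input=archive.stdout
        )
        if untar.returncode != 0:
            return False
        return subprocess.run(list(command), cwd=workdir).returncode == 0


def rejected_refs(updates, check=build_passes):
    """Return the refs whose new commit fails the check.

    A new id made up only of zeros means the ref is being deleted, so it is skipped.
    """
    return [
        ref for _old, new, ref in updates
        if set(new) != {"0"} and not check(new)
    ]


def main(stdin=sys.stdin):
    """Return 1 (reject the push) if any updated ref fails, else 0."""
    failed = rejected_refs(parse_updates(stdin))
    for ref in failed:
        print(f"push to {ref} rejected: build check failed", file=sys.stderr)
    return 1 if failed else 0
```

To try it, the script would end with `sys.exit(main())` and be saved as an executable `hooks/pre-receive` file in the shared repository. Running a full build inside the hook makes every push wait for it, so this only suits projects with fast test suites.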


4. Matt Doar left...
18 Nov 2008 1:14 pm :: http://www.pobox.com

I see it as a matter of the commit and integration steps being separate with a distributed version control system. The commits will happen as before, but the integration will now occur separately, just as with plain old branches. The integration step may be manual, but for CI to be useful integration will have to be automated, perhaps with a manually set "not yet ready for integration" flag somewhere. The same tests for integration will still apply - did it build, do the tests pass.


5. Slava Imeshev left...

Clint,

You might want to check out our Parabuild. It provides Continuous Integration for Git. You can set it up to cover both a private repository and a remote repository that you are going to push changes to.


6. Enrique Amodeo left...
29 Mar 2010 6:23 am :: http://eamodeorubio.wordpress.com

Hi. We recently started to experiment with distributed version control and immediately found that CI cannot be done in the traditional way. You cannot have a centralized CI and use distributed version control; it doesn't make sense. We are changing our mindset, and we now use what we call multi-level CI or staged CI.

It is simple: each developer has his own repository. Each team has an integration repository, and the whole project has a dev repository and a stable repository. So we have different CI processes arranged in a hierarchical way. When a developer thinks he has worthwhile changes, he pushes them to the team integration repository. He may need to merge and perhaps resolve conflicts. When all branches have merged without conflict, CI is triggered at the team level. If the team-level CI is OK, the team-level changes are pushed to the project dev repository and a project-level CI run is made, and so on.

We can specify different levels of build for each CI level. For example, at the team level we run unit tests and integration tests, but with mocked infrastructure. At the project level we deploy to an integration environment and run the full test suite in a real environment. This way of doing things makes sense since we could have several teams in different cities working on different aspects of the same project. If you have a single collocated team, you only need one level of CI.

We have little experience yet since we are still experimenting, but it is working well so far. The tools perhaps need enhancing. One good thing is that changes arrive at the global project level less frequently, so we can run a more exhaustive test suite. And these changes have been tested at the team level, so they are usually less buggy.
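The staged arrangement described in this comment can be modelled as an ordered list of levels, each with its own checks, where a change only moves up while every check at the current level passes. This is a rough sketch under stated assumptions: the `Stage` and `promote` names, the level names, and the stand-in checks are all hypothetical and not taken from the commenter's tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Check = Callable[[str], bool]  # takes a change id, returns True if the check passes


@dataclass
class Stage:
    """One CI level, such as a team integration repository (hypothetical model)."""
    name: str
    checks: List[Check] = field(default_factory=list)


def promote(change: str, stages: List[Stage]) -> List[str]:
    """Run each stage's checks in order and stop at the first stage that fails.

    Returns the names of the stages the change passed.
    """
    passed = []
    for stage in stages:
        if not all(check(change) for check in stage.checks):
            break
        passed.append(stage.name)
    return passed


# Stand-in checks; a real setup would call the build and test tooling here.
def unit_tests(change: str) -> bool:
    return True


def mocked_integration_tests(change: str) -> bool:
    return True


def full_suite_in_real_env(change: str) -> bool:
    # Pretend that changes whose id starts with "broken" fail the full suite.
    return not change.startswith("broken")


PIPELINE = [
    Stage("team", [unit_tests, mocked_integration_tests]),
    Stage("project-dev", [full_suite_in_real_env]),
]
```

Calling `promote("feature-1", PIPELINE)` walks the change through both levels, while a change that fails the project-level suite stops after the team level. Cheaper checks sit at the lower level so that the expensive full-environment run happens less often, which matches the trade-off the comment describes.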