Git Merge Deep Dive

In my last post I outlined how my team standardized our process for diffing and merging code by configuring Meld for use in Visual Studio and other IDEs. During this process, I did a lot of research on git-merge and came away with several key insights that I want to document in this post.

I have a lot of experience with git, and because I follow a pretty strict gitflow policy, I rarely experience issues with merging. When other developers – especially the juniors – approached me with questions about why their merge wasn't making sense, I never really had good answers. I gave them my naïve assumptions about how merging works, but I couldn't explain specifically why Visual Studio picked one side of the diff or the other.

I started my research by trying to reverse engineer what Visual Studio does when you merge and resolve conflicts. That led me to explore some other tools, like Meld, and to experiment with some of the optional merge strategies within git. Here is what I found.

Setup

For the following examples I will be using my own fork of the official ASP.NET MVC demo project. I started by cloning the repository and checking out a new branch called develop.

$ git clone https://github.com/akmolina28/AspNetDocs.git
$ cd AspNetDocs
$ git checkout -b develop

First, I committed a simple change in the AccountController, renaming a variable.

https://github.com/akmolina28/AspNetDocs/commit/b8e0bfb74d177a81f0fc85f54b62b30692a5f503

Next, I checked out the main branch and made some changes in the same file, which will conflict with my changes in the develop branch.

$ git checkout main
https://github.com/akmolina28/AspNetDocs/commit/0254b15ee14fa425e23823ab59ebe8006e867f5f

Now, when I initiate a merge from develop into main, I would expect conflicts because some of the same code was changed in both branches. Below is the result.

$ git merge develop
Auto-merging aspnet/mvc/overview/getting-started/introduction/sample/MvcMovie/MvcMovie/Controllers/AccountController.cs
CONFLICT (content): Merge conflict in aspnet/mvc/overview/getting-started/introduction/sample/MvcMovie/MvcMovie/Controllers/AccountController.cs
Automatic merge failed; fix conflicts and then commit the result.

Resolving Conflicts in Visual Studio

Visual Studio ships with Microsoft's built-in tool for diffing and merging, vsdiffmerge. When Visual Studio is installed and configured for git, it will automatically configure vsdiffmerge as the global mergetool for resolving merge conflicts. Here is what happens when you try to resolve the file that I set up in the previous section.

Visual Studio conflict resolution (vsdiffmerge)

If you have experience with git in Visual Studio, this result should feel very familiar. Visual Studio does a good job of figuring out which changes to take from either side. Here is what the conflict looks like, as expected:

Let's take stock of what we have at this point. The left side has our incoming version from develop, the right side has our local version in main, and at the bottom is the result of our merge, where we make our final changes. Five changes were auto-merged, which Visual Studio shows with the pre-checked, green selections. The two conflicting changes are highlighted in red.

At this point, I immediately have several questions:

  1. How does Visual Studio know which side to take when it AutoMerges? What exactly does "AutoMerge" mean?
  2. Does git perform the AutoMerge? Or is Visual Studio doing it?
  3. How is the Result file in the bottom pane generated? Is it created by git or by Visual Studio?

In the next sections I will deconstruct the merge process to try to answer these questions.

Back to Basics

For now, let's step away from Visual Studio and understand how to resolve this merge at the most basic level. I will abort the merge-in-progress and start over, just in case Visual Studio changed something.

$ git merge --abort
$ git merge develop

We have generated the same conflict once again. Let's see what git did with AccountController.cs by looking at the file in a text editor:

// ...

namespace MvcMovie.Controllers
{
    [Authorize]
    public class AccountController : Controller
    {
        private ApplicationSignInManager _signInManager;
        private ApplicationUserManager _applicationUserManager;

        public AccountController()
        {
        }

        public AccountController(ApplicationUserManager userManager, ApplicationSignInManager signInManager )
        {
            UserManager = userManager;
            SignInManager = signInManager;
        }

        public ApplicationSignInManager SignInManager
        {
            get
            {
                if (_signInManager == null)
                {
                    return HttpContext.GetOwinContext().Get<ApplicationSignInManager>();
                }
                else
                {
                    return _signInManager;
                }
            }
            private set 
            { 
                _signInManager = value; 
            }
        }

        public ApplicationUserManager UserManager
        {
            get
            {
<<<<<<< HEAD
                if (_userManager == null)
                {
                    HttpContext.GetOwinContext().GetUserManager<ApplicationUserManager>();
                }
                else
                {
                    return _userManager;
                }
=======
                return _applicationUserManager ?? HttpContext.GetOwinContext().GetUserManager<ApplicationUserManager>();
>>>>>>> develop
            }
            private set
            {
                _applicationUserManager = value;
            }
        }
        
// ...
Excerpt from AccountController.cs (MERGED)

It appears that git auto-merged the non-conflicting differences in the same way that Visual Studio did. So, git is definitely doing its own auto-merging here. All of the changes from both branches are reflected in the merged result, except for the conflict where we see the merge markers.

Between <<<<<<< and ======= we have our code from main. And between ======= and >>>>>>> we have our changes from develop. My job here is simple – edit the code by hand and resolve both changes. I am familiar with the code in both branches, so this conflict is trivial for me.

If, however, I were not familiar with both of these changes, I might be confused and unsure how to resolve this conflict. This is very common in large teams, or in long-running projects where several weeks pass before code gets merged. According to the git docs, you can add more context here by using the diff3 conflict style. Let's see what that does.

$ git merge --abort
$ git config --global merge.conflictStyle diff3
$ git merge develop
Redo the merge using the diff3 conflict style

Here is what the conflict looks like now:

        public ApplicationUserManager UserManager
        {
            get
            {
<<<<<<< HEAD
                if (_userManager == null)
                {
                    HttpContext.GetOwinContext().GetUserManager<ApplicationUserManager>();
                }
                else
                {
                    return _userManager;
                }
||||||| af5e62ba8
                return _userManager ?? HttpContext.GetOwinContext().GetUserManager<ApplicationUserManager>();
=======
                return _applicationUserManager ?? HttpContext.GetOwinContext().GetUserManager<ApplicationUserManager>();
>>>>>>> develop
            }
            private set
            {
                _applicationUserManager = value;
            }
        }
Excerpt from AccountController.cs (MERGED)

Now there is a third version of the code between the main and develop versions. This version is called the "common ancestor", also referred to as the BASE in the context of a merge. The common ancestor is the most recent commit that main and develop share, from before they diverged. You can think of git history as a tree structure: git traces each branch back through its parent commits until it finds a commit that both branches have in common.

In fact, you can actually visualize the tree structure in the terminal using git log:

$ git log --graph --oneline main develop
Git log graph
Git Bash output

In the output above, the asterisks represent commits and the lines represent their ancestry. We can see my commit to main, and the commit to develop which branches out from main. We can trace the ancestry of both of my commits to the common ancestor with the hash af5e62ba8. Indeed, that's the same commit hash we see in the middle part of the conflict markers!
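If you only want the hash of the common ancestor, without reading the graph, git can report it directly with git merge-base. Here is a minimal sketch that builds a throwaway repository so it can be run anywhere; the file names, commit messages, and the "Example" identity are made up for illustration and are not from my fork.

```shell
# Throwaway repository; every name in here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any git version
git config user.name "Example"
git config user.email "example@example.com"

# The commit both branches will share.
echo "base" > file.txt
git add file.txt
git commit -q -m "common ancestor"
ancestor=$(git rev-parse HEAD)

# One commit on develop...
git checkout -q -b develop
echo "develop change" >> file.txt
git commit -q -am "change on develop"

# ...and one commit on main, so the branches diverge.
git checkout -q main
echo "main change" > other.txt
git add other.txt
git commit -q -m "change on main"

# Ask git for the best common ancestor of the two branch tips.
base=$(git merge-base main develop)
echo "merge base: $base"
echo "expected:   $ancestor"
```

In my repo, the equivalent command is simply git merge-base main develop, which should print the full form of the af5e62ba8 hash.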

Having the base code from the common ancestor makes it much easier to understand the actual changes in both branches because it gives us a common starting point for comparison. And if we go back and look at the "Result" file from Visual Studio, it appears to be using the common ancestor as the placeholder for the conflict. Now we are starting to answer some of our questions about how Visual Studio does what it does...

Let's keep digging.

Git Mergetool

My example conflict above is fairly easy to resolve by hand, especially once we have the base code as a reference. But not all merges will be that simple. Sometimes conflicts can be tens or hundreds of lines long. When the conflicts start getting too long, you need to put them side-by-side so you can see all the changes at once. This is where we get into merge tools.

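If you have not configured a merge tool, git mergetool picks a default from the tools it can find; in a terminal such as Git Bash, that is typically vimdiff. Here is what vimdiff looks like in Git Bash on Windows: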

$ git mergetool -t vimdiff
vimdiff merge tool

Here we see four panes with different versions of the AccountController.cs:

  1. LOCAL - "our" code, as it currently exists in the main branch
  2. BASE - the common ancestor between main and develop
  3. REMOTE - "their" code, from the incoming develop branch
  4. MERGED - the merged result, from git, containing the conflict markers
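The first three versions are not specific to vimdiff. While a file is conflicted, git keeps them in the index as numbered stages: 1 is BASE, 2 is LOCAL ("ours"), and 3 is REMOTE ("theirs"). Below is a small sketch, again using a throwaway repository with made-up file contents, that provokes a conflict and then reads each stage back out.

```shell
# Throwaway repository; file names and contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b develop
echo "theirs" > file.txt
git commit -q -am "develop edit"

git checkout -q main
echo "ours" > file.txt
git commit -q -am "main edit"

# The merge stops with a conflict, so a non-zero exit status is expected here.
git merge develop >/dev/null 2>&1 || echo "merge stopped with a conflict"

# List the unmerged entries: one line per stage (1, 2, 3) for file.txt.
git ls-files -u

# Read each stage straight out of the index.
stage_base=$(git show :1:file.txt)     # BASE
stage_local=$(git show :2:file.txt)    # LOCAL ("ours")
stage_remote=$(git show :3:file.txt)   # REMOTE ("theirs")
echo "BASE=$stage_base LOCAL=$stage_local REMOTE=$stage_remote"
```

The MERGED version is different: it is the file in the working tree, conflict markers and all, rather than an index stage.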

Git has automatically pulled each of these versions of AccountController.cs out of the index and passed them into the vimdiff program, which displays those versions side-by-side. I'm not as comfortable with vim as I wish I were, so I'm going to switch to a more user-friendly tool called Meld. See the post below for more on setting up Meld on Windows.

Meld Merge on Windows and Visual Studio - Setup and Configuration
I set out to standardize our tooling for diffing and merging code on our git repos so that everyone has the same experience when visualizing code differences, regardless of IDE or operating system. Ultimately I settled on Meld.

Here is a very basic configuration for meld:

$ git config --global mergetool.meld.cmd '"C:\Program Files (x86)\Meld\Meld.exe" "$LOCAL" "$BASE" "$REMOTE"'
$ git mergetool -t meld
Configuring and Running Meld on Windows
Meld Merge - No Auto-merging

Now we can see all three versions – LOCAL, BASE, and REMOTE – side-by-side with the differences highlighted. But notice that no auto-merge has been done here. The result file in the middle is identical to the BASE, because this basic configuration passes $BASE as the middle file.

We know that Git performed an auto-merge in the MERGED version, but that version is not represented here. In this example, we have to manually merge all of the differences, even the ones with no conflicts. Not very helpful!

But notice what happens if you click Changes > Merge All in the Meld menu:

Meld Merge - After Auto-merging

Aha! Meld has auto-merged to give us the same result that Visual Studio gives us. Now we can compare the result before and after auto-merging to try to understand what Meld is actually doing. Here is a deconstruction of the process:

  1. Find the common ancestor (the BASE) of the LOCAL and REMOTE versions
  2. For each part of the code where LOCAL made a change, if REMOTE did NOT make a change, then take the changes from LOCAL.
  3. For each part of the code where REMOTE made a change, if LOCAL did NOT make a change, then take the changes from REMOTE.
  4. If LOCAL and REMOTE both changed the same part of the code, take neither side and generate a conflict.

Take line 19 for example:

  • LOCAL: private ApplicationUserManager _userManager;
  • BASE: private ApplicationUserManager _userManager;
  • REMOTE: private ApplicationUserManager _applicationUserManager;

The LOCAL version from main matches the BASE version from the common ancestor. This means that there were no commits in LOCAL which changed that line of code.

The REMOTE version from develop does NOT match the BASE version. This means that someone made a change in develop, so the auto-merge is going to accept that change.
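You can watch the same rules play out without any GUI by handing three files to git merge-file, which runs git's three-way file merge on them. This is a sketch under my own assumptions: the three files are tiny stand-ins I made up (not the real AccountController.cs), with a few unchanged lines between each change so the changes stay in separate chunks.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# BASE: the common ancestor.
printf '%s\n' 'field = _userManager' a b c 'getter = original' d e f 'setter = original' > base.txt
# LOCAL: changes the getter and the setter.
printf '%s\n' 'field = _userManager' a b c 'getter = local' d e f 'setter = local' > local.txt
# REMOTE: renames the field and changes the getter.
printf '%s\n' 'field = _applicationUserManager' a b c 'getter = remote' d e f 'setter = original' > remote.txt

# -p prints the result instead of overwriting local.txt,
# -L sets the labels used in the conflict markers,
# and the exit status is the number of conflicts.
merged=$(git merge-file -p -L LOCAL -L BASE -L REMOTE local.txt base.txt remote.txt)
conflicts=$?

printf '%s\n' "$merged"
echo "conflicts: $conflicts"
```

The field rename should be taken from REMOTE (only REMOTE changed it), the setter change should be taken from LOCAL (only LOCAL changed it), and the getter, which both sides changed, should come back wrapped in conflict markers as the single reported conflict.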

This three-way comparison is how auto-merge works at a basic level in all of the tools I looked at. Visual Studio, Meld, and git itself follow this same algorithm. What makes each tool different is how it decides whether a chunk of code has changed. For example, some tools can be configured to ignore differences in white space or encoding type. But, for the most part, these merge tools are going to agree on how to auto-merge and where to present conflicts.
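Git itself has this kind of knob: the -Xignore-space-change strategy option tells the merge to treat lines that differ only in the amount of white space as unchanged. Here is a rough sketch in a throwaway repository (file contents are made up) where main only re-spaces a line and develop changes its value. A plain merge conflicts, while the white-space-tolerant merge should take develop's change.

```shell
# Throwaway repository; file names and contents are illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'a\nvalue = 1\nb\n' > f.txt
git add f.txt
git commit -q -m "base"

# develop makes a real change to the line.
git checkout -q -b develop
printf 'a\nvalue = 2\nb\n' > f.txt
git commit -q -am "change the value"

# main only changes the spacing on the same line.
git checkout -q main
printf 'a\nvalue   =   1\nb\n' > f.txt
git commit -q -am "re-space the line"

# A plain merge sees two different edits to one line and stops.
git merge --no-edit develop >/dev/null 2>&1
plain_status=$?
git merge --abort

# Ignoring white-space-only edits, only develop changed the line.
git merge -q --no-edit -Xignore-space-change develop
tolerant_status=$?

echo "plain merge exit status:    $plain_status"
echo "tolerant merge exit status: $tolerant_status"
cat f.txt
```

Related options such as -Xignore-all-space and -Xignore-space-at-eol exist as well; which one fits depends on how your team's editors treat white space.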

Conclusion

Let's revisit my questions from the beginning and try to answer them.

1. How does Visual Studio know which side to take when it AutoMerges? What exactly does "AutoMerge" mean?

A: Visual Studio is using the same basic auto-merge algorithm that Meld and git use. When you start a merge on a conflicted file, Visual Studio uses git to pull out the BASE version of the file and compares it with the two heads being merged together. Any change that only one of the heads made relative to the BASE gets auto-merged. Where both heads changed the same part of the code, neither side is taken and a conflict is raised instead.

2. Does git perform the AutoMerge? Or is Visual Studio doing it?

A: Git performs its own auto-merge and generates the MERGED version of the file, which becomes part of the working tree (when you open the conflicted file in a text editor, you are opening the MERGED version). Visual Studio also performs its own auto-merge when you start to resolve a conflict. VS does not use the MERGED result from git. VS creates its own result by comparing the branch versions with the common ancestor. Meld does the same thing when you run Merge All.

3. How is the Result file in the bottom pane generated? Is it created by git or by Visual Studio?

A: The Result file is the BASE version, the common ancestor, plus all of the differences introduced by the AutoMerge.

Understanding git merge is really important if you want to lead a large project. Especially when junior devs have questions about how this process works, you need to know what you are talking about. Now that I am more comfortable with the default merge algorithm and the terminology, I find myself reaching for more advanced merge options because I actually understand how they differ from each other. I also feel more comfortable managing my team's gitflow now because I know what will and will not happen when two branches are merged together.

Thanks for reading!