Git Merge Deep Dive
In my last post I outlined how my team standardized our process for diffing and merging code by configuring Meld for use in Visual Studio and other IDEs. During this process, I did a lot of research on git-merge and came away with several key insights that I want to document in this post.
I have a lot of experience with git, and because I follow a pretty strict gitflow policy, I rarely experience issues with merging. But when other developers, especially the juniors, approached me with questions about why their merge wasn't making sense, I never really had good answers. I gave them my naïve assumptions about how merging works, but I couldn't explain specifically why Visual Studio picked one side of the diff or the other.
I started my research by trying to reverse engineer what Visual Studio does when you merge and resolve conflicts. That led me to explore some other tools, like Meld, and to experiment with some of the optional merge strategies within git. Here is what I found.
Setup
For the following examples I will be using my own fork of the official ASP.NET MVC demo project. I started by cloning it and checking out a new branch called develop.
$ git clone https://github.com/akmolina28/AspNetDocs.git
$ cd AspNetDocs
$ git checkout -b develop
First, I committed a simple change in the AccountController that renames a variable.
Next, I checked out the main branch and made some changes in the same file, which will conflict with my changes in the develop branch.
$ git checkout main
Now, when I initiate a merge from develop into main, I would expect conflicts because some of the same code was changed in both branches. Below is the result.
$ git merge develop
Auto-merging aspnet/mvc/overview/getting-started/introduction/sample/MvcMovie/MvcMovie/Controllers/AccountController.cs
CONFLICT (content): Merge conflict in aspnet/mvc/overview/getting-started/introduction/sample/MvcMovie/MvcMovie/Controllers/AccountController.cs
Automatic merge failed; fix conflicts and then commit the result.
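If you want to reproduce this kind of conflict without cloning the demo project, here is a minimal sketch in a throwaway repository. The file name demo.txt and its contents are hypothetical stand-ins for AccountController.cs, and `git init -b` assumes Git 2.28 or later.

```shell
# Throwaway repository; demo.txt and its contents are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main                      # -b needs Git 2.28 or later
git config user.name "Demo"
git config user.email "demo@example.com"

# The commit both branches will share.
printf 'line one\nline two\nline three\n' > demo.txt
git add demo.txt
git commit -q -m "base"

# Change line two on develop...
git checkout -q -b develop
printf 'line one\nline two (develop)\nline three\n' > demo.txt
git commit -q -am "develop edit"

# ...and change the same line differently on main.
git checkout -q main
printf 'line one\nline two (main)\nline three\n' > demo.txt
git commit -q -am "main edit"

# Both branches touched the same line, so the merge stops with a conflict.
if git merge develop; then
  echo "merged cleanly"
else
  echo "merge stopped with conflicts"
fi
```

The repository is left in the conflicted state, so the commands in the later sections can be tried against it.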
Resolving Conflicts in Visual Studio
Visual Studio ships with Microsoft's built-in tool for diffing and merging, vsdiffmerge. When Visual Studio is installed and configured for git, it will automatically configure vsdiffmerge as the global mergetool for resolving merge conflicts. Here is what happens when you try to resolve the file that I set up in the previous section.
If you have experience with git in Visual Studio, this result should feel very familiar. Visual Studio does a good job of figuring out which changes to take from either side. Here is what the conflict looks like, as expected:
Let's take stock of what we have at this point. The left side has our incoming version from develop, the right side has our local version in main, and at the bottom is the result of our merge, where we make our final changes. Five changes were auto-merged, which Visual Studio shows with the pre-checked, green selections. The two conflicting changes are highlighted in red.
At this point, I immediately have several questions:
- How does Visual Studio know which side to take when it AutoMerges? What exactly does "AutoMerge" mean?
- Does git perform the AutoMerge? Or is Visual Studio doing it?
- How is the Result file in the bottom pane generated? Is it created by git or by Visual Studio?
In the next sections I will deconstruct the merge process to try to answer these questions.
Back to Basics
For now, let's step away from Visual Studio and understand how to resolve this merge at the most basic level. I will abort the merge-in-progress and start over, just in case Visual Studio changed something.
$ git merge --abort
$ git merge develop
We have generated the same conflict once again. Let's see what git did with AccountController.cs by looking at the file in a text editor:
It appears that git auto-merged the non-conflicting differences in the same way that Visual Studio did. So, git is definitely doing its own auto-merging here. All of the changes from both branches are reflected in the merged result, except for the conflict where we see the merge markers.
Between <<<<<<< and ======= we have our code from main. And between ======= and >>>>>>> we have our changes from develop. My job here is simple: edit the code by hand and resolve both changes. I am familiar with the code in both branches, so this conflict is trivial for me.
If, however, I were not familiar with both of these changes, I might be confused and unsure how to resolve this conflict. This is very common on large teams, or on long-running projects where several weeks pass before code gets merged. According to the git docs, you can add more context here by using the diff3 conflict style. Let's see what that does.
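The diff3 style is switched on through the merge.conflictStyle setting. Here is a small sketch in a scratch repository (demo.txt and its contents are hypothetical, and `git init -b` assumes Git 2.28 or later). In a real project you would abort the merge in progress, set the option, and merge again.

```shell
# Scratch repository with one conflicting line; names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'alpha\nbeta\ngamma\n' > demo.txt
git add demo.txt
git commit -q -m "base"

git checkout -q -b develop
printf 'alpha\nbeta from develop\ngamma\n' > demo.txt
git commit -q -am "develop edit"

git checkout -q main
printf 'alpha\nbeta from main\ngamma\n' > demo.txt
git commit -q -am "main edit"

# Ask git to include the common ancestor's text inside each conflict hunk.
# Add --global to apply the setting to every repository.
git config merge.conflictStyle diff3

git merge develop || true   # expected to stop with a conflict
cat demo.txt                # the hunk now has a ||||||| section holding "beta"
```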
Here is what the conflict looks like now:
Now there is a third version of the code between the main and develop versions. This version is called the "common ancestor", also referred to as the BASE in the context of a merge. The common ancestor is the most recent commit that both main and develop descend from, that is, the last version before they diverged. You can think of git history as a tree structure: git traces each branch back, commit by commit, until it finds a version they share.
In fact, you can actually visualize the tree structure in the terminal using git log:
In the output above, the stars represent commits and the lines represent their ancestry. We can see my commit to main, and the commit to develop which branches out from main. We can trace the ancestry of both of my commits to the common ancestor with the hash af5e62ba8. Indeed, that's the same commit hash we see in the middle part of the conflict markers!
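A graph like this can be produced with the --graph option of git log, and git merge-base prints the common ancestor directly. The sketch below uses a scratch repository with hypothetical file names, and `git init -b` assumes Git 2.28 or later. The exact log options used for the article's output are not known, so treat these as one reasonable choice.

```shell
# Scratch repository where main and develop diverge after one shared commit.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'shared\n' > shared.txt
git add shared.txt
git commit -q -m "common ancestor"

git checkout -q -b develop
printf 'develop\n' > develop.txt
git add develop.txt
git commit -q -m "work on develop"

git checkout -q main
printf 'main\n' > main.txt
git add main.txt
git commit -q -m "work on main"

# Stars are commits, lines are ancestry; --all includes every branch.
git log --graph --oneline --all

# Print the commit that git will use as the BASE when merging the two branches.
base=$(git merge-base main develop)
git log -1 --oneline "$base"
```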
Having the base code from the common ancestor makes it much easier to understand the actual changes in both branches because it gives us a common starting point for comparison. And if we go back and look at the "Result" file from Visual Studio, it appears to be using the common ancestor as the placeholder for the conflict. Now we are starting to answer some of our questions about how Visual Studio does what it does...
Let's keep digging.
Git Mergetool
My example conflict above is fairly easy to resolve by hand, especially once we have the base code as a reference. But not all merges will be that simple. Sometimes conflicts can be tens or hundreds of lines long. When the conflicts start getting too long, you need to put them side-by-side so you can see all the changes at once. This is where we get into merge tools.
Git for Windows bundles vim, so vimdiff is available as a mergetool out of the box. Here is what vimdiff looks like in Git Bash on Windows:
$ git mergetool -t vimdiff
Here we see four panes with different versions of the AccountController.cs:
- LOCAL - "our" code, as it currently exists in the main branch
- BASE - the common ancestor between main and develop
- REMOTE - "their" code, from the incoming develop branch
- MERGED - the merged result, from git, containing the conflict markers
Git has automatically pulled the LOCAL, BASE, and REMOTE versions of AccountController.cs out of the index and passed them, along with the MERGED file from the working tree, into the vimdiff program, which displays those versions side-by-side. I'm not as comfortable with vim as I wish I were, so I'm going to switch to a more user-friendly tool called Meld. (See my previous post for more on setting up Meld on Windows.)
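To make "out of the index" concrete: while a file is conflicted, the index holds up to three numbered stages for it, where stage 1 is the BASE, stage 2 is LOCAL ("ours"), and stage 3 is REMOTE ("theirs"). The sketch below extracts them by hand in a scratch repository. The file names are hypothetical, and `git init -b` assumes Git 2.28 or later.

```shell
# Scratch repository with a conflicted merge in progress.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'alpha\nbeta\ngamma\n' > demo.txt
git add demo.txt
git commit -q -m "base"

git checkout -q -b develop
printf 'alpha\nbeta from develop\ngamma\n' > demo.txt
git commit -q -am "develop edit"

git checkout -q main
printf 'alpha\nbeta from main\ngamma\n' > demo.txt
git commit -q -am "main edit"

git merge develop || true   # expected to stop with a conflict

# List the unmerged index entries: one line per stage (1, 2, 3).
git ls-files -u

# Write each stage to its own file, roughly what git mergetool does for you.
git show :1:demo.txt > demo.BASE.txt     # common ancestor
git show :2:demo.txt > demo.LOCAL.txt    # "ours" (main)
git show :3:demo.txt > demo.REMOTE.txt   # "theirs" (develop)
```

The conflicted demo.txt in the working tree plays the role of MERGED.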
Here is a very basic configuration for Meld:
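A minimal sketch of such a configuration, assuming Meld is installed and on the PATH. This is not necessarily the exact setup used for the screenshots. It is applied to a scratch repository here; add --global to each command to apply it everywhere.

```shell
# Scratch repository so the settings stay local to this example.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Meld is one of the tools git mergetool knows how to launch by name.
git config merge.tool meld

# Skip the "Hit return to start merge resolution tool" prompt.
git config mergetool.prompt false

# Optional, on Git versions that support it: ask Meld to auto-merge the
# non-conflicting hunks on startup instead of showing the plain BASE.
# git config mergetool.meld.useAutoMerge true

git config --get merge.tool
```

With this in place, running git mergetool on a conflicted file opens Meld with LOCAL, BASE, and REMOTE side by side.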
Now we can see all three versions – LOCAL, BASE, and REMOTE – side-by-side with the differences highlighted. But, notice that no auto-merge has been done here. The result file in the middle is completely identical to the BASE.
We know that Git performed an auto-merge in the MERGED version, but that version is not represented here. In this example, we have to manually merge all of the differences, even the ones with no conflicts. Not very helpful!
But notice what happens if you click Changes > Merge All in the Meld menu:
Aha! Meld has auto-merged to give us the same result that Visual Studio gives us. Now we can compare the result before and after auto-merging to try to understand what Meld is actually doing. Here is a deconstruction of the process:
- Find the common ancestor (the BASE) of the LOCAL and REMOTE versions
- For each part of the code where LOCAL made a change, if REMOTE did NOT make a change, then take the changes from LOCAL.
- For each part of the code where REMOTE made a change, if LOCAL did NOT make a change, then take the changes from REMOTE.
- If LOCAL and REMOTE both changed the same part of the code, take neither side and generate a conflict.
Take line 19 for example:
- LOCAL: private ApplicationUserManager _userManager;
- BASE: private ApplicationUserManager _userManager;
- REMOTE: private ApplicationUserManager _applicationUserManager;
The LOCAL version from main matches the BASE version from the common ancestor. This means that there were no commits in LOCAL which changed that line of code.
The REMOTE version from develop does NOT match the BASE version. This means that someone made a change in develop, so the auto-merge is going to accept that change.
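You can watch these rules play out on three loose files with git merge-file, which runs git's three-way file merge without needing a repository. In this sketch only the _userManager line comes from the example above. The other lines and the file names are hypothetical filler.

```shell
work=$(mktemp -d)
cd "$work"

# BASE: the common ancestor. Only line 1 comes from the article.
cat > base.cs <<'EOF'
private ApplicationUserManager _userManager;
// filler line 1
// filler line 2
public string ReturnUrl;
EOF

# LOCAL (main): leaves line 1 alone, changes the last line.
sed 's/ReturnUrl/RedirectUrl/' base.cs > local.cs

# REMOTE (develop): renames the field on line 1, leaves the last line alone.
sed 's/_userManager/_applicationUserManager/' base.cs > remote.cs

# Argument order is <current> <base> <other>; -p prints the result
# instead of overwriting local.cs. Each side's lone change is accepted.
merged=$(git merge-file -p local.cs base.cs remote.cs)
echo "$merged"

# Now let REMOTE change the last line as well. Both sides touched it, so
# merge-file emits conflict markers and exits with the number of conflicts.
sed 's/ReturnUrl/CallbackUrl/' remote.cs > remote2.cs
conflicts=0
git merge-file -p local.cs base.cs remote2.cs > conflicted.cs || conflicts=$?
echo "conflicts: $conflicts"
```

The first merge takes line 1 from REMOTE and the last line from LOCAL. The second still takes line 1 from REMOTE but leaves the last line as a conflict.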
This is how auto-merge works, at a basic level, in every tool I looked at. Visual Studio, Meld, and git itself follow this same algorithm. What makes each tool different is how it decides that a chunk of code has changed. For example, some tools can be configured to ignore differences in whitespace or encoding. But, for the most part, every merge tool is going to agree on how to auto-merge and where to present conflicts.
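Git's own merge exposes this kind of setting through strategy options such as -X ignore-space-change. Here is a small sketch in a scratch repository. The file name and contents are hypothetical, and `git init -b` assumes Git 2.28 or later. Without the option, the same merge would stop with a conflict, because both branches edited the same line.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'int total = a + b;\n' > demo.c
git add demo.c
git commit -q -m "base"

# develop makes a real change to the line.
git checkout -q -b develop
printf 'int total = a + b + c;\n' > demo.c
git commit -q -am "develop: add c"

# main only changes the spacing on the same line.
git checkout -q main
printf 'int  total  =  a  +  b;\n' > demo.c
git commit -q -am "main: respace"

# With changes in the amount of whitespace ignored, main counts as
# unchanged on this line, so develop's edit is taken without a conflict.
git merge -q --no-edit -X ignore-space-change develop
cat demo.c
```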
Conclusion
Let's revisit my questions from the beginning and try to answer them.
1. How does Visual Studio know which side to take when it AutoMerges? What exactly does "AutoMerge" mean?
A: Visual Studio is using the same basic auto-merge algorithm that every merge tool uses, including git. When you start a merge on a conflicted file, Visual Studio uses git to pull out the BASE version of the file and compares it with the two heads being merged together. Any region that differs from the BASE in only one of the heads gets auto-merged; a region that both heads changed becomes a conflict.
2. Does git perform the AutoMerge? Or is Visual Studio doing it?
A: Git performs its own auto-merge and generates the MERGED version of the file, which becomes part of the working tree (when you open the conflicted file in a text editor, you are opening the MERGED version). Visual Studio also performs its own auto-merge when you start to resolve a conflict. VS does not use the MERGED result from git. VS creates its own result by comparing the branch versions with the common ancestor. Meld works the same way.
3. How is the Result file in the bottom pane generated? Is it created by git or by Visual Studio?
A: The Result file is the BASE version, the common ancestor, plus all of the differences introduced by the AutoMerge.
Understanding git merge is really important if you want to lead a large project. Especially when junior devs have questions about how this process works, you need to know what you are talking about. Now that I am more comfortable with the default merge algorithm and the terminology, I find myself reaching for more advanced merge options because I actually understand how they differ from each other. I also feel more comfortable managing my team's gitflow now because I know what will and will not happen when two branches are merged together.
Thanks for reading!