April 26, 2016 - Tom Hacohen

How to Retain History When Moving Files Between Git Repositories

As my friends and colleagues know, I think the history of a project is very important for development. Being able to bisect, blame or read the log has proven very useful for finding bugs and understanding why a piece of code was written the way it was. Therefore, it makes sense to do whatever is possible to preserve history when moving files across repositories.

Luckily for us, Git makes this extremely easy.

Merging Repositories

Merging a repository (bar) into another repository (foo) is easy: add bar as a remote of foo, fetch it, and merge its master branch.
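A minimal sketch of such a merge, wrapped in a function so the paths are explicit. The function name and both arguments are placeholders of mine, and I assume bar's main branch is called master. Note that Git 2.9 and later refuse to merge unrelated histories unless you pass --allow-unrelated-histories; older versions do not know the flag and do not need it.

```shell
# Sketch only: merge the full history of "bar" into "foo".
# $1 = path to the foo working copy, $2 = path or URL of bar (both hypothetical).
merge_bar_into_foo() {
    foo=$1
    bar=$2
    # Register bar as a remote of foo and download its history.
    git -C "$foo" remote add bar "$bar" &&
    git -C "$foo" fetch -q bar &&
    # Needed on Git >= 2.9 because foo and bar share no common ancestor.
    git -C "$foo" merge --no-edit --allow-unrelated-histories bar/master
}
```

For example: merge_bar_into_foo ~/src/foo ~/src/bar (hypothetical paths).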

This is it. It is very simple and retains all of the history from bar while maintaining the same commit hashes! This means that, for example, daed567e will point to the same commit in both foo and bar.

Unfortunately, it is not always that simple. Sometimes you may face conflicts: if, for example, both repositories contain a README file, the merge operation will fail. Luckily, this is also easy to solve.

First, abort the failed merge (if you already tried to merge):
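Aborting is a single command. The sketch below only wraps it in a function (the name is mine) so the repository path is explicit.

```shell
# Sketch only: abandon an in-progress merge and restore the pre-merge state.
# $1 = path to the foo working copy (hypothetical).
abort_failed_merge() {
    git -C "$1" merge --abort
}
```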

Now switch to a temporary branch that holds bar:
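Something along these lines should do, assuming bar was already added as a remote named bar and fetched. The branch name bar-tmp is a placeholder I picked for illustration.

```shell
# Sketch only: create and switch to a local branch that starts at bar's master.
# $1 = path to the foo working copy (hypothetical); "bar-tmp" is a made-up name.
checkout_bar_branch() {
    git -C "$1" checkout -q -b bar-tmp bar/master
}
```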

Now you can deal with the conflicting files in one of three ways: remove them, move all of bar into a directory such as bar_directory, or rename them individually. Commit the result on the temporary branch.
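As an illustration of the "move everything into a directory" option, here is a rough sketch. It assumes you are on the temporary branch with a clean working tree and that top-level file names contain no newlines or other unusual characters. For the other options, a plain git rm or git mv on the individual files is enough.

```shell
# Sketch only: move every tracked top-level entry into bar_directory/ and commit.
# $1 = path to the working copy, currently on the temporary branch (hypothetical).
move_bar_into_subdir() (
    cd "$1" || exit 1
    mkdir bar_directory || exit 1
    # ls-tree lists the top-level entries recorded in HEAD (files and directories).
    git ls-tree --name-only HEAD |
    while IFS= read -r entry; do
        git mv -- "$entry" bar_directory/ || exit 1
    done || exit 1
    git commit -q -m "Move bar into bar_directory"
)
```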

We can finally switch back to master and merge our branch again:
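Roughly like this, assuming the temporary branch is called bar-tmp (a name I made up) and the main branch of foo is master. As with the first merge, --allow-unrelated-histories is only needed, and only understood, on Git 2.9 and later.

```shell
# Sketch only: return to master and merge the cleaned-up temporary branch.
# $1 = path to the foo working copy (hypothetical); "bar-tmp" is a made-up name.
merge_temp_branch() {
    git -C "$1" checkout -q master &&
    git -C "$1" merge --no-edit --allow-unrelated-histories bar-tmp
}
```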

We’re done! Do not forget to push your changes.

Splitting Repositories

Splitting repositories is slightly more involved than merging them, because we would like to remove all of the unrelated files and commits from history so that our new repository is clean.

There are two approaches for this stage: the whitelist (we keep only a list of files) and the blacklist (we keep everything except a list of files). I prefer the whitelist approach, so I will only cover it in this article.

For this example we will split bar out of foobar.

Let us first start by switching to a temporary branch we can work on.
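For instance (the branch name split-tmp is only a placeholder), working on a separate branch leaves master untouched while history is being rewritten:

```shell
# Sketch only: create and switch to a scratch branch for the history rewrite.
# $1 = path to the foobar working copy (hypothetical); "split-tmp" is a made-up name.
start_split_branch() {
    git -C "$1" checkout -q -b split-tmp
}
```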

Now we need to decide which files we would like to preserve.


Optional: Retain files that have been renamed throughout history.

If we have a file called a that has been renamed to b at some point in history, we would like to preserve both a and b. A useful command to find all of the past names of a file is:
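One way to do this, as a sketch: git log --follow tracks a single file across renames, and --name-only prints the name it had in each commit. The function name and arguments are mine.

```shell
# Sketch only: print every name a file has had throughout history, one per line.
# $1 = path to the repository, $2 = current path of the file (both hypothetical).
past_names() {
    git -C "$1" log --follow --name-only --format=format: -- "$2" |
        sed '/^$/d' |   # drop the blank separator lines
        sort -u         # keep each name once
}
```

For example, past_names . b would list both a and b if a was renamed to b at some point.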

Just add both names of the file into the script we will create below.


 

Now we will create a script that moves the correct files into a new temporary directory and run it on all of our repository’s history.
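A sketch of what such a script could look like. The whitelist (README, bar.c, src/bar) is purely illustrative, so substitute your own files, including any past names you found. The function below just writes the script to a path of your choosing and makes it executable.

```shell
# Sketch only: generate a script that moves whitelisted paths into newroot/.
# $1 = where to write the script (hypothetical, e.g. /tmp/preserve.sh).
write_preserve_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Meant to run at the top of the checked-out tree of each commit.
mkdir -p newroot
for f in README bar.c src/bar; do   # hypothetical whitelist
    if [ -e "$f" ]; then
        # Recreate the parent directory so the layout is preserved.
        mkdir -p "newroot/$(dirname "$f")"
        mv "$f" "newroot/$f"
    fi
done
EOF
    chmod +x "$1"
}
```

The existence check matters because not every file exists in every commit; without it the script would fail on older revisions.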

Run this script on the project history:
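With git filter-branch this could look like the sketch below. I assume the script lives at an absolute path without spaces, because the filter command is evaluated inside a temporary directory. Recent Git versions print a warning that recommends git filter-repo instead and pause for a few seconds; setting FILTER_BRANCH_SQUELCH_WARNING=1 in the environment silences that.

```shell
# Sketch only: run a script against the checked-out tree of every commit on the
# current branch.
# $1 = path to the working copy, $2 = absolute path to the script (both hypothetical).
run_tree_filter() {
    # -f overwrites any backup refs left by an earlier filter-branch run.
    git -C "$1" filter-branch -f --prune-empty --tree-filter "$2" HEAD
}
```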

After that, our temporary branch should contain a directory called newroot that holds all of the files we wish to preserve. If we spot an issue, we can just reset the branch to its initial state (git reset --hard master) and try again. Otherwise, we can move to the next step: filtering the repository down to only this directory.
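The filtering step can be sketched with the --subdirectory-filter option of git filter-branch, which makes the given directory the new project root and drops commits that never touched it. The function name and argument are mine.

```shell
# Sketch only: rewrite the current branch so that newroot/ becomes the root.
# $1 = path to the working copy (hypothetical).
keep_only_newroot() {
    # -f is needed because the previous filter-branch run left backup refs behind.
    git -C "$1" filter-branch -f --prune-empty --subdirectory-filter newroot HEAD
}
```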

Assuming everything is correct, we can go on and push the branch to our new repository as master.
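A sketch of the push, assuming the rewritten branch is called split-tmp and the new, empty repository already exists. The remote name bar-origin and both arguments are placeholders.

```shell
# Sketch only: publish the rewritten branch as master of the new repository.
# $1 = path to the foobar working copy, $2 = URL or path of the new bar repository
# (both hypothetical); "bar-origin" and "split-tmp" are made-up names.
push_split_branch() {
    git -C "$1" remote add bar-origin "$2" &&
    # local-branch:remote-branch pushes split-tmp under the name master.
    git -C "$1" push -q bar-origin split-tmp:master
}
```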

That’s it! You have now split bar out of foobar. The last remaining thing to do is to delete the remaining bar-related files from our foobar repository and commit the changes.

Moving arbitrary files between repositories

Moving arbitrary files is very easy once you realize it is just a split from one repository followed by a merge into another. For this reason I will not elaborate further; just follow the two sections above.

Finishing notes

This is a very simple guide. In more complex cases you will probably have to write more complex scripts or use some optimization techniques. I suggest you also take a look at the slides for a talk I gave about migrating the Enlightenment project from SVN to git. They contain some useful tips and tricks, especially if you have a big project with a very rich history.

Please let me know if you encounter any issues or have any suggestions.

This article was originally posted on Tom’s personal blog, and has been approved to be posted here.

About Tom Hacohen

Tom has been using Linux since 2003. Previously a core developer and part of the leading team at SHR (Openmoko), he is currently a core developer for the EFL (www.enlightenment.org). He has also contributed to many other Open Source projects over the years. In 2010 he started working at Samsung's open source group on the Tizen Linux platform.

Image Credits: ZyMOS
