Your Git Submodule and You
(Pssst. Check out my Git Submodules Cheat Sheet for a quick reference.)
This post is the result of my investigations into how Git submodules work and how to use them. My goal in investigating submodules was to decide if they would be an effective way to share specs among the various Ruby FFI implementations (Ruby-FFI for MatzRuby, JRuby's FFI, Rubinius' FFI, etc.). We wanted all the projects to be able to include the specs as a subdirectory of their main repository so that they could easily run them, yet we also needed an easy way to keep all the projects in sync.
This guide is meant to give enough information to get started and to understand what's going on during day-to-day operation. There are many other resources to check out for more information, or if this guide "doesn't do it for you". Here are a few that I found useful:
- Git Book chapter on submodules
- "Git Submodules Screencast" at GitCasts
- "Understanding Git Submodules" by Fraser Speirs
- git-submodule man page (command reference)
The most important thing to understand about submodules is that they are essentially a more convenient way of putting a clone of a different Git repository into a subdirectory of your main repository's working copy. The core mechanism is essentially the same as running git clone from within your working copy of the main repository. The submodule repository doesn't become part of your main repository; it just happens to be stored in a subdirectory of your main repository's working copy.
The difference between just cloning the other repository and using a submodule is that submodules make it easier to maintain the relationship between the main repository and the submodule repository. For example, submodules make it easy for someone who has just cloned the main repository to also clone the submodule repository into the correct directory, and to check out the correct commit of the other repository, without the user needing to know any URLs or SHA1 hashes.
Submodules work by keeping a record in the main repository of the URL of the submodule repository, the path of the local directory where it should be cloned (e.g. "./my/submodule/"), and the SHA1 hash of the commit to check out in the submodule repository. The URL and path are saved in the .gitmodules file, and the commit hash is saved in one special entry per submodule at the submodule's path. Both are committed and version controlled in the main repository. No files from the submodule repository are actually stored in the main repository.
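To make that concrete, here is a minimal sketch that builds two throwaway repositories and looks at what the main repository actually stores. The repository names ("specs" and "main"), the demo identity, and the temp directory are all made up for illustration. The protocol.file.allow setting is only there because recent Git versions refuse to clone submodules from local paths by default; with a real remote URL you shouldn't need it.

```shell
#!/bin/sh
# Sketch only: builds two throwaway repositories in a temp directory.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Only needed because this demo clones the submodule from a local path.
allow='protocol.file.allow=always'

# A stand-in for the shared repository (hypothetical "specs" project).
git init -q "$work/specs"
echo 'puts "spec"' > "$work/specs/whatever.rb"
git -C "$work/specs" add whatever.rb
git -C "$work/specs" commit -q -m "Add whatever.rb"

# The main repository, with the specs embedded at my/submodule.
git init -q "$work/main"
cd "$work/main"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow" submodule --quiet add "$work/specs" my/submodule
git commit -q -m "Add specs as a submodule"

# The URL and path live in .gitmodules...
cat .gitmodules
# ...and the commit to check out is a special "160000 commit" tree entry.
gitlink=$(git ls-tree HEAD my/submodule)
echo "$gitlink"
```

The last line of output should start with "160000 commit", followed by the hash of the submodule commit. That entry is the only thing the main repository knows about the submodule's contents.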
- Users will not automatically get the submodule contents when they clone the main repository. A command must be run to do that: git submodule update --init. A script or Rake task could be provided to assist with that. In the case of specs being in a submodule, the task could even run automatically the first time they do rake spec.
- Users will not (necessarily) get the most recent revision from the submodule repository. Instead, they will get the revision of the submodule that is recorded in the revision of the main repository they are viewing. This is actually good, because it means users won't get incompatible submodule contents. But, it also means there is a bit of extra work to keep the reference in the main repository in sync with the submodule repository.
- A new commit in the submodule does not make a new commit in the main repository. Likewise, committing from the main repository will not cause a commit in the submodule. When the HEAD in the submodule has changed (i.e. a different commit is checked out), git status in the main repository will show that "my/submodule" has changed. To record the new reference in the main repository, commit that special file (git commit ./my/submodule, or git commit -a). The submodule should always be committed first, then the main repository committed to store the new reference.
- git status from within the main repository will not give you any hints that there are changed (but not committed) files in the submodule. You must do git status from within the submodule directory. Likewise, you cannot do git commit my/submodule/whatever.rb from within the main repository. You must cd into the submodule directory first. But if your IDE/editor has decent Git support, you should be able to commit from within the editor correctly.
- If someone else updates the reference in the main repository, you can use git submodule update to make the submodule check out the new revision. You would probably want to do some merging in the submodule (git submodule update --merge from the main repo, or git pull origin master from inside the submodule) to combine your submodule HEAD with the new reference. Or you could just do git submodule update (which makes a detached/branchless HEAD), then cd into the submodule and make a new branch to hold that HEAD.
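Here is a rough sketch of the first two points from a new user's point of view: clone the main repository, see that the submodule directory is empty, then run git submodule update --init. As before, the "specs" and "main" repositories, the demo identity, and the temp directory are invented for the example, and protocol.file.allow is only needed because everything is cloned from local paths.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # only needed for local-path submodules

# Setup: a "specs" repository, and a "main" repository that embeds it.
git init -q "$work/specs"
echo 'puts "spec"' > "$work/specs/whatever.rb"
git -C "$work/specs" add whatever.rb
git -C "$work/specs" commit -q -m "Add whatever.rb"
git init -q "$work/main"
git -C "$work/main" commit -q --allow-empty -m "Initial commit"
git -C "$work/main" -c "$allow" submodule --quiet add "$work/specs" my/submodule
git -C "$work/main" commit -q -m "Add specs as a submodule"

# What a new user sees: after a plain clone, my/submodule is an empty directory.
git clone -q "$work/main" "$work/clone"
cd "$work/clone"
if [ -e my/submodule/whatever.rb ]; then before=present; else before=missing; fi

# One command registers the submodule, clones it, and checks out the recorded commit.
git -c "$allow" submodule --quiet update --init
if [ -e my/submodule/whatever.rb ]; then after=present; else after=missing; fi
echo "whatever.rb before: $before, after: $after"

# The submodule ends up at exactly the commit recorded in the main repository.
recorded=$(git rev-parse HEAD:my/submodule)
checked_out=$(git -C my/submodule rev-parse HEAD)
echo "recorded:    $recorded"
echo "checked out: $checked_out"
```

The two hashes printed at the end should match, whatever the newest commit in the specs repository happens to be.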
Remember, the submodule is just another repository that happens to be in a subdirectory of your main working copy. So, you edit and manage it like any other repository. Committing, pulling, pushing, branching and merging, cherry picking, rebasing, etc. all work just like in any other repository.
- To add, edit, or remove files in the submodule, commit them from within the submodule directory.
- To get other people's changes from upstream (e.g. Github), pull (or fetch + merge) from within the submodule directory.
- To push to upstream, push from within the submodule.
- Etc. etc.
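Putting that together with the "submodule first, then main repository" rule, a day-to-day edit might look something like the sketch below. The repositories, file names, commit messages, and demo identity are hypothetical, and protocol.file.allow is only there because the demo clones from a local path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # only needed for local-path submodules

# Setup: a "specs" repository, and a "main" repository that embeds it.
git init -q "$work/specs"
echo 'puts "spec"' > "$work/specs/whatever.rb"
git -C "$work/specs" add whatever.rb
git -C "$work/specs" commit -q -m "Add whatever.rb"
git init -q "$work/main"
cd "$work/main"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow" submodule --quiet add "$work/specs" my/submodule
git commit -q -m "Add specs as a submodule"
old_ref=$(git rev-parse HEAD:my/submodule)

# 1. Edit and commit inside the submodule, like in any other repository.
echo '# typo fixed' >> my/submodule/whatever.rb
git -C my/submodule commit -q -am "Fixed a typo in whatever.rb"

# 2. Then record the submodule's new commit in the main repository.
git add my/submodule
git commit -q -m "Update specs reference"

new_ref=$(git rev-parse HEAD:my/submodule)
sub_head=$(git -C my/submodule rev-parse HEAD)
echo "old reference: $old_ref"
echo "new reference: $new_ref"
```

In a real project you would also push from inside the submodule before pushing the main repository, so that the commit the main repository points at actually exists upstream.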
git submodule update causes the submodule to fetch from upstream and switch to the commit that is recorded in the main repository. This is useful when someone changes the main repository's reference upstream. You'll also need to run it after changing branches in your main repository.
update normally makes a detached HEAD, so your submodule branches are unaffected. You can do git submodule update --merge if you want to merge it into your current submodule branch.
git submodule status prints out information about your submodule(s). The output looks like this:
05ed2a3751ccdcec51a782e026c2b97fb275f587 my/submodule (heads/master)
- If there's a "-" in front of the hash, you need to run git submodule init and git submodule update (or git submodule update --init to do both in one shot).
- If there's a "+" in front of the hash, it means the submodule repository's HEAD is different from the recorded one, so you should git commit my/submodule to record the new reference.
- If there's neither a "-" nor a "+" (like in the above example), you're okay and don't need to do anything.
- If there's no output at all, you don't have any submodules set up!
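The sketch below walks one clone through the three states so you can see the prefix change. Everything in it (the "specs" and "main" repositories, the demo identity, the empty "Local work" commit) is made up for the demonstration, and protocol.file.allow is only needed because the submodule is cloned from a local path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # only needed for local-path submodules

# Setup: a "specs" repository, a "main" repository that embeds it, and a clone.
git init -q "$work/specs"
echo 'puts "spec"' > "$work/specs/whatever.rb"
git -C "$work/specs" add whatever.rb
git -C "$work/specs" commit -q -m "Add whatever.rb"
git init -q "$work/main"
git -C "$work/main" commit -q --allow-empty -m "Initial commit"
git -C "$work/main" -c "$allow" submodule --quiet add "$work/specs" my/submodule
git -C "$work/main" commit -q -m "Add specs as a submodule"
git clone -q "$work/main" "$work/clone"
cd "$work/clone"

# Fresh clone, submodule not initialized yet: leading "-".
uninitialized=$(git submodule status)

# After update --init the checkout matches the recorded commit: leading space.
git -c "$allow" submodule --quiet update --init
in_sync=$(git submodule status)

# A new commit inside the submodule moves its HEAD away from the record: leading "+".
git -C my/submodule commit -q --allow-empty -m "Local work"
changed=$(git submodule status)

printf '%s\n' "$uninitialized" "$in_sync" "$changed"
```

Only the first character of each printed line matters here; the text in parentheses at the end of the line can vary between Git versions.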
git submodule summary prints out a summary of the difference between the submodule's HEAD and the one recorded in the main repository. The output looks like this:
* my/submodule 05ed2a3...f1d61b6 (2):
  < Added a new method to whatever.rb
  > Fixed a typo in foo.rb
Lines starting with "<" indicate commits that the reference commit has that your submodule HEAD doesn't have.
Lines starting with ">" indicate the reverse.
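If you want to see a ">" line for yourself, something like the following sketch should do it: commit inside the submodule without recording the new reference, then ask for the summary. The repositories, commit message, and demo identity are hypothetical, and protocol.file.allow is only needed because the demo clones from a local path.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
allow='protocol.file.allow=always'   # only needed for local-path submodules

# Setup: a "specs" repository, and a "main" repository that embeds it.
git init -q "$work/specs"
echo 'puts "spec"' > "$work/specs/whatever.rb"
git -C "$work/specs" add whatever.rb
git -C "$work/specs" commit -q -m "Add whatever.rb"
git init -q "$work/main"
cd "$work/main"
git commit -q --allow-empty -m "Initial commit"
git -c "$allow" submodule --quiet add "$work/specs" my/submodule
git commit -q -m "Add specs as a submodule"

# Commit in the submodule, but do NOT record the new reference in main yet.
echo '# typo fixed' >> my/submodule/whatever.rb
git -C my/submodule commit -q -am "Fixed a typo in whatever.rb"

# The submodule HEAD now has one commit that the recorded reference lacks.
summary=$(git submodule summary)
printf '%s\n' "$summary"
```

The summary should list the new commit on a ">" line, because the submodule's HEAD has it and the recorded reference does not.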
Git submodules can be a useful tool, but they're not always the right one.
It's unfortunate that they are not cloned automatically when the main repository is cloned, as it makes getting the full source more complicated and inconvenient for users. Likewise, because the submodule contents are not part of the main repository, automatic tarballs and zips (like you can download for Github projects) won't contain the submodule contents. It's likely that gems auto-generated on Github won't, either.
That means submodules are not a good fit for important code, unless the repository is meant only for yourself and other developers working on the project, but not for users to clone. Submodules would be excellent for cloning Rails plugins into your vendor folder, for example.
In the end, submodules are probably not right for the FFI specs. So, I think we'll be looking into other ways to keep things synced between projects.
But, submodules might be right for your project, and if this guide helps you understand them, then my time was well spent.