Your Git Submodule and You
(Pssst. Check out my Git Submodules Cheat Sheet for a quick reference.)
This post is the result of my investigations into how Git submodules work and how to use them. My goal in investigating submodules was to decide if they would be an effective way to share specs among the various Ruby FFI implementations (Ruby-FFI for MatzRuby, JRuby's FFI, Rubinius' FFI, etc.). We wanted all the projects to be able to include the specs as a subdirectory of their main repository so that they could easily run them, yet we also needed an easy way to keep all the projects in sync.
This guide is meant to give enough information to get started and to understand what's going on during day-to-day operation. There are many other resources to check out for more information, or if this guide "doesn't do it for you". Here are a few that I found useful:
- Git Book chapter on submodules
- "Git Submodules Screencast" at GitCasts
- "Understanding Git Submodules" by Fraser Speirs
- git-submodule man page (command reference)
Understanding Submodules
The most important thing to understand about submodules is that they are essentially a more convenient way of putting a clone of a different Git repository into a subdirectory of your main repository's working copy. So the core mechanism is the same as running git clone from within your working copy of the main repository. The submodule repository isn't part of your main repository; it just happens to be stored in a subdirectory of your main repository's working copy.
The difference between just cloning the other repository and using a submodule is that submodules make it easier to maintain the relationship between the main repository and the submodule repository. For example, submodules make it easy for someone who has just cloned the main repository to also clone the submodule repository into the correct directory and check out the correct commit of it, without needing to know any URLs or SHA1 hashes.
Submodules work by keeping a record in the main repository of the URL of the submodule repository, the path of the local directory where it should be cloned (e.g. "./my/submodule/"), and the SHA1 hash of the commit to check out in the submodule repository. The URL and path are saved in the .gitmodules file, and the commit hash is stored as a special entry (a "gitlink") at the submodule's path; both are committed and version controlled in the main repository. No files from the submodule repository are actually stored in the main repository.
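To see those records for yourself, here is a minimal sketch that builds a throwaway main repository with one submodule and then prints what the main repository actually stores. It assumes a POSIX shell and a reasonably recent Git (git -C needs 1.8.5 or later). The repository names, the temporary directory, and the demo identity are placeholders, and protocol.file.allow=always is there only because newer Git refuses to clone a submodule from a local path without it.

```shell
#!/bin/sh
# Sketch: create a main repository with one submodule in a temporary
# directory, then show what the main repository records about it.
# All names and paths here are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Demo identity so the commits work without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# The repository that will become the submodule (e.g. the shared specs).
git init -q "$tmp/specs"
echo 'describe "FFI" do; end' > "$tmp/specs/ffi_spec.rb"
git -C "$tmp/specs" add ffi_spec.rb
git -C "$tmp/specs" commit -q -m "Add FFI spec"

# The main repository, with the specs cloned in as a submodule.
# protocol.file.allow=always lets newer Git clone a submodule from a local path.
git init -q "$tmp/main"
cd "$tmp/main"
git -c protocol.file.allow=always submodule --quiet add "$tmp/specs" my/submodule
git commit -q -m "Add specs as a submodule"

# 1. URL and path are recorded in .gitmodules, an ordinary committed file.
git config -f .gitmodules --get-regexp '^submodule\.'
# 2. The commit to check out is a "gitlink" tree entry with mode 160000.
git ls-tree HEAD my/submodule
```

The first command lists the submodule.my/submodule.path and submodule.my/submodule.url settings; the second shows a single tree entry of type commit with mode 160000, whose hash is the pinned submodule commit.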
Practical Considerations
- Users will not automatically get the submodule contents when they clone the main repository. A command must be run to do that: git submodule update --init. A script or Rake task could be provided to assist with that. In the case of specs being in a submodule, the task could even run automatically the first time they do rake spec.
- Users will not (necessarily) get the most recent revision from the submodule repository. Instead, they will get the revision of the submodule that is recorded in the revision of the main repository they are viewing. This is actually good, because it means users won't get incompatible submodule contents. But it also means there is a bit of extra work to keep the reference in the main repository in sync with the submodule repository.
- A new commit in the submodule does not make a new commit in the main repository. Likewise, committing from the main repository will not cause a commit in the submodule. When the HEAD in the submodule has changed (i.e. a different commit is checked out), git status in the main repository will show that "my/submodule" has changed. To record the new reference in the main repository, commit that special file (git commit ./my/submodule, or git commit -a). The submodule should always be committed first, then the main repository committed to store the new reference.
- Using git status from within the main repository will not tell you which files have changed (but not been committed) in the submodule; at most, newer versions of Git note that the submodule has "modified content". You must run git status from within the submodule directory to see the details. Likewise, you cannot run git commit my/submodule/whatever.rb from within the main repository; you must cd into the submodule directory first. But if your IDE/editor has decent Git support, you should be able to commit from within the editor correctly.
- If someone else updates the reference in the main repository, you can use git submodule update to make the submodule check out the new revision. You would probably want to do some merging in the submodule (git submodule update --merge from the main repository, or git pull origin master from inside the submodule) to combine your submodule HEAD with the new reference. Or you could just run git submodule update (which makes a detached, branchless HEAD), then cd into the submodule and make a new branch to hold that HEAD.
- Remember, the submodule is just another repository that happens to be in a subdirectory of your main working copy. So you edit and manage it like any other repository. Committing, pulling, pushing, branching and merging, cherry-picking, rebasing, etc. all work just like in any other repository.
- To add, edit, or remove files in the submodule, commit them from within the submodule directory.
- To get other people's changes from upstream (e.g. Github), pull (or fetch + merge) from within the submodule directory.
- To push to upstream, push from within the submodule.
- Etc. etc.
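The day-to-day cycle described in this section (fresh clone, init, commit inside the submodule on a branch, then record the new reference in the main repository) can be sketched as a script run against throwaway repositories. This is an illustration under assumptions, not a required workflow: it assumes a POSIX shell and Git 1.8.5 or later, the repository names, branch name, and demo identity are placeholders, and protocol.file.allow=always is needed only because newer Git blocks submodule clones from local paths by default.

```shell
#!/bin/sh
# Sketch of the day-to-day submodule cycle against throwaway repositories.
# Names, paths, the branch name, and the demo identity are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Fixture: a specs repository, and a main repository that uses it as a submodule.
# protocol.file.allow=always lets newer Git clone a submodule from a local path.
git init -q "$tmp/specs"
echo 'first spec' > "$tmp/specs/spec.rb"
git -C "$tmp/specs" add spec.rb
git -C "$tmp/specs" commit -q -m "First spec"
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/specs" my/submodule
git -C "$tmp/main" commit -q -m "Add specs submodule"

# 1. A fresh clone has an empty my/submodule until it is initialized and updated.
git clone -q "$tmp/main" "$tmp/work"
cd "$tmp/work"
git -c protocol.file.allow=always submodule --quiet update --init

# 2. Work inside the submodule like any other repository: make a branch
#    first (update left a detached HEAD), then commit there.
(
  cd my/submodule
  git checkout -q -b fix-typo
  echo 'second spec' >> spec.rb
  git commit -q -am "Fix typo in spec"
)

# 3. The main repository now sees a new submodule HEAD ("+" prefix) ...
git submodule status
# ... so record the new reference there too (git commit -a would also pick it up).
git add my/submodule
git commit -q -m "Update specs submodule reference"
git submodule status
```

The first git submodule status line starts with "+" because the submodule's HEAD has moved; after the commit in the main repository, the prefix is gone.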
Useful Commands
- git submodule update
  Makes the submodule fetch from upstream (if it doesn't already have the needed commit) and switch to the commit that is recorded in the main repository. This is useful when someone changes the main repository's reference upstream. You'll also need to run it after switching branches in your main repository. By default, update leaves the submodule on a detached HEAD, so your submodule branches are unaffected. You can run git submodule update --merge if you want to merge it into your current submodule branch instead.
- git submodule status
  Prints out information about your submodule(s). The output looks like this:
    05ed2a3751ccdcec51a782e026c2b97fb275f587 my/submodule (heads/master)
  - If there's a "-" in front of the hash, you need to run git submodule init and git submodule update, or git submodule update --init to do both in one shot.
  - If there's a "+" in front of the hash, it means the submodule repository's HEAD is different from the recorded one, so you should run git commit my/submodule to record the new reference.
  - If there's neither a "-" nor a "+" (like in the above example), you're okay and don't need to do anything.
  - If there's no output at all, you don't have any submodules set up!
- git submodule summary
  Prints out a summary of the difference between the submodule's HEAD and the one recorded in the main repository. The output looks like this:
    * my/submodule 05ed2a3...f1d61b6 (2):
      < Added a new method to whatever.rb
      > Fixed a typo in foo.rb
  "<" lines indicate commits that the reference commit has but your submodule HEAD doesn't have. ">" lines indicate the reverse.
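The rules for reading git submodule status can be captured in a small shell function. This is a hypothetical helper (explain_submodule_status is our own name, not part of Git) that reads status lines on standard input and prints the corresponding advice. It also handles the "U" prefix, which Git's documentation uses for submodules with merge conflicts.

```shell
#!/bin/sh
# Hypothetical helper: explain each line of "git submodule status" output.
# Usage: git submodule status | explain_submodule_status
explain_submodule_status() {
  seen=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue   # skip blank lines
    seen=1
    # The first field is the (possibly prefixed) SHA1; the second is the path.
    path=$(printf '%s\n' "$line" | awk '{print $2}')
    case "$line" in
      -*) echo "$path: not initialized; run git submodule update --init" ;;
      +*) echo "$path: HEAD differs from the recorded commit; run git commit $path" ;;
      U*) echo "$path: merge conflicts in the submodule" ;;
      *)  echo "$path: ok" ;;   # leading space: checked out as recorded
    esac
  done
  # No lines at all means no submodules are configured.
  [ "$seen" -eq 1 ] || echo "no submodules set up"
}
```

From the top of a working copy you might run it as git submodule status | explain_submodule_status.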
Conclusions
Git submodules can be a useful tool, but they're not always the right one.
It's unfortunate that they are not cloned automatically when the main repository is cloned, as it makes getting the full source more complicated and inconvenient for users. Likewise, because the submodule contents are not part of the main repository, automatic tarballs and zips (like you can download for Github projects) won't contain the submodule contents. It's likely that gems auto-generated on Github won't, either.
That means submodules are not a good fit for important code unless the repository is meant only for you and the other developers working on the project, not for users to clone. Submodules would be excellent for cloning Rails plugins into your vendor folder, for example.
In the end, submodules are probably not right for the FFI specs. So, I think we'll be looking into other ways to keep things synced between projects.
But, submodules might be right for your project, and if this guide helps you understand them, then my time was well spent.
Comments
Thanks for helping me understand the basics of submodules:
a) Users will not (necessarily) get the most recent revision from the submodule repository. Instead, they will get the revision of the submodule that is recorded in the revision of the main repository they are viewing.
b) To get other people's changes from upstream (e.g. Github), pull (or fetch + merge) from within the submodule directory.
I was lost without these.
Nice guidelines!
I wonder: is there any way to have a super-repository always point to the HEAD of a submodule?
Developers tend to avoid the extra work of advancing/updating references to submodules after committing their own changes into a submodule. Anyway, we're developing on the "trunk" ("main branch") of both the main sources and the submodules...
Hi,
In your nice git submodule cheatsheet in the section "Edit and commit files in your submodule" I'm missing the checkout of a branch.
I'm not a pro, but committing on a detached head doesn't sound like a good idea.
Just my 2 Cents,
Klaus
Hi
Cloning is not working properly for submodules.
I have a main repo and added two repos as submodules. Inside those submodules I also have some repos.
For example: Repo main -> submodule repo A (A has two repos as submodules, B and C).
While cloning (git clone --recursive git @IP…:Repo main)
I am able to see submodule repo A, but it is not listing the submodule repos inside A (B and C).
But I am able to clone submodule repos B and C separately.
Thanks for the post John, helped as a novice get into understanding this submodule "thing". :)