Your Git Submodule and You

(Pssst. Check out my Git Submodules Cheat Sheet for a quick reference.)

This post is the result of my investigations into how Git submodules work and how to use them. My goal in investigating submodules was to decide if they would be an effective way to share specs among the various Ruby FFI implementations (Ruby-FFI for MatzRuby, JRuby's FFI, Rubinius' FFI, etc.). We wanted all the projects to be able to include the specs as a subdirectory of their main repository so that they could easily run them, yet we also needed an easy way to keep all the projects in sync.

This guide is meant to give enough information to get started and to understand what's going on during day-to-day operation. There are many other resources to check out for more information, or if this guide "doesn't do it for you". Here are a few that I found useful:

Understanding Submodules

The most important thing to understand about submodules is that they are essentially a more convenient way of putting a clone of a different Git repository into a subdirectory of your main repository's working copy. The core mechanism is much the same as running git clone <repository URL> ./my/submodule/ from within your working copy of the main repository. The submodule repository does not become part of your main repository; it just happens to be stored in a subdirectory of your main repository's working copy.

The difference between just cloning the other repository and using a submodule is that submodules make it easier to maintain the relationship between the main repository and the submodule repository. For example, submodules make it easy for someone who has just cloned the main repository to also clone the submodule repository into the correct directory, and to check out the correct commit of the other repository, without the user needing to know any URLs or SHA1 hashes.
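To make the newcomer's side of this concrete, here is a minimal sketch that builds a throwaway main repository with one submodule and then clones it the way a new user could. Everything in it is a placeholder: the directory names (specs, main, fresh), the demo identity, and the use of local directories in place of real URLs. The protocol.file.allow setting is there only because of those local directories; recent Git versions refuse local-path submodule clones without it. The --recursive flag on git clone is one way to fetch the submodule in the same step.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every name and path below is a placeholder.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because local directories stand in for real URLs here.
allow="-c protocol.file.allow=always"

# A stand-in for the shared repository (for example, the specs).
git init -q specs
echo "spec" > specs/example_spec.rb
git -C specs add example_spec.rb
git -C specs commit -q -m "Add a spec"

# A main repository that includes it as a submodule.
git init -q main
git -C main commit -q --allow-empty -m "Initial commit"
git -C main $allow submodule add "$work/specs" my/submodule
git -C main commit -q -m "Add specs as a submodule"

# What a newcomer runs: one clone, with no submodule URL or SHA1 typed by hand.
git $allow clone -q --recursive main fresh

# The submodule is in place, checked out at the recorded commit.
git -C fresh submodule status
```

Without --recursive, the my/submodule directory in the fresh clone would be empty until git submodule update --init is run.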

Submodules work by keeping a record in the main repository of the URL of the submodule repository, the path of the local directory where it should be cloned (e.g. "./my/submodule/"), and the SHA1 hash of the commit to check out in the submodule repository. These details are saved in certain files (.gitmodules, plus one other file per submodule) in the main repository, and are committed and version controlled. No files from the submodule repository are actually stored in the main repository.
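The record described above can be inspected directly. The following is a rough sketch, not a recipe: the repository names, the demo identity, and the local directory used as the submodule "URL" are all assumptions made so the script can run offline (which is also the only reason for the protocol.file.allow setting). It adds a submodule and then prints the two places where Git keeps the details.

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every name and path below is a placeholder.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because a local directory stands in for a real URL here.
allow="-c protocol.file.allow=always"

# A stand-in for the repository that will become the submodule.
git init -q specs
echo "spec" > specs/example_spec.rb
git -C specs add example_spec.rb
git -C specs commit -q -m "Add a spec"

# The main repository.
git init -q main
git -C main commit -q --allow-empty -m "Initial commit"

# Register the submodule and commit the record of it.
git -C main $allow submodule add "$work/specs" my/submodule
git -C main commit -q -m "Add specs as a submodule"

# 1. The URL and the local path are stored in .gitmodules.
cat main/.gitmodules

# 2. The commit to check out is stored as a special entry with mode 160000
#    at the submodule's path, rather than as the submodule's files.
git -C main ls-files --stage my/submodule
```

The second command prints a single entry for my/submodule whose hash is the submodule commit, which is why none of the submodule's own files appear in the main repository's history.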

Practical Considerations

  1. Users will not automatically get the submodule contents when they clone the main repository. A command must be run to do that: git submodule update --init. A script or Rake task could be provided to assist with that. In the case of specs being in a submodule, the task could even run automatically when they do rake spec the first time.
  2. Users will not (necessarily) get the most recent revision from the submodule repository. Instead, they will get the revision of the submodule that is recorded in the revision of the main repository they are viewing. This is actually good, because it means users won't get incompatible submodule contents. But, it also means there is a bit of extra work to keep the reference in the main repository in sync with the submodule repository.
  3. A new commit in the submodule does not make a new commit in the main repository. And likewise, committing from the main repository will not cause a commit in the submodule. When the HEAD in the submodule has changed (i.e. a different commit is checked out), git status will also show that "my/submodule" has changed. To record the new reference in the main repository, commit that special file (git commit ./my/submodule, or git commit -a). The submodule should always be committed first, then the main repository committed to store the new reference.
  4. In older versions of Git, running git status from within the main repository gives no hint that there are changed (but not committed) files in the submodule. More recent versions flag the submodule as having modified or untracked content, but still do not list the individual files. To see them, run git status from within the submodule directory. Likewise, you cannot do git commit my/submodule/whatever.rb from within the main repository. You must cd into the submodule directory first. But if your IDE/editor has decent Git support, you should be able to commit from within the editor correctly.
  5. If someone else updates the reference in the main repository, you can use git submodule update to make the submodule check out the new revision. You would probably want to do some merging in the submodule (git submodule update --merge from the main repo, or git pull origin master from inside the submodule) to combine your submodule HEAD with the new reference. Or you could just do git submodule update (which makes a detached/branchless HEAD), then cd into the submodule and make a new branch to hold that HEAD.
  6. Remember, the submodule is just another repository that happens to be in a subdirectory of your main working copy. So, you edit and manage it like any other repository. Committing, pulling, pushing, branching and merging, cherry picking, rebasing, etc. all work just like in any other repository.
    • To add, edit, or remove files in the submodule, commit them from within the submodule directory.
    • To get other people's changes from upstream (e.g. GitHub), pull (or fetch + merge) from within the submodule directory.
    • To push to upstream, push from within the submodule.
    • Etc. etc.
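The day-to-day cycle in items 1, 3, 5, and 6 can be sketched end to end as follows. This is only an illustration under stated assumptions: the directory names (specs, main, clone), the branch name new-spec, and the demo identity are made up, and local directories stand in for real URLs (hence the protocol.file.allow setting, which a real remote would not need).

```shell
#!/bin/sh
set -e

# Throwaway sandbox; every name and path below is a placeholder.
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because local directories stand in for real URLs here.
allow="-c protocol.file.allow=always"

# Setup: a shared repo and a main repo that includes it as a submodule.
git init -q specs
echo "one" > specs/a_spec.rb
git -C specs add a_spec.rb
git -C specs commit -q -m "First spec"
git init -q main
git -C main commit -q --allow-empty -m "Initial commit"
git -C main $allow submodule add "$work/specs" my/submodule
git -C main commit -q -m "Add submodule"

# Item 1: a fresh clone has an empty my/submodule directory until
# "git submodule update --init" is run.
git clone -q main clone
git -C clone $allow submodule update --init

# Items 3 and 6: work inside the submodule like any other repository.
# A branch is checked out first so the commit is not left on a detached HEAD.
sub=main/my/submodule
git -C "$sub" checkout -q -b new-spec
echo "two" > "$sub/b_spec.rb"
git -C "$sub" add b_spec.rb
git -C "$sub" commit -q -m "Second spec"
git -C "$sub" push -q origin new-spec

# Then record the new reference in the main repository.
git -C main status --short      # lists my/submodule as modified
git -C main add my/submodule
git -C main commit -q -m "Point my/submodule at the new spec commit"

# Item 5: someone else pulls the main repository, then syncs the submodule
# to the newly recorded commit (this leaves a detached HEAD there).
git -C clone $allow pull -q
git -C clone $allow submodule update
```

The order matters: the submodule commit is made and pushed before the main repository records it, so that anyone who later runs git submodule update can actually fetch the commit the main repository points to.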

Useful Commands

Conclusions

Git submodules can be a useful tool, but they're not always the right one.

It's unfortunate that they are not cloned automatically when the main repository is cloned, as it makes getting the full source more complicated and inconvenient for users. Likewise, because the submodule contents are not part of the main repository, automatic tarballs and zips (like the ones you can download for GitHub projects) won't contain the submodule contents. It's likely that gems auto-generated on GitHub won't, either.

That means submodules are not a good fit for important code, unless the repository is meant only for yourself and other developers working on the project, but not for users to clone. Submodules would be excellent for cloning Rails plugins into your vendor folder, for example.

In the end, submodules are probably not right for the FFI specs. So, I think we'll be looking into other ways to keep things synced between projects.

But, submodules might be right for your project, and if this guide helps you understand them, then my time was well spent.


Comments


Thanks for helping me understand the basics of submodules:
a) Users will not (necessarily) get the most recent revision from the submodule repository. Instead, they will get the revision of the submodule that is recorded in the revision of the main repository they are viewing.
b) To get other people's changes from upstream (e.g. Github), pull (or fetch + merge) from within the submodule directory.
I was lost without these.


Nice guidelines!
I wonder whether there is any way to have a super-repository always point to the HEAD of a submodule?
Developers tend to avoid the extra work of advancing/updating references to submodules after committing their own changes into a submodule. Anyway, we're developing on the "trunk" ("main branch") of both the main sources and the submodules...


Hi,
In your nice git submodule cheatsheet in the section "Edit and commit files in your submodule" I'm missing the checkout of a branch.
I'm not a pro, but committing on a detached head doesn't sound like a good idea.

Just my 2 Cents,
Klaus


Hi
Cloning is not working properly for submodules.
I have a main repo and added two repos as submodules. Inside those submodules I also have some repos.

For example: repo main -> submodule repo A (A has two repos, B and C, as submodules).

While cloning (git clone --recursive git @IP…:Repo main)

I am able to see submodule repo A, but it is not listing the submodule repos inside A (B and C).

But I am able to clone submodule repos B and C separately.


Thanks for the post John, helped as a novice get into understanding this submodule "thing". :)

