Explore using Nerdbank.GitVersioning's ManagedGit implementation for reading Git repositories
#343
Labels: design (for design discussion issues)
As documented in #243, there are going to be some benefits to moving away from libgit2sharp and towards a different solution.

The Nerdbank.GitVersioning project has experimented with this in the past as well. Their discussion in dotnet/Nerdbank.GitVersioning#505 is a very interesting read. Pretty much all of the objections that I have to us continuing to use libgit2sharp are well stated in that discussion. While reading through what they implemented as part of dotnet/Nerdbank.GitVersioning#521, I realized that their implementation is marked public, which means we can try using it for our needs.

Since their intended use for the git repository is very similar to ours (just reading through the history, with no interest in making changes or commits to the repository), I think there's a really good chance that we could use their code to pull out the contents of dependency manifest files along with their history.
Our current usage of git requires us to make a clone of the repository if it doesn't exist already. The Nerdbank.GitVersioning ManagedGit code does not implement the git clone command, because for its use case it makes sense to assume that a clone is already available. So that's something that we'd have to build ourselves.

Our current usage also assumes that the history is stored directly on the filesystem, and that's a dependency I'd love to break. We could avoid relying on the filesystem by implementing our own object/pack storage mechanism in memory (similar to what the C library libgit2 and the Go library go-git support).

Since we're going to have to build our own equivalent of the clone command and the git data transfer protocols, starting out with support for performing a clone operation directly into an in-memory store is something that we should consider including.
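
To make the in-memory storage idea a bit more concrete, here is a minimal sketch of what such a store has to do for loose objects. It is written in Python purely for illustration and is not how ManagedGit, libgit2, or go-git implement it. The class name `InMemoryObjectStore` and its methods are hypothetical, and it assumes a SHA-1 repository (object ID = SHA-1 of the header `<type> <size>\0` plus the payload). Pack files and deltas are deliberately left out.

```python
import hashlib
import zlib


class InMemoryObjectStore:
    """Hypothetical in-memory stand-in for a .git/objects directory."""

    def __init__(self):
        # Maps a 40-character hex object ID to the zlib-compressed object bytes,
        # mirroring how git stores loose objects on disk.
        self._objects = {}

    def put(self, obj_type: str, payload: bytes) -> str:
        # Git hashes the header "<type> <size>\0" followed by the payload.
        raw = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
        oid = hashlib.sha1(raw).hexdigest()
        self._objects[oid] = zlib.compress(raw)
        return oid

    def get(self, oid: str):
        # Raises KeyError if the object was never stored.
        raw = zlib.decompress(self._objects[oid])
        header, _, payload = raw.partition(b"\x00")
        obj_type, size = header.decode("ascii").split(" ")
        if int(size) != len(payload):
            raise ValueError(f"corrupt object {oid}: size mismatch")
        return obj_type, payload

    def __contains__(self, oid: str) -> bool:
        return oid in self._objects
```

A clone that writes into a store like this would call `put` for every object it receives, and the history-reading code would call `get` instead of touching the filesystem.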
It's worth noting that since libgit2 has functionality for using custom Git object storage mechanisms, it might be exposed via libgit2sharp as well. Even if this is the case, I'm still not a fan of us depending on libgit2sharp if we can avoid it. I'd rather keep our dependency on the filesystem and remove our dependency on libgit2sharp than the other way around.

Implementing our own clone command does open other possibilities that are worth considering. We could make our implementation smart enough to only grab the objects, packs, commits, and trees that contain the files that we're interested in reading. We don't need all of the source code, just the dependency manifest information. This would potentially save us a bunch of time that we're currently spending waiting for a full git clone command to complete. If we could walk the commits on the remote to find the ones that reference dependency manifest files, then we could request just the objects/packs that contain the versions of those files that we need. This would result in a much smaller data transfer in terms of the raw number of bytes. It is possible that the extra processing that we'd have to do would negate any performance benefit as measured in seconds. We could do some profiling to assess that, though, and I suspect that transferring less data would be a big win for really large repositories.

There is a potential risk that needs to be noted if we go forward with this idea. The
Nerdbank.GitVersioning team might not be excited to learn that we plan on using their ManagedGit code directly. They might react by marking those classes internal. The discussion in dotnet/Nerdbank.GitVersioning#505 included some back-and-forth about where the ManagedGit implementation should live, with one of the options being to move it into a separate package. Perhaps that's an extraction effort that we could assist them with in the event that they object to us using Nerdbank.GitVersioning as a dependency just for the purpose of consuming the ManagedGit code that it contains.
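
As a possible baseline for the profiling mentioned earlier, the stock git CLI already offers something close to the selective-fetch idea: a partial clone with `--filter=blob:none` downloads commits and trees up front and defers file contents until they are requested. The sketch below only builds the command lines and is not a proposal for the final implementation. The function names, and any URL or manifest path passed to them, are hypothetical, and it assumes the remote allows filtered fetches.

```python
from typing import List


def partial_clone_cmd(url: str, dest: str) -> List[str]:
    # Fetch commits and trees only; blobs are downloaded lazily on first access.
    return ["git", "clone", "--filter=blob:none", "--no-checkout", url, dest]


def commits_touching_cmd(dest: str, manifest_path: str) -> List[str]:
    # List the IDs of commits that changed the given manifest path.
    return ["git", "-C", dest, "log", "--format=%H", "--", manifest_path]


def show_file_cmd(dest: str, commit: str, manifest_path: str) -> List[str]:
    # Print the manifest as of one commit; in a partial clone this is the
    # step that triggers the on-demand download of that single blob.
    return ["git", "-C", dest, "show", f"{commit}:{manifest_path}"]
```

Each list can be handed to `subprocess.run(cmd, check=True, capture_output=True)`. Timing this against a full clone on a large repository would give a rough idea of how much a custom selective fetch could save.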