git-sizer GitHub
winget install --id=GitHub.git-sizer -e
git-sizer computes various size metrics for a local Git repository, flagging those that might cause you problems or inconvenience.
git-sizer is a command-line tool designed to analyze the size and health of a local Git repository. It computes various metrics, such as overall repository size, number of references (branches and tags), object counts, and file sizes, flagging potential issues that could impact performance or usability.
Key Features:
- Repository Size Analysis: Identifies if the repository is too large, providing recommendations to optimize storage and reduce clone times.
- Reference Management: Flags excessive branches or tags, suggesting strategies to streamline and manage references effectively.
- Large File Detection: Highlights oversized blobs (files) and suggests alternatives like Git-LFS for handling them efficiently.
- Tree Structure Evaluation: Detects directories with an unusually high number of entries, offering guidance on sharding files into smaller directories.
- Duplicate Files Identification: Warns about repeated or similar files across paths, suggesting better organization practices.
Audience & Benefit:
Ideal for developers and teams managing Git repositories to maintain performance and ensure optimal usage. By identifying size-related issues early, git-sizer helps prevent common pain points such as slow cloning, excessive disk usage, and inefficient operations, ensuring a healthy and performant repository.
On Windows, the tool can be installed via winget; released builds for other platforms can be downloaded from the releases page.
README
Happy Git repositories are all alike; every unhappy Git repository is unhappy in its own way. —Linus Tolstoy
git-sizer
Is your Git repository bursting at the seams?
git-sizer computes various size metrics for a local Git repository, flagging those that might cause you problems or inconvenience. For example:
- Is the repository too big overall? Ideally, Git repositories should be under 1 GiB, and (without special handling) they start to get unwieldy over 5 GiB. Big repositories take a long time to clone and repack, and take a lot of disk space. Suggestions:
  - Avoid storing generated files (e.g., compiler output, JAR files) in Git. It would be better to regenerate them when necessary, or store them in a package registry or even a fileserver.
  - Avoid storing large media assets in Git. You might want to look into Git-LFS or git-annex, which allow you to version your media assets in Git while actually storing them outside of your repository.
  - Avoid storing file archives (e.g., ZIP files, tarballs) in Git, especially if compressed. Different versions of such files don't delta well against each other, so Git can't store them efficiently. It would be better to store the individual files in your repository, or store the archive elsewhere.
- Does the repository have too many references (branches and/or tags)? They all have to be transferred to the client for every fetch, even if your clone is up-to-date. Try to limit them to a few tens of thousands at most. Suggestions:
  - Delete unneeded tags and branches.
  - Avoid pushing your "remote-tracking" branches to a shared repository.
  - Consider using "git notes" rather than tags to attach auxiliary information to commits (for example, CI build results).
  - Perhaps store some of your rarely-needed tags and branches in a separate fork of your repository that is not fetched from by normal developers.
- Does the repository include too many objects? The more objects, the longer it takes for Git to traverse the repository's history, for example when garbage-collecting. Suggestions:
  - Think about whether you are storing very many tiny files that could easily be collected into a few bigger files.
  - Consider breaking your project up into multiple subprojects.
- Does the repository include gigantic blobs (files)? Git works best with small- to medium-sized files. It's OK to have a few files in the megabyte range, but they should generally be the exception. Suggestions:
  - Consider using Git-LFS for storing your large files, especially those (e.g., media assets) that don't diff and merge usefully.
  - See also the section "Is the repository too big overall?"
- Does the repository include many, many versions of large text files, each one slightly changed from the one before? Such files delta very well, so they might not cause your repository to grow alarmingly. But it is expensive for Git to reconstruct the full files and to diff them, which it needs to do internally for many operations. Suggestions:
  - Avoid storing log files and database dumps in Git.
  - Avoid storing giant data files (e.g., enormous XML files) in Git, especially if they are modified frequently. Consider using a database instead.
- Does the repository include gigantic trees (directories)? Every time a file is modified, Git has to create a new copy of every tree (i.e., every directory in the path) leading to the file. Huge trees make this expensive. Moreover, it is very expensive to traverse through history that contains huge trees, for example for git blame. Suggestions:
  - Avoid creating directories with more than a couple of thousand entries each.
  - If you must store very many files, it is better to shard them into a hierarchy of multiple, smaller directories.
- Does the repository have the same (or very similar) files repeated over and over again at different paths in a single commit? If so, the repository might have a reasonable overall size, but when you check it out it balloons into an enormous working copy. (Taken to an extreme, this is called a "git bomb"; see below.) Suggestions:
  - Perhaps you can achieve your goals more effectively by using tags and branches or a build-time configuration system.
- Does the repository include absurdly long path names? That's probably not going to work well with other tools. One or two hundred characters should be enough, even if you're writing Java.
- Are there other bizarre and questionable things in the repository?
  - Annotated tags pointing at one another in long chains?
  - Octopus merges with dozens of parents?
  - Commits with gigantic log messages?
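The sharding suggestion in the list above can be illustrated with a minimal Python sketch. It assumes a hash-prefix fan-out scheme, similar in spirit to how Git lays out loose objects under .git/objects. The function name sharded_path and its levels and width parameters are hypothetical and are not part of git-sizer.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(name: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Return a nested path such as 'ab/cd/<name>' for a bare file name.

    Hypothetical helper: the shard directories are the leading hex digits
    of the SHA-1 of the name, so each level holds at most 16 ** width
    subdirectories. The name is assumed to contain no '/' characters.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Slice consecutive groups of `width` hex digits, one group per level.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(*shards, name)
```

With the defaults, no shard directory contains more than 256 subdirectories, and files are spread across up to 65,536 leaf directories, which keeps each tree small even for very large file counts.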
git-sizer computes many size-related statistics about your repository that can help reveal all of the problems described above. These practices are not wrong per se, but the more that you stretch Git beyond its sweet spot, the less you will be able to enjoy Git's legendary speed and performance. Especially if your Git repository statistics seem out of proportion to your project size, you might be able to make your life easier by adjusting how you use Git.
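For a rough first impression before running git-sizer, two of the figures discussed above can be approximated with plain Git commands. The following Python sketch is not how git-sizer computes its metrics: it assumes that the on-disk size reported by git count-objects -v is an acceptable stand-in for "overall size", and the helper names and the "ok" / "large" / "unwieldy" labels are hypothetical. Only the 1 GiB and 5 GiB thresholds come from the text above.

```python
import subprocess

GIB = 1024 ** 3


def parse_count_objects(output: str) -> dict[str, int]:
    """Parse the 'key: value' lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def on_disk_bytes(stats: dict[str, int]) -> int:
    """Loose plus packed object size; Git reports both fields in KiB."""
    return (stats.get("size", 0) + stats.get("size-pack", 0)) * 1024


def classify_size(total_bytes: int) -> str:
    """Apply the 1 GiB / 5 GiB rules of thumb (labels are hypothetical)."""
    if total_bytes < GIB:
        return "ok"
    if total_bytes <= 5 * GIB:
        return "large"
    return "unwieldy"


def count_refs(for_each_ref_output: str) -> int:
    """Count the non-empty lines printed by `git for-each-ref`."""
    return sum(1 for line in for_each_ref_output.splitlines() if line.strip())


def run_git(repo: str, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def summarize(repo: str = ".") -> dict[str, object]:
    """Collect the approximate size, its label, and the reference count."""
    size = on_disk_bytes(parse_count_objects(run_git(repo, "count-objects", "-v")))
    refs = count_refs(run_git(repo, "for-each-ref", "--format=%(refname)"))
    return {"bytes": size, "size_class": classify_size(size), "refs": refs}
```

Calling summarize() from inside a clone returns a small dictionary. A reference count in the tens of thousands, or a size class other than "ok", suggests that a full git-sizer run is worth the time.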
Getting started
- Make sure that you have the Git command-line client installed, version >= 2.6. NOTE: git-sizer invokes git commands to examine the contents of your repository, so it is required that the git command be in your PATH when you run git-sizer.
- Install git-sizer. Either:
  a. Install a released version of git-sizer (recommended):
     - Go to the releases page and download the ZIP file corresponding to your platform.
     - Unzip the file.
     - Move the executable file (git-sizer or git-sizer.exe) into your PATH.
  b. Build and install from source. See the instructions in docs/BUILDING.md.
- Change to the directory containing a full, non-shallow clone of the Git repository that you'd like to analyze. Then run
git-sizer [