Packfiles
Alternative to loose objects.
Stores many objects per file, as a .pack file accompanied by an .idx index, under:
.git/objects/pack/pack-<SHA>.pack
.git/objects/pack/pack-<SHA>.idx
Unlike loose objects, packfiles can store an object as a delta against another object (typically another version of the same blob or tree), which is especially important since one-line changes to large blobs and trees are common.
Each packfile can be unpacked independently of other packfiles: it therefore contains every object needed to resolve each of its delta chains.
Finding an optimal packing is probably NP-complete, so Git uses heuristics instead: https://github.com/gitster/git/blob/master/Documentation/technical/pack-heuristics.txt
Normally packfiles only contain reachable objects.
The .idx file is just an index to speed up lookups: it can be regenerated at any time from the .pack file with index-pack.
Packfiles are also what gets sent over the network on clone, fetch and push.
Sources
- man git-pack-objects
- https://github.com/gitster/git/blob/master/Documentation/technical/pack-format.txt
- http://git-scm.com/book/en/Git-Internals-Packfiles
- http://stackoverflow.com/questions/9478023/is-the-git-binary-diff-algorithm-delta-storage-standardized
- http://stackoverflow.com/questions/5176225/are-gits-pack-files-deltas-rather-than-snapshots
Packfile format
TODO
http://stefan.saasen.me/articles/git-clone-in-haskell-from-the-bottom-up/#pack_file_format
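The fixed-size parts of the format are easy to read by hand. A minimal Python sketch, assuming a version 2 pack as described in the pack-format.txt listed in Sources: the file starts with the 4 bytes PACK, a big-endian version number and a big-endian object count; each object entry starts with a variable-length type/size header followed by zlib-compressed data; and the file ends with a 20-byte SHA-1 checksum. The function names are hypothetical:

```python
import struct

# Type numbers stored in bits 6-4 of each entry header (5 is reserved).
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, number_of_objects) from the 12-byte header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


def parse_entry_header(data: bytes, pos: int) -> tuple[str, int, int]:
    """Decode the entry header at pos; return (type, size, next_pos).

    First byte: bit 7 = another byte follows, bits 6-4 = type,
    bits 3-0 = lowest 4 bits of the uncompressed size. Each further
    byte contributes 7 more size bits, least significant group first.
    """
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return OBJ_TYPES[obj_type], size, pos


print(parse_entry_header(bytes([0x9E, 0x06]), 0))  # ('commit', 110, 2)
```

For ofs_delta and ref_delta entries the header is followed by a reference to the base object (an encoded negative offset within the pack, or a 20-byte SHA) before the zlib data; the sketch stops at the header.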
Delta format
This is the data that is stored in the delta entries of the packfile.
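Per pack-format.txt, a delta starts with the size of the base object and the size of the result, each a little-endian base-128 varint, followed by instructions: a byte with the high bit set copies a range of the base (its bits 0-3 say which offset bytes follow, bits 4-6 which size bytes follow), while a byte from 1 to 127 inserts that many literal bytes taken from the delta itself. A minimal Python sketch of applying such a delta (apply_delta is a hypothetical name, not a Git API):

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Little-endian base-128 size: 7 bits per byte, high bit = more follow."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    src_size, pos = read_varint(delta, 0)
    tgt_size, pos = read_varint(delta, pos)
    if src_size != len(base):
        raise ValueError("base size mismatch")
    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy: bits 0-3 select present offset bytes, bits 4-6 size bytes.
            offset = size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of 0 means 64 KiB
            out += base[offset:offset + size]
        elif op:
            # Insert: the next `op` bytes of the delta are literal data.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("opcode 0 is reserved")
    if len(out) != tgt_size:
        raise ValueError("result size mismatch")
    return bytes(out)


# Usage: copy "hello ", insert "there ", copy "world".
base = b"hello world"
delta = bytes([0x0B, 0x11, 0x90, 0x06, 0x06]) + b"there " + bytes([0x91, 0x06, 0x05])
print(apply_delta(base, delta))  # b'hello there world'
```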
pack-objects
Low level pack creation.
Starting from the min-sane test repository, run:
printf '07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
496d6428b9cf92981dc9495211e6e1120fb6f2ba
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391' \
| git pack-objects --stdout
This writes the generated .pack to stdout.
To instead write both the .pack and its .idx to files, replace --stdout with a base name, feeding the same object list on stdin:
git pack-objects a
This generates the .idx / .pack pair named a-<SHA>.{idx,pack} and prints the <SHA> to stdout.
You can confirm that these files are the same as the ones generated by git repack.
TODO: what is the SHA in the filenames? It is not the SHA of the file content, since the following gives a different value:
git hash-object a-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
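Two facts bearing on this can be checked with a small Python sketch: git hash-object hashes the file as a blob, with a "blob <size>\0" header prepended, so it does not even give the SHA-1 of the raw bytes; and every .pack ends with a 20-byte SHA-1 checksum of everything before it. A plausible candidate for the filename SHA, to be verified against your Git version rather than taken as established, is this trailing checksum: comparing pack_trailer() with the name settles it. Both function names are hypothetical:

```python
import hashlib


def hash_object_blob(content: bytes) -> str:
    """SHA-1 printed by `git hash-object`: the content is hashed as a blob,
    i.e. with a "blob <size>\\0" header in front of it."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def pack_trailer(pack: bytes) -> str:
    """Return the 20-byte SHA-1 at the end of a .pack, after checking that it
    is the SHA-1 of every byte before it."""
    body, trailer = pack[:-20], pack[-20:]
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("pack checksum mismatch")
    return trailer.hex()


# Usage (hypothetical path): compare the result with the SHA in the file name.
# with open("a-<SHA>.pack", "rb") as f:
#     print(pack_trailer(f.read()))
```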
unpack-objects
Start from the min-sane test repository and run:
git repack
git prune-packed
mv .git/objects/pack/pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack .
Then:
git unpack-objects pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
And tree now outputs:
.git/objects
|-- 07
| `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
| `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
| `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack
The unpacking only happens for objects that are not already present in the repository.
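Each object that unpack-objects extracts ends up as an ordinary loose object, which is why the listing above shows the usual two-digit directories: a "<type> <size>\0" header plus the content is hashed with SHA-1, zlib-compressed, and stored under a directory named after the first two hex digits of the SHA. A minimal Python sketch; loose_object is a hypothetical helper, and the zlib level is an assumption, so the compressed bytes may differ from Git's while decompressing to the same data:

```python
import hashlib
import zlib


def loose_object(obj_type: str, content: bytes) -> tuple[str, bytes]:
    """Return (path relative to .git/objects, file contents) of a loose object."""
    store = b"%s %d\x00" % (obj_type.encode(), len(content)) + content
    sha = hashlib.sha1(store).hexdigest()
    # The first two hex digits become the directory, the other 38 the file name.
    return f"{sha[:2]}/{sha[2:]}", zlib.compress(store)


path, data = loose_object("blob", b"")
print(path)  # e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```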
unpack-file
Create a file in the current directory with the contents of the given blob, with a name of the form .merge_file_XXXXXX:
git unpack-file e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
Outputs the name of the file.
repack
prune-packed
Porcelain.
Pack all possible reachable objects or try to improve the packing efficiency.
Example: start from the min-sane test repository. Then .git/objects contains the three usual objects, commit, tree and blob:
.git/objects
|-- 07
| `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
| `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
| `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack
Now run:
git repack
Output:
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 0 (delta 0)
Ha, this is what we see on git clone!
Now the objects look like:
.git/objects
|-- 07
| `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
| `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
| `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
| `-- packs
`-- pack
|-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
`-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
Notice how the loose objects were not removed, only packed.
To remove them, use prune-packed, which deletes loose objects that are already in a pack:
git prune-packed
And now the tree looks like:
.git/objects
|-- info
| `-- packs
`-- pack
|-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
`-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
All loose objects were removed, since all of them had been packed.
git gc does both repack and prune-packed by default, so we could have used it instead.
count-objects
Porcelain.
Count unpacked objects and show their sizes.
Major application: deciding how much a repack or gc might benefit you.
Sample output with -vH:
count: 6324
size: 25.58 MiB
in-pack: 108316
packs: 23
size-pack: 100.02 MiB
prune-packable: 518
garbage: 0
size-garbage: 0 bytes
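Meaning of the fields:
- count: number of loose objects
- size: disk space used by loose objects (in KiB unless -H is given)
- in-pack: number of objects in packs
- packs: number of packfiles
- size-pack: disk space used by the packs
- prune-packable: loose objects that are also present in a pack, which git prune-packed could remove
- garbage: files in the object database that are neither valid loose objects nor valid packs
- size-garbage: disk space used by the garbage files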
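The loose-object fields can be approximated by walking .git/objects directly. A rough Python sketch with several simplifications, all assumptions: size is the apparent file size in bytes rather than disk usage in KiB, files are not checked to be valid objects, and the set of packed SHAs (which Git would read from the .idx files) is passed in as a hypothetical argument:

```python
import os
import re
import tempfile

HEX2 = re.compile(r"[0-9a-f]{2}")
HEX38 = re.compile(r"[0-9a-f]{38}")


def count_loose(objects_dir: str, packed: set[str]) -> dict[str, int]:
    """Approximate count, size and prune-packable of `git count-objects -v`.

    `packed` is a hypothetical input: the SHAs present in some pack.
    `size` is the apparent size in bytes, not disk usage in KiB.
    """
    stats = {"count": 0, "size": 0, "prune-packable": 0}
    for prefix in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, prefix)
        # Only two-hex-digit directories hold loose objects (skips info/ and pack/).
        if not HEX2.fullmatch(prefix) or not os.path.isdir(subdir):
            continue
        for name in os.listdir(subdir):
            if not HEX38.fullmatch(name):
                continue
            stats["count"] += 1
            stats["size"] += os.path.getsize(os.path.join(subdir, name))
            if prefix + name in packed:
                stats["prune-packable"] += 1
    return stats


def demo() -> dict[str, int]:
    """Fake objects directory: two 10-byte 'loose objects', one of them also packed."""
    with tempfile.TemporaryDirectory() as objects_dir:
        for sha in ("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
                    "496d6428b9cf92981dc9495211e6e1120fb6f2ba"):
            os.makedirs(os.path.join(objects_dir, sha[:2]))
            with open(os.path.join(objects_dir, sha[:2], sha[2:]), "wb") as f:
                f.write(b"x" * 10)  # placeholder bytes, not a real object
        os.makedirs(os.path.join(objects_dir, "pack"))
        return count_loose(objects_dir, {"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"})


print(demo())  # {'count': 2, 'size': 20, 'prune-packable': 1}
```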
verify-pack
Check that an .idx / .pack pair is not corrupted:
git verify-pack .git/objects/pack/pack-<SHA>.idx
Exits with status 0 if OK.
For detailed information about the pack's contents, use -v.
Output of git verify-pack -v for the min-sane test repository after git gc:
07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760 commit 110 90 12
496d6428b9cf92981dc9495211e6e1120fb6f2ba tree 29 40 102
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob 0 9 142
non delta: 3 objects
pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack: ok
The format is:
- SHA
- object type
- uncompressed payload size
- size in the packfile: the compressed data plus the entry header. It can be larger than the payload for small objects because of zlib's overhead.
- offset into the packfile where the object is located
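The min-sane numbers are consistent with this: 12 + 90 = 102 and 102 + 40 = 142, so each entry's size in the pack runs exactly up to the next offset, and the empty blob occupies 9 bytes despite a 0-byte payload. A minimal Python sketch reproducing that 9, assuming Git compresses with zlib's default level (the function names are hypothetical):

```python
import zlib


def encode_entry_header(obj_type: int, size: int) -> bytes:
    """Pack entry header: type in bits 6-4 of the first byte, the size's low
    4 bits in bits 3-0, then 7 size bits per extra byte; bit 7 = more follow."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # more bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def entry_size(obj_type: int, payload: bytes) -> int:
    """Bytes an undeltified object occupies: header plus zlib stream."""
    return len(encode_entry_header(obj_type, len(payload))) + len(zlib.compress(payload))


# Empty blob (type 3): 1 header byte + 8 bytes of zlib framing for empty input.
print(entry_size(3, b""))  # 9
```

The 8 zlib bytes are the 2-byte zlib header, a 2-byte empty final deflate block and the 4-byte Adler-32 checksum, none of which shrink for tiny inputs.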
For a more complex repository, the output could look something like:
2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree 136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree 6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree 8 19 1331 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
b042a60ef7dff760008df33cee372b945b6e884e blob 22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob 9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob 10 19 7282
(many more lines like the above)
non delta: 15 objects
chain length = 1: 3 objects
chain length = 2: 1 object
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok
Note that some entries have 2 extra columns:
- the depth of the object, i.e., how many deltas you have to resolve to get to it
- the object to take the delta from
Those are deltified objects: their payload contains only a delta from another object.
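The two kinds of line can be told apart by their number of columns. A short Python sketch (hypothetical helper names) that parses the sample lines above and counts entries per delta depth; for this excerpt it finds 4 non-delta entries and 3 of chain length 1, whereas the summary lines of the real output also cover the elided lines:

```python
from collections import Counter

# Lines copied from the sample output above, plus one summary line.
SAMPLE = """\
2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree 136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree 6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree 8 19 1331 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
b042a60ef7dff760008df33cee372b945b6e884e blob 22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob 9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob 10 19 7282
non delta: 15 objects
"""


def parse_verify_pack(text: str) -> list[dict]:
    """Parse object lines of `git verify-pack -v`, skipping summary lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) not in (5, 7):
            continue  # e.g. "non delta: 15 objects" or "...pack: ok"
        entry = {"sha": fields[0], "type": fields[1], "size": int(fields[2]),
                 "size_in_pack": int(fields[3]), "offset": int(fields[4]),
                 "depth": 0, "base": None}
        if len(fields) == 7:  # deltified: depth and base SHA follow
            entry["depth"] = int(fields[5])
            entry["base"] = fields[6]
        entries.append(entry)
    return entries


def chain_histogram(entries: list[dict]) -> Counter:
    """Count entries per delta depth (0 = stored whole)."""
    return Counter(entry["depth"] for entry in entries)


print(sorted(chain_histogram(parse_verify_pack(SAMPLE)).items()))  # [(0, 4), (1, 3)]
```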
On the first part, there are two kinds of line:
- raw objects, of the form:
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree 136 136 1178
where the three numbers at the end are the payload size, the size in the packfile and the offset, as in the format described above.
- delta versions, of the form:
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree 6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
where the two sizes are those of the delta rather than of the full object, 1 is the delta chain depth and the last column is the base object.
The most recent versions of files are kept whole and deltas are done backwards. Here, d982c is an older version of deef2: see how small its delta is. You can still print that version of the object as usual:
git cat-file -p d982c7cb2c2a972ee391a85da481fc1f9127a01d
even though only a delta against deef2 is stored in the pack.
pack-redundant
Start from the min-sane test repository, packed.
TODO
show-index
Plumbing.
Get information about a given .idx file. A subset of git verify-pack -v.
Example:
git show-index < .git/objects/pack/pack-<SHA>.idx
Sample output:
133 496d6428b9cf92981dc9495211e6e1120fb6f2ba (0f49d649)
12 860e5247c071721c8e286c73c3633509c77cf538 (198b73d3)
173 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (6e760029)
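The columns are the offset of the object in the .pack file, the object SHA, and, in parentheses, the CRC32 of the object's packed data (version 2 .idx files only). Lines are sorted by SHA, not by offset.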
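These columns can be read straight out of the .idx. A minimal Python sketch for version 2 indexes, following the layout in pack-format.txt: the 4-byte magic \377tOc and the version, a 256-entry fan-out table (entry i counts the objects whose SHA starts with a byte <= i, which is what lets a lookup binary-search only a small slice, and whose last entry is the object count), the sorted SHAs, one CRC32 per object, one 4-byte offset per object (the high bit redirects to a table of 8-byte offsets for large packs), then the pack and index checksums. The function names are hypothetical:

```python
import struct


def parse_idx_v2(data: bytes) -> list[tuple[int, str, int]]:
    """Return (offset, sha, crc32) per object, in the index's SHA order."""
    if data[:4] != b"\xfftOc" or struct.unpack(">I", data[4:8])[0] != 2:
        raise ValueError("not a version 2 .idx file")
    fanout = struct.unpack(">256I", data[8:1032])
    n = fanout[255]  # total number of objects
    pos = 1032
    shas = [data[pos + 20 * i:pos + 20 * (i + 1)].hex() for i in range(n)]
    pos += 20 * n
    crcs = struct.unpack(f">{n}I", data[pos:pos + 4 * n])
    pos += 4 * n
    small = struct.unpack(f">{n}I", data[pos:pos + 4 * n])
    pos += 4 * n  # pos now points at the (possibly empty) 8-byte offset table
    offsets = []
    for off in small:
        if off & 0x80000000:
            i = off & 0x7FFFFFFF
            (off,) = struct.unpack(">Q", data[pos + 8 * i:pos + 8 * i + 8])
        offsets.append(off)
    return list(zip(offsets, shas, crcs))


def show_index_lines(data: bytes) -> list[str]:
    """Format entries the way `git show-index` prints them."""
    return [f"{off} {sha} ({crc:08x})" for off, sha, crc in parse_idx_v2(data)]
```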
index-pack
Plumbing.
Build the .idx file for a given .pack:
git index-pack .git/objects/pack/pack-<SHA>.pack
This generates the .idx in the same directory as the .pack.
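index-pack can rebuild the .idx because everything in it is derivable from the .pack: for each entry, the offset where it starts, the SHA-1 of the decompressed object with its "<type> <size>\0" header, and a CRC32 of the entry's bytes. A minimal Python sketch for packs without delta entries (handling ofs_delta / ref_delta would additionally require resolving and applying deltas); the function name is hypothetical, and the exact CRC32 coverage is an assumption:

```python
import hashlib
import struct
import zlib

TYPE_NAMES = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}


def index_undeltified_pack(pack: bytes) -> list[tuple[int, str, int]]:
    """Return (offset, sha, crc32) per object, sorted by SHA like an .idx.

    Sketch only: delta entries (types 6 and 7) raise ValueError.
    """
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK" or version != 2:
        raise ValueError("unsupported pack")
    if hashlib.sha1(pack[:-20]).digest() != pack[-20:]:
        raise ValueError("pack checksum mismatch")
    pos = 12
    entries = []
    for _ in range(count):
        start = pos
        # Entry header: type in bits 6-4, size spread over 4 + 7*k bits.
        byte = pack[pos]
        pos += 1
        obj_type = (byte >> 4) & 0x07
        size, shift = byte & 0x0F, 4
        while byte & 0x80:
            byte = pack[pos]
            pos += 1
            size |= (byte & 0x7F) << shift
            shift += 7
        if obj_type not in TYPE_NAMES:
            raise ValueError("delta entries are not handled by this sketch")
        inflater = zlib.decompressobj()
        payload = inflater.decompress(pack[pos:])
        if not inflater.eof or len(payload) != size:
            raise ValueError("corrupt entry")
        # Whatever follows the end of this zlib stream is left in unused_data.
        pos = len(pack) - len(inflater.unused_data)
        header = b"%s %d\x00" % (TYPE_NAMES[obj_type], size)
        sha = hashlib.sha1(header + payload).hexdigest()
        # Assumed: the .idx CRC32 covers the raw entry (header + compressed data).
        entries.append((start, sha, zlib.crc32(pack[start:pos])))
    return sorted(entries, key=lambda entry: entry[1])
```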