Packfiles
Alternative to loose objects.
Stores multiple objects per file under:
.git/objects/pack/pack-<SHA>.pack
.git/objects/pack/pack-<SHA>.idx
Unlike loose objects, packfiles can store diffs (deltas) between object versions, which is especially important since one-line changes to large blobs / trees are common.
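A minimal sketch of the delta storage described above, assuming a throwaway repository created with mktemp. The file name big.txt, its contents and the demo identity are hypothetical. It commits a large file, changes one line, repacks, and counts the deltified entries reported by verify-pack:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Hypothetical large file: 5000 numbered lines.
seq 1 5000 > big.txt
git add big.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'first version'
# Change a single line and commit again.
sed '2500s/.*/changed/' big.txt > big.tmp && mv big.tmp big.txt
git add big.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'one-line change'
# Pack everything into a single pack and drop the loose copies.
git repack -a -d -q
# In verify-pack -v output, lines with 7 columns are deltified entries.
deltas=$(git verify-pack -v .git/objects/pack/pack-*.idx | awk 'NF == 7 {n++} END {print n+0}')
echo "deltified objects: $deltas"
```

One of the two versions of big.txt should end up stored as a delta against the other, although which objects get deltified depends on Git's heuristics.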
Each packfile can be unpacked independently of other packfiles: it therefore contains every object needed to resolve each of its delta chains.
Optimizing packfiles is probably an NP-complete problem, so Git uses some heuristics to do it: https://github.com/gitster/git/blob/master/Documentation/technical/pack-heuristics.txt
Normally packfiles only contain reachable objects.
The .idx file is just an index to speed up lookup: it can be generated at any time from a .pack file with index-pack.
TODO are packfiles also used to push?
Sources
- man git-pack-objects
- https://github.com/gitster/git/blob/master/Documentation/technical/pack-format.txt
- http://git-scm.com/book/en/Git-Internals-Packfiles
- http://stackoverflow.com/questions/9478023/is-the-git-binary-diff-algorithm-delta-storage-standardized
- http://stackoverflow.com/questions/5176225/are-gits-pack-files-deltas-rather-than-snapshots
 
Packfile format
TODO
http://stefan.saasen.me/articles/git-clone-in-haskell-from-the-bottom-up/#pack_file_format
Delta format
This is the data that is stored in the delta entries of the packfile.
pack-objects
Low level pack creation.
Starting from the min-sane test repository, run:
printf '07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
496d6428b9cf92981dc9495211e6e1120fb6f2ba
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391' \
| git pack-objects --stdout
This will output the generated .pack to stdout.
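Instead of typing object names by hand, the list can come from rev-list. A sketch, assuming a throwaway repository with a single empty file. The names a and out.pack and the demo identity are hypothetical. It also checks that the stream starts with the 4-byte signature PACK:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
# rev-list --objects prints every object reachable from HEAD, one per line.
git rev-list --objects HEAD | git pack-objects -q --stdout > out.pack
# A pack stream begins with the signature "PACK".
signature=$(head -c 4 out.pack)
echo "$signature"
```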
To also generate the .idx and save both files to disk, pipe the same object list into:
git pack-objects a
This will generate the .idx / .pack pair with names a-<SHA>.{idx,pack}.
You can confirm that the files generated by this command are identical to the ones generated by git repack.
The SHA in the filenames is the SHA of TODO what? It is not the SHA of the content:
git hash-object a-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
unpack-objects
Starting from min-sane, run:
git repack
git prune-packed
mv .git/objects/pack/pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack .
Then, since unpack-objects reads the pack from stdin:
git unpack-objects < pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
And tree outputs:
.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack
The unpacking only happens for objects that are not already present in the repository.
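A sketch of that behaviour, assuming a throwaway repository. The names a and saved.pack and the demo identity are hypothetical. Unpacking while the pack is still in the object store creates no loose objects; after the pack is moved away, the same command recreates them:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
# Pack everything and remove the loose objects.
git repack -a -d -q
pack=$(ls .git/objects/pack/pack-*.pack)
# The objects are still present (in the pack), so nothing gets unpacked.
git unpack-objects -q < "$pack"
before=$(git count-objects | cut -d' ' -f1)
# Move the pack out of the object store and drop its index.
mv "$pack" saved.pack
rm -f .git/objects/pack/pack-*
# Now the objects are missing, so they are written out as loose objects.
git unpack-objects -q < saved.pack
after=$(git count-objects | cut -d' ' -f1)
echo "loose objects before: $before, after: $after"
```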
unpack-file
Generate a file in the local directory with the contents of the given blob, and name of the form .merge_file_XXXXXX:
git unpack-file e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
Outputs the name of the file.
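A small sketch, assuming a throwaway repository and a hypothetical blob containing the line hello. The blob is stored with hash-object -w, then written back out with unpack-file:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Store a blob without committing; hash-object -w prints its object name.
blob=$(printf 'hello\n' | git hash-object -w --stdin)
# unpack-file writes the blob to a temporary file and prints that file's name.
tmpfile=$(git unpack-file "$blob")
echo "$tmpfile"
cat "$tmpfile"
```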
repack
prune-packed
Porcelain.
Pack all loose reachable objects into a packfile, or try to improve the packing efficiency of existing packs.
Example: start from the min-sane test repository.
Then .git/objects contains the three usual objects (commit, tree and blob):
.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack
Now run:
git repack
Output:
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 0 (delta 0)
Ha, this is what we see on git clone!
Now the objects look like:
.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
|   `-- packs
`-- pack
    |-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
    `-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
Notice how the loose objects were not removed, only copied into the pack.
To remove the loose copies, we can use prune-packed:
git prune-packed
And now the tree looks like:
.git/objects
|-- info
|   `-- packs
`-- pack
    |-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
    `-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
All loose objects are gone, since all of them had been packed.
git gc by default does both repack and prune-packed, so we could have used it instead.
count-objects
Porcelain.
Count unpacked objects and show their sizes.
Major application: decide how much a repack or gc might benefit you.
Sample output with -vH:
count: 6324
size: 25.58 MiB
in-pack: 108316
packs: 23
size-pack: 100.02 MiB
prune-packable: 518
garbage: 0
size-garbage: 0 bytes
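The fields mean:
- count: number of loose objects
- size: disk space used by loose objects
- in-pack: number of packed objects
- packs: number of packfiles
- size-pack: disk space used by the packfiles
- prune-packable: number of loose objects that are also present in a pack, and that git prune-packed could therefore remove
- garbage: number of files in the object database that are neither valid loose objects nor valid packs
- size-garbage: disk space used by garbage files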
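A sketch of how the numbers move, assuming a throwaway repository with one commit. The helper name field and the demo identity are hypothetical. The loose count should drop to zero after git gc, while in-pack goes up by the same amount:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
# Hypothetical helper: extract one field from `git count-objects -v`.
field() { git count-objects -v | awk -v key="$1:" '$1 == key {print $2}'; }
loose_before=$(field count)
git gc -q
loose_after=$(field count)
packed_after=$(field in-pack)
echo "loose before: $loose_before, loose after: $loose_after, packed after: $packed_after"
```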
verify-pack
Check that an idx / pack pair is not corrupted:
git verify-pack .git/objects/pack/pack-<SHA>.idx
Returns 0 if OK.
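A sketch of the exit status check, assuming a throwaway repository. The bad/ directory, the file names inside it and the demo identity are hypothetical. A truncated copy of the pack is used as a stand-in for corruption:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
git repack -a -d -q
idx=$(ls .git/objects/pack/pack-*.idx)
# Intact pair: verify-pack should exit with status 0.
if git verify-pack "$idx"; then good_status=0; else good_status=$?; fi
# Damaged copy: keep the index, but only the first 20 bytes of the pack.
mkdir bad
cp "$idx" bad/bad.idx
head -c 20 "${idx%.idx}.pack" > bad/bad.pack
if git verify-pack bad/bad.idx > /dev/null 2>&1; then bad_status=0; else bad_status=$?; fi
echo "intact: $good_status, truncated: $bad_status"
```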
For detailed per-object information on the pack, use -v.
Output for the min-sane test repository after git gc:
07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760 commit 110 90 12
496d6428b9cf92981dc9495211e6e1120fb6f2ba tree   29 40 102
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob   0 9 142
non delta: 3 objects
pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack: ok
The format is:
- SHA
- object type
- uncompressed payload size
- size in the packfile (compressed). Can be larger than the payload for small files because of Zlib’s overhead.
- offset into the packfile where the object is located
 
For a more complex repository, the output could look something like:
2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree   6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree   8 19 1331 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
b042a60ef7dff760008df33cee372b945b6e884e blob   22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob   10 19 7282
(many more lines like the above)
non delta: 15 objects
chain length = 1: 3 objects
chain length = 2: 1 object
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok
Note that some entries have 2 extra columns:
- the depth of the object, i.e., how many deltas you have to resolve to get to it
 - the object to take the delta from
 
Those are deltified objects: their payload contains only a delta from another object.
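The deltified entries can be picked out by column count. A sketch using awk on four sample lines copied from the listing above. The shortened 7-character names are only for display:

```shell
set -eu
# Sample lines copied from the verify-pack -v listing.
sample='deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree   6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
b042a60ef7dff760008df33cee372b945b6e884e blob   22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e'
# Entries with 7 columns are deltas: column 6 is the depth, column 7 the base.
deltas=$(printf '%s\n' "$sample" \
  | awk 'NF == 7 {print substr($1, 1, 7), "depth", $6, "base", substr($7, 1, 7)}')
printf '%s\n' "$deltas"
```

On a real repository, the same awk filter can be fed from git verify-pack -v instead of the sample text.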
In the first part of the output, there are two kinds of line:
- raw objects, of the form:
  deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree 136 136 1178
  The three numbers at the end are:
  - uncompressed object size
  - size of the entry in the packfile
  - offset of the entry in the packfile
- delta versions, of the form:
  d982c7cb2c2a972ee391a85da481fc1f9127a01d tree 6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
  The 1 is the delta depth, and the last column is the object the delta is taken from.
  Git tends to keep the most recent versions of files whole and to store older versions as deltas against them, so deltas go backwards.
  Here, d982c is an older version of deef2. See how small it is: 6 bytes of delta data, against 136 bytes for the full tree.
  You can cat the object as usual:
  git cat-file -p d982c7cb2c2a972ee391a85da481fc1f9127a01d
  even though only a delta is stored in the pack.
pack-redundant
Start from the min-sane test repository packed.
TODO
show-index
Plumbing.
Get information about a given index file. The output is a subset of git verify-pack -v.
Example:
git show-index < .git/objects/pack/pack-<SHA>.idx
Sample output:
133 496d6428b9cf92981dc9495211e6e1120fb6f2ba (0f49d649)
12 860e5247c071721c8e286c73c3633509c77cf538 (198b73d3)
173 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (6e760029)
The columns are: the offset of the object in the packfile, the object SHA and, for version 2 index files, the CRC32 of the packed object data in parentheses.
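One way to check how the first column relates to verify-pack is to compare both outputs. A sketch, assuming a throwaway repository with one commit. The demo identity is hypothetical. It builds SHA / offset pairs from each command and compares them:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
git repack -a -d -q
idx=$(ls .git/objects/pack/pack-*.idx)
# show-index prints the offset first, then the object name.
from_index=$(git show-index < "$idx" | awk '{print $2, $1}' | sort)
# verify-pack -v prints the object name first and the offset in column 5.
from_verify=$(git verify-pack -v "$idx" \
  | awk '$2 ~ /^(commit|tree|blob|tag)$/ {print $1, $5}' | sort)
printf '%s\n' "$from_index"
[ "$from_index" = "$from_verify" ] && echo "offsets agree"
```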
index-pack
Plumbing.
Build idx file for a given .pack:
git index-pack .git/objects/pack/pack-<SHA>.pack
Generates the .idx in the same directory as the .pack.
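A sketch showing that the index is disposable, assuming a throwaway repository with one commit. The demo identity is hypothetical. The .idx is deleted, rebuilt from the .pack, and then checked with verify-pack:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
: > a
git add a
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
git repack -a -d -q
pack=$(ls .git/objects/pack/pack-*.pack)
idx="${pack%.pack}.idx"
# Throw the index away, then rebuild it from the pack alone.
rm -f "$idx"
git index-pack "$pack" > /dev/null
# The regenerated pair should verify cleanly.
git verify-pack "$idx" && echo "regenerated index is valid"
```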