How to reconstruct source code from an exposed .git directory.
When attacking an application, obtaining the application’s source code can be extremely helpful for constructing an exploit. This is because some bugs, like SQL injections, are way easier to find using static code analysis compared to black-box testing.
Obtaining an application’s source also often means getting a hold of developer comments, hardcoded API keys, and other sensitive data. So the source code of an application should always be protected from public view.
Finding .git directory information leaks
A way that applications accidentally expose source code to the public is through an exposed .git directory.
When a developer uses Git to version control a project’s source code, a git directory (located at project.com/.git) is used to store all the version control information of the project, including the commit history of project files. Normally, the .git folder should not be accessible to the public. But sometimes the .git folder is accidentally made available, and this is when information leaks happen.
To check if an application’s .git folder is exposed, simply go to the application’s root directory, for example project.com, and add /.git to the URL. There are three possibilities that can happen when you browse to the /.git directory:
- If you get a 404 error, this means that the .git directory of the application is not made available to the public, and you won’t be able to leak information this way.
- If you get a 403 error, the .git directory is available on the server, but you won’t be able to directly access the folder’s root, and therefore will not be able to list all the files contained in the directory.
- If you don’t get an error and the server responds with the document tree of the .git directory, you can directly browse the folder’s contents and retrieve any information contained in it.
Reconstructing project source from .git directory
If directory listing is enabled, an attacker can simply browse through the files and retrieve the leaked information. She can also use the wget command in recursive mode (-r) to mass-download the contents of the directory.
> wget -r project.com/.git
But if directory listing is not enabled and the directory’s files are not shown, there are still ways for an attacker to reconstruct the entire .git directory. To understand how this is done, we must first understand the structure of .git directories.
.git directory structure
The .git directory is laid out in a specific way. When you execute the command:
> ls .git
In the command line, you would probably see this:
COMMIT_EDITMSG HEAD branches config description hooks index info logs objects refs
Here are a few standard files and folders in the .git directory that is important in reconstructing the project’s source.
- The /objects folder
The /objects directory is used to store Git objects. This directory contains additional folders that each have two character names. These subdirectories are named after the first two characters of the SHA1 hash of the git objects stored in it.
Within these subdirectories, there are files named after the SHA1 hash of the git object stored in it.
For example, the command below will return a list of folders:
> ls .git/objects 00 0a 14 5a 64 6e 82 8c 96 a0 aa b4 be c8 d2 dc e6 f0 fa info pack
And this command will reveal the git objects stored in that particular folder:
> ls .git/objects/0a 082f2656a655c8b0a87956c7bcdc93dfda23f8 4a1ee2f3a3d406411a72e1bea63507560092bd 66452433322af3d319a377415a890c70bbd263 8c20ea4482c6d2b0c9cdaf73d4b05c2c8c44e9 ee44c60c73c5a622bb1733338d3fa964b333f0 0ec99d617a7b78c5466daa1e6317cbd8ee07cc 52113e4f248648117bc4511da04dd4634e6753 72e6850ef963c6aeee4121d38cf9de773865d8
Git objects are stored in /objects according to the first two characters of their SHA1 hash. For example, the Git object with a hash of 0a082f2656a655c8b0a87956c7bcdc93dfda23f8 will be stored with the file name of 082f2656a655c8b0a87956c7bcdc93dfda23f8 in the directory .git/objects/0a.
Git stores different types of objects in .git/objects. An object stored here could either be a commit, a tree, a blob, and an annotated tag. You can determine the type of an object by using the command:
> git cat-file -t OBJECT-HASH
Commit objects store information about the commit’s directory tree object hash, parent commit, author, committer, date, and message of a commit. Tree objects contain the directory listings for commits. Blob objects contain copies of files that were committed (read: actual source code!). Whereas tag objects contain information about tagged objects and their associated tag names.
You can display the file associated with a Git object by using the command:
> git cat-file -p OBJECT-HASH
- The /config file is the Git configuration file for the project.
- The /HEAD file is a file that contains a reference to the current branch.
> cat .git/HEAD ref: refs/heads/master
Confirming that files are accessible
If you are not able to access the .git directory listing, you’ll need to confirm that the folder’s contents are indeed available to the public. You can do this by trying to access the config file of the .git directory.
> curl https://project.com/.git/config
If this file is accessible, you might be able to download the entire contents of the .git directory.
Downloading the files
If you cannot access the /.git folder’s directory listing, you have to download each file you want instead of recursively downloading from the directory root.
But how do you find out which files on the server are available when object files have complex paths such as “.git/objects/0a/72e6850ef963c6aeee4121d38cf9de773865d8”?
You start with file paths that you already know exist, like ”.git/HEAD”! Reading this file will give you a reference to the current branch (for example, .git/refs/heads/master) that you can use to find more files on the system.
> cat .git/HEAD ref: refs/heads/master > cat .git/refs/heads/master 0a66452433322af3d319a377415a890c70bbd263 > git cat-file -t 0a66452433322af3d319a377415a890c70bbd263 commit > git cat-file -p 0a66452433322af3d319a377415a890c70bbd263 tree 0a72e6850ef963c6aeee4121d38cf9de773865d8
The .git/refs/heads/master file will point you to the corresponding object hash that stores the directory tree of the commit. From there, you can see that the object is a commit and is associated with a tree object, 0a72e6850ef963c6aeee4121d38cf9de773865d8.
Now when you examine the tree object stored at 0a72e6850ef963c6aeee4121d38cf9de773865d8:
> git cat-file -p 0a72e6850ef963c6aeee4121d38cf9de773865d8 100644 blob 6ad5fb6b9a351a77c396b5f1163cc3b0abcde895 .gitignore 040000 blob 4b66088945aab8b967da07ddd8d3cf8c47a3f53c source.py 040000 blob 9a3227dca45b3977423bb1296bbc312316c2aa0d README 040000 tree 3b1127d12ee43977423bb1296b8900a316c2ee32 resources
Bingo! You discover some source code files and additional object trees to explore.
On a remote server, your requests to discovering the different files would look more like this:
https://project.com/.git/HEAD (to determine the HEAD) https://project.com/.git/refs/heads/master (to find the object stored in that HEAD) https://project.com/.git/objects/0a/72e6850ef963c6aeee4121d38cf9de773865d8 (to access the tree associated with the commit) https://project.com/.git/objects/9a/3227dca45b3977423bb1296bbc312316c2aa0d (to download the source code stored in the README file)
On a remote server like this, you will need to decompress the downloaded object file before you read it. This can be done using Ruby:
ruby -rzlib -e 'print Zlib::Inflate.new.inflate(STDIN.read)' < OBJECT_FILE
Finding useful information
After recovering the project’s source code, you can grep for hardcoded credentials, encryption keys and developer comments for quick wins. You should also look for new and deprecated endpoints and record them for further analysis. If you have time, you can simply browse through the entire recovered codebase to find potential vulnerabilities.