Do products of a build process belong in a repository?

For example:

  • We don’t put Python’s compiled files (.pyc) into the repo, probably because Python generates them automatically.

  • In a Java shop, do they commit the .jar files to their repos?

  • Traditionally, I believe, C source went into the repo along with headers, link libraries, or whatever else it took to compile, but never the output of the build, like the executable itself.

In this specific instance I’m thinking of the web scenario: if we build stuff with Node.js, does the output of that process belong in the repo, or just the sources? Do we commit just what is necessary to run the build, or do we commit what we need to deploy (thus avoiding all the tooling)?

In the absence of in-house rules, what is the guiding principle?


No. The binaries generated do not belong in your repository. You need to keep them in some kind of store, but committing them to your history is a bad idea.

Every VCS I’m aware of has some kind of issue with storing the resulting binaries, although the particular problems differ depending on the implementation. Some problems are common to all of them, though.

Centralized systems, like TFS, need to frequently compare the files on your local system to the latest version on the server. They have a harder time comparing binaries than regular text files, so making them constantly check those binaries for changes will slow down your development environment.

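Git, on the other hand, uses hashes of the repository’s content to determine whether a file has changed. (Fun fact: a commit’s SHA covers a snapshot of the entire tree at that commit, along with the commit’s metadata and parents, not just the files you touched.) That makes the check very quick, but because Git records the full content of every changed file at each commit, instead of just the delta, storing slightly different binaries will bloat the size of your repository greatly. Packing can later compress similar versions against each other, but compiled or compressed binaries tend to delta poorly.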
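To make the hashing point concrete, here is a minimal Python sketch of how Git names a file’s content (a “blob”). It assumes the default SHA-1 object format; the function name and the sample byte strings are hypothetical and only there for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash file content the way Git names a blob (SHA-1 object format assumed)."""
    # Git prefixes the content with "blob <size in bytes>\0" before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents, used only for illustration.
source_before_build = b"print('hello')\n"
source_after_build = b"print('hello')\n"  # building did not touch the source
binary_build_1 = bytes([0x7F, 0x45, 0x4C, 0x46, 0x01])
binary_build_2 = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02])  # one byte differs

source_unchanged = git_blob_id(source_before_build) == git_blob_id(source_after_build)
binary_changed = git_blob_id(binary_build_1) != git_blob_id(binary_build_2)

print("source unchanged:", source_unchanged)
print("binary changed:  ", binary_changed)
```

Identical content always gets the same ID, so the “has it changed?” check is cheap. A rebuilt binary that differs by even one byte gets a new ID, and so becomes a brand-new object in the history.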

All systems suffer from the same flaw when it comes to storing build outputs in your repository, though: it’s an utter PITA to have your VCS think something has changed just because you built the solution.

Imagine this scenario happening a dozen times a day:

You pull down the latest source code. You’re paranoid, so before changing anything, you build the project to make sure your environment is set up correctly and the last guy to commit didn’t break the build. Your VCS now thinks something has changed, even though nothing has.

So, you ignore those pending changes and go on with your work. You’re a good developer, so you commit half a dozen small changes and submit them for review. When your coworkers go to review the changes, all they see is a dozen changed binaries. They have to sift through all those files in order to find the one-line changes you actually made.

Does this sound efficient to you?


I realize you meant a source code repository, but it’s interesting that you just said “a repository,” because many organizations do store their binaries in a different kind of repository called an artifact repository, especially companies with microservices architectures or other highly componentized architectures.

This allows you to compile just the microservice you’re working on and pull in already-compiled binaries for the rest of the system for local integration testing. Tools like Maven or Gradle can resolve dependencies on your company’s libraries from your local repo and on third-party libraries from repositories on the internet.

But no, in source control you want to only store the preferred form for making modifications.

This is just to add on to RubberDuck’s answer.

Everything he said is true. But there’s another reason binary files don’t play well with a VCS. VCSs are designed with programming in mind, and the fundamental file type that programmers work with is the source file. Source files are text-based, meaning they’re structured as a sequence of lines (which are themselves sequences of characters) separated by the newline character. The other common file types that programmers work with, such as settings and configuration files, are also usually text-based.

Thus, with this format in mind, the VCS compares files line-by-line (although some will work at an even finer granularity). This technique fails utterly for non-text (binary) files. Binary is a large category, of course; each binary format (jpg, mp3, rdb, etc.) has its own internal structure, but these structures differ wildly, and they aren’t line-based.
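As a rough illustration of the line-by-line idea, here is a small Python sketch using the standard library’s difflib. It is not how any particular VCS implements its diff, and the file names and contents are made up for the example.

```python
import difflib

# Hypothetical source file before and after a one-line fix.
old_source = ["def area(w, h):\n", "    return w + h\n"]
new_source = ["def area(w, h):\n", "    return w * h\n"]

text_diff = list(
    difflib.unified_diff(old_source, new_source, fromfile="old.py", tofile="new.py")
)

# Keep only the changed lines, skipping the "---" and "+++" file headers.
changed_lines = [
    line
    for line in text_diff
    if line[:1] in ("+", "-") and not line.startswith(("---", "+++"))
]
print("".join(changed_lines), end="")

# Hypothetical binary payloads that differ in a single byte.
old_binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01, 0x02, 0x03])
new_binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01, 0x02, 0xFF])

# With no newline bytes, a line-based tool sees each file as one giant "line".
old_chunks = old_binary.split(b"\n")
new_chunks = new_binary.split(b"\n")
changed_chunks = sum(1 for a, b in zip(old_chunks, new_chunks) if a != b)
print(f"binary: {changed_chunks} of {len(old_chunks)} 'lines' changed")
```

For the text file, the diff pinpoints the single line that changed. For the binary, the only thing a line-based comparison can say is that the whole file is different, which tells a reviewer nothing.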

For a VCS to work well with a given file type, it has to understand that type’s structure. You could conceivably create a VCS that understands the binary format of some database dump, and can compare databases table-by-table or even row-by-row. But this is infeasible to do for every file type, and the ROI is low anyhow. That’s why we stick to text files.

