
Yegor Bugayenko
5 September 2018

Monolithic Repos Are Evil

We all keep our code in Git version control repositories. The question is whether we should create a new repository for each new module or keep as much as possible in a single so-called “monolithic” repo (or simply “monorepo”). Market leaders, like Facebook and Google, advocate the second approach. I believe they are wrong.

Let’s use the following JavaScript function as an example. It downloads a JSON document from a Zold node (using jQuery) and places part of its content on the HTML page. Then it colors the data according to its value.

// main.js
function main() {
  $.getJSON('http://b1.zold.io/', function(json) {
    var $body = $('body');
    $body.text(json.nscore);
    var color = 'red';
    if (json.nscore > 500) {
      color = 'green';
    }
    $body.css('color', color);
  });
}

Pretty obvious, isn’t it? Just a single main.js file that does everything we need. We simply add it to the HTML and it works:

<html>
  <head>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
    <script src="main.js"></script>
  </head>
  <body onload="main();">loading...</body>
</html>

Now, let me refactor it. Let me break it into two pieces. The first piece will load the data and the second one will be a jQuery plugin to colorize HTML content according to the data it contains. This is how the plugin will look:

// colorize.js
// Adds the CSS class attached to the largest threshold
// that the element's numeric content reaches.
$.fn.colorize = function(colors) {
  var data = parseFloat(this.text());
  var keys = Object.keys(colors)
    .map(function (k) { return parseInt(k, 10); })
    .sort(function (a, b) { return a - b; })
    .reverse();
  for (var i = 0; i < keys.length; ++i) {
    var max = keys[i];
    if (data >= max) {
      this.addClass(colors[max]);
      return this;
    }
    this.removeClass(colors[max]);
  }
  return this;
};
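The lookup at the heart of the plugin, finding the class attached to the largest threshold the value reaches, can be sketched in plain JavaScript without jQuery; pickClass is a hypothetical helper name introduced here for illustration only, not part of the plugin:

```javascript
// Hypothetical standalone sketch of the threshold lookup:
// given a value and a map of thresholds to class names, return
// the class for the largest threshold the value reaches.
function pickClass(value, colors) {
  var keys = Object.keys(colors)
    .map(function (k) { return parseInt(k, 10); })
    .sort(function (a, b) { return b - a; }); // descending
  for (var i = 0; i < keys.length; ++i) {
    if (value >= keys[i]) {
      return colors[keys[i]];
    }
  }
  return null; // value below every threshold
}

console.log(pickClass(750, { 500: 'green', 0: 'red' })); // green
console.log(pickClass(120, { 500: 'green', 0: 'red' })); // red
```

Keeping this logic separate from the DOM manipulation is exactly what makes the plugin easy to test in isolation.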

The main.js will look like this:

// main.js
function main() {
  $.getJSON('http://b1.zold.io/', function(json) {
    $('body')
      .text(json.nscore)
      .colorize({ 500: 'green', 0: 'red' });
  });
}

Now, instead of a single monolithic piece of code, we have two smaller pieces which have to be loaded together into the target HTML:

<html>
  <head>
    <script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
    <script src="colorize.js"></script>
    <script src="main.js"></script>
  </head>
  <body onload="main();">loading...</body>
</html>

Are two pieces better than one? It seems that Google, DigitalOcean, and Mozilla don’t think so.

I disagree.

To illustrate my point I extracted the JavaScript function into a new standalone jQuery plugin. Here is what I did:

It took almost three weeks of waiting and four hours of work, just to move a small piece of JavaScript code to a new repository and release it separately. Was it worth it? Well, I think it was. But most other blog post authors I managed to find think it would be better to keep everything in a single monolithic repo, mostly because it’s better for productivity. For example, Advantages of monorepos by Dan Luu, Advantages and Disadvantages of a Monolithic Repository (a case study at Google) by Ciera Jaspan et al., and How Monolithic Repository in Open Source saved my Laziness by Tomas Votruba.

There are also a few good analyses of both approaches, for example Monolithic repositories vs. Many repositories speech by Fabien Potencier at dotScale 2016 and Repo Style Wars: Mono vs Multi by Peter Seibel.

In a nutshell, they all claim that productivity is higher with a monolithic repo because the number of operations one has to perform in order to make a change is smaller. Indeed, in a monorepo there will be a single branch, a single set of commits, a single pull request, a single merge, deploy, and release. It will also be easier to test, both manually and via unit testing. Continuous integration is easier to configure, and so on and so forth.

All these “reasonable” arguments remind me of what I hear when preaching object decomposition and suggesting that multiple objects are better than a single large one. Imagine a large class of 3,000 lines of code, which does many things and they are all very tightly coupled. It’s “easy” to test it, to make changes, to deploy, to review, etc. Because everything stays in one file, right? We don’t need to jump from class to class in order to understand the design. We just look at one screen, scroll it up and down, and that’s it. Right? Totally wrong!

I guess I don’t need to explain why it’s wrong. We don’t design our software that way anymore. We know that tight coupling is a bad idea. We know that a set of smaller components is better than a larger solid piece.

Why can’t we apply the same logic to repositories? I believe we can. Of course, just like in object-oriented programming, a fine-grained design requires more skills and time. Look at what I had to do with this small jQuery plugin. I’ve spent hours of coding and thinking. I even had to learn Gulp and Jasmine, which I most probably will not use anymore. But the benefits we are getting from it are enormous. This is my short list of them:

Thus, I believe that the smaller the repositories and modules, the better. Ideally, I would say, the largest acceptable size for a code base is 50,000 lines of code. Everything that grows beyond that threshold is a perfect candidate for decomposition.

What do you think is better, a bigger code repository with everything inside, or many smaller ones with their own builds, dependencies, issues, and pull requests?

— Yegor Bugayenko (@yegor256) October 21, 2018