8 May 2018
An Open Code Base Is Not Yet an Open Source Project
A few weeks ago someone suggested I should try to integrate IntelliJ IDEA’s static analysis rules into Qulice, our aggregator of Checkstyle, PMD, FindBugs, and some other analyzers. I do love IDEA’s rules—some of them are unique and very useful. I asked whether I could find them somewhere in Maven Central (they are written in Java) and the answer was “You’ll have to figure out yourself how to use them, but they are open source.” Here comes my opinion about this situation: I believe that open source doesn’t just mean the code is readable without authorization. It means something much bigger.
Just making a piece of code publicly accessible is not what it takes to call it open source software. Actually, it only harms the product, and the reputation of its author, if it’s open but not ready for reuse (which is what the open source world is all about). As Eric Raymond said in his famous piece The Cathedral and the Bazaar, “Good programmers know what to write. Great ones know what to rewrite (and reuse).”
It’s the responsibility of the software product’s author to help those “good” programmers to reuse the code. Coding, testing, debugging, and making sure “it works on my laptop” is one thing. Making it readable and reusable is a totally different piece of work, which may take much more time.
As Karl Fogel said in Producing Open Source Software: “Most free software projects fail.” They fail (on top of many other factors) because not enough attention is paid to the following basic things (in no particular order):
README. I’m sure you host your product on GitHub. (If not, what’s wrong with you?) There must be a README file in the root directory that explains what the product is all about and how we should use it.
A few good examples:
yegor256/takes (this one is mine).
A few bad examples:
junit4/blob (don’t be like these guys).
No matter how rich you’ve made your website, Javadoc, Wiki, mailing list, and Twitter, the README is the place where we expect to see everything. Only if and when we get interested will we investigate further and deeper. Read the README files in other projects and copy their best ideas. The README is your showcase; it must shine.
License. Most of us don’t pay attention to this bureaucracy. I didn’t either, until recently. I thought that the moment my code is open I can forget about any rights and royalties. They will just use my code and I won’t see any profit, ever. The license I attach to it won’t matter—nobody reads it anyway. This is exactly what happens in most cases. But only while those users are small potatoes.
A few years ago I was an architect on a software project and we had to create an analyzer of hardware components, like CPU, memory, hard disc, etc. We had to make sure all of them worked as expected after running pretty complex and customized tests. My obvious suggestion was to use open source tools, which would do the hard work for us. We would only have to integrate them. It was an awesome idea, until some of us decided to check the licenses of those tools.
That was the moment I realized that I was so wrong for not paying attention to what licenses say. GPL, for example, which we found in a few tools, didn’t allow us to reuse the code if our product wasn’t open source too. Since we were creating proprietary software, we understood that we weren’t able to use copyleft modules, only MIT, BSD or similar.
I’m suggesting you think about the license before publishing the product. I’ve used MIT in all my products since 2016.
A mere collection of .rb files is not reusable Ruby code. Well, maybe for the people I despise so much, it is. But for professional programmers who are too lazy to read their own code, let alone someone else’s, it definitely isn’t.
“Take it from GitHub” is not a polite way to treat us—your fellow programmers—anymore. It was, twenty years ago, but now we have repositories. You have to distribute your product as an “artifact” through one of those public repositories, and make it possible for us to fetch it from there, skipping the testing and packaging, and just using it as a product (a Ruby gem or Java JAR, for example).
I’m talking about repositories like Maven Central, npmjs, or RubyGems. You have to find a way to deploy your product there. It’s not an easy task, even though those repositories do their best to simplify the process. We use Rultor in all our projects, which helps us streamline the deployment:
- yegor256/takes to Maven Central (details);
- yegor256/xcop to RubyGems (details);
- yegor256/tacit to npmjs.
Package managers and build tools like Maven, npm, Rake, Grunt, Gradle, and others are the standard and traditional way of reusing open source software (proprietary too). If your product is not available in a public repository, it’s not a product; it’s just a code base.
Javadoc. We all hate writing documentation. And we hate libraries that are not documented. I usually find it boring to write Javadoc blocks for my classes, but I understand that without them the code I’m writing inside those classes will not interest anyone.
The best format for those Javadoc blocks is “by example.” Instead of prose I’d recommend you demonstrate how to use the class, especially in combination with its neighbors. Moreover, I’d suggest you don’t write documentation anywhere else apart from those Javadoc blocks. (They exist in other languages too, but have different names.)
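To make the “by example” idea concrete, here is a minimal sketch of what such a Javadoc block might look like. The classes `Lines` and `Joined` are hypothetical, invented only for this illustration; they are not taken from any of the projects mentioned here. The point is that the comment shows the class at work together with its neighbor instead of describing it in prose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Immutable collection of text lines (hypothetical class, for illustration).
 *
 * <p>Use it together with {@link Joined} to build a message:
 *
 * <pre>{@code
 * String text = new Joined(
 *     new Lines().with("Hello").with("world"),
 *     ", "
 * ).asString(); // "Hello, world"
 * }</pre>
 */
final class Lines {
    private final List<String> items;

    Lines() {
        this(Collections.<String>emptyList());
    }

    private Lines(final List<String> list) {
        this.items = list;
    }

    /**
     * Returns a new object with one more line; the original stays intact.
     */
    Lines with(final String line) {
        final List<String> copy = new ArrayList<>(this.items);
        copy.add(line);
        return new Lines(Collections.unmodifiableList(copy));
    }

    List<String> asList() {
        return this.items;
    }
}

/**
 * Lines joined into a single string (hypothetical neighbor of Lines).
 *
 * <pre>{@code
 * new Joined(new Lines().with("a").with("b"), "-").asString(); // "a-b"
 * }</pre>
 */
final class Joined {
    private final Lines lines;
    private final String delimiter;

    Joined(final Lines src, final String sep) {
        this.lines = src;
        this.delimiter = sep;
    }

    String asString() {
        return String.join(this.delimiter, this.lines.asList());
    }
}
```

A reader who opens the generated documentation for `Lines` sees a snippet that can be copied as is, which is usually what a lazy programmer is looking for.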
The problem with Javadoc is that it’s not so easy to format the text so that it looks visually attractive. Maybe that’s why many programmers still rely on Wikis or project websites. I’d recommend you stay inside Javadoc blocks and learn their formatting syntax.
Badges. As you can see, I like badges. First and foremost they make a repository look as if it’s being “actively maintained,” especially if those badges are green. They don’t really deliver any valuable information. They mostly say: “Our author has very good taste, see how perfectly our colors match!”
Jokes aside, it’s not so easy to add all those badges. Each badge will take you some time: to integrate a third-party system, to make sure the numbers are good enough to be proud of, and to keep it under control. If the repository is not being watched over, the badges will eventually start failing.
Continuous Integration. In order to use your code we have to trust it, meaning that we have to be sure that it works, or at least passes automated tests. (Do I have to say that you must have tests?) How can we be sure it works? CI is the answer. We must be able to see the logs of the recent CI build and make sure it is clean.
It’s a matter of trust. You may never use those Travis builds and simply ignore their red and green signals, but they are important for us—your clients. I add Travis badges to all projects of mine, right after I create a new repository.
Contribution Guidelines. For a regular GitHub addict it’s not a problem to figure out how to send you a pull request. However, the majority of us, at least initially, will consist of active users, not contributors. We will try to use your product and will attempt to customize it for our needs. If we get lost, we will leave, frustrated.
To prevent this, you have to explain what a disciplined contributor has to do in order to make changes to your code base. Here are the questions I’d recommend you answer in your contribution guidelines:
- How do I run an automated build?
- How big/small does a pull request have to be in order to be accepted?
- What are your style guidelines?
- How do bugs have to be reported, tagged, explained?
- What makes a good bug report?
Here is the text I use in all my projects:
Quality Wall. Finally, if you are lucky, we will use your product and will be interested in contributing. You will start getting our pull requests. The question is how fast we will ruin your code base. We will, if you don’t protect yourself.
If you strictly review each pull request and reject anything that doesn’t look like “great” code, you will lose us, your contributors. We don’t want to write great code, we want to make changes to your product so that it becomes more suitable for our needs. The greatness of the code is your concern, not ours.
On the other hand, if you accept whatever comes in, the architecture will lose its robustness (if it ever had any) and you again will lose us, your contributors. This time you will lose us because the product will become bad and difficult to maintain and contribute to.
The best way to keep the balance is to “hire” a tool to help you: build automation, static analysis, automated tests, and coverage control. You have to configure the product to fail when the changes someone introduces violate its internal quality expectations. I use Rultor for that too.
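As a rough illustration of “fail when the changes violate quality expectations,” here is a minimal sketch of a coverage gate. The class `QualityWall` and its threshold are hypothetical, and this is not how Rultor or any particular build plugin is implemented. In a real project this job is normally delegated to the build tool and its static analysis and coverage plugins. The sketch only shows the principle: the decision to reject is made by a tool, not by a human reviewer.

```java
import java.util.Locale;

/**
 * Hypothetical quality gate: rejects a build when test coverage
 * falls below a configured minimum.
 */
final class QualityWall {
    private final double threshold;

    /**
     * @param min Minimum acceptable coverage ratio, e.g. 0.8 for 80%
     */
    QualityWall(final double min) {
        this.threshold = min;
    }

    /**
     * Throws if the coverage ratio is below the threshold.
     *
     * @param covered Number of lines covered by tests
     * @param total Total number of lines
     */
    void check(final int covered, final int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("Total must be positive");
        }
        final double ratio = (double) covered / total;
        if (ratio < this.threshold) {
            throw new IllegalStateException(
                String.format(
                    Locale.ROOT,
                    "Coverage %.2f is below the required %.2f",
                    ratio, this.threshold
                )
            );
        }
    }
}
```

With a minimum of 0.8, a call like `new QualityWall(0.8).check(90, 100)` returns quietly, while `check(70, 100)` throws and the build is expected to stop there.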
Did I forget anything?