Public Static Literals ... Are Not a Solution for Data Duplication

I have a new String(array, "UTF-8") call in one place and exactly the same code in another place in my app. Actually, I may have it in many places. And every time, I have to use that "UTF-8" constant in order to create a String from a byte array. It would be very convenient to define it once somewhere and reuse it, just like Apache Commons does; see CharEncoding.UTF_8 (there are many other static literals there). These guys are setting a bad example! Public static "properties" are as bad as utility classes.

The Shining (1980) by Stanley Kubrick

Here is what I'm talking about, specifically:

package org.apache.commons.lang3;
public class CharEncoding {
  public static final String UTF_8 = "UTF-8";
  // some other methods and properties
}

Now, when I need to create a String from a byte array, I use this:

import org.apache.commons.lang3.CharEncoding;
String text = new String(array, CharEncoding.UTF_8);

Let's say I want to convert a String into a byte array:

import org.apache.commons.lang3.CharEncoding;
byte[] array = text.getBytes(CharEncoding.UTF_8);

Looks convenient, right? That is what the designers of Apache Commons think. Apache Commons is one of the most popular, but simply terrible, libraries in the Java world, and I encourage you to think differently. I can't tell you to stop using Apache Commons, because we just don't have a better alternative (yet!). But in your own code, don't use public static properties, ever. Even if this code looks convenient to you, it's a very bad design.

The reason is very similar to the problem with utility classes and their public static methods: they are unbreakable, hard-coded dependencies. Once you use CharEncoding.UTF_8, your object starts to depend on this data, and its user (the user of your object) can't break this dependency. You may say that this is exactly your intention in the case of a "UTF-8" constant: to make sure that UTF-8 is used specifically and exclusively. In this particular example, that may be true, but look at it from a more global perspective.

Before we continue, let me show you the alternative I have in mind. Here is what I suggest instead for converting a byte array into a String:

String text = new UTF8String(array);

It's pseudo-code, since the Java designers made class String final, so we can't really extend it and create UTF8String, but you get the idea. In the real world, it would look like this:

String text = new UTF8String(array).toString();

As you can see, we encapsulate the "UTF-8" constant inside the class UTF8String, and its users have no idea how exactly this "byte array to string" conversion happens.
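Here is one possible way UTF8String could be written. This is a minimal sketch, not a definitive implementation: the class is hypothetical, and it assumes Java 6 or later, where the String(byte[], Charset) constructor exists, so no checked UnsupportedEncodingException has to be handled. The "UTF-8" literal appears exactly once, as a private static field that no caller can see.

```java
import java.nio.charset.Charset;

// Hypothetical class: wraps a byte array and decodes it as UTF-8.
final class UTF8String {
    // The encoding lives here, privately; callers never see or repeat it.
    private static final Charset ENCODING = Charset.forName("UTF-8");

    private final byte[] bytes;

    UTF8String(byte[] bytes) {
        // Defensive copy, so later changes to the caller's array don't leak in.
        this.bytes = bytes.clone();
    }

    @Override
    public String toString() {
        return new String(this.bytes, UTF8String.ENCODING);
    }
}
```

With a class like this, the caller's code stays exactly as in the example above: String text = new UTF8String(array).toString();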

By introducing UTF8String, we solved the problem of "UTF-8" literal duplication. But we did it in a proper object-oriented way: we encapsulated the functionality inside a class and let everybody instantiate its objects and use them. We resolved the problem of functionality duplication, not just data duplication.

Placing data into one shared place (CharEncoding.UTF_8) doesn't really solve the duplication problem; it actually makes it worse, mostly because it encourages everybody to duplicate functionality using the same piece of shared data.

My point here is that every time you see some data duplication in your application, you should start thinking about the functionality you're duplicating. You will easily find the code that is repeated again and again. Make a new class for this code and place the data there, as a private property (or private static property). That's how you will improve your design and truly get rid of duplication.
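Applied to the second example above (converting a String into a byte array), that advice could produce something like the following. The UTF8Bytes class and its asBytes() method are hypothetical names chosen for this sketch; the only point is that the encoding becomes private data of the class that performs the conversion, instead of a shared public constant.

```java
import java.nio.charset.Charset;

// Hypothetical counterpart to UTF8String: encodes a String as UTF-8 bytes.
final class UTF8Bytes {
    // Same idea: the encoding is private data owned by this class.
    private static final Charset ENCODING = Charset.forName("UTF-8");

    private final String text;

    UTF8Bytes(String text) {
        this.text = text;
    }

    // String.getBytes returns a new array on every call,
    // so callers can't mutate anything shared.
    byte[] asBytes() {
        return this.text.getBytes(UTF8Bytes.ENCODING);
    }
}
```

A caller would then write byte[] array = new UTF8Bytes(text).asBytes(); and never mention "UTF-8" at all.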

PS. You can use a method instead of a class, but not a static literal.
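One possible reading of this, sketched under the assumption that the conversion is needed inside only one class: a private method of that class owns the encoding, and every other method calls it instead of repeating the literal. The Message class and its text(), length() and decoded() methods are hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical class that needs to turn its raw payload into text.
final class Message {
    private final byte[] payload;

    Message(byte[] payload) {
        this.payload = payload.clone();
    }

    String text() {
        return this.decoded();
    }

    // Number of characters, not bytes: decoding happens first.
    int length() {
        return this.decoded().length();
    }

    // The only place in this class where the encoding is named.
    private String decoded() {
        return new String(this.payload, Charset.forName("UTF-8"));
    }
}
```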