Strings Stink!

In a previous post, I shared some ideas on how to write object-oriented code.  Today I'd like to delve a little deeper.

One of my favorite Refactorings is Replace Data Value with Object.  You start with a weak type and move to a stronger one.  You start with code that manipulates a variable, and move to calling methods on a class.

In the aforementioned post, we went from a string to a FilePath using this refactoring.  Fowler says it's great for built-in types, like string and int.  That's because these types have very little structure of their own, so every instance carries unspoken rules about how it should be handled.

The absolute worst is string.  As my OO Jedi Master would say, “Strings are smelly”.  On the one hand, you can store any kind of information in a string.  And in C# (and other modern languages), everyone knows what a string is, and how to create, destroy, and pass them.  On the other hand, you can store any kind of information in a string.  So any smarts about their contents must live in your code that manipulates them.

When we talked about this idea at coffee, we called the use of strings “Premature Serialization” or “Postmature Deserialization”, because we're taking rich information and persisting it in a string, in memory.

So what to do about it?  Get started with Refactoring. 

  1. Make up a name for what the string contains: “SocialSecurityNumber” or “ShoeSize” or something.
  2. Create a class by this name, and put a string in it.  For now, the constructor can take a string, and you can even make the field public, just to get things going quickly.
  3. Replace the original string with an instance of this class.  Look for other strings that hold the same kind of data, and do the same there.
  4. Look for places where you manipulate the string, and move that logic into methods on the new class.
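
Here's a rough sketch of where those four steps might land.  The class name comes from step 1; everything else (the nine-digit rule, the Masked property, the dash handling) is made up for illustration, so treat it as one possible shape rather than the right answer.

```csharp
using System;
using System.Linq;

// Hypothetical result of steps 1-4. The validation and formatting rules
// are illustrative assumptions, not a real specification.
public sealed class SocialSecurityNumber
{
    // Step 2 started with a public string field; once callers stop
    // touching it directly, it can become private like this.
    private readonly string digits;

    public SocialSecurityNumber(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Step 4: clean-up and checks that used to live at call sites.
        string stripped = text.Replace("-", "");
        if (stripped.Length != 9 || !stripped.All(c => c >= '0' && c <= '9'))
            throw new ArgumentException("Expected nine digits.", nameof(text));

        digits = stripped;
    }

    // Step 4 again: display logic moved off the callers.
    public string Masked => "***-**-" + digits.Substring(5);

    public override string ToString() =>
        digits.Substring(0, 3) + "-" + digits.Substring(3, 2) + "-" + digits.Substring(5);
}
```

With this in place, a caller writes new SocialSecurityNumber("123-45-6789") once, and the rest of the code can no longer hand it a shoe size by accident.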

Once you're done, go back and see if you can do it again.  So you went from “string filePath” to “FilePath filePath”.  But if this file represents, say, an MP3, then repeat the refactoring to create the class “MP3File”.

Also look for groups of variables that always go together.  Fowler points out the Range pattern, and it's a great one.  When you have a “start” and “end” of something, make it into a range.  I once saw this code:

string startDate;
string endDate;

and boy, did I want to have a party there!  This was in C#, where there's already a DateTime type!

  1. Replace each with DateTime instances.
  2. Replace the pair with a DateRange.
  3. Figure out the semantics of this DateRange, and build a new class around 'em.
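
As a sketch of what step 3 could turn into, here's a small DateRange.  The semantics are my assumptions for the example (both ends inclusive, end may not precede start), and Contains, Overlaps and Length are hypothetical members; your own range may well need different rules.

```csharp
using System;

// Hypothetical DateRange built on System.DateTime.
// Assumed semantics: both endpoints are inclusive, and End >= Start.
public sealed class DateRange
{
    public DateTime Start { get; }
    public DateTime End { get; }

    public DateRange(DateTime start, DateTime end)
    {
        // The "unspoken rule" about the pair is now enforced in one place.
        if (end < start)
            throw new ArgumentException("End must not precede start.", nameof(end));

        Start = start;
        End = end;
    }

    // True when the moment falls on or between the two endpoints.
    public bool Contains(DateTime moment) => Start <= moment && moment <= End;

    // True when the two ranges share at least one moment.
    public bool Overlaps(DateRange other) => Start <= other.End && other.Start <= End;

    // Elapsed time between the endpoints.
    public TimeSpan Length => End - Start;
}
```

The two loose strings become one object, and questions like "does this booking overlap that one?" get a method with a name instead of four string comparisons.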