Strings Stink!

In a previous post, I shared some ideas on how to write Object Oriented code.  Today I’d like to delve a little deeper.

One of my favorite Refactorings is Replace Data Value with Object.  You start with a weak type and move to a stronger one.  You start with code that manipulates a variable, and move to calling methods on a class.

In the aforementioned post, we went from a string to a FilePath, using this refactoring.  Fowler says it’s great for builtin types, like string & int.  That’s because these types have very little structure of their own.  Every instance carries unspoken rules about how it should be handled, and nothing in the type enforces them.

The absolute worst is string.  As my OO Jedi Master would say, “Strings are smelly”.  On the one hand, you can store any kind of information in a string.  And in C# (and other modern languages), everyone knows what a string is, and how to create, destroy, and pass them.  On the other hand, you can store any kind of information in a string.  So any smarts about their contents must live in the code that manipulates them.

When we talked about this idea at coffee, we called the use of strings “Premature Serialization” or “Postmature Deserialization”, because we’re taking rich information and persisting it in a string, in memory.

So what to do about it?  Get started with Refactoring. 

  1. Make up a name for what the string contains.  “SocialSecurityNumber” or “ShoeSize” or something.

  2. Create a class by this name, and put a string in it.  For now, the constructor can take a string, and you can even make the field public, just to get things going quickly.

  3. Replace the original string with an instance of this class.  Look for other strings that match, and do the same there.

  4. Look for places you manipulate the string, and move that logic into methods on the new class. 
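Here is a rough sketch of where those four steps might land for “SocialSecurityNumber”.  The nine-digit rule, the Masked property, and the formatting in ToString are assumptions made up for illustration; your own rules will differ.

```csharp
using System;
using System.Linq;

// Hypothetical class wrapping what used to be a raw string (steps 1-3).
public sealed class SocialSecurityNumber
{
    // Invariant: always exactly nine ASCII digits, no dashes.
    private readonly string digits;

    public SocialSecurityNumber(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Assumed format rule: nine digits, dashes optional.
        string stripped = text.Replace("-", "");
        if (stripped.Length != 9 || !stripped.All(c => c >= '0' && c <= '9'))
            throw new ArgumentException("Expected nine digits.", nameof(text));

        digits = stripped;
    }

    // Step 4: logic that used to be scattered through client code.
    public string Masked
    {
        get { return "***-**-" + digits.Substring(5); }
    }

    public override string ToString()
    {
        return digits.Substring(0, 3) + "-" + digits.Substring(3, 2) + "-" + digits.Substring(5);
    }
}
```

Client code now takes a SocialSecurityNumber instead of a string, so the validation runs once, in the constructor, rather than wherever someone remembers to do it.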

Once you’re done, go back and see if you can do it again.  So you went from “string filePath” to “FilePath filePath”.  But if this file represents, say, an MP3, then repeat the refactoring to create the class “MP3File”.
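A minimal sketch of that second pass might look like the following.  The members on FilePath, the extension check, and the idea that the title comes from the file name are all assumptions for illustration, not the classes from the earlier post.

```csharp
using System;
using System.IO;

// First pass: the string becomes a FilePath (hypothetical members).
public sealed class FilePath
{
    private readonly string value;

    public FilePath(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            throw new ArgumentException("Path must not be empty.", nameof(value));
        this.value = value;
    }

    public string Extension
    {
        get { return Path.GetExtension(value); }
    }

    public string NameWithoutExtension
    {
        get { return Path.GetFileNameWithoutExtension(value); }
    }

    public override string ToString()
    {
        return value;
    }
}

// Second pass: the FilePath turns out to mean something more specific.
public sealed class MP3File
{
    public FilePath Location { get; private set; }

    public MP3File(FilePath location)
    {
        if (location == null) throw new ArgumentNullException(nameof(location));

        // Assumed rule: an MP3 is recognized by its extension alone.
        if (!string.Equals(location.Extension, ".mp3", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Not an MP3 path.", nameof(location));

        Location = location;
    }

    // Assumption: the title is derived from the file name.
    public string Title
    {
        get { return Location.NameWithoutExtension; }
    }
}
```

Each pass gives the MP3-specific logic a more specific home, and code that only cares about paths can keep using FilePath.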

Also look for groups of variables that always go together.  Fowler points out the Range pattern, and it’s a great one.  When you have a “start“ and “end“ of something, make it into a range.  I once saw this code:

string startDate;
string endDate;

and boy, did I want to have a party there!  This was in C#, where there’s already a Date class! 

  1. Replace each with Date instances.  

  2. Replace the pair with a DateRange.

  3. Figure out the semantics of this DateRange, and build a new class around ’em.
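As a sketch of what steps 2 and 3 could produce, here is one possible DateRange built on System.DateTime, the framework's date type.  The inclusive end, and the Duration, Contains, and Overlaps members, are assumptions; the real semantics depend on what the original code did with its two strings.

```csharp
using System;

// Hypothetical DateRange; treating both ends as inclusive is an assumption.
public sealed class DateRange
{
    public DateTime Start { get; private set; }
    public DateTime End { get; private set; }

    public DateRange(DateTime start, DateTime end)
    {
        // The pair of strings could never enforce this ordering rule.
        if (end < start)
            throw new ArgumentException("End precedes start.", nameof(end));

        Start = start;
        End = end;
    }

    public TimeSpan Duration
    {
        get { return End - Start; }
    }

    public bool Contains(DateTime moment)
    {
        return Start <= moment && moment <= End;
    }

    public bool Overlaps(DateRange other)
    {
        if (other == null) throw new ArgumentNullException(nameof(other));
        return Start <= other.End && other.Start <= End;
    }
}
```

Questions like "does this date fall inside the range?" now have one obvious place to live, instead of being string comparisons repeated in client code.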


Comments (13)

  1. Check M. Twice says:

    Do you mean DateTime and TimeSpan, by any chance?

  2. Steve Maine says:

    Nice post! An example of this within the .NET framework would be the System.Uri class. Fundamentally, URIs are strings, but they are specific types of strings that must conform to certain rules. There’s also a common set of operations that you often perform on URI strings. Hence the System.Uri class, which encapsulates these rules and operations into a strongly typed class that makes it easier for developers to deal with string representations of URIs.

  3. Steve: Yeah, System.Uri isn’t so strong in this department. That’s one that’s had a lot of discussion here at Microsoft recently.

    Thanks for your thoughts.

  4. Charles Shopsin says:

    Objective C has a really neat concept (I think inherited from Smalltalk) called categories. They are sort of like localized inheritance light. What they allow you to do is add your own methods onto other people’s objects. So you could add a Pathinfo category to string that would let you do stuff like somestring.UNC. You don’t have access to any of the private stuff, but you don’t when you’re using a raw string either.

    It can be totally abused and people end up with the giganto classes, but it is a really nice way to add methods that SHOULD be on an object but just aren’t, like decimal.GetBytes(). Decimal actually just has a GetBits method, which of course returns an array of 4 ints which is exactly what I would expect of a method called GetBits….

  5. Hans Jergen Ohff says:

    How would all this refactoring affect performance? While I’m all for objectifying things, sometimes we have to draw the line and be practical.

    I question whether statics at all are pure OO or not. I say no.

  6. Hans Jergen Ohff says:

    To be more specific, mixed objects with instance and static members are not in my view pure OO.

    Maybe if the entire type is static, but not mixed.

  7. Louis Parks says:

    Jay, I wonder how you think of Int32 usage. Rather than Collection.Count, where Count is an Int32, do you advocate that Count should be of type Count? The same concept applies for String.Length, array bounds, etc.

    Is being pure OO sufficient justification for the overhead of creating (and remembering how to use) these additional types?

  8. Hans Jergen Ohff says:

    It’s nice to have all this, but once you start to actually design a solution, take dependencies on 3rd party libraries, and have deadlines to reach, you start to throw all of that out the window. It’s called being practical.

  9. Hans Jergen Ohff says:

    Unless all this is enforced by the language, it’s not going to see the light of day in a real-world solution. Maybe in academia, but not in the real world. And definitely not once the perf team gets their hands on it.

  10. Olle de Zwart says:

    Well Hans, of course there are real-world limitations, but if people always based their decisions on that argument then we would probably be using .text written in assembler and walking around naked, because it doesn’t have a lot of the overhead you describe. But in the end we are wearing clothes (well, most of us are) and we aren’t using an application written in assembler to have this discussion.

    Someone has to pave the road for us and get the discussion going. I for one can’t think of a good reason why I can’t store a lot of the data I have in strings now in an object type with its own set of validation rules.

    Get an intern to do your conversions for you: it’s cheap for you, and it’s a way for them to grow into the project and move on to bigger and better things.

  11. Regarding Performance of writing OO code: This isn’t really as interesting a question as everyone makes it out to be.

    I strive for the cleanest, clearest, simplest code I can write. Some time down the line I’ll do some performance analysis. I’ll find that the cause of slowness is something I never would have guessed. When I go to speed things up, I’ll be glad I’m working in clear code.

    Don’t let fear of performance issues make you write bad code.

  12. Louis: In these articles I’m talking about what OO looks like. You get to decide what to use when.

    My goal is to get the clearest code I can.

    In the case of most strings, I end up building logic that is type-specific into client code, because I don’t have a type to put it in.

    So, for String.Length: does it help me write clear code if this returns a dedicated type? Do I find myself repeating algorithms on the Length throughout my code?