Kicking around a function that formats stuff


Today's "Little Program" is really a "Little Puzzle" that got out of hand.

This started out as a practical question: This code fragment screams out for some sort of simplification. (I've changed the names of the classes.)

class FrogProperty
{
 public string Name { get; private set; }
 public string Value { get; private set; }
 ...
}

class ToadProperty
{
 public string Name { get; private set; }
 public string Value { get; private set; }
 ...
}

var frogStuff = new List<string>();
foreach (var frogProp in FrogProperties) {
  frogStuff.Add(string.Format("{0}: {1}", frogProp.Name, frogProp.Value));
}
frogStuff.Sort();
Munge(frogStuff);

var toadStuff = new List<string>();
foreach (var toadProp in ToadProperties) {
  toadStuff.Add(string.Format("{0} = {1}", toadProp.Name, toadProp.Value));
}
toadStuff.Sort();
Munge(toadStuff);

var catStuff = new List<string>();
foreach (var cat in Cats) {
  catStuff.Add(string.Format("{0}", cat.Name));
}
catStuff.Sort();
Munge(catStuff);

var dogStuff = new List<string>();
foreach (var dogProp in DogProperties) {
  dogStuff.Add(string.Format("{0} {1}", dogProp.Name, dogProp.Value));
}
dogStuff.Sort();
Munge(dogStuff);

...

Clearly, the pattern is

var stuff = new List<string>();
foreach (var thing in thingCollection) {
 stuff.Add(string.Format(formatstring, thing.Name, [optional: thing.Value]));
}
stuff.Sort();
Munge(stuff);

Everything here is pretty straightforward, except for the string.Format part. Can we write a function that takes a thing and formats it in a somewhat flexible manner?

Let's start with the Name-and-Value cases. We might try something like this:

public static string FormatNameValue<T>(this T t, string format)
{
 return string.Format(format, t.Name, t.Value);
}

But then we'd run into trouble, because there is no constraint on T, so the compiler will complain, "I don't know how to get a Name or a Value from an object."

And since FrogProperty and ToadProperty do not have a common base class, you're kind of stuck.

One way out would be to use the new dynamic type:

public static string FormatNameValue<T>(this T t, string format)
{
 dynamic d = t;
 return string.Format(format, d.Name, d.Value);
}

But that won't work in the Name-only case:

cat.FormatNameValue("{0}");

The cat object has a Name but no Value. The attempt to read d.Value will raise a RuntimeBinderException (even though the value would never be consumed by the format).
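To see the failure concretely, here is a minimal self-contained sketch. The FrogProperty and Cat classes are hypothetical stand-ins with public setters (so object initializers work), and CatThrows is an illustrative helper, not part of the original code. It assumes the Microsoft.CSharp runtime binder is available, as it is in modern .NET.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical stand-ins for the article's classes.
public class FrogProperty
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public class Cat
{
    public string Name { get; set; }
}

public static class DynamicFormatDemo
{
    // The dynamic version from the article: d.Name and d.Value are both
    // bound at run time, before string.Format ever looks at the format.
    public static string FormatNameValue<T>(this T t, string format)
    {
        dynamic d = t;
        return string.Format(format, d.Name, d.Value);
    }

    // Returns true if formatting a Cat fails with RuntimeBinderException,
    // which is what the binder throws when it cannot find a Value member.
    public static bool CatThrows()
    {
        var cat = new Cat { Name = "Tom" };
        try
        {
            cat.FormatNameValue("{0}");
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }
}
```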

Maybe we can turn to reflection.

public static string FormatNameValue<T>(this T t, string format)
{
 return string.Format(format,
                      typeof(T).GetProperty("Name").GetValue(t, null),
                      typeof(T).GetProperty("Value").GetValue(t, null));
}

This still raises an exception if there is no Value (GetProperty returns null, and calling GetValue on null throws a NullReferenceException), but now we can detect the missing Value before we run into trouble with it.

static object GetPropertyOrNull<T>(this T t, string prop)
{
 var propInfo = typeof(T).GetProperty(prop);
 return propInfo == null ? null : propInfo.GetValue(t, null);
}

public static string FormatNameValue<T>(this T t, string format)
{
 return string.Format(format,
                      t.GetPropertyOrNull("Name"),
                      t.GetPropertyOrNull("Value"));
}
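Here is the reflection version as a small self-contained sketch, again with hypothetical FrogProperty and Cat classes. A missing property simply becomes a null argument, which string.Format renders as an empty string.

```csharp
using System;

// Hypothetical stand-ins for the article's classes.
public class FrogProperty
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public class Cat
{
    public string Name { get; set; }
}

public static class ReflectionFormat
{
    // Returns the value of the named public instance property,
    // or null if typeof(T) has no property by that name.
    static object GetPropertyOrNull<T>(this T t, string prop)
    {
        var propInfo = typeof(T).GetProperty(prop);
        return propInfo == null ? null : propInfo.GetValue(t, null);
    }

    // A missing Name or Value is passed to string.Format as null,
    // which formats as an empty string.
    public static string FormatNameValue<T>(this T t, string format)
    {
        return string.Format(format,
                             t.GetPropertyOrNull("Name"),
                             t.GetPropertyOrNull("Value"));
    }
}
```

One thing to keep in mind with this design: the lookup uses typeof(T), the static type at the call site, so a thing passed in through a variable of type object would find neither property. Using t.GetType() instead would consult the runtime type.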

Okay, now we're getting somewhere.

But before getting too deep into this exercise, I should point out that another way to solve this problem is to turn it inside-out. Instead of making the munger understand all of the different objects, why not make each object understand munging?

class FrogProperty : IFormattable
{
 public string Name { get; private set; }
 public string Value { get; private set; }
 public string ToString(string format, IFormatProvider formatProvider)
 {
  switch (format) {
  case "Munge":
   return string.Format(formatProvider,"{0}: {1}", Name, Value);
  default:
   return ToString(); // use object.ToString();
  }
 }
}

class Cat : IFormattable
{
 public string Name { get; private set; }
 public string ToString(string format, IFormatProvider formatProvider)
 {
  switch (format) {
  case "Munge":
   return string.Format(formatProvider,"{0}", Name);
  default:
   return ToString(); // use object.ToString();
  }
 }
}

The generic helper function would then be

var stuff = new List<string>();
foreach (var thing in thingCollection) {
 stuff.Add(string.Format("{0:Munge}", thing));
}
stuff.Sort();
Munge(stuff);
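A minimal compilable sketch of the inside-out approach follows, using just the Cat class. FormatForMunge is a hypothetical name, and since Munge itself isn't shown, the sketch returns the sorted list instead of passing it along. The where T : IFormattable constraint lets the compiler reject types that do not implement IFormattable at all, though it cannot check that a type actually understands the "Munge" format.

```csharp
using System;
using System.Collections.Generic;

public class Cat : IFormattable
{
    public string Name { get; set; }

    // string.Format calls this when the placeholder has a format
    // component, so "{0:Munge}" arrives here with format == "Munge".
    public string ToString(string format, IFormatProvider formatProvider)
    {
        switch (format)
        {
            case "Munge":
                return string.Format(formatProvider, "{0}", Name);
            default:
                return ToString(); // falls back to object.ToString()
        }
    }
}

public static class MungeHelper
{
    // Hypothetical helper: formats each item with the "Munge" format
    // and returns the sorted results (instead of calling Munge on them).
    public static List<string> FormatForMunge<T>(IEnumerable<T> things)
        where T : IFormattable
    {
        var stuff = new List<string>();
        foreach (var thing in things)
        {
            stuff.Add(string.Format("{0:Munge}", thing));
        }
        stuff.Sort();
        return stuff;
    }
}
```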

Okay, fine, rain on my little puzzle parade.

Let's ignore this very useful advice and proceed ahead with our puzzle, because we're determined to see how far we can go, even if it's in the wrong direction.

Now that we have FormatNameValue, we might say, "What about generalizing to cases where we want properties other than Name and Value?" One design would be to pass in a format string and a list of the properties you want to fill in:

thing.FormatProperties("{0}: {1} (modified by {2})",
                       "Name", "Value", "ModifiedBy");

Our generalized function, FormatProperties, would go something like this:

public static string FormatProperties<T>(
    this T t, string format, params string[] props)
{
 object[] values = new object[props.Length];
 for (var i = 0; i < props.Length; i++) {
  values[i] = typeof(T).GetProperty(props[i]).GetValue(t, null);
 }
 return string.Format(format, values);
}
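A self-contained sketch of the list-of-names version. The FrogProperty class with a ModifiedBy property is hypothetical, and the ArgumentException check is an addition not present in the version above: without it, a misspelled property name surfaces as a NullReferenceException from the GetValue call.

```csharp
using System;

// Hypothetical class with a third property, to match the example call.
public class FrogProperty
{
    public string Name { get; set; }
    public string Value { get; set; }
    public string ModifiedBy { get; set; }
}

public static class PropertyFormat
{
    // Looks up each named property on typeof(T) and passes the values
    // to string.Format in the same order as the names.
    public static string FormatProperties<T>(
        this T t, string format, params string[] props)
    {
        object[] values = new object[props.Length];
        for (var i = 0; i < props.Length; i++)
        {
            var propInfo = typeof(T).GetProperty(props[i]);
            if (propInfo == null)
            {
                // Report the bad name rather than failing with a NullReferenceException.
                throw new ArgumentException("No property named " + props[i], nameof(props));
            }
            values[i] = propInfo.GetValue(t, null);
        }
        return string.Format(format, values);
    }
}
```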

This suffers from a problem common to most formatters: Once you get more than a few insertions, it becomes hard to figure out which one matches up to what. So I'm going to try something radical:

static Regex identifier = new Regex(@"(?<={)(.*?)(?=[:}])");

// pedants would use
//identifier = new Regex(@"(?<={)[_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}]" +
//       @"[_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\d\p{Pc}\p{Mn}\p{Mc}]*(?=[:}])");

public static string FormatProperties<T>(this T t, string format)
{
  var values = new ArrayList();
  int count = 0;
  format = identifier.Replace(format, (m) => {
    values.Add(typeof(T).GetProperty(m.Value).GetValue(t, null));
    return (count++).ToString();
  });
  return string.Format(format, values.ToArray());
}

Instead of separating the properties from the format, I embed them in the format.

thing.FormatProperties("{Name}: {Value} (modified by {ModifiedBy})");

Note that I explicitly exclude colons from identifiers. That lets me do things like this:

var result =
  (new System.IO.FileInfo(@"C:\Windows\Explorer.exe"))
    .FormatProperties("Created on {CreationTime:F} " +
                      "{Length} bytes in size");

The property names are extracted and replaced with corresponding numbers, but the format string remains, allowing it to be used to alter the final formatting of the property.
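Here is the embedded-name version as a runnable sketch. A hypothetical Widget class stands in for FileInfo so the output doesn't depend on the file system, and a List&lt;object&gt; replaces the ArrayList. The format specifiers X and D5 are used because their output for a positive int does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical class standing in for FileInfo.
public class Widget
{
    public string Name { get; set; }
    public int Size { get; set; }
}

public static class NamedFormat
{
    // Matches the text after "{" up to the first ":" or "}".
    static readonly Regex identifier = new Regex(@"(?<={)(.*?)(?=[:}])");

    public static string FormatProperties<T>(this T t, string format)
    {
        var values = new List<object>();
        int count = 0;
        // Replace each property name with its position and collect its value;
        // anything after the colon (the format specifier) stays in place.
        format = identifier.Replace(format, m => {
            values.Add(typeof(T).GetProperty(m.Value).GetValue(t, null));
            return (count++).ToString();
        });
        return string.Format(format, values.ToArray());
    }
}
```

As one commenter points out, this does not handle escaped braces: in "{{literal}}" the regex captures "{literal" as a property name. Likewise, an alignment component such as "{Size,5}" is read as part of the name. In both cases GetProperty returns null and the GetValue call throws.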

Okay, at this point I figured I had gone far enough. The fun had run out, so I decided to stop.

Comments (17)
  1. Joshua says:

    The direction is not wrong. I've seen too many cases of general form in 3rd party library be flawed. Even Object.ToString is wrong. Should have been Object.ToString(IFormatInfo).

  2. lol says:

    It's a common misconception that cats have no value. But one has to ask where YouTube would be without them? http://www.youtube.com/watch

  3. Medinoc says:

    This is starting to look like DebuggerDisplayAttribute.

  4. Rick C says:

    "we're determined to see how far we can go, even if it's in the wrong direction."

    Today, on The Old New Thing, Raymond channels all the people who've written the horrible code he's written about.

  5. Stuart says:

    I'd probably just code this to take a Func<T, string> and use lambdas - because if the classes have common properties but no common interface, that suggests to me you're dealing with objects you can't access the code of.

    void FormatAndMunge<T>(IEnumerable<T> thingCollection, Func<T, string> format) {
     var stuff = new List<string>();
     foreach (var thing in thingCollection) {
       stuff.Add(format(thing));
     }
     stuff.Sort();
     Munge(stuff);
    }

    FormatAndMunge(toads, toad => string.Format("{0} = {1}", toad.Name, toad.Value));

    etc

  6. Harald van Dijk says:

    To be really pedantic, no, pedants wouldn't (or at least: shouldn't) use that: I think you got it right with the non-pedantic version, not only because the C# rules for identifiers don't apply to .NET (so properties may be defined that are not valid C# identifiers), but more importantly because you shouldn't want invalid identifiers to be silently ignored, even if they would just end up causing exceptions later. It's good that you throw an exception for them that clearly points to the problem in the calling code.

    That aside, if the classes can be extended, then my first thought after the generic FormatNameValue version using dynamic, would be to define an interface that provides Name and Value property getters. Having that, depending on the project, I might stop there: there is something to be said for each object implementing a useful ToString, but there is also something to be said for avoiding the code duplication of ToString in all of those classes. It's a matter of weighing the odds: is it more likely that one class will need slightly different formatting, or is it more likely that one caller will need slightly different formatting?

  7. nathan_works says:

    wait, a regex ? Now you've got 2 problems.

  8. Scott Brickey says:

    unless there's a constraint that the code needs to work on existing (unmodifiable) objects, I think I'd just have interfaces:

    iHasName and iHasNameAndValue

    then just have extension methods:

    MyToString<T>(this T) where T:iHasName

    MyToString<T>(this T) where T:iHasNameAndValue

    could be repeated as necessary for different interfaces depending on the data structure and desired output.

  9. John says:

    @Scott Brickey's solution is something I've used in the past, and to expand upon it when we ran into an existing (unmodifiable) object we leveraged the Adapter pattern.

  10. CarlD says:

    ...when suddenly, and without warning, Raymond invented (a flavor of) string interpolation - a feature that's been in Perl (and others) for years, and is proposed for inclusion in the next official version of the C# language (See: roslyn.codeplex.com/.../570292). It is a cool feature - one that will be widely used (and abused) if implemented in C#.

  11. Nico says:

    The lambda approach Stuart shows is the first thing that came to mind for me.

    Format strings are already a common source of problems as code evolves, and string interpolation can take that problem to a whole new level of horrible.  Personally I think the only way string interpolation has any business in a language like C# is if it's implemented as a compiler service/translation so that validity and types can be verified at compile-time.

  12. CarlD says:

    @Nico - the proposed C# 6 feature is exactly that - a compile time translation that's checked for types, etc.  The sketch that Raymond produced here is more like Perl interpolations in that it's anything goes and we'll sort it out at runtime.

  13. Nyctef says:

    I disagree with the IFormattable implementation for a couple of reasons - firstly, you've already introduced a constraint that the classes don't have a common base type, so this feels like cheating. Secondly, repeated application of this principle (oh, just stick it all on the objects themselves) tends to end up with massive SRP violations and unmaintainable code since all the classes are ten thousand lines long.

    If you're going to do this "properly," the technically correct answer would be to implement a visitor pattern on objects you want to be able to format (or report, or whatever). The interpolation extension method is a lot more fun, though :)

  14. Medo says:

    +1 for Stuart's solution, it's very similar to what I coded up myself when I read your puzzle. I pulled the string.Format into the FormatAndMunge function and left the formatstring and property accessors as parameters, but comparing the code there's no big gain in that, and in fact it loses a bit of generality and is slightly more complicated.

  15. BWR says:

    Seems like this would have problems with escaped braces. If I had to use a regex, I would probably use something like the following and translate/append each match into a StringBuilder (checking that the last match includes the last character of the format string).

    \G(({(?<identifier>[_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}][_\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\d\p{Pc}\p{Mn}\p{Mc}]*)(:(?<argument>[^}]))?})|{{|}}|[^{}]*)

  16. Matt says:

    Reflection is like CreateRemoteThread. If you're using it for something other than a debugger or actual "meta" program, you're Doing It Wrong.

  17. Arthur van Leeuwen says:

    Somehow this reminds me of Algol 68's FORMAT type (with corresponding denotations). Of course, Algol 68's FORMAT denotations were somewhat like lambdas in and of themselves... allowing you to specify which methods to call when extrapolating the format...
