Using LINQ to write constraints in OCL style v3


In a comment to my last post, Keith Short challenged me to write a version of this constraint with good error reporting, i.e. to post an error to the task list that gives the names of the duplicate properties, and supports navigation to the element that is in error.  After a bit of head-scratching, I decided I can’t do this easily using the built-in functions (it turns out that the Except() function will not subtract a set from a bag leaving the duplicates).  But, using extension methods, I can build my own supporting functions.  This is what I ended up with.
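To see why the built-in function falls short, here is a small demonstration (the names are illustrative): `Except()` treats its operands as sets and removes *every* occurrence of a matching element, so the duplicate vanishes along with the original.

```csharp
using System;
using System.Linq;

class ExceptDemo
{
    static void Main()
    {
        var bag = new[] { "Name", "Name", "Size" };  // a bag with a duplicate
        var set = bag.Distinct();                    // the set {"Name", "Size"}

        // Except() removes all occurrences of each matching element,
        // so both copies of "Name" disappear -- not just the first.
        var leftovers = bag.Except(set);

        Console.WriteLine(leftovers.Count());  // 0 -- the duplicates are gone too
    }
}
```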


[ValidationMethod(ValidationCategories.Menu | ValidationCategories.Save)]
private void TestExampleElement(ValidationContext context)
{
  var propnames = this.Properties.Select(p => p.Name);
  var duplicateProps = propnames.Duplicates();
  if (duplicateProps.Count() > 0)
  {
      string errors = duplicateProps.CommaSeparatedList();
      context.LogError(string.Format("Non-unique property names: {0}", errors), "Error 1", this);
  }

  var subpropnames = this.Properties.SelectMany(p => p.SubProperties).Select(p => p.Name);
  var duplicateSubProps = subpropnames.Duplicates();
  if (duplicateSubProps.Count() > 0)
  {
      string suberrors = duplicateSubProps.CommaSeparatedList();
      context.LogError(string.Format("Non-unique sub property names: {0}", suberrors), "Error 2", this);
  }
}





public static class C
{
  public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source)
  {
      return source.Aggregate(Enumerable.Empty<T>(),
           (agg, p) => (source.Count(x => x.Equals(p)) > 1 ?
                                 agg.Union(new T[1] { p }) : agg));
  }

  public static string CommaSeparatedList(this IEnumerable<string> source)
  {
    return source.Aggregate(String.Empty,
                   (agg, s) => String.IsNullOrEmpty(agg) ? s : (agg + ", " + s));
  }
}


The  Duplicates function goes through the source collection, tests whether each item appears in the source more than once, and if so puts it into the result collection.  The errors put into the task list contain a comma-separated list of the duplicated names.  The third parameter to LogError is the model element that carries the error; double-clicking on the error navigates to that element.
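Here is a small self-contained sketch of the two extension methods in action, using made-up property names ("Name" and "Size" each appear twice):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class C
{
    // Collect every item that occurs more than once in the source;
    // Union() keeps each duplicate in the result only once.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> source)
    {
        return source.Aggregate(Enumerable.Empty<T>(),
            (agg, p) => source.Count(x => x.Equals(p)) > 1
                            ? agg.Union(new T[1] { p })
                            : agg);
    }

    // Fold the strings into one comma-separated string.
    public static string CommaSeparatedList(this IEnumerable<string> source)
    {
        return source.Aggregate(String.Empty,
            (agg, s) => String.IsNullOrEmpty(agg) ? s : (agg + ", " + s));
    }
}

class Demo
{
    static void Main()
    {
        var propnames = new[] { "Name", "Size", "Name", "Colour", "Size" };
        Console.WriteLine(propnames.Duplicates().CommaSeparatedList());  // Name, Size
    }
}
```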

Comments (8)

  1. Michael says:

    CommaSeparatedList can be written like this:

    source.Aggregate((agg, s) => agg + ", " + s);

    The overload of Aggregate that doesn’t take a seed value uses the first value as the seed:

    http://www.atrevido.net/blog/2007/08/16/Practical+Functional+C+Part+III+Loops+Are+Evil.aspx

    It probably doesn’t matter in this case, but I think the Duplicates function performs in N*N?

  2. stevecook says:

    Michael, thanks for the agg tip, especially since I’m always calling it for a non-empty collection.  You are right about the N*N – it’s not particularly easy to make it faster though, so for this exercise it seemed ok.  Any pointers to a more efficient version?

  3. GarethJ says:

    I think if you pass new List<Property>(duplicateProperties).ToArray() to the error message, you’ll get the individual properties selected rather than just the parent, which might be a bit nicer too.

  4. GarethJ says:

    Actually, perf is what’s going to get interesting here. As these LINQ constructs stack up against the brute force non-indexed collections we have, it’d be easy to end up having traded terseness for perf slightly too far over large models.

    We’ll need to see what the tipping point is to need more efficient query processing.

  5. MichaelGiagnocavo says:

    Well, a very quick way is to use a HashSet:

    public static IEnumerable<T> Duplicates2<T>(this IEnumerable<T> source)

    {

       HashSet<T> primary = new HashSet<T>();

       HashSet<T> secondary = new HashSet<T>();

       return source.Where(i => !primary.Add(i) && secondary.Add(i));

    }

    That performs linear as far as I know (but uses N + D memory). If there was a way to set the capacity on the HashSet, then it’d perform in constant time. There are probably more efficient but less ‘cute’ ways of doing it.

    The difference on a list of 10,000 integers is about 2000x. (12000ms versus 6ms).

    I’m thinking there could be a good use for a conditional extension method for sets so you wouldn’t have to add if statements all over the place for logging. Something like:

    propNames.Duplicates().IfAny(dups => { Log… });

  6. MichaelGiagnocavo says:

    Sorry, it wouldn’t be constant time overall; I was just referring to the HashSet usage (when the HashSet grows beyond its capacity, it has to resize which is an O(N) operation).

  7. stevecook says:

    Michael

    Your return source.Where(i => !primary.Add(i) && secondary.Add(i)); causes the method to return an unevaluated expression, which has the nasty property that it only works correctly the first time.  You need to use ToArray() or similar to force evaluation before returning.  Also, I don’t believe secondary is needed: we’re only looking for values that appear more than once, not values that appear exactly twice.

    But thanks for pointing out HashSet – that’s a great new thing.
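    [Ed.: for what it’s worth, a sketch combining both of those fixes might look like this, with illustrative names; note that without the second set, a value occurring three or more times is reported more than once, so append Distinct() before ToArray() if each duplicate should appear exactly once.]

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    // HashSet version with the two fixes applied: a single set suffices
    // (Add() returns false once a value has been seen), and ToArray()
    // forces evaluation before returning, so re-enumerating the result
    // doesn't hit an already-populated captured set.
    public static IEnumerable<T> Duplicates2<T>(this IEnumerable<T> source)
    {
        HashSet<T> seen = new HashSet<T>();
        return source.Where(i => !seen.Add(i)).ToArray();
    }
}

class Demo
{
    static void Main()
    {
        var names = new[] { "Name", "Size", "Name" };
        var dups = names.Duplicates2();

        // Safe to enumerate more than once now.
        Console.WriteLine(dups.Count());  // 1
        Console.WriteLine(dups.Count());  // still 1
    }
}
```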

  8. MichaelGiagnocavo says:

    Oh yes, those hashsets are captured… silly me missing something like that ! :S