Solving complexity by using the right abstraction

Em 2003 escreví este artigo que acabou não sendo publicado. Como ele fala do uso de diferentes linguagens para resolver um problema, creio que pode ser útil no contexto relativo a DSL's e Software Factories.

O curioso deste artigo é que ele surgiu com um quebra-cabeça matemático que foi trazido pelo meu sogro...

Procurei pela minha versão em portugues e, infelizmente não achei - me perdoem por isto. Mas creio que pode ser interessante mesmo assim.

_________________________


Solving Complexity by using the right abstraction

Exploring a multi-language environment means more than reenergize COBOL or RPG programmers, minimize training costs or reengineer old and functional applications with low efforts. All of this is very welcomed, but we can dream even more…

If we assume language as an instrument for modeling applications that solves real problems, and that a good modeling implies in perfect fit match between the language and problem to be solved, we can now foresee a real use of a multi-language environment that can empower both, our software production and business rules modeling.

To exemplify this new power, this article uses a logical puzzle and solves it using C# and a Microsoft research language called AsmL. AsmL is a high level language designed for State Machine specifications that has expressiveness for logical quantifiers, non-determinism and other high order mechanisms. Certainly it is possible to use other languages based on logic or constraint programming, but the main issue is: the effort to produce and maintain code is directly connected to the problem complexity. Complexity can be minimized with abstraction, and language is a tool for abstraction. At the end, diminish complexity is the real power of a multi-language environment.

Our Puzzle

I’m not a puzzle fan, but my father-in-law is. He came visit us months ago and he asked me to begin an Internet search for logical puzzles. After a while, we’ve got a bunch of them, and some were really very interesting.

But in this search, there was one that amazed the most, probably because of its unexpected simplicity. It seemed to me quite a mystical puzzle indeed and I could not resist, and immediately I’ve found myself working on it… The puzzle enunciation is:

Two integers, M and N, between 2 and 99.

Two people: S and P.

To S, someone tells M and N sum value. To P, someone tells M and N product value.

A dialog between P and S occurs:

P – I do not know M and N. Do you?

S – I knew you didn’t know. I do not know also.

P – Now I know them!

S – Now I know them too!

The question is: which are the values for M and N?

Impressive, isn’t it? Well, if you would like to solve it, please stop reading this article now and come back later – I’ll understand and forgive you.

Next sessions I’ll discuss how to solve it using C# and AsmL. Some, like my father-in-law, do think it is not fair to use such a strong tool as a computer. People like him call themselves romantic, and this brute force solution is against their rules. But, as I use this just as a metaphor for other hard business logic as scheduling, min-max problems, and so on, I do think I can be forgiven – so I hope...

Solving our Puzzle

To solve this problem requires understanding the secret meaning expressed in each dialog sentence. Each sentence talks about a property of M and N, and at the very end we hope there is just one M-N pair that obeys all four properties. As you will see, each property depends on the previous, and this makes our solution recursive.

Let’s have a look at the first sentence in this dialogue. This is the easiest one.

When P says he does not know M and N values, we can deduce the first property for this pair of numbers: (Prop1) M and N cannot be primes at the same time. We can deduce that because if M and N were prime at the same time, it values should be obvious to P. As an example, imagine P knows M*N = 22. In this case we have just one solution: the pair (2, 11). All solutions with two primes will suffer the same problem.

The second sentence is a little more intriguing and it took me some time to realize it. When S says that he knew P didn’t know, he means something very strong: (Prop2) for all pair M-N that can results in the sum S knows, none of them has the property Prop1. M an N cannot be both prime (so says Prop1), and now each pair that can produce the sum value S knows can not also be a prime pair. The existence of just one contradicts S second sentence.

Please, take a breath: things are getting more complex.

Our third sentence tells us something we must translate into a new property. P could guess M and N values because the property Prop2 has a strong meaning to him. As he discovered the pair values after knowing Prop2, we can conclude: P knows the solution because (Prop3) for all pairs M-N that can produce the product P knows, just one could make S say Prop2! There is no other way. And our problem continues to be recursive and so will be our algorithm to solve it.

Now, let’s analyze the very last sentence. Probably you are ready to find it yourself after understanding property Prop3 above. Let us see: if S says he now knows M and N, it’s because Prop3 says something valuable to him. So, the new property should be: (Prop4) for all set of pairs M-N that can results in the sum S knows, only one satisfy property Prop3.

Now we’ve got all Properties, we can begin our software that will search a solution, that we hope it is unique. Solving it by software is to make an algorithm that searches those properties for all pair of M and N with values between 2 and 99. The simpler approach is an algorithm that tests Prop1-4 for all M and N space. As we are not looking for optimizations, this way could be good enough.

Next session I solve it using C#. C# has many mechanisms to drive abstraction. The solution I present does not use many of those mechanisms, but it is a nice example of how to solve this kind of problem by brute force. As you will see one session after, this is a far more complex solution compared to the AsmL solution I’m presenting later.

Solving with C#

To solve this problem using a brute force approach, I’ve choose to implement a C# code that reflects our Prop1-4 enunciations. My idea was to think our puzzle space as a matrix with dimensions (2..99, 2..99) where each element must be tested against Prop1-4.

Before taking care of properties, it’s a good idea to begin declaring some useful sets and functions.

First, I’ve declared a static array with all prime numbers we need. I could have used one function to get it, but I’ve found an array easier for this article (of course I’ve used a function to generate it):

static int [] Primes = new int[] { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 };

Second, we need a function to return all integer pairs that can produce a number by its sum (the one S knows). As an example, imagine S knowing 6 as the sum of M and N. So, the list {(2,4), (3,3)} should be the result from this function. Our properties frequently talk about the list of pairs S can suppose from the sum he knows, and our algorithm will make some tests against this list. The function I’ve used is:

static Hashtable getSumPairList( int M )

{

      Hashtable ht = new Hashtable();

      int half = M / 2;

      ``for (int i = half+1; i < (M-1) && i <= 99; i++ )

      ``{

            int j = M-i;

            if ( j > 1 )

            {

                  int[] pair = new int[2] {i, j};

                  ht.Add( j, pair );

            }

      }

      return ht;

}

This function returns a set and I’ve choose to implement it using C# Hashtable class that works like a dictionary. Inside this Hashtable we insert integer pairs (int[2]). This algorithm divides the space of searching, for all i and j, by limiting j to values smaller than M/2, and limiting i to values higher than it. Otherwise, I would suffer the problem of having (2, 6) and (6, 2) in my resultant list, duplicating our efforts. i should be tested against (M-1) to guarantee the integer value 1 is not there.

Of course, I could have used a class Pair instead of an array, but syntax elegance was not my target at this time.

Next helping function does something similar to getSumPairList, returning all pairs of integers that, when multiplied, equals the integer passed as parameter. An implementation is:

static Hashtable getProductPairList( int M )

{

      Hashtable ht = new Hashtable();

      for (int i = 2; i < M; i++ )

      {

            int remainder = M % i;

            if (remainder == 0 )

            ``{

                  int j = M / i;

                  ``if ( i <= 99 && j <= 99 && j > 2)

                  {

                        int[] pair = new int[2] {i, j};

                        try

                        {

                              ``ht.Add( ((i>j)?j:i), pair );

                        ``}

                        ``catch {} // ignore - pair is already inside the set

}

            }

      }

      return ht;

}

The idea is the same as the sum pair list we’ve saw before. This function returns a Hashtable having the pair lowest value as a key - ((i>j)?j:i) expression guarantees this in this code. This function also uses a trick for ignoring duplicate insertions: you can see there is a C# catch command ignoring any error. The error can happen here is a duplicate insertion for a pair that already exists in the Hashtable.

We’ve used C# modulo operator (%) and a zero test against the remainder to guarantee i and j will be exact integers.

Once removing these obstacles, now it’s time to rewrite in C# our first property.

To simplify and optimize a little, I’ve found useful to write a method that stores all Prop1 tests in a matrix that represents all possible pairs. The test is very simple once having our Primes array in hands. The algorithm bellow sets true in each matrix element if Prop1 is not obeyed (or, saying it in another way: it sets true when both M and N are primes):

static bool[,] PropertyOne = new bool[99,99];

static void SolvePropertyOne()

{

      for (int i=0; i < Primes.GetLength(0); i++ )

      {

            for (int j=0; j < Primes.GetLength(0); j++ )

            {

                  PropertyOne[Primes[i],Primes[j]] = true;

            }

      }

}

Prop2 says: for all pair M-N that can result in the sum S knows, none of them has the property Prop1. So we can expect a double for to test all M and N, a call to getSumPairList() to produce the resultant list for each M+N, and finally a loop on the resultant list to test if none of then has the property Prop1. The algorithm bellow performs it storing in a matrix this property for all M and N.

static bool[,] PropertyTwo = new bool[99, 99];

static void SolvePropertyTwo()

{

      for ( int j= 2; j <= 99; j++ )

      ``{

            for( int i = 2; i <= 99; i++ )

            {

      ``            Hashtable htSum = getSumPairList( i + j );

                  bool allObeyProp1 = true;

                  foreach( DictionaryEntry sumPair in htSum )

                  ``{

                        ``// test Property One

                        if ( PropertyOne[ ((int[])(sumPair.Value))[0] ,

  ((int[])(sumPair.Value))[1] ] )

      {

                              allObeyProp1 = false;

                              break;

                        }

                  }

                  PropertyTwo[i, j] = allObeyProp1;

            }

      }

}

A similar approach I’ve used for rewriting Prop3 in C#. Prop3 says: for all pairs M-N that can produce the product P knows, just one could make S obey Prop2. Again we can expect a double loop, a call to getProductPairList(), and a loop through the resultant list to test Prop2. The straightforward method is:

static bool[,] PropertyThree = new bool[99,99];

static void SolvePropertyThree()

{

      for ( int j= 2; j <= 99; j++ )

      ``{

            for( int i = 2; i <= 99; i++ )

            ``{

                  ``Hashtable htProd = getProductPairList(i*j);

                  // count the number of pairs that satisfy Prop2

                  int counter = 0;

                  foreach( DictionaryEntry prodPair in htProd )

                  {

                        if ( PropertyTwo[ ((int[])(prodPair.Value))[0],

  ((int[])(prodPair.Value))[1]] )

                        {

                              counter++;

                        }

                  }

                  PropertyThree[i, j] = ( counter == 1 );

            }

      }

}

Prop4 follows the same pattern. Just to remind you, Prop4 says: for all set of pairs M-N that can results in the sum S knows, only one satisfy Prop3. Our resultant method is:

static bool[,] PropertyFour = new bool[99,99];

static void SolvePropertyFour()

{

   for ( int j = 2; j <= 99; j++ )

   ``{

      for ( int i = 2; i <= 99; i++ )

      ``{

         if ( PropertyTwo[i,j] )

         {

            Hashtable htSum = getSumPairList( i + j );

            if ( htSum.Count <= 1 )

               PropertyFour[i,j] = false;

            else

            {

               int count = 0;

               foreach( DictionaryEntry sumPair in htSum )

               {

                  if ( PropertyThree[((int[])(sumPair.Value))[0],

                                     ((int[])(sumPair.Value))[1] ]

                       || PropertyThree[((int[])(sumPair.Value))[1],

                   ((int[])(sumPair.Value))[0] ]

                     )

                     count++;

              }

              PropertyFour[i,j] = count == 1;

            }

         }

      }

   }

}

The only trick here is to consider only sets returned by getSumPairList() with size greater than 2. If not, invalid pairs as (98, 99) can appear at the final solution, and those pairs are not a valid solution because they could be obvious to both P and S. The 99 limit for N and M could make those pairs as the only solution in such a case.

This is an after work discover. In truth there was an error in our specification that disregards this limit. It was discovered in the computer lab and now we can rephrase Prop4 as: for all set of pairs M-N, with more than one element, that can results in the sum S knows, only one satisfy Prop3. To be honest, I should have used this complement in all Product and Sum list of pairs returned. My point here is: making it only in Prop4 is enough to get the right result.

Our final solution is the intersection of all Prop2-4 matrices. A logical and of each property matrix element give us the only possible pair. Run and see it. I’m not the one that tells what happen at the end of a good film.

What is impressive is that it does not take too long to get the answer. Few seconds and I could give an answer to my romantic father-in-law - for his disappointment!

Solving with AsmL

Two days ago I was wandering in the https://research.microsoft.com site and I could not resist download and test AsmL, even in its beta version. I love computer languages and I’m convinced we need a multiple language software engineering. Next day I read the manual and I decided to solve our Puzzle and compare it with the solution I’ve made with C#.

The idea of using AsmL for this problem came more from its first order logic mechanisms than from its non-determinism or state machine linguistic mechanisms. Other aspect I consider is its ability to work using Visual Studio .NET. If I succeed in showing AsmL improves a lot readability and maintenance, and if we consider that any other aspect (GUI, printing, etc) can be done in C#, VB or other imperative language, may be I can arise the sense of need and importance of a multi-language environment and development. That’s my goal.

I’ll try to explain quite nothing about AsmL. I do think the ones used to Mathematics do not need any knowledge of AsmL to read the solution presented here. If we look at the Prop1-4 definitions, you will see our English sentences and the AsmL are very similar. But if you need to know more about AsmL, you can found a lot more in its manual (see https://research.microsoft.com/fse/asml ).

The logic I’ve used to solve this problem is the same I’ve already used in the C# solution. The main distinction is that in C# we’ve used static matrix to store intermediary results. This clarifies the algorithm and improves performance. In AsmL, this could obscure the algorithm. Performance is secondary in this exercise. If you need performance, you can still optimize the code I’ll show you in the same way we do with other languages (i.e. reducing loops, storing intermediate results and so on).

First of all, let’s go to the main definitions. We should define our M and N limits and our prime set as we did in C#. In AsmL syntax this translates to:

var numbers = {2..99}

var prime = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 }

Second we should define again some functions that will help us when rewriting our properties in AsmL. This time I’ve introduced a new function that tests if a number is in the prime set. The inclusion operator in of AsmL is enough to do this.

isPrime(i as Integer) as Boolean

        return i in prime

Next we need helping functions similar to getSumPairList and getProductPairList we’ve found in our C# solution. Those functions abuse of AsmL declarative constructors and, as you could see, two lines are enough for each one.

Sums(m as Integer) as Set of (Integer, Integer)

        let Si as Set of (Integer) = { i | i in numbers where i <= m/2 }

        return { (i,m-i) | i in Si where i <= 99 and m-i <=99 }

Products(m as Integer) as Set of (Integer, Integer)

        let Si as Set of (Integer) = { i | i in numbers where i < m }

        return { (i,j) | i in Si, j in Si where i*j = m and j>=i}

Constructors, in AsmL, have a simple syntax with a strong semantic. In the AsmL tutorial we can find the following explanation:

Suppose we have a set A that includes the integers from 1 to 20 , and we want to find those numbers that, when doubled, still belong to the set. We will call that set C and express it this way:

C = {i | i in A where 2 * i in A}

The " | " symbol is read as "such that" and the entire expression is read as " C is the set of i such that i is in A where two times i is in A ." The portion of the declaration following the | is called the binder. It binds names (in our case, the name " i ") with values. The binder uses either the word " in " or the " = " sign and an expression containing the values that will be associated with the names. It can include a " where " clause that constrains those values.

Now we are ready to define each property Prop1-4 using AsmL functions. This solution continues to abuse from AsmL constructors and show all potential of quantifiers like forall and exists. Those quantifiers look like the same we’ve used in our English sentences for property definition, and this makes all the beauty: the AsmL language model is so close to our English definition that the translation between them is almost direct. That’s why I could make it work in less than 1 hour – and it was my first program using AsmL!

FirstDontKnow() as Set of (Integer, Integer)

        return { (i,j)| i in prime, j in prime }

SecondDontKnow() as Set of (Integer, Integer)

        let S_FirstDontKnow as Set of (Integer, Integer) = FirstDontKnow()

        return { (i,j) | i in numbers, j in numbers where forall s in Sums(i+j) holds not (s in S_FirstDontKnow) }

       

FirstNowIKnow() as Set of (Integer, Integer)

        let S_SecondDontKnow as Set of (Integer, Integer) = SecondDontKnow()

        return { (i,j) | i in numbers, j in numbers where exists unique s in Products(i*j) where s in S_SecondDontKnow }

               

SecondNowIKnow() as Set of (Integer, Integer)

        let S_FirstNowIKnow as Set of (Integer, Integer) = FirstNowIKnow()

        let S_SecondDontKnow as Set of (Integer, Integer) = SecondDontKnow()

        return { (i,j) | (i,j) in S_SecondDontKnow where exists unique s in Sums(i+j) where s in S_FirstNowIKnow and (Size(Sums(i+j)) > 1) }

Those functions above define intermediate sets using local variables defined by the let statement. I’ve used it for clarity and to make tests easier. It’s easy to write intermediate results (sets) having it in variables.

At the very end, all we have to do is to define our Main function making a list intersection of each property resolution space, using AsmL * operator. I did add two lines more, so I could construct the final solution set by getting only one of each pair of equivalent pairs (ex.: (2,9) instead of { (2,9),(9,2) } ) and print the final result.

Main()

        step 1:

           solution = SecondNowIKnow() * FirstNowIKnow() * SecondDontKnow()

           finalSolution = {(i,j) | (i,j) in solution where (j,i) in solution and i>=j}

           WriteLine( "--" + finalSolution )

That’s all!

Conclusion

More than reach a solution for a puzzle or teach C# or AsmL, this article tries to show you that:

· Correctness depends on good modeling;

· Models are expressed by languages;

· Good part of complexity is on the language mapping from the problem space to the solution implementation space;

The puzzle solutions in this example are an attempt to demonstrate the above. This should be visible after realizing how much complex is the C# solution against the AsmL one.

The good news is: with the advent of .Net Framework and the Visual Studio IDE, we can preview a future where distinct language paradigm can coexist, improving correctness, maintainability and time to market.

I hope soon more commercial products using distinct modeling paradigms can arise at the .Net world. Logical programming, constraint programming and others language styles can not be mapped on imperative languages without loosing simplicity and, sometimes, expressiveness. We can diminish this distance by creating libraries of classes, but this just diminishes a little our problem. C++ STL library can be used as an example here. The final solution, I believe, drives us to a multi-language environment.

For sure we can imagine a lot of hard business problems getting the benefit of this multi-language software development environment. We can preview distinct sets of users working together, taking the best of its specific knowledge, in a very productive team. At the end, we will reach better levels of software engineering.