Simplifying boolean expressions


i was doing a little fun C# coding on the side and i was running into one of those cases where i had a conditional expression to write and i couldn’t figure out the best way to write it.  it was basically something like:

void Foo(a, b) {
    if (!((a && !b) || (!a && b))) {
        //do stuff
    }
}

i was sitting there playing with this expression until i realized that it could be reduced to just “if (a == b)”.  After that i was thinking “it would be nice if this could just be taken care of for you”.  Similarly we could transform expressions like:

if (a > b || a == b) { //…

into

if (a >= b) { //…

Or could we…

interestingly enough we might not actually be allowed to do that.  Why?  Well lets say the user had written their type with the following members:

bool operator >(T t1, T t2) { return true; }
bool operator ==(T t1, T t2) { return true; }
bool operator >=(T t1, T t2) { return false; }

(egads!  Legal, although discouraged).

Now we’ve gone ahead and changed the legal meaning of your code. 

Here’s another example:  Say we tried to transform the following expression:

if (a > 0 || a > 0) { //…

into

if (a > 0) { //…

Seems fine, but actually it might not be.  Here’s how:

int _a;
int a { get { return _a++; } }

So, in order to get something like this right it seems like we’d have to go deeper than just looking at the parse structure of the code, and we’d have to go deeper to examine the semantics behind it.  in the first case we can’t assume that overloaded operators have the same combinatoric semantics that we’re used to with most of the builtin types, and in the second the expression has side affects, so reducing it will not preserve those changes.  Luckily the compiler can at least tell us if an expression leads to executable code, and in that case we can just assume the expression is unsafe to manipulate. 

Of course, it’s quite possible that one might want to still go through with the transformation because they know that it’s safe for their domain.  For example, this would allow them to simplify:

if ((!((Foo() && !Bar()) || (!Foo() && Bar()))) { //…

into

if (Foo() == Bar()) { //…

This would be OK as long as the user was confident that the semantics of those operators expressions are the same as if they were just dealing with simple boolean expressions.

So i figured i’d provide two expression manipulation tools.  One to just simplify the expression “safely” and one to simplify it “aggressively” as per the caveats i listed above. 

i’d also want to be able to handle code of the form:

if (a) {
   if (b) {
      c;
   } else {
      d;

   }
} else {
   d;
}

and transform it into:

if (a && b) {
    c;
} else {
    d;
}

So now, how do i go about doing this?  Well, the next post will go into the prototype i wrote to get it done.

 If you’d like something like this in the IDE, let me know!

 



 

Edit: Note that this is not about showing you optimizations that can be done with your code.  This is about allowing you to take an overly complex and confusing bit of code and being able to rewrite it into a form that’s much more clear and maintainable.  In the first example i had rather than having to put down some complicated implication rules i was just able to use a simple equivalence which not only was incredibly easy to read but it also helped me understand better what the code meant that i was writing at that point.


Comments (18)

  1. . says:

    pitty you use shitty K&R style braces and IMPLICIT boolean testing which is UNREADABLE in its own right.

  2. Steven says:

    Sounds nice, but it does only work with booleans in certain cases. Your first example won’t work if a and b are ints.

    Won’t the (a>0||a>0) with side effects in operator> cause undefined behaviour, according to the standard, with two post-increments in what is basically the same expression?

  3. David Levine says:

    It’s a nice idea but I think there are many ways to get it wrong and only one to get it right. I would definitely not want it to be an automatic search-and-replace style feature; there are too many issues of unknown side-effects due to data types and hidden code getting executed, especially when doing Boolean operations on generic types.

    I think it would be ok to have a popup window "suggest" a cleaner form, and give the user a manual option to replace it. This leaves it up to the coder to decide whether to use the simplified form.

    It would also be nice to provide a little code snippet compiler to test the alternate form using sample data to test drive the results.

  4. Lars says:

    Arrrgghhh!

    operator> with side effects?

    It doesn’t work if a and b are ints?

    bool operator ==(T t1, T t2) { return true; } ?

    Why did I even bother to study computer science for four years?

    Assuming this is c# why not invent a keyword similar to unsafe called unsound. And only allow these kind of montrosities inside that construct. Like this.

    unsound

    {

    // Inside this block the laws of logic and mathematics no longer apply and you are free to use any kind of side effects and redefinitions you like.

    }

  5. Steven: "Sounds nice, but it does only work with booleans in certain cases. Your first example won’t work if a and b are ints. "

    That code won’t compile if "a" and "b" are ints, so i’m not sure what the problem is there 🙂

    "Won’t the (a>0||a>0) with side effects in operator> cause undefined behaviour, according to the standard, with two post-increments in what is basically the same expression? "

    I don’t believe the behavior is undefined as C# specifies an exact left to right evaluation of expressions.

  6. Lars:"Arrrgghhh!

    operator> with side effects?

    It doesn’t work if a and b are ints?

    bool operator ==(T t1, T t2) { return true; } ?

    Why did I even bother to study computer science for four years?

    Assuming this is c# why not invent a keyword similar to unsafe called unsound. And only allow these kind of montrosities inside that construct. Like this.

    unsound

    {

    // Inside this block the laws of logic and mathematics no longer apply and you are free to use any kind of side effects and redefinitions you like.

    } "

    I agree. But it’s still allowed by the language. (i actually don’t know any language that disallows it).

    Do you have a suggestion on how you would enforce sanity in a language?

    Also… i’m not sure where people are getting the idea that this wouldn’t work with ints…

  7. David: "It’s a nice idea but I think there are many ways to get it wrong and only one to get it right. I would definitely not want it to be an automatic search-and-replace style feature; there are too many issues of unknown side-effects due to data types and hidden code getting executed, especially when doing Boolean operations on generic types.

    I think it would be ok to have a popup window "suggest" a cleaner form, and give the user a manual option to replace it. This leaves it up to the coder to decide whether to use the simplified form.

    It would also be nice to provide a little code snippet compiler to test the alternate form using sample data to test drive the results. "

    I was thinking of something along the lines of a smart tag to allow you to do this. When on an arithmetic expression that could be simplified you’d get the options to do the "safe" and "aggressive" simplification.

  8. It’s called Boolean Algebra and the new C++/CLI compiler supports these optimizations with managed and unmanaged code. The version7 C++ compiler supports this with unmanaged code.

  9. Steven says:

    CN: That code won’t compile if "a" and "b" are ints, so i’m not sure what the problem is there 🙂

    CN: I don’t believe the behavior is undefined as C# specifies an exact left to right evaluation of expressions.

    Ah, it’s C#. You didn’t mention that anywhere and I assumed C/C++ (where my remarks do apply).

  10. damien morton says:

    You might want the option of having a truth table pop up, which you can fill-in, and then have a generated (and optimized) if/then expression created for you.

  11. Josh says:

    Definitely do not do any auto-correcting, and clippy-style popups ("would you like me to fix your expression?") would also suck. The only way it would be acceptable is if it was a function you had to manually invoke through a menu item (which means it would probably be relegated to one of those little used/known features of the IDE, so probably not worth the effort).

    I think these kind of optimizations should be done by the compiler, not the IDE. High level langugage source code is meant to be human readable and easy to understand. A large boolean expression may make more sense when being read, even though it could be reduced to something simpler – though not as obvious.

    For this same reason, I dislike the fact that we need an FxCop rule that tells us "when checking for an emtpy string, check the length instead", so while this code may be more intuitive and readable:

    if (myString == "") or even if (myString == String.Empty)

    it really should be re-written as

    if (myString.Length == 0)

    I understand why the second is faster, but I think that is the type of thing the compiler should handle for you. I should be able to write (myString == "") and know the compiler will simply check the length. A high level language shouldn’t burden the user with having to understand the implementation of the language.

    Did I get off-topic? My point is that users should keep code in the most readable (maintainable) form, and let the compiler manipulate it to get the best performance.

  12. Paul: How does the compiler know if sideaffects occur? Does it do full program analysis to determine this?

    Steve: Ah, i see now. Yes, this is C#.

    Damien: The truth table idea is interesting, but rather quirky. Plus, with only 4 variables it quickly gets out of hand.

    Josh: Sorry if i was unclear. This would not be about auto-correction. This would be about having something like a smart tag at the point of the expression which could inform you of simpler expressions if any existed.

    The point would be that when you actually had *confusing* source it could then find a far less confusing result for you.

    I do agree that optimizations are best in the realm of the compiler, but this isn’t about optimization for me, this is about clarity.

    "A high level language shouldn’t burden the user with having to understand the implementation of the language"

    I somewhat agree. However, i disagree with your example. Your test (myString == "") is not the same as (myString.Length == 0) because of the null checks that will be involved. i.e. the first will never throw an exception, the second might.

    Until these optimizations are put into the compiler and runtime (which i hope will happen over time) it works to share that burden with the language, the compiler, and the author.

    Finally: Isn’t there a String.IsEmpty() that you can call?

    And no, you didn’t get off-topic, i just explained myself badly. This is not a blog about optimization, it is about keeping code in a readonable and maintainable form.

  13. damien morton says:

    What youre describing is very similar to karnaugh maps, or at least, a multivalued variant thereof.

    For complex boolean expressions, I often find myself writing out truth tables and working from there.

    Maybe its just my EE background.

  14. Steven says:

    One great big advantage of something like this would be easy inversion of complex expressions.

    Suppose you have something like

    if (a && (!b || c > 15) || c == 3)

    and want to invert it, you can, of course, just turn that into

    if (!(a && (!b || c > 15) || c == 3))

    but that makes it even more confusing. Let the proposed functionality clean that up.

    Beats figuring it out manually:

    if (!a || (b && c <= 15) && c != 3)

    Then again, others may want the reverse. They might prefer to have the original expression converted into:

    if (c == 3)

    {

    //True

    } else {

    if (a)

    {

    if (!b || c > 15)

    {

    //True

    } else {

    //False

    }

    } else {

    //False

    }

    }

    which makes the flow much easier to follow (to me, at least).

  15. Steven says:

    Well… my code above would be easier to follow if the indenting had been preserved.

  16. Steven: your code would look like:

    if (c == 3) {

    ….//True

    } else {

    ….if (a) {

    ……..if (!b || c > 15) {

    …………//True

    ……..} else {

    …………//False

    ……..}

    ….} else {

    ……..//False

    ….}

    }

    And yes, easy collapsing/expansion of if/elses into boolean expressions would be nice as well and i’ll try to get that.

  17. Damien: Yes, and that’s why i mentioned that CS and EE carriculums often cover this. However, i find kmaps unfriendly and error prone. These operations are much more understandable to me because it stays close to the original boolean logic that they are representing.

    i.e. while both are isomorphisms, i find this form to have a domain that is far closer (and more intuitive to work in) than kmaps.