Tip 8 – How to write ‘WHERE IN’ style queries using LINQ to Entities


Imagine if you have a table of People and you want to retrieve only those whose the Firstname is in a list of interesting firstnames. This is trivial in SQL, you write something like this:


SELECT * FROM People
WHERE Firstname IN (‘Alex’, ‘Colin’, ‘Danny’, ‘Diego’)


A SQL IN is equivalent to a LINQ Contains


In the LINQ (to objects) world there is no ‘IN’ so you need to reverse the order like and use the Contains method:


var names = new string[] { “Alex”, “Colin”, “Danny”, “Diego” };


var matches = from person in people
        where names.Contains(person.Firstname)
        select person;


Notice that semantically we’ve gone from:


value.IN(set)


in SQL to


set.Contains(value)


In LINQ. The result however is the same.


Support for Contains in .NET 3.5 SP1 vs .NET 4.0


IEnumerable<T>.Contains(T t)will be supported in EF in .NET 4.0, so you can write queries like the LINQ query above in the next version of EF.


Unfortunately though it doesn’t work in .NET 3.5 SP1: LINQ to Entities complains when it encounters a LINQ expression like this because it doesn’t know how to translate calls to Contains to SQL.


But what about all our .NET 3.5 SP1 users out there? What can they do in the meantime?


Workaround for .NET 3.5 SP1


Well there is a workaround courtesy of Colin one of the big brains on the EF team.


The essence of the workaround is to recognize that you can rewrite the above query like this:


var matches = from person in people
        where person.Firstname == “Alex” ||
              person.Firstname == “Colin” || 
              person.Firstname == “Danny” ||
              person.Firstname == “Diego”

        select person;


Certainly this is more ‘wordy’ and a pain to write but it works all the same.


So if we had some utility method that made it easy to create these kind of LINQ expressions we’d be in business.


Well that is exactly what Colin did in our forums a while back, with his utility method in place you can write something like this:


var matches = ctx.People.Where(
        BuildOrExpression<People, string>(
           p => p.Firstname, names
        )
);


This builds an expression that has the same effect as:

var matches = from p in ctx.People
        where names.Contains(p.Firstname)
        select p;

But which more importantly actually works against .NET 3.5 SP1.


Extra Credit…


If you’ve read this far, well done!


Here is the plumbing function that makes this possible:


public static Expression<Func<TElement, bool>> BuildOrExpression<TElement, TValue>(
        Expression<Func<TElement, TValue>> valueSelector, 
        IEnumerable<TValue> values
    )
{     
    if (null == valueSelector) 
        throw new ArgumentNullException(“valueSelector”);


    if (null == values)
        throw new ArgumentNullException(“values”); 
 


    ParameterExpression p = valueSelector.Parameters.Single();


    if (!values.Any())   
        return e => false;


    var equals = values.Select(value =>
        (Expression)Expression.Equal(
             valueSelector.Body,
             Expression.Constant(
                 value,
                 typeof(TValue)
             )
        )
    );

   var body = equals.Aggregate<Expression>(
            (accumulate, equal) => Expression.Or(accumulate, equal)
    ); 

   return Expression.Lambda<Func<TElement, bool>>(body, p);
}

I’m not going to try to explain this method, other than to say it essentially builds a predicate expression for all the values using the valueSelector (i.e. p => p.Firstname) and ORs those predicates together to create an expression for the complete predicate.


For more information on the technique Colin uses here mosey on over to his blog and read this.

Comments (28)

  1. Meta-Me says:

    Hopefully if you’re reading this you’ve noticed that I’ve started a series of Tips recently. The Tips

  2. Martin Laufer says:

    Hi Alex,

    unfortunately in T-SQL there is no easy way to express a recordset of constants like the array initializer in C#. The analogous way would be insertrs into a temporary table.

    With that in mind one can rewrite the query with the IN operator using an equi-join. As far as I remember, LINQ-2-SQL was able to transform a query like

    var matches = from person in people

                 join name in names on person.Firstname equals name

                 select person;

    into the query with the IN operator. I wonder if EF isn’t capable of that. Do I miss something?

    Best regards

    Martin

  3. AlexJ says:

    This query isn’t supported in EF in .NET 3.5 SP1

    But it will be supported (along with Contains()) in .NET 4.0 in the same way L2S supports it today, namely by translation to IN (…)

    Trivia:

    Actually the way this is actually supported is that the EF produces a list of OR predicates in our Canonical Query Trees (provider agnostic trees), and then the SqlClient provider, recognizes that all those predicates share the same parameter on the same side, which means it can translate it into a SQL IN (…)  

    Alex

  4. dahlbyk says:

    I think you want Expression.OrElse (logical OR) rather than Expression.Or (bitwise OR).

    Also, rather than considering Any() a special case, I would use DefaultIfEmpty():

    <pre>var equals = values.Select(value =>

           (Expression)Expression.Equal(

                valueSelector.Body,

                Expression.Constant(

                    value,

                    typeof(TValue)

                )

           )

       ).DefaultIfEmpty(

           Expression.Constant(false)

       );</pre>

    Cheers ~

    Keith

  5. AlexJ says:

    Keith,

    The reason I use Expression.Or(..) is that no matter what we do we can’t convince the database to do a short-circuiting evaluations, so from the EF’s perspective Or and OrElse are identical. So by using Or we ‘produce’ CLR expressions who’s intent can actually be satisfied by the database!

    Nice spot though!

    Alex

  6. Jeff says:

    I made a suggestion of how the 2100 parameter limit could be addressed using xml parameters on the Microsoft .Connect site but it looks like they are not interested in this solution.

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=475984&wa=wsignin1.0

    I really think we need a decent way of handling large mufti valued parameters which doesn’t require hacking the expression tree or changing our queries to call some other workaround method.

  7. Ngoc says:

    Hi Alex,

    How to write a LINQ query like this SQL statement in Entity Framework (with the above workaround)?

    SELECT * FROM People

    WHERE Id = 123 OR Firstname IN (‘Alex’, ‘Colin’, ‘Danny’, ‘Diego’)

  8. Abrar says:

    Hi Alex,

    I am trying to fit the below Buildexpression function in my join query. My query is

    var q = from l in blendedItems

    join v in blendedItemsView

    on l.LineItemID equals v.LineItemID

    where v.Status == _readyStatus

    && strArray.Contains(v.BillingGroupID)

    select l;

    I am able to do this in a saperate statement like this:

    var query = dao.LineItemInterface.Where(BuildContainsExpression<LineItemInterface, string>(l => l.BillingGroupID, strArray));

    Thanks in advance.

    Abrar

  9. AlexJ says:

    @Abrar

    I don’t think I understand what you are asking for. Can you be a bit more explicit for me? I.e. can you write the code you WANT to be able to write?

    Cheers

    Alex

  10. Abrar says:

    Thanks for your prompt reply.

    I have a join query which has a contains function with comma saperated list as an array.

    As .Net Framework 3.5 sp1 doesn’t support Contains function, I am using BuildContainsExpression function by Colin.

    Using this i am able to retrieve values in a simple way but i am not able to fit this function in my Join query.

    Given below is my Join query and the simple BuildContainsExpression function

    My Join Query


    var q = from l in blendedItems

    join v in blendedItemsView

    on l.LineItemID equals v.LineItemID

    where v.Status == "Blended"

    && strArray.Contains(v.BillingGroupID)//should go here

    select l;

    My Simple IN query


    var query = dao.LineItemInterface.Where(BuildContainsExpression<LineItemInterface, string>(l => l.BillingGroupID, strArray));

    I want to fit the Simple query into my Join query.

    Should fit in the commented line ("should go here").

    Thanks in advance.

    Abrar

  11. Maria Dixit says:

    Hi Alex,

    I have a "WHERE IN" query which works fine with one condition, now i want to add one more condition in my existing query.

    Can you please help me out in writing the second condition.

    my second condition is "WHERE cust.Region == _strRegionName"

    below is my query with first condition with "WHERE IN or BuildContainsExpression" which is working fine, i want to add second condition in this query:

    var query = dao.LineItemInterface.Where(BuildContainsExpression<LineItemInterface, string>(l => l.BillingGroupID, strArray));

    static Expression<Func<TElement, bool>> BuildContainsExpression<TElement, TValue>(

                  Expression<Func<TElement, TValue>> valueSelector, IEnumerable<TValue> values)

                   {

                       if (null == valueSelector) { throw new ArgumentNullException("valueSelector"); }

                       if (null == values) { throw new ArgumentNullException("values"); }

                       ParameterExpression p = valueSelector.Parameters.Single();

                       // p => valueSelector(p) == values[0] || valueSelector(p) == …

                       if (!values.Any())

                       {

                           return e => false;

                       }

                       var equals = values.Select(value => (Expression)Expression.Equal(valueSelector.Body, Expression.Constant(value, typeof(TValue))));

                       var body = equals.Aggregate<Expression>((accumulate, equal) => Expression.Or(accumulate, equal));

                       return Expression.Lambda<Func<TElement, bool>>(body, p);

                   }

    Thanks,

    M

  12. AlexJ says:

    @Maria,

    Can you write some pseudo code – i.e. what you want to write, imagine that the language allows it.

    That will help me understand your requirement.

    Alex

  13. AlexJ says:

    @Abrar

    I sure there are better ways, but one option would be to do this:

    (

      from l in blendedItems

      join v in blendedItemsView

      on l.LineItemID equals v.LineItemID

      where v.Status == "Blended"

      select new {l, v}

    )

    .Where(

      BuildContainsExpression<LineItemInterface, string>(x => x.l.BillingGroupID, strArray))

    )

    .Select(x => x.l);

    The database should do a pretty good job of optimizing this…

    Alex

  14. Chris Lomax says:

    I actually found that querying a sproc was faster and easier than limiting yourself with the where in clause in LINQ. I run into the 2100 RPC limit when showing categories on my website and ended up using SPROCS. It went from a 3 second query down to 0.1 seconds. Sometimes its better to pass off to another technology than try to do a work around in LINQ.

    I have just come from L2S to entity and run into this (no contains predicate) limitation. Entity in 3.5 is driving me mad at the minute, there are loads of cool things you can do in L2S that you cant do in entity but they are not going to be developing the L2S framework any further in .NET 4.

    Microsoft, sort you head out and stop developing multiple technologies for the same purpose. Im sick of them doing "quick fixes" until they have plans to release a bigger project or idea (vista)

  15. AlexJ says:

    Chris,

    I understand your frustration with the difference between EF and L2S. Thankfully most of those differences will be gone soon when .NET 4 comes out.

    As for running into the RPC limit, wouldn’t you run into the same problem in L2S too?

    Alex

  16. jiorix says:

    Could any body explain the method BuildOrExpression to me please?

    I’m just not quite understand,thanks!

  17. Mirek says:

    Hello,

    I’ve found BuildOrExpression very useful but now I have a problem. I need to make a query like

    select X, Y from T

    where X in (value1, value2, … valueN)

    or Y in (value1, value2, … valueN)

    Any suggestions ?

  18. Marcelo Delpino says:

    Great post!!

    But one thing let me curious… why do you use null before ==? just style? Or is there any reason?

    Thanks for the Tip 😉

  19. Osa says:

    For those of you using ObjectQuery like me. Below is the extension I use.

    Example:

    ObjectQuery<DocumentTypes> qry =

    ctx.DocumentTypes                        .BuildOrExpression("DocumentTypeId", ids) .OrderBy("it.DocumentTypeName");

    public static ObjectQuery<T> BuildOrExpression<T, TList>(this ObjectQuery<T> qry,

               string column, TList [] list)

           {

               if (list != null && list.Length > 0)

               {

                   string uniqueName = "k" + Guid.NewGuid().ToString("N");

                   ObjectParameter[] parameters = new ObjectParameter[list.Length];

                   System.Text.StringBuilder commandText = new System.Text.StringBuilder();

                   commandText.AppendFormat("(it.{0} = @{1}0", column, uniqueName);

                   parameters[0] = new ObjectParameter(uniqueName + "0", list[0]);

                   for (int i = 1; i < list.Length; ++i)

                   {

                       commandText.AppendFormat(" OR it.{0} = @{1}{2}", column, uniqueName, i);

                       parameters[i] = new ObjectParameter(uniqueName + i.ToString(), list[i]);

                   }

                   commandText.Append(")");

                   qry = qry.Where(commandText.ToString(), parameters);

               }

               return qry;

           }

  20. Awesome extension method. Thanks a ton!!

  21. Vaccano says:

    I am using .NET 4.0 and I am still not able to use contains.  I get this error: "Unable to create a constant value of type <TYPE HERE>.  Only primitive types ('such as Int32, String, and Guid') are supported in this context"  Don't know if I am doing it wrong, but if it is not supported in .net 4.0 then you may want to update your post to indicate that.

  22. ivo says:

    Given the following example:

       var matches = from p in ctx.People

       where listOfIds.Contains(p.Id)

       select p;

    I noticed some preformance issues in my case when 'listOfIds' is a collection of about 16000 records

    I tracked down the problem and it takes about 90% of the time generating the SQL to excecute

    With the following code I was able to optimize this and reduce the execution time by 85%:

               List<Person> people = new List<Person>();

               int index = 0;

               int pageSize = 1000;

               do

               {

                   int skip = index * pageSize;

                   List<int> idsInCurrentPage = listOfIds.Skip(skip).Take(pageSize).ToList();

                   var peopleInCurrentPage = from p in ctx.People

                                               where idsInCurrentPage.Contains(p.Id)

                                               select p;

                   people.AddRange(peopleInCurrentPage.ToList<Person>());

                   index++;

               }

               while (listOfIds.Count > (index * pageSize));

               return people;

    This leads me to beleive that LINQ is using str = str + "<some parameter>" to generate the SQL query instead of using the StringBuilder

  23. ivo says:

    did you receive my comment about the performance issue when the names list has around 16 thousand records?

  24. AlexJ says:

    Sorry ivo – my blog occasionally thinks comments are spam and puts them on hold

    Your approach to send multiple independent request is interesting.

    As for where the problem is, it could be anywhere in the stack, all the way down to how SQL handle long statements. I wonder where the problem gets introduced.

  25. AlexJ says:

    @ivo – PS I've passed you issue onto Kati & Diego from the EF team… so maybe they'll weigh in.

  26. divega says:

    Hello Ivo, thanks for bringing this up to our attention. For a query like this, we have found that most of the time is spent inside EF manipulating expression trees. The cost seems to grow more than linearly with the number of elements in the collection and that is why a workaround like yours helps so much. There is more information and a couple of workarounds (one of them very similar to yours) in this thread in StackOverflow:

    stackoverflow.com/…/why-does-the-contains-operator-degrade-entity-frameworks-performance-so-drama

    The long term plan is to add native support for Contains/IN expression in EF, to avoid expanding Contains to a tree of ORs between equality comparisons as we do today.

    I encourage you to vote for this suggestion at data.uservoice.com/…/2598644-improve-the-performance-of-the-contains-operator.

    Thanks,

    Diego

  27. ivo says:

    thanks!