Design Notes: Row Count


One of the scenarios we have heard feedback around is the ability for the a client of a data service to determine the total number of entities in a set without having to retrieve them all. The following video discusses this scenario in more detail and some of our thoughts regarding how to make the experience better. 

 


Astoria Design Walkthrough: Introducing $count as an URI query option

 

Additional design details:

– The count value would be calculated after $filter expressions are applied.  For example /Customers?$filter=City eq ‘London’ would return a count value made up of only the customers living in London and not all the customers.

– If a query included $top, $skip or $orderby, the count value would be unaffected.  For example /Customers?$top=1&$skip=2&$orderby=Name would return the total number of customers stored by the service.

– If a query includes a $expand, then the count applies to the outer entity set.  For example /Customers?$expand=Orders would return the number of customers stored by the service

 

What do you think?  Would this be useful in your data services applications?

 

Mike Flasko

ADO.NET Data Services, Program Manager

 

This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.

Comments (27)

  1. John Papa says:

    Adding the $count would be helpful. Its not needed all the time in my experiences, but when it is, its nice to be able to do a quick query without all the baggage just to get a count. I had been doing custom service methods to get the counts, which is a lot of code overhead for something simple, so a $count would be nice.

  2. BorekB says:

    I also think your plan is good and that $count will be very welcome.

  3. Andy says:

    Will you also support an HTTP HEAD request also returning a Count, without needing Count as a querystring parameter?

  4. I estimate that about 5% of the queries my apps execute would benefit by a $count operator.

    Ignoring $skip and $top makes sense.

    –rj

  5. Oops.

    Please delete the word "execute" from my previous post.

    –rj

  6. a count operation $count would be fine, but where in the result would be stored on the atom feed? it must be in the content right?

    I mean there is no where else the count could go. but that would mean each entry would have a count element which seems silly.

    You could put as an http header on the result, but again where would you put that value on the client framework? would we have a query result metadata returned in the header? like entity-count=xxxxx

    doing strickly as a scalar value ie

    (from c in service.customers where c.city == "london" select c).Count

    that should return a scalar value. but again that could be moved to service operation which is what it can be done now.

    basically most people wanted to do paging, when you do that you need the count, top and be able to skip. But you have to remember we always had to get that data outside of the query.

    I think it should still be the case. Atom feeds dont leave much space to change the markup, basically you have only the content element to mes about with.(which is what you have done using the xmlns there)

    allow us to get count as a scalar value that is all i think we should need. we can already use service operation to do that but it would be nice to do it on query not as service operation.

  7. sorry crazy stuff you cant get on service operation.

    anyhow,

    see example below:

    var query = (from c in context.Customers

                     where c.City == "London"

                     Select c).Count

    int a = (int)query;

    that is seems ok. now lets make it more complicated

    var query = (from c in context.Customers

                     where c.City == "London" && c.Orders.Count != 0

                     Select c)

    and just to be funny i might want to put a ".Count" at the end of the query.

    now with those we can return counts to filter and return number of items.

    for the case of getting the count AND result set on the same call that is completely different.

    where would i get this info from the query below?

    var query = (from c in context.Customers

                     where c.City == "London"

                     Select c)

    do you go off and create a new method on IQuerable<T> called metadata? or RecordCount?

    No i dont think you should be able to do that. If you want the count do it like everyone else and make another call to the data base to get the row count.

    but if people really want to you could add support for people to write their own operators. ($returnRowCount)

    that is the server side people could do what ever they want just like a change and query interceptor.

    another thing is something they can do now, intercept the query and look at the http headers for a user defined header and populate it with the total row count.

    something else is that you add that ability that was mentioned before of being able to use poco object with special property (ie dictionary property) with metadata or additional properties that the client side dont have (ie metadata like count)

    so

    public class customer

    {

          public string Name { get; set; }

          public Dictionary<string,string> ExtensionData {get; set;}

    }

    but to allow something like this we need:

    IgnorePropertiesAttribute needs to work on the client site (allowing to exclude our own properties from being serialized to the server)

    should have been done from the start, I remove extra properties from the feed on the wrtingentity event (i basically use the ignorePropertiesattribute on my partial classes on the client side and i use a helper class to read the Xelement and remove properties the server dont have.

    but then the client tool kit needs to know how when to do that may be an attribute (ExtentionDataAttribute)

    but you start losing the POCO feel to it.

  8. Nick Williams says:

    It’s great to see this post!

    $count=true or omission/$count=false would be a perfectly acceptable syntax to me for notifying a DS of our intentions.

    $filter precedence is how I would expect also… So long as the number of records actually being returned equals the record count, then it’s proper. $top/$skip are irrelevant when it comes to record counts so that, too, is peachy.  $expand behavior is ideal.

    How about some more info on how you guys were hoping to implement things with the client libraries?

  9. The ExtensionDataAttribute could be used to mark a property so the client framework knows when use it.

    again, i feel thart going that length to give count on query results is a bit over the top.

    just give the http header thing and the count.

    that will solve the problem and still make things simple and if people want to do harder stuff use something like their own intercepting of the entity writing on the server side and injecting XElements into the feed before it is sent. But they will need something to allow them to get the value on the client side.

  10. dont need $count=boolean just $count as having there or not having there is already a boolean.

    to be able to get the scalar value of a count like the

    http://server/context.svc/clients(‘1’)/name/$value

    that return just "daniel"

    or http://server/context.svc/clients(‘1’)/name

    will return <name xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices">daniel</name&gt;

    will be fine.

    but filtering needs count and bad.

  11. pabloc says:

    To answer the question about "where to put the count": if you only ask for the count we’d give you just that value. However, often times applications will want to render the first page of data and it would be great (from the latency perspective) to obtain the first page of data and the count in the same query.

    The plan is to include the count as part of the feed itself. Atom feed elements -just like Atom entries- allow custom elements to be added (in a separate XML namespace), so we have the perfect spot to transport this information to the client. The JSON case is trickier, but we’ll manage.

    Note that it’s pretty hard to do this today with a service operation, because it means you’d have to fully parse the expression language that goes into $filter in order to compute the count of the appropriate subset.

    As for the client API…things can always change, but current thinking is that we’ll add both support for the LINQ .Count() operator (to obtain only the count) and also a separate .IncludeCount() "chainable" operator to request for the count *and* the results, something like this:

    from c in svc.Customers.IncludeCount()

    where c.State == "WA"

    select c;

    Then on the client we’d surface the count in the QueryOperationResponse object we already have. It’s not completely nice (because you have to cast the reference to it), but it’s pretty clean.

    -pablo

  12. Nick Williams says:

    The count alone is useful in situations such as search fields that show number of results for a given query and the likes, but I do imagine the majority of interaction will be done for use in communicating the number of pages to the user.

    I was anticipating, worst case, having to do something like $top=1&$count and ignoring the data returned and just taking the count, but it sounds like you guys have things covered.  .Count and .IncludeCount sound great to me.

    Are there any particular areas (beyond what has been discussed already) that you guys are fishing for feedback in?

  13. pabloc says:

    Actually, I just remembered something that needs to be adjusted from my comment above:

    We can’t really map to LINQ’s .Count(), because Astoria’s count ignores top/skip (by design, the point is to know the total number of entities regardless of paging directives), so we mapped it to a custom method…can’t remember the name, probably a bad name, but distinct from Count().

    @Nick: I think we got good feedback on this one…but there will be more questions to come on other areas we’re designing, so keep it coming!

    Thanks,

    -pablo

  14. William Stacey says:

    Great news.  Seems exactly what we need.

  15. Right Pablo, that is cool, so linq count will only return the total number of records on the returned query, that is fine as expected.

    I think you meant on yur linq example to so something like this.

    (from c in svc.Customers.IncludeCount()

    where c.State == "WA"

    select c).Skip(10).Take(10);

    Then the include count would make sense.

    Would you add support to allow the include of count on expansions too?

    I understand your willingness to give in some form the total count of records on available set of data all in one call, for Latency sake, but would that break the  "query" design? After all isn’t it the mantra that if the consumers of the service wanted to do something special that the system was simple enough that they could build anything on to it?

    daniel Portella

  16. Perhaps giving some way that consumers can get extra stuff would be better. Like PagedQuery<T>. that requires Skip and Top and returns total available records. all then you need to so is think about how the query translates to the client.

    In the end of the day we are always hitting the database twice, just not sure if advocating how people should do things is the right way of going about it.

    daniel portella

  17. They way you have spoken is that this is written into stone already. I am sure include count wont be available on expands silly thing to have ^^.

    Just thought of something:

    [quote]

    The count value would be calculated after $filter expressions are applied

    [/quote]

    But would we be able to use count on filters? like one of my examples above?

    .daniel portella

  18. pabloc says:

    @Daniel: sure, Take()/Skip() should be there (at least Take). I understand your feeling of "breaking the query pattern", but there are certain things that aren’t just doable using the current building blocks, so for those (and mostly only for those) we want to provide built-in capability.

    Regarding the PagedQuery thing as a way to encapsulate the paging stuff…it’s certainly a good idea, we’ll have a discussion with the rest of the folks in the team about it. Right now this is almost what we have, but we don’t have an explicit public class to represent it. I wonder what would go into that class besides yes/no for "retrieve total count".

    Note that strictly speaking you don’t know whether or not we hit the database twice. Our default implementation certainly does, but more sophisticated or domain-specific data sources may avoid the "count query" by having it cached or may be by using some in-band mechanism that retrieves both from the storage system at the same time.

    As for counting expanded items…requirement taken, we’ll discuss it with folks in the team.

    I didn’t understand the last question: "But would we be able to use count on filters? like one of my examples above?"

    Thanks

    -pablo

  19. @pablo: On my last question I was thinking about Counts nested inside filter expressions if counts are only applied after filter are applied doesn’t mean that only the top entity count would be returned?

    I think that just for latency sake is not good enough reason to do all that work for that. If people are getting 4000 milliseconds of latency don’t you think there would be something wrong with the application first? I think returning count would be last thing I would be worrying about.

    About other domain specific stuff and things, that is fine it is good that we are catering to them and all I think that is a must, but also mostly (I suppose) of the user base of data services uses the entity framework, which doesnt really support this scenarios properly see below:

    Using multiple active result sets in stored procedures would be a good case where "metadata" could be returned about a query. On these cases then you would like to return all the data.

    the entity framework only allow actual entities or scalar variables to be returned by the stored procedure, and to make it worse the entity framework doesnt allow entities that dont actually map to areal table of view. Hopefully that will be fixed in together with computed variables.

    As far as I know part of that is also the case for astoria, as it only allow entities to be returned that are contained on the data source (entity data context).

    I would rather see astoria being able to return non data source complex objects (that i can build up onto it) then build options and capabilities may not fit well on the current designing blocks.

    I just like something more intuitive, anyhow, the PagedQuery class could have much more then just a boolean property to set a flag for the IncludeCount, It could manage the whole paging aspect. But again I see now that something like that would not be added.

    .daniel portella

  20. pabloc says:

    @Daniel:

    Nesting: I’m still not sure I understand. Count applies to the set addressed by the URL…I’m not sure to what nesting you’re referring to (the filter expression itself can be composite, but if you need counts on different filter expressions you can do that by issuing the different requests).

    Latency: I understand your perspective, but there is a lot of people that don’t share that opinion and consider latency a very high priority in their list.

    Complex types: yep, 100% agree, I know it’s painful today, we’re looking into ways of bringing custom types into the service interface-level model even if they aren’t in the input model. Not sure where we’ll land yet.

    data + "metadata" (count): this is not so much an issue around mechanisms as it is of APIs for extensibility. We could use a number of things to get both from the database in a single go (MARS is probably not appropriate for this, 2 result-sets, one small for metadata and then the data one, would likely be a better fit). It’s more a matter of balancing hooks/customizations with out-of-the-box capabilities. We wanted the Entity Framework path to work "as is" so we chose a "reasonable" default implementation. For custom data sources implementors have more flexibility, but they have to do everything themselves.

    Thanks,

    -pablo

  21. You got to love the ‘@’ very impersonal. I still find it funny.

    @Pablo:

    Forget about the "nesting" question I have got my answer else where.

    I know the latency can be a big thing for other people I am not discrediting that, I think you have mis understood me, I am saying that the astoria team can come up with a much better solution to this problem.

    Complex types: I hope that version 2 will have support for them. At the moment is pain like you have said. I haven’t looked at all the code for astoria yet (hopefully I will finish in this holidays) I don’t know how difficult it would be to de-serialize an strongly typed object that is returned on a service operation. I haven’t read the "System.Data.Services.Serializers.SyndicationSerializer" class yet so I am not sure if it doesn’t serialize complex types that are not part of the context model.

    You could hit the database in many different ways like you have said before and you are absolutely right to say that is is matter of balance, I think the solution can be more simple and elegant than this. My example using multiple active results sets was a hypothetical example and not a solution to how Astoria is going to implement their internal retrieval of the count of total available records, so whether is appropriate for this or not it is irrelevant otherwise to do that in the very essence should be wrong, in both parts.

    The feeling we are getting is that the decision has already been made and that may be the design notes posted was more of a medium to broadcast the fact that it will be implemented, don’t want to think that the protocol is being undermined but I think this put this process in a dangerous position that what is said is in vain and that may be it more a method of lifting ideas into the design process and not making any real significant impact.

    The forum is open to discussion I asking if astoria I whiling to show the effort to find solutions with the input and help of their community. This transparent development process seems more like the name implies that a barrier (transparent as it is) is present and it is a transparent sound proof wall.

    Apologies for my pessimism I just say as I see it,

    No harm meant

    .daniel portella

  22. It’s been a little while since we’ve written about the future direction of data services, so this post

  23. dvanderboom says:

    Count is essential of course.  Total table count as well as after a filter is applied.  Also important is count as an aggregate function, to count the number of grouped rows: "select x, y, count(*) from z group by x, y".  And when talking about aggregates, functions other than count are also very important for analysis of all that data.  Are these supported, or are there plans for this, through Astoria and in SDS specifically?

  24. dvanderboom says:

    When doing paging, especially with SDS, I imagine some of the datasets will be changing quite a bit.  If I’m viewing page 1000 of many, and hundreds of rows are added or removed at lower page counts by the time I go to the next or previous page, it seems I’m going to have a problem with data page continuity.

    For these circumstances, it would be very helpful to have a paging mechanism that can operate relative to a specific row, such as the first row in a currently-reitreved page, or the next existing one in the case that row has been removed in the meantime.

  25. Mihai Cvasnievschi (mihai@dbi.ie) says:

    For you guys still looking for a work arround for paging I can provide you with a very simple solution:

    (for the example I use Northwind):

    In the Web Data Service class add something like

    [WebGet]

    public int SelectOrdersCount()

    { return this.CurrentDataSource.Orders.Count();}

    In the client you could use:

    proxy.CreateQuery<int>("SelectOrdersCount").ToArray().FirstOrDefault()

    This way you don’t have to return all the results at client side and you could use a query using Skip(), Take() for paging purposes.

  26. pabloc says:

    @dvanderboom: we are not planning to support aggregates in general in the upcoming version. SDS itself will support full relational queries using SQL, so you’ll be able to do aggregates there, and you can chose to expose them as a custom type if you happen to be using Astoria as the service front-end.

    Regarding volatility of data and the implications of paging: stateless paging has this as an intrinsic problem. Note that we do actually do server-driven paging by exchanging a continuation token that is value-based. You can do the same thing from the client by ensuring a known full order and then use the order keys in a $filter clause such that the next page has rows that are "greater than" the current page, and then use $top to limit the number of results. This is also typically better in terms of performance if the underlying database happens to have the right indexes.

    In any case, note that this does not mean that you’ll have no holes or no repeats.

    -pablo