Grokking Report Builder Queries: This is NOT your mother's SQL!

IT People and Report Builder
The natural inclination when IT people see and start to use Report Builder is that they immediately try think of the entire experience in terms of the underlying SQL statements -- in particular the type of SQL statements they are used to writing. Most immediately start to guess what RB's generated SQL statements look like, and many start to conjure up ways they can manipulate RB and/or the underlying report models into generating the SQL statements they think they want. This is a bad thing.

The Report Builder user interface and underlying report models were designed for people who haven't the foggiest idea what "SQL" stands for, and wouldn't know a SQL statement if they saw it on the six o'clock news. To accommodate this, the SQL queries generated are rather non-standard in a number of ways. Accordingly, all the technical reporting knowledge that an IT person brings with them to Report Builder largely gets in the way of their understanding it, and presumably reduces their effectiveness in helping others get up to speed with its more advanced capabilities.

The Inside Scoop
The two most important concepts to realize about RB queries are: (a) filters and selection are totally independent, and (b) row identity is totally inviolate.

Filters and field selection are independent in that selecting a particular field to display in your report, even if that field is defined on a related table, will never reduce the set of rows you get back in your report. In short, RB queries generally use LEFT OUTER joins instead of INNER joins, to ensure that joining to another table to pull a field from it will never remove any rows from the result set. If the user wants rows to go away for any reason (including rows that would normally go away due to an INNER join), they must create an explicit filter to do so.

Row identity is held inviolate in that no field may be added to a group that would change the identity of the rows in that group. For example, Order Date may not be added to a Customer group, because to display all the Order Dates for each customer, the group would have to return one row per Customer-Order, not one row per Customer. In contrast, the user may add #Orders to a Customer group, because #Orders is an aggregate that can return a single value for each Customer. The effect this rigorous constraint on row identity has on the generated SQL is that fields from related tables are (almost) always pulled into the main query by joining to a nested SELECT statement. The primary responsibility of the nested SELECT statement is to internally group and aggregate its results until it arrives at the cardinality required by the containing SELECT statement. Hence, the joins never affect the "row identity" of the main result.

Why We Do It That Way
The cold, hard, truth is that most human beings know nothing about SQL. They never think of their business questions in terms of joins, group by's, having clauses, foreign keys, and optimizer hints. In fact, they generally don't think of their questions as a single "query" or "statement" any more than they would describe the minutiae of an entire day at work in a single sentence. To them, the most natural analogy for query design (and you might want to use this the next time you explain Report Builder to someone) is the manual process that would be required to obtain the same information, i.e. sitting down in front of a big file cabinet with a pad of paper and a calculator, and methodically going through each desired set of files, gathering information, calculating totals, and recording the results. Report Builder's query design experience, and by necessity the resulting SQL queries, have been modeled precisely after this kind of approach. In fact, you may rightly consider each individual field in an RB report to be its own independent query, because in fact it is . Now of course, RB has optimizations in query generation that try to reduce the number of joins used to obtain the information the user wants, but the base case or initial assumption is that every single field is its own query.

Once an IT person groks these essential truths, a whole new world of (IMO) amazingly cool functionality will open up to them as they drop all that SQL baggage and start to think about their queries the way regular people do. Sure, there are limitations: RB can't do some things in this release, and there are occasionally perfomance implications, but by and large it is a fantastically intuitive way to think about real business questions, and it really doesn't perform that badly, considering that the users don't give one iota of thought to their join strategy.