Conor vs. Indexed View Updates

(as ANSI NULLs is one of the requirements for indexed views, I thought I’d give you some perspective on why the questions I ask you are useful and interesting to better understand queries and various query operations)

 

As currently designed, indexed views tries to strike a balance among:

1. query shapes that can be efficiently matched

2. query shapes that can be efficiently maintained

3. query shapes that can be efficiently created

 

Loosely speaking, indexed views work for basic SPJG (SELECT-PROJECT-JOIN-GROUPBY) queries.  The joins must be inner joins, not self joins, and join on a key.   Each operator has a series of restrictions based on (1),(2),and (3) above. 

 

So how does the optimizer maintain these things?  Well, it helps to have an understanding of what happens with views in general.  Here’s a primer I wrote on the subject awhile back (https://www.sqlskills.com/BLOGS/CONOR/post/Inserts-against-views-An-introduction.aspx).  Overall, the view has a tree definition, and an update against that view is decomposed by substituting the original update tree with an expanded tree for the view definition.  Since the view is ultimately based on base tables, the update queries that are supported in SQL Server are ultimately operations against those base tables (in other words, the server translates your request into an equivalent operation against the base table).  Then the change is performed and everyone is happy.

 

INDEXED views are an extension of this concept – the main difference is that the index is maintained as part of the update to the base table.  (My soon-to-be-released book chapter on Query Optimization goes into more detail on the subject of wide updates).  The set of changes to the base table are further streamed into a set of changes against the indexed view.  The key property is that the exact set of changes to the view can be computed from the specific changes to the base table.  Many of the restrictions in indexed views come from this requirement.  In the academic literature, this is sometimes referred to as the delta algebra or delta logic.

Essentially, the original view definition is modified so that the source table (the one you are modifying in the UPDATE) is replaced with only the set of changes instead of the whole original table.  Here’s a basic example

 create table dbo.t1(col1 int primary key, col2 int, col3 int)
create table dbo.t2(col1 int primary key, col2 int, col3 int)

drop view dbo.v1

CREATE VIEW dbo.v1 with schemabinding as 
select t1.col1 from dbo.t1 as t1 inner join dbo.t2 as t2 ON (t1.col1=t2.col3) 
WHERE t1.col3=5 and t2.col2 = 8
create unique clustered index i1 on dbo.v1(col1)

update v1 set col1=col1+1

The update to the view is seen in the bottom branch of the query, and the delta part is the Table Spool (which replaced t1 in this case)

image

So, the rows being modified are joined with t2 (just like in the view definition), filtered, and projected (the SELECT list).  The resulting delta stream is then applied to the persisted indexed view after some update-specific operations (collapse and a special filter that is mostly internal stuff).

 

Let’s talk about a few of the restrictions:

1. No Self-joins – well, the delta algebra doesn’t work for self joins.  In other words, one can’t take a single update stream for the base table and substitute it into the view definition and then _efficiently_ update the view definition (remember, one of the goals is to maintain views efficiently)

2. ANSI_PADDING, ARITHABORT, ANSI_NULLs, … The main problem here is that these can impact the results returned in the query, and since customers likely don’t want to have 1 index view for each combination of these values, they are restricted.

3. Limited operator support – Finding the set of operators that can be efficiently matched, maintained, and created is a difficult problem – there are many operators, and some of them can not easily be supported in indexed views.  Most of them have odd restrictions like what is described in the CREATE INDEX help page (https://msdn.microsoft.com/en-us/library/ms188783.aspx)

3a. No DISTINCT – The QP needs to be able to uniquely determine the set of rows in the indexed view to change.  DISTINCT, like GROUP BY, “destroys” information in the query tree by collapsing rows.  GROUP BY in indexed views requires COUNT_BIG(*), which is a way to not lose information (but it requires that you modify your query to achieve that).

3b. MIN/MAX, STDEV, VARIANCE, AVG – does not preserve enough information to allow incremental maintenance.

3c. UNION – UNION (not ALL) implicitly includes DISTINCT on the result of a UNION ALL, so it is not allowed for the same reason as DISTINCT.  More generally, UNION ALL makes it hard to decompose the source table for the UPDATE to work (without more extensive modificiations to the view to track which branch of the UNION is the source of the row)

3d. fulltext, ROWSET – these include external information, and these are restricted to prevent it from changing.  It is necessary to prevent a wrong results bug if the external source changes, as indexed views are designed to be kept in-sync with the source.

3e. Subquery – this is partially  performance issue, but some subqueries make it impossible to determine the exact set of rows to change without also looking at rows not being changed.

There are lots of restrictions on indexed views, but I hope this gives you a bit of the “why” on the design.

 

Happy Querying,

 

Conor Cunningham

Technorati Tags: SQL,Indexed Views