# Cross Join Crazies and Scalar Value Function Optimization

Just on a flight out to TechReady, and took some time to finally pull this article together. And I apologise right now – I did promise my next post was going to be an MDX primer, but this one has been kicking around in my head now for a while. I’ll also add that this is really something you might want to try – I’ve found it works, and, might be something useful to have in your ‘box of tricks’.

Going through ItzIk BenGan’s - T-Sql tuning, he explains how, when evaluating a join between two tables, Sql will perform a logical cross join between the two, and eliminates the rows that don't satisfy the join condition. The left and right bit in the statement affect the behaviour to return non matching rows for the outer table. So we ought to be able to be able to make some use of that as a cheeky optimization - and it turns out I think I can show that you can to turn a scalar function into an inline TVF function, which can then be used in Cross apply. Several customers have pointed out to me that using this style of conversion they can often double their query performance.

To start, how often have you seen this style of code (the schema and data generation for this is here)

declare @v1 varchar(max), @v2 varchar(max)
select @v1 = name from t1 where pk = 1;
select @v2 = name from t2 where pk = 1;

select @v1, @v2;

If that cross join optimization is really happening, and the row elimination really taking place - Sql should be able to efficiently limit the rows so lets get the assignment in one hit:

declare @v1 varchar(max), @v2 varchar(max)

select @v1=t1.name, @v2=t2.Name
from t1 cross join t2
where t1.pk = 1 and t2.pk = 1

select @v1, @v2;

Looking at the plans - I do indeed get a seek into the tables. Interesting, you might say, but, how can I use this to my advantage?

It allows us to recode some of your multi-statement scalar valued functions into inline table valued functions that can be used in cross apply.  Why is that useful? It seems quite a few of my customers have noticed they get better performance out of the equivalent TVF when cross-applied, rather than having a scalar function called inline.

For example, consider this function to return the two aggregates in a single row:

create function dbo.GetSumOfBestScoreAndBestMeasureForId(@id int)
returns int
as
begin
declare @bestScore int;
select @bestScore = max(score) from t1Scores where pk = @id;
declare @bestMeasure int;
select @bestMeasure = max([weight]) from t2Measures where pk = @id
return @bestScore + @bestMeasure;
end

This represents a fairly typical function I see in implementations. It will run fairly well, but takes two seconds on my machine, against 10,000 rows, for this query :

select id, dbo.GetSumOfBestScoreAndBestMeasureForId(id)
from tId;

So lets rewrite it to use the cross join, to get the data as one set:

declare @id int = 2; -- arbitary test value

declare @bestScore int, @bestMeasure int

select @bestScore = max(score), @bestMeasure = MAX([weight]) from t1Scores
cross Join t2Measures
where t1Scores.pk = @id and t2Measures.pk = @id

select @bestScore + @bestMeasure;

Now, bringing that into a TVF:

create function dbo.GetSumOfBestScoreAndBestMeasureForIdCrossApply (@id int)
returns table
as
return select max(score) + MAX([weight]) as SumBestResult from t1Scores
cross Join t2Measures
where t1Scores.pk = @id and t2Measures.pk = @id;

Now we can compare the two side by side:

-- returns CPU time = 1248 ms,  elapsed time = 1481 ms
select id, T.SumBestResult
from tId cross apply  dbo.GetSumOfBestScoreAndBestMeasureForIdCrossApply(id) T

-- returns CPU time = 2028 ms,  elapsed time = 2173 ms
select id, dbo.GetSumOfBestScoreAndBestMeasureForId(id)
from tId;

As always – test, test and test for your and data, but using cross join can be used to bring that multi-statement scalar function down into an inline TVF for some scenarios. In my case the performance only doubled – we’d normally looking for a bigger gain, however, this query is fairly simple to write, so may be worth your time investigating. I’d also add here the IO stats seem phenominal, as you might expect with a cross join – so I can’t stress enough the testing bit.

Cheers – Ryan

Originally posted by Ryan Simpson on 21 March 2010 http://rynsim.spaces.live.com/blog/cns!1DA5A63F849536B6!1054.entry

Tags