Batching Brainteaser Explained

So, as some of you pointed out, the batching brainteaser does print out the "intersection" of GroupA and GroupB. But how exactly does this work?

Recall that when you reference item metadata in a task invocation, you trigger batching: the item list is partitioned by the value of that metadata, and the task is invoked once for each unique "batch" of items that share the same value.
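To make the "one invocation per unique value" idea concrete, here is a minimal sketch of a project that batches on metadata that is not unique. The item name Doc, the metadata name Kind, and the target name ShowBatches are hypothetical and exist only for this illustration.

```xml
<Project DefaultTargets="ShowBatches" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <!-- Doc and Kind are made-up names for this sketch -->
    <Doc Include="a.txt"><Kind>text</Kind></Doc>
    <Doc Include="b.txt"><Kind>text</Kind></Doc>
    <Doc Include="c.png"><Kind>image</Kind></Doc>
  </ItemGroup>
  <Target Name="ShowBatches">
    <!-- %(Kind) has two distinct values, so Message should run twice -->
    <Message Text="Kind %(Kind): @(Doc)"/>
  </Target>
</Project>
```

Because only two distinct Kind values exist, this should produce two Message invocations: one whose @(Doc) holds a.txt and b.txt, and one whose @(Doc) holds only c.png.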

So, if you want to loop, you'd use a metadata value that is unique for each item, such as %(Identity). The following example produces what is effectively a loop. Say you have items defined as shown:

<ItemGroup>
<GroupA Include="file1.txt"/>
<GroupA Include="file2.txt"/>
<GroupA Include="file3.txt"/>
<GroupA Include="file4.txt"/>
</ItemGroup>

<ItemGroup>
<GroupB Include="file1.txt"/>
<GroupB Include="file3.txt"/>
<GroupB Include="file5.txt"/>
</ItemGroup>

A call to the Message task as shown would cause four Message task invocations:

<Message Text="Item @(GroupA) has identity %(GroupA.Identity)"/>

In this case, we have explicitly qualified the Identity metadata with the name of the item, but that is redundant because only one item list is in use when we invoke the task. However, when multiple item lists are involved, an unqualified piece of metadata is applied to every item list referenced in the task invocation. The following task invocation example makes this clear:

<!-- Print a header -->
<Message Text="|GroupA| - |Identity| - |GroupB|"/>
<Message Text="|@(GroupA)| - |%(Identity)| - |@(GroupB)|"/>

The output shows the batches that are formed when two item lists are involved and the metadata reference is unqualified, as it is here:

|GroupA| - |Identity| - |GroupB|
|file1.txt| - |file1.txt| - |file1.txt|
|file2.txt| - |file2.txt| - ||
|file3.txt| - |file3.txt| - |file3.txt|
|file4.txt| - |file4.txt| - ||
|| - |file5.txt| - |file5.txt|

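Once again, if you keep in mind that every item list involved is filtered by each unique value of the metadata that batching occurs on, then this makes sense: there are five distinct Identity values across the two lists, so there are five batches, and a list with no item matching a batch's value shows up empty in that batch.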
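For contrast, here is a small sketch that puts the unqualified and the qualified form side by side in one project. The target name CompareBatching and the two header messages are my own additions for illustration; the items are the same GroupA and GroupB as above.

```xml
<Project DefaultTargets="CompareBatching" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <GroupA Include="file1.txt;file2.txt;file3.txt;file4.txt"/>
    <GroupB Include="file1.txt;file3.txt;file5.txt"/>
  </ItemGroup>
  <Target Name="CompareBatching">
    <!-- Unqualified: both GroupA and GroupB are batched on Identity -->
    <Message Text="Unqualified:"/>
    <Message Text="|@(GroupA)| - |%(Identity)| - |@(GroupB)|"/>
    <!-- Qualified: only GroupA is batched; GroupB is not split up -->
    <Message Text="Qualified:"/>
    <Message Text="|@(GroupA)| - |%(GroupA.Identity)| - |@(GroupB)|"/>
  </Target>
</Project>
```

The unqualified message should reproduce the five rows shown earlier. With the qualified form, my understanding is that only GroupA gets batched, so you should see four rows, one per GroupA item, each showing the whole of GroupB (file1.txt;file3.txt;file5.txt) in the last column.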

So now, all you really have to do is apply a condition that eliminates the batches in which either GroupA or GroupB is empty, which leads to what we had as our CreateItem call in our brainteaser. Look at it again with that sharper batching eye and see if the answer emerges :)

<CreateItem Include="@(GroupA)" Condition="'%(Identity)' != '' and '@(GroupA)' != '' and '@(GroupB)' != ''">
<Output TaskParameter="Include" ItemName="GroupC"/>
</CreateItem>
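If you'd like to try it yourself, here is a minimal sketch of a complete project file built around that CreateItem call. The target name Intersect and the final Message are my additions for illustration, and the condition is written on the assumption that it tests both @(GroupA) and @(GroupB) for emptiness, as described above.

```xml
<Project DefaultTargets="Intersect" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <GroupA Include="file1.txt;file2.txt;file3.txt;file4.txt"/>
    <GroupB Include="file1.txt;file3.txt;file5.txt"/>
  </ItemGroup>
  <Target Name="Intersect">
    <!-- One batch per unique Identity; batches where either list is empty are skipped -->
    <CreateItem Include="@(GroupA)" Condition="'%(Identity)' != '' and '@(GroupA)' != '' and '@(GroupB)' != ''">
      <Output TaskParameter="Include" ItemName="GroupC"/>
    </CreateItem>
    <!-- GroupC should now hold only the items present in both lists -->
    <Message Text="GroupC: @(GroupC)"/>
  </Target>
</Project>
```

Only the file1.txt and file3.txt batches have a non-empty GroupA and a non-empty GroupB, so the final message should list just those two files.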

So what do you think? Too confusing or too cool? Would love to hear your thoughts!

[Author: Faisal Mohamood]