Typed DataSet - Potential Performance And Security Risk

Are you using a Typed DataSet as a DTO (data transfer object)? Are you building distributed systems where the DTO travels back and forth, including to your Smart Client? If so, you should be aware that most of your DB schema can be easily revealed using my friends ILDASM and FindStr.

It is a common pattern to create shared libraries that contain only data definitions. These libraries are usually deployed to both the client and the server.

In my example I created a simple library called TypedDataSetSharedLibrary.dll. It holds a Typed DataSet I generated from the AdventureWorks sample database. I ran a simple command line as follows:

ildasm.exe TypedDataSetSharedLibrary.dll /text | findstr /C:"ldstr" >"C:\TypedDataSetSharedLibrary.dll.Strings.txt"

Here is a fragment of what I see after opening the resulting file:

IL_00d7:  ldstr      "tableTypeName"
IL_00e4:  ldstr      "vEmployeeDataTable"
IL_001d:  ldstr      "The value for column 'Title' in table 'vEmployee' "
IL_001d:  ldstr      "The value for column 'MiddleName' in table 'vEmplo"
IL_001d:  ldstr      "The value for column 'Suffix' in table 'vEmployee'"
IL_001d:  ldstr      "The value for column 'Phone' in table 'vEmployee' "
IL_001d:  ldstr      "The value for column 'EmailAddress' in table 'vEmp"
IL_001d:  ldstr      "The value for column 'AddressLine2' in table 'vEmp"
IL_001d:  ldstr      "The value for column 'AdditionalContactInfo' in ta"
IL_0059:  ldstr      "XmlSchema"
IL_00a9:  ldstr      "vEmployee"
IL_00c9:  ldstr      "vEmployee"
IL_0031:  ldstr      "vEmployee"
IL_004f:  ldstr      "vEmployee"

It is clear that from such information an attacker may learn a lot about the DB schema, and so be able to craft her attacks more easily.
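To show how little effort it takes to turn that text file into a list of tables and columns, here is a minimal Python sketch. It is only an illustration: the function name extract_schema_hints and the two regular expressions are my own, and it assumes the literals look like the fragment above (one ldstr per line, no escaped quotes inside the string).

```python
import re
from collections import defaultdict

# Matches the string operand of an "ldstr" line as it appears in the fragment.
LDSTR = re.compile(r'ldstr\s+"(.*)"')

# Error-message text seen in the fragment. The table part is optional
# because a literal may be cut off before the closing quote.
COLUMN_MSG = re.compile(r"The value for column '([^']+)'(?: in table '([^']+)')?")


def extract_schema_hints(lines):
    """Split ldstr literals into column names per table and everything else."""
    columns = defaultdict(list)  # table name -> column names, in order seen
    other = []                   # remaining literals (table names and so on)
    for line in lines:
        found = LDSTR.search(line)
        if not found:
            continue
        literal = found.group(1)
        msg = COLUMN_MSG.match(literal)
        if msg:
            table = msg.group(2) or "(table name cut off)"
            if msg.group(1) not in columns[table]:
                columns[table].append(msg.group(1))
        elif literal not in other:
            other.append(literal)
    return dict(columns), other


# A few lines copied from the fragment above.
SAMPLE = '''\
IL_00e4:  ldstr      "vEmployeeDataTable"
IL_001d:  ldstr      "The value for column 'Title' in table 'vEmployee' "
IL_001d:  ldstr      "The value for column 'MiddleName' in table 'vEmplo"
IL_00a9:  ldstr      "vEmployee"
IL_00c9:  ldstr      "vEmployee"
'''.splitlines()

columns, other = extract_schema_hints(SAMPLE)
print(columns)
print(other)
```

To run it against the real output, pass an open file handle for the text file produced by the command line above instead of SAMPLE.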

Why is it a performance risk? Well, by itself it is not, but the same check can help you spot one.

Imagine that the above fragment is only a small sample of a huge data set that travels the network. Recently I stumbled on such code. This simple check revealed that the code used a Typed DataSet of about 1000 columns. We assumed this would be a problem from a network throughput perspective, so we ran some load tests using VSTS.

The result was pretty much as expected: almost all of the bandwidth was consumed by a load of just a few simultaneous users.
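A back-of-the-envelope calculation shows why a result like this is not surprising. The Python sketch below is only an illustration: apart from the 1000 columns, every number in it (rows per response, average bytes per value, requests per second, link speed) is a hypothetical placeholder, not a figure from our load tests, and it ignores serialization overhead, which would likely make the payload larger.

```python
def dataset_payload_bytes(columns, rows, avg_bytes_per_value):
    """Rough payload size: one value per column per row, no serialization overhead."""
    return columns * rows * avg_bytes_per_value


def link_utilization(payload_bytes, requests_per_second, link_bits_per_second):
    """Fraction of the link consumed at the given request rate."""
    return payload_bytes * 8 * requests_per_second / link_bits_per_second


# Hypothetical inputs: only the column count comes from the case described above.
payload = dataset_payload_bytes(columns=1000, rows=100, avg_bytes_per_value=20)
usage = link_utilization(payload, requests_per_second=5,
                         link_bits_per_second=100_000_000)

print(f"{payload / 1e6:.1f} MB per response, {usage:.0%} of the link")
```

Plug in your own row counts and link speed. The point is that with this many columns, the payload per response grows quickly even for a modest number of rows.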

Next time you design a distributed system, take both of these risks into account.

Also, Dino Esposito once published another discussion of DataSets vs. Collections, which might be useful too.

 

Enjoy