High Scalability Architecture For Data Types Like Image

Today is April 3rd 2008 and this small article focus on the approach of implementing a scalable solution which involve serving large amount of data such as Images. This is an infrastructure pattern.

Ever wondered how a large web site serve content such as Images to end user? Take for example you have a photo sharing web site. Generally we use SQL Server as the back end storage solution for any application architecture. However, not all data types can be stored in the SQL Server. When it comes to data types such as images, SQL Server stores them in the form of BLOB. But BLOB in SQL Server has its own inherent problems. If you do read or write to BLOB, the whole of the BLOB being transferred over the wire and stored in memory. This can cause network congestion and memory pressure, particularly when there is a considerable load of concurrent users.

If you need to read or write BLOB data such as images, you should first consider the options of storing them directly on a hard disk and storing the physical path or the URL in the database. This reduces load on the database. This basically requires two components. One, a File Server (for storing actual data) and then the Database management System to store references of the same. Microsoft provides solution for both the components in the form of Windows Storage Server and SQL Server. The following diagram explains this solution in a better way:

 

 

Untitled

 

Now, when we understand that dedicated File Server is the solution for storing datat types such as Images, how do we typically implement it? Such a solution is implemented in form of Storage Area Network (SAN) where Network Attached Storage (NAS) Devices are used to store the large data and act as a File Server. In such a solution even SQL Server resides on SAN.

However, if you still want to store BLOB data in the database (maybe because you don’t want to have complex infrastructure of Storage Server), consider the following options to reduce the performance cost:

--> Use chunking to reduce the amount of data transferred over the wire. Chunking involves more round trips, but it places comparatively less load on the server and consumes less network bandwidth. You can use the DataReader.GetBytes (from Application code) to read the data in chunks or use SQL Server-specific commands, such as READTEXT and UPDATEDTEXT, to perform such chunking operations.

--> Avoid moving the BLOB repeatedly because the cost of moving them around can be significant in terms of server and network resources. Consider caching the BLOB on the client side after a read operation.

 

Comments and suggestions, as always, more than welcome.

Rahul Gangwar