The Windows Azure Blob storage API provides the following scenarios for uploading blobs:
Scenario: You can upload a single blob using N parallel threads
In your code, if you set CloudBlobClient.ParallelOperationThreadCount = N; then N parallel threads will be used to upload a single blob.
Scenario: You can upload M blobs, using one single thread for each blob
In your code, if you set CloudBlobClient.ParallelOperationThreadCount = 1; then only ONE thread will be used to upload each individual blob.
Instead of uploading a single blob in parallel, if your target scenario is uploading many blobs, you may consider enforcing parallelism at the application layer. This can be achieved by performing simultaneous uploads of M blobs while setting CloudBlobClient.ParallelOperationThreadCount = 1 (which will cause the Storage Client Library to not utilize the parallel upload feature).
When uploading many blobs simultaneously, applications should be aware that the largest blob may take longer than the smaller blobs, and should therefore start uploading the largest blob first. In addition, if the application is waiting on all blobs to be uploaded before continuing, then the last blob to complete may be the critical path, and parallelizing its upload could reduce the overall latency.
So if you upload 10 blobs simultaneously with CloudBlobClient.ParallelOperationThreadCount = 1, you will have a total of 1 x 10 = 10 threads uploading your blobs.
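The application-layer approach described above can be sketched as follows. This is a minimal illustration in Python rather than the Storage Client Library itself: upload_blob is a hypothetical placeholder for a single-threaded upload (the equivalent of ParallelOperationThreadCount = 1), and the names upload_all and max_workers are assumptions made for this example. It shows only the scheduling idea: one worker per blob, with the largest blobs submitted first.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_blob(name, size_bytes):
    """Hypothetical stand-in for a single-threaded upload of one blob."""
    # A real application would call its storage client here, with the
    # per-blob parallel upload feature turned off (one thread per blob).
    return name


def upload_all(blobs, max_workers):
    """Upload many blobs concurrently, submitting the largest first.

    blobs maps blob name -> size in bytes.
    Returns the blob names in the order they were submitted.
    """
    # Largest first, so the longest upload is not left until the end.
    ordered = sorted(blobs.items(), key=lambda item: item[1], reverse=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_blob, name, size) for name, size in ordered]
        # result() re-raises any exception raised in a worker thread.
        return [future.result() for future in futures]


if __name__ == "__main__":
    sizes = {"small.txt": 1_000, "video.mp4": 900_000_000, "photo.jpg": 4_000_000}
    print(upload_all(sizes, max_workers=3))
```

Here max_workers plays the role of the number of simultaneous blob uploads, so the total thread count is max_workers x 1.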
Scenario: You can upload M blobs simultaneously, using N parallel threads for each blob, which means up to M * N threads in total are uploading blobs.
In your code, if you set CloudBlobClient.ParallelOperationThreadCount = N; and upload M blobs simultaneously from the application, up to M * N threads will be uploading your blobs. For example, uploading 10 blobs simultaneously results in up to N * 10 threads.
It is important to understand the implications of using the parallel single blob upload feature at the same time as parallelizing multiple blob uploads at the application layer.
- If your scenario initiates 30 simultaneous blob uploads using the parallel single blob upload feature, the default CloudBlobClient settings will cause the Storage Client Library to use potentially 240 simultaneous put block operations (8 x 30) on a machine with 8 logical processors. In general, it is recommended to use the number of logical processors to determine parallelism. In this case, setting CloudBlobClient.ParallelOperationThreadCount = 1 should not adversely affect your overall throughput, as the Storage Client Library will still be performing 30 operations (in this case, put block) simultaneously.
- Additionally, an excessively large number of concurrent operations will have an adverse effect on overall system performance due to ThreadPool demands as well as frequent context switches.
- In general if your application is already providing parallelism you may consider avoiding the parallel upload feature altogether by setting CloudBlobClient.ParallelOperationThreadCount = 1.
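The arithmetic behind these points can be checked with a short sketch. The function names below are hypothetical and are not part of the Storage Client Library. suggested_threads_per_blob is only one possible heuristic, assumed here for illustration: it keeps the total number of concurrent operations near the number of logical processors.

```python
def total_concurrent_operations(simultaneous_blobs, threads_per_blob):
    """Upper bound on put block operations in flight at the same time."""
    return simultaneous_blobs * threads_per_blob


def suggested_threads_per_blob(simultaneous_blobs, logical_processors):
    """Illustrative heuristic (an assumption, not a library rule).

    Splits the logical processors across the simultaneous uploads,
    but never goes below one thread per blob.
    """
    return max(1, logical_processors // simultaneous_blobs)


if __name__ == "__main__":
    # 30 simultaneous uploads, 8 threads per blob: 240 operations.
    print(total_concurrent_operations(30, 8))
    # The same 30 uploads with one thread per blob: 30 operations.
    print(total_concurrent_operations(30, 1))
    # With 30 uploads on 8 logical processors the heuristic yields 1.
    print(suggested_threads_per_blob(30, 8))
```

With 30 application-level uploads, the heuristic gives 1, which matches the recommendation above to set CloudBlobClient.ParallelOperationThreadCount = 1 when the application already provides parallelism.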