How to list millions of blob from Windows Azure Storage in less amount of time?

In a situation when you have millions of blobs in a Azure Storage container, it may take hours to list all of your blobs. This is mostly because huge bandwidth overhead due to uncompressed XML about container/blob data.


To expedite displaying blob list in Azure Storage With StorageClient API, you can use ListBlobsWithPrefix API by utilizing the prefix parameter as it will return a list of all blobs in that directory with that prefix. 


Depending on the names of your blobs, you’ll want to choose the right set of requests to make in parallel.


Example 1:

If all your blobs were named with random GUID’s:

-           Call  ListBlobsWithPrefix(“container/0”)”

-           You will get a list all blobs that have names starting with a 0. 


Example 2:

For a huge list of blobs you can, in parallel, issue the same request for “container/1”, “container/2”, …, “container/f”.

This way all hexadecimal letters will be covered and you will get a list if blobs starting from each hexadecimal character.


Example 3:

If your blob list contains a collection of English words

-           You can choose to make one request for each letter of the alphabet. 


Please be sure that he prefix (with ListBlobsWithPrefix API) can be multiple characters so you could allow the deserialization to occur in parallel, taking advantages of multiple cores if your Azure VM has multiple CPU cores or the machine where this code is running had multiple  cores.


Comments (0)

Skip to main content