Content by Charles Sterling

Dogfooding File Size Distribution

One of my customers asked how well we handle large files, so I pointed him to our dogfooding numbers. Unfortunately, those posts don't include file sizes, so I dropped Brian a piece of mail asking for the size distribution. (Where else but Microsoft can you drop an exec a piece of mail to count a bunch of files for you?)

Not only did he take time out to check the sizes of the 67 million files, he also took the time to explain why it appears as if there are only ~10 million files in the count. The reason is a major optimization over some other version control systems, which make real copies for things like branches.

 (From Brian Harry)

What it counts is the number of distinct paths in the system.  We branch quite a lot, and branching only copies the file content when a file is actually changed in a branch.  As a result, there are many fewer distinct file contents than there are distinct file paths (many paths reference the same file content).  There are approximately 9.2 million unique file contents.  The distribution is as follows:

Size range    Unique file contents
0-1 MB        8,977,270
1-2 MB        97,295
2-3 MB        42,978
3-5 MB        30,564
5-10 MB       22,892
10-20 MB      17,946
>20 MB        9,623
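
As a quick sanity check on the numbers above, here is a minimal Python sketch that adds up the buckets and shows each one's share of the total. The counts are copied straight from the distribution; the names (buckets, share, total) are just illustrative and are not part of any TFS tooling.

```python
# Unique file contents per size bucket, copied from the distribution above.
buckets = [
    ("0-1 MB", 8_977_270),
    ("1-2 MB", 97_295),
    ("2-3 MB", 42_978),
    ("3-5 MB", 30_564),
    ("5-10 MB", 22_892),
    ("10-20 MB", 17_946),
    (">20 MB", 9_623),
]


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Sum of all buckets; this should land near the quoted ~9.2 million.
total = sum(count for _, count in buckets)

for label, count in buckets:
    print(f"{label:<9} {count:>10,} {share(count, total):6.2f}%")
print(f"{'total':<9} {total:>10,}")
```

The buckets sum to 9,198,568, which lines up with the "approximately 9.2 million unique file contents" figure, and roughly 97.6% of those unique contents are under 1 MB.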