I just found out something neat: the VSTS profiler tells you how many samples were taken in kernel mode. Unfortunately, it doesn’t tell you where in kernel mode that time was spent. It does, however, tell you whether you need to be thinking about kernel-mode time at all, which makes it an even better tool to use on the front line of your perf investigation.
For example, when profiling a WPF app, this kind of info is super-helpful if you suspect a bad video driver is causing your app to misbehave. I've been investigating a couple of those issues recently, and found Intel VTune and AMD's CodeAnalyst helpful for narrowing down where the kernel time goes, or at least which module is taking the time.
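If you don't have a profiler handy, you can get a rough version of the same "is kernel time even a factor?" signal from the OS's own per-process CPU counters. The snippet below is only a sketch of that idea, not what VSTS does internally: it uses Python's standard `os.times()` to compare user-mode and kernel-mode CPU seconds around a piece of work. The helper names (`kernel_fraction`, `measure`, `syscall_heavy`) are made up for illustration, and the counters are coarse, so treat small numbers with suspicion.

```python
import os


def kernel_fraction(user_seconds, kernel_seconds):
    """Return the share of CPU time spent in kernel mode (0.0 to 1.0)."""
    total = user_seconds + kernel_seconds
    return kernel_seconds / total if total > 0 else 0.0


def measure(workload):
    """Run workload() and return (user, kernel) CPU seconds used by this process."""
    before = os.times()
    workload()
    after = os.times()
    return after.user - before.user, after.system - before.system


def syscall_heavy():
    # Hypothetical workload: lots of small calls into the OS.
    for _ in range(50000):
        os.stat(os.curdir)


if __name__ == "__main__":
    user, kernel = measure(syscall_heavy)
    share = kernel_fraction(user, kernel)
    print(f"user={user:.3f}s kernel={kernel:.3f}s kernel share={share:.0%}")
```

Like the kernel-mode sample count, this only tells you whether kernel time deserves a closer look. To see which module is responsible, you still need a tool that can look inside the kernel.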