Debugging Azure Data Lake Job Failures Made Easy (part 1) - Debugging U-SQL Job Failures in Custom C# Code

Working with large datasets is hard. When developers build big data applications, test cases cannot cover the full variety of input data. Untested or corrupt data often exposes corner cases in user code once big data jobs run at scale. Pinpointing the root cause from error messages and logs alone is painstaking work, and the difficulty of isolating a bad subset of data is compounded by the difficulty of reproducing the exact failure state in the cloud environment.

Azure Data Lake Tools for Visual Studio now offers improvements that help developers address this exact issue for user-defined C# code in failed U-SQL jobs. When a job fails, Azure Data Lake saves a snapshot of the failure state, including input or intermediate data, user code, and system-generated code. Using Azure Data Lake Tools for Visual Studio, developers can download the cloud failure environment to a local machine and debug it inside a customized Visual Studio solution.

Developers now have access to both the corrupt data and the execution state that caused the job failure in the cloud. Inside the Visual Studio solution, they can use the native Visual Studio .NET debugging experience to step through code, edit it, and run and test a fix on the local machine. The tested fix can then be applied to the production environment by re-registering the assembly and re-running the job, or by submitting the changes to a CI/CD pipeline.
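To make the kind of corner case and fix described above concrete, here is a minimal sketch in plain C#. It is not taken from any real job: the class ParseDemo, the methods ParseStrict and ParseLenient, and the sample rows are all hypothetical, and the sketch assumes the failure comes from a malformed numeric value in the input.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class of the kind a U-SQL script might call from
// custom C# code. All names here are illustrative assumptions.
public static class ParseDemo
{
    // Strict variant: throws FormatException on a malformed value,
    // which is the sort of unhandled exception that can fail a job.
    public static int ParseStrict(string raw)
    {
        return int.Parse(raw, CultureInfo.InvariantCulture);
    }

    // Defensive variant: returns null instead of throwing when the
    // value cannot be parsed as an integer.
    public static int? ParseLenient(string raw)
    {
        return int.TryParse(raw, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out int value)
            ? value
            : (int?)null;
    }

    public static void Main()
    {
        // Sample rows, with one corrupt value standing in for bad input data.
        string[] rows = { "42", "17", "N/A" };
        foreach (var row in rows)
        {
            int? parsed = ParseLenient(row);
            Console.WriteLine(parsed.HasValue
                ? "parsed: " + parsed.Value
                : "skipped corrupt value: " + row);
        }
    }
}
```

Stepping through ParseStrict with the row that triggered the failure would show the exception at its source. Whether to skip, default, or log a corrupt value is a design decision that depends on the job.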

Contact us at adldevtool@microsoft.com if you have problems or feedback. We would be happy to hear from you.