I had the opportunity to demonstrate one of the coolest features at SQL PASS 2015 this afternoon: U-SQL Job “Playback”.
You can watch the video here:
The video doesn’t have any audio so let me explain a bit what’s going on.
- I ran a simple U-SQL script that read a small amount of data – about 11 TB. It took about 40 minutes to run.
- I asked for a Parallelism of 1000 when I submitted the job. That means 1000 nodes were spun up in seconds for this job, and when the job concluded, the nodes were released.
- This is a small job in terms of input size. My starting point for a big data job is around 100 TB, and we regularly process PETABYTES.
- When a U-SQL job is “compiled” the total work of the job is split apart into small pieces of work. Each piece is called a “vertex”.
- This job has 12,116 vertices (which you can see in the video)
- The vertices are grouped into “stages”. The vertices in a stage perform the same operations on the same set of data.
- I asked for 1000 nodes of parallelism. This means that out of the 12,116 vertices, I’ve said “try to do 1000 at a time if possible.”
- When a U-SQL job runs, the U-SQL execution engine records when each vertex started and finished.
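To put the parallelism number in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes every vertex occupies exactly one node and ignores dependencies between stages, so the result is only a lower bound on the number of scheduling "waves", not a description of how the real scheduler behaves. The function name `min_waves` is made up for this illustration.

```python
import math


def min_waves(vertex_count: int, parallelism: int) -> int:
    """Lower bound on scheduling rounds, assuming one node per vertex
    and no dependencies between vertices (a simplifying assumption)."""
    return math.ceil(vertex_count / parallelism)


# Numbers from the demo job: 12,116 vertices, Parallelism of 1000.
waves = min_waves(12_116, 1000)
print(waves)  # 13
```

In other words, even in the best case the 1000 nodes have to be reused at least 13 times to get through all the vertices.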
Now that you know all that … In Data Lake Tools for Visual Studio, once the job is finished, we load the vertex timing information and then replay it in 30 seconds (even if the job took much longer). This is called “Job Playback”. It’s *not* a video, but rather a visualization of what actually happened when the U-SQL job ran.
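The idea of compressing recorded timings into a fixed-length replay can be sketched in a few lines of Python. This is only an illustration of the concept, not how Data Lake Tools implements it: the `VertexTiming` class, the function names, and the three sample vertices are all hypothetical, and a linear time scaling is assumed.

```python
from dataclasses import dataclass


@dataclass
class VertexTiming:
    name: str
    start: float   # seconds after the job started
    finish: float  # seconds after the job started


def to_playback_time(job_t: float, job_duration: float,
                     playback_duration: float = 30.0) -> float:
    """Map a moment in the real job onto the replay timeline (linear scaling)."""
    return job_t / job_duration * playback_duration


def running_at(timings, playback_t: float, job_duration: float,
               playback_duration: float = 30.0):
    """Names of the vertices that were running at a given replay moment."""
    job_t = playback_t / playback_duration * job_duration
    return [v.name for v in timings if v.start <= job_t < v.finish]


# Hypothetical timings for a job that took 40 minutes (2400 seconds).
job_duration = 40 * 60
timings = [
    VertexTiming("A", 0, 600),
    VertexTiming("B", 300, 1500),
    VertexTiming("C", 1200, 2400),
]

print(to_playback_time(1200, job_duration))   # 15.0
print(running_at(timings, 15, job_duration))  # ['B', 'C']
```

With these assumptions a 40-minute job replayed in 30 seconds runs at 80 times real speed, so the halfway point of the job shows up 15 seconds into the playback.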
Please note that this is one of *MANY* tools available for U-SQL developers to gain insight into their big data jobs.