F# Scripting Zen – Bulk Updating Testcases

As the F# team is busy working to finish up Visual Studio 2010, one task left to complete is to localize the compiler, so that on a Japanese machine the error messages will be in Japanese.

While I’m sure a few Ugly American Programmers might question the value of localized error messages, imagine if the F# compiler told you this every time you made a mistake:

"このタイプのパラメータ値を推定タイプ略語の安定の下では、消去されていません。このドロップ、または並べ替えのタイプのパラメータを、 のタイプは、例えば、略語の使用によるものです< ' a >を= はまたは\ Ñ \ ttypeスワップ< 'は、 ' b > = ' b * 'を1つの。 明示的にこの値の型のパラメータを宣言、 \ Ñ例: \ tlet金< 'は、 ' b > ( ( x 、 y )に:スワップ< ' b 、 ' a >を参照) :スワップ< 'は、 ' b > = (イは、 X )"

Anyways, my job is to update our F# compiler test automation to be localization agnostic, meaning that tests will work just fine on English, Japanese, and any of the other dozen or so languages Visual Studio gets localized to.

The majority of our compiler automation works by taking a code file, compiling and running it, and reporting a failure if the process returns a non-zero exit code. Negative tests, where we expect the code file not to compile, work by checking the compiler output for specific messages.

The following is a sample F# compiler testcase. Note that the ‘expected message’ is an arbitrary regular expression, which is why the ‘(‘ and ‘)’ need to be escaped with a backslash.

// FSB 1488, Implement redundancy checking for dynamic type test patterns
//<Expects status=warning>\(9,7-9,16\): warning FS0026: This rule will never be matched</Expects>
//<Expects status=warning>\(15,7-15,16\): warning FS0026: This rule will never be matched</Expects>
let _ =
    match box "3" with
    | :? string -> 1
    | :? string -> 1 // check this rule is marked as 'never be matched'
    | _ -> 2

let _ =
    match box "3" with
    | :? System.IComparable -> 1
    | :? string -> 1 // check this rule is marked as 'never be matched'
    | _ -> 2

exit 0

As you can probably guess, matching against English compiler output is problematic when testing a localized compiler. The F# QA team (consisting of 3 people) discussed a few options, but the design we settled on was to encode more metadata into the ‘Expects’ tag and simply not validate specific error text on non-English builds.

So we want to rewrite all of our negative tests to look something like this:

// FSB 1488, Implement redundancy checking for dynamic type test patterns
//<Expects id="FS0026" span=(9,7-9,16) status="warning">This rule will never be matched</Expects>
//<Expects id="FS0026" span=(15,7-15,16) status="warning">This rule will never be matched</Expects>
let _ =
    match box "3" with
    | :? string -> 1
    | :? string -> 1 // check this rule is marked as 'never be matched'
    | _ -> 2

let _ =
    match box "3" with
    | :? System.IComparable -> 1
    | :? string -> 1 // check this rule is marked as 'never be matched'
    | _ -> 2

exit 0

When the testcase is in this format, then rather than just matching the ‘message text’ we can build up the regular expression which we will match against the compiler output. So we will look for something like: <span> + <message id> + <text> .

Then, on localized installs we simply won’t match the text. (But still validate the line/column span of the error, and the error ID from the compiler.) This, combined with a little loc-specific testing, should be enough.
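To make the <span> + <message id> + <text> idea concrete, here is a minimal sketch of how such a pattern might be assembled. The buildPattern helper, the test.fs file name, and the placeholder localized output are all made up for illustration; this is not the actual test harness code. It also assumes that only the span and the error ID stay stable on a localized build.

```fsharp
open System.Text.RegularExpressions

/// Hypothetical helper (not from the real harness): builds the pattern to
/// match against compiler output from the pieces of an <Expects> tag.
let buildPattern (isEnglishBuild : bool) (span : string) (msgId : string) (status : string) (text : string) =
    // Regex.Escape makes a span such as "(9,7-9,16)" match literally
    let spanPart = Regex.Escape(span)
    if isEnglishBuild then
        // <span> + <message id> + <text>; the text is still treated as a regex
        spanPart + ": " + status + " " + msgId + ": " + text
    else
        // Assumption: on localized builds only the span and the ID are checked
        spanPart + ": .*" + msgId

// Sample output lines; the file name and the localized line are placeholders
let englishOutput = "test.fs(9,7-9,16): warning FS0026: This rule will never be matched"
let localizedOutput = "test.fs(9,7-9,16): <localized> FS0026: <localized text>"

let englishPattern = buildPattern true "(9,7-9,16)" "FS0026" "warning" "This rule will never be matched"
let localizedPattern = buildPattern false "(9,7-9,16)" "FS0026" "warning" "This rule will never be matched"

printfn "%b" (Regex.IsMatch(englishOutput, englishPattern))     // true
printfn "%b" (Regex.IsMatch(localizedOutput, localizedPattern)) // true
printfn "%b" (Regex.IsMatch(localizedOutput, englishPattern))   // false
```

The last line is the whole point: the English pattern fails against localized output, which is why the text part has to be dropped there.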

So how do you fix the testbed? We have ~1,600 automated testcases which use the <Expects> technique, ~600 of which need to be updated. As much as I love manually going through and tweaking formatting of testcases, hopefully I can put my F# knowledge to good use and whip up an F# script to do this for me.

Defining Types

Before solving a problem in F#, the first step is usually defining new types. In this case, we want to create types to describe the metadata encoded in testcases.

I’d like to draw your attention to how concise F#’s syntax is for declaring types. If this were C# you would need to sprinkle in property getters and setters, curly braces and semicolons. Instead, creating discriminated unions and records is straightforward. In addition, I’ve added static Parse methods which we will use later.

open System.Diagnostics
open System.IO
open System.Text.RegularExpressions

type MessageType =
    | Warning
    | Error
    | NotIn
    | Success

    override this.ToString() =
        match this with
        | Warning -> "warning"
        | Error -> "error"
        | NotIn -> "notin"
        | Success -> "success"

    // Accepts the status with or without surrounding quotes
    static member Parse(txt : string) =
        match txt.ToUpperInvariant() with
        | "WARNING" | "\"WARNING\"" -> Warning
        | "ERROR" | "\"ERROR\"" -> Error
        | "NOTIN" | "\"NOTIN\"" -> NotIn
        | "SUCCESS" | "\"SUCCESS\"" -> Success
        | _ -> failwithf "Unknown message type: [%s]" txt

type MessageRange =
    | NoRange // No error range specified
    | OnePoint of int * int // (a,b)
    | TwoPoints of int * int * int * int // (a,b-c,d)

    // The span appears regex-escaped in the testcase, e.g. \(9,7-9,16\),
    // so the patterns match a literal backslash before each parenthesis.
    static member Parse(txt : string) =
        let onePtRegex = @".*\\\((\d+),(\d+)\\\).*"
        let twoPtRegex = @".*\\\((\d+),(\d+)-(\d+),(\d+)\\\).*"

        let onePtMatch = Regex.Match(txt, onePtRegex)
        let twoPtMatch = Regex.Match(txt, twoPtRegex)

        if onePtMatch.Success then
            OnePoint(int onePtMatch.Groups.[1].Value, int onePtMatch.Groups.[2].Value)
        elif twoPtMatch.Success then
            TwoPoints(int twoPtMatch.Groups.[1].Value, int twoPtMatch.Groups.[2].Value,
                      int twoPtMatch.Groups.[3].Value, int twoPtMatch.Groups.[4].Value)
        else
            NoRange

type ExpectedMessage =
    {
        MsgType : MessageType
        MsgID : string
        MsgRange : MessageRange
        MsgText : string
    }
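The triple backslashes in the MessageRange patterns look odd at first. Because the old ‘expected message’ is itself a regular expression, the span is stored in the testcase file as \(9,7\), backslashes included. Here is a small sketch showing why the extra backslash is needed; the sample message text is made up.

```fsharp
open System.Text.RegularExpressions

// What the old <Expects> body looks like: the parentheses are regex-escaped,
// so the text contains literal backslashes. (The message text is made up.)
let expectsText = @"\(9,7\): error FS0001: sample text"

// In a verbatim string, \\ matches one literal backslash and \( matches one
// literal parenthesis, so \\\( matches the two characters \( in the text.
let escaped = Regex.Match(expectsText, @"\\\((\d+),(\d+)\\\)")

// Without the extra backslash the pattern expects ")" right after the digits,
// but the text has "\" there, so it does not match.
let naive = Regex.Match(expectsText, @"\((\d+),(\d+)\)")

printfn "%b %s %s" escaped.Success escaped.Groups.[1].Value escaped.Groups.[2].Value
// prints: true 9 7
printfn "%b" naive.Success
// prints: false
```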

   

Parsing (AKA Regular Expression voodoo)

Here comes the fun part: grepping through each testcase file and trying to parse it using regular expressions.

I’ll be honest when I say that we haven’t been super-consistent in how to encode error messages. So some tests have error spans, some don’t. Some match the error ID, some don’t.

Admittedly this code could be better commented, but it should be pretty straightforward. Note that the removeTokens function could easily be rewritten to be a bunch of functions composed together (rather than using pipelining). Having the separate ‘fixLineFormX’ functions makes debugging a tad easier.

/// Given the match string for a testcase, strip out error spans and compiler error numbers.
/// E.g. "\(9,18-9,21\): warning FS0035: This form of object expression is deprecated."
/// becomes: "This form of object expression is deprecated."
let removeTokens line =

    // If the pattern matches, keep only the given capture group;
    // otherwise pass the line through untouched.
    let fixLine (regexPattern : string) (groupIdx : int) (line : string) =
        let m = Regex.Match(line, regexPattern)
        if m.Success then
            m.Groups.[groupIdx].Value
        else
            line

    // Specify error range (a,b-c,d): error FS????:
    let fixLineForm1 = fixLine @"\\\(\d+,\d+-\d+,\d+\\\)..(error|warning) FS\d\d\d\d..([^<]*)" 2
    // Specify error range (a,b): error FS????:
    let fixLineForm2 = fixLine @"\\\(\d+,\d+\\\)..(error|warning) FS\d\d\d\d..([^<]*)" 2
    // Start with error FS????:
    let fixLineForm3 = fixLine @"(error|warning) FS\d\d\d\d..([^<]*)" 2
    // Start with FS????:
    let fixLineForm4 = fixLine @"FS\d\d\d\d..([^<]*)" 1
    // Just error range (a,b-c,d), no error ID
    let fixLineForm5 = fixLine @"\\\(\d+,\d+-\d+,\d+\\\)..([^<]*)" 1
    // Just error FS????, with no message text after it
    let fixLineForm6 = fixLine @"(error|warning) FS\d\d\d\d([^<]*)" 2

    line |> fixLineForm1 |> fixLineForm2
         |> fixLineForm3 |> fixLineForm4
         |> fixLineForm5 |> fixLineForm6
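For the curious, here is roughly what the composed version mentioned earlier might look like. This is only a sketch using two of the forms, and stripTokens is a name made up for this example; the >> operator glues the partially applied functions into one function, so there is no intermediate ‘line’ value to pipe.

```fsharp
open System.Text.RegularExpressions

// Same shape as the fixLine helper: keep one capture group on a match,
// otherwise pass the line through unchanged.
let fixLine (pattern : string) (groupIdx : int) (line : string) =
    let m = Regex.Match(line, pattern)
    if m.Success then m.Groups.[groupIdx].Value else line

// Two of the forms glued together with >> (function composition).
// stripTokens is a hypothetical name, not part of the original script.
let stripTokens : string -> string =
    fixLine @"(error|warning) FS\d\d\d\d..([^<]*)" 2
    >> fixLine @"\\\(\d+,\d+-\d+,\d+\\\)..([^<]*)" 1

printfn "%s" (stripTokens @"\(9,18-9,21\): warning FS0035: This form of object expression is deprecated.")
// prints: This form of object expression is deprecated.
```

The downside is the one noted above: with everything fused into a single function there is no obvious place to stop and inspect the result of one form.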

/// If the line contains an <Expects> block, parse out the error ID, span, and text
/// and rewrite it making them explicit.
let fixExpectedMessageLine (line : string) =

    let isMessageLineRegex = "//.*<Expect[^>]*status=([^>]+)>([^<]*)</Expect.+"

    let m = Regex.Match(line, isMessageLineRegex)

    if m.Success then
        // If we've identified it as a line containing testcase metadata,
        // parse it and rewrite it.
        let expectedText = m.Groups.[2].Value

        // Use the error ID named in the text, defaulting to FS0191 if there is none
        let msgId =
            let id = Regex.Match(expectedText, @".*(FS\d\d\d\d).*")
            if id.Success then id.Groups.[1].Value
            else "FS0191"

        let message =
            {
                MsgType = MessageType.Parse(m.Groups.[1].Value)
                MsgID = msgId
                MsgRange = MessageRange.Parse(expectedText)
                MsgText = removeTokens expectedText
            }

        let spanPart =
            match message.MsgRange with
            | NoRange -> ""
            | OnePoint(x,y) -> sprintf "span=\"(%d,%d)\" " x y
            | TwoPoints(a,b,c,d) -> sprintf "span=\"(%d,%d-%d,%d)\" " a b c d

        if message.MsgType = Error || message.MsgType = Warning then
            sprintf
                "//<Expects id=\"%s\" %sstatus=\"%s\">%s</Expects>"
                message.MsgID spanPart (message.MsgType.ToString()) message.MsgText
        else
            // Only rewrite negative tests, skip SUCCESS or NOTIN tests...
            line
    else
        // This line didn't contain any metadata information
        line
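If the isMessageLineRegex pattern reads like line noise, a quick sketch of what its two capture groups pick up may help. It runs the same pattern over an old-style line and a line that already has quoted attributes; only the printing loop is new.

```fsharp
open System.Text.RegularExpressions

let isMessageLineRegex = "//.*<Expect[^>]*status=([^>]+)>([^<]*)</Expect.+"

let oldStyle = @"//<Expects status=warning>\(9,7-9,16\): warning FS0026: This rule will never be matched</Expects>"
let newStyle = "//<Expects id=\"FS0026\" span=\"(9,7-9,16)\" status=\"warning\">This rule will never be matched</Expects>"

// Group 1 is whatever follows status= up to the closing '>',
// group 2 is the body of the tag.
for line in [ oldStyle; newStyle ] do
    let m = Regex.Match(line, isMessageLineRegex)
    printfn "status=[%s] text=[%s]" m.Groups.[1].Value m.Groups.[2].Value
// prints:
// status=[warning] text=[\(9,7-9,16\): warning FS0026: This rule will never be matched]
// status=["warning"] text=[This rule will never be matched]
```

Note that the status comes back with its quotes when the attribute is quoted, which is why MessageType.Parse accepts both spellings.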

Fixing Testcases

The work to actually fix testcases requires a few utility functions. I’ll cover each one in order.

First is my favorite four-line sequence expression: return all files under a given folder. Using yield! inside of a sequence expression is quite sexy.

let rec filesUnder basePath =
    seq {
        yield! Directory.GetFiles(basePath)
        for subDir in Directory.GetDirectories(basePath) do
            yield! filesUnder subDir }
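To see what yield! buys you, here is a small sketch that runs filesUnder against a throwaway folder tree. The temp folder and the a.fs and b.fs file names are made up for the demo; the function is repeated (with a type annotation added) so the snippet runs on its own.

```fsharp
open System.IO

let rec filesUnder (basePath : string) : seq<string> =
    seq {
        // yield! splices every element of the array into the sequence
        yield! Directory.GetFiles(basePath)
        for subDir in Directory.GetDirectories(basePath) do
            // ...and here it splices in the whole recursive result
            yield! filesUnder subDir }

// Build a throwaway tree: root\a.fs and root\sub\b.fs
let root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName())
let sub = Path.Combine(root, "sub")
Directory.CreateDirectory(sub) |> ignore
File.WriteAllText(Path.Combine(root, "a.fs"), "")
File.WriteAllText(Path.Combine(sub, "b.fs"), "")

let names =
    filesUnder root
    |> Seq.map (fun (p : string) -> Path.GetFileName p)
    |> Seq.toList

printfn "%A" names
// prints: ["a.fs"; "b.fs"]

Directory.Delete(root, true)
```

A folder's own files come out before anything from its subfolders. If you don't need the recursion spelled out, Directory.EnumerateFiles with SearchOption.AllDirectories (available since .NET 4) walks the tree for you.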

Next is creating a type called FixedFile, which keeps track of any changes we made to the source code. (That is, if we need to rewrite any lines containing testcase metadata.) The code simply tries to modify each line, and if the updated array of strings is different then we know the file has been modified.

In C#, comparing two arrays with == uses referential equality (comparing pointers), so it cannot tell you whether they hold the same elements. The F# code, on the other hand, uses structural equality (comparing array elements).

type FixedFile = { FilePath : string; FileFixed : bool; UpdatedLines : string[] }

let fixFile filePath =
    let lines = File.ReadAllLines(filePath)
    let fixedLines = lines |> Array.map fixExpectedMessageLine
    // Note use of structural equality
    {
        FilePath = filePath
        FileFixed = (lines <> fixedLines)
        UpdatedLines = fixedLines
    }
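The structural equality that fixFile relies on is easy to check in F# Interactive. A tiny sketch with made-up sample arrays:

```fsharp
let original = [| "a"; "b" |]
let copy = original |> Array.map id                         // a new array with the same elements
let changed = original |> Array.map (fun s -> s.ToUpper())  // a new array with different elements

printfn "%b" (original = copy)                      // true: elements are compared
printfn "%b" (obj.ReferenceEquals(original, copy))  // false: two distinct arrays
printfn "%b" (original <> changed)                  // true: the contents differ
```

The first two lines are the interesting pair: the arrays are different objects, yet = still reports them as equal, which is exactly what lets fixFile detect an untouched file.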

Next are a couple of functions to check out a file for edit using Source Depot. (Shelling out to tf.exe to use Visual Studio Team Foundation Server would work just as well.)

/// Spawns a new process. Returns (exit code, stdout)
let shellExecute executablePath workingDir args =
    let startInfo = new ProcessStartInfo()
    startInfo.FileName <- executablePath
    startInfo.WorkingDirectory <- workingDir
    startInfo.Arguments <- args

    startInfo.UseShellExecute <- false
    startInfo.CreateNoWindow <- true
    startInfo.RedirectStandardOutput <- true

    let proc = Process.Start(startInfo)
    // Read stdout before waiting: waiting first can deadlock if the child
    // fills the redirected output buffer.
    let output = proc.StandardOutput.ReadToEnd()
    proc.WaitForExit()
    (proc.ExitCode, output)

/// Checks out a given file for edit
let checkoutFile filePath =
    printfn "Checking out: %s" filePath
    let pathToSD = @"d:\Tools\SourceDepot\sd.exe"
    let enlistmentPath = @"D:\dd\cambridge_staging_3\src\"

    // Add working directory so SD knows which client to use...
    let (ec, txt) = shellExecute pathToSD enlistmentPath ("edit " + filePath)

    printfn "SD exited with code %d\n%s" ec txt
    ()

Putting it Together

Unfortunately our script is somewhat anti-climactic. To fix all 600 compiler testcases the actual work is done in six lines of code:

@"D:\dd\cambridge_staging_3\src\tests\fsharpqa\Source"
|> filesUnder
|> Seq.map fixFile
|> Seq.filter (fun fixedFile -> fixedFile.FileFixed)
|> Seq.iter (fun ff -> checkoutFile ff.FilePath
                       File.WriteAllLines(ff.FilePath, ff.UpdatedLines))

So about 167 lines later, I have a script file that automates a task that would have easily taken a couple of work days to complete. This F# script leveraged the existing IO and Regular Expressions libraries built into .NET, and can easily be packaged into a class library and shared with other developers.

So when you next encounter a problem you want to automate, consider looking to see if an F# script is the right solution for you.