LINQ Cookbook, Recipe 12: Calculate the Standard Deviation (Doug Rothaus)

Ingredients:

·         Visual Studio 2008 (Beta2 or Higher)

 

Categories: LINQ to Objects

 

Introduction:

LINQ Cookbook, Recipe 11 showed how you can use LINQ queries to perform calculations on sets of data using standard aggregate functions such as Average and Sum. In this recipe, you will learn how to add an extension method so that you can include your own custom aggregate function in a LINQ query.

This recipe adds two extension methods: StDev (standard deviation of a sample) and StDevP (standard deviation for the entire population). Because the extension methods are added to the IEnumerable(Of T) type, you can use the custom aggregate functions in the Into clause of an Aggregate, Group By, or Group Join query clause.

Notice that there are two overloads of each extension method: one that takes input values of type IEnumerable(Of Double), and another that takes input values of type IEnumerable(Of T). This enables you to call the custom aggregate functions whether your LINQ query returns a collection of type Double or of any other numeric type. The overload that takes IEnumerable(Of T) uses a Func(Of T, Double) lambda expression to project the numeric values as the corresponding values of type Double before calculating the standard deviation. When calculating the standard deviation for values of type Double, you can simply call the StDev() or StDevP() overloads. When calculating the standard deviation for values of numeric types other than Double, you need to pass the value to the StDev(value) or StDevP(value) overloads to ensure that the value is projected as type Double.

Instructions:

·         Create a Console Application.

·         After the End Module statement of the default Module1 module, add the following class, which contains both the StDev and StDevP functions.

Class StatisticalFunctions

 

    Public Shared Function StDev(ByVal values As Double()) As Double

        Return CalculateStDev(values, False)

    End Function

 

    Public Shared Function StDevP(ByVal values As Double()) As Double

        Return CalculateStDev(values, True)

    End Function

 

    Private Shared Function CalculateStDev(ByVal values As Double(), _

                                           ByVal entirePopulation As Boolean) As Double

        Dim count As Integer = 0

        Dim var As Double = 0

        Dim prec As Double = 0

        Dim dSum As Double = 0

        Dim sqrSum As Double = 0

 

        Dim adjustment As Integer = 1

 

        If entirePopulation Then adjustment = 0

 

        For Each val As Double In values

            dSum += val

            sqrSum += val * val

            count += 1

        Next

 

        If count > 1 Then

            var = count * sqrSum - (dSum * dSum)

            prec = var / (dSum * dSum)

 

            ' Double is only guaranteed for 15 digits. A difference

            ' with a result less than 0.000000000000001 will be considered zero.

            If prec < 0.000000000000001 OrElse var < 0 Then

                var = 0

            Else

                var = var / (count * (count - adjustment))

            End If

 

            Return Math.Sqrt(var)

        End If

 

        ' Fewer than two values: return 0 (the value Nothing converts to for a Double).
        Return 0

    End Function

 

End Class
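
For reference, the arithmetic that CalculateStDev performs can be written out as a formula. This is a sketch inferred from the code above, not part of the original recipe: n stands for count, the two sums correspond to dSum and sqrSum, and a stands for adjustment. The sample values in the worked example are illustrative only, and the cases environment assumes the amsmath package.

```latex
% One-pass form used by CalculateStDev:
%   n = count, \sum x_i = dSum, \sum x_i^2 = sqrSum, a = adjustment
\[
\sigma = \sqrt{\frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n\,(n - a)}},
\qquad
a = \begin{cases}
  1 & \text{StDev (sample)} \\
  0 & \text{StDevP (entire population)}
\end{cases}
\]

% Worked example with illustrative values (not from the recipe):
%   2, 4, 4, 4, 5, 5, 7, 9
\[
n = 8, \quad \sum x_i = 40, \quad \sum x_i^2 = 232, \quad
n \sum x_i^2 - \Bigl(\sum x_i\Bigr)^2 = 1856 - 1600 = 256
\]
\[
\text{StDevP} = \sqrt{\frac{256}{8 \cdot 8}} = 2,
\qquad
\text{StDev} = \sqrt{\frac{256}{8 \cdot 7}} \approx 2.138
\]
```

Because this form subtracts two potentially large, nearly equal quantities, rounding can leave a tiny positive or negative numerator when all the values are effectively identical. That is why the code treats the numerator as zero when it is negative or when its ratio to dSum * dSum is below 0.000000000000001.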

 

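·         After the StatisticalFunctions class, add the following module, which defines the extension methods that calculate the standard deviation for both IEnumerable(Of Double) and IEnumerable(Of T). The <Extension()> attribute is defined in the System.Runtime.CompilerServices namespace, so that namespace must be imported at the top of the file.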

Module StatisticalAggregates

 

    ' Calculate the stdev value for a collection of type Double.

    <Extension()> _

    Function StDev(ByVal stDevAggregate As IEnumerable(Of Double)) As Double

        Return StatisticalFunctions.StDev(stDevAggregate.ToArray())

    End Function

 

    ' Project the collection of generic items as type Double and calculate the stdev value.

    <Extension()> _

    Function StDev(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _

                         ByVal selector As Func(Of T, Double)) As Double

        Dim values = (From element In stDevAggregate Select selector(element)).ToArray()

        Return StatisticalFunctions.StDev(values)