Whidbey Readiness Quiz: More on arrays


Continuing examining new types in Whidbey…  My goal is to motivate why we added Whidbey features through simple little quizzes.   For today, let’s say you have several arrays of ints and you want to sum different sub ranges from each one (that is, a different start index and count for each array).  For example, if you start at zero for a count of 1 for every array the sum will be 8.  Write the Sum() method called below. 

 

    int[] set1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    int[] set2 = { 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 };

    int[] set3 = { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 };

    int[] set4 = { 1, 1, 2, 3, 5, 8, 13, 21 };

    int[] set5 = { 2, 3, 5, 7, 11, 13, 17, 19 };

    int[] set1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

 

    int sum = Sum(/* need to pass the all the sets above, start indices and counts for each */);

           

     Console.WriteLine(“sum = {0}”,sum);

 

Comments (22)

  1. Bryn says:

    I’ll answer this because I have my own question:

    static int Sum(ArraySegment<int>[] segments)

    {

    int sum = 0;

    foreach (ArraySegment<int> segment in segments)

    {

    for (int i = segment.Offset; i < (segment.Offset + segment.Count); i++)

    {

    sum += segment.Array[i];

    }

    }

    return sum;

    }

    Now for the question:

    Is there a good article that lists the new/changed features in Framework 2.0 vs 1.1?

    Oh yes, also, you’ve got a typo. Based on the answer to your example (8) i think the second "set1" needs to be "set6."

  2. Mads Houmann says:

    The inner loop shows quite clearly why ArraySegment should have an indexer.

    And implement IEnumerable<T>

    foreach (ArraySegment<int> segment in segments)

    for (int i = 0; i < segment.Count; i++)

    sum += segment[i];

    foreach (ArraySegment<int> segment in segments)

    foreach (int i in segment)

    sum += i;

  3. Bryn says:

    Mads: Totally agree, i thought the syntax was a little klunky myself. The only reason i could think of was that some of their scenarios might have included algorithms which applied one set of semantics to the original array and another to the segment. Something like "Print the values in the segment black and the rest red." A GetEnumerator definition would be another good solution.

  4. AT says:

    There is suggestion on ProductFeedback related to ArraySegment:

    http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=FDBK11011

    It looks like Microsoft going to implement this – but anyway – go and vote 😉

    P.S> You can add "params" in Sum() signature.

    static int Sum(params ArraySegment<int>[] segments)

    P.P.S> Here is ugly hack for .NET 1.1 😉

    It does not need to contruct useless objects – while it does exactly that you requested 😉

    // Calculate sum of requested subsets step by step 😉

    Console.WriteLine("Sum are {0}",Sum(Sum(set1,1,3),Sum(set2,3,4),Sum(set3,4,5)));

    /// <summary>

    /// Plain simple method that will return sum of all values in array or 0 if there is no any.

    /// Will throw OverflowException or truncate return to lower 32-bits based on checked/unchecked context used.

    /// </summary>

    /// <param name="values">values to summarise</param>

    /// <returns>sum of values or lower 32-bit of it</returns>

    /// <exception cref="OverflowException">in case if return value or some intermidiary result does not fit in int

    /// and function used in checked context</exception>

    static int Sum(params int[] values)

    {

    int sum = 0;

    foreach (int x in values) sum += x;

    return sum;

    }

    /// <summary>

    /// Similar to <see cref="Sum(int[])"/> but will perform sum on sub-range – not on entire array.

    /// </summary>

    /// <param name="arr">source of values to summarise</param>

    /// <param name="start">0-based stating index position</param>

    /// <param name="count">number of element to summarise</param>

    /// <returns>sum of values or lower 32-bit of it</returns>

    /// <exception cref="OverflowException">in case if return value or some intermidiary result does not fit in int

    /// and function used in checked context</exception>

    /// <exception cref="ArgumentOutOfRangeException">if <paramref name="start"/> and <paramref name="count"/> represent wrong sub-range in <paramref name="arr"/></exception>

    static int Sum(int[] arr, int start, int count)

    {

    if (start < 0) throw new ArgumentOutOfRangeException("start", start, "Must be non-negative value");

    if (count < 0) throw new ArgumentOutOfRangeException("count", count, "Count must be non-negative value");

    if (start > arr.Length – count) throw new ArgumentOutOfRangeException("count", count, "Invalid start + count. Trying to read values outside of array");

    int sum = 0;

    for (int x = start; x < start + count; x++) sum += arr[x];

    return sum;

    }

  5. C++ guy says:

    Why is this any better than C++?

    template <typename ResultType, typename SetsIterator>

    ResultType Sum(SetsIterator sets_begin, SetsIterator sets_end) {

    typename ResultType result();

    for (SetsIterator i = sets_begin; i != sets_end; ++i) {

    result = accumulate(i->begin(), i->end(), result);

    }

    return result;

    }

    Even better, this will work for *any* type of container or data type. For example, an array of lists of integers, or a deque of ordered sets of floats.

    And, this will run about 10 times faster than any VM-based language.

  6. Adam Merz says:

    C++ guy: 10 times faster? Whoops, there went your credibility, right out the window.

  7. Philip Rieck says:

    It’d be nice to be able to do :

    static T Sum<T>(ArraySegment<T>[] segments)

    {

    T sum = default(T);

    foreach (ArraySegment<T> segment in segments)

    {

    for (int i = segment.Offset; i < (segment.Offset + segment.Count); i++)

    {

    sum += segment.Array[i];

    }

    }

    return sum;

    }

    but note that there’s currently *no* possible constraint we can add to allow this to compile. Doh! It would be nice to have a Sum<T> or accumulate<t> or calculator<t> or whatever so that performing operations would work. Hope that goes in the hopper for whidbey, or at least whidbey++.

  8. C++ guy says:

    Using a list<vector<int> >, it’s gonna be 10 times faster than any containers in whidbey. Write the code. Measure the performance. Watch in awe.

  9. AT says:

    C++ guy: Can you provide us your test-cases / exact timings ? How big arrays you have used ? Have you compiled .NET / C++ in debug /release mode ? Any processor specific optimisations ? How you measured timing – how you bigger startup time for C# ?

    My estimate that there must not be such a dramatic difference. At most 2 times slower, if not less. The only difference in C# versus C++ can be array boudaries checking – but it’s highly optimised by runtime. Even more – I can expect that C++ will be possibly slower that C# – as C++ unable to optimise STL constructs enumeration and CPU can hit high branch prediction and memory reference resolution penalty.

    Thanks in advance.

  10. C++ guy says:

    Well, if Whidbey can really do this, then you can add me to the list of C++ to Whidbey converts.

    I don’t have access to Whidbey. Don’t you have to be an MVP to get it?

    Here is the C++ code I wrote. I’m using MSVC 6, with STLport’s version of STL, and compiled in optimized mode. I ran the test on my machine, which is has an Athlon XP processor.

    Forget about performance for a moment. Look at how I use my same single Sum() function with static data, a list<vector<int>>, and a deque<set<int>>. With Whidbey, wouldn’t you have to write 3 separate versions of the Sum() function?

    Back to performance. Here’s the results on my machine:

    C:deepsumRelease>deepsum.exe

    Static test, 1000000 iterations: 140 milliseconds.

    list<vector<int>> test, 1000000 iterations: 141 milliseconds, including fill time.

    If someone who had access to Whidbey could write a comparable test and measure the performance, that would be excellent. And feel free to compile this code and test it yourself. But be sure to use STLport’s STL, because it runs at least 10% faster than Microsoft’s implementation.

    #include <windows.h>

    #include <stdio.h>

    #include <vector>

    #include <list>

    #include <set>

    #include <algorithm>

    #include <numeric>

    #include <deque>

    #include <iostream>

    using namespace std;

    // My sample code from the blog.

    // Because I’m using MSVC 6, I have to define the extra SetIterator type.

    template <typename ResultType, typename SetsIterator, typename SetIterator>

    ResultType Sum(SetsIterator sets_begin, SetsIterator sets_end) {

    ResultType result = ResultType();

    for (SetsIterator i = sets_begin; i != sets_end; ++i) {

    result = accumulate<SetIterator, ResultType>(i->begin(), i->end(), result);

    }

    return result;

    }

    // A template for running the SUM function N times

    template <int count, typename ResultType, typename SetsIterator, typename SetIterator>

    void Test(SetsIterator sets_begin, SetsIterator sets_end) {

    for (int i = 0; i < count; ++i)

    ResultType result = Sum<ResultType, SetsIterator, SetIterator>(sets_begin, sets_end);

    }

    struct SetBeginEnd {

    int * begin_;

    int * end_;

    int * begin() { return begin_; }

    int * end() { return end_; }

    typedef int * iterator;

    };

    #define STATICSET(a) { a, a + (sizeof(a) / sizeof(a[0])) },

    int main(int argc, char* argv[]) {

    DWORD start = GetTickCount();

    // Verbatim from the example on the blog

    int set1[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    int set2[] = { 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 };

    int set3[] = { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 };

    int set4[] = { 1, 1, 2, 3, 5, 8, 13, 21 };

    int set5[] = { 2, 3, 5, 7, 11, 13, 17, 19 };

    int set6[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    SetBeginEnd sets[] = {

    STATICSET(set1)

    STATICSET(set2)

    STATICSET(set3)

    STATICSET(set4)

    STATICSET(set5)

    STATICSET(set6)

    };

    SetBeginEnd* const sets_begin = sets;

    SetBeginEnd* const sets_end = sets + 6;

    // First, let’s try the totally static case:

    Test<1000000, int, SetBeginEnd*, int*>(sets_begin, sets_end);

    DWORD static_done = GetTickCount();

    // Now lets put all the numbers into a list of vectors:

    list<vector<int> > numbers(6);

    list<vector<int> >::iterator li = numbers.begin();

    SetBeginEnd* i;

    for (i = sets_begin; i != sets_end; ++i)

    copy(i->begin(), i->end(), back_inserter(*li++));

    // And Run the test again.

    Test<1000000, int, list<vector<int> >::iterator, vector<int>::iterator>

    (numbers.begin(), numbers.end());

    DWORD done = GetTickCount();

    cout << "Static test, 1000000 iterations: " << static_done – start

    << " milliseconds." << endl;

    cout << "list<vector<int>> test, 1000000 iterations: " << done – static_done

    << " milliseconds, including fill time." << endl;

    // And to demonstrate how versatile the Sum() is, and how it can be

    // used with any types, let’s do deque of ordered sets:

    deque<set<int> > number_deque(6);

    deque<set<int> >::iterator di = number_deque.begin();

    for (i = sets_begin; i != sets_end; ++i)

    copy(i->begin(), i->end(), insert_iterator<set<int> >(*di, di->begin()));

    Sum<int, deque<set<int> >::iterator, set<int>::iterator>

    (number_deque.begin(), number_deque.end());

    return 0;

    }

  11. AT says:

    C++ guy: You don’t need to be MVP. Get Whidbey beta on http://lab.msdn.microsoft.com/express/ or http://lab.msdn.microsoft.com/vs2005/

    I will try to run your tests on C++ vs. C# Whidbey – but I’m unsure if we are allowed to post benchmark results. I need to check license 🙁

  12. Philip Rieck says:

    C++ Guy: (I scrubbed my conclusions out of this post in the interest of not furthering a holy war on this comment section)

    While I’m also not so sure about posting benchmarks (even if allowed, are they valid on a beta product?)

    The C++ version is faster, but not nearly 10x. In fact, less than 2x (again, not gonna post the benchmarks. Please download the vs2005 oct ctp and play with it).

    However, note that your version doesn’t actually do what the quiz asked – you’re summing all members of the lists, not just a consistent subset (like columns 0-1 or just the first member).

    Also note that the c# version of Sum isn’t generic (it’s just using generics)- not just as you pointed out, but also because it can only take integers. The point of the question was to easily sum a subset of the arrays. Not to be reused across different types of data stores.

    Lastly , the longer c# version (the one that sums a subset) comes in at about 40% of the code listing length. That means less time to write, easier to understand, less potential for bugs, and simpler to modify.

  13. C++ guy says:

    Thanks for the Whidbey download link.

    I would like to play with the C# code. Could you post it?

    Also, did you run the test only an array of arrays of ints? How is the performance using C# dynamic containers, like a list of vectors? At which point, you’d have to write another function, right? And then how do the code listing lengths compare?

  14. C++ guy says:

    Doh. I go to download the C# express Beta, and after verifying my e-mail twice, I get to the last step, click "Continue," and get this web page:

    https://profile.microsoft.com/regsysvalidateemail/msdn/http%3a%2f%2fmsdn.microsoft.com%2fvs2005%2fexpress%2fdl%2fvcsharp%2f

    The system cannot find the file specified.

    How did you all download the C# express Beta?

  15. C++ guy says:

    Thanks for the link. Download in progress…

    In the mean time, could someone post the C# code? I’m not a C# guru, and you would be furthering the progression of a C++ diehard towards C#.

  16. C++ guy says:

    Philip said> Lastly , the longer c# version (the one that sums a subset) comes in at about 40% of the code listing length. That means less time to write, easier to understand, less potential for bugs, and simpler to modify.

    I don’t understand what the 40% of the code listing length means. Would a smaller percentage be better, because it means you spent fewer lines of code int he function? Or a larger percentage, meaning it took few lines of code to set up the test case?

    Here’s the C++ version that sums a range:

    template <typename ResultType, typename SetsIterator, typename SetIterator>

    ResultType Sum(SetsIterator sets_begin, SetsIterator sets_end, int first, int last) {

    ResultType result = ResultType();

    for (SetsIterator i = sets_begin; i != sets_end; ++i) {

    result = accumulate<SetIterator, ResultType>

    (i->begin() + first, i->begin() + last + 1, result);

    }

    return result;

    }

    So, can it be done in C# in fewer than 7 lines of code (not including {} brackets?)

  17. C++ guy says:

    Wow. Wrote the code, compiled it, and the performance is impressive for static data.

    But for an ArrayList of ArrayLists, the performance was on the order of 10 times slower.

    I realize that this is not what the original problem asked. And that is part of the seductiveness of C#. The solutions to these little problems look so darn good. But when you need a flexible solution, you pay a serious performance penalty, or write a customized version for each type you’re using. I’ll stick with my C++ solution which is _generic_and_fast_.

    Here’s the C# code:

    namespace ConsoleApplication1

    {

    class Program

    {

    static int Sum(ArraySegment<int>[] segments)

    {

    int sum = 0;

    foreach (ArraySegment<int> segment in segments)

    {

    for (int i = segment.Offset; i < (segment.Offset + segment.Count); i++)

    {

    sum += segment.Array[i];

    }

    }

    return sum;

    }

    static int GenericSum(System.Collections.ICollection sets)

    {

    int sum = 0;

    foreach (System.Collections.ICollection set in sets)

    {

    foreach (int i in set)

    sum += i;

    }

    return sum;

    }

    static void Main(string[] args)

    {

    int start = Environment.TickCount;

    int[] set1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    int[] set2 = { 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 };

    int[] set3 = { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19 };

    int[] set4 = { 1, 1, 2, 3, 5, 8, 13, 21 };

    int[] set5 = { 2, 3, 5, 7, 11, 13, 17, 19 };

    int[] set6 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

    ArraySegment<int>[] segments = new ArraySegment<int>[6];

    segments[0] = new ArraySegment<int>(set1);

    segments[1] = new ArraySegment<int>(set2);

    segments[2] = new ArraySegment<int>(set3);

    segments[3] = new ArraySegment<int>(set4);

    segments[4] = new ArraySegment<int>(set5);

    segments[5] = new ArraySegment<int>(set6);

    for (int i = 0; i < 1000000; ++i)

    Sum(segments);

    int static_done = Environment.TickCount;

    System.Collections.ArrayList sets = new System.Collections.ArrayList();

    sets.Add(new System.Collections.ArrayList(set1));

    sets.Add(new System.Collections.ArrayList(set2));

    sets.Add(new System.Collections.ArrayList(set3));

    sets.Add(new System.Collections.ArrayList(set4));

    sets.Add(new System.Collections.ArrayList(set5));

    sets.Add(new System.Collections.ArrayList(set6));

    for (int i = 0; i < 1000000; ++i)

    GenericSum(sets);

    int done = Environment.TickCount;

    Console.WriteLine("1000000 iterations, static: {0}", static_done – start);

    Console.WriteLine("1000000 iterations, ArrayList: {0}", done – static_done);

    }

    }

    }

  18. Philip Rieck says:

    C++ guy:

    Just so you don’t think you’re being ignored.

    I’m not interested in a c++ vs. c# debate, especially here. Sorry I got sucked in even a little bit. If you really want to, point to your blog and perhaps we’ll discuss there.

    I was a c++ developer (I was even called a good one by some). Now I’m not. If you want to stick with c++, great! Have fun, and I bet you’ll do quite well. But please let us discuss cool whidbey features here.

    Thanks!

    Brad:

    Sorry! I’m done.

  19. Adam Merz says:

    C++ guy, just a side note: you would not want to use ArrayList for this solution. Most of the reason your solution was slow was because you weren’t using generic collections.