Interview question


Here’s a nice simple interview question:

In a given .NET string, assume there are line breaks in standard \r\n form (basically Environment.NewLine).

Write a method that inserts a space between two consecutive line breaks to separate any two line breaks from each other.

Also, anyone venture a guess what is a practical application for such a method?

Comments (47)

  1. Robert says:

    string.Replace("rnrn", "rn rn");

  2. john says:

    Good one Robert, I like it, its simpler than mine.

    string insertSpaceBetweenLineBreaks(string input)

    {

       return Regex.Replace(input, "(rn)(\1)", "$1 $2");

    }

  3. Michael Chandler says:

    Beaten to it. I was thinking exactly the same as Robert.

  4. ben says:

    I’m afraid it’s not that easy. What about rnrnrn ? πŸ™‚

  5. jdecuyper says:

    As Ben mentionned, you need to take care of more than 2 consecutive breaks. I think it is necessary to iterate through the chars and create a new string as you go (which isn’t an optimum solution):

    string text = "hello rnrnrn world!";

    string textResult = "";

    char[] caFromText = text.ToCharArray();

    for (int i = 0; i < caFromText.Length; ++i)

    {

    textResult += caFromText[i];

    if ((int)caFromText[i] == CARRIAGE_RETURN)

    {

     // check if not reaching the end of the array

     if (i + 3 < caFromText.Length)

     {

      if ((int)caFromText[i + 1] == NEW_LINE

      && (int)caFromText[i + 2] == CARRIAGE_RETURN

      && (int)caFromText[i + 3] == NEW_LINE)

      {

       // two consecutive breaks where detected

       textResult += "n ";

       // jump to the next break

       ++i;

      }

     }

    }

    }

  6. john says:

    Hmmm, this version works via loop but I’d like to see a recursive version.

    This version uses a dash instead of space so its easy to see result.

    string insertSpaceBetweenLineBreaks(string input)

    {

       while (Regex.Match(input, "(rn)(\1)").Success)

       {

           input = Regex.Replace(input, "(rn)(\1)", "$1-$1");

       }

       return input;

    }

    The test string I used was "ArnBrnrnCrnrnrnDrnrnrnrnE".

    This is the result

    A

    B

    C

    D

    E

  7. john says:

    Recursive version, same result as previous while loop version.

    string insertSpaceBetweenLineBreaksRec(string input)

    {

       if (!Regex.Match(input, "(rn)(\1)").Success)

       {

           return input;

       }

       else

       {

           input = Regex.Replace(input, "(rn)(\1)", "$1-$1");

           return insertSpaceBetweenLineBreaksRec(input);

       }

    }

  8. Ben is right, Robert’s original way doesn’t work. Longer solutions will work, yes, but what about performance?

    I’m still not happy πŸ™‚ Any other ideas?

  9. BTW it is amazing how the comments reflect my own train of thought as I was solving it.

  10. jonskeet says:

    In terms of why you might want this – SMTP?

  11. Jon – pretty close. Not SMTP, but another web protocol, yes. Thanks for not revealing the answer just yet πŸ˜‰

  12. Sorry, not protocol – I should say "markup language"

  13. KooKiz says:

    Slightly faster :

    public static string Method3(string input)

           {

               int startIndex = 0;

               var sb = new StringBuilder(input.Length * 2);

               while (true)

               {

                   int index = input.IndexOf("rn", startIndex);

                   if (index == -1)

                   {

                       break;

                   }

                   int length = index – startIndex;

                   sb.Append(input.Substring(startIndex, length));

                   if (length == 1)

                   {

                       sb.Append("-");

                   }

                   startIndex = index + 1;

               }

               return sb.ToString();

           }

    Not much of a difference though.

  14. straigtforward solution

    let insertSpaces (s : string) =

       let rec loop (i : int) acc =

           let newIndex = s.IndexOf("rnrn", i)

           if newIndex = -1 then acc + s.Substring(i)

           else loop (newIndex + 2) (acc + s.Substring(i, newIndex – i) + "rn ")

       loop 0 ""

    or

    let insertSpaces2 (s : string) =

       let checkCrlf i =

           if i < s.Length – 1 then s.[i] = ‘r’ && s.[i+1] = ‘n’

           else false

       let rec loop i crlf (acc : System.Text.StringBuilder) =

           if i >= s.Length then acc.ToString()

           else

               let found = checkCrlf i

               match found, crlf with

               | true, true -> loop i false (acc.Append(‘ ‘))

               | true, false -> loop (i + 2) true (acc.Append("rn"))

               | false, _ -> loop (i + 1) false (acc.Append(s.[i]))

       loop 0 false (new System.Text.StringBuilder())

  15. Thomas Levesque says:

    Here’s my take at it :

           public static string InsertSpacesBetweenLineBreaks(string s)

           {

               StringBuilder sb = new StringBuilder();

               bool prevCrLf = false;

               char prev = ‘’;

               foreach (char c in s)

               {

                   if (c == ‘n’ && prev == ‘r’)

                   {

                       if (prevCrLf) sb.Append(‘ ‘);

                       prevCrLf = true;

                   }

                   else if (c != ‘r’)

                   {

                       prevCrLf = false;

                   }

                   sb.Append(c);

                   prev = c;

               }

               return sb.ToString();

           }

  16. Sebastian U says:

    Let’s take the declarative route πŸ˜‰ .

    // 4.0 Join overload

    string.Join(Environment.NewLine, input.Split(new[] { Environment.NewLine }, StringSplitOptions.None).Select(s => s == "" ? " " : s))

    Oh, who said it has to be C# anyway?

    input.Split([| Environment.NewLine |]. StringSplitOptions.None)

    |> Seq.map (function "" -> " " | s -> s)

    |> String.concat Environment.NewLine

  17. Here’s my F# solution

    let addSpaces (data:string) =

       let addStateToList state list =

           match state with

           | 0 -> list

           | 1 -> ‘r’::list

           | 2 -> ‘n’::’r’::list

           | 3 -> ‘r’::’n’::’r’::list

           | _ -> failwith "invalid"

       let list,state=  

           data

           |> Seq.fold ( fun (l,state) cur ->  

               match state,cur with

               | 0,’r’ -> (l,1)

               | 1,’n’ -> (l,2)

               | 2,’r’ -> (l,3)

               | 3,’n’ -> (‘ ‘::’n’::’r’::l,2)

               | _ -> (cur :: (l |> addStateToList state), 0)) (List.empty,0)

       let list = addStateToList state list

       let arr = list |> List.rev |> Array.ofList

       new System.String(arr)

  18. Rik Hemsley says:

    I vote for Sebastian U, because:

    1. Efficient enough (if we make some assumptions).

    2. Takes a very small amount of time to read and comprehend.

  19. ram says:

    What about

    string.Replace("nr", "n r");

  20. banko says:

          private string Process(string str)

           {

               return

                   str

                   .Split(new string[] { Environment.NewLine }, StringSplitOptions.None)

                   .Select(x => x.Length == 0 ? " " : x)

                   .Aggregate((a, x) => a + Environment.NewLine + x);

           }

  21. var s = "SIGNLErnDOUBLErnrnTRIPLErnrnrnENDrn**********rn";

    System.Diagnostics.Trace.Write( s );

    var sa = s.Split( new string[] { Environment.NewLine }, System.StringSplitOptions.None );

    s = string.Empty;

    foreach( string ss in sa )

      s += ( ss == string.Empty ? " " : ss ) + Environment.NewLine;

    System.Diagnostics.Trace.Write( s );

  22. ha-ha. banko placed same solution while I was testing mine! πŸ™‚

  23. string output = Regex.Replace(input, @"(rn)(?=rn)", "$1 ");

  24. Alan J. McFarlane says:

    Looks a bit like a generalised version of Thomas’

    Not at my dev PC, so no unit-tests. πŸ™

    using System;

    using System.Diagnostics;

    using System.Text;

    class SpaceLineBreaks

    {

    static void Main(string[] args)

    {

    //const string Match = "rn";

    const string Match = "YZ";

    const string Insert = "-";

    var p = new SpaceLineBreaks();

    Console.WriteLine("in : " + args[0]);

    string result = p.InsertBetweenTwoMatches(args[0], Match, Insert);

    Console.WriteLine("out: " + result);

    }

    public string InsertBetweenTwoMatches(string input, string match, string insert)

    {

    int matchPos = 0;

    bool lastWasMatch = false;

    var initialCapacity = input.Length;

    var bldr = new StringBuilder(initialCapacity);

    foreach (char cur in input) {

      Debug.Assert(matchPos < match.Length, "Match complete, should have been detected last loop!");

      // Note this (and below) comparison is ordinal, and thus is not good for all scripts…

      if (cur == match[matchPos]) {

         ++matchPos;

         // Don’t write out.

      } else {

         // Failed the match, reset, write out the ones we’ve stripped.

         bldr.Append(match.Substring(0, matchPos));

         lastWasMatch = false;

         matchPos = 0;

         // Need to check the current char again the reset match!

         if (cur == match[matchPos]) {

            ++matchPos;

            // Don’t write out.

         } else {

            bldr.Append(cur);

         }

      }

      //

      if (matchPos == match.Length) {

         matchPos = 0;

         if (!lastWasMatch) {

            lastWasMatch = true;

            bldr.Append(match);

         } else {

            bldr.Append(insert);

            bldr.Append(match);

            Debug.Assert(lastWasMatch == true, "As we insert between every pair.");

         }

      }

    }//for

    return bldr.ToString();

    }

    }

  25. KooKiz says:

    I’m surprised but seems like it’s faster than the split/join method :

    public static string Method4(string input)

           {

               string result = input;

               string tmp = string.Empty;

               while (result.Length != tmp.Length)

               {

                   tmp = result;

                   result = tmp.Replace("rnrn", "rn-rn");

               }

               return result;

           }

    (but the fastest yet on my machine is Thomas Levesque’s )

  26. Jeffrey Nelson says:

    Whipped this up in about ten minutes.  Seems too simple compared to the other solutions presented.

    static String ReplaceNewlines(String s)

    {

     String origNewlines = Environment.NewLine + Environment.NewLine;

     String newNewlines = Environment.NewLine + "-" + Environment.NewLine;

     int offset = Environment.NewLine.Length + 1;

     int startIndex = 0;

     int idx = s.IndexOf(origNewlines, startIndex);

     while (idx >= 0)

     {

       s = s.Replace(origNewlines, newNewlines);

       startIndex += offset;

       idx = s.IndexOf(origNewlines, startIndex);

     }

     return s;

    }

  27. mpg says:

    Speaking as a hiring manager type…

    * I note that only Skeet satisfied the 2nd part of the "interview" question — no job for the rest of you!  πŸ™‚

    * Those of you who wrote complicated code in order to improve performance: two points off, unless you stipulate in advance a simpler (easier to read) solution should be used unless and until a need for optimization is shown

    Anyway, I’m guessing something about HTTP headers being separated from the body?

    -mpg

  28. Jeffrey Nelson says:

    Should have taken more time on mine.  It’s wrong.  It works, but I screwed up the index logic.  Oh well.  Good thing I’m not interviewing at the moment.

  29. Tad Smith says:

    Can’t you simply run it twice?

    string.Replace("rnrn", "rn rn");

    string.Replace("rnrn", "rn rn");

    The first pass would leave a maximum of two consecutive rn sequences and the second pass would clear those up. Might not be the most efficient, but it’s certainly easy to understand and read.

  30. john says:

    @SebastianU – that is some sweet declarative code, thanks!

  31. Robert says:

    My first version didn’t work, but this will. Quite pragmatic, and probably also the fastest.

    test.Replace("rnrn", "rn rn").Replace("rnrn", "rn rn");

  32. PiersH says:

    here’s one:

    static string ReplaceNewlines (string str)

    {

    // check for null

    if (str == null)

    throw new ArgumentNullException("str");

    int cch = str.Length;

    // simple boundary condition, helps later

    if (cch < 4)

    return str;

    // pre-allocate space

    StringBuilder sb = new StringBuilder(cch * 4 / 3 + 1);

    // handle boundary condtion later, instead of in loop

    cch -= 3;

    int ich = 0;

    while (ich < cch)

    {

    char ch = str [ich++];

    sb.Append (ch);

    // check for double newlines, won’t overflow

    if (ch == ‘r’ && str [ich] == ‘n’ && str [ich + 1] == ‘r’ && str [ich + 2] == ‘n’)

    {

    sb.Append ("n ");

    ich++;

    }

    }

    // copy the remaining characters

    sb.Append (str, ich, str.Length – ich);

    return sb.ToString();

    }

  33. PiersH says:

    oops, that should have been:

    // pre-allocate space

    StringBuilder sb = new StringBuilder(cch * 3 / 2 + 1);

  34. ram: Yes! I like ram’s solution the most because it is the most concise, efficient enough and reuses the framework code (which is well tested and tightly optimized).

    Thomas Levesque: your solution is very efficient and easy to read. However if I were the interviewer, I’d still prefer code reuse.

    Jared: I’d need to look into how efficient that is. Looks like your state machine will be linear, which is as fast as it gets. I also need to look at the allocations.

    To all: do you know why string concatenation is bad and why StringBuilder is better?

    banko and salavatov: this solution has a problem of a lot of allocations.

    mpg – maybe πŸ™‚ However I needed it in HTML and <br /> tags.

    Tad Smith, Robert: for a chain of max of N line breaks, you have to call replace at least N-1 times. Your latest solution won’t work with: rnrnrnrn

    Good job everyone!

  35. My take:

    static string Transform(string s)

    {

       var input = new StringReader(s);

       var output = new StringBuilder();

       string line = null;

       while ((line = input.ReadLine()) != null)

           output.AppendLine(line.Equals(string.Empty) ? " " : line);

       return output.ToString();

    }

  36. GSEJ says:

    I’m curious about your response to Tad Smith’s solution, because it certainly seems to work. Starting with

    string s = "rnrnrnrn";

    s.Replace("rnrn", "rn-rn"); // – for clarity

    gives:

    "rn-rnrn-rn"

    repeating the operation gives:

    "rn-rn-rn-rn"

    isn’t that what’s required?

  37. Actually you’re right, my bad, sorry. It works indeed! However it does twice as much work and twice as much allocations then text.Replace("nr", "n r")

  38. GSEJ says:

    text.Replace("nr", "n r") is very clever, and very efficient. You are making an assumption which wasn’t stated in the original problem though: that the only time your string has the characters "nr" is when it’s part of the sequence "rnrn".

    That’s probably a reasonable assumption. If there was any doubt though, I’d probably err on the safe side – correctness must trump performance surely?

  39. banko says:

    The problem is stated ambiguously, since "rn" is not the same as Environment.NewLine, which is platform dependent.

    Correctness should come first , optimization second. What if the source gets compiled under Mono and Environment.NewLine is suddenly "n" ?

  40. Kirill – Just a suggestion: you may want to provide some unit tests.  Many of the answers given so far have subtle differences. The test should specify how the functionality should handle input like "rnr" or "rnn" (many of the answers would handle these differently).

    Note: I know this may go above and beyond an "interview question", but I think the conversation has long since passed that point.

  41. Brett Allen says:

    I want to say this has something to do with a multi-part post?

    Those have some funky rules to them (had to write a method to create one once, was a pain since I had no one to ask questions to)

  42. dbkk says:

    Bad performance, but elegant:

    while(str.Contains("rnrn"))

      str = str.Replace("rnrn", "rn rn");

    If it ends up being on a critical path and too slow, only then I’d try to optimize (loop and use StringBuilder to make a new string).

  43. Andrey Titov says:

    What should happens if there are not only complete "rn", but ‘r’ and ‘n’ alone, e.g. "ArBCDEnnFGrrH"?

  44. Daniel says:

    What hasn’t been mentioned yet is that some of these solutions (e.g. the ones using IndexOf) may end up in an infinite loop in the presence of weightless Unicode characters.

    "nXr".IndexOf("nr") will return 0 if X is a Unicode character that isn’t present in the sorting weight table. See http://esmithy.net/2007/10/11/unicode-surprises/ for details. (However, the character used in that blog post was added to the weight tables in .NET 4)

    string.Replace does not use culture-aware comparisons, so the string will stay unchanged (as expected) and you’ve got an infinite loop.

    Solution: 95% of the time, you’ll want to pass StringComparison.Ordinal to all string methods. Use CurrentCulture only if you really want culture-specific stuff to be relevant. Don’t use InvariantCulture, ever (unless you really, really know what you are doing).

  45. Rik Hemsley says:

    I’ve added all the implementations to a solution, with unit tests. I probably haven’t thought of all the tests necessary, so if you have any more ideas, please let me know.

    I’ve also added simple benchmarking for the implementations which pass the unit tests.

    http://sites.google.com/site/rikkus/kirill-osenkov-s-interview-question

  46. Rik – this is absolutely fantastic! Thanks so much for this!!