SYSK 150: RTF2PlainText


The other day I needed to display an RTF text stored in a database in a SQL Reporting Services report…  After doing some research, the team came to a conclusion that there is no easy way to do that, especially given the time we had left on the project.  The decision was made to strip-off the RTF formatting characters and display it as plain text.  A co-worker of mine, Scott Jennings, a.k.a. the “RegEx-Man”, came up with the following expression to do the job:


Regex.Replace(Regex.Match(rtf, @”\x5cviewkind4[^ ]*(.+)\x5cpar”).Groups[1].Value, @”[\n\r\f]|(\x5cpar)|(\x5c[a-zA-Z0-9]+)”, “”);


 


We’ve done some preliminary testing and it looks good so far…  However, if you come up with any improvements, kindly post them to this blog so all could benefit.


 

Comments (8)

  1. John Ingres says:

    I suppose you are using RegEx in a UDF server side; so SQL is stripping the tags, not Reporting Services, right?

    In that context (RexEx in a UDF), shouldn’t the Compiled option be used since the UDF is called row by row? (precompiling would be even better, but it is not always an option)

  2. Eric says:

    I tried your RegEx expression but got an empty string back. I also tried variations but could not fix it (my RegEx level is still 101;)

    I am using it exactly as posted. Do you have a clue what could be wrong?

  3. irenak says:

    Our original plan was to use a .NET class that’s invoked on the server side by SSRS report… But then we figured out that you can use it directly from SSRS…  

  4. irenak says:

    Post your code and I’ll take a look…

  5. Eric says:

    Thanks. Probably a stupid mistake (I am much more fluent in SQL than C#)

    here goes:

           [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None, IsPrecise = true)]

           [return: SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = true)]

           public static SqlString RTF2Text([SqlFacet(MaxSize = -1, IsFixedLength = false, IsNullable = true)] SqlString input)

           {

               if (input.IsNull)

                   return SqlString.Null;

               else

               {

                   String rtf;

                   rtf = input.Value.ToString();

                   return (SqlString)Regex.Replace(Regex.Match(rtf, @"x5cviewkind4[^ ]*(.+)x5cpar").Groups[1].Value, @"[nrf]|(x5cpar)|(x5c[a-zA-Z0-9]+)", "");

                   

               }

  6. irenak says:

    Try passing in the following RTF as input to the proc: ‘{rtf1ansiansicpg1252deff0deflang1033{fonttbl{f0fnilfcharset0 Microsoft Sans Serif;}}viewkind4uc1pardf0fs17 Testpar}’

    If it returns ‘Test’, then it’s functionning as intended.  This RegEx was designed to convert RTF created by .NET’s RichTextBox control to plain text.  It is likely that RTF from Microsoft Word would not work with this RegEx, so if that’s what you’re using in your tests, that would be the reason…

    Just FYI — there is no need to call ToString on input.Value since the returned value is of type String (no harm, but no need).

    IMPORANT: I would not recommend doing this kind of CPU intensive processing on the SQL Server; consider putting it on your application tier instead.

  7. Eric says:

    Your example works. Thanks.

    Indeed I was testing with rtf produced by Office, but intended to use it .NET’s RichTextBox control. And the application tier will do the work most of the time, but having it server side too is sometimes useful. A mirror server is often used for reports and a view can expose raw text if necessary for 3rd party reporting tools. Too bad there is no standard control in VS to support html. Today, it would a much better choice to store xhtml fragments than rtf.

    Thanks again for your help

  8. Michel Ulens says:

    Hi

    this expression returns empty… could you check with Scott what’s happening ?

    {rtf1ansideff0{fonttbl{f0fnilfcharset0 MS Sans Serif;}{f1fnil MS Sans Serif;}}

    {colortbl ;red0green0blue0;}

    viewkind4uc1pardcf1lang2067f0fs16 RENOVEREN BRIEVENBUS – BEKLEDING BRIEVENBUSKLEP IS WEG

    par VERVANGEN KLEP, ONTROESTEN EN HERSPUITEN VAN DE GEVERFDE METALEN DELEN

    par SCHOONMAKEN VAN DE INOX BUITENPLAAT

    par ZIE FOTO’Sf1

    par }