em dash, en dash, dash, dash, dash…


Some people have noticed that you can paste examples out of Word documents directly into a PowerShell session. Given all of the typographic tricks that Word does, this is actually much harder than it sounds. Here’s what we do. There’s a piece of code in the interpreter that takes each of the possible characters and maps it into the canonical representation for that character. So an em-dash ([char] 0x2014) or an en-dash ([char] 0x2013) becomes a simple dash ([char] 0x2d). There are also predicate functions that return true if the character is a single quote, a double quote, or a dash. The code is (approximately):


 


// en dash
public const char enDash = (char)0x2013;
// em dash
public const char emDash = (char)0x2014;
// horizontal bar
public const char horizontalBar = (char)0x2015;
// left single quotation mark
public const char quoteSingleLeft = (char)0x2018;
// right single quotation mark
public const char quoteSingleRight = (char)0x2019;
// single low-9 quotation mark
public const char quoteSingleBase = (char)0x201a;
// single high-reversed-9 quotation mark
public const char quoteReversed = (char)0x201b;
// left double quotation mark
public const char quoteDoubleLeft = (char)0x201c;
// right double quotation mark
public const char quoteDoubleRight = (char)0x201d;
// low double left quote used in German
public const char quoteLowDoubleLeft = (char)0x201E;

// True for the ASCII hyphen-minus and its typographic variants.
public static bool IsDash(char c)
{
    return (c == enDash || c == emDash || c == horizontalBar ||
        c == '-');
}

// True for the ASCII apostrophe and its typographic variants.
public static bool IsSingleQuote(char c)
{
    return (c == quoteSingleLeft || c == quoteSingleRight ||
        c == quoteSingleBase || c == quoteReversed || c == '\'');
}

// True for the ASCII double quote and its typographic variants.
public static bool IsDoubleQuote(char c)
{
    return (c == '"' || c == quoteDoubleLeft ||
        c == quoteDoubleRight || c == quoteLowDoubleLeft);
}

public static bool IsQuote(char c)
{
    return (IsSingleQuote(c) || IsDoubleQuote(c));
}
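
To make the "maps it into the canonical representation" step concrete, here is a minimal sketch of what such a mapping could look like. It is not the interpreter's actual code: the class name SpecialCharsSketch and the methods ToCanonical and Normalize are hypothetical names invented for this illustration, and it folds a whole string at once, which the real parser does not necessarily do.

```csharp
using System.Text;

// Hypothetical illustration only; names and structure are assumptions,
// not PowerShell's real implementation.
public static class SpecialCharsSketch
{
    // Fold a typographic dash or quote to its plain ASCII equivalent.
    // Any other character is returned unchanged.
    public static char ToCanonical(char c)
    {
        switch (c)
        {
            case '\u2013': // en dash
            case '\u2014': // em dash
            case '\u2015': // horizontal bar
                return '-';
            case '\u2018': // left single quotation mark
            case '\u2019': // right single quotation mark
            case '\u201a': // single low-9 quotation mark
            case '\u201b': // single high-reversed-9 quotation mark
                return '\'';
            case '\u201c': // left double quotation mark
            case '\u201d': // right double quotation mark
            case '\u201e': // double low-9 quotation mark
                return '"';
            default:
                return c;
        }
    }

    // Apply ToCanonical to every character of the input text.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(ToCanonical(c));
        }
        return sb.ToString();
    }
}
```

With this sketch, a pasted command whose parameter starts with an en dash (0x2013) and whose argument is wrapped in curly double quotes (0x201c, 0x201d) comes back with a plain dash (0x2d) and plain double quotes (0x22), which is the form the predicates above treat as canonical.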


 


Of course, it’s not just Word that we want to support. We want to provide reasonable support for arbitrary applications (within the limitations of the console host for now), so if anyone sees anything we missed, please let me know.


 


Now, for the trivia folks in the audience who want to know what an en is, here are the definitions from Encarta:


em dash (plural em dash·es)
noun 
Definition:
long dash: in printing, a dash that is one em long
 
en dash (plural en dash·es)
noun 
Definition:
dash one en long: in printing, a dash that is one en in length
 
en [ en ] (plural ens)
noun 
Definition:
measure of printing width: a measure of printing width, half that of an em


em [ em ] (plural ems)
noun 
Definition:
1. variable measure of type: a unit of measurement of print size, equal to the point size of the typeface being used
2. printing: same as pica
 
[Late 18th century. Representing pronunciation of m because the letter is about this width]



-bruce


 


Bruce Payette


PowerShell Technical Lead


 


PSMDTAG:FAQ: Can I cut-n-paste examples from WORD documents?
PSMDTAG:PARSER: (em dash, en dash, dash) handling