$OutputEncoding to the rescue

You might have noticed that “findstr” does not work properly with non-English text in PowerShell.


For example:


Let’s create a text file with some Chinese characters in it.


PS C:\> ${c:\test.txt}="中文"


Now try to use findstr to find one of the Chinese characters. It finds nothing.


PS C:\> Get-Content test.txt | findstr /c:


The same command works in Cmd.exe.


PS C:\> cmd /c “findstr /c: test.txt”


中文


 


What went wrong? When we pipe output from PowerShell cmdlets into a native application, PowerShell encodes that output with the encoding stored in the $OutputEncoding variable, which defaults to ASCII. ASCII cannot represent Chinese characters, so they are replaced before findstr ever sees them. We can fix the scenario above by setting $OutputEncoding to [Console]::OutputEncoding.
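To make the loss concrete, here is a minimal sketch (not part of the original post) that encodes the two sample characters with US-ASCII and with code page 936, the GB2312 console encoding shown later in this post. The variable names are illustrative, and it assumes code page 936 is available on your system.

```powershell
# Build the sample string from code points so the script file's own
# encoding cannot interfere with the demonstration.
$text = [string][char]0x4E2D + [char]0x6587   # the two characters written to test.txt

# US-ASCII, the default $OutputEncoding described above.
# Characters it cannot represent are replaced with '?'.
$ascii      = [System.Text.Encoding]::ASCII
$asciiBytes = $ascii.GetBytes($text)
$asciiText  = $ascii.GetString($asciiBytes)

# Code page 936 (GB2312), assumed to be installed and available.
$gb      = [System.Text.Encoding]::GetEncoding(936)
$gbBytes = $gb.GetBytes($text)
$gbText  = $gb.GetString($gbBytes)

"ASCII : {0} byte(s), decodes to '{1}'" -f $asciiBytes.Length, $asciiText
"GB2312: {0} byte(s), round-trips: {1}" -f $gbBytes.Length, ($gbText -eq $text)
```

With ASCII, each character collapses to a single question mark, which is why findstr has nothing to match.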


PS C:\> $OutputEncoding

IsSingleByte      : True
BodyName          : us-ascii
EncodingName      : US-ASCII
HeaderName        : us-ascii
WebName           : us-ascii
WindowsCodePage   : 1252
IsBrowserDisplay  : False
IsBrowserSave     : False
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.EncoderReplacementFallback
DecoderFallback   : System.Text.DecoderReplacementFallback
IsReadOnly        : True
CodePage          : 20127


PS C:\> $OutputEncoding = [Console]::OutputEncoding


PS C:\> $OutputEncoding

BodyName          : gb2312
EncodingName      : 简体中文(GB2312)
HeaderName        : gb2312
WebName           : gb2312
WindowsCodePage   : 936
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
IsSingleByte      : False
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 936


PS C:\> Get-Content test.txt | findstr /c:


中文


 


Voila! Now findstr works!
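If you would rather not change $OutputEncoding for the whole session, one option is to limit the change to a single call. The sketch below is an assumption-laden illustration, not something from the original post: Invoke-WithConsoleEncoding is a hypothetical helper name, and it relies on PowerShell's scoping so that the assignment inside the function shadows the session value only while the script block runs.

```powershell
# Hypothetical helper (not a built-in cmdlet): runs a script block with
# $OutputEncoding temporarily set to the console's output encoding.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock] $ScriptBlock
    )

    # This assignment creates a function-local $OutputEncoding. The script
    # block runs in a child scope and sees it; the session-level value is
    # untouched and is visible again once the function returns.
    $OutputEncoding = [Console]::OutputEncoding

    & $ScriptBlock
}

# Usage sketch ($pattern is a placeholder for the text you are searching for):
#   Invoke-WithConsoleEncoding { Get-Content test.txt | findstr /c:$pattern }
```

The trade-off is that every pipeline into a native command has to be wrapped explicitly; setting $OutputEncoding once, as shown above, is simpler when the whole session works with the same text.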


 


Wei Wu [MSFT]


 


 


 



POSTSCRIPT: We convert to ASCII when piping to existing executables because most commands today do not process Unicode correctly. Some do; most don’t.


 


Jeffrey Snover