$OutputEncoding to the rescue

PowerShell Team

You might have noticed that “findstr” does not work properly with non-English text in PowerShell.

For example:

Let’s create a text file with some Chinese characters in it.

PS C:\> ${c:\test.txt}="中文"

Now try to use findstr to find one of the Chinese characters. It does not find anything.

PS C:\> Get-Content test.txt | findstr /c:

The same command works in Cmd.exe.

PS C:\> cmd /c "findstr /c: test.txt"

中文

 

What went wrong? When we pipe output from PowerShell cmdlets into a native application, PowerShell encodes that text using the $OutputEncoding variable, which is set to ASCII by default. ASCII cannot represent Chinese characters, so they are replaced before findstr ever sees them. We can fix the scenario above by setting $OutputEncoding to [Console]::OutputEncoding.
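To see why ASCII loses the text, here is a minimal sketch that encodes the same two characters by hand. It only illustrates the encoding step and is not how PowerShell implements the pipeline internally. The variable names are made up for this example, and UTF-8 is used purely as a stand-in for "an encoding that can represent Chinese".

```powershell
# Build the string from code points so the example does not depend on
# how this script file itself is saved: U+4E2D U+6587 is "中文".
$text = -join ([char]0x4E2D, [char]0x6587)

# ASCII has no mapping for these characters, so the encoder's
# replacement fallback substitutes '?' (byte 0x3F) for each one.
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$asciiText  = [System.Text.Encoding]::ASCII.GetString($asciiBytes)

# An encoding that covers Chinese (UTF-8 here, as a stand-in) round-trips.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)
$utf8Text  = [System.Text.Encoding]::UTF8.GetString($utf8Bytes)

# Show the bytes a native application would actually receive under ASCII.
"ASCII bytes : " + (($asciiBytes | ForEach-Object { $_.ToString('X2') }) -join ' ')
"ASCII text  : $asciiText"
"UTF-8 intact: " + ($utf8Text -eq $text)
```

Under ASCII, both characters come out as question marks, which is consistent with the EncoderReplacementFallback shown in the $OutputEncoding listing.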

PS C:\> $OutputEncoding

 

 

IsSingleByte      : True

BodyName          : us-ascii

EncodingName      : US-ASCII

HeaderName        : us-ascii

WebName           : us-ascii

WindowsCodePage   : 1252

IsBrowserDisplay  : False

IsBrowserSave     : False

IsMailNewsDisplay : True

IsMailNewsSave    : True

EncoderFallback   : System.Text.EncoderReplacementFallback

DecoderFallback   : System.Text.DecoderReplacementFallback

IsReadOnly        : True

CodePage          : 20127

 

 

 

PS C:\> $OutputEncoding = [Console]::OutputEncoding

PS C:\> $OutputEncoding

 

 

BodyName          : gb2312

EncodingName      : 简体中文(GB2312)

HeaderName        : gb2312

WebName           : gb2312

WindowsCodePage   : 936

IsBrowserDisplay  : True

IsBrowserSave     : True

IsMailNewsDisplay : True

IsMailNewsSave    : True

IsSingleByte      : False

EncoderFallback   : System.Text.InternalEncoderBestFitFallback

DecoderFallback   : System.Text.InternalDecoderBestFitFallback

IsReadOnly        : True

CodePage          : 936

 

 

 

PS C:\> Get-Content test.txt | findstr /c:

中文

 

Voila! Now findstr works!
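If you would rather not change $OutputEncoding for the whole session, one option is to switch it only around the pipeline that needs it. The helper below is a sketch under that assumption. Invoke-WithConsoleEncoding is a hypothetical name for this post, not a built-in cmdlet.

```powershell
# Hypothetical helper: run a script block with $OutputEncoding temporarily
# set to the console's output encoding, then restore the previous value.
function Invoke-WithConsoleEncoding {
    param(
        [Parameter(Mandatory = $true)]
        [scriptblock] $ScriptBlock
    )

    # Remember the session-wide setting so it can be put back afterwards.
    $saved = $global:OutputEncoding
    try {
        $global:OutputEncoding = [Console]::OutputEncoding
        & $ScriptBlock
    }
    finally {
        # Restore even if the script block throws.
        $global:OutputEncoding = $saved
    }
}
```

You could then wrap the earlier pipeline as Invoke-WithConsoleEncoding { Get-Content test.txt | findstr $pattern }, where $pattern is a placeholder for whatever findstr arguments you are using.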

 

Wei Wu [MSFT]

 

 

 

POSTSCRIPT: The reason we convert to ASCII when piping to existing executables is that most commands today do not process Unicode correctly. Some do, most don’t.

 

Jeffrey Snover
