How to Check File Encoding and Fix Chinese Garbled Text with iconv on Linux

Checking File Encoding in Linux

To determine the character encoding of a file, use the file command with the -b option (brief: omit the file name from the output) and the -i option (print the MIME type and charset). For example, to check the encoding of example.txt and extract just the charset name in uppercase:

file -bi example.txt | sed -e 's/.*[ ]charset=//' | tr '[a-z]' '[A-Z]'

If the output is something like ISO-8859-1, the file is encoded in the ISO-8859-1 (Latin-1) character set.
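As a shorter alternative to the sed pipeline, recent versions of file can print the charset alone with --mime-encoding. The sketch below wraps that in a helper; the name detect_charset is hypothetical, and it assumes your file build supports the --mime-encoding option.

```shell
# detect_charset FILE
# Print the charset that `file` guesses for FILE, in uppercase.
# (detect_charset is an illustrative name, not a standard command.)
detect_charset() {
    # -b drops the file name; --mime-encoding prints only the charset.
    file -b --mime-encoding "$1" | tr '[:lower:]' '[:upper:]'
}
```

Calling detect_charset example.txt prints a single name such as ISO-8859-1 or UTF-8, which can be passed straight to iconv -f.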

Verifying Encoding Support in iconv

Before converting, verify that iconv supports the source encoding. List all supported encodings and filter for the one you need:

iconv -l | grep ISO-8859-1

If the output includes ISO-8859-1//, the encoding is supported for conversion.
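The exact format of iconv -l differs between implementations, so grepping its output can miss a match. A more direct check, sketched here under the assumption that your iconv exits with a non-zero status for an unknown encoding, is to ask it to convert empty input. The helper name iconv_supports is hypothetical.

```shell
# iconv_supports ENCODING
# Succeed (exit status 0) if iconv can convert from ENCODING to UTF-8.
# (iconv_supports is an illustrative name.)
iconv_supports() {
    # Empty input: only the encoding names are being validated.
    iconv -f "$1" -t UTF-8 </dev/null >/dev/null 2>&1
}
```

For example, iconv_supports GB18030 && echo supported prints "supported" only when the conversion is available.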

Checking the System Locale Encoding

Knowing your system's default locale encoding helps choose a target encoding. Check it with:

echo $LANG

An output like zh_CN.UTF-8 indicates that the locale uses UTF-8, provided that LC_ALL and LC_CTYPE are not set, since both take precedence over LANG. For Chinese environments, converting files to UTF-8 is usually the best way to prevent garbled text.
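Because LANG is only the fallback, it can help to resolve which variable actually decides the character encoding. The sketch below follows the standard precedence (LC_ALL, then LC_CTYPE, then LANG); the function name effective_ctype_locale is hypothetical.

```shell
# effective_ctype_locale
# Print the locale name that governs character encoding, following the
# precedence LC_ALL > LC_CTYPE > LANG. Empty variables count as unset.
# (effective_ctype_locale is an illustrative name.)
effective_ctype_locale() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}
```

To print just the charset of the active locale rather than its full name, the standard locale charmap command can be used.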

Converting File Encoding with iconv

The basic syntax for iconv is:

iconv -f SOURCE_ENCODING -t TARGET_ENCODING input_file

For example, to convert example.txt from ISO-8859-1 to UTF-8:

iconv -f ISO-8859-1 -t UTF-8 example.txt

This prints the converted content to the terminal. To save the output to a new file, use output redirection. Do not redirect to the input file itself: the shell truncates it before iconv has read it.

iconv -f ISO-8859-1 -t UTF-8 example.txt > example_utf8.txt
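If the goal is to replace the original file rather than create a second one, a common pattern is to convert into a temporary file and overwrite the original only when iconv succeeds. The following is a minimal sketch of that pattern; the helper name convert_in_place is hypothetical, and it assumes mktemp is available.

```shell
# convert_in_place FROM_ENCODING FILE
# Convert FILE to UTF-8, overwriting it only if the conversion succeeds.
# (convert_in_place is an illustrative name.)
convert_in_place() {
    from=$1
    target=$2
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t UTF-8 < "$target" > "$tmp"; then
        # Copy the contents back so the file keeps its permissions.
        cat "$tmp" > "$target"
        rm -f "$tmp"
    else
        # Conversion failed: leave the original untouched.
        rm -f "$tmp"
        return 1
    fi
}
```

For example, convert_in_place ISO-8859-1 example.txt rewrites example.txt as UTF-8. Keeping a backup copy first is still advisable, because converting with the wrong source encoding can succeed and silently produce wrong characters.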

Common Issues and Notes

  • tr Syntax: Keep the brackets in tr '[a-z]' '[A-Z]' balanced; writing '[A-Z' with the closing bracket missing is a common typo. tr treats the brackets as literal characters, so tr 'a-z' 'A-Z' is equivalent here.
  • Garbled Text: Chinese text often appears garbled when the file's storage encoding (e.g., GBK, GB2312) differs from the encoding used by the system or viewing tool (e.g., UTF-8). Converting to a consistent UTF-8 encoding via iconv is a standard fix.
  • Encoding Detection: file guesses the encoding heuristically from byte patterns, so its answer is not always right for plain text. Files in GBK or GB18030 are often reported as ISO-8859-1. If garbled text persists after conversion, try other source encodings such as GB18030 or GBK.
  • Batch Conversion: To convert multiple files, use find to list them and run iconv on each one in a shell loop.
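The batch conversion mentioned in the last note could look like the sketch below. It is one possible approach, not the only one: the function name batch_to_utf8, the *.txt filter, and the .utf8 output suffix are all assumptions chosen for illustration, and the loop does not handle file names that contain newlines.

```shell
# batch_to_utf8 DIR FROM_ENCODING
# Convert every *.txt file under DIR from FROM_ENCODING to UTF-8.
# Each result is written next to its source as NAME.utf8; originals are kept.
# (batch_to_utf8, the *.txt filter, and the .utf8 suffix are illustrative.)
batch_to_utf8() {
    dir=$1
    from=$2
    find "$dir" -type f -name '*.txt' | while IFS= read -r f; do
        if iconv -f "$from" -t UTF-8 < "$f" > "$f.utf8"; then
            printf 'converted: %s\n' "$f"
        else
            printf 'failed: %s\n' "$f" >&2
            # Remove the partial output of a failed conversion.
            rm -f "$f.utf8"
        fi
    done
}
```

For example, batch_to_utf8 ./docs GBK converts every .txt file under ./docs, assuming they are all stored as GBK. Writing to a separate .utf8 file makes it easy to inspect the results before replacing anything.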
