Understanding Chinese Filename Encoding Issues in Linux
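When you work on Linux with files created in a different encoding environment (for example, Windows systems that commonly use GBK for Chinese), Chinese filenames may appear as garbled characters. This usually happens because the filename bytes were written in one encoding (such as GBK), but your system locale and terminal interpret them as another (usually UTF-8). Most native Linux file systems store filenames as raw bytes and do not record which encoding produced them.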
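Comparing the raw bytes shows why the names break. The minimal sketch below uses iconv and od to print the bytes that the same two characters, 中文, occupy in UTF-8 and in GBK. The helper name show_bytes is hypothetical and only serves this illustration. A UTF-8 terminal cannot decode the GBK bytes d6 d0 ce c4, so it shows placeholders or escape sequences instead of 中文.

```bash
# Hypothetical helper: print the hex bytes of a UTF-8 string after
# re-encoding it as $1 (the target encoding).
show_bytes() {
  printf '%s' "$2" | iconv -f utf-8 -t "$1" | od -An -tx1 | tr -d ' \n'
}

show_bytes utf-8 '中文'; echo   # e4b8ade69687
show_bytes gbk '中文'; echo     # d6d0cec4
```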
Solution: Convert Filename Encoding with convmv
The convmv command-line tool is designed specifically to convert the encoding of filenames. It does not alter file contents, only the names themselves.
Step 1: Install convmv
On Debian/Ubuntu-based systems, use apt:
sudo apt update
sudo apt install convmv
For other distributions, use your package manager (e.g., yum, dnf, or pacman).
Step 2: Convert Filename Encoding
After installation, use the following command to convert filenames from GBK to UTF-8 in a specified directory:
convmv -f gbk -t utf-8 -r --notest /path/to/directory
Parameter breakdown:
- -f gbk: Source encoding (GBK).
- -t utf-8: Target encoding (UTF-8).
- -r: Recursively process subdirectories.
- --notest: Perform the actual conversion. Omitting this flag runs a dry-run simulation only.
- /path/to/directory: Replace with your target path (e.g., /var/www).
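To make the dry-run versus --notest behavior concrete, here is a rough pure-bash sketch of what such a conversion does, using only find, iconv, and mv. It is not convmv and does not reproduce its options; the function name gbk_to_utf8_names is hypothetical, and it assumes the names are GBK. convmv remains the better tool; this is only meant to show the mechanics, or to serve where convmv cannot be installed.

```bash
# Hypothetical helper (not part of convmv): rename GBK-encoded names under
# DIR to UTF-8. Without --notest it only prints what it would do.
gbk_to_utf8_names() (
  export LC_ALL=C            # treat names as raw bytes inside this subshell
  root=$1
  apply=${2:-}
  # -depth lists a directory's contents before the directory itself, so
  # renaming a parent never invalidates paths still waiting in the queue.
  find "$root" -mindepth 1 -depth -print0 |
    while IFS= read -r -d '' path; do
      dir=${path%/*}
      base=${path##*/}
      # Skip names that are already valid UTF-8 (note: some GBK byte
      # sequences are also valid UTF-8 and would be skipped as well).
      if printf '%s' "$base" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1; then
        continue
      fi
      # Skip names that are not valid GBK either.
      new=$(printf '%s' "$base" | iconv -f gbk -t utf-8 2>/dev/null) || continue
      if [ "$apply" = "--notest" ]; then
        mv -n -- "$path" "$dir/$new"   # -n: never overwrite an existing file
      else
        printf 'would rename: %s -> %s\n' "$path" "$dir/$new"
      fi
    done
)
```

For example, gbk_to_utf8_names /path/to/directory only previews the changes, and gbk_to_utf8_names /path/to/directory --notest applies them, mirroring the convmv workflow of checking the output before committing.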
Important Notes and Precautions
- Backup: Always back up important directories before running convmv.
- Permissions: Use sudo if the directory requires elevated privileges.
- Filename Only: convmv converts filenames only, not file contents. For content conversion, use iconv.
- Verify Encoding: Ensure the source encoding is correct; an incorrect guess will produce garbled or broken filenames.
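One way to check the source encoding before converting is to decode a single problem filename under several candidates and see which one reads correctly. The sketch below assumes GBK, Big5, and Shift_JIS as the candidates (common East Asian encodings, chosen for illustration); the helper name preview_name is hypothetical. A wrong encoding can still decode "successfully" into the wrong characters, so a human still has to judge which line makes sense.

```bash
# Hypothetical helper: show how a raw filename reads under each candidate
# encoding, to help choose the -f value for convmv.
preview_name() {
  local name=$1 enc out
  for enc in gbk big5 shift_jis; do
    if out=$(printf '%s' "$name" | iconv -f "$enc" -t utf-8 2>/dev/null); then
      printf '%-10s %s\n' "$enc" "$out"
    else
      printf '%-10s (not valid %s)\n' "$enc" "$enc"
    fi
  done
}
```

For instance, preview_name "$(printf '\326\320\316\304')" prints a gbk line reading 中文, while the other candidates produce unrelated characters.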
Additional Tools and Tips
- Terminal/Shell Settings: Configure your terminal emulator and shell environment variables (LANG, LC_ALL) to use UTF-8 (e.g., en_US.UTF-8 or zh_CN.UTF-8).
- iconv for Content: Convert file content encoding with:
iconv -f gbk -t utf-8 file.txt > file_utf8.txt
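The iconv command writes to a new file because redirecting output to the input file (iconv ... file.txt > file.txt) truncates it before iconv reads it. The minimal sketch below converts a file in place by going through a temporary file; the function name gbk_to_utf8_file is hypothetical and it assumes the contents are GBK.

```bash
# Hypothetical helper: convert a file's contents from GBK to UTF-8 in place.
gbk_to_utf8_file() {
  local src=$1 tmp status=0
  tmp=$(mktemp) || return 1
  if iconv -f gbk -t utf-8 "$src" > "$tmp"; then
    # Copy back with cat so the original file keeps its inode and permissions
    cat "$tmp" > "$src"
  else
    status=1   # leave the original untouched if iconv reports invalid input
  fi
  rm -f "$tmp"
  return "$status"
}
```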
Following these steps should resolve most Chinese filename garbling issues caused by encoding mismatches in Linux.