How to correctly open and save CSV in Excel | UTF-8 BOM support to prevent character corruption
「When I exported a CSV file from the system and opened it in Excel, Japanese characters became garbled」——this problem occurs daily in the field of web development, data analysis, and business systems. The root cause is often the difference in character encoding between UTF-8 and Shift_JIS (CP932). This article systematically explains the fundamental cause of character garbling, solutions using UTF-8 BOM CSV, Excel import procedures, implementation methods in PHP and Python, and line code issues.
Issue: Excel Expects Shift_JIS by Default
When opening CSV files by double-clicking in Excel for Windows, the encoding is automatically detected based on the OS locale settings. On Japanese Windows, this default encoding is <strong>Shift_JIS (code page 932)</strong>.
For this reason, when you double-click a CSV saved in UTF-8 to open it directly, multibyte characters (such as Japanese) are not interpreted correctly, resulting in garbled text. This is why the problem of "a non-engineer opening a UTF-8 CSV generated by an engineer in a web system with Excel and encountering garbled text" repeatedly occurs.
The correspondence of character encodings can be organized as follows.
| Character Encoding | Alias | Handling in Excel | Handling on Web and Linux |
|---|---|---|---|
| Shift_JIS | CP932、Windows-31J | Standard in Japanese version | Legacy. Not recommended for use |
| UTF-8 (without BOM) | — | Character corruption (older version) | Web Standards |
| UTF-8 with BOM | UTF-8 with BOM | Recognized correctly | BOM Can Cause Issues as Extra Characters |
| UTF-16 LE with BOM | — | Recognized correctly | Rarely used |
How to prevent character encoding issues with UTF-8 BOM CSV
BOM (Byte Order Mark) is a special byte sequence added at the beginning of a file, serving as a signature to identify the character encoding of a text file. The UTF-8 BOM is 3 bytes: <code>EF BB BF</code> (in hexadecimal).
Excel detects this BOM when opening a file and determines "this file is written in UTF-8". This is why <strong>using UTF-8 with BOM CSV prevents garbled text in Excel</strong>.
To verify the difference between UTF-8 with BOM and UTF-8 without BOM, check the file header bytes using a text editor or the <code>hexdump</code> command.
# Linux / macOS での確認
hexdump -C sample.csv | head -1
# BOM なし UTF-8 の出力例:
# 00000000 e5 90 8d e5 89 8d 2c e5 ...
# BOM 付き UTF-8 の出力例:
# 00000000 ef bb bf e5 90 8d e5 89 ...(先頭に ef bb bf)
Steps to Import from "Data" → "Text File" in Excel
To open UTF-8 CSV files without BOM correctly in Excel, use the "Import Data" feature instead of double-clicking. The steps vary by version, but the following procedure works for Excel 2016 and later (including Microsoft 365).
- Launch Excel and open a new workbook.
- Click the 「Data」 tab → 「From Text or CSV」.
- Select the target CSV file and click <strong>Import</strong>.
- A preview screen is displayed. Select <strong>65001: Unicode (UTF-8)</strong> from the "File origin" dropdown.
- Verify that the delimiter is set to 「comma」 and click 「Load」.
For Excel 2013 and earlier, use "Data" → "Import External Data" → "Text File", and change the character encoding to UTF-8 (65001) in step 2 of the "Text File Wizard".
Code to output CSV with UTF-8 BOM in PHP
When allowing CSV downloads from a web system, outputting a UTF-8 CSV with BOM in PHP is the simplest solution.
// UTF-8 BOM 付き CSV のダウンロード出力
function outputCsvWithBom(array $headers, array $rows, string $filename = 'export.csv'): void
{
// キャッシュ無効化・ダウンロードヘッダーを設定
header('Content-Type: text/csv; charset=UTF-8');
header('Content-Disposition: attachment; filename="' . $filename . '"');
header('Cache-Control: no-cache, no-store, must-revalidate');
$output = fopen('php://output', 'w');
// UTF-8 BOM を出力(EF BB BF)
fputs($output, "\xEF\xBB\xBF");
// ヘッダー行を出力
fputcsv($output, $headers);
// データ行を出力
foreach ($rows as $row) {
fputcsv($output, $row);
}
fclose($output);
exit;
}
// 使用例
$headers = ['名前', 'メールアドレス', '登録日'];
$rows = [
['山田 太郎', 'taro@example.com', '2026-04-14'],
['鈴木 花子', 'hanako@example.com', '2026-04-13'],
];
outputCsvWithBom($headers, $rows, 'users_' . date('Ymd') . '.csv');
Note that <code>fputcsv()</code> uses comma-separated values and double-quote escaping by default. If you want to change the delimiter, you can specify it with the third argument (for example, for tab-separated values use <code>"\t"</code>).
Character encoding conversion code in Python
Python is convenient for converting the character encoding of existing CSV files or converting data received in Shift_JIS to UTF-8.
import csv
import codecs
# BOM なし UTF-8 → BOM 付き UTF-8 に変換して保存
def add_bom_to_utf8_csv(input_path: str, output_path: str) -> None:
with open(input_path, 'r', encoding='utf-8') as infile:
content = infile.read()
with open(output_path, 'w', encoding='utf-8-sig') as outfile:
# 'utf-8-sig' は自動的に BOM を付加する
outfile.write(content)
# Shift_JIS CSV → UTF-8 BOM 付き CSV に変換
def convert_sjis_to_utf8_bom(input_path: str, output_path: str) -> None:
with open(input_path, 'r', encoding='shift_jis', errors='replace') as infile:
rows = list(csv.reader(infile))
with open(output_path, 'w', encoding='utf-8-sig', newline='') as outfile:
writer = csv.writer(outfile)
writer.writerows(rows)
# 文字コードを自動検出して変換(chardet を使用)
# pip install chardet
import chardet
def detect_and_convert(input_path: str, output_path: str) -> None:
with open(input_path, 'rb') as f:
raw_data = f.read()
detected = chardet.detect(raw_data)
encoding = detected['encoding'] or 'utf-8'
print(f'検出した文字コード: {encoding}(信頼度: {detected["confidence"]*100:.0f}%)')
content = raw_data.decode(encoding, errors='replace')
with open(output_path, 'w', encoding='utf-8-sig', newline='') as outfile:
outfile.write(content)
When specifying <code>encoding='utf-8-sig'</code> in Python's <code>open()</code>, the BOM is automatically added when writing and automatically removed when reading.
Notes on Excel for macOS
Excel for macOS (Microsoft 365 for Mac) has behavioral differences compared to the Windows version.
- <strong>CSV with UTF-8 BOM</strong>: Opens correctly with a double-click even in Excel for macOS (from relatively recent versions).
- <strong>Older versions (Excel 2016 for Mac, etc.)</strong>: Character corruption can occur even with UTF-8 BOM. In such cases, use the "Import Data" feature
- <strong>Select CSV in "Save As"</strong>: In macOS versions, it may be saved without UTF-8 BOM. CSV files passed to Windows environments require re-verification.
- <strong>Mixed use with Numbers.app</strong>: macOS Numbers treats UTF-8 as the standard, but compatibility with Excel requires caution.
Line Break Code Issues (<code>CRLF</code> vs <code>LF</code>)
Along with garbled characters in CSV, differences in line endings can also be problematic.
| Line Break Code | Byte String | Main environments | Handling in CSV |
|---|---|---|---|
| CRLF | 0D 0A |
Windows、HTTP | Standard defined in RFC 4180 |
| LF | 0A |
Linux, macOS, Git Default | Works in most cases with Excel |
| CR | 0D |
Legacy macOS (9 and earlier) | It may only be a problem with older versions of Excel |
While RFC 4180, the standard specification for CSV, specifies CRLF, modern Excel handles CSV files with only LF without issues. However, when CSV contains line breaks within fields, the handling of line endings becomes important. PHP's <code>fputcsv()</code> uses LF by default, but if Windows compatibility is important, consider converting with <code>str_replace("\n", "\r\n", $output)</code> after output.
// PHP で CRLF 改行の CSV を出力する方法
function outputCsvCrlfWithBom(array $headers, array $rows, string $filename = 'export.csv'): void
{
header('Content-Type: text/csv; charset=UTF-8');
header('Content-Disposition: attachment; filename="' . $filename . '"');
// 一旦バッファに書き出して CRLF に変換する
ob_start();
$output = fopen('php://output', 'w');
fputs($output, "\xEF\xBB\xBF");
fputcsv($output, $headers);
foreach ($rows as $row) {
fputcsv($output, $row);
}
fclose($output);
$csv = ob_get_clean();
// LF を CRLF に変換(すでに CRLF になっているものは除外)
$csv = str_replace(["\r\n", "\n"], "\r\n", $csv);
echo $csv;
exit;
}
Summary: Best practices for distributing CSV files without character corruption
- Output CSV with UTF-8 BOM when distributing to Excel users
- In PHP, add BOM at the beginning using <code>fputs($output, "\xEF\xBB\xBF")</code>
- In Python, specify <code>encoding='utf-8-sig'</code>
- To check the character encoding of a received CSV, use the <code>chardet</code> or <code>file</code> command
- When providing UTF-8 CSV without BOM to Excel users, guide them through the "Import Data" process.
- <code>CRLF</code> is the RFC standard for line break codes, but modern Excel also accepts <code>LF</code>
- Excel for macOS may behave differently; it is recommended to verify operation on the receiving end
Test files for this article
- <a href="/ja/files/encoding/" class="text-primary-600 dark:text-primary-400 hover:underline">Character Encoding Test Files</a> — Samples of various encodings including UTF-8 with/without BOM, Shift_JIS, and more
- <a href="/ja/files/csv/" class="text-primary-600 dark:text-primary-400 hover:underline">Test CSV files list</a> — Sample combinations of line endings, character encoding, and BOM presence/absence
- <a href="/ja/files/newline/" class="text-primary-600 dark:text-primary-400 hover:underline">Newline Code Test File List</a> — Check CRLF / LF / CR patterns
Related articles
- <a href="/ja/blog/csv-encoding-trouble-guide/" class="text-primary-600 dark:text-primary-400 hover:underline">Completely Solve CSV Character Encoding Issues: Fundamentals of Character Codes, BOM, and Line Breaks</a>
- <a href="/ja/blog/file-format-quick-reference/" class="text-primary-600 dark:text-primary-400 hover:underline">File Format Quick Reference for Developers</a>
- <a href="/ja/blog/base64-size-increase/" class="text-primary-600 dark:text-primary-400 hover:underline">Why Base64 Encoding Increases File Size by 33%</a>