Skip to content

How to correctly open and save CSV in Excel | UTF-8 BOM support to prevent character corruption

Category: Data Processing / Excel
This article is currently available in Japanese only. We are working on translations.

「When I exported a CSV file from the system and opened it in Excel, Japanese characters became garbled」——this problem occurs daily in the field of web development, data analysis, and business systems. The root cause is often the difference in character encoding between UTF-8 and Shift_JIS (CP932). This article systematically explains the fundamental cause of character garbling, solutions using UTF-8 BOM CSV, Excel import procedures, implementation methods in PHP and Python, and line code issues.

Excel and CSV mojibake resolution flow UTF-8 CSV Web export BOM? EF BB BF Mojibake 縺ゅ>縺・▲ ... Correct display あいうえお No Yes Solutions 1. Output UTF-8 with BOM (3 bytes prefix) 2. Use Excel data import wizard 3. Convert to Shift_JIS (CP932) (legacy) Mac Excel BOM recognition unstable — UTF-8 LF recommended
Fig 1: Mojibake detection and resolution path when opening UTF-8 CSV in Excel

Issue: Excel Expects Shift_JIS by Default

When opening CSV files by double-clicking in Excel for Windows, the encoding is automatically detected based on the OS locale settings. On Japanese Windows, this default encoding is <strong>Shift_JIS (code page 932)</strong>.

For this reason, when you double-click a CSV saved in UTF-8 to open it directly, multibyte characters (such as Japanese) are not interpreted correctly, resulting in garbled text. This is why the problem of "a non-engineer opening a UTF-8 CSV generated by an engineer in a web system with Excel and encountering garbled text" repeatedly occurs.

The correspondence of character encodings can be organized as follows.

Character Encoding Alias Handling in Excel Handling on Web and Linux
Shift_JIS CP932、Windows-31J Standard in Japanese version Legacy. Not recommended for use
UTF-8 (without BOM) Character corruption (older version) Web Standards
UTF-8 with BOM UTF-8 with BOM Recognized correctly BOM Can Cause Issues as Extra Characters
UTF-16 LE with BOM Recognized correctly Rarely used

How to prevent character encoding issues with UTF-8 BOM CSV

BOM (Byte Order Mark) is a special byte sequence added at the beginning of a file, serving as a signature to identify the character encoding of a text file. The UTF-8 BOM is 3 bytes: <code>EF BB BF</code> (in hexadecimal).

Excel detects this BOM when opening a file and determines "this file is written in UTF-8". This is why <strong>using UTF-8 with BOM CSV prevents garbled text in Excel</strong>.

To verify the difference between UTF-8 with BOM and UTF-8 without BOM, check the file header bytes using a text editor or the <code>hexdump</code> command.

# Linux / macOS での確認
hexdump -C sample.csv | head -1

# BOM なし UTF-8 の出力例:
# 00000000  e5 90 8d e5 89 8d 2c e5  ...
# BOM 付き UTF-8 の出力例:
# 00000000  ef bb bf e5 90 8d e5 89  ...(先頭に ef bb bf)

Steps to Import from "Data" → "Text File" in Excel

To open UTF-8 CSV files without BOM correctly in Excel, use the "Import Data" feature instead of double-clicking. The steps vary by version, but the following procedure works for Excel 2016 and later (including Microsoft 365).

  1. Launch Excel and open a new workbook.
  2. Click the 「Data」 tab → 「From Text or CSV」.
  3. Select the target CSV file and click <strong>Import</strong>.
  4. A preview screen is displayed. Select <strong>65001: Unicode (UTF-8)</strong> from the "File origin" dropdown.
  5. Verify that the delimiter is set to 「comma」 and click 「Load」.

For Excel 2013 and earlier, use "Data" → "Import External Data" → "Text File", and change the character encoding to UTF-8 (65001) in step 2 of the "Text File Wizard".

Code to output CSV with UTF-8 BOM in PHP

When allowing CSV downloads from a web system, outputting a UTF-8 CSV with BOM in PHP is the simplest solution.

// UTF-8 BOM 付き CSV のダウンロード出力
function outputCsvWithBom(array $headers, array $rows, string $filename = 'export.csv'): void
{
    // キャッシュ無効化・ダウンロードヘッダーを設定
    header('Content-Type: text/csv; charset=UTF-8');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    header('Cache-Control: no-cache, no-store, must-revalidate');

    $output = fopen('php://output', 'w');

    // UTF-8 BOM を出力(EF BB BF)
    fputs($output, "\xEF\xBB\xBF");

    // ヘッダー行を出力
    fputcsv($output, $headers);

    // データ行を出力
    foreach ($rows as $row) {
        fputcsv($output, $row);
    }

    fclose($output);
    exit;
}

// 使用例
$headers = ['名前', 'メールアドレス', '登録日'];
$rows = [
    ['山田 太郎', 'taro@example.com', '2026-04-14'],
    ['鈴木 花子', 'hanako@example.com', '2026-04-13'],
];

outputCsvWithBom($headers, $rows, 'users_' . date('Ymd') . '.csv');

Note that <code>fputcsv()</code> uses comma-separated values and double-quote escaping by default. If you want to change the delimiter, you can specify it with the third argument (for example, for tab-separated values use <code>"\t"</code>).

Character encoding conversion code in Python

Python is convenient for converting the character encoding of existing CSV files or converting data received in Shift_JIS to UTF-8.

import csv
import codecs

# BOM なし UTF-8 → BOM 付き UTF-8 に変換して保存
def add_bom_to_utf8_csv(input_path: str, output_path: str) -> None:
    with open(input_path, 'r', encoding='utf-8') as infile:
        content = infile.read()

    with open(output_path, 'w', encoding='utf-8-sig') as outfile:
        # 'utf-8-sig' は自動的に BOM を付加する
        outfile.write(content)


# Shift_JIS CSV → UTF-8 BOM 付き CSV に変換
def convert_sjis_to_utf8_bom(input_path: str, output_path: str) -> None:
    with open(input_path, 'r', encoding='shift_jis', errors='replace') as infile:
        rows = list(csv.reader(infile))

    with open(output_path, 'w', encoding='utf-8-sig', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(rows)


# 文字コードを自動検出して変換(chardet を使用)
# pip install chardet
import chardet

def detect_and_convert(input_path: str, output_path: str) -> None:
    with open(input_path, 'rb') as f:
        raw_data = f.read()

    detected = chardet.detect(raw_data)
    encoding = detected['encoding'] or 'utf-8'
    print(f'検出した文字コード: {encoding}(信頼度: {detected["confidence"]*100:.0f}%)')

    content = raw_data.decode(encoding, errors='replace')

    with open(output_path, 'w', encoding='utf-8-sig', newline='') as outfile:
        outfile.write(content)

When specifying <code>encoding='utf-8-sig'</code> in Python's <code>open()</code>, the BOM is automatically added when writing and automatically removed when reading.

Notes on Excel for macOS

Excel for macOS (Microsoft 365 for Mac) has behavioral differences compared to the Windows version.

  • <strong>CSV with UTF-8 BOM</strong>: Opens correctly with a double-click even in Excel for macOS (from relatively recent versions).
  • <strong>Older versions (Excel 2016 for Mac, etc.)</strong>: Character corruption can occur even with UTF-8 BOM. In such cases, use the "Import Data" feature
  • <strong>Select CSV in "Save As"</strong>: In macOS versions, it may be saved without UTF-8 BOM. CSV files passed to Windows environments require re-verification.
  • <strong>Mixed use with Numbers.app</strong>: macOS Numbers treats UTF-8 as the standard, but compatibility with Excel requires caution.

Line Break Code Issues (<code>CRLF</code> vs <code>LF</code>)

Along with garbled characters in CSV, differences in line endings can also be problematic.

Line Break Code Byte String Main environments Handling in CSV
CRLF 0D 0A Windows、HTTP Standard defined in RFC 4180
LF 0A Linux, macOS, Git Default Works in most cases with Excel
CR 0D Legacy macOS (9 and earlier) It may only be a problem with older versions of Excel

While RFC 4180, the standard specification for CSV, specifies CRLF, modern Excel handles CSV files with only LF without issues. However, when CSV contains line breaks within fields, the handling of line endings becomes important. PHP's <code>fputcsv()</code> uses LF by default, but if Windows compatibility is important, consider converting with <code>str_replace("\n", "\r\n", $output)</code> after output.

// PHP で CRLF 改行の CSV を出力する方法
function outputCsvCrlfWithBom(array $headers, array $rows, string $filename = 'export.csv'): void
{
    header('Content-Type: text/csv; charset=UTF-8');
    header('Content-Disposition: attachment; filename="' . $filename . '"');

    // 一旦バッファに書き出して CRLF に変換する
    ob_start();
    $output = fopen('php://output', 'w');
    fputs($output, "\xEF\xBB\xBF");
    fputcsv($output, $headers);
    foreach ($rows as $row) {
        fputcsv($output, $row);
    }
    fclose($output);
    $csv = ob_get_clean();

    // LF を CRLF に変換(すでに CRLF になっているものは除外)
    $csv = str_replace(["\r\n", "\n"], "\r\n", $csv);

    echo $csv;
    exit;
}

Summary: Best practices for distributing CSV files without character corruption

  • Output CSV with UTF-8 BOM when distributing to Excel users
  • In PHP, add BOM at the beginning using <code>fputs($output, "\xEF\xBB\xBF")</code>
  • In Python, specify <code>encoding='utf-8-sig'</code>
  • To check the character encoding of a received CSV, use the <code>chardet</code> or <code>file</code> command
  • When providing UTF-8 CSV without BOM to Excel users, guide them through the "Import Data" process.
  • <code>CRLF</code> is the RFC standard for line break codes, but modern Excel also accepts <code>LF</code>
  • Excel for macOS may behave differently; it is recommended to verify operation on the receiving end

Test files for this article

  • <a href="/ja/files/encoding/" class="text-primary-600 dark:text-primary-400 hover:underline">Character Encoding Test Files</a> — Samples of various encodings including UTF-8 with/without BOM, Shift_JIS, and more
  • <a href="/ja/files/csv/" class="text-primary-600 dark:text-primary-400 hover:underline">Test CSV files list</a> — Sample combinations of line endings, character encoding, and BOM presence/absence
  • <a href="/ja/files/newline/" class="text-primary-600 dark:text-primary-400 hover:underline">Newline Code Test File List</a> — Check CRLF / LF / CR patterns

Related articles

  • <a href="/ja/blog/csv-encoding-trouble-guide/" class="text-primary-600 dark:text-primary-400 hover:underline">Completely Solve CSV Character Encoding Issues: Fundamentals of Character Codes, BOM, and Line Breaks</a>
  • <a href="/ja/blog/file-format-quick-reference/" class="text-primary-600 dark:text-primary-400 hover:underline">File Format Quick Reference for Developers</a>
  • <a href="/ja/blog/base64-size-increase/" class="text-primary-600 dark:text-primary-400 hover:underline">Why Base64 Encoding Increases File Size by 33%</a>

Frequently Asked Questions

Why does CSV get corrupted when opened in Excel?

Excel reads files as Shift_JIS (Windows environment) by default, causing UTF-8 CSV to become corrupted. This is resolved by saving with UTF-8 BOM.

What is BOM (Byte Order Mark)?

A 3-byte mark (EF BB BF) prepended to the beginning of a file tells software that it is UTF-8. This is necessary for Excel to correctly recognize UTF-8.

How to correctly open CSV in Excel for macOS?

Even in Excel for macOS, it is reliable to use Text File Import from the Data tab and specify UTF-8 as the character encoding.

📚 Reference