Why accented characters break after a round trip through Excel

By Elysiate · Updated Apr 11, 2026

Tags: csv, data, data-pipelines, encoding, excel, utf-8

Level: intermediate · ~13 min read · Intent: informational

Audience: Developers, Data analysts, Ops engineers, Technical teams

Prerequisites

  • Basic familiarity with CSV files
  • Optional: SQL or ETL concepts

Key takeaways

  • Accented characters usually break after an Excel round trip because the same bytes were opened or saved through different encoding assumptions, not because accents are inherently unsafe in CSV.
  • Excel’s CSV behavior is not one single path. Opening UTF-8 correctly, importing through Get Data, and saving back out as a specific CSV variant can produce different encoding results.
  • The safest workflow is to preserve the original file, import with explicit encoding, and re-export deliberately rather than opening, editing, and overwriting the only copy.
  • A file can be structurally valid CSV and still be text-corrupted for downstream systems if one tool decoded it as UTF-8 and another consumed it as Windows-1252 or a Windows-compatible CSV flavor.


This guide tackles why accented characters break after a round trip through Excel: a problem that looks like random text corruption but is usually much more specific. The bytes were valid, and one step in the workflow read or wrote them with the wrong encoding assumptions.

That is why the same file can:

  • look correct in Excel
  • look broken in a database load
  • look broken in a browser preview
  • or show classic mojibake like FranÃ§ois where François should be

The issue is rarely “accented characters are unsafe.” The issue is usually:

  • which encoding the file started with
  • how Excel opened it
  • what format Excel saved it back out as
  • and what the next system expected to read

Once you frame it that way, the failures become much easier to diagnose.
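
You can reproduce the core mechanics in a few lines, with no Excel involved: encode a known-good string as UTF-8, then decode the same bytes as Windows-1252. A minimal Python sketch:

```python
# Sketch of the core failure: the bytes are valid, but the decode step
# uses the wrong encoding assumption. Plain Python, no Excel required.
original = "François"

# A source system exports UTF-8 bytes (the ç becomes two bytes, C3 A7).
utf8_bytes = original.encode("utf-8")

# A consumer assumes Windows-1252 and decodes the same bytes.
mojibake = utf8_bytes.decode("cp1252")

print(mojibake)  # FranÃ§ois: two characters where the ç used to be
```

The bytes never changed; only the decoding assumption did, which is exactly why the file can look fine in one tool and broken in the next.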

Who this is for

This article is for teams who pass CSV through spreadsheet workflows even though the file still feeds a system later:

  • engineers supporting imports and backfills
  • analysts doing quick edits or reviews in Excel
  • ops teams debugging “it worked until someone opened it”
  • support teams investigating why names with accents broke after a handoff

If your CSV never leaves an automated pipeline, this may be a rare edge case. If someone regularly opens the file in Excel before it goes to a loader, warehouse, or app import, this is one of the most common text-fidelity incidents you will see.

The central problem: CSV is text, Excel is a product with encoding behavior

RFC 4180 gives a structural baseline for CSV:

  • rows
  • delimiters
  • quoted fields
  • optional header row

It says nothing about a universal encoding that every tool will choose the same way. That is exactly why accented-character failures happen after round trips: the structure can stay valid while the text gets reinterpreted.

Excel is not only “opening a text file.” It is deciding how to interpret that text and, later, how to write it back out.

That means this is not merely a CSV problem. It is a CSV plus Excel import/save semantics problem.

The most common round-trip failure

The classic pattern looks like this:

  1. A source system exports UTF-8 text.
  2. Someone opens the CSV directly in Excel.
  3. Excel interprets or saves the file through a different path than the next system expects.
  4. The next consumer reads those bytes differently.
  5. Accented characters become mojibake or question marks.

Typical symptoms:

  • é becomes Ã©
  • François becomes FranÃ§ois
  • smart punctuation turns into â€™ or â€œ
  • some characters disappear or become replacement glyphs

These are not random artifacts. They are clues about which decode path was wrong.

Why UTF-8 BOM matters specifically in Excel

Microsoft’s current support article says a CSV file encoded with UTF-8 can be opened normally in Excel if it was saved with a BOM. Otherwise, Microsoft recommends opening it through Get Data / From Text/CSV or the text import flow.

That is one of the most practical Excel-specific facts in this whole topic.

It means:

  • UTF-8 without BOM may still be fine as data
  • but Excel’s default open behavior is not guaranteed to interpret it the way you expect
  • so the same file can look right in one open path and wrong in another
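
If a file is destined for people who will double-click it in Excel, writing the BOM up front removes the ambiguity. In Python, the standard `utf-8-sig` codec does this; a small sketch (the filename is illustrative):

```python
import csv

# Writing with utf-8-sig prepends the UTF-8 BOM (bytes EF BB BF), the
# signal that lets Excel's default open path treat the file as UTF-8.
rows = [["name", "city"], ["François", "Orléans"]]

with open("export.csv", "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# The first three bytes of the file are now the BOM.
with open("export.csv", "rb") as f:
    assert f.read(3) == b"\xef\xbb\xbf"
```

The trade-off: some non-Excel consumers treat the BOM as literal data in the first header cell, so the BOM decision belongs in the file contract, not in an ad hoc save.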

This is why teams often say “the export was fine until someone opened it in Excel.” The real issue is which Excel import path was used, not merely “Excel bad” in the abstract.

Why save behavior matters just as much as open behavior

Microsoft’s “Save a workbook to text format (.txt or .csv)” page makes a more general point: when you save a workbook into text or CSV formats, Excel can strip formatting and other workbook features, and you must explicitly choose a file type.

That matters because the save format is part of the contract.

Excel does not have one universal “CSV save.” There can be different CSV variants depending on platform and version. Microsoft’s Excel for Mac file-format support page explicitly lists multiple export variants, including:

  • CSV UTF-8 (Comma delimited) (.csv)
  • Comma Separated Values (.csv)
  • Windows Comma Separated (.csv)
  • and several text encodings such as UTF-16 text

That is the core round-trip danger: someone may believe they “saved it back as CSV,” but not realize they changed:

  • encoding
  • compatibility flavor
  • or delimiter behavior

To a downstream pipeline, that is not “the same file.”

Why accented characters are especially exposed

ASCII-only text can survive many sloppy round trips because the basic characters overlap across common encodings.

Accented characters expose the mismatch faster.

The WHATWG Encoding Standard explains that labels like latin1, iso-8859-1, and even ascii are treated as aliases for windows-1252 in web-compatible software, which has historically confused developers.

That matters because once Excel or another tool saves or reopens a CSV in a Windows-compatible path, the next consumer may not interpret those bytes the same way the original exporter intended.

So accented characters are not the cause. They are often the first visible casualty of an encoding mismatch.

The second Excel-specific trap: locale and delimiter shifts

Encoding is the headline issue, but delimiters also shift in spreadsheet workflows.

This points at a real operational pattern: regional settings can change list separators and decimal symbols, and Excel can save with semicolon-style CSV behavior in some locales.

That means a round trip can change more than accents:

  • comma-separated becomes semicolon-separated
  • decimals become ambiguous
  • loaders see the wrong number of columns
  • and support blames “bad characters” when the file contract actually drifted in two ways at once

This is why the safest workflow checks:

  • encoding
  • delimiter
  • header row
  • and row counts, not just text rendering
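
Delimiter drift is easy to check in code. A sketch using Python’s standard-library `csv.Sniffer`, which guesses the dialect from a sample (the sample rows here are illustrative):

```python
import csv

# A file that came back from a locale where the list separator is ";".
sample = "name;city\nFrançois;Orléans\nInès;Paris\n"

# Sniffer guesses the dialect; restricting the candidate delimiters
# keeps the guess conservative.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
print(dialect.delimiter)  # ";"

rows = list(csv.reader(sample.splitlines(), dialect))
print(len(rows[0]))  # 2 columns, as expected
```

A naive comma-based parse of the same sample would report one column per row, which is the “wrong number of columns” symptom loaders complain about.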

Why Power Query is safer than double-click open

Microsoft’s own guidance points users to Get Data / From Text/CSV or the text import path when opening UTF-8 content that may not include a BOM.

That is useful because it makes the import step explicit.

In practice, Power Query or import-wizard flows are safer than raw double-click open because they let you:

  • choose the delimiter
  • inspect the preview
  • and control the interpretation path instead of relying on Excel’s default open behavior

That does not make Excel a canonical CSV editor. But it makes it less dangerous than “open and hope.”

The structure may still be valid while the text is wrong

This is the most important distinction to keep in mind.

A file can still be:

  • valid CSV structurally
  • same row counts
  • same columns
  • same quotes
  • same delimiter

and yet the text inside can already be wrong for downstream systems.

That is why “the parser succeeds” does not prove “the round trip was harmless.” If the bytes now represent the wrong characters, the CSV can be perfectly valid and still corrupted semantically.

This is also why structural validation should be followed by:

  • encoding verification
  • and representative value checks on known accented samples

A practical workflow for diagnosing Excel round-trip damage

Use this when the team suspects Excel changed accented characters.

1. Preserve the original file

Save the untouched source export and compute a checksum. Do not let the “fixed in Excel” version overwrite your only artifact.

2. Compare the before and after bytes

If possible, compare:

  • file size
  • checksum
  • first few bytes for BOM
  • encoding metadata if your source provides it

If the post-Excel file differs, that is expected. The point is to know how it differs.
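
A sketch of that byte-level comparison using only the Python standard library (the file paths are placeholders):

```python
import hashlib
from pathlib import Path

def fingerprint(path: str) -> dict:
    """Summarize the bytes of a file for before/after comparison."""
    data = Path(path).read_bytes()
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        # The UTF-8 BOM is the byte sequence EF BB BF at the very start.
        "has_utf8_bom": data[:3] == b"\xef\xbb\xbf",
        "first_bytes": data[:8].hex(" "),
    }

# Compare the untouched export against the post-Excel copy:
# before = fingerprint("export_original.csv")
# after = fingerprint("export_after_excel.csv")
# A different sha256 is expected; the point is to see *how* they differ.
```

Keeping both fingerprints in the incident notes turns “Excel changed something” into a concrete, reviewable diff.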

3. Test the open path

Ask:

  • Was it double-click opened?
  • Imported via Get Data?
  • Saved as CSV UTF-8?
  • Saved as a Windows-compatible CSV variant?
  • Opened on Mac or Windows?

Those details matter because Excel’s paths are not identical.

4. Look for mojibake signatures

Common signs:

  • Ã©
  • Ã±
  • â€™
  • â€œ
  • â€“

These usually suggest a UTF-8 / Windows-1252 mismatch, not random damage.
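
When the signature points at this mismatch, the damage is often mechanically reversible: re-encode the mojibake text as Windows-1252 to recover the original bytes, then decode those bytes as UTF-8. A hedged sketch; this only works when the corruption happened exactly once and no bytes were lost along the way:

```python
def undo_utf8_as_cp1252(mojibake: str) -> str:
    # Reverse the classic mismatch: the text was UTF-8 bytes that someone
    # decoded as Windows-1252. Encoding back to cp1252 recovers the original
    # bytes, which then decode cleanly as UTF-8.
    return mojibake.encode("cp1252").decode("utf-8")

print(undo_utf8_as_cp1252("FranÃ§ois"))  # François
print(undo_utf8_as_cp1252("Ã©"))         # é
```

Treat this as a diagnostic aid, not a blanket fix: if either step raises an encoding error, the corruption followed a different path and repairing blindly would compound it.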

5. Validate delimiter and row shape too

If the same Excel round trip also changed delimiter expectations, you may have both:

  • text corruption
  • and column parsing problems

6. Re-export deliberately

Once the right import path is known, export in the exact encoding and CSV flavor the downstream system expects.

This is much better than repeated save-as guessing.

A safer round-trip pattern

If someone must touch the data in Excel, the least fragile pattern is:

  1. Keep the original CSV unchanged.
  2. Import into Excel through Data > From Text/CSV or equivalent, with explicit settings when needed.
  3. Do the necessary review or edits.
  4. Export deliberately using the correct target format.
  5. Validate the exported file before sending it downstream.

Even better:

  • keep the spreadsheet editing workflow separate from the canonical machine-ingest artifact
  • and regenerate the final CSV from a controlled transform rather than trusting manual save behavior
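
That controlled transform can be as small as a script that reads with the diagnosed source encoding and writes the agreed target encoding explicitly. A sketch, assuming the source was diagnosed as Windows-1252 and the loader expects UTF-8 (file names are illustrative):

```python
import csv

def reexport(src: str, dst: str, src_encoding: str = "cp1252") -> int:
    """Read with the diagnosed source encoding, write UTF-8 deliberately."""
    count = 0
    with open(src, encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):  # explicit parse, not a spreadsheet default
            writer.writerow(row)
            count += 1
    return count
```

Because every choice (source encoding, target encoding, delimiter handling) is written down in code, the re-export is repeatable and reviewable, unlike a manual Save As.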

Why database loaders expose the problem immediately

Database and warehouse loaders are much less forgiving than spreadsheet UIs.

They do not care that the text “looked fine on screen.” They care about:

  • bytes
  • encoding
  • delimiter
  • row width
  • and schema mapping

That is why a PostgreSQL COPY or DuckDB import can be the first place the issue becomes obvious: the loader is consuming text, not Excel’s formatted display of it.

So when support says “it looked fine in Excel,” the technical response should be “what bytes did Excel actually save, and what does the loader expect?”

That is the real diagnostic question.

The riskiest anti-patterns

Avoid these when accented-character incidents keep recurring.

1. Opening the only copy directly in Excel

This destroys the clean baseline you need for diagnosis.

2. Assuming “CSV is CSV”

Excel exposes multiple CSV/text save behaviors, and they are not interchangeable.

3. Treating UTF-8 without BOM as universally safe in Excel defaults

Microsoft’s own support guidance says BOM changes the safe default-open story.

4. Fixing mojibake by repeated manual resaves

That often compounds the confusion instead of revealing the source mismatch.

5. Blaming the warehouse first

The warehouse or database may only be the first strict consumer that exposed a problem already introduced earlier.

A decision framework for teams

When a stakeholder says “Excel broke the accents,” walk through this order:

  1. Did the source export start as UTF-8, and was it documented?
  2. Was the file opened directly or imported through an explicit text/CSV flow?
  3. Was the resave format exactly the one the downstream pipeline expects?
  4. Did delimiter or locale behavior also change?
  5. Can the human edit step be replaced with a repeatable transform instead?

This keeps the conversation focused on contracts rather than blame.

Elysiate tools and topic hubs

Elysiate’s CSV tools fit this issue because accented-character problems often sit at the boundary between:

  • structural parsing
  • encoding diagnosis
  • and repeatable validation before database load

For safer transition workflows, the adjacent conversion tools are still useful, especially when the team needs to compare a pre-Excel and post-Excel artifact without uploading sensitive production files to third-party services.

FAQ

Why do accented characters break after Excel touches the CSV?

Usually because Excel opened or saved the file using one encoding path and the next consumer used another. The common result is mojibake like Ã© instead of é.

Does UTF-8 BOM matter for Excel?

Yes. Microsoft’s current guidance says UTF-8 CSV files can be opened normally in Excel when they were saved with a BOM; otherwise, importing through Get Data / From Text/CSV or the text import flow is safer.

Should I just open the CSV directly in Excel and save it again?

Usually no. That can change both encoding and delimiter behavior, and it can destroy the original evidence about what bytes were there before the round trip.

What is the safest recovery workflow?

Keep the original file, identify the actual source encoding, import with explicit settings, and re-export deliberately in the exact encoding your downstream consumer expects.

What is the biggest mistake teams make?

Treating Excel as a neutral editor for CSV. It is a spreadsheet product with import and save behaviors that can reinterpret text, delimiters, and formats.

Final takeaway

Accented characters usually break after an Excel round trip because the file crossed an encoding boundary without the workflow making that boundary explicit.

The safest baseline is:

  • preserve the original export
  • know whether the source is UTF-8 and whether BOM matters for the target Excel path
  • import through explicit text/CSV flows instead of assuming default open is safe
  • re-export deliberately in the format your downstream consumer expects
  • and validate both text fidelity and structural CSV behavior before the next load

That is how you turn “Excel broke the accents” from a vague complaint into a repeatable diagnosis and prevention workflow.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.
