Redshift COPY: delimiter surprises and compression
Level: intermediate · ~14 min read · Intent: informational
Audience: data engineers, analytics engineers, backend developers, ops engineers, technical teams
Prerequisites
- basic familiarity with CSV files
- basic familiarity with SQL
- basic familiarity with Amazon S3
Key takeaways
- Redshift COPY is pipe-delimited by default, so teams loading comma-separated files must explicitly use CSV or DELIMITER settings instead of assuming comma is automatic.
- CSV mode changes parsing behavior around quoting and embedded delimiters, which is why a plain DELIMITER-based load can fail on files that look like ordinary CSV.
- Compression settings such as GZIP, BZIP2, LZOP, and ZSTD affect throughput and parallelism, and compressed loads still need correct delimiter, quote, encoding, and bad-row strategy.
- The safest pattern is to validate structure before COPY, stage raw loads deliberately, and use NOLOAD, MAXERROR, manifests, and multi-file S3 layouts to debug before production runs.
References
- Amazon Redshift COPY
- Amazon Redshift data format parameters
- Amazon Redshift file compression parameters
- Amazon Redshift data load operations
- Amazon Redshift COPY from Amazon S3
- Amazon Redshift loading data from compressed and uncompressed files
- Amazon Redshift compressing your data files
- Amazon Redshift COPY examples
- Amazon Redshift data conversion parameters
- RFC 4180
FAQ
- What delimiter does Redshift COPY use by default?
- For delimited text, Amazon Redshift COPY expects character-delimited UTF-8 text and uses the pipe character by default unless you specify CSV or a DELIMITER value.
- Should I use CSV or DELIMITER in Redshift COPY?
- Use CSV when your file follows CSV quoting rules and can contain delimiters inside quoted fields. Use plain DELIMITER for simpler character-delimited text when you control the producer and do not need CSV-style quote semantics.
- Can Redshift COPY load gzip files from S3?
- Yes. COPY supports compressed input files with parameters such as GZIP, BZIP2, LZOP, and ZSTD, with ZSTD support documented for S3 loads.
- Why does Redshift COPY fail on files that open fine in Excel?
- Excel is a viewer and editor with its own import behavior. Redshift COPY expects the file contract to match delimiter, quote, encoding, null, and date settings exactly, so spreadsheet-friendly files can still fail in the warehouse.
- How should I debug a Redshift COPY load before production?
- Validate the file first, then use a staging table, test with NOLOAD, inspect delimiter and quote rules, and only then run the real COPY with the right compression and error-handling options.
Redshift COPY: delimiter surprises and compression
Amazon Redshift COPY is fast enough that teams often trust it too early.
That is usually where the trouble starts.
A file lands in S3, somebody points COPY at the bucket, the load works in dev once, and everyone assumes the contract is settled. Then one producer changes delimiter, another gzips the file, another starts including commas inside quoted text, and the warehouse suddenly fills with partial rows, skipped records, or hard failures.
This topic matters because many teams search for it only after seeing one of these symptoms:
- Redshift COPY delimiter error
- Redshift COPY CSV vs DELIMITER
- Redshift COPY gzip from S3
- Redshift COPY quoted commas fail
- Redshift COPY load compressed CSV
- Redshift COPY default delimiter
- Redshift COPY manifest with gzip
- Redshift COPY MAXERROR or NOLOAD
- Redshift COPY performance on many files
- Redshift COPY remove quotes or escape characters
This guide is built for those real production failures.
The central point is simple:
Redshift COPY is not guessing what your file means.
It is enforcing a parsing contract.
If that contract is wrong, the warehouse load is wrong.
Why Redshift COPY surprises teams
Teams coming from spreadsheets or pandas often assume a text file is “basically CSV” if it opens in Excel.
Redshift does not work that way.
The Redshift documentation is explicit: by default, COPY expects character-delimited UTF-8 text, and the default delimiter is a pipe (|). That one detail alone explains a huge number of first-load failures.
If your producer emits comma-separated files and your COPY statement does not specify CSV or an explicit delimiter, Redshift is not being difficult. It is doing exactly what it was told.
That one default sits behind several recurring problem areas:
- Redshift default delimiter confusion
- Redshift CSV import errors
- Redshift warehouse staging best practices
- Redshift compressed file loading
- Redshift COPY warehouse debugging
The first surprise: Redshift is not comma-delimited by default
This is the highest-value point in the article because it catches so many teams.
For delimited text input, Redshift COPY defaults to:
- UTF-8 text
- character-delimited input
- pipe as the delimiter
So when people ask:
Why did my CSV fail in Redshift?
the first question is often:
Did you actually tell Redshift it was CSV?
If the file is comma-separated and you do not specify CSV, Redshift will not assume comma.
If the file is tab-delimited and you do not specify DELIMITER '\t', Redshift will not infer tab.
If the file is semicolon-separated because it came from a European export, Redshift will not infer that either.
That is why delimiter debugging is not a minor detail. It is usually the start of the whole load contract.
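To make the contract concrete, here is a minimal before-and-after sketch. The table, bucket, and IAM role names are placeholders; only the delimiter handling is the point.

```sql
-- Misparses a comma-separated file: with no CSV or DELIMITER option,
-- COPY expects pipe-delimited text, so each line lands in one column
-- or fails the column-count check.
COPY staging.orders
FROM 's3://example-bucket/exports/orders.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load';

-- Declares the contract explicitly for plain comma-delimited text.
COPY staging.orders
FROM 's3://example-bucket/exports/orders.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
DELIMITER ',';
```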
CSV mode is not the same as plain DELIMITER mode
A lot of teams think CSV is just shorthand for DELIMITER ','.
It is not.
Redshift’s documentation separates CSV format from generic character-delimited text because CSV mode changes how fields are interpreted, especially around quotation marks and delimiters inside quoted strings.
That matters when your input contains values like:
customer_id,company_name,notes
101,"Acme, Inc.","Ships on Fridays"
If you use plain delimiter logic without the right CSV handling, the comma inside "Acme, Inc." can be treated as a field boundary instead of text.
The result is a malformed row count, shifted columns, or outright load failure.
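A sketch of the fix, with placeholder names again: CSV mode tells Redshift to honor the quoting, so the embedded comma stays inside the field.

```sql
-- CSV mode applies CSV quoting rules, so the comma inside
-- "Acme, Inc." is data, not a field boundary.
COPY staging.customers
FROM 's3://example-bucket/exports/customers.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1;  -- skip the customer_id,company_name,notes header
```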
This one behavior explains a whole family of common failures:
- Redshift COPY CSV quotes
- Redshift COPY commas inside quoted fields
- Redshift COPY invalid quote handling
- Redshift COPY file opens in Excel but fails in warehouse
The second surprise: quoting rules decide whether the row is valid
CSV is simple until quotation marks show up.
Then it becomes a contract problem.
In Redshift CSV mode, quoted fields follow CSV-style handling. The docs note that fields containing the delimiter can be enclosed in quotation marks, and embedded quotation marks must be doubled appropriately.
That means a field like this:
"A ""quoted"" word"
is valid CSV text for a value containing inner quotes.
But teams often receive files where producers do one of these instead:
- forget to quote a field containing a delimiter
- escape quotes with backslashes instead of CSV doubling
- mix quoted and unquoted conventions across exports
- emit line breaks inside fields without consistent CSV wrapping
Once that happens, Redshift is not dealing with “messy text.” It is dealing with broken structure.
The third surprise: compression does not fix bad structure
Compressed files load efficiently, but compression is not a parsing feature.
Redshift supports compressed load inputs such as GZIP, BZIP2, LZOP, and ZSTD.
The docs also note that ZSTD is supported with COPY from Amazon S3.
That is useful for performance and storage, but it creates a common misunderstanding:
We gzipped the file, so why does COPY still fail?
Because compression only changes how bytes are stored and transferred. It does not change:
- delimiter behavior
- quote rules
- encoding assumptions
- bad-line handling
- header row expectations
- column count consistency
A broken CSV inside a .gz file is still a broken CSV.
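In practice that means a compressed load still carries the full parsing contract. A minimal sketch, with placeholder names:

```sql
-- GZIP only tells COPY to decompress during the load; CSV mode,
-- header handling, and encoding assumptions are unchanged.
COPY staging.events
FROM 's3://example-bucket/exports/events.csv.gz'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP;
```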
Compression and throughput: what teams usually miss
Compression helps, but the load pattern matters just as much.
Redshift documentation recommends splitting large loads into multiple smaller files and loading them in parallel. The COPY command can load multiple files from a common S3 prefix or from a manifest, which is important because one giant compressed file can limit parallelism compared with a well-sized set of multiple files.
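A sketch of the multi-file pattern, assuming the export is split into part files under one prefix (names are placeholders):

```sql
-- Every object whose key starts with the prefix is loaded, so a set
-- of part files can be read in parallel instead of one large archive.
COPY staging.events
FROM 's3://example-bucket/exports/2024-06-01/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
GZIP;
```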
This load-pattern question comes up constantly:
- Redshift COPY gzip performance
- Redshift COPY multiple files vs one file
- Redshift load compressed CSV from S3
- Redshift manifest best practices
- Redshift parallel load compressed files
Good practical rule
Do not only ask:
- “Should the file be gzipped?”
Also ask:
- “Should the load be one big file or many parallelizable files?”
In many real pipelines, that matters more.
The fourth surprise: file compression and table compression are different topics
Another reason people search for “compression” here is that Redshift has file compression for input files and column compression encodings inside the warehouse.
Those are not the same thing.
File compression
This is about the source file: .gz, .bz2, .lzo, or .zst.
Table compression / encodings
This is about how Redshift stores columns after load.
The COMPUPDATE option matters here. Redshift’s load-operation docs explain that when COMPUPDATE is omitted, Redshift can choose column encodings only under certain conditions, such as empty target tables without preassigned encodings beyond RAW.
That means teams discussing “compression” often mix two separate questions:
- How should I compress the input file?
- How should Redshift encode the loaded table?
They are related operationally, but they are not interchangeable.
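A sketch of how the two decisions show up in one statement (placeholder names; the file-compression flag and COMPUPDATE address different layers):

```sql
-- GZIP describes the input files; COMPUPDATE governs column
-- encodings in the warehouse table. First load into an empty table,
-- letting Redshift analyze and choose encodings:
COPY analytics.facts
FROM 's3://example-bucket/exports/facts/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV GZIP
COMPUPDATE ON;

-- Routine incremental loads usually keep existing encodings stable:
COPY analytics.facts
FROM 's3://example-bucket/exports/facts/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV GZIP
COMPUPDATE OFF;
```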
The fifth surprise: S3 layout can be as important as SQL syntax
A lot of failed COPY discussions focus entirely on the SQL statement.
But the S3 layout often decides whether the load is clean and fast.
Redshift’s S3 loading docs allow:
- a single object path
- a common key prefix
- a manifest file listing exact objects
That matters because production pipelines often need one of these behaviors:
Prefix-based loading
Useful when every file in a path belongs to the same load batch and format.
Manifest-based loading
Useful when:
- only certain files belong to the batch
- files are spread across locations
- you want explicit reproducibility
- you need to avoid accidental stray files matching a prefix
Multi-file parallel loading
Useful when you want Redshift to load more efficiently from several files rather than a single oversized export.
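A sketch of the manifest pattern. The JSON shape follows the documented Redshift manifest format; paths and names are placeholders.

```json
{
  "entries": [
    {"url": "s3://example-bucket/exports/2024-06-01/part-0000.csv.gz", "mandatory": true},
    {"url": "s3://example-bucket/exports/2024-06-01/part-0001.csv.gz", "mandatory": true}
  ]
}
```

```sql
-- MANIFEST tells COPY the FROM path points at a manifest file,
-- so exactly these objects are loaded and nothing else.
COPY staging.events
FROM 's3://example-bucket/manifests/2024-06-01.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
MANIFEST
CSV
GZIP;
```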
These layout choices map directly onto common questions:
- Redshift COPY manifest
- Redshift COPY load by S3 prefix
- Redshift COPY exact file list
- Redshift compressed file parallel loading
The delimiter surprises teams hit most often
Here are the patterns that keep breaking pipelines.
1. Assuming comma is the default
It is not. Pipe is the default for delimited text.
2. Using DELIMITER on real CSV files with quoted commas
A plain delimiter load can fail when quoted CSV semantics are actually needed.
3. Loading semicolon exports without declaring the delimiter
Very common with regional vendor exports.
4. Using tab-delimited files that are named .csv
Spreadsheet users do this all the time.
5. Forgetting headers
A file with a header row can fail or misload if the IGNOREHEADER strategy is not deliberate.
6. Misunderstanding NULLs and blanks
Redshift conversion options like NULL AS, BLANKSASNULL, and EMPTYASNULL change semantics materially. Those should be chosen deliberately, not guessed; see the sketch after this list.
7. Mixing quote-repair with structure-repair
If quoting is wrong, this is not just a type issue. It is a row-boundary problem.
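For point 6 above, here is a sketch of making the null contract explicit. The `\N` convention is an assumption about the producer; declare whatever your files actually emit.

```sql
-- Declare the null semantics instead of inheriting defaults.
COPY staging.accounts
FROM 's3://example-bucket/exports/accounts.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
NULL AS '\N'     -- assumed producer convention for SQL NULL
BLANKSASNULL     -- whitespace-only text fields load as NULL
EMPTYASNULL;     -- empty text fields load as NULL
```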
Redshift COPY options that matter most for this topic
This is the practical center of the article.
CSV
Use it when your file is real CSV and may contain delimiters inside quoted fields.
DELIMITER
Use it when the producer contract is known and the file is plain character-delimited text. Examples:
- DELIMITER ','
- DELIMITER '|'
- DELIMITER '\t'
IGNOREHEADER
Useful when the file includes a single header row you want Redshift to skip.
GZIP, BZIP2, LZOP, ZSTD
Tell Redshift the source files are compressed and should be decompressed during load.
MAXERROR
Useful when you need controlled tolerance for some row errors, but dangerous if teams start using it as a substitute for understanding the problem.
NOLOAD
One of the best debugging options. It validates the load without actually inserting data.
ESCAPE and REMOVEQUOTES
Important when source text includes quoting or escaping behavior that differs from a clean CSV producer.
COMPUPDATE
Relevant to warehouse storage encoding decisions, especially on empty tables and first-load scenarios.
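Several of these options come together naturally in a debugging pass. A minimal sketch, with placeholder names:

```sql
-- Dry run: NOLOAD parses and validates the files without inserting
-- rows; MAXERROR caps tolerated row errors during triage.
COPY staging.orders
FROM 's3://example-bucket/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP
MAXERROR 10
NOLOAD;
```

After a failed run, the STL_LOAD_ERRORS system view records the raw line and the reason each rejected row failed, which is usually faster than re-reading the source file by hand.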
The safest Redshift COPY workflow for delimited and compressed files
If the data matters, this is the pattern to use.
Step 1. Validate the raw file before warehouse load
Check:
- delimiter
- header behavior
- encoding
- consistent column counts
- quoted commas
- quoted newlines
- broken rows
Step 2. Decide whether the file is real CSV or generic delimited text
Do not blur these together.
Step 3. Choose the right compression and file layout
Ask:
- one large file or many smaller files?
- prefix-based load or manifest?
- gzip or another supported compression format?
Step 4. Test with NOLOAD
This catches structure issues early.
Step 5. Load into a staging table first
Especially when:
- the producer is external
- the schema changes often
- you are debugging dates, nulls, or coercions
- you do not fully trust the file yet
Step 6. Apply domain rules after structural validity
Structure first. Business rules second.
That order is what keeps error triage sane.
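Put together, steps 4 through 6 often look like this sketch (placeholder schema and rule; the point is the ordering, not the specific predicate):

```sql
-- Stage the raw load in a table shaped like the target.
CREATE TABLE staging.orders_raw (LIKE analytics.orders);

COPY staging.orders_raw
FROM 's3://example-bucket/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP;

-- Apply domain rules only after the structure has loaded cleanly.
INSERT INTO analytics.orders
SELECT *
FROM staging.orders_raw
WHERE order_date IS NOT NULL;  -- hypothetical business rule
```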
Example patterns
Example 1: clean comma-separated CSV from S3
Use:
- CSV
- optionally IGNOREHEADER 1
- GZIP if compressed
This is the typical “happy path.”
Example 2: semicolon-delimited vendor export
Use:
- DELIMITER ';'
- CSV only if the vendor is truly emitting CSV-style quoting around semicolon-delimited fields
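A sketch for this case, assuming the vendor emits plain semicolon-delimited text with a header row (names are placeholders):

```sql
-- Plain character-delimited text: no CSV quoting semantics assumed.
COPY staging.vendor_feed
FROM 's3://example-bucket/vendor/feed.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
DELIMITER ';'
IGNOREHEADER 1;
```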
Example 3: pipe-delimited compressed batch files
Use:
- default pipe delimiter or explicit DELIMITER '|'
- GZIP or another compression flag
- multi-file S3 prefix or manifest
Example 4: debugging a suspicious load
Use:
- NOLOAD
- staging table
- explicit delimiter
- minimal conversion assumptions
- small representative sample
What NOT to do
These are the anti-patterns that cause the most damage in practice.
Do not assume spreadsheet behavior equals warehouse behavior
Excel opening the file proves very little.
Do not use MAXERROR to hide a producer problem
It can help triage, but it should not become the architecture.
Do not compress first and validate never
Compression is not quality.
Do not let one giant monolithic file become the default forever
Parallel file layout is often better operationally.
Do not confuse file compression with Redshift column compression
They solve different problems.
Do not skip staging when the source is unreliable
Warehouse facts tables are a bad place to discover a delimiter bug.
Debugging checklist for Redshift COPY failures
When a load fails, check in this order:
- Is the file actually UTF-8 text in the expected encoding?
- Is the delimiter declared correctly?
- Is the file real CSV that needs CSV mode?
- Are there delimiters inside quoted fields?
- Is there a header row?
- Are compressed files declared with the right compression option?
- Are multiple files being loaded by prefix when a manifest would be safer?
- Would NOLOAD reveal the problem before insert?
- Should the table be loaded through staging first?
- Are conversion settings causing the apparent failure rather than delimiter rules?
That sequence saves a lot of time.
Which tools fit this workflow naturally?
The best companions for COPY debugging are structural validators, not spreadsheet tips.
Use:
- CSV Validator
- CSV Delimiter Checker
- CSV Header Checker
- CSV Row Checker
- Malformed CSV Checker
- CSV Splitter
- CSV Merge
These are most useful before the file ever reaches S3 or the warehouse.
FAQ
What delimiter does Redshift COPY use by default?
For delimited text input, Redshift expects character-delimited UTF-8 text and uses the pipe character by default unless you specify otherwise.
Should I use CSV or DELIMITER in Redshift COPY?
Use CSV when the file follows CSV quoting rules and may contain delimiters inside quoted fields. Use plain DELIMITER for simpler character-delimited text when you control the producer and do not need CSV quote semantics.
Can Redshift COPY load gzip files from S3?
Yes. Redshift supports compressed input with options including GZIP, BZIP2, LZOP, and ZSTD, with ZSTD documented for S3-based loads.
Why does Redshift COPY fail on files that open in Excel?
Because Excel is not the parsing contract. Redshift needs the file’s delimiter, quote behavior, encoding, header handling, and null conventions to match the COPY statement exactly.
What is the best way to debug a Redshift COPY load?
Validate the file structure first, then test the load with NOLOAD, verify delimiter and quote rules, and use a staging table before loading the final target.
Final takeaway
Redshift COPY is powerful because it is strict.
That same strictness is what makes delimiter mistakes, CSV misunderstandings, and compression assumptions so expensive in production.
The safest baseline is:
- validate the file first
- declare delimiter behavior explicitly
- use CSV when the file is truly CSV
- treat compression as a transport choice, not a repair strategy
- split large loads into sensible files
- use manifests when precision matters
- debug with NOLOAD
- stage before final inserts
That is how you turn COPY from a source of surprises into a reliable warehouse ingestion boundary.
About the author
Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.