Redshift COPY: delimiter surprises and compression

By Elysiate · Updated Apr 10, 2026
Tags: redshift, amazon-redshift, csv, copy-command, data-warehousing, etl

Level: intermediate · ~14 min read · Intent: informational

Audience: data engineers, analytics engineers, backend developers, ops engineers, technical teams

Prerequisites

  • basic familiarity with CSV files
  • basic familiarity with SQL
  • basic familiarity with Amazon S3

Key takeaways

  • Redshift COPY is pipe-delimited by default, so teams loading comma-separated files must explicitly use CSV or DELIMITER settings instead of assuming comma is automatic.
  • CSV mode changes parsing behavior around quoting and embedded delimiters, which is why a plain DELIMITER-based load can fail on files that look like ordinary CSV.
  • Compression settings such as GZIP, BZIP2, LZOP, and ZSTD affect throughput and parallelism, and compressed loads still need correct delimiter, quote, encoding, and bad-row strategy.
  • The safest pattern is to validate structure before COPY, stage raw loads deliberately, and use NOLOAD, MAXERROR, manifests, and multi-file S3 layouts to debug before production runs.


Redshift COPY: delimiter surprises and compression

Amazon Redshift COPY is fast enough that teams often trust it too early.

That is usually where the trouble starts.

A file lands in S3, somebody points COPY at the bucket, the load works in dev once, and everyone assumes the contract is settled. Then one producer changes delimiter, another gzips the file, another starts including commas inside quoted text, and the warehouse suddenly fills with partial rows, skipped records, or hard failures.

This topic matters because many teams search for it only after seeing one of these symptoms:

  • Redshift COPY delimiter error
  • Redshift COPY CSV vs DELIMITER
  • Redshift COPY gzip from S3
  • Redshift COPY quoted commas fail
  • Redshift COPY load compressed CSV
  • Redshift COPY default delimiter
  • Redshift COPY manifest with gzip
  • Redshift COPY MAXERROR or NOLOAD
  • Redshift COPY performance on many files
  • Redshift COPY remove quotes or escape characters

This guide is built for those real production failures.

The central point is simple:

Redshift COPY is not guessing what your file means. It is enforcing a parsing contract.

If that contract is wrong, the warehouse load is wrong.

Why Redshift COPY surprises teams

Teams coming from spreadsheets or pandas often assume a text file is “basically CSV” if it opens in Excel.

Redshift does not work that way.

The Redshift documentation is explicit: by default, COPY expects character-delimited UTF-8 text, and the default delimiter is a pipe (|). That one detail alone explains a huge number of first-load failures.

If your producer emits comma-separated files and your COPY statement does not specify CSV or an explicit delimiter, Redshift is not being difficult. It is doing exactly what it was told.

That one default sits behind several recurring problem areas:

  • Redshift default delimiter confusion
  • Redshift CSV import errors
  • Redshift warehouse staging best practices
  • Redshift compressed file loading
  • Redshift COPY warehouse debugging

The first surprise: Redshift is not comma-delimited by default

This is the highest-value point in the article because it catches so many teams.

For delimited text input, Redshift COPY defaults to:

  • UTF-8 text
  • character-delimited input
  • pipe as the delimiter

So when people ask:

Why did my CSV fail in Redshift?

the first question is often:

Did you actually tell Redshift it was CSV?

If the file is comma-separated and you do not specify CSV, Redshift will not assume comma. If the file is tab-delimited and you do not specify DELIMITER '\t', Redshift will not infer tab. If the file is semicolon-separated because it came from a European export, Redshift will not infer that either.

That is why delimiter debugging is not a minor detail. It is usually the start of the whole load contract.
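To make the contract explicit, here is a minimal sketch. The table, bucket, and IAM role names are hypothetical placeholders:

```sql
-- Without CSV or DELIMITER, COPY expects pipe-delimited text, so a
-- comma-separated file fails column-count checks or loads whole lines
-- into the first column. Declare the contract explicitly:
COPY staging.orders
FROM 's3://example-bucket/orders/2026-04-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV;

-- Tab-delimited input must be declared too; nothing is inferred:
COPY staging.orders
FROM 's3://example-bucket/orders-tsv/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
DELIMITER '\t';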

CSV mode is not the same as plain DELIMITER mode

A lot of teams think CSV is just shorthand for DELIMITER ','.

It is not.

Redshift’s documentation separates CSV format from generic character-delimited text because CSV mode changes how fields are interpreted, especially around quotation marks and delimiters inside quoted strings.

That matters when your input contains values like:

customer_id,company_name,notes
101,"Acme, Inc.","Ships on Fridays"

If you use plain delimiter logic without the right CSV handling, the comma inside "Acme, Inc." can be treated as a field boundary instead of text. The result is a malformed row count, shifted columns, or outright load failure.
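The difference is visible in the COPY statement itself. A sketch with hypothetical table and bucket names:

```sql
-- Plain DELIMITER treats quotes as literal characters, so the comma
-- inside "Acme, Inc." splits the field: the Acme row parses as four
-- fields instead of three and the load errors or shifts columns.
COPY staging.companies
FROM 's3://example-bucket/companies.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
DELIMITER ',';

-- CSV mode keeps quoted delimiters inside the field:
COPY staging.companies
FROM 's3://example-bucket/companies.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV IGNOREHEADER 1;
```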

This single issue surfaces under many different names:

  • Redshift COPY CSV quotes
  • Redshift COPY commas inside quoted fields
  • Redshift COPY invalid quote handling
  • Redshift COPY file opens in Excel but fails in warehouse

The second surprise: quoting rules decide whether the row is valid

CSV is simple until quotation marks show up.

Then it becomes a contract problem.

In Redshift CSV mode, quoted fields follow CSV-style handling. The docs note that fields containing the delimiter can be enclosed in quotation marks, and embedded quotation marks must be doubled appropriately.

That means a field like this:

"A ""quoted"" word"

is valid CSV text for a value containing inner quotes.

But teams often receive files where producers do one of these instead:

  • forget to quote a field containing a delimiter
  • escape quotes with backslashes instead of CSV doubling
  • mix quoted and unquoted conventions across exports
  • emit line breaks inside fields without consistent CSV wrapping

Once that happens, Redshift is not dealing with “messy text.” It is dealing with broken structure.
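The quote character itself can be declared. A minimal sketch, with hypothetical names:

```sql
-- The CSV quote character defaults to the double quotation mark;
-- embedded quotes must be doubled, so the field
--   "A ""quoted"" word"
-- loads as the value: A "quoted" word
COPY staging.notes
FROM 's3://example-bucket/notes.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV QUOTE AS '"';   -- explicit here, though '"' is already the default
```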

The third surprise: compression does not fix bad structure

Compressed files load efficiently, but compression is not a parsing feature.

Redshift supports compressed load inputs such as:

  • GZIP
  • BZIP2
  • LZOP
  • ZSTD

The docs also note that ZSTD is supported with COPY from Amazon S3.

That is useful for performance and storage, but it creates a common misunderstanding:

We gzipped the file, so why does COPY still fail?

Because compression only changes how bytes are stored and transferred. It does not change:

  • delimiter behavior
  • quote rules
  • encoding assumptions
  • bad-line handling
  • header row expectations
  • column count consistency

A broken CSV inside a .gz file is still a broken CSV.
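Declaring compression is one extra keyword, not a change of parsing rules. A sketch with hypothetical names:

```sql
COPY staging.events
FROM 's3://example-bucket/events/2026-04-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV
GZIP;   -- decompress .gz objects during the load

-- GZIP only changes how bytes arrive; the decompressed text must
-- still satisfy the CSV/DELIMITER contract declared above.
```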

Compression and throughput: what teams usually miss

Compression helps, but the load pattern matters just as much.

Redshift documentation recommends splitting large loads into multiple smaller files and loading them in parallel. The COPY command can load multiple files from a common S3 prefix or from a manifest, which is important because one giant compressed file can limit parallelism compared with a well-sized set of multiple files.

This is one of the most practical angles, because it answers questions like:

  • Redshift COPY gzip performance
  • Redshift COPY multiple files vs one file
  • Redshift load compressed CSV from S3
  • Redshift manifest best practices
  • Redshift parallel load compressed files

Good practical rule

Do not only ask:

  • “Should the file be gzipped?”

Also ask:

  • “Should the load be one big file or many parallelizable files?”

In many real pipelines, that matters more.
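The parallel pattern looks like this, assuming a hypothetical layout where a large export is split into parts under one prefix:

```sql
-- Objects: s3://example-bucket/events/2026-04-01/part-0000.csv.gz,
--          part-0001.csv.gz, ... (hypothetical layout)
-- Loading by prefix lets the cluster work on many files in parallel,
-- which one giant .gz file cannot do.
COPY staging.events
FROM 's3://example-bucket/events/2026-04-01/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV GZIP;
```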

The fourth surprise: file compression and table compression are different topics

Another reason people search for “compression” here is that Redshift has file compression for input files and column compression encodings inside the warehouse.

Those are not the same thing.

File compression

This is about the source file:

  • .gz
  • .bz2
  • .lzo
  • .zst

Table compression / encodings

This is about how Redshift stores columns after load.

The COMPUPDATE option matters here. Redshift’s load-operation docs explain that when COMPUPDATE is omitted, Redshift can choose column encodings only under certain conditions, such as empty target tables without preassigned encodings beyond RAW.

That means teams discussing “compression” often mix two separate questions:

  1. How should I compress the input file?
  2. How should Redshift encode the loaded table?

They are related operationally, but they are not interchangeable.
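The distinction shows up directly in the COPY statement. A sketch, with hypothetical names:

```sql
-- First load into an empty table: let Redshift assign column encodings
-- based on column data types, without sampling the data.
COPY analytics.orders
FROM 's3://example-bucket/orders/initial/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV GZIP
COMPUPDATE PRESET;

-- Routine incremental loads: keep the table's existing encodings.
COPY analytics.orders
FROM 's3://example-bucket/orders/2026-04-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV GZIP
COMPUPDATE OFF;
```

Note that GZIP here is file compression for the input; COMPUPDATE governs how the loaded columns are encoded inside the warehouse.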

The fifth surprise: S3 layout can be as important as SQL syntax

A lot of failed COPY discussions focus entirely on the SQL statement. But the S3 layout often decides whether the load is clean and fast.

Redshift’s S3 loading docs allow:

  • a single object path
  • a common key prefix
  • a manifest file listing exact objects

That matters because production pipelines often need one of these behaviors:

Prefix-based loading

Useful when every file in a path belongs to the same load batch and format.

Manifest-based loading

Useful when:

  • only certain files belong to the batch
  • files are spread across locations
  • you want explicit reproducibility
  • you need to avoid accidental stray files matching a prefix

Multi-file parallel loading

Useful when you want Redshift to load more efficiently from several files rather than a single oversized export.
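A manifest-based load names the exact objects, even across buckets. A sketch with hypothetical object keys:

```sql
-- s3://example-bucket/manifests/batch-42.manifest (hypothetical) contains:
-- {
--   "entries": [
--     {"url": "s3://example-bucket/events/part-0000.csv.gz", "mandatory": true},
--     {"url": "s3://other-bucket/backfill/part-0001.csv.gz", "mandatory": true}
--   ]
-- }
COPY staging.events
FROM 's3://example-bucket/manifests/batch-42.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV GZIP
MANIFEST;   -- treat the FROM path as a file list, not data
```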

These layout choices map directly to common questions:

  • Redshift COPY manifest
  • Redshift COPY load by S3 prefix
  • Redshift COPY exact file list
  • Redshift compressed file parallel loading

The delimiter surprises teams hit most often

Here are the patterns that keep breaking pipelines.

1. Assuming comma is the default

It is not. Pipe is the default for delimited text.

2. Using DELIMITER on real CSV files with quoted commas

A plain delimiter load can fail when quoted CSV semantics are actually needed.

3. Loading semicolon exports without declaring the delimiter

Very common with regional vendor exports.

4. Using tab-delimited files that are named .csv

Spreadsheet users do this all the time.

5. Forgetting headers

A file with a header row can fail or misload if the IGNOREHEADER setting is not chosen deliberately.

6. Misunderstanding NULLs and blanks

Redshift conversion options like NULL AS, BLANKSASNULL, and EMPTYASNULL change semantics materially. Those should be chosen, not guessed.

7. Mixing quote-repair with structure-repair

If quoting is wrong, this is not just a type issue. It is a row-boundary problem.
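For item 6 above, the conversion options can be declared explicitly rather than guessed. A minimal sketch in plain delimited mode, with hypothetical names:

```sql
COPY staging.customers
FROM 's3://example-bucket/customers/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
DELIMITER ','
NULL AS 'NULL'   -- the literal string NULL loads as SQL NULL
BLANKSASNULL     -- whitespace-only CHAR/VARCHAR fields become NULL
EMPTYASNULL;     -- empty CHAR/VARCHAR fields become NULL
```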

Redshift COPY options that matter most for this topic

This is the practical center of the article.

CSV

Use it when your file is real CSV and may contain delimiters inside quoted fields.

DELIMITER

Use it when the producer contract is known and the file is plain character-delimited text. Examples:

  • DELIMITER ','
  • DELIMITER '|'
  • DELIMITER '\t'

IGNOREHEADER

Useful when the file includes a single header row you want Redshift to skip.

GZIP, BZIP2, LZOP, ZSTD

Tell Redshift the source files are compressed and should be decompressed during load.

MAXERROR

Useful when you need controlled tolerance for some row errors, but dangerous if teams start using it as a substitute for understanding the problem.

NOLOAD

One of the best debugging options. It validates the load without actually inserting data.

ESCAPE and REMOVEQUOTES

Important when source text includes quoting or escaping behavior that differs from a clean CSV producer. Note that these belong to plain delimited mode and cannot be combined with the CSV parameter.

COMPUPDATE

Relevant to warehouse storage encoding decisions, especially on empty tables and first-load scenarios.
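Several of these options combine naturally in one statement. A sketch of a typical compressed CSV load, with hypothetical names:

```sql
COPY staging.orders
FROM 's3://example-bucket/orders/2026-04-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV              -- real CSV quoting semantics
IGNOREHEADER 1   -- skip the single header row
GZIP             -- input objects are .gz
MAXERROR 10;     -- tolerate a few bad rows during triage, not forever
```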

The safest Redshift COPY workflow for delimited and compressed files

If the data matters, this is the pattern to use.

Step 1. Validate the raw file before warehouse load

Check:

  • delimiter
  • header behavior
  • encoding
  • consistent column counts
  • quoted commas
  • quoted newlines
  • broken rows

Step 2. Decide whether the file is real CSV or generic delimited text

Do not blur these together.

Step 3. Choose the right compression and file layout

Ask:

  • one large file or many smaller files?
  • prefix-based load or manifest?
  • gzip or another supported compression format?

Step 4. Test with NOLOAD

This catches structure issues early.
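A NOLOAD run is the same statement with one extra keyword. A sketch with hypothetical names:

```sql
-- Parses and validates every row against the table, inserts nothing.
-- Errors still land in STL_LOAD_ERRORS for inspection.
COPY staging.orders
FROM 's3://example-bucket/orders/2026-04-01/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
CSV IGNOREHEADER 1 GZIP
NOLOAD;
```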

Step 5. Load into a staging table first

Especially when:

  • the producer is external
  • the schema changes often
  • you are debugging dates, nulls, or coercions
  • you do not fully trust the file yet

Step 6. Apply domain rules after structural validity

Structure first. Business rules second.

That order is what keeps error triage sane.

Example patterns

Example 1: clean comma-separated CSV from S3

Use:

  • CSV
  • optional IGNOREHEADER 1
  • maybe GZIP if compressed

This is the typical “happy path.”

Example 2: semicolon-delimited vendor export

Use:

  • DELIMITER ';'
  • only use CSV if the vendor is truly emitting CSV-style quoting around semicolon-delimited fields
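For this pattern, a minimal sketch with hypothetical names:

```sql
COPY staging.vendor_feed
FROM 's3://example-bucket/vendor/export.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole'
DELIMITER ';';

-- If the vendor also quotes fields CSV-style, use instead:
--   CSV DELIMITER ';'
```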

Example 3: pipe-delimited compressed batch files

Use:

  • default pipe delimiter or explicit DELIMITER '|'
  • GZIP or other compression flag
  • multi-file S3 prefix or manifest

Example 4: debugging a suspicious load

Use:

  • NOLOAD
  • staging table
  • explicit delimiter
  • minimal conversion assumptions
  • small representative sample

What NOT to do

These are the anti-patterns that cause the most damage.

Do not assume spreadsheet behavior equals warehouse behavior

Excel opening the file proves very little.

Do not use MAXERROR to hide a producer problem

It can help triage, but it should not become the architecture.

Do not compress first and validate never

Compression is not quality.

Do not let one giant monolithic file become the default forever

Parallel file layout is often better operationally.

Do not confuse file compression with Redshift column compression

They solve different problems.

Do not skip staging when the source is unreliable

Warehouse facts tables are a bad place to discover a delimiter bug.

Debugging checklist for Redshift COPY failures

When a load fails, check in this order:

  1. Is the file actually UTF-8 text in the expected encoding?
  2. Is the delimiter declared correctly?
  3. Is the file real CSV that needs CSV mode?
  4. Are there delimiters inside quoted fields?
  5. Is there a header row?
  6. Are compressed files declared with the right compression option?
  7. Are multiple files being loaded by prefix when a manifest would be safer?
  8. Would NOLOAD reveal the problem before insert?
  9. Should the table be loaded through staging first?
  10. Are conversion settings causing the apparent failure rather than delimiter rules?

That sequence saves a lot of time.
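When a COPY does fail, the rejected rows and reasons are recorded in the STL_LOAD_ERRORS system table. A minimal query for recent failures:

```sql
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
```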

Which tools fit this workflow?

The best companions for this workflow are structural validators, not spreadsheet tips: CSV linters, encoding checkers, and column-count or schema validators. They are most useful before the file ever reaches S3 or the warehouse.


FAQ

What delimiter does Redshift COPY use by default?

For delimited text input, Redshift expects character-delimited UTF-8 text and uses the pipe character by default unless you specify otherwise.

Should I use CSV or DELIMITER in Redshift COPY?

Use CSV when the file follows CSV quoting rules and may contain delimiters inside quoted fields. Use plain DELIMITER for simpler character-delimited text when you control the producer and do not need CSV quote semantics.

Can Redshift COPY load gzip files from S3?

Yes. Redshift supports compressed input with options including GZIP, BZIP2, LZOP, and ZSTD, with ZSTD documented for S3-based loads.

Why does Redshift COPY fail on files that open in Excel?

Because Excel is not the parsing contract. Redshift needs the file’s delimiter, quote behavior, encoding, header handling, and null conventions to match the COPY statement exactly.

What is the best way to debug a Redshift COPY load?

Validate the file structure first, then test the load with NOLOAD, verify delimiter and quote rules, and use a staging table before loading the final target.

Final takeaway

Redshift COPY is powerful because it is strict.

That same strictness is what makes delimiter mistakes, CSV misunderstandings, and compression assumptions so expensive in production.

The safest baseline is:

  • validate the file first
  • declare delimiter behavior explicitly
  • use CSV when the file is truly CSV
  • treat compression as a transport choice, not a repair strategy
  • split large loads into sensible files
  • use manifests when precision matters
  • debug with NOLOAD
  • stage before final inserts

That is how you turn COPY from a source of surprises into a reliable warehouse ingestion boundary.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.
