Redshift COPY: delimiter surprises and compression
Level: intermediate · ~14 min read · Intent: informational
Audience: data engineers, analytics engineers, backend developers, ops engineers, technical teams
Prerequisites
- basic familiarity with CSV files
- basic familiarity with SQL
- basic familiarity with Amazon S3
Key takeaways
- Redshift COPY is pipe-delimited by default, so teams loading comma-separated files must explicitly use CSV or DELIMITER settings instead of assuming comma is automatic.
- CSV mode changes parsing behavior around quoting and embedded delimiters, which is why a plain DELIMITER-based load can fail on files that look like ordinary CSV.
- Compression settings such as GZIP, BZIP2, LZOP, and ZSTD affect throughput and parallelism, and compressed loads still need correct delimiter, quote, encoding, and bad-row strategy.
- The safest pattern is to validate structure before COPY, stage raw loads deliberately, and use NOLOAD, MAXERROR, manifests, and multi-file S3 layouts to debug before production runs.
References
- Amazon Redshift COPY
- Amazon Redshift data format parameters
- Amazon Redshift file compression parameters
- Amazon Redshift data load operations
- Amazon Redshift COPY from Amazon S3
- Amazon Redshift loading data from compressed and uncompressed files
- Amazon Redshift compressing your data files
- Amazon Redshift COPY examples
- Amazon Redshift data conversion parameters
- RFC 4180
FAQ
- What delimiter does Redshift COPY use by default?
- For delimited text, Amazon Redshift COPY expects character-delimited UTF-8 text and uses the pipe character by default unless you specify CSV or a DELIMITER value.
- Should I use CSV or DELIMITER in Redshift COPY?
- Use CSV when your file follows CSV quoting rules and can contain delimiters inside quoted fields. Use plain DELIMITER for simpler character-delimited text when you control the producer and do not need CSV-style quote semantics.
- Can Redshift COPY load gzip files from S3?
- Yes. COPY supports compressed input files with parameters such as GZIP, BZIP2, LZOP, and ZSTD, with ZSTD support documented for S3 loads.
- Why does Redshift COPY fail on files that open fine in Excel?
- Excel is a viewer and editor with its own import behavior. Redshift COPY expects the file contract to match delimiter, quote, encoding, null, and date settings exactly, so spreadsheet-friendly files can still fail in the warehouse.
- How should I debug a Redshift COPY load before production?
- Validate the file first, then use a staging table, test with NOLOAD, inspect delimiter and quote rules, and only then run the real COPY with the right compression and error-handling options.
Redshift COPY: delimiter surprises and compression
Amazon Redshift COPY is fast enough that teams often trust it too early.
That is usually where the trouble starts.
A file lands in S3, somebody points COPY at the bucket, the load works in dev once, and everyone assumes the contract is settled. Then one producer changes delimiter, another gzips the file, another starts including commas inside quoted text, and the warehouse suddenly fills with partial rows, skipped records, or hard failures.
This topic matters because many teams search for it only after seeing one of these symptoms:
- Redshift COPY delimiter error
- Redshift COPY CSV vs DELIMITER
- Redshift COPY gzip from S3
- Redshift COPY quoted commas fail
- Redshift COPY load compressed CSV
- Redshift COPY default delimiter
- Redshift COPY manifest with gzip
- Redshift COPY MAXERROR or NOLOAD
- Redshift COPY performance on many files
- Redshift COPY remove quotes or escape characters
This guide is built for those real production failures.
The central point is simple:
Redshift COPY is not guessing what your file means.
It is enforcing a parsing contract.
If that contract is wrong, the warehouse load is wrong.
Why Redshift COPY surprises teams
Teams coming from spreadsheets or pandas often assume a text file is “basically CSV” if it opens in Excel.
Redshift does not work that way.
The Redshift documentation is explicit: by default, COPY expects character-delimited UTF-8 text, and the default delimiter is a pipe (|). That one detail alone explains a huge number of first-load failures.
If your producer emits comma-separated files and your COPY statement does not specify CSV or an explicit delimiter, Redshift is not being difficult. It is doing exactly what it was told.
That one default sits behind several recurring problem areas:
- Redshift default delimiter confusion
- Redshift CSV import errors
- Redshift warehouse staging best practices
- Redshift compressed file loading
- Redshift COPY warehouse debugging
The first surprise: Redshift is not comma-delimited by default
This is the highest-value point in the article because it catches so many teams.
For delimited text input, Redshift COPY defaults to:
- UTF-8 text
- character-delimited input
- pipe as the delimiter
So when people ask:
Why did my CSV fail in Redshift?
the first question is often:
Did you actually tell Redshift it was CSV?
If the file is comma-separated and you do not specify CSV, Redshift will not assume comma.
If the file is tab-delimited and you do not specify DELIMITER '\t', Redshift will not infer tab.
If the file is semicolon-separated because it came from a European export, Redshift will not infer that either.
That is why delimiter debugging is not a minor detail. It is usually the start of the whole load contract.
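To make the contract concrete, here is a minimal before-and-after sketch. The table, bucket, and IAM role names are placeholders; only the delimiter handling is the point.

```sql
-- Misparses a comma-separated file: with no CSV or DELIMITER option,
-- COPY expects pipe-delimited text, so each line lands in one column
-- or fails the column-count check.
COPY staging.orders
FROM 's3://example-bucket/exports/orders.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load';

-- Declares the contract explicitly for plain comma-delimited text.
COPY staging.orders
FROM 's3://example-bucket/exports/orders.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
DELIMITER ',';
```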
CSV mode is not the same as plain DELIMITER mode
A lot of teams think CSV is just shorthand for DELIMITER ','.
It is not.
Redshift’s documentation separates CSV format from generic character-delimited text because CSV mode changes how fields are interpreted, especially around quotation marks and delimiters inside quoted strings.
That matters when your input contains values like:
customer_id,company_name,notes
101,"Acme, Inc.","Ships on Fridays"
If you use plain delimiter logic without the right CSV handling, the comma inside "Acme, Inc." can be treated as a field boundary instead of text.
The result is a malformed row count, shifted columns, or outright load failure.
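A sketch of the fix, with placeholder names again: CSV mode tells Redshift to honor the quoting, so the embedded comma stays inside the field.

```sql
-- CSV mode applies CSV quoting rules, so the comma inside
-- "Acme, Inc." is data, not a field boundary.
COPY staging.customers
FROM 's3://example-bucket/exports/customers.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1;  -- skip the customer_id,company_name,notes header
```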
This one behavior explains a whole family of common failures:
- Redshift COPY CSV quotes
- Redshift COPY commas inside quoted fields
- Redshift COPY invalid quote handling
- Redshift COPY file opens in Excel but fails in warehouse
The second surprise: quoting rules decide whether the row is valid
CSV is simple until quotation marks show up.
Then it becomes a contract problem.
In Redshift CSV mode, quoted fields follow CSV-style handling. The docs note that fields containing the delimiter can be enclosed in quotation marks, and embedded quotation marks must be doubled appropriately.
That means a field like this:
"A ""quoted"" word"
is valid CSV text for a value containing inner quotes.
But teams often receive files where producers do one of these instead:
- forget to quote a field containing a delimiter
- escape quotes with backslashes instead of CSV doubling
- mix quoted and unquoted conventions across exports
- emit line breaks inside fields without consistent CSV wrapping
Once that happens, Redshift is not dealing with “messy text.” It is dealing with broken structure.
The third surprise: compression does not fix bad structure
Compressed files load efficiently, but compression is not a parsing feature.
Redshift supports compressed load inputs such as GZIP, BZIP2, LZOP, and ZSTD.
The docs also note that ZSTD is supported with COPY from Amazon S3.
That is useful for performance and storage, but it creates a common misunderstanding:
We gzipped the file, so why does COPY still fail?
Because compression only changes how bytes are stored and transferred. It does not change:
- delimiter behavior
- quote rules
- encoding assumptions
- bad-line handling
- header row expectations
- column count consistency
A broken CSV inside a .gz file is still a broken CSV.
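In practice that means a compressed load still carries the full parsing contract. A minimal sketch, with placeholder names:

```sql
-- GZIP only tells COPY to decompress during the load; CSV mode,
-- header handling, and encoding assumptions are unchanged.
COPY staging.events
FROM 's3://example-bucket/exports/events.csv.gz'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP;
```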
Compression and throughput: what teams usually miss
Compression helps, but the load pattern matters just as much.
Redshift documentation recommends splitting large loads into multiple smaller files and loading them in parallel. The COPY command can load multiple files from a common S3 prefix or from a manifest, which is important because one giant compressed file can limit parallelism compared with a well-sized set of multiple files.
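A sketch of the multi-file pattern, assuming the export is split into part files under one prefix (names are placeholders):

```sql
-- Every object whose key starts with the prefix is loaded, so a set
-- of part files can be read in parallel instead of one large archive.
COPY staging.events
FROM 's3://example-bucket/exports/2024-06-01/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
GZIP;
```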
This load-pattern question comes up constantly:
- Redshift COPY gzip performance
- Redshift COPY multiple files vs one file
- Redshift load compressed CSV from S3
- Redshift manifest best practices
- Redshift parallel load compressed files
Good practical rule
Do not only ask:
- “Should the file be gzipped?”
Also ask:
- “Should the load be one big file or many parallelizable files?”
In many real pipelines, that matters more.
The fourth surprise: file compression and table compression are different topics
Another reason people search for “compression” here is that Redshift has file compression for input files and column compression encodings inside the warehouse.
Those are not the same thing.
File compression
This is about the source file: .gz, .bz2, .lzo, or .zst.
Table compression / encodings
This is about how Redshift stores columns after load.
The COMPUPDATE option matters here. Redshift’s load-operation docs explain that when COMPUPDATE is omitted, Redshift can choose column encodings only under certain conditions, such as empty target tables without preassigned encodings beyond RAW.
That means teams discussing “compression” often mix two separate questions:
- How should I compress the input file?
- How should Redshift encode the loaded table?
They are related operationally, but they are not interchangeable.
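A sketch of how the two decisions show up in one statement (placeholder names; the file-compression flag and COMPUPDATE address different layers):

```sql
-- GZIP describes the input files; COMPUPDATE governs column
-- encodings in the warehouse table. First load into an empty table,
-- letting Redshift analyze and choose encodings:
COPY analytics.facts
FROM 's3://example-bucket/exports/facts/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV GZIP
COMPUPDATE ON;

-- Routine incremental loads usually keep existing encodings stable:
COPY analytics.facts
FROM 's3://example-bucket/exports/facts/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV GZIP
COMPUPDATE OFF;
```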
The fifth surprise: S3 layout can be as important as SQL syntax
A lot of failed COPY discussions focus entirely on the SQL statement.
But the S3 layout often decides whether the load is clean and fast.
Redshift’s S3 loading docs allow:
- a single object path
- a common key prefix
- a manifest file listing exact objects
That matters because production pipelines often need one of these behaviors:
Prefix-based loading
Useful when every file in a path belongs to the same load batch and format.
Manifest-based loading
Useful when:
- only certain files belong to the batch
- files are spread across locations
- you want explicit reproducibility
- you need to avoid accidental stray files matching a prefix
Multi-file parallel loading
Useful when you want Redshift to load more efficiently from several files rather than a single oversized export.
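A sketch of the manifest pattern. The JSON shape follows the documented Redshift manifest format; paths and names are placeholders.

```json
{
  "entries": [
    {"url": "s3://example-bucket/exports/2024-06-01/part-0000.csv.gz", "mandatory": true},
    {"url": "s3://example-bucket/exports/2024-06-01/part-0001.csv.gz", "mandatory": true}
  ]
}
```

```sql
-- MANIFEST tells COPY the FROM path points at a manifest file,
-- so exactly these objects are loaded and nothing else.
COPY staging.events
FROM 's3://example-bucket/manifests/2024-06-01.manifest'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
MANIFEST
CSV
GZIP;
```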
These layout choices map directly onto common questions:
- Redshift COPY manifest
- Redshift COPY load by S3 prefix
- Redshift COPY exact file list
- Redshift compressed file parallel loading
The delimiter surprises teams hit most often
Here are the patterns that keep breaking pipelines.
1. Assuming comma is the default
It is not. Pipe is the default for delimited text.
2. Using DELIMITER on real CSV files with quoted commas
A plain delimiter load can fail when quoted CSV semantics are actually needed.
3. Loading semicolon exports without declaring the delimiter
Very common with regional vendor exports.
4. Using tab-delimited files that are named .csv
Spreadsheet users do this all the time.
5. Forgetting headers
A file with a header row can fail or misload if the IGNOREHEADER strategy is not deliberate.
6. Misunderstanding NULLs and blanks
Redshift conversion options like NULL AS, BLANKSASNULL, and EMPTYASNULL change semantics materially. Those should be chosen deliberately, not guessed; see the sketch after this list.
7. Mixing quote-repair with structure-repair
If quoting is wrong, this is not just a type issue. It is a row-boundary problem.
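For point 6 above, here is a sketch of making the null contract explicit. The `\N` convention is an assumption about the producer; declare whatever your files actually emit.

```sql
-- Declare the null semantics instead of inheriting defaults.
COPY staging.accounts
FROM 's3://example-bucket/exports/accounts.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
NULL AS '\N'     -- assumed producer convention for SQL NULL
BLANKSASNULL     -- whitespace-only text fields load as NULL
EMPTYASNULL;     -- empty text fields load as NULL
```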
Redshift COPY options that matter most for this topic
This is the practical center of the article.
CSV
Use it when your file is real CSV and may contain delimiters inside quoted fields.
DELIMITER
Use it when the producer contract is known and the file is plain character-delimited text. Examples:
- DELIMITER ','
- DELIMITER '|'
- DELIMITER '\t'
IGNOREHEADER
Useful when the file includes a single header row you want Redshift to skip.
GZIP, BZIP2, LZOP, ZSTD
Tell Redshift the source files are compressed and should be decompressed during load.
MAXERROR
Useful when you need controlled tolerance for some row errors, but dangerous if teams start using it as a substitute for understanding the problem.
NOLOAD
One of the best debugging options. It validates the load without actually inserting data.
ESCAPE and REMOVEQUOTES
Important when source text includes quoting or escaping behavior that differs from a clean CSV producer.
COMPUPDATE
Relevant to warehouse storage encoding decisions, especially on empty tables and first-load scenarios.
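Several of these options come together naturally in a debugging pass. A minimal sketch, with placeholder names:

```sql
-- Dry run: NOLOAD parses and validates the files without inserting
-- rows; MAXERROR caps tolerated row errors during triage.
COPY staging.orders
FROM 's3://example-bucket/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP
MAXERROR 10
NOLOAD;
```

After a failed run, the STL_LOAD_ERRORS system view records the raw line and the reason each rejected row failed, which is usually faster than re-reading the source file by hand.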
The safest Redshift COPY workflow for delimited and compressed files
If the data matters, this is the pattern to use.
Step 1. Validate the raw file before warehouse load
Check:
- delimiter
- header behavior
- encoding
- consistent column counts
- quoted commas
- quoted newlines
- broken rows
Step 2. Decide whether the file is real CSV or generic delimited text
Do not blur these together.
Step 3. Choose the right compression and file layout
Ask:
- one large file or many smaller files?
- prefix-based load or manifest?
- gzip or another supported compression format?
Step 4. Test with NOLOAD
This catches structure issues early.
Step 5. Load into a staging table first
Especially when:
- the producer is external
- the schema changes often
- you are debugging dates, nulls, or coercions
- you do not fully trust the file yet
Step 6. Apply domain rules after structural validity
Structure first. Business rules second.
That order is what keeps error triage sane.
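Put together, steps 4 through 6 often look like this sketch (placeholder schema and rule; the point is the ordering, not the specific predicate):

```sql
-- Stage the raw load in a table shaped like the target.
CREATE TABLE staging.orders_raw (LIKE analytics.orders);

COPY staging.orders_raw
FROM 's3://example-bucket/exports/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
CSV
IGNOREHEADER 1
GZIP;

-- Apply domain rules only after the structure has loaded cleanly.
INSERT INTO analytics.orders
SELECT *
FROM staging.orders_raw
WHERE order_date IS NOT NULL;  -- hypothetical business rule
```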
Example patterns
Example 1: clean comma-separated CSV from S3
Use:
- CSV
- optionally IGNOREHEADER 1
- GZIP if compressed
This is the typical “happy path.”
Example 2: semicolon-delimited vendor export
Use:
- DELIMITER ';'
- CSV only if the vendor is truly emitting CSV-style quoting around semicolon-delimited fields
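A sketch for this case, assuming the vendor emits plain semicolon-delimited text with a header row (names are placeholders):

```sql
-- Plain character-delimited text: no CSV quoting semantics assumed.
COPY staging.vendor_feed
FROM 's3://example-bucket/vendor/feed.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
DELIMITER ';'
IGNOREHEADER 1;
```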
Example 3: pipe-delimited compressed batch files
Use:
- default pipe delimiter or explicit DELIMITER '|'
- GZIP or another compression flag
- multi-file S3 prefix or manifest
Example 4: debugging a suspicious load
Use:
- NOLOAD
- staging table
- explicit delimiter
- minimal conversion assumptions
- small representative sample
What NOT to do
These are the anti-patterns that cause the most damage in practice.
Do not assume spreadsheet behavior equals warehouse behavior
Excel opening the file proves very little.
Do not use MAXERROR to hide a producer problem
It can help triage, but it should not become the architecture.
Do not compress first and validate never
Compression is not quality.
Do not let one giant monolithic file become the default forever
Parallel file layout is often better operationally.
Do not confuse file compression with Redshift column compression
They solve different problems.
Do not skip staging when the source is unreliable
Warehouse facts tables are a bad place to discover a delimiter bug.
Debugging checklist for Redshift COPY failures
When a load fails, check in this order:
- Is the file actually UTF-8 text in the expected encoding?
- Is the delimiter declared correctly?
- Is the file real CSV that needs CSV mode?
- Are there delimiters inside quoted fields?
- Is there a header row?
- Are compressed files declared with the right compression option?
- Are multiple files being loaded by prefix when a manifest would be safer?
- Would NOLOAD reveal the problem before insert?
- Should the table be loaded through staging first?
- Are conversion settings causing the apparent failure rather than delimiter rules?
That sequence saves a lot of time.
Which tools fit this workflow naturally?
The best companions for COPY debugging are structural validators, not spreadsheet tips.
Use:
- CSV Validator
- CSV Delimiter Checker
- CSV Header Checker
- CSV Row Checker
- Malformed CSV Checker
- CSV Splitter
- CSV Merge
These are most useful before the file ever reaches S3 or the warehouse.
FAQ
What delimiter does Redshift COPY use by default?
For delimited text input, Redshift expects character-delimited UTF-8 text and uses the pipe character by default unless you specify otherwise.
Should I use CSV or DELIMITER in Redshift COPY?
Use CSV when the file follows CSV quoting rules and may contain delimiters inside quoted fields. Use plain DELIMITER for simpler character-delimited text when you control the producer and do not need CSV quote semantics.
Can Redshift COPY load gzip files from S3?
Yes. Redshift supports compressed input with options including GZIP, BZIP2, LZOP, and ZSTD, with ZSTD documented for S3-based loads.
Why does Redshift COPY fail on files that open in Excel?
Because Excel is not the parsing contract. Redshift needs the file’s delimiter, quote behavior, encoding, header handling, and null conventions to match the COPY statement exactly.
What is the best way to debug a Redshift COPY load?
Validate the file structure first, then test the load with NOLOAD, verify delimiter and quote rules, and use a staging table before loading the final target.
Final takeaway
Redshift COPY is powerful because it is strict.
That same strictness is what makes delimiter mistakes, CSV misunderstandings, and compression assumptions so expensive in production.
The safest baseline is:
- validate the file first
- declare delimiter behavior explicitly
- use CSV when the file is truly CSV
- treat compression as a transport choice, not a repair strategy
- split large loads into sensible files
- use manifests when precision matters
- debug with NOLOAD
- stage before final inserts
That is how you turn COPY from a source of surprises into a reliable warehouse ingestion boundary.
About the author
Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.