COPY FROM CSV: NULL Markers and Encoding Gotchas in PostgreSQL

By Elysiate · Updated Apr 5, 2026

Tags: csv, postgresql, database, copy, encoding, null

Level: intermediate · ~12 min read · Intent: informational

Audience: developers, data analysts, ops engineers, database engineers

Prerequisites

  • basic familiarity with CSV files
  • basic familiarity with PostgreSQL

Key takeaways

  • In PostgreSQL COPY CSV mode, null handling is subtle: unquoted empty input is treated differently from quoted empty strings, and the NULL option changes how those values are interpreted.
  • FORCE_NULL and FORCE_NOT_NULL exist to control how quoted and unquoted values matching the null string are treated, and they can completely change the meaning of empty-looking fields.
  • Encoding problems are often caused by mismatched assumptions between file encoding, client encoding, and PostgreSQL COPY options, which is why staging-as-text is often the safest production pattern.

FAQ

Why does PostgreSQL COPY FROM CSV confuse NULL and empty strings?
Because CSV itself has no universal built-in distinction between NULL and empty string. PostgreSQL COPY CSV mode uses quoting and the NULL option to decide how to interpret empty-looking fields.
What is the default NULL behavior in PostgreSQL COPY CSV?
In CSV mode, the default null string is an unquoted empty string. A quoted empty string is treated as an empty string value instead of NULL unless other options change that behavior.
What do FORCE_NULL and FORCE_NOT_NULL do?
They control whether values matching the null string should be interpreted as NULL even when quoted, or should avoid NULL interpretation when unquoted. They are useful when source files mix ambiguous empty-looking values.
What is the safest way to import messy CSV into PostgreSQL?
A common safe pattern is to validate structure first, load into a staging table as text, and then apply explicit casts and business rules in SQL instead of relying on a single fragile load step.

COPY FROM CSV: NULL Markers and Encoding Gotchas in PostgreSQL

COPY FROM CSV is one of PostgreSQL’s best features for bulk loading data quickly.

It is also one of the easiest places to lose time in production when a CSV file looks fine to humans but does not mean what PostgreSQL thinks it means.

The two biggest sources of confusion are usually:

  • NULL markers
  • encoding

Those problems are especially dangerous because they do not always look dramatic. Sometimes the load fails loudly. Sometimes it succeeds and quietly turns the wrong values into NULL, empty strings, or bad text. Sometimes a date column fails only on a few rows. Sometimes a non-UTF-8 file works in one tool but not another. Sometimes the source system and the database are both "correct" but their assumptions are different.

This guide explains how PostgreSQL COPY FROM ... WITH (FORMAT csv) actually thinks about nulls and encodings, where the gotchas come from, and why staging tables are often the safest production pattern.

If you want the practical tools first, start with the Malformed CSV Checker, CSV Validator, CSV Splitter, CSV Merge, CSV to JSON, or the universal converter.

Why CSV null handling is awkward in the first place

CSV is a text interchange format, not a typed database format.

That means a CSV file can show values like:

  • nothing between delimiters
  • ""
  • NULL
  • N/A
  • \N
  • a blank-looking field that is actually quoted
  • a value that only looks empty because of whitespace or encoding weirdness

PostgreSQL has to map those text patterns into database semantics, and there is no universally perfect answer because CSV itself does not carry rich null semantics.

This is why null handling in COPY CSV is not just a syntax detail. It is part of the data contract between the file producer and the database loader.

The first thing to understand: quoted and unquoted values do not mean the same thing

This is the heart of most PostgreSQL CSV null confusion.

In CSV mode, PostgreSQL’s current docs explain that the default NULL string is an unquoted empty string. They also explain that quoted and unquoted values that match the null string can be treated differently, which is exactly why FORCE_NULL and FORCE_NOT_NULL exist.

In practical terms, this means PostgreSQL distinguishes between cases like:

  • an unquoted empty field
  • a quoted empty string
  • a literal string that matches your configured null marker
  • a quoted version of that same literal string

That is easy to forget when you are looking at a spreadsheet export where all of these may appear visually similar.

The default CSV null behavior in PostgreSQL

PostgreSQL’s documentation and long-standing COPY behavior make one subtle rule especially important:

  • in CSV mode, the default null marker is an unquoted empty string
  • a quoted empty string is treated as an empty string, not as NULL, under the default behavior

That difference is why this kind of file can behave differently than many teams expect:

id,name,note
1,Alice,
2,Bob,""

Under default CSV null behavior, those last fields are not necessarily the same thing.

That matters a lot for columns like:

  • optional text
  • dates
  • numbers
  • foreign keys
  • status fields
  • anything that downstream logic distinguishes between "missing" and "present but empty"
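One way to make the difference visible is to load the file into a throwaway table and inspect the note column directly. This is a sketch; the table name demo and the file path are placeholders:

CREATE TABLE demo (id int, name text, note text);

-- Default CSV behavior: unquoted empty field -> NULL,
-- quoted empty string "" -> empty string ''.
COPY demo FROM '/path/data.csv' WITH (FORMAT csv, HEADER true);

SELECT id, note IS NULL AS became_null, note = '' AS became_empty
FROM demo;

Under the defaults, Alice’s row should report became_null and Bob’s row should report became_empty, even though both fields look blank in a spreadsheet.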

Why this breaks real loads

This usually causes pain in one of three ways.

1. The file producer thinks empty means empty string

A source system may emit blank fields because it wants to represent empty text, not missing data.

2. The database team thinks blank means NULL

The PostgreSQL loader may be expected to treat certain blank-looking values as missing values.

3. Both are true in different columns

This is where the worst CSV contracts live. One file uses the same textual pattern to mean different things depending on the column.

That is exactly when production loads start becoming fragile.

The NULL option changes the contract

PostgreSQL COPY lets you choose a different null marker with the NULL option.

That means you can tell PostgreSQL that a specific string should be interpreted as null. But the moment you do that, you are explicitly choosing a text contract that the file producer must follow consistently.

A typical idea looks like this:

COPY staging_table
FROM '/path/data.csv'
WITH (
  FORMAT csv,
  HEADER true,
  NULL 'NULL'
);

This can be much clearer than relying on blank fields, but only if the source file is disciplined.

If the source system sometimes writes:

  • blank for null
  • NULL for null
  • "" for empty string
  • N/A for unknown
  • quoted "NULL" for a literal string

then you are still dealing with ambiguity. You have simply moved the ambiguity into a different marker.

FORCE_NULL: when quoted null-looking values should still become NULL

This option exists because quoted and unquoted values are normally treated differently.

The PostgreSQL docs say that FORCE_NULL causes values matching the null string to be treated as NULL even if the value is quoted. They also note that FORCE_NULL and FORCE_NOT_NULL can be used simultaneously on the same column.

That means FORCE_NULL is useful when the source file sometimes wraps null markers in quotes but you still want them interpreted as missing values.

Practical case:

  • source file emits "NULL" instead of bare NULL
  • without FORCE_NULL, that may be treated as literal text
  • with FORCE_NULL, PostgreSQL can still interpret it as NULL

This is useful, but it is also a signal that the upstream file contract is already a bit messy.
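As a sketch of that case, assuming a staging_table with a text column named note and a file that sometimes writes quoted "NULL" markers:

COPY staging_table (id, name, note)
FROM '/path/data.csv'
WITH (
  FORMAT csv,
  HEADER true,
  NULL 'NULL',
  FORCE_NULL (note)
);

Here the NULL option declares the marker, and FORCE_NULL (note) extends the match to quoted occurrences in that one column, so both NULL and "NULL" should load as SQL NULL.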

FORCE_NOT_NULL: when unquoted null-looking values should stay non-null

FORCE_NOT_NULL works in the opposite direction.

The PostgreSQL docs explain that it prevents values matching the null string from being interpreted as null for the specified columns. This is especially useful when you want empty-looking input to remain an empty string rather than becoming NULL.

This is helpful when:

  • your file producer uses blank fields for empty strings
  • the destination text column should preserve empty string as a real value
  • you do not want CSV default null behavior for specific columns

This is one of the cleanest ways to avoid the "blank text field became NULL" problem.
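A minimal sketch of that setup, again assuming a hypothetical staging_table with a text column note:

COPY staging_table (id, name, note)
FROM '/path/data.csv'
WITH (
  FORMAT csv,
  HEADER true,
  FORCE_NOT_NULL (note)
);

With the default null string (an unquoted empty field), FORCE_NOT_NULL (note) keeps blank note values as empty strings instead of converting them to NULL, while other columns keep the default behavior.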

FORCE_NULL and FORCE_NOT_NULL can be used together

This is a subtle but very practical PostgreSQL feature.

The current docs say that FORCE_NULL and FORCE_NOT_NULL can be used simultaneously on the same column, and that the result is:

  • quoted null strings become NULL
  • unquoted null strings become empty strings

This gives you a way to distinguish intentionally quoted null markers from bare empty-looking fields.

That is powerful, but it also means your loader logic is now encoding quite a lot of contract meaning. If you need this level of nuance, document it clearly so another engineer does not "simplify" the load later and change the data semantics accidentally.
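A sketch of the combined form, using the same hypothetical staging_table:

COPY staging_table (id, name, note)
FROM '/path/data.csv'
WITH (
  FORMAT csv,
  HEADER true,
  FORCE_NULL (note),
  FORCE_NOT_NULL (note)
);

-- For the note column, with the default null string:
--   "" (quoted empty)    -> NULL
--   (unquoted empty)     -> '' (empty string)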

The safest mental model for nulls

When loading CSV into PostgreSQL, ask these questions explicitly:

  • What exact text means NULL?
  • What exact text means empty string?
  • Does quoting change the meaning?
  • Are those rules consistent across all columns?
  • Are text columns and numeric/date columns supposed to behave the same way?
  • Does the file producer know this contract?

If those questions do not have clear answers, COPY is not the real problem. The file contract is.

Encoding problems are the other half of the pain

After null handling, encoding is the next major source of trouble.

A CSV file can look fine in one tool and still fail in PostgreSQL because:

  • the file is not actually UTF-8
  • the file contains characters outside the expected encoding
  • the source system exported in a legacy code page
  • the client and server are not assuming the same encoding
  • a BOM or locale-specific characters show up unexpectedly
  • a tool upstream silently rewrote the bytes

PostgreSQL’s character set documentation explains that clients interact with the server using client encodings, while the server itself has its own database encoding. It also notes that clients ordinarily care most about client_encoding.

That means encoding bugs are not only about "what bytes are in the file." They are also about which encoding PostgreSQL thinks it is reading.

Why ENCODING in COPY matters

COPY supports an ENCODING option, and the PostgreSQL docs note that if it is omitted, COPY uses the current client encoding. That one detail explains a lot of mysterious import behavior.

If the file was exported as one encoding but PostgreSQL is interpreting it through a different client_encoding, you can get:

  • load failures
  • unexpected replacement characters
  • silently wrong text
  • only certain rows failing when non-ASCII characters appear

So the encoding contract for CSV imports should not be implicit.

A disciplined load process should know:

  • what encoding the file producer emits
  • what encoding the COPY step expects
  • whether the session client_encoding already matches
  • whether the ENCODING option should be set explicitly
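When the file’s encoding is known to differ from the session default, declaring it in COPY is usually clearer than relying on client_encoding. This sketch assumes a Latin-1 export; substitute whatever encoding your producer actually emits:

COPY staging_table
FROM '/path/data.csv'
WITH (
  FORMAT csv,
  HEADER true,
  ENCODING 'LATIN1'
);

-- Alternatively, align the session before the load:
-- SET client_encoding = 'LATIN1';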

Common PostgreSQL encoding gotchas

1. Files that are not UTF-8 but are treated as UTF-8

This is one of the most common failure modes in multilingual or legacy-system environments.

2. Mixed assumptions across tools

A spreadsheet, a text editor, and PostgreSQL may not all agree on the encoding of the same file.

3. SQL_ASCII or legacy environments

PostgreSQL supports more than one encoding configuration, and that means environments are not always as strict or uniform as teams assume. The official character set docs make clear that PostgreSQL supports many encodings and that not all behave the same operationally.

4. Source systems that silently change export behavior

A feed that used to be clean UTF-8 may later include smart quotes, accented names, or Windows-1252-style exports without anyone updating the documentation.

Why staging tables are often the safest strategy

When CSV contracts are even slightly messy, loading directly into typed production tables is often the wrong first step.

A safer pattern is:

  1. validate the CSV structure
  2. load into a staging table with text columns
  3. inspect null-like values and encoding behavior
  4. apply controlled casts and business rules in SQL
  5. quarantine bad rows before they hit core tables

This is slower than a single elegant COPY into the final schema.

It is also much safer.

The reason is simple: once bad null interpretation or encoding damage happens during a direct typed load, diagnosis gets harder and rollback gets uglier.

Why text-first staging helps so much

A text staging table lets you answer questions like:

  • which values are literally blank
  • which values are quoted empties
  • which rows contain suspicious characters
  • which null markers are actually present
  • which columns have mixed conventions
  • which rows fail numeric or date casts after you inspect them

That gives you visibility before irreversible coercion happens.

A practical import pattern

A robust PostgreSQL CSV pipeline often looks like this:

Step 1: Validate structure outside the database

Check:

  • delimiter
  • quoting
  • row consistency
  • header behavior
  • obvious encoding issues

Step 2: Load to a text staging table

Use COPY ... CSV with explicit options where possible.

Step 3: Profile null-like values

Look for:

  • blank fields
  • empty strings
  • NULL
  • N/A
  • \N
  • whitespace-only values
  • suspicious quoted markers
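Those checks can be collapsed into one profiling query over the text staging column. This is a sketch assuming a staging_table with a text column named note:

SELECT
  count(*) FILTER (WHERE note IS NULL)                         AS sql_null,
  count(*) FILTER (WHERE note = '')                            AS empty_string,
  count(*) FILTER (WHERE note ~ '^\s+$')                       AS whitespace_only,
  count(*) FILTER (WHERE upper(trim(note)) IN ('NULL', 'N/A')) AS textual_markers
FROM staging_table;

Running this per suspect column shows which conventions are actually in the file before any coercion happens.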

Step 4: Apply explicit cast logic

Only after you understand the actual field patterns should you cast to:

  • integer
  • numeric
  • date
  • timestamp
  • boolean
  • foreign keys
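The cast step can then live in explicit SQL rather than in COPY-time coercion. A sketch, assuming hypothetical staging_table and final_table schemas:

INSERT INTO final_table (id, amount, created_on)
SELECT
  id::int,
  NULLIF(trim(amount), '')::numeric,
  NULLIF(trim(created_on), '')::date
FROM staging_table;

The NULLIF(trim(...), '') pattern turns blank-looking text into NULL before the cast, which makes the "blank means missing" rule visible in the SQL instead of hidden in COPY options.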

Step 5: Insert or upsert into final tables

Now the semantics are explicit instead of hidden inside a single fragile load step.

Real-world examples of fragile assumptions

"Blank means NULL everywhere"

This breaks the moment a text field genuinely needs to preserve empty string.

"Quoted NULL means the same as NULL"

Not unless your options make it so.

"The file is UTF-8 because it opened on my machine"

Not a trustworthy validation strategy.

"COPY failed on one row, so the row is bad"

Sometimes the row is bad. Sometimes the encoding assumption is wrong. Sometimes the actual problem began one row earlier with a malformed quoted field.

What to document in your data contract

If PostgreSQL CSV loads matter operationally, document at least this:

  • delimiter
  • quote character
  • header presence
  • null marker
  • whether blank and empty string are different
  • whether quoted null markers should be NULL
  • expected file encoding
  • whether COPY sets ENCODING explicitly
  • which columns are staged as text
  • which casts happen later

This does not need to be a giant spec. But it should be explicit enough that another engineer can reproduce the load logic without guessing.

FAQ

Why does PostgreSQL COPY FROM CSV confuse NULL and empty strings?

Because CSV itself does not carry a universal null model. PostgreSQL uses the null string, quoting rules, and CSV mode behavior to infer the difference.

What is the default NULL behavior in PostgreSQL COPY CSV?

In CSV mode, the default null string is an unquoted empty string. A quoted empty string is treated as an empty string value under default behavior.

What do FORCE_NULL and FORCE_NOT_NULL do?

They let you control whether values matching the null string should be treated as NULL even when quoted, or should avoid null interpretation in certain columns. PostgreSQL also allows them to be used together.

Why does encoding still break loads even when the CSV looks fine?

Because the visible text is not the same thing as the actual byte encoding, and PostgreSQL COPY interprets the file according to the current client encoding unless you specify ENCODING.

What is the safest way to import messy CSV into PostgreSQL?

Validate structure first, load into a text staging table, then apply explicit conversions and business rules.


Final takeaway

COPY FROM CSV is fast because PostgreSQL assumes you already know what the file means.

Most production pain appears when that assumption is false.

The biggest traps are usually not mysterious PostgreSQL bugs. They are unspoken contracts about:

  • what counts as NULL
  • what counts as empty string
  • whether quoting changes the meaning
  • what encoding the file actually uses

Once you make those rules explicit, PostgreSQL COPY becomes much more predictable and much easier to trust.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.

PostgreSQL cluster

Explore the connected PostgreSQL guides around tuning, indexing, operations, schema design, scaling, and app integrations.

Pillar guide

PostgreSQL Performance Tuning: Complete Developer Guide

A practical PostgreSQL performance tuning guide for developers covering indexing, query plans, caching, connection pooling, vacuum, schema design, and troubleshooting with real examples.

