Null sentinels: empty string vs NULL vs N/A vs "-"

By Elysiate · Updated Apr 9, 2026
Tags: csv · null · data-quality · validation · etl · postgresql

Level: intermediate · ~15 min read · Intent: informational

Audience: developers, data analysts, ops engineers, data engineers, technical teams

Prerequisites

  • basic familiarity with CSV files
  • basic understanding of databases or ETL workflows

Key takeaways

  • CSV itself does not standardize one universal null token. Empty field, quoted empty string, NULL, N/A, and - only become meaningful through a producer-consumer contract.
  • The most dangerous null bugs happen when one system treats a token as missing data while another treats the same token as a real literal string.
  • A safe ingestion workflow defines null behavior per column type, preserves the original file, and configures loader-specific null handling explicitly instead of relying on defaults.



A lot of CSV data problems are really null problems in disguise.

The file looks fine. The row counts look fine. The parser succeeds.

Then downstream you discover that one system thought this value meant “missing”:

NULL

while another treated it as the literal string "NULL".

Or one loader interpreted this as a missing value:

,,

while another preserved it as an empty string in text columns.

That is why null sentinels are so dangerous. They often fail semantically, not structurally.

If you want the practical inspection side first, start with the CSV Row Checker, CSV Validator, and Malformed CSV Checker. If you need broader transformation help, the Converter is the natural companion.

This guide explains how empty string, SQL NULL, N/A, and - behave in CSV workflows, why the meaning differs across loaders, and how to define a safe null contract.

Why this topic matters

Teams search for this topic when they need to:

  • decide how missing values should be encoded in CSV
  • distinguish empty string from SQL NULL
  • stop placeholders like N/A or - from polluting analytics
  • configure database loaders consistently
  • avoid silent null coercion during imports
  • reconcile text columns with optional numeric or date columns
  • normalize vendor exports before warehouse load
  • preserve replay safety when null rules change later

This matters because null tokens often cross several layers:

  • source system export
  • spreadsheet editing
  • ETL validation
  • database load
  • BI or analytics layer

If each layer interprets empties and placeholders differently, the file becomes a semantic minefield.

CSV itself does not define one universal null token

RFC 4180 defines field separators, optional headers, quoting rules, and how commas, quotes, and line breaks behave inside fields. It does not define a universal null representation for missing data.

That means none of these are standardized by CSV itself:

  • empty field
  • quoted empty field
  • NULL
  • N/A
  • -
  • \N

They only gain meaning when producer and consumer agree on them.

This is the most important foundation for the whole topic.

The core semantic distinction

A good null strategy starts by separating four different ideas.

1. Missing value

The value is unknown, unavailable, or not present.

2. Empty string

The value is intentionally present as a zero-length text value.

3. Placeholder literal

A token like N/A, -, or UNKNOWN appears in the source file.

4. Domain-specific “not applicable”

The value is intentionally not relevant, which is not always the same thing as missing.

A lot of bad pipelines collapse all four into one thing. That is where meaning gets lost.

Why empty field and empty string are not the same thing

This is one of the most important operational distinctions.

Consider these examples:

Unquoted empty field:

id,name,email
1,Alice,

Quoted empty string:

id,name,email
1,Alice,""

Those two look similar to humans. They are not necessarily the same to a loader.

PostgreSQL’s COPY docs explicitly describe this distinction. In CSV format, a NULL is written as an unquoted empty string by default, while an empty string data value is written as "". The docs also explain that reading values follows the same distinction, and that options like FORCE_NULL and FORCE_NOT_NULL can override aspects of that behavior.

That is a very strong real-world example: unquoted empty and quoted empty are not equivalent in many CSV database workflows.
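It is also worth knowing that many general-purpose parsers erase this distinction on read. Python's standard csv module, for example, returns the same empty string for both forms, so a quoted-vs-unquoted policy has to be applied before or instead of such a parser:

```python
import csv
import io

# Two rows that differ only in how the empty email field is written:
# row 1 uses an unquoted empty field, row 2 a quoted empty string.
raw = 'id,name,email\n1,Alice,\n2,Bob,""\n'

rows = list(csv.reader(io.StringIO(raw)))

# Python's csv reader yields '' for both, so the distinction is lost.
print(rows[1][2] == rows[2][2])  # True
```

If the distinction matters, it has to be recovered from the raw bytes or enforced by the loader, not by a parser that has already collapsed it.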

PostgreSQL makes the null contract explicit

PostgreSQL’s COPY docs are among the clearest primary sources for this topic.

They say:

  • in text format, the default null string is \N
  • in CSV format, the default null string is an unquoted empty string
  • the NULL option lets you choose the string that represents NULL
  • FORCE_NULL and FORCE_NOT_NULL can influence how quoted and unquoted values are interpreted for selected columns

This means PostgreSQL already models the exact problem many teams struggle with:

  • empty field
  • quoted empty field
  • custom null string
  • per-column interpretation overrides

That is a useful mental model even if PostgreSQL is not your final target.
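If you produce files for a PostgreSQL-style consumer, the writer has to encode the distinction deliberately. Here is a minimal sketch that mirrors the COPY CSV defaults described above; it is deliberately simplified and does not quote commas or embedded quotes in real values:

```python
def pg_csv_field(value):
    """Encode one field using PostgreSQL's CSV COPY defaults:
    None -> unquoted empty field (read back as NULL),
    ''   -> "" (read back as a real empty string)."""
    if value is None:
        return ''        # unquoted empty field = NULL
    if value == '':
        return '""'      # quoted empty = empty string
    return str(value)    # simplification: no quoting of commas/quotes

def pg_csv_row(values):
    return ','.join(pg_csv_field(v) for v in values)

print(pg_csv_row([1, 'Alice', None]))  # 1,Alice,
print(pg_csv_row([2, 'Bob', '']))      # 2,Bob,""
```

The point of the sketch is that None and '' take different branches on the way out, so the consumer can take different branches on the way in.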

BigQuery behaves differently enough to matter

BigQuery’s CSV loading docs say you can specify custom null markers for CSV data. The BigQuery client-library docs are even more specific: nullMarker sets the string that represents a null value in a CSV file, and the default value is the empty string. If you set a custom null marker, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE, BigQuery interprets the empty string as an empty value.

This is a crucial gotcha.

If you change the null marker to something like \N, then:

  • empty string may still be interpreted as an empty value for text-like columns
  • but empty string in other typed columns can become an error

That means BigQuery’s null semantics are not the same as PostgreSQL’s defaults.

So a file that loads “correctly” into PostgreSQL can still behave differently in BigQuery if you do not align the null contract.
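The documented behavior reduces to a small decision rule. The following is an illustrative model of that rule as described above, not code from the BigQuery client library; the type names and the interpret_empty helper are assumptions for the sketch:

```python
def interpret_empty(column_type, null_marker=""):
    """Model the documented rule for an empty-string CSV field.

    With the default null marker (the empty string), '' means NULL
    for every column type. With a custom marker, '' is an empty value
    only for STRING/BYTE columns and an error for other types.
    (Illustrative sketch, not the BigQuery client library.)"""
    if null_marker == "":
        return "NULL"
    if column_type in ("STRING", "BYTE"):
        return "empty value"
    return "error"

print(interpret_empty("INT64"))                      # NULL
print(interpret_empty("STRING", null_marker=r"\N"))  # empty value
print(interpret_empty("INT64", null_marker=r"\N"))   # error
```

The last case is the surprise: switching the null marker does not just add a token, it changes what an empty field means in typed columns.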

DuckDB also expects explicit null-string choices

DuckDB’s COPY statement docs say NULLSTR controls the string written to represent a NULL value, and the default shown for CSV is the empty string. DuckDB’s CSV overview also emphasizes that CSV import often requires explicit configuration when auto-detection is not sufficient.

So DuckDB fits the same overall pattern:

  • null meaning is configurable
  • defaults exist
  • but portability depends on making those choices explicit rather than assumed

What makes N/A and "-" especially dangerous

Placeholder strings like:

  • N/A
  • -
  • UNKNOWN
  • NONE

feel convenient because humans can read them.

They are dangerous because they can be both:

  • missing-value placeholders and
  • legitimate business values

Examples:

  • - might be a real SKU component
  • N/A might be a real survey response category
  • UNKNOWN might be a distinct business state, not missing data

This is why placeholder strings should not be used casually as null sentinels across all columns.

A safer rule is:

For typed columns

Use one explicit loader-recognized null token.

For descriptive text columns

Be very careful about treating placeholder strings as null, because the literal string may be meaningful.

A practical null-sentinel policy

A strong CSV contract usually defines null handling by column class.

Text columns

Decide explicitly whether you need to distinguish:

  • missing
  • empty string
  • placeholder literal

In many workflows, keeping the difference between missing and empty string is important.

Numeric columns

Avoid N/A and - unless the loader or preprocessor is guaranteed to normalize them before casting.

Date and timestamp columns

Placeholders are especially dangerous because they often trigger parser or cast errors. Prefer one documented null token or a clean empty-field strategy.

Boolean or flag columns

Do not mix:

  • blank
  • N/A
  • -
  • false

These are different meanings and should stay different.
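Once a policy like this is written down per column, applying it is mechanical. Here is a sketch of a per-column normalizer; the POLICY table and its token sets are illustrative, not a recommended universal contract:

```python
# Per-column null policy: which literal tokens map to None (missing).
# These token sets are illustrative - a real contract defines them per column.
POLICY = {
    "age":         {"", "N/A", "-"},  # typed numeric: coerce placeholders
    "signup_date": {"", "\\N"},       # date: only blank or explicit \N
    "middle_name": set(),             # free text: never coerce, review instead
}

def normalize_row(row):
    """Apply the per-column policy; unlisted columns pass through unchanged."""
    return {
        col: (None if value in POLICY.get(col, set()) else value)
        for col, value in row.items()
    }

row = {"age": "N/A", "signup_date": "", "middle_name": "N/A"}
print(normalize_row(row))
# {'age': None, 'signup_date': None, 'middle_name': 'N/A'}
```

Note how the same token, N/A, becomes None in one column and survives as text in another; that asymmetry is the whole point of a per-column policy.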

A practical comparison table

Token in source | Possible meaning | Risk
unquoted empty field | missing value | may be treated as NULL by some loaders
quoted empty string "" | real empty text | may be collapsed incorrectly if the loader config is wrong
NULL | explicit null sentinel or literal text | ambiguous unless the contract says otherwise
N/A | not applicable or literal text | risky in free-text columns
- | placeholder or literal hyphen | very risky as a global null token
\N | explicit loader-oriented null sentinel | good when documented and supported

This is why a global “replace all - with NULL” transform is often a mistake.

The safest workflow

A safe null workflow usually looks like this.

1. Preserve the raw file

If null rules later change, you need the original bytes to replay safely.

2. Profile each column’s sentinel values

Check:

  • empty field count
  • quoted empty string count
  • NULL literal count
  • N/A count
  • - count
  • other common placeholders

3. Decide null policy per column

Do not assume one global rule works for every field.

4. Configure the loader explicitly

Use:

  • PostgreSQL COPY options
  • BigQuery null marker settings
  • DuckDB NULLSTR

Configure these explicitly instead of relying on defaults you have not documented.

5. Record what rule was used

Your batch metadata should say which null contract applied when the file was loaded.
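Recording the rule can be as light as a JSON sidecar written next to each batch. A sketch follows; the field names and filename are illustrative, not a standard schema:

```python
import json

# Illustrative batch metadata: which null contract this load applied.
contract = {
    "file": "orders_batch.csv",          # hypothetical filename
    "null_token": "\\N",                 # the loader-recognized null string
    "empty_string_is_null": False,       # quoted "" kept as real empty text
    "per_column_overrides": {"middle_name": "never-coerce"},
}

sidecar = json.dumps(contract, indent=2, sort_keys=True)
print(sidecar)
```

With a sidecar like this, a later replay can reinterpret the raw file under either the old or the new contract instead of guessing.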

6. Preserve a review path for ambiguous placeholders

If N/A appears in a text field, that may deserve review instead of automatic coercion.

Good examples

Example 1: safe explicit null token for typed fields

Source:

customer_id,age,signup_date
1001,\N,\N

This can work well if the loader is explicitly configured to treat \N as NULL.

Example 2: preserving empty string as real text

Source:

customer_id,middle_name
1002,""

In PostgreSQL CSV workflows, this can be kept distinct from a missing value if you do not collapse quoted empty strings into NULL.

Example 3: dangerous global placeholder

Source:

product_id,product_note
2001,-

Is - missing, intentionally blank, or literally the correct note? Without a contract, you cannot know.

Example 4: BigQuery custom null marker surprise

If you configure BigQuery to use \N as the null marker, empty string behavior changes by type: empty string can still be an empty value for STRING and BYTE, but can produce errors for other types.

That is why cross-system null contracts need explicit testing.

Common anti-patterns

Using N/A as a universal null token

This is one of the fastest ways to erase domain meaning.

Treating quoted empty string and unquoted empty field as equivalent

PostgreSQL explicitly models them differently in CSV.

Relying on loader defaults you never documented

Defaults differ across platforms.

Normalizing placeholders globally before column profiling

You may destroy legitimate values.

Forgetting that null strategy is part of replay safety

If you cannot say which null rule loaded a batch, later reprocessing becomes hard.

Which Elysiate tools fit this article best?

For this topic, the most natural supporting tools are the CSV Row Checker, CSV Validator, Malformed CSV Checker, and the Converter.

These fit naturally because null-sentinel problems are semantic until they suddenly become loader-specific parse and cast failures.

FAQ

Does CSV have a standard null value?

No. CSV defines field structure and quoting rules, but not one universal null token. Null meaning must be agreed on by the producer and consumer.

Is an empty field the same as an empty string?

Not always. In many loaders, an unquoted empty field can be treated as NULL while a quoted empty string can be treated as a real empty string. PostgreSQL’s CSV COPY behavior makes this distinction explicit.

Should I use N/A or - as null values?

Only if the contract explicitly says so and the values cannot also be legitimate business values in that column.

Why do the same CSV values behave differently in PostgreSQL and BigQuery?

Because their null-handling semantics and defaults differ. PostgreSQL’s CSV defaults distinguish unquoted empty from quoted empty, while BigQuery’s null marker behavior also varies by column type once a custom marker is configured.

What is the safest default?

Choose one documented null strategy, apply it explicitly in every loader, and keep the raw export so you can replay and reinterpret if the contract changes.

What is the biggest null-sentinel mistake?

Letting placeholder strings become de facto nulls without documenting whether they are allowed literal values.

Final takeaway

Null sentinels are not tiny cleanup details. They are part of the data contract.

The safest baseline is:

  • define null behavior explicitly
  • distinguish missing from empty string where it matters
  • avoid casual placeholder tokens like N/A and -
  • configure each loader deliberately
  • preserve the raw file for replay

That is how you keep null handling from quietly changing meaning between systems.

About the author

Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.
