Null sentinels: empty string vs NULL vs N/A vs "-"
Level: intermediate · ~15 min read · Intent: informational
Audience: developers, data analysts, ops engineers, data engineers, technical teams
Prerequisites
- basic familiarity with CSV files
- basic understanding of databases or ETL workflows
Key takeaways
- CSV itself does not standardize one universal null token. Empty field, quoted empty string, NULL, N/A, and - only become meaningful through a producer-consumer contract.
- The most dangerous null bugs happen when one system treats a token as missing data while another treats the same token as a real literal string.
- A safe ingestion workflow defines null behavior per column type, preserves the original file, and configures loader-specific null handling explicitly instead of relying on defaults.
Null sentinels: empty string vs NULL vs N/A vs "-"
A lot of CSV data problems are really null problems in disguise.
The file looks fine. The row counts look fine. The parser succeeds.
Then downstream you discover that one system thought this value meant “missing”:
NULL
while another treated it as the literal string "NULL".
Or one loader interpreted this as a missing value:
,,
while another preserved it as an empty string in text columns.
That is why null sentinels are so dangerous. They often fail semantically, not structurally.
If you want the practical inspection side first, start with the CSV Row Checker, CSV Validator, and Malformed CSV Checker. If you need broader transformation help, the Converter is the natural companion.
This guide explains how empty string, SQL NULL, N/A, and - behave in CSV workflows, why the meaning differs across loaders, and how to define a safe null contract.
Why this topic matters
Teams search for this topic when they need to:
- decide how missing values should be encoded in CSV
- distinguish empty string from SQL NULL
- stop placeholders like `N/A` or `-` from polluting analytics
- configure database loaders consistently
- avoid silent null coercion during imports
- reconcile text columns with optional numeric or date columns
- normalize vendor exports before warehouse load
- preserve replay safety when null rules change later
This matters because null tokens often cross several layers:
- source system export
- spreadsheet editing
- ETL validation
- database load
- BI or analytics layer
If each layer interprets empties and placeholders differently, the file becomes a semantic minefield.
CSV itself does not define one universal null token
RFC 4180 defines field separators, optional headers, quoting rules, and how commas, quotes, and line breaks behave inside fields. It does not define a universal null representation for missing data.
That means none of these are standardized by CSV itself:
- empty field
- quoted empty field
- `NULL`
- `N/A`
- `-`
- `\N`
They only gain meaning when producer and consumer agree on them.
This is the most important foundation for the whole topic.
The core semantic distinction
A good null strategy starts by separating four different ideas.
1. Missing value
The value is unknown, unavailable, or not present.
2. Empty string
The value is intentionally present as a zero-length text value.
3. Placeholder literal
A token like N/A, -, or UNKNOWN appears in the source file.
4. Domain-specific “not applicable”
The value is intentionally not relevant, which is not always the same thing as missing.
A lot of bad pipelines collapse all four into one thing. That is where meaning gets lost.
Why empty field and empty string are not the same thing
This is one of the most important operational distinctions.
Consider these examples:
Unquoted empty field:
id,name,email
1,Alice,
Quoted empty string:
id,name,email
1,Alice,""
Those two look similar to humans. They are not necessarily the same to a loader.
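Many general-purpose parsers erase this distinction entirely. As a quick illustration, Python's standard `csv` module returns the same value for both forms, so the quoted/unquoted difference is already gone by the time your code sees the row:

```python
import csv

# Two physically different rows: unquoted empty field vs quoted empty string.
unquoted = '1,Alice,\n'
quoted = '1,Alice,""\n'

row_a = next(csv.reader([unquoted]))
row_b = next(csv.reader([quoted]))

# Python's csv module collapses both to the empty string '',
# so the unquoted/quoted distinction is lost at this layer.
print(row_a)  # ['1', 'Alice', '']
print(row_b)  # ['1', 'Alice', '']
print(row_a == row_b)  # True
```

If the distinction matters to you, it has to be enforced at a layer that still sees the raw bytes, or by a loader that models it explicitly, as PostgreSQL does below.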
PostgreSQL’s COPY docs explicitly describe this distinction. In CSV format, a NULL is written as an unquoted empty string by default, while an empty string data value is written as "". The docs also explain that reading values follows the same distinction, and that options like FORCE_NULL and FORCE_NOT_NULL can override aspects of that behavior.
That is a very strong real-world example: unquoted empty and quoted empty are not equivalent in many CSV database workflows.
PostgreSQL makes the null contract explicit
PostgreSQL’s COPY docs are among the clearest primary sources for this topic.
They say:
- in text format, the default null string is `\N`
- in CSV format, the default null string is an unquoted empty string
- the `NULL` option lets you choose the string that represents NULL
- `FORCE_NULL` and `FORCE_NOT_NULL` can influence how quoted and unquoted values are interpreted for selected columns
This means PostgreSQL already models the exact problem many teams struggle with:
- empty field
- quoted empty field
- custom null string
- per-column interpretation overrides
That is a useful mental model even if PostgreSQL is not your final target.
BigQuery behaves differently enough to matter
BigQuery’s CSV loading docs say you can specify custom null markers for CSV data. The BigQuery client-library docs are even more specific: nullMarker sets the string that represents a null value in a CSV file, the default value is the empty string, and if you set a custom null marker, BigQuery throws an error if an empty string is present for all data types except STRING and BYTE. For STRING and BYTE, BigQuery interprets the empty string as an empty value.
This is a crucial gotcha.
If you change the null marker to something like \N, then:
- empty string may still be interpreted as an empty value for text-like columns
- but empty string in other typed columns can become an error
That means BigQuery’s null semantics are not the same as PostgreSQL’s defaults.
So a file that loads “correctly” into PostgreSQL can still behave differently in BigQuery if you do not align the null contract.
DuckDB also expects explicit null-string choices
DuckDB’s COPY statement docs say NULLSTR controls the string written to represent a NULL value, and the default shown for CSV is the empty string. DuckDB’s CSV overview also emphasizes that CSV import often requires explicit configuration when auto-detection is not sufficient.
So DuckDB fits the same overall pattern:
- null meaning is configurable
- defaults exist
- but portability depends on making those choices explicit rather than assumed
What makes N/A and "-" especially dangerous
Placeholder strings like:
- `N/A`
- `-`
- `UNKNOWN`
- `NONE`
feel convenient because humans can read them.
They are dangerous because they can be both:
- missing-value placeholders and
- legitimate business values
Examples:
- `-` might be a real SKU component
- `N/A` might be a real survey response category
- `UNKNOWN` might be a distinct business state, not missing data
This is why placeholder strings should not be used casually as null sentinels across all columns.
A safer rule is:
For typed columns
Use one explicit loader-recognized null token.
For descriptive text columns
Be very careful about treating placeholder strings as null, because the literal string may be meaningful.
A practical null-sentinel policy
A strong CSV contract usually defines null handling by column class.
Text columns
Decide explicitly whether you need to distinguish:
- missing
- empty string
- placeholder literal
In many workflows, keeping the difference between missing and empty string is important.
Numeric columns
Avoid N/A and - unless the loader or preprocessor is guaranteed to normalize them before casting.
Date and timestamp columns
Placeholders are especially dangerous because they often trigger parser or cast errors. Prefer one documented null token or a clean empty-field strategy.
Boolean or flag columns
Do not mix:
- blank
- `N/A`
- `-`
- `false`
These are different meanings and should stay different.
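One way to encode a per-column policy like this in code. This is a sketch, not a standard: the column names, column classes, and sentinel sets below are illustrative assumptions, and a real contract would be generated from your schema.

```python
# Hypothetical per-column null policy: map each column to the tokens
# that the contract says represent a missing value in that column.
NULL_POLICY = {
    'age': {'', 'N/A', '-'},     # numeric: several sentinels coerce to missing
    'signup_date': {''},         # date: only an empty field means missing
    'middle_name': set(),        # free text: never coerce, '' is a real value
}

def apply_null_policy(row):
    """Return a copy of the row with contract-defined sentinels set to None."""
    return {
        col: (None if value in NULL_POLICY.get(col, set()) else value)
        for col, value in row.items()
    }

row = {'age': 'N/A', 'signup_date': '', 'middle_name': ''}
print(apply_null_policy(row))
# {'age': None, 'signup_date': None, 'middle_name': ''}
```

Note that the same token `''` becomes None in `signup_date` but stays a real empty string in `middle_name`, which is exactly the per-column behavior a global find-and-replace cannot express.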
A practical comparison table
| Token in source | Possible meaning | Risk |
|---|---|---|
| unquoted empty field | missing value | may be treated as NULL by some loaders |
| quoted empty string `""` | real empty text | may be collapsed incorrectly if the loader config is wrong |
| `NULL` | explicit null sentinel or literal text | ambiguous unless contract says otherwise |
| `N/A` | not applicable or literal text | risky in free-text columns |
| `-` | placeholder or literal hyphen | very risky as a global null token |
| `\N` | explicit loader-oriented null sentinel | good when documented and supported |
This is why a global “replace all - with NULL” transform is often a mistake.
The safest workflow
A safe null workflow usually looks like this.
1. Preserve the raw file
If null rules later change, you need the original bytes to replay safely.
2. Profile each column’s sentinel values
Check:
- empty field count
- quoted empty string count
- `NULL` literal count
- `N/A` count
- `-` count
- other common placeholders
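A minimal profiling sketch for step 2, using only the Python standard library. The sentinel set and the sample data are illustrative; extend the set with whatever placeholders your vendors actually emit.

```python
import csv
import io
from collections import Counter

# Candidate null tokens to profile; extend as needed for your sources.
SENTINELS = {'', 'NULL', 'N/A', '-', '\\N'}

def profile_sentinels(text):
    """Count candidate null tokens per column in a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: Counter() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value in SENTINELS:
                counts[name][value] += 1
    return counts

sample = "id,note,amount\n1,N/A,\n2,-,NULL\n3,real note,42\n"
print(profile_sentinels(sample))
```

A per-column breakdown like this is what lets you see that, say, `-` appears only in one free-text column, where it is probably a legitimate value rather than a null sentinel.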
3. Decide null policy per column
Do not assume one global rule works for every field.
4. Configure the loader explicitly
Use:
- PostgreSQL `COPY` options
- BigQuery null marker settings
- DuckDB `NULLSTR`
Make these choices explicitly instead of relying on defaults you have not documented.
5. Record what rule was used
Your batch metadata should say which null contract applied when the file was loaded.
6. Preserve a review path for ambiguous placeholders
If N/A appears in a text field, that may deserve review instead of automatic coercion.
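The record from step 5 can be as small as a JSON sidecar stored next to the raw file. Everything below, field names included, is an illustrative assumption rather than an established schema:

```python
import json
from datetime import datetime, timezone

# Illustrative batch metadata: records which null contract loaded this file,
# so the batch can be replayed or reinterpreted later.
batch_metadata = {
    'file': 'vendor_export.csv',  # hypothetical filename
    'loaded_at': datetime.now(timezone.utc).isoformat(),
    'null_contract': {
        'null_token': r'\N',
        'quoted_empty_is_null': False,
        'per_column_overrides': {'middle_name': 'never_null'},
    },
}
print(json.dumps(batch_metadata, indent=2))
```

Writing this next to the preserved raw file means a future reload can answer "which rule turned these tokens into NULLs?" without guesswork.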
Good examples
Example 1: safe explicit null token for typed fields
Source:
customer_id,age,signup_date
1001,\N,\N
This can work well if the loader is explicitly configured to treat \N as NULL.
Example 2: preserving empty string as real text
Source:
customer_id,middle_name
1002,""
In PostgreSQL CSV workflows, this can be kept distinct from a missing value if you do not collapse quoted empty strings into NULL.
Example 3: dangerous global placeholder
Source:
product_id,product_note
2001,-
Is - missing, intentionally blank, or literally the correct note?
Without a contract, you cannot know.
Example 4: BigQuery custom null marker surprise
If you configure BigQuery to use \N as the null marker, empty string behavior changes by type: empty string can still be an empty value for STRING and BYTE, but can produce errors for other types.
That is why cross-system null contracts need explicit testing.
Common anti-patterns
Using N/A as a universal null token
This is one of the fastest ways to erase domain meaning.
Treating quoted empty string and unquoted empty field as equivalent
PostgreSQL explicitly models them differently in CSV.
Relying on loader defaults you never documented
Defaults differ across platforms.
Normalizing placeholders globally before column profiling
You may destroy legitimate values.
Forgetting that null strategy is part of replay safety
If you cannot say which null rule loaded a batch, later reprocessing becomes hard.
Which Elysiate tools fit this article best?
For this topic, the most natural supporting tools are:
- CSV Row Checker
- Malformed CSV Checker
- CSV Validator
- CSV Splitter
- CSV Merge
- CSV to JSON
- CSV tools hub
These fit naturally because null-sentinel problems are semantic until they suddenly become loader-specific parse and cast failures.
FAQ
Does CSV have a standard null value?
No. CSV defines field structure and quoting rules, but not one universal null token. Null meaning must be agreed on by the producer and consumer.
Is an empty field the same as an empty string?
Not always. In many loaders, an unquoted empty field can be treated as NULL while a quoted empty string can be treated as a real empty string. PostgreSQL’s CSV COPY behavior makes this distinction explicit.
Should I use N/A or - as null values?
Only if the contract explicitly says so and the values cannot also be legitimate business values in that column.
Why do the same CSV values behave differently in PostgreSQL and BigQuery?
Because their null-handling semantics and defaults differ. PostgreSQL’s CSV defaults distinguish unquoted empty from quoted empty, while BigQuery’s null marker behavior also varies by column type once a custom marker is configured.
What is the safest default?
Choose one documented null strategy, apply it explicitly in every loader, and keep the raw export so you can replay and reinterpret if the contract changes.
What is the biggest null-sentinel mistake?
Letting placeholder strings become de facto nulls without documenting whether they are allowed literal values.
Final takeaway
Null sentinels are not tiny cleanup details. They are part of the data contract.
The safest baseline is:
- define null behavior explicitly
- distinguish missing from empty string where it matters
- avoid casual placeholder tokens like `N/A` and `-`
- configure each loader deliberately
- preserve the raw file for replay
That is how you keep null handling from quietly changing meaning between systems.
About the author
Elysiate publishes practical guides and privacy-first tools for data workflows, developer tooling, SEO, and product engineering.