SQL Foreign Keys Explained
Level: intermediate · ~18 min read · Intent: informational
Audience: backend developers, data analysts, data engineers, technical teams, software engineers, students
Prerequisites
- basic familiarity with databases
- basic understanding of tables, rows, columns, and primary keys
Key takeaways
- A foreign key links rows in one table to valid rows in another table, which helps enforce referential integrity and prevents invalid relationships from being stored.
- Foreign keys are one of the most practical database design tools because they protect real business relationships, but they should be used with clear cascade rules, sensible indexing, and a good understanding of application behavior.
FAQ
- What is a foreign key in SQL?
- A foreign key is a column or set of columns in one table that references a valid primary key or unique key in another table. It enforces that the relationship between those tables remains valid.
- Why are foreign keys important?
- Foreign keys are important because they protect data integrity. They stop orphaned rows, invalid references, and relationship mistakes that application code alone may fail to prevent.
- What is the difference between a primary key and a foreign key?
- A primary key uniquely identifies a row in its own table, while a foreign key points to a valid row in another table and represents a relationship between the two tables.
- Should every relationship use a foreign key?
- Not always, but many important business relationships should. The best choice depends on the system, performance requirements, ingestion patterns, and whether strict database-level integrity is needed.
SQL foreign keys are one of the most important features in relational databases because they define and protect the relationships between tables.
Without foreign keys, you can still create tables that look related. But the database itself has no reliable way to enforce that those relationships are valid.
That is where problems start.
You may end up with:
- orders that point to customers who do not exist
- comments linked to deleted posts
- payments tied to invalid invoices
- or child rows that no longer have a real parent row anywhere in the database
These are not small issues. They create integrity problems that can quietly corrupt application logic, reporting, and downstream systems.
That is why foreign keys matter so much.
They are not only about database theory. They are one of the most practical tools for making real systems safer.
This guide explains SQL foreign keys clearly, including:
- what they are
- how they work
- how they relate to primary keys
- how cascade rules behave
- when to use them
- when to be careful
- and the most common mistakes developers make
Why foreign keys matter
In a relational database, tables are not meant to exist in isolation.
Real systems usually model entities such as:
- users
- orders
- invoices
- products
- tickets
- employees
- departments
- subscriptions
- sessions
- and events
Those entities are connected.
Examples:
- an order belongs to a customer
- a comment belongs to a post
- an invoice belongs to an account
- a line item belongs to an order
- an employee belongs to a department
A foreign key is how the database understands and enforces that connection.
Without it, you are relying entirely on:
- application code
- import logic
- API validation
- and developer discipline
Those help, but they are not enough on their own in many systems.
A foreign key gives the database its own ability to say:
- this child row must point to a real parent row
- otherwise the write is invalid
That is the core value.
The most important rule
Before going deeper, remember this:
A foreign key is not just a link between tables. It is a rule that protects the validity of that link.
That matters because people often think of foreign keys only as:
- columns used in joins
But a foreign key is more than that.
A column like customer_id in an orders table can exist without a foreign key constraint.
You can still join on it.
The difference is that without the foreign key constraint:
- the database does not guarantee the value actually points to a valid customer
So the most important distinction is:
- a relationship can exist logically in your schema design
- but a foreign key makes the database enforce that relationship
That is why foreign keys are really about integrity, not only structure.
What a foreign key is
A foreign key is a column or set of columns in one table that references a valid key in another table.
Usually, it points to:
- a primary key or
- a unique key
The table being referenced is often called the:
- parent table
- referenced table
The table containing the foreign key is often called the:
- child table
- referencing table
Example
Suppose you have these tables:
customers
customer_idcustomer_name
orders
order_idcustomer_idtotal_amount
Here, orders.customer_id can be a foreign key pointing to customers.customer_id.
That means:
- every order must refer to a real customer
- unless the foreign key column is nullable and null is allowed by the design
This is a very common parent-child relationship.
Primary key versus foreign key
This is one of the most important distinctions.
Primary key
A primary key uniquely identifies a row in its own table.
Example:
customer_iduniquely identifies each customer
Foreign key
A foreign key points from one table to a valid row in another table.
Example:
orders.customer_idpoints to a validcustomers.customer_id
So:
- primary key = identity of the row itself
- foreign key = relationship to another row elsewhere
That is the cleanest way to understand the difference.
A simple foreign key example
Here is a basic SQL example:
CREATE TABLE customers (
customer_id BIGINT PRIMARY KEY,
customer_name VARCHAR(255) NOT NULL
);
Now the child table:
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
total_amount DECIMAL(12,2) NOT NULL,
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
);
This says:
orders.customer_idmust match a realcustomers.customer_id
If you try to insert an order with a customer ID that does not exist, the database should reject it.
That is referential integrity in action.
What referential integrity means
Referential integrity means:
- references between tables stay valid
In practice, that means a child row cannot point to a parent row that does not exist.
Example:
If orders.customer_id references customers.customer_id, then the database should not allow:
INSERT INTO orders (order_id, customer_id, total_amount)
VALUES (1001, 999999, 149.99);
if customer 999999 does not exist.
Without a foreign key, that row might be inserted. With a foreign key, it should fail.
That is the simplest and most important effect of foreign keys.
What problems foreign keys prevent
Foreign keys help prevent problems like:
1. Orphaned child rows
Example:
- an order that points to no real customer
2. Invalid application data
Example:
- a ticket assigned to a nonexistent user
3. Import mistakes
Example:
- data loaded in the wrong order or with broken references
4. Silent drift between related tables
Example:
- application code changes and starts writing bad IDs
5. Untrustworthy joins and reports
Example:
- reports include rows that look related but are actually invalid
This is why foreign keys are so useful. They catch issues at the database layer before bad data spreads further.
Foreign keys model relationships
A foreign key usually represents a relationship like:
One-to-many
The most common case.
Examples:
- one customer has many orders
- one department has many employees
- one post has many comments
In this design:
- the parent table has one row
- the child table stores many rows referencing that parent
Example:
- one
customers.customer_id - many
orders.customer_idvalues pointing to it
One-to-one
Less common, but valid.
Examples:
- one user has one profile row
- one order has one payment summary row
This usually needs:
- a foreign key
- plus uniqueness enforcement on the child side
Many-to-many
Handled through a junction table.
Examples:
- users belong to many roles
- roles belong to many users
Here the junction table often contains two foreign keys.
Example:
user_roles.user_idreferencesusers.user_iduser_roles.role_idreferencesroles.role_id
That is how many-to-many relationships are typically enforced in relational systems.
Foreign key example with a many-to-many table
Suppose you have:
users
user_id
roles
role_id
user_roles
user_idrole_id
Then:
CREATE TABLE user_roles (
user_id BIGINT NOT NULL,
role_id BIGINT NOT NULL,
CONSTRAINT fk_user_roles_user
FOREIGN KEY (user_id)
REFERENCES users(user_id),
CONSTRAINT fk_user_roles_role
FOREIGN KEY (role_id)
REFERENCES roles(role_id)
);
This ensures:
- each
user_idpoints to a real user - each
role_idpoints to a real role
That is a very common relational design.
Foreign keys do not automatically create indexes
This is an important practical detail.
A foreign key enforces integrity. It does not automatically guarantee that the best supporting indexes exist in every database system.
That means you often still need to think about indexing separately.
For example, if orders.customer_id is used heavily in:
- joins
- filters
- recent order lookups by customer
- or customer deletion/update checks
then an index on orders.customer_id may still be very important.
Example:
CREATE INDEX idx_orders_customer_id
ON orders (customer_id);
This is one of the most practical foreign-key best practices:
- use foreign keys for integrity
- use indexes for performance where needed
The two ideas are related, but not identical.
Why indexing foreign key columns often matters
A foreign key column is often used in common workloads like:
- join orders to customers
- fetch comments for a post
- list tickets for a tenant
- find order items for an order
- delete or update a parent row while checking child references
Without a supporting index, these operations can become slower as data grows.
So while a foreign key is about correctness, the related index is often about speed.
That is why backend developers and data engineers should think about both together.
ON DELETE and ON UPDATE behavior
Foreign keys become especially important when a parent row is changed or removed.
What happens if:
- a customer is deleted
- but orders still reference that customer?
What happens if:
- a parent key value changes
- and child rows still point to the old value?
That is where foreign key actions matter.
Common options include:
RESTRICTNO ACTIONCASCADESET NULLSET DEFAULT
The exact defaults and nuances vary by database, but the concepts are important.
RESTRICT or NO ACTION
These behaviors usually mean:
- do not allow the parent row to be deleted or updated in a way that would break the child relationship
Example: If a customer has orders, trying to delete that customer may fail.
This is often the safest default mindset because it protects data from accidental destruction.
Conceptually:
- parent cannot disappear while children still depend on it
This is often what you want in real business data.
CASCADE
Cascade means:
- changes to the parent are automatically applied to the child relationship in the specified way
The most common example is ON DELETE CASCADE.
That means:
- if the parent row is deleted
- the related child rows are automatically deleted too
Example:
CREATE TABLE order_items (
order_item_id BIGINT PRIMARY KEY,
order_id BIGINT NOT NULL,
product_id BIGINT NOT NULL,
quantity INT NOT NULL,
CONSTRAINT fk_order_items_order
FOREIGN KEY (order_id)
REFERENCES orders(order_id)
ON DELETE CASCADE
);
This means:
- delete the order
- and its order items are removed automatically
This can be very useful when the child rows do not make sense without the parent.
When ON DELETE CASCADE makes sense
Cascade delete often makes sense for:
- order items under an order
- session rows under a user
- child metadata rows under a parent object
- link-table rows in many-to-many relationships
- temporary dependent records that have no meaning on their own
The main idea is:
- if the parent disappears, the child should disappear too
That is a strong and reasonable pattern in many designs.
When ON DELETE CASCADE can be dangerous
Cascade delete can also be risky if used carelessly.
It can be dangerous when:
- deleting one row accidentally deletes a large amount of important related data
- the child data has independent business importance
- auditability matters
- users do not expect the cascade
- the cascade spans too many tables or too much volume
For example:
- deleting a customer and automatically deleting all invoices, payments, and audit records might be much too destructive
That is why cascade rules should be chosen deliberately. They are powerful, but they are not automatically correct.
SET NULL
ON DELETE SET NULL means:
- if the parent row is deleted
- the foreign key column in the child is set to NULL
This only works when the child foreign key column allows NULL.
This can make sense when:
- the child row can still exist meaningfully without the parent
- but the relationship should be cleared
Example:
- a task row may remain even if the assigned user account is removed, but
assigned_user_idbecomes NULL
This is a more flexible option than cascade, but it only makes sense when a null relationship is actually valid.
SET DEFAULT
Some systems support SET DEFAULT.
This means:
- if the parent row is deleted or updated in the relevant way
- the child foreign key is set to its default value
This is less common in day-to-day design, but it exists as an option in some engines and designs.
It only makes sense when:
- there is a meaningful default relationship target
That is less common than cascade or set null.
ON UPDATE behavior
Foreign keys can also define behavior when the parent key value changes.
In many systems, changing primary keys is rare because identity values are often stable. But when it matters, options like:
ON UPDATE CASCADEON UPDATE RESTRICT- or similar rules
can control what happens.
In practice, most application designs avoid changing primary key values.
So ON UPDATE matters less often than ON DELETE, but it is still part of foreign key behavior.
Nullable foreign keys
A foreign key column does not always have to be NOT NULL.
Example:
CREATE TABLE tasks (
task_id BIGINT PRIMARY KEY,
assigned_user_id BIGINT NULL,
title VARCHAR(255) NOT NULL,
CONSTRAINT fk_tasks_assigned_user
FOREIGN KEY (assigned_user_id)
REFERENCES users(user_id)
);
This means:
- a task may or may not be assigned to a user
- but if it is assigned, that user must exist
This is a very useful pattern.
It lets the relationship be optional while still enforcing integrity when the relationship is present.
Foreign keys and application design
Foreign keys matter a lot in backend systems because they decide where integrity lives.
If the relationship is enforced only in application code:
- bugs
- race conditions
- alternate scripts
- data imports
- and maintenance work
can still insert bad data.
If the relationship is enforced in the database:
- invalid writes are rejected centrally
That is why many backend systems benefit from foreign keys even when the application also validates inputs.
The application can give user-friendly errors. The database can still guarantee the rule.
That is a much stronger model.
Foreign keys in data engineering and ingestion systems
Data engineers sometimes treat foreign keys differently depending on the layer.
In raw ingestion zones, foreign keys may be avoided because:
- source data may arrive out of order
- late data may be normal
- the raw layer may need to preserve source state before reconciliation
But in curated or trusted layers, foreign keys or equivalent relationship validation logic can still be extremely useful.
So the question is not always:
- foreign keys everywhere or nowhere
It is often:
- which layers need strict relational enforcement
- and which layers need more flexible landing behavior first
That is an important design distinction.
Foreign keys and data loading order
One practical consequence of foreign keys is that data often must be loaded in a sensible order.
If child rows reference parent rows, the parents usually need to exist first.
For example:
- load customers
- then load orders
- then load order_items
If you try to load child rows first and the referenced parents do not exist yet, foreign key checks can fail.
This is one reason data migrations, ETL jobs, and seed scripts often need careful sequencing.
It is not a reason to avoid foreign keys. It is just part of working with them correctly.
Common foreign key mistakes
There are a few common mistakes that cause most foreign-key confusion.
1. Treating foreign keys as optional metadata
They are integrity rules, not only documentation.
2. Skipping indexes on heavily used child keys
This can create performance pain even when integrity is correct.
3. Using cascade rules without thinking through consequences
Automatic deletes can be much broader than people expect.
4. Forgetting that nullable foreign keys change business meaning
A nullable relationship is not the same thing as a required relationship.
5. Relying only on app logic to enforce relationships
This leaves the database vulnerable to invalid writes from other paths.
6. Losing track of load order in imports or migrations
Parent rows often need to exist before children can be inserted.
7. Using foreign keys without understanding the real lifecycle of the data
The correct action on delete depends on business meaning, not habit.
A simple decision framework for foreign keys
When deciding whether and how to use a foreign key, ask:
1. Is this a real business relationship?
If yes, a foreign key is often worth considering.
2. Must the child always point to a valid parent?
If yes, use a required foreign key.
3. Can the relationship be optional?
If yes, a nullable foreign key may fit.
4. What should happen if the parent is deleted?
Should the delete:
- fail
- cascade
- set null
- or follow another business rule?
5. Is the foreign key column used heavily in joins or filters?
If yes, consider supporting indexes.
That sequence usually leads to much better foreign-key design decisions.
Practical examples
Example 1: orders belong to customers
CREATE TABLE customers (
customer_id BIGINT PRIMARY KEY,
customer_name VARCHAR(255) NOT NULL
);
CREATE TABLE orders (
order_id BIGINT PRIMARY KEY,
customer_id BIGINT NOT NULL,
total_amount DECIMAL(12,2) NOT NULL,
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id)
REFERENCES customers(customer_id)
);
Meaning:
- every order must belong to a real customer
Example 2: comments belong to posts and should disappear with the post
CREATE TABLE posts (
post_id BIGINT PRIMARY KEY,
title VARCHAR(255) NOT NULL
);
CREATE TABLE comments (
comment_id BIGINT PRIMARY KEY,
post_id BIGINT NOT NULL,
body TEXT NOT NULL,
CONSTRAINT fk_comments_post
FOREIGN KEY (post_id)
REFERENCES posts(post_id)
ON DELETE CASCADE
);
Meaning:
- every comment belongs to a real post
- delete the post, delete its comments
Example 3: tasks may optionally be assigned to a user
CREATE TABLE users (
user_id BIGINT PRIMARY KEY,
email VARCHAR(255) NOT NULL
);
CREATE TABLE tasks (
task_id BIGINT PRIMARY KEY,
assigned_user_id BIGINT NULL,
title VARCHAR(255) NOT NULL,
CONSTRAINT fk_tasks_user
FOREIGN KEY (assigned_user_id)
REFERENCES users(user_id)
ON DELETE SET NULL
);
Meaning:
- a task may be unassigned
- if the assigned user disappears, the task remains but becomes unassigned
These examples show that foreign-key design is really about modeling lifecycle and meaning, not only syntax.
FAQ
What is a foreign key in SQL?
A foreign key is a column or set of columns in one table that references a valid primary key or unique key in another table. It enforces that the relationship between those tables remains valid.
Why are foreign keys important?
Foreign keys are important because they protect data integrity. They stop orphaned rows, invalid references, and relationship mistakes that application code alone may fail to prevent.
What is the difference between a primary key and a foreign key?
A primary key uniquely identifies a row in its own table, while a foreign key points to a valid row in another table and represents a relationship between the two tables.
Should every relationship use a foreign key?
Not always, but many important business relationships should. The best choice depends on the system, performance requirements, ingestion patterns, and whether strict database-level integrity is needed.
Final thoughts
SQL foreign keys are one of the most practical and important tools in relational database design because they turn assumed relationships into enforced relationships.
That is what makes them so valuable.
They help the database say:
- this row must point to something real
- and if it does not, the write is invalid
That protects:
- application correctness
- reporting reliability
- join trustworthiness
- and long-term data integrity
The most important things to remember are:
- a foreign key links a child table to a valid parent row
- it is about integrity, not only joins
- primary keys identify rows, foreign keys relate rows
- delete and update rules matter a lot
- indexes often matter alongside foreign keys
- and the right design depends on real business lifecycle rules
If you understand those ideas clearly, foreign keys stop feeling like a formal database feature and start feeling like what they really are:
a powerful way to make your data model safer, clearer, and much harder to corrupt by accident.