CSV to Database Import: Complete Guide (MySQL, PostgreSQL, SQL Server) - 2025

Jan 19, 2025
Tags: csv, database, mysql, postgresql

Importing CSV files into databases is a fundamental skill for data professionals, developers, and analysts. Whether you're migrating data from legacy systems, importing bulk data for analysis, or setting up data pipelines, knowing how to efficiently import CSV data into different database systems is crucial for successful data management.

This comprehensive guide will teach you how to import CSV files into the three most popular database systems: MySQL, PostgreSQL, and SQL Server. You'll learn the specific commands, best practices, and troubleshooting techniques for each system, along with advanced strategies for handling large datasets and complex data types.

Understanding CSV to Database Import

Before diving into specific database systems, let's understand the general process and common challenges.

Import Process Overview

1. Data Preparation:

  • Clean and validate CSV data
  • Ensure consistent formatting
  • Handle special characters and encoding
  • Verify data types and constraints

2. Schema Design:

  • Create appropriate table structure
  • Choose correct data types
  • Set up primary keys and indexes
  • Define constraints and relationships

3. Import Execution:

  • Use appropriate import method
  • Handle errors and exceptions
  • Monitor import progress
  • Validate data integrity

4. Post-Import Validation (see the sketch after this list):

  • Verify row counts
  • Check data quality
  • Test queries and performance
  • Document the process
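
To make step 4 concrete, here is a minimal validation sketch in plain SQL. It assumes a generic your_table with an email column used as the natural key; adapt the names to your schema.

-- Verify the row count matches the source file (minus the header row)
SELECT COUNT(*) FROM your_table;

-- Spot-check data quality: unexpected NULLs and duplicate keys
SELECT COUNT(*) FROM your_table WHERE email IS NULL;

SELECT email, COUNT(*) AS occurrences
FROM your_table
GROUP BY email
HAVING COUNT(*) > 1;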

Common Challenges

Data Type Mismatches:

  • CSV data may not match expected database types
  • Date formats can vary significantly
  • Numeric data may have formatting issues
  • Text data may contain special characters

Performance Issues:

  • Large files can cause timeouts
  • Memory constraints with very large datasets
  • Index creation can slow down imports
  • Network latency for remote databases

Error Handling:

  • Invalid data rows causing import failures
  • Constraint violations
  • Duplicate key errors
  • Encoding and character set issues

MySQL CSV Import

Basic LOAD DATA INFILE Command

Simple Import:

LOAD DATA INFILE '/path/to/your/file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

Advanced Import with Options:

LOAD DATA INFILE '/path/to/your/file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(column1, column2, column3, @dummy_column)
SET column4 = NOW();
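
Note that the server-side form above can only read files from locations allowed by the server's secure_file_priv setting. If the CSV lives on the client machine instead, the LOCAL variant streams it through the client connection (local_infile must be enabled on both client and server). A minimal sketch:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;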

MySQL Import Examples

Example 1: Basic Customer Data Import

-- Create table
CREATE TABLE customers (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE,
    phone VARCHAR(20),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Import data
LOAD DATA INFILE '/path/to/customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(name, email, phone);

Example 2: Sales Data with Date Handling

-- Create table
CREATE TABLE sales (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(100),
    quantity INT,
    price DECIMAL(10,2),
    sale_date DATE,
    region VARCHAR(50)
);

-- Import with date conversion
LOAD DATA INFILE '/path/to/sales.csv'
INTO TABLE sales
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(product_name, quantity, price, @sale_date, region)
SET sale_date = STR_TO_DATE(@sale_date, '%Y-%m-%d');

Example 3: Handling NULL Values and Defaults

-- Create table with defaults
CREATE TABLE products (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10,2) DEFAULT 0.00,
    category VARCHAR(50) DEFAULT 'Uncategorized',
    active BOOLEAN DEFAULT TRUE
);

-- Import with NULL handling
LOAD DATA INFILE '/path/to/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(name, @description, @price, @category, @active)
SET 
    description = NULLIF(@description, ''),
    price = NULLIF(@price, ''),
    category = NULLIF(@category, ''),
    active = CASE WHEN @active = '1' THEN TRUE ELSE FALSE END;

MySQL Import Best Practices

1. Performance Optimization:

-- Disable non-unique index maintenance during import
-- (DISABLE KEYS only affects MyISAM tables; see the InnoDB sketch below)
ALTER TABLE your_table DISABLE KEYS;

-- Import data
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- Re-enable indexes
ALTER TABLE your_table ENABLE KEYS;
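
DISABLE KEYS does not help InnoDB tables. For InnoDB, a commonly used alternative is to relax per-statement checks for the session and commit once at the end; a minimal sketch using the same generic table and file names:

-- InnoDB: relax session checks for the duration of the load
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;

LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

COMMIT;

-- Restore normal behaviour
SET unique_checks = 1;
SET foreign_key_checks = 1;
SET autocommit = 1;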

2. Error Handling:

-- Skip rows that would raise duplicate-key errors instead of aborting the load
LOAD DATA INFILE '/path/to/file.csv'
IGNORE INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(column1, column2, column3);

-- Review conversion, truncation, and skipped-row warnings from the load
SHOW WARNINGS;

3. Batch Processing:

-- LOAD DATA has no built-in batching: split the file beforehand
-- (e.g. large_file_part1.csv, large_file_part2.csv, ...) and tag each load
SET @batch_id = 1;

LOAD DATA INFILE '/path/to/large_file_part1.csv'
INTO TABLE your_table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(column1, column2, column3)
SET batch_id = @batch_id;

PostgreSQL CSV Import

Using COPY Command

Basic COPY Import:

COPY your_table FROM '/path/to/your/file.csv' 
WITH (FORMAT csv, HEADER true, DELIMITER ',');

Advanced COPY with Options:

COPY your_table (column1, column2, column3) 
FROM '/path/to/your/file.csv' 
WITH (
    FORMAT csv,
    HEADER true,
    DELIMITER ',',
    QUOTE '"',
    ESCAPE '"',
    NULL ''
);
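
COPY ... FROM reads the file on the database server and requires server-side file access (superuser or membership in the pg_read_server_files role). When the file sits on your workstation, psql's \copy meta-command streams it through the client connection instead; it takes the same options as COPY but must be written on a single line:

\copy your_table (column1, column2, column3) FROM 'file.csv' WITH (FORMAT csv, HEADER true)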

PostgreSQL Import Examples

Example 1: Basic Data Import

-- Create table
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    department VARCHAR(50),
    salary DECIMAL(10,2),
    hire_date DATE
);

-- Import data
COPY employees (first_name, last_name, email, department, salary, hire_date)
FROM '/path/to/employees.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

Example 2: Handling Different Data Types

-- Create table with various data types
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10,2),
    in_stock BOOLEAN,
    tags TEXT[],
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Import: COPY casts each field to its column type (arrays as '{...}', JSONB as valid JSON text)
COPY products (name, description, price, in_stock, tags, metadata)
FROM '/path/to/products.csv'
WITH (
    FORMAT csv,
    HEADER true,
    DELIMITER ',',
    QUOTE '"',
    NULL ''
);

Example 3: Import with Data Transformation

-- Create an all-text staging table mirroring the CSV columns
CREATE TEMP TABLE temp_import (
    col1 TEXT,
    col2 TEXT,
    col3 TEXT
);

-- Import the raw text values
COPY temp_import FROM '/path/to/raw_data.csv'
WITH (FORMAT csv, HEADER true);

-- Cast, transform, and insert into the final table
INSERT INTO final_table (column1, column2, column3)
SELECT 
    col1::INTEGER,
    col2,
    col3::DECIMAL(10,2)
FROM temp_import;

PostgreSQL Import Best Practices

1. Performance Optimization:

-- Run the whole import in one transaction instead of committing per statement
BEGIN;

-- Import data
COPY your_table FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

-- Create indexes after import
CREATE INDEX idx_column1 ON your_table (column1);
CREATE INDEX idx_column2 ON your_table (column2);

COMMIT;

2. Error Handling:

-- Import with error logging
CREATE TABLE import_errors (
    line_number INTEGER,
    error_message TEXT,
    raw_data TEXT
);

-- COPY itself aborts on the first bad row and has no ON CONFLICT clause,
-- so load into a staging table first, then upsert into the target table
CREATE TEMP TABLE staging (LIKE your_table INCLUDING DEFAULTS);

COPY staging FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

INSERT INTO your_table
SELECT * FROM staging
ON CONFLICT (id) DO UPDATE SET
    column1 = EXCLUDED.column1,
    column2 = EXCLUDED.column2;

3. Parallel Processing:

-- Split the large file beforehand and run each COPY from a separate connection so they execute in parallel
-- File 1
COPY your_table FROM '/path/to/file_part1.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

-- File 2
COPY your_table FROM '/path/to/file_part2.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

SQL Server CSV Import

Using BULK INSERT Command

Basic BULK INSERT:

BULK INSERT your_table
FROM 'C:\path\to\your\file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
);

Advanced BULK INSERT with Options:

BULK INSERT your_table
FROM 'C:\path\to\your\file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    FIELDQUOTE = '"',  -- requires SQL Server 2017 or later
    KEEPNULLS,
    TABLOCK
);

SQL Server Import Examples

Example 1: Basic Data Import

-- Create table
CREATE TABLE customers (
    id INT IDENTITY(1,1) PRIMARY KEY,
    name NVARCHAR(100) NOT NULL,
    email NVARCHAR(100) UNIQUE,
    phone NVARCHAR(20),
    created_at DATETIME2 DEFAULT GETDATE()
);

-- Import data
BULK INSERT customers
FROM 'C:\path\to\customers.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
);

Example 2: Import with Data Transformation

-- Create staging table
CREATE TABLE temp_import (
    raw_data NVARCHAR(MAX)
);

-- Import raw lines: use a field terminator that never appears in the data
-- so each whole line lands in the single raw_data column
BULK INSERT temp_import
FROM 'C:\path\to\raw_data.csv'
WITH (
    FIELDTERMINATOR = '0x0b',
    ROWTERMINATOR = '\n',
    FIRSTROW = 1
);

-- Transform and insert
-- (PARSENAME handles at most four comma-separated values and breaks if the
--  data itself contains periods; use a proper CSV parser for anything wider)
INSERT INTO final_table (column1, column2, column3)
SELECT 
    PARSENAME(REPLACE(raw_data, ',', '.'), 3) AS column1,
    PARSENAME(REPLACE(raw_data, ',', '.'), 2) AS column2,
    PARSENAME(REPLACE(raw_data, ',', '.'), 1) AS column3
FROM temp_import;

-- Clean up
DROP TABLE temp_import;

Example 3: Import with Error Handling

-- Create error log table
CREATE TABLE import_errors (
    line_number INT,
    error_message NVARCHAR(MAX),
    raw_data NVARCHAR(MAX)
);

-- Import with error handling
BEGIN TRY
    BULK INSERT your_table
    FROM 'C:\path\to\file.csv'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n',
        FIRSTROW = 2,
        ERRORFILE = 'C:\path\to\errors.txt'
    );
END TRY
BEGIN CATCH
    INSERT INTO import_errors (error_message)
    VALUES (ERROR_MESSAGE());
END CATCH;

SQL Server Import Best Practices

1. Performance Optimization:

-- Use TABLOCK for better performance
BULK INSERT your_table
FROM 'C:\path\to\file.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    TABLOCK,
    BATCHSIZE = 10000
);

2. Memory Management:

-- Process a large file in row-numbered batches
-- BULK INSERT arguments must be literals, so build the statement dynamically;
-- the total number of data rows is assumed to be known, and each pass rescans
-- the file, so pre-splitting the file is often the simpler option
DECLARE @batch_size INT = 10000;
DECLARE @current_row INT = 2;          -- row 1 is the header
DECLARE @total_rows INT = 100000;      -- data rows in the file
DECLARE @sql NVARCHAR(MAX);

WHILE @current_row <= @total_rows + 1
BEGIN
    SET @sql = N'
        BULK INSERT your_table
        FROM ''C:\path\to\file.csv''
        WITH (
            FIELDTERMINATOR = '','',
            ROWTERMINATOR = ''\n'',
            FIRSTROW = ' + CAST(@current_row AS NVARCHAR(20)) + N',
            LASTROW = ' + CAST(@current_row + @batch_size - 1 AS NVARCHAR(20)) + N'
        );';
    EXEC sp_executesql @sql;

    SET @current_row = @current_row + @batch_size;
END;

Programmatic Import Solutions

Python Solutions

MySQL Import with Python:

import pandas as pd
import pymysql  # driver used by the mysql+pymysql URL below
from sqlalchemy import create_engine

def import_csv_to_mysql(csv_file, table_name, connection_params):
    """Import CSV file to MySQL database"""
    # Read CSV file
    df = pd.read_csv(csv_file)
    
    # Create database connection
    engine = create_engine(f"mysql+pymysql://{connection_params['user']}:{connection_params['password']}@{connection_params['host']}/{connection_params['database']}")
    
    # Import to database
    df.to_sql(table_name, engine, if_exists='append', index=False)
    
    print(f"Successfully imported {len(df)} rows to {table_name}")

# Usage
connection_params = {
    'host': 'localhost',
    'database': 'mydb',
    'user': 'myuser',
    'password': 'mypassword'
}
import_csv_to_mysql('data.csv', 'my_table', connection_params)

PostgreSQL Import with Python:

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

def import_csv_to_postgresql(csv_file, table_name, connection_params):
    """Import CSV file to PostgreSQL database"""
    # Read CSV file
    df = pd.read_csv(csv_file)
    
    # Create database connection
    engine = create_engine(f"postgresql://{connection_params['user']}:{connection_params['password']}@{connection_params['host']}/{connection_params['database']}")
    
    # Import to database
    df.to_sql(table_name, engine, if_exists='append', index=False)
    
    print(f"Successfully imported {len(df)} rows to {table_name}")

# Usage
connection_params = {
    'host': 'localhost',
    'database': 'mydb',
    'user': 'myuser',
    'password': 'mypassword'
}
import_csv_to_postgresql('data.csv', 'my_table', connection_params)

SQL Server Import with Python:

import pandas as pd
import pyodbc
from sqlalchemy import create_engine

def import_csv_to_sqlserver(csv_file, table_name, connection_params):
    """Import CSV file to SQL Server database"""
    # Read CSV file
    df = pd.read_csv(csv_file)
    
    # Create database connection
    engine = create_engine(f"mssql+pyodbc://{connection_params['user']}:{connection_params['password']}@{connection_params['server']}/{connection_params['database']}?driver=ODBC+Driver+17+for+SQL+Server")
    
    # Import to database
    df.to_sql(table_name, engine, if_exists='append', index=False)
    
    print(f"Successfully imported {len(df)} rows to {table_name}")

# Usage
connection_params = {
    'server': 'localhost',
    'database': 'mydb',
    'user': 'myuser',
    'password': 'mypassword'
}
import_csv_to_sqlserver('data.csv', 'my_table', connection_params)

Advanced Python Import with Error Handling

import pandas as pd
import logging
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError

def robust_csv_import(csv_file, table_name, connection_string, chunk_size=10000):
    """Robust CSV import with error handling and chunking"""
    # Set up logging
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    
    try:
        # Create database connection
        engine = create_engine(connection_string)
        
        # Read CSV in chunks
        chunk_count = 0
        total_rows = 0
        
        for chunk in pd.read_csv(csv_file, chunksize=chunk_size):
            try:
                # Import chunk
                chunk.to_sql(table_name, engine, if_exists='append', index=False)
                chunk_count += 1
                total_rows += len(chunk)
                logger.info(f"Imported chunk {chunk_count}: {len(chunk)} rows")
                
            except SQLAlchemyError as e:
                logger.error(f"Error importing chunk {chunk_count}: {str(e)}")
                continue
        
        logger.info(f"Import complete: {total_rows} rows imported in {chunk_count} chunks")
        return True
        
    except Exception as e:
        logger.error(f"Fatal error during import: {str(e)}")
        return False

# Usage
connection_string = "mysql+pymysql://user:password@localhost/database"
success = robust_csv_import('large_file.csv', 'my_table', connection_string, 5000)

Best Practices for CSV to Database Import

Data Preparation

1. Data Cleaning:

  • Remove empty rows and columns
  • Handle missing values appropriately
  • Validate data types and formats
  • Check for duplicate records

2. Schema Design:

  • Choose appropriate data types
  • Set up proper constraints
  • Create necessary indexes
  • Plan for future growth

3. Performance Optimization:

  • Use appropriate import methods
  • Consider batch processing for large files
  • Disable indexes during import
  • Use parallel processing when possible

Error Handling

1. Validation:

  • Check data integrity before import
  • Validate against business rules
  • Handle constraint violations
  • Log and report errors

2. Recovery (see the sketch after this list):

  • Implement rollback procedures
  • Create backup strategies
  • Plan for partial failures
  • Monitor import progress
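
One way to make rollback concrete (PostgreSQL shown; the table and column names are placeholders): run the load and a sanity check inside a single transaction, so any failure leaves the target table untouched.

BEGIN;

COPY your_table FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true);

-- Abort (and roll back) if the sanity check fails
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM your_table WHERE email IS NULL) THEN
        RAISE EXCEPTION 'Import validation failed: NULL emails found';
    END IF;
END $$;

COMMIT;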

3. Monitoring:

  • Track import performance
  • Monitor system resources
  • Set up alerts for failures
  • Maintain audit trails

Common Issues and Solutions

Issue 1: Data Type Mismatches

Problem: CSV data doesn't match expected database types

Solutions:

  • Use data type conversion functions
  • Create staging tables for transformation
  • Implement validation rules
  • Use appropriate import tools

Issue 2: Performance Issues

Problem: Import is too slow or causes timeouts

Solutions:

  • Use batch processing
  • Optimize database settings
  • Consider parallel processing
  • Use appropriate import methods

Issue 3: Memory Constraints

Problem: Large files cause memory issues

Solutions:

  • Use streaming import methods
  • Process files in chunks
  • Optimize memory usage
  • Consider database-specific optimizations

Issue 4: Character Encoding Issues

Problem: Special characters not displaying correctly

Solutions (see the sketch after this list):

  • Use UTF-8 encoding consistently
  • Handle BOM characters properly
  • Configure database character sets
  • Validate encoding before import
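
A minimal sketch of declaring the file's encoding explicitly at import time (MySQL and PostgreSQL shown with placeholder names; SQL Server exposes a similar CODEPAGE option on BULK INSERT, e.g. '65001' for UTF-8 on recent versions):

-- MySQL: tell LOAD DATA what encoding the file uses
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE your_table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

-- PostgreSQL: declare the file encoding in COPY
COPY your_table FROM '/path/to/file.csv'
WITH (FORMAT csv, HEADER true, ENCODING 'UTF8');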

Conclusion

Importing CSV files into databases is a critical skill that requires understanding both the data and the target database system. The methods we've covered—native SQL commands, programmatic solutions, and best practices—each have their strengths and are suitable for different scenarios.

Choose Native SQL Commands when:

  • Working with standard database systems
  • Need maximum performance
  • Have direct database access
  • Want to leverage database-specific features

Choose Programmatic Solutions when:

  • Need custom data transformation
  • Working with multiple database systems
  • Want to automate the process
  • Need advanced error handling

Choose Hybrid Approaches when:

  • Working with complex data structures
  • Need both performance and flexibility
  • Want to leverage multiple tools
  • Have specific business requirements

Remember that successful CSV to database imports require careful planning, proper data preparation, and thorough testing. Always validate your results and implement proper error handling to ensure data integrity throughout the process.

For more CSV data processing tools and guides, explore our CSV Tools Hub or try our CSV to Excel Converter for data preparation.
