CSV Surgeon

Clean, normalize, and transform CSV data. Auto-detect delimiters, remove duplicates, normalize dates to ISO 8601.

POST /api/clean-csv
Data Processing
~30ms avg latency
API Key auth
5MB max input

What CSV Surgeon Does

CSV Surgeon is an intelligent data cleaning API designed to handle the most common CSV and TSV data quality issues. Whether you're dealing with inconsistent formatting, mixed delimiters, duplicate records, or poorly named headers, this API automates the cleanup process.

The API processes your raw data and returns a cleaned version along with metadata about the transformation, including row counts and the number of duplicates removed.

Key Features

  • Auto-delimiter detection — Automatically identifies comma, tab, semicolon, or pipe separators
  • Date normalization — Converts various date formats to ISO 8601 standard
  • Header standardization — Transforms column names to snake_case
  • Duplicate removal — Optional deduplication with configurable behavior
  • Whitespace cleanup — Trims leading/trailing spaces and collapses excessive internal whitespace in all fields

Code Examples

curl -X POST https://api.atomicapis.dev/api/clean-csv \\
  -H "X-RapidAPI-Proxy-Secret: YOUR_SECRET" \\
  -H "Content-Type: application/json" \\
  -d '{
    "csv": "Name,Email,Join Date\\nJohn Doe,[email protected],01/15/2023\\nJane Smith,[email protected],2023-02-20\\nJohn Doe,[email protected],01/15/2023",
    "deduplicate": true,
    "normalizeDates": true,
    "normalizeHeaders": true
  }'

Request Parameters

Parameter Type Required Description
csv string Yes The raw CSV/TSV content to clean.
deduplicate boolean No Whether to remove duplicate rows. Default: true
normalizeDates boolean No Normalize date fields to ISO 8601 (YYYY-MM-DD). Default: true
normalizeHeaders boolean No Convert header names to snake_case. Default: true
collapseWhitespace boolean No Collapse extra whitespace in fields. Default: true
outputDelimiter string No Delimiter character for the output CSV (e.g. ,, \t, ;, |). Defaults to the auto-detected input delimiter.

Supported Delimiters

The input delimiter is always auto-detected. Use outputDelimiter to convert to a different format:

,

Comma

\t

Tab

;

Semicolon

|

Pipe

Response Format

200 OK - Success Response
{
  "csv": "name,email,join_date\nJohn Doe,[email protected],2023-01-15\nJane Smith,[email protected],2023-02-20",
  "stats": {
    "inputRows": 3,
    "outputRows": 2,
    "duplicatesRemoved": 1,
    "emptyRowsRemoved": 0,
    "fieldsCleaned": 2,
    "rowsPadded": 0,
    "rowsTruncated": 0
  }
}

Response Fields

Field Type Description
csv string The cleaned and normalized CSV content
stats object Statistics about the cleaning operation
stats.inputRows integer Number of rows in the original input
stats.outputRows integer Number of rows in the cleaned output
stats.duplicatesRemoved integer Number of duplicate rows removed
stats.emptyRowsRemoved integer Number of empty rows removed
stats.fieldsCleaned integer Number of fields that were cleaned or normalized
stats.rowsPadded integer Number of rows padded with missing columns
stats.rowsTruncated integer Number of rows truncated to match header count

Use Cases

Data Migration

Clean and normalize legacy CSV exports before importing into modern databases or data warehouses.

CRM Import Preparation

Standardize contact lists from various sources before importing into Salesforce, HubSpot, or other CRMs.

Analytics Pipeline

Pre-process raw data files before feeding them into analytics tools like Tableau, Power BI, or custom dashboards.

Build Constraints

Auto-Detect Delimiter

The API automatically detects the delimiter used in the input file by analyzing the first line. Supports comma (,), tab (\t), semicolon (;), and pipe (|) separators. You can override the output delimiter via the outputDelimiter parameter.

Date Normalization to ISO 8601

All date values are automatically detected and normalized to ISO 8601 format (YYYY-MM-DD). Supports common formats including MM/DD/YYYY, DD-MM-YYYY, DD.MM.YYYY, YYYY/MM/DD, named months (e.g. "Jan 15, 2023"), and more.

Header snake_case Conversion

Column headers are automatically converted to snake_case format. For example, "First Name" becomes first_name, and "EmailAddress" becomes email_address. This ensures consistency and compatibility with most database and programming conventions.

Error Codes

Code Status Description
400 Bad Request Missing or invalid parameters, empty CSV, malformed data, or content exceeds the 5MB size limit
401 Unauthorized Invalid or missing API key
429 Too Many Requests Rate limit exceeded
500 Server Error Internal server error - contact support

MCP Integration MCP Ready

What is MCP?

Model Context Protocol (MCP) allows AI assistants like Claude to call this API as a native tool during conversation. Instead of writing HTTP requests, the AI invokes the tool directly — no API keys or boilerplate needed on the client side.

Tool Details

Tool Class
CsvSurgeonTools
Method
CleanCsv()

Description

Cleans, normalizes, and deduplicates CSV data