CSV Surgeon
Clean, normalize, and transform CSV data. Auto-detect delimiters, remove duplicates, normalize dates to ISO 8601.
/api/clean-csv
What CSV Surgeon Does
CSV Surgeon is an intelligent data cleaning API designed to handle the most common CSV and TSV data quality issues. Whether you're dealing with inconsistent formatting, mixed delimiters, duplicate records, or poorly named headers, this API automates the cleanup process.
The API processes your raw data and returns a cleaned version along with metadata about the transformation, including row counts and the number of duplicates removed.
Key Features
- Auto-delimiter detection: identifies comma, tab, semicolon, or pipe separators
- Date normalization: converts various date formats to ISO 8601 (YYYY-MM-DD)
- Header standardization: transforms column names to snake_case
- Duplicate removal: optionally removes duplicate rows (controlled by the deduplicate parameter)
- Whitespace cleanup: trims leading/trailing spaces and collapses excessive internal whitespace in all fields
Code Examples
curl -X POST https://api.atomicapis.dev/api/clean-csv \
  -H "X-RapidAPI-Proxy-Secret: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "csv": "Name,Email,Join Date\nJohn Doe,[email protected],01/15/2023\nJane Smith,[email protected],2023-02-20\nJohn Doe,[email protected],01/15/2023",
    "deduplicate": true,
    "normalizeDates": true,
    "normalizeHeaders": true
  }'
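For callers who prefer Python over curl, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: the URL and header names are copied from the curl example above, while the function names build_clean_csv_request and clean_csv are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.atomicapis.dev/api/clean-csv"


def build_clean_csv_request(csv_text: str, secret: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example.

    `options` carries the optional flags, e.g. deduplicate=True.
    """
    payload = {"csv": csv_text, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-RapidAPI-Proxy-Secret": secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def clean_csv(csv_text: str, secret: str, **options) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_clean_csv_request(csv_text, secret, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.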
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| csv | string | Yes | The raw CSV/TSV content to clean. |
| deduplicate | boolean | No | Whether to remove duplicate rows. Default: true |
| normalizeDates | boolean | No | Normalize date fields to ISO 8601 (YYYY-MM-DD). Default: true |
| normalizeHeaders | boolean | No | Convert header names to snake_case. Default: true |
| collapseWhitespace | boolean | No | Collapse extra whitespace in fields. Default: true |
| outputDelimiter | string | No | Delimiter character for the output CSV, e.g. comma (,), tab (\t), semicolon (;), or pipe (\|). Defaults to the auto-detected input delimiter. |
Supported Delimiters
The input delimiter is always auto-detected. Use outputDelimiter to convert to a different format:
| Delimiter | Name |
|---|---|
| , | Comma |
| \t | Tab |
| ; | Semicolon |
| \| | Pipe |
Response Format
{
  "csv": "name,email,join_date\nJohn Doe,[email protected],2023-01-15\nJane Smith,[email protected],2023-02-20",
  "stats": {
    "inputRows": 3,
    "outputRows": 2,
    "duplicatesRemoved": 1,
    "emptyRowsRemoved": 0,
    "fieldsCleaned": 2,
    "rowsPadded": 0,
    "rowsTruncated": 0
  }
}
Response Fields
| Field | Type | Description |
|---|---|---|
| csv | string | The cleaned and normalized CSV content |
| stats | object | Statistics about the cleaning operation |
| stats.inputRows | integer | Number of rows in the original input |
| stats.outputRows | integer | Number of rows in the cleaned output |
| stats.duplicatesRemoved | integer | Number of duplicate rows removed |
| stats.emptyRowsRemoved | integer | Number of empty rows removed |
| stats.fieldsCleaned | integer | Number of fields that were cleaned or normalized |
| stats.rowsPadded | integer | Number of rows padded with missing columns |
| stats.rowsTruncated | integer | Number of rows truncated to match header count |
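Once decoded, the response can be consumed with Python's csv module. The sketch below assumes the response has already been parsed into a dict with the fields listed above; rows_from_response and rows_removed are hypothetical helper names, and the sample values are made up to match the documented shape.

```python
import csv
import io


def rows_from_response(response: dict, delimiter: str = ",") -> list[dict]:
    """Parse the cleaned "csv" string into one dict per data row."""
    reader = csv.DictReader(io.StringIO(response["csv"]), delimiter=delimiter)
    return list(reader)


def rows_removed(response: dict) -> int:
    """Rows dropped during cleaning, computed from the reported stats."""
    stats = response["stats"]
    return stats["inputRows"] - stats["outputRows"]


# Illustrative response shaped like the documented format (values are made up).
sample = {
    "csv": "name,join_date\nJohn Doe,2023-01-15\nJane Smith,2023-02-20",
    "stats": {
        "inputRows": 3, "outputRows": 2, "duplicatesRemoved": 1,
        "emptyRowsRemoved": 0, "fieldsCleaned": 2,
        "rowsPadded": 0, "rowsTruncated": 0,
    },
}
rows = rows_from_response(sample)
print(rows[0]["join_date"], rows_removed(sample))
```

If outputDelimiter was set in the request, pass the same character as the delimiter argument when parsing.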
Use Cases
Data Migration
Clean and normalize legacy CSV exports before importing into modern databases or data warehouses.
CRM Import Preparation
Standardize contact lists from various sources before importing into Salesforce, HubSpot, or other CRMs.
Analytics Pipeline
Pre-process raw data files before feeding them into analytics tools like Tableau, Power BI, or custom dashboards.
Build Constraints
Auto-Detect Delimiter
The API automatically detects the delimiter used in the input file by analyzing the first line. Supports comma (,), tab (\t), semicolon (;), and pipe (|) separators. You can override the output delimiter via the outputDelimiter parameter.
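One simple way to detect a delimiter from the first line is to count each candidate and pick the most frequent. The sketch below illustrates that idea only; the API's actual detection logic is not documented here, and detect_delimiter is a hypothetical name.

```python
# The four delimiters the API is documented to support.
CANDIDATES = [",", "\t", ";", "|"]


def detect_delimiter(text: str) -> str:
    """Guess the delimiter by counting candidates in the first line.

    Assumption: a plain character count, which ignores quoted fields.
    Ties go to the earlier entry in CANDIDATES.
    """
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    counts = {d: first_line.count(d) for d in CANDIDATES}
    best = max(CANDIDATES, key=lambda d: counts[d])
    # Fall back to a comma when no candidate appears at all.
    return best if counts[best] > 0 else ","
```

A production detector would also need to handle delimiters that appear inside quoted fields, which this count does not.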
Date Normalization to ISO 8601
When normalizeDates is enabled (the default), date values are detected automatically and normalized to ISO 8601 format (YYYY-MM-DD). Supported input formats include MM/DD/YYYY, DD-MM-YYYY, DD.MM.YYYY, YYYY/MM/DD, and named months (e.g. "Jan 15, 2023"), among others.
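The conversion can be approximated by trying each listed format in turn. This is a sketch under stated assumptions, not the API's implementation: the format list covers only the formats named above, to_iso_date is a hypothetical name, and values that match no format are returned unchanged.

```python
from datetime import datetime

# Formats named in the text above, tried in order.
FORMATS = [
    "%Y-%m-%d",   # already ISO 8601
    "%m/%d/%Y",   # MM/DD/YYYY
    "%d-%m-%Y",   # DD-MM-YYYY
    "%d.%m.%Y",   # DD.MM.YYYY
    "%Y/%m/%d",   # YYYY/MM/DD
    "%b %d, %Y",  # Jan 15, 2023 (month names depend on the current locale)
]


def to_iso_date(value: str) -> str:
    """Return the value as YYYY-MM-DD, or unchanged if no format matches."""
    text = value.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return value
```

In this list the separator decides the field order (slashes for month-first, dashes and dots for day-first), so a value such as 01/02/2023 is always read as January 2.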
Header snake_case Conversion
Column headers are automatically converted to snake_case format. For example, "First Name" becomes first_name, and "EmailAddress" becomes email_address. This ensures consistency and compatibility with most database and programming conventions.
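A conversion consistent with the two examples above can be sketched with regular expressions. The function name to_snake_case and the handling of punctuation and acronyms are assumptions; the API may treat edge cases differently.

```python
import re


def to_snake_case(header: str) -> str:
    """Convert a column header such as "First Name" to snake_case."""
    # Insert a break between a lowercase letter or digit and a following capital.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header.strip())
    # Replace each run of characters that are not letters or digits with one underscore.
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)
    return s.strip("_").lower()
```

For instance, to_snake_case("Join Date") yields join_date, matching the header in the sample response.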
Error Codes
| Code | Status | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid parameters, empty CSV, malformed data, or content exceeds the 5 MB size limit |
| 401 | Unauthorized | Invalid or missing API key |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Server Error | Internal server error; contact support |
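A client can avoid some 400 responses by checking the payload size before sending, and can decide which failures are worth retrying. The sketch below is a client-side policy, not documented API behavior: it assumes the limit applies to the UTF-8 encoded CSV text, uses 5,000,000 bytes as a conservative reading of "5MB", and treats 429 and 500 as retryable. The names are hypothetical.

```python
# Assumption: conservative reading of the documented 5MB limit, in bytes.
MAX_CSV_BYTES = 5_000_000

# Assumption: rate limiting and server errors may succeed on a later attempt,
# while 400 and 401 require fixing the request or credentials first.
RETRYABLE_STATUSES = {429, 500}


def within_size_limit(csv_text: str) -> bool:
    """Check the UTF-8 encoded size of the CSV against the assumed limit."""
    return len(csv_text.encode("utf-8")) <= MAX_CSV_BYTES


def should_retry(status: int) -> bool:
    """Return True for status codes this sketch treats as transient."""
    return status in RETRYABLE_STATUSES
```

When retrying after a 429, waiting between attempts is advisable; the appropriate delay is not specified here.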
MCP Integration
What is MCP?
Model Context Protocol (MCP) allows AI assistants like Claude to call this API as a native tool during conversation. Instead of writing HTTP requests, the AI invokes the tool directly — no API keys or boilerplate needed on the client side.
Tool Details
CsvSurgeonTools
CleanCsv()
Description
Cleans, normalizes, and deduplicates CSV data