Web-to-Markdown

Scrapes URLs and returns clean Markdown for RAG/LLM contexts. Zero configuration. Sub-second response.

POST /api/extract

Data Processing

~50ms avg latency

API Key auth

99.9% uptime

Description

Extracts clean, readable Markdown from a public web page URL. Strips non-essential HTML elements (such as navigation, footers, scripts, styles, and iframes) and converts the main content to GitHub-flavored Markdown optimized for LLM and RAG contexts. Includes SSRF protection that blocks requests to private IP ranges.
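SSRF protection of this kind is commonly built by resolving the target host and rejecting any address outside public ranges. The following Python sketch shows that idea. It is not the service's actual code, and the function name `is_safe_target` is hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_target(url: str) -> bool:
    """Hypothetical SSRF pre-check: True only if every resolved address is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parsed.hostname, port, proto=socket.IPPROTO_TCP)
    except OSError:
        return False  # unresolvable hosts are rejected
    for _family, _type, _proto, _canon, sockaddr in infos:
        # Drop any IPv6 zone suffix (e.g. "fe80::1%eth0") before parsing
        addr = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        # is_global is False for private (10/8, 172.16/12, 192.168/16),
        # loopback, link-local (including 169.254.169.254), and other reserved ranges
        if not addr.is_global:
            return False
    return True
```

A check like this only inspects addresses at validation time. A production implementation also has to connect to the address it validated, because otherwise DNS rebinding can bypass the check.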

Key Features

Extracts clean, readable Markdown from any web page
Strips non-essential tags (nav, footer, script, style, noscript, iframe, svg, header, and aside) to reduce LLM token usage
Preserves document structure with headings, lists, and links
SSRF protection blocking private IP ranges
5 MB response size limit and 15-second timeout
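The tag list above implies a pre-conversion pass. That pass drops each listed element together with everything inside it, and the remaining HTML then goes to a Markdown converter (the service uses the ReverseMarkdown library for that step). The following is only a standard-library Python sketch of the stripping pass. `TagStripper` and `strip_non_content` are hypothetical names.

```python
import html
from html.parser import HTMLParser

# Elements listed above as stripped; each is removed along with its contents
STRIPPED_TAGS = {"nav", "footer", "script", "style", "noscript",
                 "iframe", "svg", "header", "aside"}


class TagStripper(HTMLParser):
    """Rebuilds HTML while skipping stripped elements and their subtrees."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: data arrives unescaped
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside a stripped element

    def handle_starttag(self, tag, attrs):
        if tag in STRIPPED_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth
        if not self.skip_depth and tag not in STRIPPED_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in STRIPPED_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so the output is still valid HTML
            self.parts.append(html.escape(data, quote=False))


def strip_non_content(page_html: str) -> str:
    parser = TagStripper()
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)
```

This sketch drops comments and doctype declarations silently. If a stripped tag is never closed, it also discards the rest of the page, which real-world HTML may require handling more leniently.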

Code Examples

curl -X POST https://api.atomicapis.dev/api/extract \
  -H "X-RapidAPI-Proxy-Secret: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article"
  }'

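const response = await fetch('https://api.atomicapis.dev/api/extract', {
  method: 'POST',
  headers: {
    'X-RapidAPI-Proxy-Secret': 'YOUR_SECRET',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/article'
  })
});

// Surface HTTP errors instead of parsing an error body as a result
if (!response.ok) {
  throw new Error(`Extraction failed: HTTP ${response.status}`);
}

const data = await response.json();
console.log(data.markdown);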

import requests

response = requests.post(
    'https://api.atomicapis.dev/api/extract',
    headers={
        'X-RapidAPI-Proxy-Secret': 'YOUR_SECRET',
        'Content-Type': 'application/json'
    },
    json={
        'url': 'https://example.com/article'
    },
    timeout=30,  # requests has no default timeout
)
response.raise_for_status()  # raise on 4xx/5xx responses

data = response.json()
print(data['markdown'])

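using System;
using System.Net.Http;
using System.Net.Http.Json;

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("X-RapidAPI-Proxy-Secret", "YOUR_SECRET");

var request = new
{
    url = "https://example.com/article"
};

// PostAsJsonAsync serializes the body and sets Content-Type: application/json
var response = await client.PostAsJsonAsync(
    "https://api.atomicapis.dev/api/extract",
    request
);
response.EnsureSuccessStatusCode();

// Web serializer defaults are case-insensitive, so "markdown" binds to Markdown
var data = await response.Content.ReadFromJsonAsync<ExtractResponse>();
Console.WriteLine(data?.Markdown);

record ExtractResponse(string Markdown);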

Response

{
  "markdown": "# Understanding Modern Web Development\n\nWeb development has evolved significantly over the past decade..."
}

Parameters

Name	Type	Required	Description
`url`	string	Yes	The URL to extract content from. Must be an absolute HTTP or HTTPS URL.
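A client can enforce the documented `url` rule (absolute HTTP or HTTPS) before sending a request, which avoids a round trip for obviously invalid input. The following is a minimal Python sketch. `build_extract_body` is a hypothetical helper name, and the server remains the authority on what it accepts.

```python
from urllib.parse import urlparse


def build_extract_body(url: str) -> dict:
    """Return the POST /api/extract body, or raise if url is not absolute HTTP(S)."""
    parsed = urlparse(url)
    # Both a scheme and a network location are required for an absolute URL
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute HTTP or HTTPS URL, got {url!r}")
    return {"url": url}
```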

Response Schema

JSON Schema

{
  "type": "object",
  "properties": {
    "markdown": {
      "type": "string",
      "description": "The extracted page content converted to GitHub-flavored Markdown"
    }
  }
}
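Before using a response, a client can check it against this schema. The following Python sketch assumes that `markdown` is always present on success, although the schema above does not list it as `required`. `ExtractResult` and `parse_extract_response` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractResult:
    markdown: str  # GitHub-flavored Markdown, per the response schema


def parse_extract_response(body: str) -> ExtractResult:
    """Parse a response body, raising ValueError if 'markdown' is missing or not a string."""
    data = json.loads(body)
    markdown = data.get("markdown") if isinstance(data, dict) else None
    if not isinstance(markdown, str):
        raise ValueError("response does not match schema: missing string 'markdown'")
    return ExtractResult(markdown=markdown)
```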

Use Cases

RAG Pipeline Ingestion

Feed web content directly into your Retrieval-Augmented Generation pipelines with clean, token-optimized Markdown.
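In a RAG pipeline, the returned Markdown usually has to be split into chunks before it is embedded. The API preserves headings, so splitting on them is a natural choice. The following Python sketch shows one such approach. `chunk_markdown` and its 2,000-character default are assumptions, not something the API provides.

```python
import re

# ATX heading: 1-6 '#' followed by whitespace or end of line
HEADING = re.compile(r"#{1,6}(\s|$)")


def chunk_markdown(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown at headings; pack oversized sections by paragraph."""
    sections: list[list[str]] = [[]]
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        if not in_fence and HEADING.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    chunks: list[str] = []
    for section in sections:
        text = "\n".join(section).strip()
        if not text:
            continue
        if len(text) <= max_chars:
            chunks.append(text)
            continue
        # Greedily pack blank-line-separated paragraphs up to max_chars;
        # a single paragraph longer than max_chars becomes its own chunk
        current = ""
        for para in text.split("\n\n"):
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                current = para
        if current:
            chunks.append(current)
    return chunks
```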

LLM Context Preparation

Prepare web articles and documentation as structured context for large language model prompts.
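When several pages go into one prompt, it helps to label each page with its source and to cap the total size. The following Python sketch is one possible convention. The `<source url="...">` wrapper, the `build_context` name, and the use of a character budget as a rough stand-in for a token budget are all assumptions.

```python
def build_context(pages: dict[str, str], max_chars: int = 8000) -> str:
    """Join extracted pages into one prompt context, each tagged with its URL.

    Pages are added in order; stops before the total would exceed max_chars.
    """
    blocks: list[str] = []
    used = 0
    for url, markdown in pages.items():
        block = f'<source url="{url}">\n{markdown.strip()}\n</source>'
        sep = 2 if blocks else 0  # length of the "\n\n" separator
        if used + sep + len(block) > max_chars:
            break
        blocks.append(block)
        used += sep + len(block)
    return "\n\n".join(blocks)
```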

Documentation Migration

Migrate existing web-based documentation to Markdown-based systems like Docusaurus or MkDocs.
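A migration script needs a deterministic mapping from page URLs to Markdown files in a `docs/` tree, the layout that Docusaurus and MkDocs both read from. The following Python sketch mirrors the URL path. `markdown_path_for` and `save_markdown` are hypothetical helpers, and the naming scheme is only one reasonable choice.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def markdown_path_for(url: str, docs_dir: Path) -> Path:
    """Map a page URL to a .md file under docs_dir, mirroring the URL path."""
    parts = [p for p in PurePosixPath(urlparse(url).path).parts
             if p not in ("/", ".", "..")]  # drop root and traversal segments
    if not parts:
        parts = ["index"]
    name = parts[-1]
    # Replace a trailing .html/.htm extension with .md
    stem = name.rsplit(".", 1)[0] if name.endswith((".html", ".htm")) else name
    return docs_dir.joinpath(*parts[:-1], f"{stem}.md")


def save_markdown(url: str, markdown: str, docs_dir: Path) -> Path:
    """Write extracted Markdown to its mapped path, creating folders as needed."""
    target = markdown_path_for(url, docs_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```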

Build Constraints

Implementation Notes

This API uses the following techniques to optimize content extraction for LLM consumption, along with one build setting:

Strips non-essential HTML tags to save tokens
Uses the ReverseMarkdown library for HTML-to-Markdown conversion
Roots an assembly with TrimmerRootAssembly so the .NET IL trimmer keeps it intact in trimmed builds (a build constraint, not a content-cleaning step)

MCP Integration

What is MCP?

Model Context Protocol (MCP) allows AI assistants like Claude to call this API as a native tool during a conversation. Instead of writing HTTP requests, the AI invokes the tool directly, so no API keys or boilerplate are needed on the client side.

Tool Details

Tool Class

WebToMarkdownTools

Method

ExtractMarkdown()

Description

Scrapes a URL and converts page content to clean Markdown for RAG/LLM contexts
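Behind the scenes, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request, after the usual `initialize` handshake, carrying the tool's name and arguments. The following Python sketch builds such a message. The tool name `extract_markdown` and the `url` argument are assumptions: this page lists only the C# method `ExtractMarkdown()`, and the name a server exposes can differ from its method name. `tools_call_request` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must be unique per session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool and argument names
print(tools_call_request("extract_markdown", {"url": "https://example.com/article"}))
```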