Fetch and extract content from URLs

curl --request POST \
  --url https://api.fetch.tinyfish.ai/ \
  --header 'Content-Type: application/json' \
  --header 'X-API-Key: <api-key>' \
  --data '
{
  "urls": [
    "https://example.com",
    "https://example.org"
  ],
  "format": "markdown",
  "include_html_head": false,
  "links": false,
  "image_links": false,
  "ttl": 0
}
'

{
  "results": [
    {
      "url": "https://example.com",
      "final_url": "https://www.example.com",
      "title": "Example Domain",
      "description": "This domain is for use in illustrative examples.",
      "language": "en",
      "format": "markdown",
      "text": "<string>",
      "author": "John Doe",
      "published_date": "2024-01-15",
      "links": [
        "<string>"
      ],
      "image_links": [
        "<string>"
      ],
      "latency_ms": 1183.4
    }
  ],
  "errors": [
    {
      "url": "https://invalid.example.com",
      "error": "target_http_error",
      "status": 404
    }
  ]
}

Renders web pages in a real browser (including JavaScript-heavy sites) and returns clean extracted content in your preferred format. Submit up to 10 URLs per request and get back structured content for each. Per-URL failures appear in errors[] and do not fail the entire request.

Per-URL error codes (in errors[].error):

target_http_error — target server returned a non-2xx HTTP status; the raw status code is in errors[].status
target_unreachable — connection refused, TLS failure, DNS failure, or other network error
timeout — request timed out
proxy_error — proxy tunnel failure
bot_blocked — bot-challenge page detected (Cloudflare, etc.)
empty_content — page loaded but no extractable text was found
invalid_url — malformed URL or SSRF-blocked address
invalid_redirect_url — redirect target rejected before fetch
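
One way a client might act on these codes is to separate failures worth retrying from those unlikely to change on a second attempt. The sketch below is illustrative only: this reference does not say which errors are safe to retry, so the grouping (timeout, proxy_error, and target_unreachable as transient, plus target HTTP statuses 429 and 5xx) is an assumption, and `is_retryable` and `split_errors` are hypothetical helper names.

```python
# Assumption: these codes are treated as transient. The API reference does not
# state which per-URL errors are safe to retry.
TRANSIENT_CODES = {"timeout", "proxy_error", "target_unreachable"}


def is_retryable(error: dict) -> bool:
    """Return True if a per-URL error entry looks worth retrying."""
    code = error.get("error")
    if code in TRANSIENT_CODES:
        return True
    if code == "target_http_error":
        status = error.get("status", 0)
        # Assumption: 429 and 5xx responses from the target may succeed later.
        return status == 429 or 500 <= status <= 599
    return False


def split_errors(response: dict) -> tuple[list[str], list[str]]:
    """Split errors[] into (retryable URLs, permanently failed URLs)."""
    retry, failed = [], []
    for err in response.get("errors", []):
        (retry if is_retryable(err) else failed).append(err["url"])
    return retry, failed
```

With the sample response shown earlier, the 404 entry for https://invalid.example.com would land in the second list, since a 404 is not in the assumed retryable set.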

POST https://api.fetch.tinyfish.ai/

Authorizations

X-API-Key

string

header

required

API key for authentication. Get your key from the API Keys page.
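
For readers not using curl, the same header can be attached from Python's standard library. This is a minimal sketch rather than an official client: the endpoint URL and header names are copied from the curl example, while `build_fetch_request` is a hypothetical helper name. It builds the request without sending it.

```python
import json
import urllib.request

ENDPOINT = "https://api.fetch.tinyfish.ai/"  # URL taken from the curl example


def build_fetch_request(api_key: str, urls: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the X-API-Key header."""
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the response body with `json.load`.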

Body

application/json

URLs to fetch and extraction options

urls

string<uri>[]

required

Array of URLs to fetch (1-10). All URLs are fetched in parallel. Each URL is processed independently — if one fails, others still return successfully. Errors are reported per-URL in the errors array.

Required array length: 1 - 10 elements

Example:

[
  "https://example.com",
  "https://example.org"
]
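
Because a single request accepts at most 10 URLs, a longer list has to be split across several requests. The helper below is one possible way to do that; `batch_urls` is a hypothetical name, and the only fact taken from this reference is the 1 to 10 element limit.

```python
from typing import Iterator

MAX_URLS_PER_REQUEST = 10  # limit stated in the urls field description


def batch_urls(urls: list[str], size: int = MAX_URLS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, preserving order."""
    if not 1 <= size <= MAX_URLS_PER_REQUEST:
        raise ValueError("size must be between 1 and 10")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

Each yielded list can then be used as the urls value of its own request. An empty input yields no batches, so no request with an empty array is ever built.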

format

enum<string>

default:markdown

Output format for extracted content. "markdown" (default) is ideal for LLM consumption. "html" returns cleaned semantic HTML. "json" returns a structured document tree.

Available options: markdown, html, json

Example:

"markdown"

include_html_head

boolean

default:false

When true and format is "html", return a complete HTML document with <head> and <body>. The injected head contains curated content metadata when available.

Example:

false
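
Since include_html_head only has a described effect when format is "html", a client may want to catch the mismatched combination before sending. The sketch below does that on the client side. How the server treats include_html_head with other formats is not documented here, so refusing it locally is an assumption, and `build_body` is a hypothetical helper name.

```python
ALLOWED_FORMATS = ("markdown", "html", "json")  # enum values listed for format


def build_body(urls: list[str], fmt: str = "markdown", include_html_head: bool = False) -> dict:
    """Build a request body, checking format-related options first."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    if include_html_head and fmt != "html":
        # Assumption: reject locally, because the documented effect of
        # include_html_head applies only when format is "html".
        raise ValueError('include_html_head requires format "html"')
    body = {"urls": urls, "format": fmt}
    if include_html_head:
        body["include_html_head"] = True
    return body
```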

links

boolean

default:false

Extract all outbound links from each page. Useful for discovering related pages or navigating to specific content. Links are returned as absolute URLs in the links array of each result.

Example:

false

image_links

boolean

default:false

Extract all image URLs from each page. Useful for finding visual content or media assets. Image links are returned as absolute URLs in the image_links array of each result.

Example:

false
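
As an illustration of using the extracted links to discover related pages, the sketch below keeps only links on the same host as the fetched page. It assumes the request was sent with "links": true and relies on the final_url and links fields shown in the example response; `same_host_links` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def same_host_links(result: dict) -> list[str]:
    """Return de-duplicated links that share a host with the page's final_url."""
    host = urlsplit(result["final_url"]).hostname
    seen, kept = set(), []
    for link in result.get("links", []):
        if urlsplit(link).hostname == host and link not in seen:
            seen.add(link)
            kept.append(link)
    return kept
```

Comparing hostnames directly works here because links are returned as absolute URLs.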

ttl

integer

Maximum age, in seconds, of a cached entry that the caller is willing to accept. Omit (default) for unlimited tolerance: any cached entry is acceptable. Set to 0 to prefer a live fetch; a cached entry is still served if the origin's Cache-Control: max-age covers its age, or if the host is on the small allowlist of operator-pinned never-expire domains. Set to N > 0 to accept a cached entry whose age is below N; the upstream Cache-Control: max-age and the never-expire allowlist may extend (never shorten) this tolerance.

Required range: x >= 0

Example:

0
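
The ttl rules can be easier to follow as a small decision function. The following is a client-side model of the description above, not the service's actual cache logic. The parameter names `origin_max_age` and `pinned_host` are hypothetical, and the boundary handling (strict "less than") is an assumption.

```python
from typing import Optional


def cache_entry_acceptable(
    age: float,
    ttl: Optional[int] = None,
    origin_max_age: Optional[int] = None,
    pinned_host: bool = False,
) -> bool:
    """Model whether a cached entry of the given age (seconds) would be served."""
    if ttl is not None and ttl < 0:
        raise ValueError("ttl must be >= 0")
    if ttl is None or pinned_host:
        # Omitted ttl means unlimited tolerance; pinned hosts never expire.
        return True
    # Upstream max-age may extend, but never shorten, the caller's tolerance.
    tolerance = max(ttl, origin_max_age or 0)
    # Assumption: "below N" and "covers its age" are both read as age < limit.
    return age < tolerance
```

Under this model, ttl=0 with no max-age from the origin always results in a live fetch, while ttl=0 with max-age=60 still accepts an entry that is 5 seconds old.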

Response

Fetch completed. Check errors[] for any per-URL failures.

Fetch response with results and errors

results

object[]

required

Successfully fetched URLs

errors

object[]

required

URLs that failed to fetch
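
Because a completed request can contain both successes and failures, callers typically need to walk both arrays. The sketch below merges them into one dictionary keyed by requested URL. The field names (url, text, error, status) come from the example response, and `summarize` is a hypothetical helper name.

```python
def summarize(response: dict) -> dict:
    """Map each requested URL to its extracted text or to an error label."""
    summary = {}
    for item in response.get("results", []):
        summary[item["url"]] = {"ok": True, "text": item["text"]}
    for err in response.get("errors", []):
        label = err["error"]
        if "status" in err:
            # status appears in the example for target_http_error entries.
            label = f'{label} ({err["status"]})'
        summary[err["url"]] = {"ok": False, "error": label}
    return summary
```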
