
WebSearchAPI.ai Scraper API Reference

Complete reference for the WebSearchAPI.ai Scraper endpoint with advanced content extraction capabilities

WebSearchAPI.ai offers a powerful web scraping API built for developers who need full control over content extraction. Extract clean, structured content from any URL with browser rendering, CSS selectors, JavaScript injection, and AI-optimized markdown formatting that is ideal for LLMs.

Introduction

The WebSearchAPI.ai Scraper API allows you to:

  • Extract and parse content from any URL with AI-optimized formatting
  • Use full browser rendering for JavaScript-heavy websites
  • Target specific content with CSS selectors
  • Inject custom JavaScript before scraping
  • Generate AI-powered alt text for images
  • Customize markdown output formatting
  • Extract links and images with summaries
  • Control privacy and caching behavior

Endpoint Details

POST /scrape

Base URL: https://api.websearchapi.ai

Full Endpoint: https://api.websearchapi.ai/scrape

Authentication

Authentication Required: All API requests require authentication using your API key in the Authorization header.

Include your API key in the Authorization header using the Bearer token format:

Authorization: Bearer YOUR_API_KEY

You can obtain an API key by signing up for a WebSearchAPI.ai account. Each account receives 1,000 free API credits monthly.

Request Format

All requests to the Scraper API should be made as HTTP POST requests with a JSON body containing the scraping parameters.

Required Headers

Header          Value
Content-Type    application/json
Authorization   Bearer YOUR_API_KEY

Request Body

The request body should be a JSON object containing your scraping parameters:

{
  "url": "https://example.com",
  "returnFormat": "markdown",
  "engine": "browser",
  "targetSelector": "article, main",
  "removeSelector": "header, footer, nav, .ads",
  "withLinksSummary": true,
  "withImagesSummary": true
}

Request Parameters

The parameters below are the ones used throughout this reference; see the sections that follow for details.

Parameter            Description
url                  The URL to scrape
engine               Rendering engine: direct, browser, or cf-browser-rendering
returnFormat         Output format, e.g. markdown
respondWith          Advanced processing mode, e.g. readerlm-v2
targetSelector       CSS selector(s) for the content to extract
removeSelector       CSS selector(s) for content to strip out
injectPageScript     Custom JavaScript to execute before scraping
withLinksSummary     Extract links with a summary
withImagesSummary    Extract images with a summary
withGeneratedAlt     Generate AI-powered alt text for images
timeout              Request timeout (e.g. 15)
tokenBudget          Limit on extracted content size for LLM use
retainImages         Control which images are kept (e.g. none)
mdHeadingStyle, mdBulletListMarker, mdEmDelimiter, mdStrongDelimiter, mdLinkStyle
                     Markdown output formatting options
dnt, noCache, proxy, setCookie
                     Privacy and caching controls

Example Requests

curl -X POST https://api.websearchapi.ai/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://example.com/article",
  "returnFormat": "markdown",
  "engine": "browser",
  "targetSelector": "article, main",
  "removeSelector": "header, footer, nav, .ads",
  "withLinksSummary": true,
  "withImagesSummary": true,
  "withGeneratedAlt": true,
  "timeout": 15
}'
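
The same request in JavaScript, sketched with Node 18+'s built-in fetch:

// Equivalent of the curl example above (Node 18+).
const response = await fetch('https://api.websearchapi.ai/scrape', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/article',
    returnFormat: 'markdown',
    engine: 'browser',
    targetSelector: 'article, main',
    removeSelector: 'header, footer, nav, .ads',
    withLinksSummary: true,
    withImagesSummary: true,
    withGeneratedAlt: true,
    timeout: 15,
  }),
});

const result = await response.json();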

Response Format

Success Response (200 OK)

A successful API call returns a JSON object with the following structure:

{
  "code": 200,
  "data": {
    "title": "Example Article Title",
    "url": "https://example.com/article",
    "content": "# Example Article Title\n\nThis is the extracted content in markdown format...",
    "links": {
      "https://example.com/page1": "Page 1 Title",
      "https://example.com/page2": "Page 2 Title"
    },
    "images": {
      "https://example.com/image1.jpg": "AI-generated alt text for image 1",
      "https://example.com/image2.jpg": "AI-generated alt text for image 2"
    }
  }
}
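
A sketch of reading these fields in JavaScript, assuming result holds the parsed response body:

// Destructure the fields shown in the example response above.
const { title, url, content, links, images } = result.data;

console.log(`Scraped "${title}" from ${url}`);
console.log(content); // markdown content, ready for an LLM or renderer

// links and images are objects keyed by URL.
for (const [href, linkTitle] of Object.entries(links ?? {})) {
  console.log(`${href} -> ${linkTitle}`);
}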

Response Fields

Field          Description
code           Status code of the request (200 on success)
data.title     Title of the scraped page
data.url       The URL that was scraped
data.content   Extracted content in the requested returnFormat
data.links     Map of link URL to link title (with withLinksSummary)
data.images    Map of image URL to alt text (with withImagesSummary; AI-generated when withGeneratedAlt is set)

Error Responses

WebSearchAPI.ai returns standard HTTP status codes with JSON error details:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request parameters",
    "details": {
      "url": "Invalid URL format"
    }
  }
}

A VALIDATION_ERROR response is returned when the request contains invalid or missing parameters; the details object identifies the offending fields.
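
A sketch of handling this error shape in JavaScript, assuming response is a fetch response from /scrape:

// On a non-2xx status, parse the JSON error body shown above.
if (!response.ok) {
  const { code, message, details } = (await response.json()).error ?? {};
  console.error(`Scrape failed (HTTP ${response.status}): ${code} - ${message}`);
  if (details) console.error(details); // e.g. { url: "Invalid URL format" }
}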

Rate Limits and Quotas

WebSearchAPI.ai uses a credit-based system for API usage. Each account receives 1,000 free API credits monthly. Scraper API credit usage varies based on features used:

Operation                        Credits
Basic scraping (direct engine)   1 credit
Browser rendering                2 credits
With link/image summaries        +1 credit
With AI-generated alt text       +1 credit
Screenshot/pageshot              2 credits

Pro Tip: Optimize Credit Usage. Use the direct engine for simple, static pages to save credits. Reserve the browser and cf-browser-rendering engines for JavaScript-heavy sites. Enable AI features like withGeneratedAlt only when needed for accessibility or LLM processing.
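
As an illustration of the credit table above (not an official calculator), a rough per-request estimate in JavaScript might look like this:

// Illustrative helper derived from the credit table above; not an official API.
function estimateCredits({ engine = 'direct', withSummaries = false, withGeneratedAlt = false }) {
  let credits = engine === 'direct' ? 1 : 2; // browser engines cost 2 credits
  if (withSummaries) credits += 1;           // link/image summaries add 1
  if (withGeneratedAlt) credits += 1;        // AI-generated alt text adds 1
  return credits;
}

estimateCredits({ engine: 'browser', withSummaries: true }); // => 3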

Advanced Features

Browser Rendering Engines

The Scraper API offers three rendering engines optimized for different use cases:

Engine                 Best For                            Speed     JavaScript Support
direct                 Static HTML pages, simple sites     Fastest   None
browser                JavaScript-heavy sites, SPAs        Medium    Full
cf-browser-rendering   Complex SPAs, anti-bot protection   Slower    Full + Advanced

Choosing the Right Engine

Direct Engine - Perfect for blogs, documentation sites, and static content. Fastest response time with minimal credit usage.

Browser Engine - Ideal for modern websites with JavaScript rendering. Handles React, Vue, and Angular applications.

CF Browser Rendering - Advanced rendering for complex SPAs and sites with anti-bot measures. Best for challenging scraping scenarios.

CSS Selectors for Precise Extraction

Use targetSelector and removeSelector to focus on the content you need:

{
  "url": "https://example.com/article",
  "targetSelector": "article, main, .content",
  "removeSelector": "header, footer, nav, .ads, .sidebar, .comments"
}

This approach:

  • Reduces noise in extracted content
  • Saves tokens for LLM processing
  • Improves content quality
  • Focuses on relevant information

JavaScript Injection

Execute custom JavaScript before scraping to manipulate the DOM:

{
  "url": "https://example.com",
  "engine": "browser",
  "injectPageScript": "document.querySelector('.cookie-banner')?.remove(); document.querySelector('.newsletter-popup')?.remove();"
}

Common use cases:

  • Remove popups and overlays
  • Trigger lazy-loaded content (see the sketch after this list)
  • Manipulate page elements
  • Extract data from JavaScript variables
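
For example, to trigger lazy-loaded content you might inject a scroll before extraction (a sketch; the scroll logic will depend on the target page):

{
  "url": "https://example.com/gallery",
  "engine": "browser",
  "injectPageScript": "window.scrollTo(0, document.body.scrollHeight);"
}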

AI-Enhanced Features

Generated Alt Text

Enable AI-powered alt text generation for images:

{
  "url": "https://example.com",
  "withImagesSummary": true,
  "withGeneratedAlt": true
}

This feature:

  • Generates descriptive alt text for images
  • Improves accessibility
  • Enhances LLM understanding of visual content
  • Useful for RAG applications

ReaderLM-v2 Processing

Use advanced AI processing for better content extraction:

{
  "url": "https://example.com",
  "respondWith": "readerlm-v2"
}

Benefits:

  • Superior HTML-to-Markdown conversion
  • Better preservation of complex structures
  • Enhanced table and list formatting
  • Optimized for LLM consumption

Markdown Customization

Customize markdown output to match your preferences:

{
  "url": "https://example.com",
  "returnFormat": "markdown",
  "mdHeadingStyle": "atx",
  "mdBulletListMarker": "-",
  "mdEmDelimiter": "*",
  "mdStrongDelimiter": "**",
  "mdLinkStyle": "inline"
}

Perfect for:

  • Maintaining consistent documentation styles
  • Matching existing markdown conventions
  • Optimizing for specific markdown parsers
  • Personal formatting preferences

Privacy and Compliance

Control privacy settings for GDPR compliance:

{
  "url": "https://example.com",
  "dnt": true,
  "noCache": true,
  "proxy": "eu"
}

Privacy features:

  • dnt: Sends Do Not Track header
  • noCache: Bypasses cache for fresh data
  • proxy: Use EU-based proxies for GDPR compliance
  • setCookie: Custom cookie settings for authenticated content
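
Since the exact setCookie value format isn't shown in this reference, the following is only a sketch that assumes a cookie-header-style string:

{
  "url": "https://example.com/account",
  "setCookie": "session_id=YOUR_SESSION_COOKIE"
}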

Use Cases

RAG Applications

Extract clean, token-optimized content for retrieval-augmented generation systems. Perfect for building knowledge bases and AI assistants.

Knowledge Base Building

Scrape documentation sites and build comprehensive knowledge graphs with link extraction and structured content.

Content Migration

Migrate content from legacy systems to modern platforms with preserved formatting and structure.

News Aggregation

Extract articles from news sites with clean formatting, removing ads and navigation clutter automatically.

Academic Research

Extract citations, references, and structured data from research papers and academic websites.

E-commerce Data

Extract product information, prices, reviews, and inventory status from e-commerce platforms.

Market Intelligence

Monitor competitor websites for pricing changes, product launches, and content updates.

SEO Analysis

Extract page content, metadata, and structure for SEO audits and competitive analysis.

Compliance Monitoring

Track legal and regulatory websites for updates and changes to compliance requirements.

Best Practices

1. Choose the Right Engine

Start with direct engine and upgrade to browser only when needed:

// Try the fast direct engine first (lowest credit cost)
let payload = { url: targetUrl, engine: 'direct' };
// scrapeWithRetry() is defined under Best Practice 4 below
let result = await scrapeWithRetry(payload);

// If the page needs JavaScript, retry with full browser rendering
if (!result.data.content) {
  payload.engine = 'browser';
  result = await scrapeWithRetry(payload);
}

2. Use Selectors Wisely

Target specific content to reduce noise and save tokens:

{
  "targetSelector": "article, main, .post-content",
  "removeSelector": "header, footer, nav, aside, .ads, .related-posts"
}

3. Optimize Token Budget

Set token limits for LLM optimization:

{
  "tokenBudget": 10000,
  "retainImages": "none"
}

4. Handle Errors Gracefully

Always implement retry logic with exponential backoff:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeWithRetry(payload, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('https://api.websearchapi.ai/scrape', {
        method: 'POST',
        headers: {
          'Authorization': 'Bearer YOUR_API_KEY',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(payload),
      });
      if (response.ok) return await response.json();

      if (response.status === 429) {
        // Rate limited: back off exponentially (1s, 2s, 4s, ...)
        await sleep(Math.pow(2, i) * 1000);
        continue;
      }

      throw new Error(`HTTP ${response.status}`);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
    }
  }
}

5. Monitor Credit Usage

Track credits to avoid unexpected charges:

// Credit headers arrive as strings; convert before comparing.
const creditsConsumed = Number(response.headers.get('X-Credits-Consumed'));
const creditsRemaining = Number(response.headers.get('X-Credits-Remaining'));

if (creditsRemaining < 100) {
  console.warn('Low credits! Consider upgrading.');
}