For AI agent buildersPreview · in development

Structured web data your agent tools can trust

Multi-provider scraping orchestration with LLM-based extraction, automatic fallback chains, and confidence-routed model selection.

This is a teaser of an upcoming integration. The API shape, SDKs, and docs below reflect our in-development direction. Join the waitlist to get early access and shape the final API.

Join the waitlist Read the integration guide

What you get

Key capabilities

Schema-enforced LLM extraction

Define your output shape once. We send cleaned HTML and your schema to an LLM and return JSON validated against it. Malformed extractions are quarantined, not silently returned.

Multi-provider fallback

One request routes across 8 scraping providers (Firecrawl, Jina, Brightdata, Zyte, Scrapingbee, Oxylabs, ScraperAPI, Apify). If one fails or rate-limits, the next fires automatically. The response includes the provider chain so your agent can log or react.

Confidence-based model routing

Cheap models first (GPT-4o mini, Claude Haiku). When confidence drops below your threshold, we escalate to a stronger model automatically. Control cost per request while protecting output quality.

Agent-ready JSON output

Structured output that maps directly to your agent framework's tool parameters or context window. Any framework that can call an HTTPS endpoint works: LangChain tools, CrewAI tasks, OpenAI function calling, Vercel AI SDK.

API preview

Extract structured data for an agent tool

Illustrative shape of the upcoming API. Endpoints and field names may change before GA.

tsPreview

export async function getProduct(url: string) {
  const schema = {
    type: "object",
    properties: {
      productName: { type: "string" },
      price: { type: "number" },
      inStock: { type: "boolean" },
      reviews: { type: "array", items: { type: "string" } },
    },
    required: ["productName", "price"],
  } as const

  const res = await fetch("https://api.webscraping.app/v1/extract", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.WEBSCRAPING_API_KEY}` },
    body: JSON.stringify({
      url,
      schema,
      modelRouting: { cheapFirst: true, confidenceThreshold: 0.85 },
    }),
  })

  const { data, confidence, provider, providerChain } = await res.json()

  // Surface the fallback chain so the agent can log or react to provider switches
  return { content: data, metadata: { confidence, provider, providerChain } }
}

Integrations

Tested framework integrations

LangChain

Use as a Tool via the REST endpoint

OpenAI Agents SDK

Wire into function calling

Vercel AI SDK

Drop into tool definitions

CrewAI

Expose as a Task tool

Shape the API with us

Join the waitlist. Early adopters get direct input on the API surface before GA.

Join the waitlist Read the integration guide