This is a teaser of an upcoming integration. The API shape, SDKs, and docs below reflect our in-development direction. Join the waitlist to get early access and shape the final API.
What you get
Define your output shape once. We send cleaned HTML and your schema to an LLM and return JSON validated against it. Malformed extractions are quarantined, not silently returned.
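The validate-or-quarantine step can be pictured with a small sketch. This is an illustration under stated assumptions, not the service's implementation: it checks only a small subset of JSON Schema (`type`, `properties`, `required`, `items`), and the names `validate`, `gate`, and the `quarantined` result shape are hypothetical.

```typescript
// Hypothetical sketch: a tiny subset of JSON Schema, enough for simple output shapes.
type Schema =
  | { type: "string" | "number" | "boolean" }
  | { type: "array"; items: Schema }
  | { type: "object"; properties: Record<string, Schema>; required?: readonly string[] };

// Returns a list of human-readable problems; an empty list means the value conforms.
function validate(value: unknown, schema: Schema, path = "$"): string[] {
  switch (schema.type) {
    case "string":
    case "number":
    case "boolean":
      return typeof value === schema.type ? [] : [`${path}: expected ${schema.type}`];
    case "array":
      if (!Array.isArray(value)) return [`${path}: expected array`];
      return value.flatMap((item, i) => validate(item, schema.items, `${path}[${i}]`));
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        return [`${path}: expected object`];
      }
      const record = value as Record<string, unknown>;
      const missing = (schema.required ?? [])
        .filter((key) => !(key in record))
        .map((key) => `${path}.${key}: required property missing`);
      const invalid = Object.entries(schema.properties)
        .filter(([key]) => key in record)
        .flatMap(([key, sub]) => validate(record[key], sub, `${path}.${key}`));
      return [...missing, ...invalid];
    }
  }
}

// Assumed result shape: either validated data or a quarantined payload with reasons.
type Gated<T> =
  | { status: "ok"; data: T }
  | { status: "quarantined"; raw: unknown; errors: string[] };

// Malformed extractions are set aside with their errors instead of being returned as data.
function gate<T>(raw: unknown, schema: Schema): Gated<T> {
  const errors = validate(raw, schema);
  return errors.length === 0
    ? { status: "ok", data: raw as T }
    : { status: "quarantined", raw, errors };
}
```

In this sketch, an LLM answer such as `{ productName: "Mug", price: "12.50" }` would be quarantined because `price` is a string, while the same object with a numeric price would pass through as data.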
One request routes across 8 scraping providers (Firecrawl, Jina, Brightdata, Zyte, Scrapingbee, Oxylabs, ScraperAPI, Apify). If one fails or rate-limits, the next fires automatically. The response includes the provider chain so your agent can log or react.
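The fallback behavior can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Provider` interface, the `fetchWithFallback` function, and the shape of each chain entry are hypothetical and may differ from what the final API returns in `providerChain`.

```typescript
// Hypothetical provider interface: each provider resolves with HTML or rejects.
interface Provider {
  name: string;
  fetchHtml(url: string): Promise<string>;
}

// Assumed shape of one entry in the provider chain.
interface ChainEntry {
  provider: string;
  ok: boolean;
  error?: string;
}

interface FallbackResult {
  html: string;
  provider: string; // the provider that finally succeeded
  providerChain: ChainEntry[]; // every attempt, in order
}

// Try providers in order and record each attempt so the caller can inspect the chain.
async function fetchWithFallback(url: string, providers: Provider[]): Promise<FallbackResult> {
  const providerChain: ChainEntry[] = [];
  for (const p of providers) {
    try {
      const html = await p.fetchHtml(url);
      providerChain.push({ provider: p.name, ok: true });
      return { html, provider: p.name, providerChain };
    } catch (err) {
      // A failure (including a rate limit) moves on to the next provider.
      const error = err instanceof Error ? err.message : String(err);
      providerChain.push({ provider: p.name, ok: false, error });
    }
  }
  const tried = providerChain.map((e) => e.provider).join(", ");
  throw new Error(`All providers failed: ${tried}`);
}
```

An agent reading a chain like this could log which providers were skipped, or back off if the first choice keeps getting rate-limited.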
Requests go to cheap models first (GPT-4o mini, Claude Haiku). When confidence drops below your threshold, we escalate to a stronger model automatically, so you control cost per request while protecting output quality.
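One way to picture confidence-based escalation is the sketch below. It assumes each model call returns a confidence score between 0 and 1. The `ModelTier` interface, the `extractWithEscalation` function, and the choice to return the strongest tier's answer when no tier meets the threshold are all hypothetical, not the service's documented behavior.

```typescript
// Hypothetical shape: a model call returns data plus a confidence score in [0, 1].
interface Extraction {
  data: unknown;
  confidence: number;
}

interface ModelTier {
  name: string;
  extract(html: string): Promise<Extraction>;
}

type EscalatedResult = Extraction & { model: string; escalations: number };

// Walk tiers from cheapest to strongest; stop at the first result that meets the threshold.
async function extractWithEscalation(
  html: string,
  tiers: ModelTier[],
  confidenceThreshold: number,
): Promise<EscalatedResult> {
  let last: EscalatedResult | undefined;
  for (let i = 0; i < tiers.length; i++) {
    const tier = tiers[i];
    const result = await tier.extract(html);
    last = { ...result, model: tier.name, escalations: i };
    if (result.confidence >= confidenceThreshold) return last;
  }
  if (!last) throw new Error("At least one model tier is required");
  // No tier met the threshold: fall back to the strongest model's answer.
  return last;
}
```

With a threshold of 0.85, a cheap-tier answer at 0.6 confidence would trigger one escalation, while a cheap-tier answer at 0.9 would be returned without calling the stronger model.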
Structured output maps directly to your agent framework's tool parameters or context window. Any framework that can call an HTTPS endpoint works, including LangChain tools, CrewAI tasks, OpenAI function calling, and the Vercel AI SDK.
API preview
export async function getProduct(url: string) {
  // JSON Schema describing the output shape
  const schema = {
    type: "object",
    properties: {
      productName: { type: "string" },
      price: { type: "number" },
      inStock: { type: "boolean" },
      reviews: { type: "array", items: { type: "string" } },
    },
    required: ["productName", "price"],
  } as const

  const res = await fetch("https://api.webscraping.app/v1/extract", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.WEBSCRAPING_API_KEY}`,
      "Content-Type": "application/json", // the body is JSON
    },
    body: JSON.stringify({
      url,
      schema,
      modelRouting: { cheapFirst: true, confidenceThreshold: 0.85 },
    }),
  })
  // Fail loudly on HTTP errors instead of destructuring an error payload
  if (!res.ok) throw new Error(`Extract request failed with status ${res.status}`)

  const { data, confidence, provider, providerChain } = await res.json()
  // Surface the fallback chain so the agent can log or react to provider switches
  return { content: data, metadata: { confidence, provider, providerChain } }
}
Integrations
LangChain: use as a Tool via the REST endpoint
OpenAI: wire into function calling
Vercel AI SDK: drop into tool definitions
CrewAI: expose as a Task tool
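As an illustration of the function-calling route, the sketch below defines a tool in the shape OpenAI's Chat Completions API accepts in its `tools` array, plus a helper that turns the model's tool arguments into a request body for the preview endpoint. The tool name, its description, and the `toExtractRequest` helper are hypothetical, and the request body fields simply mirror the API preview above, which may still change.

```typescript
// Tool definition in the shape OpenAI's Chat Completions API accepts in `tools`.
// The tool name and description are hypothetical.
const extractTool = {
  type: "function",
  function: {
    name: "extract_structured_data",
    description: "Fetch a web page and return JSON matching the given JSON Schema.",
    parameters: {
      type: "object",
      properties: {
        url: { type: "string", description: "Page to extract from" },
        schema: { type: "object", description: "JSON Schema for the desired output" },
      },
      required: ["url", "schema"],
    },
  },
} as const;

// The model returns tool arguments as a JSON string. Parse and check them, then
// return a body that could be POSTed to the preview /v1/extract endpoint.
function toExtractRequest(argumentsJson: string): { url: string; schema: object } {
  const args: unknown = JSON.parse(argumentsJson);
  if (typeof args !== "object" || args === null) {
    throw new Error("Tool arguments must be an object");
  }
  const { url, schema } = args as { url?: unknown; schema?: unknown };
  if (typeof url !== "string") throw new Error("Missing string argument: url");
  if (typeof schema !== "object" || schema === null) {
    throw new Error("Missing object argument: schema");
  }
  return { url, schema };
}
```

Checking the arguments before sending them matters because a model can emit incomplete or mistyped tool arguments. Rejecting them early keeps a bad call from reaching the endpoint.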