FireCrawl
Load data from URL using FireCrawl.
FireCrawl is a web crawling and scraping service for extracting content from websites. This module loads and processes web content through the FireCrawl API.
The module can:
Scrape single web pages
Crawl entire websites
Extract structured data
Handle JavaScript-rendered content
Process content with text splitters
Customize metadata extraction
Support multiple operation modes
URL: The webpage or website URL to process
Connect Credential: FireCrawl API credentials
Mode: Choose between:
Scrape: Single page extraction
Crawl: Multi-page website crawling
Extract: Structured data extraction
Text Splitter: A text splitter to process the extracted content
Scrape Options:
Include Tags: HTML tags to include
Exclude Tags: HTML tags to exclude
Mobile: Use mobile user agent
Skip TLS Verification: Bypass TLS certificate checks
Timeout: Request timeout
Additional Metadata: JSON object with additional metadata
Omit Metadata Keys: Comma-separated list of metadata keys to omit
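The two metadata inputs above can be pictured as a merge followed by a filter. The following is a minimal sketch, not the module's actual implementation: the function name `apply_metadata_options` is hypothetical, and the order of operations (merge first, then omit) is an assumption.

```python
import json


def apply_metadata_options(metadata, additional_metadata="", omit_keys=""):
    """Return a copy of `metadata` with extra fields merged in and unwanted keys dropped.

    `additional_metadata` is a JSON object given as a string, and `omit_keys`
    is a comma-separated list of key names, mirroring the two inputs above.
    Assumption: extra fields are merged before keys are omitted.
    """
    result = dict(metadata)

    if additional_metadata.strip():
        extra = json.loads(additional_metadata)
        if not isinstance(extra, dict):
            raise ValueError("Additional Metadata must be a JSON object")
        result.update(extra)

    # Tolerate stray spaces and empty entries in the comma-separated list.
    omit = {key.strip() for key in omit_keys.split(",") if key.strip()}
    return {key: value for key, value in result.items() if key not in omit}
```

For example, passing `'{"team": "docs"}'` as additional metadata and `"description"` as the keys to omit would add a `team` field and drop `description` from each document's metadata.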
Document: Array of document objects containing metadata and pageContent
Text: Concatenated string from pageContent of documents
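The relationship between the two outputs can be sketched in a few lines. This is an illustration only: the helper name `documents_to_text` is hypothetical, and the newline separator is an assumption, since the text above does not say how the pieces are joined.

```python
def documents_to_text(documents, separator="\n"):
    """Join the pageContent of each document into a single string.

    Assumption: documents are dicts with a "pageContent" key, and the
    separator is a newline. The real module may join differently.
    """
    return separator.join(doc["pageContent"] for doc in documents)


docs = [
    {"pageContent": "# Home", "metadata": {"sourceURL": "https://example.com/"}},
    {"pageContent": "# About", "metadata": {"sourceURL": "https://example.com/about"}},
]
text = documents_to_text(docs)
```

The Document output keeps per-page metadata, while the Text output discards it, so the Document form is the better fit when downstream steps need the source URL.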
Multiple operation modes
Advanced scraping options
Structured data extraction
JavaScript rendering
Mobile device emulation
Custom timeout settings
Error handling
Single page processing
Main content extraction
Format selection
Custom tag filtering
Multi-page crawling
Subdomain handling
Sitemap processing
Link extraction
Structured data extraction
Schema-based parsing
LLM-powered extraction
Custom extraction prompts
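To make the scrape options more concrete, here is a rough sketch of how the include/exclude tags, mobile, TLS, and timeout settings might be collected into a single options object. The function names and the camelCase key names (`includeTags`, `excludeTags`, `skipTlsVerification`) are assumptions for illustration, as is the comma-separated tag format; check the FireCrawl API reference for the actual request fields.

```python
def split_tags(raw):
    """Split a comma-separated tag string into a clean list of tag names."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def build_scrape_options(include_tags="", exclude_tags="", mobile=False,
                         skip_tls_verification=False, timeout=None):
    """Build an options dict containing only the settings the user supplied.

    All key names here are hypothetical placeholders for request fields.
    """
    options = {}
    if split_tags(include_tags):
        options["includeTags"] = split_tags(include_tags)
    if split_tags(exclude_tags):
        options["excludeTags"] = split_tags(exclude_tags)
    if mobile:
        options["mobile"] = True
    if skip_tls_verification:
        options["skipTlsVerification"] = True
    if timeout is not None:
        options["timeout"] = timeout
    return options


options = build_scrape_options("article, main", " nav ,footer,", mobile=True, timeout=15000)
```

Leaving unset options out of the dict, rather than sending explicit defaults, lets the service apply its own defaults.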
Each document contains:
pageContent: Extracted content in markdown format
metadata:
title: Page title
description: Meta description
language: Content language
sourceURL: Original URL
Additional custom metadata
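The structure above can be illustrated with a small constructor. The helper name `make_document` and all values are placeholders, not output from a real scrape.

```python
def make_document(page_content, title, description, language, source_url, **custom):
    """Assemble a document dict with the fields listed above.

    Any keyword arguments in `custom` stand in for additional custom metadata.
    """
    return {
        "pageContent": page_content,
        "metadata": {
            "title": title,
            "description": description,
            "language": language,
            "sourceURL": source_url,
            **custom,
        },
    }


doc = make_document(
    "# Example Domain\n\nThis domain is for use in examples.",
    title="Example Domain",
    description="An illustrative page",
    language="en",
    source_url="https://example.com/",
    team="docs",  # hypothetical custom metadata field
)
```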
Requires valid FireCrawl API key
Supports multiple content formats
Handles rate limiting
Job status monitoring
Error handling and retries
Customizable request options
Memory-efficient processing
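Job status monitoring with retries typically amounts to polling until the job finishes. The sketch below shows one common pattern, polling with exponential backoff. It is not the module's actual logic: `fetch_status` is a hypothetical callable standing in for a status request, and the status strings `"completed"` and `"failed"` are assumptions.

```python
import time


def wait_for_job(fetch_status, max_attempts=10, base_delay=1.0, max_delay=30.0,
                 sleep=time.sleep):
    """Poll `fetch_status` until the job completes, doubling the wait each time.

    Returns the number of status checks made. Raises RuntimeError if the job
    reports failure and TimeoutError if it never completes within max_attempts.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status == "completed":  # assumed status value
            return attempt
        if status == "failed":  # assumed status value
            raise RuntimeError("crawl job failed")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, capped at max_delay
    raise TimeoutError(f"job not finished after {max_attempts} checks")
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and capping the delay prevents long crawls from being polled too rarely.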