Enhanced Crawl Reliability and Throughput
This update focuses on improving success rates for sites with advanced protections and increasing system capacity for demanding browser tasks. These optimizations ensure a more stable experience and better performance during peak scraping volumes.
New Features
SDK cost control features
The Python SDK now supports direct tier selection and proactive warnings for usage limits. These tools empower you to better manage your budget and optimize request costs directly from your code.
Real-time Usage Warnings
API responses now include proactive notifications when you are approaching your account's capacity limits. This allows your application to manage scraping volume dynamically and avoid hitting usage caps unexpectedly.
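As a rough sketch of how an application might react to these notifications, the helper below checks a parsed response for a usage warning and signals that the caller should slow down. The `warnings` field, the `usage_limit_approaching` code, and the `message` key are assumptions made for illustration; check the API reference for the actual response shape.

```python
import logging

logger = logging.getLogger("scraper")

# Hypothetical warning code; the real identifier may differ.
USAGE_WARNING_CODE = "usage_limit_approaching"


def should_throttle(response: dict) -> bool:
    """Return True if the parsed response carries a usage-limit warning.

    Assumes warnings arrive as a list of dicts under a "warnings" key,
    which is an illustrative shape rather than the documented one.
    """
    warnings = response.get("warnings") or []
    for warning in warnings:
        if isinstance(warning, dict) and warning.get("code") == USAGE_WARNING_CODE:
            logger.warning("Usage warning: %s", warning.get("message", ""))
            return True
    return False
```

A batch job could call `should_throttle` after each request and pause or shrink its queue when it returns `True`.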
Enhanced SDK Scraper Controls
You can now explicitly select specific performance tiers through the SDKs to better balance request costs and success rates. Your API responses will also now include clear warnings when your scraping activity approaches plan-specific volume caps.
Improvements
Granular challenge diagnostics
API responses now provide more specific naming for various bot protection challenges encountered during requests. This extra detail replaces generic labels, helping you better understand and troubleshoot why a request was blocked.
Increased reliability under load
We have extended wait thresholds for resources during periods of high traffic volume. This change reduces failure rates during usage spikes, ensuring your most demanding scraping jobs complete successfully.
General bug fixes and improvements
Plus 3 internal improvements for better reliability and performance.
2x Scraping Throughput
We have optimized resource allocation to double the number of concurrent requests the platform can handle. Your batch jobs and large-scale scraping tasks will now complete significantly faster.
Deeper Crawl Coverage
The crawler now automatically retries pages that fail due to temporary connection issues or system blocks. Because a failed page no longer drops its outbound links from the crawl, your crawls can reach their full intended depth without missing pages.
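The retry behavior happens on the platform side, so no client changes are needed. For readers who want the same pattern in their own pipelines, here is a minimal sketch of retrying transient failures with exponential backoff and jitter. `TransientError`, `fetch_with_retry`, and the default values are hypothetical names and numbers chosen for illustration; they do not describe the crawler's internal implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, temporary blocks)."""


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Only errors marked as transient are retried; anything else propagates immediately, which keeps permanent failures from consuming retry budget.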
Increased Connection Stability
Internal wait times have been standardized to prevent premature timeouts during periods of high demand. You will experience fewer connection errors and more stable performance when the system is under heavy load.
Intelligent Routing Recovery
We have optimized our routing logic to ensure that domains are no longer permanently excluded from high-performance tiers after experiencing temporary failures. Domains are now periodically re-evaluated, ensuring your long-term scraping tasks maintain the highest possible success rates.
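One way to picture periodic re-evaluation is a cooldown-based exclusion list: a domain is set aside after a failure, then becomes eligible again once a fixed interval has passed. The sketch below illustrates that idea only. The `TierEligibility` class, its method names, and the 600-second default are assumptions, not the platform's actual routing logic.

```python
import time
from typing import Callable, Dict


class TierEligibility:
    """Tracks domains temporarily excluded from a tier (illustrative only)."""

    def __init__(
        self,
        cooldown_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._excluded_at: Dict[str, float] = {}

    def record_failure(self, domain: str) -> None:
        # Start (or restart) the exclusion window for this domain.
        self._excluded_at[domain] = self._clock()

    def record_success(self, domain: str) -> None:
        # A success clears any pending exclusion.
        self._excluded_at.pop(domain, None)

    def is_eligible(self, domain: str) -> bool:
        excluded_at = self._excluded_at.get(domain)
        if excluded_at is None:
            return True
        if self._clock() - excluded_at >= self._cooldown:
            # Cooldown elapsed: drop the exclusion so the domain is tried again.
            del self._excluded_at[domain]
            return True
        return False
```

The key property is that an exclusion always expires, so a single bad period cannot lock a domain out indefinitely.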
Refined Challenge Detection Logic
Updated logic for identifying site-specific challenges leads to more reliable scraping of protected targets. You will see fewer intermittent failures on sites with advanced automated protections.
Increased Browser Session Capacity
Optimized infrastructure now provides additional headroom for resource-heavy browser sessions. This results in greater stability and consistency when your scraping tasks require significant memory.
Bug Fixes
Improved JSON response handling
Fixed a crash that occurred when a scraped URL returned a top-level JSON array as its primary content. The API now correctly handles these data structures, allowing for more robust data extraction across diverse websites.
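Client code that post-processes scraped JSON can hit the same class of bug if it assumes the top-level value is always an object. The sketch below shows one defensive way to normalize either shape into a list of records. `extract_records` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Any, Dict, List


def extract_records(raw_body: str) -> List[Dict[str, Any]]:
    """Normalize a JSON body into a list of dict records.

    Accepts a top-level array, a single top-level object, or any other
    JSON value (which yields an empty list).
    """
    parsed = json.loads(raw_body)
    if isinstance(parsed, list):
        # Top-level array: keep only the object entries.
        return [item for item in parsed if isinstance(item, dict)]
    if isinstance(parsed, dict):
        # Single object: wrap it so callers always iterate over a list.
        return [parsed]
    return []
```

For example, `extract_records('[{"id": 1}, {"id": 2}]')` and `extract_records('{"id": 1}')` both return lists, so downstream loops need no special cases.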
Accurate credit usage tracking
Refined our billing logic to calculate usage correctly for zero-cost operations and edge-case scenarios. Your credit balance and history will now consistently reflect your actual API consumption.
Stable long-term job retention
Corrected a maintenance task issue to prevent active or scheduled jobs from being cleared prematurely. This ensures that your long-running tasks and their associated data remain accessible and secure.
Precise automated refund logic
Improved the accuracy of credit restorations for tasks where internal tracking data might have expired before processing. This ensures your account is always correctly credited for any incomplete or failed requests.
Improved dashboard stability
Fixed a bug that could cause errors for new users during their first sign-in or while viewing their account dashboard. This update ensures a smooth and reliable experience for all users managing their API keys and usage.
Improved Article Data Extraction
Resolved a crash that occurred when processing articles containing multiple metadata blocks. This ensures reliable data extraction and fewer job failures when scraping complex news and media websites.
Consistent Cached Responses
Fixed an issue where cached scrape results could occasionally return in a different format than fresh requests. All API responses now follow a unified structure, simplifying your data parsing and integration logic.
Sophisticated Challenge Resolution
Our automated browser solvers are now more effective at detecting and bypassing advanced JavaScript-based challenges that previously resulted in empty responses. This update significantly reduces failed requests on heavily protected sites, providing a more seamless data extraction experience.
Security
Secure Error Reporting
Improved the sanitization of error messages within system reports to ensure they render correctly and securely. This prevents unexpected formatting issues or text injections caused by special characters in error details.
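To illustrate the general technique, the sketch below strips control characters from an error message, truncates it, and escapes HTML-significant characters before it is embedded in a report. It assumes the report is rendered as HTML, which the note above does not state; `sanitize_error_message` and the 500-character limit are hypothetical.

```python
import html
import re

# ASCII control characters, excluding tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_error_message(message: str, max_length: int = 500) -> str:
    """Make an arbitrary error string safe to embed in an HTML report."""
    cleaned = _CONTROL_CHARS.sub("", message)
    # Truncate before escaping so an entity is never cut in half.
    truncated = cleaned[:max_length]
    return html.escape(truncated, quote=True)
```

Escaping at the point of rendering, rather than when the error is first recorded, keeps the stored message intact for logs and debugging.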
Plus 5 internal changes for stability and performance.