Enhanced Reliability and Faster Detection
This update introduces link discovery for JavaScript-heavy websites, improving coverage of modern web applications. We have also improved automated sitemap detection and resolved several API stability issues to make data extraction workflows more reliable.
New Features
JavaScript-Powered Link Discovery
The crawler now discovers links on web applications that require JavaScript execution to render content. This enables deeper data extraction from sites built with client-side frameworks that do not include navigation links in their initial HTML.
Improvements
Intelligent Circuit Breaking
The system now better handles domains with intermittent success rates, preventing unnecessary service interruptions. This ensures your scraping jobs continue to run reliably.
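As an illustration of the idea, here is a minimal sketch of a circuit breaker that opens on a sustained failure rate rather than on isolated failures. The class name, window size, and thresholds are hypothetical and chosen only for the example; the platform's actual logic is not described in these notes.

```python
from collections import deque


class RateBasedBreaker:
    """Opens only when the recent failure rate is high (hypothetical sketch)."""

    def __init__(self, window=20, min_samples=10, max_failure_rate=0.8):
        # Sliding window of recent outcomes: True = success, False = failure.
        self.results = deque(maxlen=window)
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate

    def record(self, success):
        self.results.append(bool(success))

    def is_open(self):
        # With too little data, keep the circuit closed rather than guess.
        if len(self.results) < self.min_samples:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) >= self.max_failure_rate


# A domain that succeeds every other request stays usable.
flaky = RateBasedBreaker()
for i in range(20):
    flaky.record(i % 2 == 0)

# A domain that fails every request trips the breaker.
dead = RateBasedBreaker()
for _ in range(20):
    dead.record(False)
```

With these example thresholds, the intermittently failing domain keeps its circuit closed, while the consistently failing one opens it.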
Granular Anti-Bot Detection
We have improved our ability to distinguish between different types of site protections and access limits. This provides more accurate error reporting when navigating complex page gates.
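To show what this kind of distinction can look like on the client side, the sketch below sorts a response into coarse categories. The function name, category labels, and the body heuristic are assumptions made for illustration, not the service's actual classification rules.

```python
def classify_response(status, body):
    """Rough, hypothetical classification of a fetch result."""
    text = body.lower()
    if status == 429:
        # 429 Too Many Requests signals an access limit, not a block.
        return "rate_limited"
    if "captcha" in text:
        # Heuristic only: a challenge page was served instead of content.
        return "challenge"
    if status in (401, 407):
        return "auth_required"
    if status == 403:
        return "blocked"
    return "ok" if status < 400 else "other_error"
```

Separating a rate limit from a challenge page matters because the right reaction differs: the first calls for backing off, the second for a different fetch strategy.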
Optimized Scraping Success
Automated configurations have been tuned for several regional media sites to ensure faster navigation and higher success rates. This reduces the retries needed to fetch content.
Reduced Latency Under Load
Wait times have been significantly reduced during high system demand by failing fast when resources are unavailable. This prevents applications from hanging during traffic spikes.
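The fail-fast pattern described here can be sketched with a non-blocking semaphore acquire. The `SlotPool` and `CapacityUnavailable` names are hypothetical; this shows the general technique, not the platform's internals.

```python
import threading


class CapacityUnavailable(Exception):
    """Raised immediately when no slot is free (hypothetical error type)."""


class SlotPool:
    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self):
        # blocking=False returns at once instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            raise CapacityUnavailable("no free slot; retry later")

    def release(self):
        self._slots.release()


pool = SlotPool(1)
pool.acquire()          # takes the only slot
try:
    pool.acquire()      # fails immediately instead of waiting
    rejected = False
except CapacityUnavailable:
    rejected = True
pool.release()
```

The caller gets an error it can act on right away (retry with backoff, or shed the request) rather than a request that hangs until a slot frees up.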
Enhanced Browser Emulation
Browser environments now include more robust script execution capabilities and accurate environment signatures. This improves success when scraping sites with complex interactive scripts.
Optimized Concurrency Management
Internal resource allocation has been fine-tuned to better match real-world usage patterns. This allows for more reliable performance when running multiple concurrent scraping tasks.
General bug fixes and improvements
Plus 3 internal improvements for better reliability and performance.
Advanced Sitemap Detection
The system now automatically probes multiple common locations to find sitemaps, even when they are not explicitly declared in standard locations. This improves automated discovery for sites using various content management systems with non-standard configurations.
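A minimal sketch of probing several candidate locations follows. The path list reflects common conventions (for example, `/wp-sitemap.xml` is used by WordPress) and is an assumption; the locations the service actually checks are not listed in these notes. The `exists` callback is a hypothetical hook so the example runs without network access.

```python
from urllib.parse import urljoin, urlsplit

# Assumed candidate paths, for illustration only.
CANDIDATE_PATHS = (
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/sitemap.xml.gz",
    "/wp-sitemap.xml",
    "/sitemap/sitemap.xml",
)


def candidate_sitemap_urls(page_url):
    """Build candidate sitemap URLs at the root of the page's site."""
    parts = urlsplit(page_url)
    root = f"{parts.scheme}://{parts.netloc}"
    return [urljoin(root, path) for path in CANDIDATE_PATHS]


def find_sitemaps(page_url, exists):
    """Return the candidates for which exists(url) is true.

    `exists` is supplied by the caller, e.g. a function that sends a
    HEAD request and checks for a 200 response.
    """
    return [url for url in candidate_sitemap_urls(page_url) if exists(url)]
```

In practice `exists` would wrap an HTTP client; injecting it keeps the probing order and URL construction easy to test on their own.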
Bug Fixes
Improved Session Reliability
We resolved an issue where browser sessions could occasionally become unavailable after a crash. Your allocated scraping capacity is now tracked more accurately and remains available.
General API and SDK Improvements
This update includes more accurate job status reporting and refined type definitions in our libraries. Connection handling for ingestion has also been improved to prevent hangs.
Reliable Browser Slot Management
Fixed a synchronization issue that could cause browser session counters to become inaccurate over time. You can now rely on more consistent availability and management of your active browser slots.
Improved Extraction for Complex Pages
Our extraction engine now handles pages with incomplete metadata tags more effectively. This reduces processing errors when scraping complex web structures and improves overall data reliability.
Accurate Page Type Classification
Resolved an issue where some URLs were incorrectly classified as landing pages. Your scrape results and metadata will now accurately reflect the specific page types you are targeting.
Improved JavaScript Scraping Reliability
We have implemented automated system refreshes to prevent resource exhaustion during high-volume operations. This ensures that browser-based requests are processed consistently without interruption.
Stable Session Recording Navigation
We resolved a navigation error that could occur when reviewing scraping history with specific authentication states. You will no longer encounter redirect loops, providing a more reliable debugging experience.
Reliable Recursive Page Discovery
Your deep crawls will now consistently find and traverse links, even when content is retrieved from the cache. This fix ensures that discovery correctly identifies new pages from the content of every crawled URL.
Stable Dashboard Session Persistence
We resolved an issue that caused the management console to incorrectly sign users out during certain proxy operations. This ensures you can monitor sessions without experiencing unexpected interruptions or redirects.
Reliable Recursive Discovery
Fixed a bug that caused recursive page discovery to fail during multi-level crawls. The system now accurately tracks source URLs for all discovered pages, ensuring complete and uninterrupted coverage across complex website structures.
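To make source tracking concrete, here is a small breadth-first sketch that records, for every discovered page, the URL it was found on. The `crawl` function, the `get_links` callback, and the toy `SITE` map are hypothetical and stand in for real fetching and link extraction.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=2):
    """Breadth-first discovery.

    Returns {url: source_url}; the start URL maps to None.
    """
    source_of = {start_url: None}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        for link in get_links(url):
            if link not in source_of:
                source_of[link] = url  # remember where the link was found
                queue.append((link, depth + 1))
    return source_of


# Toy link graph standing in for fetched pages.
SITE = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/a"], "/a/1": ["/deep"]}
found = crawl("/", lambda u: SITE.get(u, []))
```

Because each page carries its own source URL, links found at the second level are attributed to the page that contained them rather than to the crawl's starting point.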
Resolved Session Ingestion Errors
Fixed an issue where valid data sent to session endpoints was incorrectly rejected with validation errors. Valid ingestion requests are now accepted and processed as expected.
Refined Recording Management Filters
Filtering session recordings by specific edge cases now works reliably within your dashboard. This fix ensures that your search criteria are correctly applied by the API, allowing for more precise management of your captured data.
Security
Hardened Request Validation
Internal request handling has been updated to better validate client headers and prevent manipulation. This ensures a more secure connection between your services and our endpoints.
Stronger Client Identity Verification
We have hardened our request validation process to prevent unauthorized identity spoofing. This ensures more reliable rate limiting and consistent performance for all developers.
Enforced Request Payload Limits
Payload size limits are now strictly enforced across all transfer types, including chunked data streams. This update maintains platform stability by ensuring all incoming data adheres to standard size constraints.
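Chunked transfers do not announce their total size up front, so a limit has to be enforced while the data arrives. The sketch below shows one way to do that; `read_limited` and `PayloadTooLarge` are hypothetical names, and the limit value is up to the caller.

```python
class PayloadTooLarge(Exception):
    """Raised when a body grows past the allowed size (hypothetical)."""


def read_limited(chunks, max_bytes):
    """Join byte chunks, aborting as soon as the total exceeds max_bytes."""
    total = 0
    parts = []
    for chunk in chunks:
        total += len(chunk)
        if total > max_bytes:
            # Stop reading immediately; the rest of the stream is ignored.
            raise PayloadTooLarge(f"body exceeds {max_bytes} bytes")
        parts.append(chunk)
    return b"".join(parts)
```

Counting bytes as they are read means an oversized stream is rejected at the first chunk that crosses the limit, without buffering the whole body first.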
Plus 5 internal changes for stability and performance.