How to scrape an infinitely long JavaScript-rendered page (dynamic page content) #498

mllife · 2025-01-20T11:50:35Z

I want to scrape https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?year=2024, which is quite long. How can I use the 🔄 Full-Page Scanning feature ("Simulates scrolling to load and capture all dynamic content, perfect for infinite scroll pages.") to first load the whole page and then scrape it to markdown?

Answered by mllife

mllife (author) · 2025-01-20T13:08:10Z

I was able to do it:
```python
import asyncio
import os

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

async def scrape_ecb_publications():
    # Configure the browser settings
    browser_config = BrowserConfig(headless=True, java_script_enabled=True)

    # Set run configurations, including cache mode and enabling full-page scan
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        scan_full_page=True,  # scroll through the page so lazily loaded content appears
        scroll_delay=1.5,     # seconds to wait between scroll steps
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?year=2024",
            config=crawl_config
        )

        if not result.success:
            raise RuntimeError(f"Crawl failed: {result.error_message}")

        # Output the extracted content (create the output directory if it is missing)
        os.makedirs("./downloads", exist_ok=True)
        with open("./downloads/ecb_publications_.html", "w", encoding="utf-8") as file:
            file.write(result.html)

if __name__ == "__main__":
    asyncio.run(scrape_ecb_publications())
```
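I had to increase `scroll_delay` (set to 1.5 seconds above) so that each batch of content has time to load before the next scroll step.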
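The question asked for markdown, while the script above saves the raw HTML. A minimal variant that writes markdown instead is sketched below, reusing the same crawl settings. It assumes that `str(result.markdown)` yields the raw markdown text, which holds for the crawl4ai versions I know of but is worth checking against your installed version. The names `save_markdown` and `scrape_ecb_publications_markdown` and the `.md` output path are hypothetical.

```python
import asyncio
from pathlib import Path


def save_markdown(markdown: str, path: str) -> Path:
    """Hypothetical helper: write markdown text to `path`, creating parent dirs."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown, encoding="utf-8")
    return out


async def scrape_ecb_publications_markdown(url: str, path: str) -> Path:
    """Hypothetical variant of the script above that saves markdown, not HTML."""
    # Imported here so save_markdown can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

    browser_config = BrowserConfig(headless=True, java_script_enabled=True)
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        scan_full_page=True,  # scroll through the page before capturing it
        scroll_delay=1.5,     # seconds to wait between scroll steps
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=crawl_config)

    if not result.success:
        raise RuntimeError(f"Crawl failed: {result.error_message}")

    # Assumption: str() returns the raw markdown, whether result.markdown is a
    # plain string or a string-compatible markdown result object.
    return save_markdown(str(result.markdown), path)
```

To run it: `asyncio.run(scrape_ecb_publications_markdown("https://www.ecb.europa.eu/press/pubbydate/html/index.en.html?year=2024", "./downloads/ecb_publications.md"))`. The crawl4ai import sits inside the coroutine so the file-writing helper can be reused or tested without a browser installed.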
