Screenshots, Scraping, and Querying

In this guide, you'll learn how to quickly capture screenshots, scrape content from web pages, and query pages using natural language with the Aidolon Browser Client. These tools are invaluable for debugging, data extraction, automated reporting, and more.

What You'll Learn

Capturing full-page and viewport screenshots
Scraping web content as PDF, Markdown, and HTML
Querying web pages using natural language
Practical usage examples

Taking Screenshots

Aidolon Browser Client lets you easily capture screenshots of the web pages you're automating.

Full-Page Screenshot

Capture the entire page, including what's below the visible viewport:

from aidolon_browser_client.browser.browser_session import BrowserSession

with BrowserSession() as browser:
    browser.navigate("https://example.com")
    screenshot = browser.take_screenshot(full_page=True)
    print("Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))

Viewport Screenshot

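Capture only the visible portion of the page. This snippet, like the shorter ones that follow, runs inside an open BrowserSession block such as the one above: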

screenshot = browser.take_screenshot(full_page=False)
print("Viewport Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))
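
To keep a local copy of a screenshot, you can download the returned URL. The sketch below is an illustration, not part of the Aidolon Browser Client: filename_from_url and download_screenshot are hypothetical helpers, and it assumes screenshot_url is a plain HTTP(S) link that can be fetched without extra authentication.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(url, default="screenshot.png"):
    """Derive a local file name from the last path segment of a URL."""
    name = os.path.basename(urlparse(url).path)
    # Fall back to a default when the URL path ends in a slash or is empty.
    return name or default


def download_screenshot(url, directory="."):
    """Download the image at `url` into `directory` and return the local path.

    Assumption: the URL is publicly fetchable; adjust if your account
    requires signed requests or headers.
    """
    path = os.path.join(directory, filename_from_url(url))
    urlretrieve(url, path)
    return path
```

With a response like the ones above, you would pass screenshot.get("data", {}).get("screenshot_url") to download_screenshot after checking that it is not None.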

Scraping Web Content

You can scrape web content directly into various formats, perfect for structured data extraction or reporting.

Scrape as PDF

Generate a downloadable PDF version of the page:

pdf_response = browser.generate_pdf()
print("PDF URL:", pdf_response.get("data", {}).get("pdf_url"))

Scrape as Markdown

Scrape the page content as Markdown, ideal for readability and content reuse:

scraped_data = browser.scrape_page(format=["markdown"])
markdown_content = scraped_data.get("data", {}).get("markdown")
print(markdown_content)
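
Once you have the Markdown text, ordinary string processing is often enough for content reuse. As a rough sketch (extract_headings is a hypothetical helper, and the sample string stands in for real scraped output), the following builds an outline from ATX-style headings. It does not skip fenced code blocks, so a "# comment" line inside code would be reported as a heading.

```python
import re

# Matches ATX headings such as "# Title" or "## Section ##" at line start.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*#*[ \t]*$", re.MULTILINE)


def extract_headings(markdown_text):
    """Return a list of (level, title) tuples found in the Markdown text."""
    return [
        (len(match.group(1)), match.group(2))
        for match in HEADING_RE.finditer(markdown_text or "")
    ]


# Sample input standing in for scraped_data["data"]["markdown"].
sample = "# Title\n\nIntro text.\n\n## Pricing\n\n- item\n\n### Details\n"
for level, title in extract_headings(sample):
    print("  " * (level - 1) + title)
```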

Scrape as HTML

Extract the raw HTML source, useful for precise data extraction or offline analysis:

scraped_data = browser.scrape_page(format=["html"])
html_content = scraped_data.get("data", {}).get("html")
print(html_content)
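
For offline analysis, the returned HTML can be parsed with any HTML library. Here is a minimal sketch using only Python's standard library; LinkCollector and extract_links are hypothetical names, and the sample markup stands in for real scraped output.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_text):
    """Return all link targets found in the HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html_text or "")
    return collector.links


# Sample input standing in for scraped_data["data"]["html"].
sample_html = '<p><a href="/a">A</a> <a name="x">no link</a> <A HREF="https://example.com/b">B</A></p>'
print(extract_links(sample_html))
```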

Querying Web Pages with Natural Language

Use scrape_information() to describe, in plain language, exactly what data you want extracted from the page:

from aidolon_browser_client.browser.browser_session import BrowserSession

with BrowserSession() as browser:
    browser.navigate("https://example-ecommerce.com")

    # Describe the information you want to extract
    extracted_data = browser.scrape_information("list all product names and their prices")

    print("Extracted Data:", extracted_data)

Complete Example

Combine screenshots, scraping, and natural language querying into a single workflow:

from aidolon_browser_client.browser.browser_session import BrowserSession

with BrowserSession() as browser:
    browser.navigate("https://example.com")

    # Take a full-page screenshot
    screenshot = browser.take_screenshot(full_page=True)
    print("Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))

    # Scrape content into PDF, Markdown, and HTML formats
    scraped_data = browser.scrape_page(format=["markdown", "html"], pdf=True)

    pdf_url = scraped_data.get("data", {}).get("pdf_url")
    markdown_content = scraped_data.get("data", {}).get("markdown")
    html_content = scraped_data.get("data", {}).get("html")

    print("PDF URL:", pdf_url)
    # Fall back to an empty string so slicing does not fail if a field is missing
    print("Markdown Content:", (markdown_content or "")[:500], "...")  # Preview first 500 chars
    print("HTML Content:", (html_content or "")[:500], "...")  # Preview first 500 chars

    # Querying the page using natural language
    extracted_data = browser.scrape_information("summarize the main points of the page")
    print("Extracted Data:", extracted_data)

Recommended Use Cases

Debugging: Quickly capture visual states of webpages for troubleshooting.
Reporting: Automatically generate PDFs or Markdown documents from webpage content.
Data Extraction: Grab structured HTML for precise scraping tasks.
Natural Language Queries: Extract data or summaries using intuitive descriptions.

Best Practices

Choose the Right Tool: Use screenshots for visuals, generate_pdf for documents, scrape_page for bulk content (HTML/Markdown), and scrape_information for targeted natural language extraction.
Handle Large Content: Full HTML or Markdown output can be very large; preview, truncate, or process it in pieces rather than printing it whole.
Respect robots.txt and Terms of Service: Use scraping capabilities responsibly and ethically.
Error Handling: Check response objects for success/failure and handle potential errors during scraping or screenshotting.
Natural Language Clarity: Be specific and clear in your descriptions for scrape_information for best results.
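
The error-handling advice above can be sketched as a small wrapper around the chained .get() calls used throughout this guide. This is an illustration only: get_data_field is a hypothetical helper, and it assumes responses are dictionaries with a nested "data" dictionary, as the examples suggest. The library may expose its own success or error fields, which this sketch does not rely on.

```python
def get_data_field(response, key):
    """Return response["data"][key], raising a clear error if it is missing.

    Assumption: responses look like {"data": {...}}, as in the examples
    in this guide.
    """
    if not isinstance(response, dict):
        raise ValueError(f"expected a dict response, got {type(response).__name__}")
    data = response.get("data")
    if not isinstance(data, dict) or data.get(key) is None:
        raise ValueError(f"response has no data[{key!r}] field")
    return data[key]


# Stand-in for a real response from take_screenshot().
fake_response = {"data": {"screenshot_url": "https://example.com/shot.png"}}
print(get_data_field(fake_response, "screenshot_url"))
```

Failing loudly at the point of extraction tends to be easier to debug than passing None further down a workflow.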

You're now equipped to easily capture screenshots, scrape content, and perform natural language queries with Aidolon Browser Client!
