Screenshots, Scraping, and Querying
In this guide, you'll learn how to capture screenshots, scrape content from web pages, and query pages in natural language with the Aidolon Browser Client. These tools are useful for debugging, data extraction, and automated reporting.
What You'll Learn
- Capturing full-page and viewport screenshots
- Scraping web content as PDF, Markdown, and HTML
- Querying web pages using natural language
- Practical usage examples
Taking Screenshots
Aidolon Browser Client lets you easily capture screenshots of the web pages you're automating.
Full-Page Screenshot
Capture the entire page, including what's below the visible viewport:
from aidolon_browser_client.browser.browser_session import BrowserSession

with BrowserSession() as browser:
    browser.navigate("https://example.com")
    screenshot = browser.take_screenshot(full_page=True)
    print("Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))
Viewport Screenshot
Capture only the visible portion of the page. This snippet assumes an open session named browser, as in the full-page example:
screenshot = browser.take_screenshot(full_page=False)
print("Viewport Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))
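The examples above read fields with chained .get() calls. That pattern raises an AttributeError if a response ever carries "data" set to None rather than omitting it. Here is a minimal sketch of a more defensive accessor. It assumes responses are plain dictionaries shaped like the ones shown above; get_data_field is a hypothetical helper, not part of the Aidolon Browser Client.

```python
from typing import Any, Optional


def get_data_field(response: Any, key: str) -> Optional[Any]:
    """Return response["data"][key], or None if any level is missing.

    Hypothetical helper: assumes the response is a dict with an optional
    "data" dict, as in the examples in this guide.
    """
    if not isinstance(response, dict):
        return None
    data = response.get("data")
    if not isinstance(data, dict):
        return None
    return data.get(key)


# Made-up sample responses, shaped like the ones in the examples above.
ok_response = {"data": {"screenshot_url": "https://example.com/shot.png"}}
empty_response = {"data": None}

print(get_data_field(ok_response, "screenshot_url"))
print(get_data_field(empty_response, "screenshot_url"))
```

The same helper works for the other keys used in this guide, such as "pdf_url", "markdown", and "html".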
Scraping Web Content
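You can scrape web content directly into PDF, Markdown, or HTML, which suits structured data extraction and reporting. The snippets in this section assume an open session named browser, as in the screenshot example.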
Scrape as PDF
Generate a downloadable PDF version of the page:
pdf_response = browser.generate_pdf()
print("PDF URL:", pdf_response.get("data", {}).get("pdf_url"))
Scrape as Markdown
Scrape the page content as Markdown, ideal for readability and content reuse:
scraped_data = browser.scrape_page(format=["markdown"])
markdown_content = scraped_data.get("data", {}).get("markdown")
print(markdown_content)
Scrape as HTML
Extract the raw HTML source, useful for precise data extraction or offline analysis:
scraped_data = browser.scrape_page(format=["html"])
html_content = scraped_data.get("data", {}).get("html")
print(html_content)
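Printing scraped content is fine for a quick look, but for reuse you will usually want it on disk. The sketch below writes a scraped string to a file. save_content is a hypothetical helper, and the sample string stands in for the value returned under the "markdown" or "html" key.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_content(content: Optional[str], path: Path) -> int:
    """Write scraped text to `path` as UTF-8 and return the character count.

    A missing value (None) is written as an empty file, so callers do not
    need a separate check before saving.
    """
    text = content or ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return len(text)


# Stand-in for scraped_data.get("data", {}).get("markdown")
markdown_content = "# Example Domain\n\nThis domain is for use in examples."

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "scrapes" / "example.md"
    written = save_content(markdown_content, target)
    print(f"Wrote {written} characters to {target.name}")
```

Writing with an explicit encoding avoids platform-dependent defaults when pages contain non-ASCII text.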
Querying Web Pages with Natural Language
Use scrape_information() to describe exactly what data you want extracted from the web page in plain language:
from aidolon_browser_client.browser.browser_session import BrowserSession
with BrowserSession() as browser:
    browser.navigate("https://example-ecommerce.com")
    # Describe the information you want to extract
    extracted_data = browser.scrape_information("list all product names and their prices")
    print("Extracted Data:", extracted_data)
Complete Example
Combine screenshots, scraping, and natural language querying into a single workflow:
from aidolon_browser_client.browser.browser_session import BrowserSession
with BrowserSession() as browser:
    browser.navigate("https://example.com")

    # Take a full-page screenshot
    screenshot = browser.take_screenshot(full_page=True)
    print("Screenshot URL:", screenshot.get("data", {}).get("screenshot_url"))

    # Scrape content into PDF, Markdown, and HTML formats
    scraped_data = browser.scrape_page(format=["markdown", "html"], pdf=True)
    pdf_url = scraped_data.get("data", {}).get("pdf_url")
    # Fall back to an empty string so the previews below cannot fail on None
    markdown_content = scraped_data.get("data", {}).get("markdown") or ""
    html_content = scraped_data.get("data", {}).get("html") or ""
    print("PDF URL:", pdf_url)
    print("Markdown Content:", markdown_content[:500], "...")  # Preview first 500 chars
    print("HTML Content:", html_content[:500], "...")  # Preview first 500 chars

    # Query the page using natural language
    extracted_data = browser.scrape_information("summarize the main points of the page")
    print("Extracted Data:", extracted_data)
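The complete example previews only the first 500 characters of each result. When you need to work through all of a large page, one option is to process it in fixed-size pieces. The sketch below shows that idea with a hypothetical iter_chunks helper; the chunk size of 500 simply mirrors the preview length above and is not a library setting.

```python
from typing import Iterator


def iter_chunks(text: str, size: int = 500) -> Iterator[str]:
    """Yield consecutive slices of `text`, each at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(text), size):
        yield text[start:start + size]


# Stand-in for a large scraped Markdown or HTML string.
sample_content = "x" * 1200

for index, chunk in enumerate(iter_chunks(sample_content)):
    print(f"chunk {index}: {len(chunk)} characters")
```

Slicing by character count can split words or HTML tags, so splitting on paragraph or element boundaries may be a better fit for downstream parsing.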
Recommended Use Cases
- Debugging: Quickly capture visual states of webpages for troubleshooting.
- Reporting: Automatically generate PDFs or Markdown documents from webpage content.
- Data Extraction: Grab structured HTML for precise scraping tasks.
- Natural Language Queries: Extract data or summaries using intuitive descriptions.
Best Practices
- Choose the Right Tool: Use screenshots for visuals, generate_pdf for documents, scrape_page for bulk content (HTML/Markdown), and scrape_information for targeted natural language extraction.
- Handle Large Content: Be mindful of potentially large outputs when scraping full HTML or Markdown; process results efficiently.
- Respect robots.txt and Terms of Service: Use scraping capabilities responsibly and ethically.
- Error Handling: Check response objects for success/failure and handle potential errors during scraping or screenshotting.
- Natural Language Clarity: Be specific and clear in your descriptions for scrape_information for best results.
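One way to act on the robots.txt recommendation is to check a URL against the site's rules before navigating to it. The sketch below uses Python's standard urllib.robotparser module and is independent of the Aidolon Browser Client. The is_allowed helper, the "my-bot" user agent, and the sample rules are all illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration; a real site serves these at /robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/products"))
print(is_allowed(SAMPLE_ROBOTS, "my-bot", "https://example.com/private/data"))
```

The rules are parsed from a string here so the example runs offline. In practice you would first download the site's robots.txt, for example with RobotFileParser.set_url() followed by read().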
You're now equipped to easily capture screenshots, scrape content, and perform natural language queries with Aidolon Browser Client!