
Interacting with the Browser

This tutorial covers key interactions you can perform with Aidolon's Browser Client. You'll learn how to navigate pages, click elements, type into fields, press keys, and drag and drop items.

By the end, you'll be able to automate common browser tasks with confidence.


What You'll Learn

  • Navigating to web pages using standard URLs and smart URLs
  • Clicking elements using technical selectors and natural language
  • Typing text into input fields
  • Simulating key presses, including special keys and combinations
  • Performing drag-and-drop operations
  • Understanding the basic interaction workflow

Navigate to Pages

To navigate to a page, provide a full URL or use one of Aidolon's smart URLs:

# Using a complete URL
browser.navigate("https://www.example.com")

# Using a smart URL (natural language)
browser.navigate("example site")

Smart URLs use AI to interpret natural language descriptions, saving you time.
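
If a script needs to know which of the two forms a string takes before passing it to navigate, one option is to check whether it parses as an absolute URL. The sketch below uses only the Python standard library. `is_full_url` is a hypothetical helper for illustration and says nothing about how Aidolon itself classifies its input.

```python
from urllib.parse import urlparse


def is_full_url(target: str) -> bool:
    """Return True when target has an http(s) scheme and a host."""
    parsed = urlparse(target.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical pre-check a script could run before browser.navigate(target)
for target in ("https://www.example.com", "example site"):
    kind = "full URL" if is_full_url(target) else "smart URL"
    print(f"{target!r} -> {kind}")
```

Anything that fails the check, such as "example site", would be treated as a natural-language description.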


Click Elements

Click elements using either technical selectors or natural language:

# Using CSS selector
browser.click("#submit-button", wait="auto")

# Using natural language
browser.click("Submit button", wait="auto")

The `click` and `press` methods accept a `wait` parameter with three options: "auto", True, and False.

Prefer natural language for readability. Use precise descriptions to ensure accuracy.


Type Text into Fields

Enter text easily with the type_text method:

# Technical selector example
browser.type_text("input[name='email']", "user@example.com", delay=0.1)

# Natural language example
browser.type_text("Email field", "user@example.com", delay=0.1)

Parameters for type_text:

  • selector: The element to type into.
  • text: The text to type.
  • delay: Optional delay between keystrokes, default is 0.1 seconds.

Natural language selectors adapt well if the web page structure changes.
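
To rehearse a sequence of type_text calls without opening a browser, you can record them with a stand-in object. In this sketch, `RecordingBrowser`, `fill_form`, and the `input[name='city']` selector are hypothetical names chosen for illustration. Only the `type_text(selector, text, delay=...)` call shape is taken from this tutorial.

```python
class RecordingBrowser:
    """Hypothetical stand-in that records type_text calls instead of typing."""

    def __init__(self):
        self.calls = []

    def type_text(self, selector, text, delay=0.1):
        self.calls.append((selector, text, delay))


def fill_form(browser, fields, delay=0.1):
    """Type each value into the field named by its key, in insertion order."""
    for selector, text in fields.items():
        browser.type_text(selector, text, delay=delay)


browser = RecordingBrowser()
fill_form(browser, {"Email field": "user@example.com", "input[name='city']": "Lisbon"})
print(browser.calls)
```

Because `fill_form` only relies on the `type_text` call shape, the same function should work with a real session object in place of the stand-in.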


Press Keys

Simulate pressing keys within fields or on the page. You can specify keys using exact identifiers or natural language. Examples of supported keys include:

  • Function keys: F1 - F12
  • Digits and letters: Digit0 - Digit9, KeyA - KeyZ
  • Special keys: Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp
  • Modifier keys: Shift, Control, Alt, Meta, ShiftLeft, ControlOrMeta (resolves to Control on Windows/Linux and Meta on macOS)

Holding down Shift generates uppercase characters. Single characters are case-sensitive (a and A produce different results).
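
The ControlOrMeta rule can be pictured as a small platform check. This is a minimal sketch of the behavior described above, not Aidolon's implementation. `resolve_control_or_meta` is a hypothetical name, and treating every non-macOS platform as Control is an assumption.

```python
import sys


def resolve_control_or_meta(platform_name: str = sys.platform) -> str:
    """Map ControlOrMeta to Meta on macOS and to Control elsewhere (assumed)."""
    return "Meta" if platform_name == "darwin" else "Control"


print(resolve_control_or_meta("darwin"))  # Meta
print(resolve_control_or_meta("linux"))   # Control
print(resolve_control_or_meta("win32"))   # Control
```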

Shortcut combinations are supported:

browser.press("search box", "Control+o")
browser.press("editor", "Control++")
browser.press("browser window", "Control+Shift+T")
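
A combination string reads as zero or more modifiers followed by one final key, joined by "+". The sketch below shows one way to split such a string, including the "Control++" case where the final key is itself a plus sign. `split_combination` is a hypothetical helper, and Aidolon's own parsing may differ.

```python
def split_combination(combo: str) -> tuple[list[str], str]:
    """Split "Control+Shift+T" into (["Control", "Shift"], "T")."""
    if combo.endswith("++"):
        # The final key is a literal "+", as in "Control++".
        head, key = combo[:-2], "+"
    elif "+" in combo and combo != "+":
        head, _, key = combo.rpartition("+")
    else:
        # A single key with no modifiers.
        return [], combo
    modifiers = head.split("+") if head else []
    return modifiers, key


for combo in ("Control+o", "Control++", "Control+Shift+T", "Enter"):
    print(combo, "->", split_combination(combo))
```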

Natural language descriptions also work:

# Pressing Enter using a selector
browser.press("input[name='search']", "Enter")

# Using natural language description
browser.press("search box", "return")

Aidolon understands natural descriptions of keys, like "enter," "return," "tab," and combinations like "shift+tab."
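
If you prefer to pass exact identifiers yourself, a small lookup table can normalize the informal names before the call. The alias table below is hypothetical and covers only the names mentioned above. The set of descriptions Aidolon actually accepts may be larger or differ.

```python
# Hypothetical alias table; not taken from Aidolon's documentation.
KEY_ALIASES = {"enter": "Enter", "return": "Enter", "tab": "Tab", "shift": "Shift"}


def normalize_keys(description: str) -> str:
    """Rewrite each "+"-separated part to its exact identifier when known."""
    parts = description.strip().split("+")
    return "+".join(KEY_ALIASES.get(part.strip().lower(), part.strip()) for part in parts)


print(normalize_keys("return"))     # Enter
print(normalize_keys("shift+tab"))  # Shift+Tab
print(normalize_keys("Control+o"))  # Control+o
```

Parts that are not in the table, such as "Control" and "o", pass through unchanged.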


Drag and Drop

Move an item by passing a source element and a target element to drag_and_drop:

# Technical selectors
browser.drag_and_drop("#source-element", "#target-element")

# Natural language selectors
browser.drag_and_drop("image thumbnail", "drop area")

Natural language selectors let you describe both the source and the target without inspecting the page's markup.


Interaction Workflow Example

Here's how interactions come together in practice:

from aidolon_browser_client.browser.browser_session import BrowserSession

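with BrowserSession() as browser:
    # Open the search page through a smart URL
    browser.navigate("google")
    # Search for a term and submit it with Enter
    browser.type_text("search box", "Aidolon Browser", delay=0.1)
    browser.press("search box", "Enter")
    # Open the first result, then drag an item on that page
    browser.click("first result link")
    browser.drag_and_drop("feature icon", "favorites area")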
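
A workflow like this can also be written as data and replayed against any object that exposes the same method names, which makes it possible to check the order of steps without launching a browser. In the sketch below, `DryRunBrowser` and `run_steps` are hypothetical names. Only the method names and arguments come from the example above.

```python
class DryRunBrowser:
    """Hypothetical stand-in that logs every method call instead of acting."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Called only for attributes that are not found normally,
        # so any interaction method name resolves to a recorder.
        def record(*args, **kwargs):
            self.log.append((name, args, kwargs))
        return record


def run_steps(browser, steps):
    """Call browser.<method>(*args) for each (method, args) pair, in order."""
    for method, args in steps:
        getattr(browser, method)(*args)


steps = [
    ("navigate", ("google",)),
    ("type_text", ("search box", "Aidolon Browser")),
    ("press", ("search box", "Enter")),
    ("click", ("first result link",)),
]
browser = DryRunBrowser()
run_steps(browser, steps)
print([name for name, _, _ in browser.log])
```

Keeping the steps in a list makes a sequence easy to review and reorder before running it against a real session.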

[Figure: Interaction Flow Diagram]


Best Practices for Browser Interactions

  • Prefer natural language selectors for maintainability and clarity.
  • Use precise descriptions when using natural language.
  • Use smart URLs to keep navigation calls readable.
  • Keep interaction sequences logical and concise.

You're now equipped to automate browser interactions with Aidolon's Browser Client!