An Advanced Coding Implementation: Mastering Browser-Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini


In this tutorial, we will learn how to harness the power of a browser-driven AI agent entirely within Google Colab. We will use Playwright's headless Chromium engine, along with the browser_use library's high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google's Gemini model via the langchain_google_genai connector to provide natural-language reasoning and decision-making, secured by pydantic's SecretStr for safe API-key handling. With getpass managing credentials, asyncio orchestrating non-blocking execution, and optional .env support via python-dotenv, this setup gives you an end-to-end, interactive agent platform without ever leaving your notebook environment.

!apt-get update -qq
!apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
!pip install -qq playwright python-dotenv langchain-google-genai browser-use
!playwright install

We first refresh the system package lists and install headless Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain GoogleGenerativeAI connector, and browser-use, and finally download the necessary browser binaries via playwright install.
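Before moving on, it can help to confirm that the installed packages are actually importable in the current runtime. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and the list uses the import names (for example dotenv for python-dotenv), which differ from the pip package names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names (not pip names) of the packages installed in the cell above.
required = ["playwright", "dotenv", "langchain_google_genai", "browser_use"]
missing = missing_modules(required)
print("Missing modules:", missing or "none")
```

If anything is reported as missing, re-running the install cell (and restarting the Colab runtime if prompted) is usually the first thing to try.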

import os
import asyncio
from getpass import getpass
from pydantic import SecretStr
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
from browser_use.browser.browser import BrowserContext

We bring in the core Python utilities, os for environment management and asyncio for asynchronous execution, plus getpass and pydantic's SecretStr for secure API-key input and storage. We then load LangChain's Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.

os.environ["ANONYMIZED_TELEMETRY"] = "false"

We disable anonymous usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to "false", so that the browser_use library does not send telemetry data back to its maintainers.

async def setup_browser(headless: bool = True):
    browser = Browser(config=BrowserConfig(headless=headless))
    context = BrowserContext(
        browser=browser,
        config=BrowserContextConfig(
            wait_for_network_idle_page_load_time=5.0,
            highlight_elements=True,
            save_recording_path="./recordings",
        )
    )
    return browser, context

This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network-idle page loads, visually highlight elements during interactions, and save a recording of each session under ./recordings. It then returns both the browser and its ready-to-use context for your agent's tasks.
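Because the helper returns a browser that must be closed later, one option is to pair setup and teardown in an async context manager. This is only a sketch: managed_browser is a hypothetical name, and it assumes the setup function returns a (browser, context) pair and that the browser object exposes an async close() method.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def managed_browser(setup, **kwargs):
    """Call `setup(**kwargs)`, yield the context, and always close the browser."""
    # Assumption: `setup` is an async function returning (browser, context).
    browser, context = await setup(**kwargs)
    try:
        yield context
    finally:
        # Runs even if the body raises, so the browser is not leaked.
        await browser.close()
```

In a notebook cell this could be used as `async with managed_browser(setup_browser, headless=True) as context: ...`, which keeps the cleanup next to the setup instead of in a separate finally block.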

async def agent_loop(llm, browser_context, query, initial_url=None):
    # Optionally open a starting tab before the agent begins reasoning.
    initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
    agent = Agent(
        task=query,
        llm=llm,
        browser_context=browser_context,
        use_vision=True,
        generate_gif=False,
        initial_actions=initial_actions,
    )
    # Run the agent to completion and return its final answer, if any.
    result = await agent.run()
    return result.final_result() if result else None

This async helper encapsulates one "think-and-browse" cycle: it spins up an Agent configured with your LLM, the browser context, and an optional initial URL tab, enables vision (use_vision=True), and disables GIF recording. When you call agent_loop, it runs the agent through its steps and returns the agent's final result (or None if nothing is produced).
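An agent run can take a long time if a page never settles or the model keeps exploring. One way to bound it is to wrap the call in asyncio.wait_for. The sketch below is an illustration, not part of the original notebook: run_with_timeout and the default of 120 seconds are hypothetical, and how cleanly the agent reacts to being cancelled mid-run is not verified here.

```python
import asyncio


async def run_with_timeout(make_coro, timeout_s=120.0):
    """Await the coroutine produced by `make_coro()`, or return None on timeout."""
    try:
        # wait_for cancels the inner coroutine once the timeout elapses.
        return await asyncio.wait_for(make_coro(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None
```

With the helper above, a call might look like `answer = await run_with_timeout(lambda: agent_loop(llm, context, query), timeout_s=180)`. Passing a lambda rather than a coroutine object means the coroutine is only created when the helper is ready to await it.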

async def main():
    # Prompt for the key without echoing it to the notebook output.
    raw_key = getpass("Enter your GEMINI_API_KEY: ")

    os.environ["GEMINI_API_KEY"] = raw_key

    api_key = SecretStr(raw_key)
    model_name = "gemini-2.5-flash-preview-04-17"

    llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)

    browser, context = await setup_browser(headless=True)

    try:
        while True:
            query = input("\nEnter prompt (or leave blank to exit): ").strip()
            if not query:
                break
            url = input("Optional URL to open first (or blank to skip): ").strip() or None

            print("\n🤖 Running agent…")
            answer = await agent_loop(llm, context, query, initial_url=url)
            print("\n📊 Search Results\n" + "-"*40)
            print(answer or "No results found")
            print("-"*40)
    finally:
        # Always release the browser, even if a run fails.
        print("Closing browser…")
        await browser.close()


await main()

Finally, this main coroutine drives the entire Colab session: it securely prompts for your Gemini API key (using getpass and SecretStr), sets up the ChatGoogleGenerativeAI LLM and a headless Playwright browser context, then enters an interactive loop where it reads your natural-language prompts (and optional start URL), invokes agent_loop to perform the browser-driven AI task, prints the results, and finally ensures the browser closes cleanly.
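The introduction mentions optional .env support via python-dotenv, although the coroutine always prompts for the key. A minimal sketch of one way to combine the two follows: check the environment first and only fall back to an interactive prompt when nothing is set. The helper names load_env_file and resolve_api_key are hypothetical; load_dotenv is python-dotenv's loader, and the import is guarded in case the package is absent.

```python
import os
from getpass import getpass


def load_env_file():
    """Load variables from a .env file if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    return load_dotenv()


def resolve_api_key(var_name="GEMINI_API_KEY", env=None, prompt=getpass):
    """Return the key from `env` if present, otherwise ask for it interactively."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        # Hidden prompt, so the key is not echoed into the notebook output.
        key = prompt(f"Enter your {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} was not provided")
    return key
```

In the notebook, calling load_env_file() once and then wrapping the result as `SecretStr(resolve_api_key())` would slot into the existing flow in place of the direct getpass call.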

In conclusion, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you're scraping real-time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain's Gemini interface provides a flexible foundation for your next AI-powered project. Feel free to extend the agent's capabilities, re-enable GIF recording, add custom navigation steps, or swap in other LLM backends to tailor the workflow precisely to your research or production needs.




Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
