Lightpanda如何加速AI数据采集？2026年高效爬虫实战指南

Q: Lightpanda浏览器相比Chrome有哪些性能优势？

Lightpanda是一款专为AI数据采集优化的轻量级无头浏览器，速度比Chrome快11倍，内存占用少9倍，适合大规模网页爬虫和自动化任务。

Q: 如何在Windows系统上安装和使用Lightpanda？

Windows用户需要安装WSL2，在WSL2环境中下载Lightpanda二进制文件，然后在Windows主机上通过Puppeteer连接进行数据采集。

Q: 使用Lightpanda进行爬虫时如何遵守robots.txt规则？

启动CDP服务器时添加--obey_robots参数，如：./lightpanda serve --obey_robots，这样Lightpanda会自动遵守目标网站的robots.txt规则。

Lightpanda: An AI Crawler Browser 11x Faster Than Chrome

本文是一份实践指南，旨在帮助工程师和开发者掌握如何使用轻量级无头浏览器没有图形用户界面的浏览器，用于自动化网页测试、抓取和渲染，常面临扩展性和稳定性挑战。 Lightpanda 进行高效的数据采集。通过约 20 分钟的学习，您将了解如何利用这款速度比 Chrome 快 11 倍、内存占用少 9 倍的工具来加速您的 AI 和自动化任务。

This article is a practical guide designed to help engineers and developers master the use of the lightweight headless browser Lightpanda for efficient data collection. In about 20 minutes, you will learn how to leverage this tool, which is 11 times faster and uses 9 times less memory than Chrome, to accelerate your AI and automation tasks.

目标读者

Target Audience

需要大规模网页爬虫的工程师 (Engineers requiring large-scale web crawlers)
AI/LLM 数据采集开发者 (AI/LLM data collection developers)
自动化测试工程师 (Automation testing engineers)
对高性能浏览器技术感兴趣的技术爱好者 (Tech enthusiasts interested in high-performance browser technology)

核心依赖与环境

Core Dependencies and Environment

Linux x86_64 或 macOS aarch64 (Linux x86_64 or macOS aarch64)
Windows 需要 WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。 (Windows requires WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。)
Docker（可选，推荐用于生产环境）(Docker (Optional, recommended for production environments))
Node.js 18+（运行 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。/Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。脚本）(Node.js 18+ (for running Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。/Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 scripts))

TIP
如果你用 Windows，直接在 WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。里安装 Lightpanda，然后在 Windows 主机上运行 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。。
If you are using Windows, install Lightpanda directly within WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。, then run Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 from your Windows host machine.

手把手教程

Step-by-Step Tutorial

步骤1：下载并安装 Lightpanda

Step 1: Download and Install Lightpanda

我们可以从 nightly builds 直接下载二进制文件。

We can download the binary directly from the nightly builds.

Linux 安装：

Linux Installation:

curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-x86_64-linux && \
chmod a+x ./lightpanda

macOS 安装：

macOS Installation:

curl -L -o lightpanda https://github.com/lightpanda-io/browser/releases/download/nightly/lightpanda-aarch64-macos && \
chmod a+x ./lightpanda

验证安装：

Verify Installation:

./lightpanda --version

WARNING
目前只有 Linux x86_64 和 macOS aarch64 的官方二进制。Windows 用户需要用 WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。。
Currently, only official binaries for Linux x86_64 and macOS aarch64 are available. Windows users need to use WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。.

步骤2：使用 Docker 启动（推荐）

Step 2: Launch Using Docker (Recommended)

如果你不想直接下载二进制，可以用 Docker 更快上手：

If you prefer not to download the binary directly, you can get started faster with Docker:

docker run -d --name lightpanda -p 9222:9222 lightpanda/browser:nightly

这样就直接启动了 CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器，监听在 9222 端口。

This directly starts the CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 server, listening on port 9222.

验证容器运行：

Verify Container is Running:

docker ps | grep lightpanda

步骤3：启动 CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器

Step 3: Start the CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 Server

如果我们不用 Docker，需要手动启动 CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器：

If we are not using Docker, we need to manually start the CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 server:

./lightpanda serve --obey_robots --log_format pretty --log_level info --host 127.0.0.1 --port 9222

输出类似：

The output will be similar to:

INFO  telemetry : telemetry status . . . . . . . . . . . . .  [+0ms]
      disabled = false

INFO  app : server running . . . . . . . . . . . . . . . .  [+0ms]
      address = 127.0.0.1:9222

TIP
--obey_robots 参数会让 Lightpanda 遵守 robots.txtA text file that instructs web crawlers which pages or files to access or ignore on a website.，这是礼貌爬虫的基本素养。
The --obey_robots parameter instructs Lightpanda to respect robots.txtA text file that instructs web crawlers which pages or files to access or ignore on a website., which is a fundamental practice for polite crawlers.

步骤4：编写 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。脚本

Step 4: Write a Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 Script

现在我们来写第一个爬虫脚本。假设你已经在项目目录安装了 puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。-core：

Now let's write our first crawler script. Assuming you have already installed puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。-core in your project directory:

npm install puppeteer-core

创建一个 crawler.js 文件：

Create a crawler.js file:

'use strict'

import puppeteer from 'puppeteer-core';

// 通过 WebSocket 连接到 Lightpanda 的 CDP 服务器
// Connect to Lightpanda's CDP server via WebSocket
const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

// 创建浏览器上下文和页面
// Create a browser context and page
const context = await browser.createBrowserContext();
const page = await context.newPage();

// 访问目标页面
// Navigate to the target page
await page.goto('https://demo-browser.lightpanda.io/amiibo/', {waitUntil: "networkidle0"});

// 提取页面中所有链接
// Extract all links from the page
const links = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('a')).map(row => {
    return row.getAttribute('href');
  });
});

console.log('抓取的链接：');
console.log('Links scraped:');
links.forEach(link => console.log(link));

// 统计页面加载时间
// Measure page load metrics
const metrics = await page.metrics();
console.log('\n页面指标：');
console.log('\nPage Metrics:');
console.log('脚本执行时间:', metrics.ScriptDuration, 'ms');
console.log('Script execution time:', metrics.ScriptDuration, 'ms');
console.log('DOM 节点数:', metrics.Nodes);
console.log('DOM node count:', metrics.Nodes);

// 清理资源
// Clean up resources
await page.close();
await context.close();
await browser.disconnect();

运行脚本：

Run the script:

node crawler.js

TIP
如果你是用 Docker 启动的 Lightpanda，脚本里的 browserWSEndpoint 改成 ws://localhost:9222。
If you started Lightpanda with Docker, change the browserWSEndpoint in the script to ws://localhost:9222.

步骤5：体验极速抓取

Step 5: Experience High-Speed Scraping

Lightpanda 官方数据显示：

Official Lightpanda data shows:

速度: 比 Chrome 快 11 倍 (Speed: 11x faster than Chrome)
内存: 比 Chrome 少 9 倍 (Memory: 9x less than Chrome)
启动: 即时启动（无头 Chrome 启动要好几秒）(Startup: Instant startup (headless Chrome takes several seconds))

我们来跑一个简单对比测试。首先确保 Lightpanda 在运行：

Let's run a simple comparative test. First, ensure Lightpanda is running:

./lightpanda serve --host 127.0.0.1 --port 9222

然后写一个批量抓取脚本：

Then write a batch scraping script:

'use strict'

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

const context = await browser.createBrowserContext();
const page = await context.newPage();

// 批量抓取多个页面
// Batch scrape multiple pages
const urls = [
  'https://demo-browser.lightpanda.io/amiibo/',
  'https://demo-browser.lightpanda.io/campfire-commerce/',
  'https://demo-browser.lightpanda.io/hacker-news-top-stories/',
];

const startTime = Date.now();

for (const url of urls) {
  console.log(`\n抓取: ${url}`);
  console.log(`\nScraping: ${url}`);
  const pageStart = Date.now();

  await page.goto(url, {waitUntil: "networkidle0"});

  const title = await page.title();
  console.log(`标题: ${title}`);
  console.log(`Title: ${title}`);
  console.log(`耗时: ${Date.now() - pageStart}ms`);
  console.log(`Time taken: ${Date.now() - pageStart}ms`);
}

console.log(`\n总耗时: ${Date.now() - startTime}ms`);
console.log(`\nTotal time taken: ${Date.now() - startTime}ms`);

await browser.disconnect();

运行：

Run:

node batch-crawler.js

你会发现即使是批量抓取，Lightpanda 的响应也非常快。

You will find that Lightpanda responds very quickly even during batch scraping.

步骤6：高级功能 - 页面截图

Step 6: Advanced Feature - Page Screenshot

Lightpanda 也支持截图功能：

Lightpanda also supports screenshot functionality:

'use strict'

import puppeteer from 'puppeteer-core';

const browser = await puppeteer.connect({
  browserWSEndpoint: "ws://127.0.0.1:9222",
});

const context = await browser.createBrowserContext();
const page = await context.newPage();

// 设置视口大小
// Set viewport size
await page.setViewport({ width: 1280, height: 720 });

await page.goto('https://demo-browser.lightpanda.io/campfire-commerce/', {waitUntil: "networkidle0"});

// 截图保存
// Take and save a screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });

console.log('截图已保存到 screenshot.png');
console.log('Screenshot saved to screenshot.png');

await browser.disconnect();

常见问题排查

Common Issue Troubleshooting

Q1: 端口 9222 被占用

Q1: Port 9222 is Already in Use

症状： 启动时报错 "Address already in use"

Symptom: Error "Address already in use" on startup.

解决：

Solution:

# 查看谁在用这个端口
# Check which process is using the port
lsof -i :9222

# 或者换个端口
# Or use a different port
./lightpanda serve --port 9223
# 然后脚本里改成 ws://127.0.0.1:9223
# Then change the script to ws://127.0.0.1:9223

Q2: Web API 不支持报错

Q2: Error Due to Unsupported Web API

症状： 运行脚本时报错 "XXX is not defined"

Symptom: Error "XXX is not defined" when running the script.

解决： Lightpanda 目前还在 Beta 阶段，Web API 覆盖不完整。去 GitHub 提 issue，通常团队会很快响应。

Solution: Lightpanda is currently in Beta, and Web API coverage is not complete. File an issue on GitHub; the team typically responds quickly.

Q3: Docker 容器启动失败

Q3: Docker Container Fails to Start

症状： docker run 报错或容器立即退出

Symptom: docker run error or the container exits immediately.

解决：

Solution:

# 查看容器日志
# Check container logs
docker logs lightpanda

# 如果端口冲突，改一下
# If there's a port conflict, change it
docker run -d --name lightpanda -p 9322:9222 lightpanda/browser:nightly

Q4: Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。连接不上

Q4: Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 Cannot Connect

症状： Error: Protocol error (Target.attachToTarget): No target with given id

Symptom: Error: Protocol error (Target.attachToTarget): No target with given id

解决： 确认 Lightpanda CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器已启动，并且版本与 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。兼容。尝试重启：

Solution: Confirm the Lightpanda CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 server is running and its version is compatible with Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。. Try restarting:

# 杀掉旧进程
# Kill the old process
pkill lightpanda
# 重新启动
# Restart
./lightpanda serve --port 9222

Q5: 页面加载超时

Q5: Page Load Timeout

症状： TimeoutError: Navigation timeout

Symptom: TimeoutError: Navigation timeout

解决：

Solution:

# 增加超时时间
# Increase the timeout
await page.goto(url, { timeout: 60000 });
# 或者不用 networkidle0，改用 domcontentloaded
# Or use domcontentloaded instead of networkidle0
await page.goto(url, { waitUntil: "domcontentloaded" });

Q6: 想用 Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。而不是 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。

Q6: Want to Use Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 Instead of Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。

症状： 不知道如何集成

Symptom: Unsure how to integrate.

解决： Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。的连接方式和 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。类似：

Solution: Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。's connection method is similar to Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。's:

import { chromium } from 'playwright';

const browser = await chromium.connectOverCDP('ws://127.0.0.1:9222');
// 后续用法一样
// Subsequent usage is the same

扩展阅读 / 进阶方向

Further Reading / Advanced Topics

1. 从源码编译

1. Compile from Source

如果你想深入研究 Lightpanda 的内部实现，或者给它贡献代码，可以从源码编译：

If you want to delve into Lightpanda's internal implementation or contribute code, you can compile from source:

# 安装 Zig 0.15.2
# Install Zig 0.15.2
curl -L https://ziglang.org/download/0.15.2/zig-linux-x86_64-0.15.2.tar.xz | tar xJ

# 克隆项目
# Clone the project
git clone https://github.com/lightpanda-io/browser.git
cd browser

# 编译
# Compile
zig build run

2. Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。集成

2. Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 Integration

Lightpanda 官方支持 Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。。用法：

Lightpanda officially supports Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。. Usage:

npm install playwright

import { firefox } from 'playwright';

const browser = await firefox.connectOverCDP('ws://127.0.0.1:9222');
// 后续用法和普通 Playwright 一样
// Subsequent usage is the same as regular Playwright

WARNING
Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。支持有局限——因为 Lightpanda 会不断添加新的 Web API，Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。可能会选择不同的执行路径，导致某些脚本不工作。
Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 support has limitations—because Lightpanda continuously adds new Web APIs, Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 might choose different execution paths, causing some scripts to fail.

3. 代理和网络拦截

3. Proxy and Network Interception

Lightpanda 支持代理和网络请求拦截：

Lightpanda supports proxies and network request interception:

# 启动时指定代理
# Specify a proxy at startup
./lightpanda serve --proxy http://proxy:8080

4. 自定义 HTTP 头

4. Custom HTTP Headers

await page.setExtraHTTPHeaders({
  'X-Custom-Header': 'value'
});

5. Web Platform Tests

Lightpanda 团队在持续推进 Web API 兼容性测试。你可以在 wpt.live 上测试特定 API 的支持情况。

The Lightpanda team is continuously working on Web API compatibility testing. You can test the support for specific APIs on wpt.live.

常见问题（FAQ）

Lightpanda浏览器相比Chrome有哪些性能优势？

Lightpanda是一款专为AI数据采集优化的轻量级无头浏览器没有图形用户界面的浏览器，用于自动化网页测试、抓取和渲染，常面临扩展性和稳定性挑战。，速度比Chrome快11倍，内存占用少9倍，适合大规模网页爬虫和自动化任务。

如何在Windows系统上安装和使用Lightpanda？

Windows用户需要安装WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。，在WSL2Windows Subsystem for Linux 2，允许在Windows上运行Linux环境，用于支持Instill Core在Windows上的部署。环境中下载Lightpanda二进制文件，然后在Windows主机上通过Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。连接进行数据采集。

使用Lightpanda进行爬虫时如何遵守robots.txtA text file that instructs web crawlers which pages or files to access or ignore on a website.规则？

启动CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器时添加--obey_robots参数，如：./lightpanda serve --obey_robots，这样Lightpanda会自动遵守目标网站的robots.txtA text file that instructs web crawlers which pages or files to access or ignore on a website.规则。

Lightpanda: An AI Crawler Browser 11x Faster Than Chrome

目标读者

Target Audience

核心依赖与环境

Core Dependencies and Environment

手把手教程

Step-by-Step Tutorial

步骤1：下载并安装 Lightpanda

Step 1: Download and Install Lightpanda

步骤2：使用 Docker 启动（推荐）

Step 2: Launch Using Docker (Recommended)

步骤3：启动 CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 服务器

Step 3: Start the CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。 Server

步骤4：编写 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 脚本

Step 4: Write a Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 Script

步骤5：体验极速抓取

Step 5: Experience High-Speed Scraping

步骤6：高级功能 - 页面截图

Step 6: Advanced Feature - Page Screenshot

常见问题排查

Common Issue Troubleshooting

Q1: 端口 9222 被占用

Q1: Port 9222 is Already in Use

Q2: Web API 不支持报错

Q2: Error Due to Unsupported Web API

Q3: Docker 容器启动失败

Q3: Docker Container Fails to Start

Q4: Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 连接不上

Q4: Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。 Cannot Connect

Q5: 页面加载超时

Q5: Page Load Timeout

扩展阅读 / 进阶方向

Further Reading / Advanced Topics

1. 从源码编译

1. Compile from Source

2. Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 集成

2. Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。 Integration

3. 代理和网络拦截

3. Proxy and Network Interception

4. 自定义 HTTP 头

4. Custom HTTP Headers

5. Web Platform Tests

常见问题（FAQ）

Lightpanda浏览器相比Chrome有哪些性能优势？

如何在Windows系统上安装和使用Lightpanda？

使用Lightpanda进行爬虫时如何遵守robots.txtA text file that instructs web crawlers which pages or files to access or ignore on a website.规则？

步骤3：启动 CDPChrome DevTools Protocol的缩写，是一种基于WebSocket的协议，允许外部工具（如Puppeteer、Playwright）与浏览器实例进行通信和控制。服务器

步骤4：编写 Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。脚本

Q4: Puppeteer一个由Google开发的Node.js库，提供高级API来控制Chrome或Chromium浏览器，常用于网页爬虫、自动化测试和截图生成。连接不上

2. Playwright一个脚本层面的浏览器驱动库，与Browser-Use互补，适合精细控制浏览器流程，在组合使用时提供底层驱动支持。集成