Selenium Proxy IP Configuration in Practice: From Basic Setup to Dynamic Page Scraping

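Abstract: Pairing Selenium with proxy IPs is a practical way to scrape dynamically rendered pages. This article covers proxy configuration for Chrome, Edge, and Firefox, handling authenticated proxies, switching IPs dynamically, and common pitfalls.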

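Many sites render their core data with JavaScript, so a plain requests call does not return the complete page. Selenium loads the page in a real browser, and combined with proxy IPs it is a standard approach to scraping dynamic pages.

Configuring proxies in Selenium has its share of pitfalls, though. How do you pass credentials to an authenticated proxy? How do you switch proxies at runtime? What has changed in recent Chrome versions? This article works through each in turn.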

Basic configuration: Chrome + proxy IP

The simplest case is a proxy without authentication:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--proxy-server=http://proxy.wukongdaili.com:8888")
# Disable image loading to speed up page loads (optional; useful for pure scraping)
chrome_options.add_argument("--blink-settings=imagesEnabled=false")

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://httpbin.org/ip")
print(driver.page_source)
driver.quit()
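To confirm that the proxy actually took effect, it helps to pull the reported IP out of driver.page_source rather than reading the raw HTML. The sketch below assumes the browser wraps the httpbin JSON in an HTML page (as Chrome does for raw JSON responses) and that the response is a flat object with an "origin" field; extract_origin_ip is a hypothetical helper name, not part of Selenium.

```python
import json
import re


def extract_origin_ip(page_source):
    """Return the "origin" field of an httpbin.org/ip response, or None."""
    # The browser wraps the JSON body in HTML, so grab the first {...} span.
    match = re.search(r"\{.*?\}", page_source, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0)).get("origin")
    except json.JSONDecodeError:
        return None
```

Calling extract_origin_ip(driver.page_source) after driver.get("https://httpbin.org/ip") should return the proxy's exit IP rather than your own address.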

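Authenticated proxies: handling the username and password

Chrome's --proxy-server switch, which Selenium simply passes through to the browser, does not accept a username and password. There are two ways around this:

Option 1: credentials in the URL (supported by only some browsers)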

chrome_options.add_argument("--proxy-server=http://user:pass@proxy.wukongdaili.com:8888")

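Note: Chrome typically ignores credentials embedded in the --proxy-server value, so the proxy's authentication challenge goes unanswered. Option 2 is recommended.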
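Because the credentials cannot travel inside --proxy-server, a proxy URL that contains them has to be split into its parts first. The following is a minimal sketch using only the standard library; split_proxy_url and the keys of the returned dictionary are illustrative names, and it assumes the URL always carries an explicit scheme and port.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url):
    """Split a proxy URL into host, port, credentials, and a credential-free server URL."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs a scheme, host, and port: {proxy_url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,
        # Credentials may be percent-encoded inside a URL, so decode them.
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        # Safe to pass to --proxy-server: no credentials included.
        "server": f"{parts.scheme}://{parts.hostname}:{parts.port}",
    }
```

The "server" value can go straight into --proxy-server, while the host, port, username, and password can be handed to an authentication extension such as the one built by create_proxy_extension.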

Option 2: a Chrome extension (recommended)

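import os
import zipfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_proxy_extension(proxy_host, proxy_port, username, password):
    """Build a Chrome extension that sets the proxy and answers its auth challenge."""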
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 3,
        "name": "Proxy Auth",
        "permissions": ["proxy", "tabs", "webRequest", "webRequestAuthProvider", "storage"],
        "host_permissions": ["<all_urls>"],
        "background": {
            "service_worker": "background.js"
        }
    }
    """

    background_js = f"""
    const config = {{
        mode: "fixed_servers",
        rules: {{
            singleProxy: {{
                scheme: "http",
                host: "{proxy_host}",
                port: {proxy_port}
            }}
        }}
    }};
    chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});

    chrome.webRequest.onAuthRequired.addListener(
        function(details) {{
            return {{ authCredentials: {{ username: "{username}", password: "{password}" }} }};
        }},
        {{ urls: ["<all_urls>"] }},
        ["blocking"]
    );
    """

    plugin_path = "proxy_auth_plugin.zip"
    with zipfile.ZipFile(plugin_path, "w") as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)

    return plugin_path

# Usage
plugin_file = create_proxy_extension("proxy.wukongdaili.com", 8888, "user", "pass")
chrome_options = Options()
chrome_options.add_extension(plugin_file)

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://httpbin.org/ip")
print(driver.page_source)
driver.quit()

# Remove the temporary file
os.remove(plugin_file)
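One fragile spot in the extension above is that the username and password are pasted into the JavaScript source with an f-string, so a password containing a quote or backslash would produce invalid JavaScript. A possible hardening, sketched here under the assumption that the same background.js structure is kept, is to encode the values with json.dumps, whose output is also a valid JavaScript literal. render_background_js is a hypothetical helper, not an API of Selenium or Chrome.

```python
import json


def render_background_js(proxy_host, proxy_port, username, password):
    """Render background.js with all values encoded as JavaScript literals."""
    proxy_config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {
                "scheme": "http",
                "host": proxy_host,
                "port": int(proxy_port),
            }
        },
    }
    credentials = {"username": username, "password": password}
    # json.dumps escapes quotes and backslashes, so special characters in the
    # password cannot break out of the string literal.
    return (
        f"const config = {json.dumps(proxy_config)};\n"
        'chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});\n'
        "chrome.webRequest.onAuthRequired.addListener(\n"
        f"    function(details) {{ return {{ authCredentials: {json.dumps(credentials)} }}; }},\n"
        '    { urls: ["<all_urls>"] },\n'
        '    ["blocking"]\n'
        ");\n"
    )
```

The returned string can replace the background_js f-string inside create_proxy_extension and be written with zp.writestr("background.js", ...) exactly as before.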

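Switching proxies dynamically: a different IP for each request

Once a WebDriver instance is running, its proxy settings cannot be changed directly. To switch proxies, create a new WebDriver instance: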

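import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Chrome ignores credentials in --proxy-server, so these entries assume
# proxies that authorize your machine by IP allowlist. For username/password
# proxies, load the authentication extension shown earlier instead.
PROXY_LIST = [
    "http://proxy1.wukongdaili.com:8888",
    "http://proxy2.wukongdaili.com:8888",
    "http://proxy3.wukongdaili.com:8888",
]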

def create_driver(proxy_url):
    options = Options()
    options.add_argument(f"--proxy-server={proxy_url}")
    return webdriver.Chrome(options=options)

urls_to_scrape = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

for url, proxy in zip(urls_to_scrape, PROXY_LIST):
    driver = create_driver(proxy)
    try:
        driver.get(url)
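        time.sleep(2)  # crude pause for dynamic content; WebDriverWait is more reliable
        print(f"Scraped: {url} (proxy: {proxy})")
        # Extract data here...
    finally:
        driver.quit()  # close the browser so the next URL starts with a new proxy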
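Note that zip stops at the shorter of its inputs, so with more URLs than proxies the loop above silently skips the remaining URLs. A small sketch of one way around this is to reuse the proxies in round-robin order; assign_proxies is an illustrative helper name.

```python
from itertools import cycle


def assign_proxies(urls, proxies):
    """Pair every URL with a proxy, reusing proxies in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # cycle() repeats the proxy list indefinitely; zip() stops when urls run out.
    return list(zip(urls, cycle(proxies)))
```

Iterating with for url, proxy in assign_proxies(urls_to_scrape, PROXY_LIST) then covers every URL regardless of how many proxies are configured.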

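Performance note: creating and destroying a WebDriver for every request is expensive. Separate windows or contexts inside one browser give you session isolation, but they do not change the exit IP on their own:

# Selenium 4.x can drive multiple windows/contexts, but the proxy still has to be switched through an extension
# A simpler option is a tunnel proxy: one entry point that rotates the exit IP for you

Tunnel proxy: the lowest-maintenance option

A tunnel proxy needs only one entry address. The service switches the exit IP automatically on each request, so your code does not have to manage proxy rotation: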

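import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Credentials are omitted because Chrome ignores them in --proxy-server;
# this assumes the tunnel authorizes your machine by IP allowlist.
chrome_options.add_argument("--proxy-server=http://tunnel.wukongdaili.com:8888")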

driver = webdriver.Chrome(options=chrome_options)

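# Visit several pages in a row; the tunnel is expected to rotate the exit IP
for i in range(5):
    driver.get(f"https://example.com/page/{i}")
    time.sleep(2)
    # The target site should see a different IP, provided the tunnel rotates per request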

driver.quit()
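Whether the exit IP really changes on every page load depends on the tunnel service and on whether the browser reuses an open connection, so it is worth measuring rather than assuming. The sketch below summarizes a list of observed exit IPs (collected, for example, by visiting https://httpbin.org/ip several times); rotation_report is a hypothetical helper and the sample addresses are documentation-range placeholders.

```python
from collections import Counter


def rotation_report(observed_ips):
    """Summarize how many distinct exit IPs appeared across a series of requests."""
    counts = Counter(observed_ips)
    return {
        "requests": len(observed_ips),
        "distinct_ips": len(counts),
        # The IP seen most often, with its count; None if nothing was observed.
        "most_common": counts.most_common(1)[0] if counts else None,
    }
```

If distinct_ips stays at 1 over many requests, the tunnel is probably rotating on a timer or per connection rather than per request.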

Edge and Firefox configuration

Edge

Edge is built on Chromium, so it accepts the same --proxy-server argument as Chrome:

from selenium.webdriver.edge.options import Options
from selenium import webdriver

edge_options = Options()
# As with Chrome, credentials in the URL are ignored, so none are included here
edge_options.add_argument("--proxy-server=http://proxy.wukongdaili.com:8888")

driver = webdriver.Edge(options=edge_options)

Firefox

Firefox is configured through its network.proxy.* preferences, set on the Options object:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

firefox_options = Options()
firefox_options.set_preference("network.proxy.type", 1)
firefox_options.set_preference("network.proxy.http", "proxy.wukongdaili.com")
firefox_options.set_preference("network.proxy.http_port", 8888)
firefox_options.set_preference("network.proxy.ssl", "proxy.wukongdaili.com")
firefox_options.set_preference("network.proxy.ssl_port", 8888)

driver = webdriver.Firefox(options=firefox_options)
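When the proxy address comes from configuration rather than being hard-coded, the five preferences can be generated in one place. This is a minimal sketch; firefox_proxy_prefs is an illustrative helper, and the network.proxy.no_proxies_on entry (a comma-separated bypass list) is an optional addition that the original configuration does not set.

```python
def firefox_proxy_prefs(host, port, no_proxy=("localhost", "127.0.0.1")):
    """Build the Firefox preferences for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": int(port),
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": int(port),
        # Hosts that should bypass the proxy, comma-separated.
        "network.proxy.no_proxies_on": ",".join(no_proxy),
    }
```

Each entry can then be applied with firefox_options.set_preference(name, value) in a loop over the dictionary's items.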

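Common pitfalls

Pitfall 1: the proxy connects but pages fail to load

Cause: the proxy server does not support the HTTP CONNECT method needed to tunnel HTTPS traffic, or the target site applies additional anti-bot checks.

Fix: confirm that the proxy service supports HTTPS proxying, and check whether the target site expects a particular User-Agent.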
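One way to tell these two causes apart is to test CONNECT support directly, without a browser. The sketch below opens a plain socket to the proxy, sends a CONNECT request, and returns the status code from the proxy's reply. probe_connect is a hypothetical helper, and the default target host and port are only placeholders.

```python
import socket


def probe_connect(proxy_host, proxy_port, target_host="httpbin.org",
                  target_port=443, timeout=10):
    """Send a CONNECT request to the proxy and return its HTTP status code."""
    request = (
        f"CONNECT {target_host}:{target_port} HTTP/1.1\r\n"
        f"Host: {target_host}:{target_port}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        # Read until the status line is complete or the proxy closes the socket.
        buffer = b""
        while b"\r\n" not in buffer:
            chunk = sock.recv(1024)
            if not chunk:
                break
            buffer += chunk
    status_line = buffer.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"unexpected proxy response: {status_line!r}")
    return int(parts[1])
```

A result of 200 means the tunnel was established, so the problem more likely lies with the target site. A 407 means the proxy wants credentials, and other 4xx or 5xx codes suggest the proxy refuses CONNECT for that destination.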

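Pitfall 2: Selenium is detected as an automated browser

Many sites detect Selenium through the navigator.webdriver property. The first argument below makes navigator.webdriver report false in Chromium-based browsers; the two experimental options remove the automation infobar and the automation helper extension:

chrome_options.add_argument("--disable-blink-features=AutomationControlled")
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
chrome_options.add_experimental_option("useAutomationExtension", False)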
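After applying anti-detection options, it is worth confirming what a page actually sees instead of trusting the flags. The sketch below reads navigator.webdriver through the driver's execute_script method; automation_flag is an illustrative helper name, and it only covers this one signal, not other fingerprinting checks a site may run.

```python
def automation_flag(driver):
    """Return True if pages in this browser see navigator.webdriver as true."""
    # execute_script runs JavaScript in the current page and returns the result.
    return driver.execute_script("return navigator.webdriver") is True
```

Call automation_flag(driver) after loading any page. A result of True means the property is still exposed.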

Pitfall 3: scraping before dynamic content has loaded

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver.get("https://example.com")
# Wait up to 10 seconds for a specific element to appear
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".target-content"))
)
# The element is now present, so its data can be read
print(element.text)
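The reason an explicit wait beats a fixed sleep is that it polls: it returns as soon as the condition holds and fails loudly once the timeout passes. The following is a simplified stand-in that illustrates the idea using only the standard library. It is not Selenium's implementation, and wait_until is a hypothetical name.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With a fast page this returns after the first poll, whereas time.sleep(2) always costs the full two seconds and still gives no guarantee that the content has arrived.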

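Pitfall 4: the proxy does not take effect in headless mode

The order in which --proxy-server and --headless are added does not matter, because Chrome reads all of its switches at startup. The more common cause is the legacy headless mode, which does not load extensions, so a proxy authentication extension is silently skipped. Use the new headless mode:

chrome_options.add_argument("--headless=new")  # new headless mode, which supports extensions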
chrome_options.add_argument("--proxy-server=http://proxy.wukongdaili.com:8888")

In practice: scraping Ctrip hotel data

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

def scrape_hotel(city, proxy_url):
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--proxy-server=" + proxy_url)
    options.add_argument("--window-size=1920,1080")

    driver = webdriver.Chrome(options=options)
    try:
        url = f"https://hotels.ctrip.com/hotel/{city}"
        driver.get(url)

        # Wait for the hotel list to load (the selectors here are illustrative; check them against the live page)
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".hotel-item"))
        )

        hotels = driver.find_elements(By.CSS_SELECTOR, ".hotel-item")
        for hotel in hotels[:5]:
            name = hotel.find_element(By.CSS_SELECTOR, ".hotel-name").text
            price = hotel.find_element(By.CSS_SELECTOR, ".hotel-price").text
            print(f"{name}: {price}")

    finally:
        driver.quit()

# Scrape through the proxy (no credentials in the URL, since Chrome ignores them in --proxy-server)
scrape_hotel("beijing", "http://tunnel.wukongdaili.com:8888")
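A single proxy can fail for reasons unrelated to the target page, so a real run usually wants a fallback. The sketch below tries a list of proxies in order and stops at the first one that works. scrape_with_fallback is a hypothetical wrapper, and it assumes the scraping function takes (city, proxy_url) and raises an exception on failure, as scrape_hotel does when its wait times out.

```python
def scrape_with_fallback(scrape, city, proxies):
    """Try each proxy in turn until scrape(city, proxy) completes without raising."""
    errors = []
    for proxy in proxies:
        try:
            return scrape(city, proxy)
        except Exception as exc:  # for example, a Selenium TimeoutException
            errors.append((proxy, exc))
    raise RuntimeError(f"all {len(proxies)} proxies failed for {city!r}: {errors}")
```

For example, scrape_with_fallback(scrape_hotel, "beijing", PROXY_LIST) would move on to the next proxy whenever the hotel list fails to appear.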

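Summary

Selenium with proxy IPs is a common approach to scraping dynamic pages. The key points:

No-auth proxy: pass --proxy-server directly
Authenticated proxy: handle credentials with a Chrome extension
Dynamic switching: recreate the WebDriver or use a tunnel proxy
Anti-detection: hide navigator.webdriver and set a realistic User-Agent
Waiting for dynamic content: use WebDriverWait rather than time.sleep

For large-scale scraping, a tunnel proxy is the lower-maintenance choice: a single entry point rotates IPs automatically, which keeps the code short and stable.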

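The 悟空代理 (Wukong Proxy) tunnel proxy works with mainstream automation tools such as Selenium, Playwright, and Puppeteer. The service also offers static residential proxies for fixed-IP use cases, provides configuration examples in multiple languages, and has a free trial for new users.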
