Scraping Suning Category Pages with Selenium

westlife73 posted on 2023-11-10 15:12:01

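Today I'm sharing another Selenium crawler, this one for scraping Suning category pages. I posted one before, but some readers found problems with it. I have tested this one myself and it runs smoothly and reliably, so let's take a look.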

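```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace CrawlerSuning
{
    class Program
    {
        static void Main(string[] args)
        {
            // Route browser traffic through the proxy server at www.duoip.cn, port 8000.
            // The proxy is a Chrome command-line switch, so it is set on ChromeOptions.
            // ChromeDriverService.Port is the port chromedriver itself listens on,
            // not the proxy port, so it is left at its default.
            ChromeOptions options = new ChromeOptions();
            options.AddArgument("--proxy-server=http://www.duoip.cn:8000");

            // Create the default chromedriver service and start the browser
            ChromeDriverService service = ChromeDriverService.CreateDefaultService();
            IWebDriver driver = new ChromeDriver(service, options);

            // Open the page
            driver.Navigate().GoToUrl("https://www.suning.com/");

            // Get the page source
            string sourceCode = driver.PageSource;

            // Close the browser
            driver.Quit();

            // Print the page source
            Console.WriteLine(sourceCode);
        }
    }
}
```

Code explanation:

1. First, we import the necessary namespaces: OpenQA.Selenium, OpenQA.Selenium.Chrome, System, System.Collections.Generic, System.Linq, System.Text, and System.Threading.Tasks.

2. In the Main method, we create a ChromeOptions object and add the argument "--proxy-server=http://www.duoip.cn:8000", which tells the browser to send its traffic through the proxy server at www.duoip.cn on port 8000. The proxy is configured on ChromeOptions rather than on ChromeDriverService, because the service's Port property only sets the port that chromedriver itself listens on. We then call ChromeDriverService.CreateDefaultService() to create a default driver service.

3. Next, we create a new ChromeDriver instance, passing it the service and the options, which starts the browser.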

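4. Then we use the browser's Navigate method to open the page.

5. After that, we read the PageSource property to get the page's source code.

6. Finally, we close the browser and print the page's source code.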
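
The program above stops at printing the raw page source. As a rough sketch of a possible next step, the helper below pulls the distinct link targets out of an HTML string with a regular expression. Note that `LinkExtractor` and `ExtractLinks` are hypothetical names that are not part of the original program, and the pattern is only an assumption about ordinary `<a href="...">` markup; I have not checked it against Suning's actual page structure, so treat it as a starting point rather than a finished category parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
public static class LinkExtractor
{
    // Matches href="..." or href='...' inside an <a ...> tag (assumed markup).
    private static readonly Regex AnchorHref = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the distinct href values found in the HTML, in order of first appearance.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        var seen = new HashSet<string>();

        foreach (Match match in AnchorHref.Matches(html))
        {
            string href = match.Groups[1].Value;

            // HashSet.Add returns false for values already seen, which drops duplicates.
            if (seen.Add(href))
            {
                links.Add(href);
            }
        }

        return links;
    }
}
```

In the program above, this could be called as `LinkExtractor.ExtractLinks(sourceCode)` after `driver.PageSource` has been read. A regular expression is used here only to keep the sketch free of extra dependencies; for real pages, querying the live DOM through Selenium or using an HTML parser is usually more robust.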
