Headless Chrome入门

原文地址:Getting Started with Headless Chrome  By EricBidelman  Engineer @ Google working on web tooling: Headless Chrome, Puppeteer, Lighthouse

Headless Chrome在Chrome59中发布,用于在headless环境中运行Chrome浏览器,也就是在非Chrome环境中运行Chrome。它将Chromium和Blink渲染引擎提供的所有现代Web平台功能引入命令行。

它有什么用处呢?

headless浏览器是自动测试和服务器环境的绝佳工具,您不需要可见的UI shell。例如,针对真实的网页进行测试,创建网页的PDF,或者只是检查浏览器如何呈现URL。

0. 开始

最简单的开始使用headless模式的方法是从命令行打开Chrome。如果你已经安装了Chrome59+的版本,可以使用 --headless 标签:

chrome   --headless \                   # 在headless模式运行Chrome
  --disable-gpu \                # 在Windows上运行时需要--remote-debugging-port=9222   https://www.chromestatus.com   # 打开URL. 默认为about:blank

注意:若在Windows中运行,则需要在命令行添加 --disable-gpu 。

chrome 命令需要指向Chrome的安装路径。(即在Chrome的安装路径下运行)

1. 命令行功能

在某些情况下,您可能不需要以编程方式编写Headless Chrome脚本。下面是一些有用的命令行标志来执行常见任务。

1.1 打印DOM --dump-dom

将 document.body.innerHTML 在stdout打印出来:

chrome --headless --disable-gpu --dump-dom https://www.chromestatus.com/

1.2 创建PDF --print-to-pdf :

chrome --headless --disable-gpu --print-to-pdf https://www.chromestatus.com/

1.3 截屏 --screenshot

chrome --headless --disable-gpu --screenshot https://www.chromestatus.com/

# 标准屏幕大小
chrome --headless --disable-gpu --screenshot --window-size=1280,1696 https://www.chromestatus.com/

# Nexus 5x
chrome --headless --disable-gpu --screenshot --window-size=412,732 https://www.chromestatus.com/

运行 --screenshot将会在当前运行目录下生成一个 screenshot.png 文件。若想给整个页面的截图,那么会比较复杂。来自 David Schnurr 的一篇很棒的博文介绍了这一内容。请查看 使用 headless Chrome 作为自动截屏工具

1.4 REPL模式(read-eval-print loop) --repl

在REPL模式运行Headless,该模式允许通过命令行在浏览器中评估JS表达式:

$ chrome --headless --disable-gpu --repl --crash-dumps-dir=./tmp https://www.chromestatus.com/
[0608/112805.245285:INFO:headless_shell.cc(278)] Type a Javascript expression to evaluate or "quit" to exit.
>>> location.href
{"result":{"type":"string","value":"https://www.chromestatus.com/features"}}
>>> quit
$

注意:使用repl模式时需要添加 --crash-dumps-dir 命令。

2. 在没有浏览器界面情况下调试Chrome

当使用 --remote-debugging-port=9222 运行Chrome时,会启用DevTools协议的实例。该协议用于与Chrome通信并且驱动headless浏览器实例。除此之外,它还是一个类似于 Sublime, VS Code, 和Node的工具,可用于远程调试一个应用。

由于没有浏览器UI来查看页面,因此需要在另一个浏览器中导航到http:// localhost:9222以检查一切是否正常。这将看到一个可查看页面的列表,可以在其中单击并查看Headless正在呈现的内容:

DevTools远程调试界面

在这里,你可以使用熟悉的DecTools功能来查看、调试、修改页面。若以编程方式(programmatically)使用Headless,该页面的功能更强大,可以用于查看所有的DecTools协议的命令,并与浏览器进行通信。

3. 使用编程模式(Node)

3.1 Puppeteer

Puppeteer 由Chrome团队开发的Node库。它提供了控制headless Chrome的高阶API。类似于 Phantom 和 NightmareJS这样的自动测试库,但它只用于最新版本的Chrome。

除此之外,Puppeteer还可用于截屏,创建PDF,页面导航,以及获取有关这些页面的信息。如果需要快速进行浏览器的自动化测试,建议使用该库。它隐藏了DevTools协议的复杂性,并负责启动Chrome的调试实例等冗余任务。

安装:

npm i --save puppeteer

例子-打印用户代理信息:

const puppeteer = require(‘puppeteer‘);

(async() => {
  const browser = await puppeteer.launch();
  console.log(await browser.version());
  await browser.close();
})();

例子-截屏

const puppeteer = require(‘puppeteer‘);

(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(‘https://www.chromestatus.com‘, {waitUntil: ‘networkidle2‘});
await page.pdf({path: ‘page.pdf‘, format: ‘A4‘});

await browser.close();
})();

查看 Puppeteer‘s 文档 学习Puppeteer的更多用法。

3.2 CRI库

相对于Puppeteer‘s API来说,chrome-remote-interface 是一个低阶的库,推荐使用它更接近底层地直接使用DevTools协议。

打开Chrome

chrome-remote-interface不能打开Chrome,因此需要自己打开Chrome。

在CLI部分,我们使用--headless --remote-debugging-port = 9222手动打开Chrome。但是,要实现完全自动化测试,您可能希望从应用程序中生成Chrome。

使用 child——process 的一种方式:

const execFile = require(‘child_process‘).execFile;

function launchHeadlessChrome(url, callback) {
  // Assuming MacOSx.
  const CHROME = ‘/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome‘;
  execFile(CHROME, [‘--headless‘, ‘--disable-gpu‘, ‘--remote-debugging-port=9222‘, url], callback);
}

launchHeadlessChrome(‘https://www.chromestatus.com‘, (err, stdout, stderr) => {
  ...
});

但是如果你想要一个适用于多个平台的可移植解决方案,那么事情会变得棘手。看看Chrome的硬编码路径吧:(

使用ChromeLaucher

Lighthouse 是测试web应用质量绝佳工具。用于启动Chrome的强大的模块就是在Lighthouse中开发的,现在可以单独使用。  chrome-launcher NPM module 可以找到Chrome的安装路径,设置调试实例,打开浏览器,并且当程序运行完成时关掉它。最棒的是,由于Node,它可以跨平台工作!

默认情况下,chrome-launcher会尝试启动Chrome Canary(如果已安装),但可以更改它以手动选择要使用的Chrome。要使用它,首先从npm安装:

npm i --save chrome-launcher

例子-使用 chrome-launcher 启动Headless模式

const chromeLauncher = require(‘chrome-launcher‘);

// 可选: 设置launcher的日志记录级别以查看其输出
// 安装:: npm i --save lighthouse-logger
// const log = require(‘lighthouse-logger‘);
// log.setLevel(‘info‘);

/**
 * 启动Chrome的调试实例
 * @param {boolean=} headless True (default) 启动headless模式的Chrome.
 *     False 启动Chrome的完成版本.
 * @return {Promise<ChromeLauncher>}
 */
function launchChrome(headless=true) {
  return chromeLauncher.launch({
    // port: 9222, // Uncomment to force a specific port of your choice.
    chromeFlags: [
      ‘--window-size=412,732‘,
      ‘--disable-gpu‘,
      headless ? ‘--headless‘ : ‘‘
    ]
  });
}

launchChrome().then(chrome => {
  console.log(`Chrome debuggable on port: ${chrome.port}`);
  ...
  // chrome.kill();
});

运行此脚本并没有太大作用,但在任务管理器中应该可以看到Chrome实例已启动,内容为 about:blank 。但是没有浏览器界面。因为是headless模式。

要控制浏览器,我们需要DevTools协议!

检索有关页面的信息

安装:

npm i --save chrome-remote-interface

例子-打印用户代理

const CDP = require(‘chrome-remote-interface‘);

...

launchChrome().then(async chrome => {
  const version = await CDP.Version({port: chrome.port});
  console.log(version[‘User-Agent‘]);
});

结果类似于: HeadlessChrome/60.0.3082.0

例子-检查网站是否有应用列表

const CDP = require(‘chrome-remote-interface‘);

...

(async function() {

const chrome = await launchChrome();
const protocol = await CDP({port: chrome.port});

// Extract the DevTools protocol domains we need and enable them.
// See API docs: https://chromedevtools.github.io/devtools-protocol/
const {Page} = protocol;
await Page.enable();

Page.navigate({url: ‘https://www.chromestatus.com/‘});

// Wait for window.onload before doing stuff.
Page.loadEventFired(async () => {
  const manifest = await Page.getAppManifest();

  if (manifest.url) {
    console.log(‘Manifest: ‘ + manifest.url);
    console.log(manifest.data);
  } else {
    console.log(‘Site has no app manifest‘);
  }

  protocol.close();
  chrome.kill(); // Kill Chrome.
});

})();

例子-使用DOM API提取页面的<title>

const CDP = require(‘chrome-remote-interface‘);

...

(async function() {

const chrome = await launchChrome();
const protocol = await CDP({port: chrome.port});

// Extract the DevTools protocol domains we need and enable them.
// See API docs: https://chromedevtools.github.io/devtools-protocol/
const {Page, Runtime} = protocol;
await Promise.all([Page.enable(), Runtime.enable()]);

Page.navigate({url: ‘https://www.chromestatus.com/‘});

// Wait for window.onload before doing stuff.
Page.loadEventFired(async () => {
  const js = "document.querySelector(‘title‘).textContent";
  // Evaluate the JS expression in the page.
  const result = await Runtime.evaluate({expression: js});

  console.log(‘Title of page: ‘ + result.result.value);

  protocol.close();
  chrome.kill(); // Kill Chrome.
});

})();

4. 使用Selenium,W??ebDriver和ChromeDriver

现在,Selenium打开了一个完整地Chrome的实例,也就是说,换句话说,它是一种自动化解决方案,但并非完全headless。但是,Selenium可以通过一些配置来运行headless Chrome。我建议使用headless Chrome运行Selenium,若你还是想要如何自己设置的完整说明,我已经在下面的一些例子中展示了如何让你放弃。

使用ChromeDriver

ChromeDriver 2.32使用了Chrome61,并且在headless Chrome运行的更好。

安装:

npm i --save-dev selenium-webdriver chromedriver

例子

const fs = require(‘fs‘);
const webdriver = require(‘selenium-webdriver‘);
const chromedriver = require(‘chromedriver‘);

const chromeCapabilities = webdriver.Capabilities.chrome();
chromeCapabilities.set(‘chromeOptions‘, {args: [‘--headless‘]});

const driver = new webdriver.Builder()
  .forBrowser(‘chrome‘)
  .withCapabilities(chromeCapabilities)
  .build();

// Navigate to google.com, enter a search.
driver.get(‘https://www.google.com/‘);
driver.findElement({name: ‘q‘}).sendKeys(‘webdriver‘);
driver.findElement({name: ‘btnG‘}).click();
driver.wait(webdriver.until.titleIs(‘webdriver - Google Search‘), 1000);

// Take screenshot of results page. Save to disk.
driver.takeScreenshot().then(base64png => {
  fs.writeFileSync(‘screenshot.png‘, new Buffer(base64png, ‘base64‘));
});

driver.quit();

使用WebDriverIO

WebDriverIO 是Selenium WebDriver之上的更高阶的API。

安装:

npm i --save-dev webdriverio chromedriver

例子-chromestatus.com上的CSS filter功能

const webdriverio = require(‘webdriverio‘);
const chromedriver = require(‘chromedriver‘);

const PORT = 9515;

chromedriver.start([
  ‘--url-base=wd/hub‘,
  `--port=${PORT}`,
  ‘--verbose‘
]);

(async () => {

const opts = {
  port: PORT,
  desiredCapabilities: {
    browserName: ‘chrome‘,
    chromeOptions: {args: [‘--headless‘]}
  }
};

const browser = webdriverio.remote(opts).init();

await browser.url(‘https://www.chromestatus.com/features‘);

const title = await browser.getTitle();
console.log(`Title: ${title}`);

await browser.waitForText(‘.num-features‘, 3000);
let numFeatures = await browser.getText(‘.num-features‘);
console.log(`Chrome has ${numFeatures} total features`);

await browser.setValue(‘input[type="search"]‘, ‘CSS‘);
console.log(‘Filtering features...‘);
await browser.pause(1000);

numFeatures = await browser.getText(‘.num-features‘);
console.log(`Chrome has ${numFeatures} CSS features`);

const buffer = await browser.saveScreenshot(‘screenshot.png‘);
console.log(‘Saved screenshot...‘);

chromedriver.stop();
browser.end();

})();

5. 更多资源

以下是一些有用的资源,可帮助您入门:

文档:

工具:

演示:

6. FAQ

6.1 是否需要 --disable-gpu 命令?

仅Windows平台需要。其他平台不需要。--disable-gpu命令是一个临时解决一些错误的方案。在将来的Chrome版本中,不再需要此命令。有关更多信息,请参阅 crbug.com/737678

6.2 是否需要 Xvfb

不需要。Headless Chrome不使用窗口,因此不再需要像Xvfb这样的显示服务器。没有它,也可以愉快地运行自动化测试。

什么是Xvfb?Xvfb是一种用于类Unix系统的内存显示服务器,它使您能够运行图形应用程序(如Chrome)而无需附加物理显示设备。许多人使用Xvfb运行早期版本的Chrome进行“headless”测试。

6.3 如何创建运行Headless Chrome的Docker容器?

看看lighthouse-ci。它有一个示例Dockerfile ,它使用node:8-slim作为基本映像,在App Engine Flex上安装+ 运行Lighthouse 。

6.4 Headless Chrome与PhantomJS有什么关系?

Headless Chrome与PhantomJS等工具类似。两者都可用于headless环境中的自动化测试。两者之间的主要区别在于Phantom使用较旧版本的WebKit作为其渲染引擎,而Headless Chrome使用最新版本的Blink。

目前,Phantom还提供了比DevTools 协议更高级别的API。

6.5 在哪里提交bugs?

对于Headless Chrome的bugs,请在crbug.com上提交。

对于DevTools协议中的错误,请将它们发送到github.com/ChromeDevTools/devtools-protocol

tech & looking ahead to the platform‘s future.

Getting Started with Headless Chrome

By Eric Bidelman

Engineer @ Google working on web tooling: Headless Chrome, Puppeteer, Lighthouse

TL;DR

Headless Chrome is shipping in Chrome 59. It‘s a way to run the Chrome browser in a headless environment. Essentially, running Chrome without chrome! It brings all modern web platform features provided by Chromium and the Blink rendering engine to the command line.

Why is that useful?

A headless browser is a great tool for automated testing and server environments where you don‘t need a visible UI shell. For example, you may want to run some tests against a real web page, create a PDF of it, or just inspect how the browser renders an URL.

Note: Headless mode has been available on Mac and Linux since Chrome 59Windows support came in Chrome 60.

Starting Headless (CLI)

The easiest way to get started with headless mode is to open the Chrome binary from the command line. If you‘ve got Chrome 59+ installed, start Chrome with the --headless flag:

chrome \  --headless \                   # Runs Chrome in headless mode.  --disable-gpu \                # Temporarily needed if running on Windows.  --remote-debugging-port=9222\  https://www.chromestatus.com   # URL to open. Defaults to about:blank.

Note: Right now, you‘ll also want to include the --disable-gpu flag if you‘re running on Windows. See crbug.com/737678.

chrome should point to your installation of Chrome. The exact location will vary from platform to platform. Since I‘m on Mac, I created convenient aliases for each version of Chrome that I have installed.

If you‘re on the stable channel of Chrome and cannot get the Beta, I recommend using chrome-canary:

alias chrome="/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"alias chrome-canary="/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary"alias chromium="/Applications/Chromium.app/Contents/MacOS/Chromium"

Download Chrome Canary here.

Command line features

In some cases, you may not need to programmatically script Headless Chrome. There are some useful command line flags to perform common tasks.

Printing the DOM

The --dump-dom flag prints document.body.innerHTML to stdout:

chrome --headless --disable-gpu --dump-dom https://www.chromestatus.com/

Create a PDF

The --print-to-pdf flag creates a PDF of the page:

chrome --headless --disable-gpu --print-to-pdf https://www.chromestatus.com/

Taking screenshots

To capture a screenshot of a page, use the --screenshot flag:

chrome --headless --disable-gpu --screenshot https://www.chromestatus.com/

# Size of a standard letterhead.chrome --headless --disable-gpu --screenshot --window-size=1280,1696 https://www.chromestatus.com/

# Nexus 5xchrome --headless --disable-gpu --screenshot --window-size=412,732 https://www.chromestatus.com/

Running with --screenshot will produce a file named screenshot.png in the current working directory. If you‘re looking for full page screenshots, things are a tad more involved. There‘s a great blog post from David Schnurr that has you covered. Check out Using headless Chrome as an automated screenshot tool.

REPL mode (read-eval-print loop)

The --repl flag runs Headless in a mode where you can evaluate JS expressions in the browser, right from the command line:

$ chrome --headless --disable-gpu --repl --crash-dumps-dir=./tmp https://www.chromestatus.com/[0608/112805.245285:INFO:headless_shell.cc(278)]Type a Javascript expression to evaluate or"quit" to exit.>>> location.href{"result":{"type":"string","value":"https://www.chromestatus.com/features"}}>>> quit$

Note: the addition of the --crash-dumps-dir flag when using repl mode.

Debugging Chrome without a browser UI?

When you run Chrome with --remote-debugging-port=9222, it starts an instance with the DevTools protocol enabled. The protocol is used to communicate with Chrome and drive the headless browser instance. It‘s also what tools like Sublime, VS Code, and Node use for remote debugging an application. #synergy

Since you don‘t have browser UI to see the page, navigate to http://localhost:9222 in another browser to check that everything is working. You‘ll see a list of inspectable pages where you can click through and see what Headless is rendering:

DevTools remote debugging UI

From here, you can use the familiar DevTools features to inspect, debug, and tweak the page as you normally would. If you‘re using Headless programmatically, this page is also a powerful debugging tool for seeing all the raw DevTools protocol commands going across the wire, communicating with the browser.

Using programmatically (Node)

Puppeteer

Puppeteer is a Node library developed by the Chrome team. It provides a high-level API to control headless (or full) Chrome. It‘s similar to other automated testing libraries like Phantom and NightmareJS, but it only works with the latest versions of Chrome.

Among other things, Puppeteer can be used to easily take screenshots, create PDFs, navigate pages, and fetch information about those pages. I recommend the library if you want to quickly automate browser testing. It hides away the complexities of the DevTools protocol and takes care of redundant tasks like launching a debug instance of Chrome.

Install it:

npm i --save puppeteer

Example - print the user agent

const puppeteer =require(‘puppeteer‘);

(async()=>{  const browser = await puppeteer.launch();  console.log(await browser.version());  await browser.close();})();

Example - taking a screenshot of the page

const puppeteer =require(‘puppeteer‘);

(async()=>{const browser = await puppeteer.launch();const page = await browser.newPage();await page.goto(‘https://www.chromestatus.com‘,{waitUntil:‘networkidle2‘});await page.pdf({path:‘page.pdf‘, format:‘A4‘});

await browser.close();})();

Check out Puppeteer‘s documentation to learn more about the full API.

The CRI library

chrome-remote-interface is a lower-level library than Puppeteer‘s API. I recommend it if you want to be close to the metal and use the DevTools protocol directly.

Launching Chrome

chrome-remote-interface doesn‘t launch Chrome for you, so you‘ll have to take care of that yourself.

In the CLI section, we started Chrome manually using --headless --remote-debugging-port=9222. However, to fully automate tests, you‘ll probably want to spawn Chrome from your application.

One way is to use child_process:

const execFile =require(‘child_process‘).execFile;

function launchHeadlessChrome(url, callback){  // Assuming MacOSx.  const CHROME =‘/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome‘;  execFile(CHROME,[‘--headless‘,‘--disable-gpu‘,‘--remote-debugging-port=9222‘, url], callback);}

launchHeadlessChrome(‘https://www.chromestatus.com‘,(err, stdout, stderr)=>{  ...});

But things get tricky if you want a portable solution that works across multiple platforms. Just look at that hard-coded path to Chrome :(

Using ChromeLauncher

Lighthouse is a marvelous tool for testing the quality of your web apps. A robust module for launching Chrome was developed within Lighthouse and is now extracted for standalone use. The chrome-launcher NPM modulewill find where Chrome is installed, set up a debug instance, launch the browser, and kill it when your program is done. Best part is that it works cross-platform thanks to Node!

By default, chrome-launcher will try to launch Chrome Canary (if it‘s installed), but you can change that to manually select which Chrome to use. To use it, first install from npm:

npm i --save chrome-launcher

Example - using chrome-launcher to launch Headless

const chromeLauncher =require(‘chrome-launcher‘);

// Optional: set logging level of launcher to see its output.// Install it using: npm i --save lighthouse-logger// const log = require(‘lighthouse-logger‘);// log.setLevel(‘info‘);

/** * Launches a debugging instance of Chrome. * @param {boolean=} headless True (default) launches Chrome in headless mode. *     False launches a full version of Chrome. * @return {Promise<ChromeLauncher>} */function launchChrome(headless=true){  return chromeLauncher.launch({    // port: 9222, // Uncomment to force a specific port of your choice.    chromeFlags:[      ‘--window-size=412,732‘,      ‘--disable-gpu‘,      headless ?‘--headless‘:‘‘    ]  });}

launchChrome().then(chrome =>{  console.log(`Chrome debuggable on port: ${chrome.port}`);  ...  // chrome.kill();});

Running this script doesn‘t do much, but you should see an instance of Chrome fire up in the task manager that loaded about:blank. Remember, there won‘t be any browser UI. We‘re headless.

To control the browser, we need the DevTools protocol!

Retrieving information about the page

Warning: The DevTools protocol can do a ton of interesting stuff, but it can be a bit daunting at first. I recommend spending a bit of time browsing the DevTools Protocol Viewer, first. Then, move on to thechrome-remote-interface API docs to see how it wraps the raw protocol.

Let‘s install the library:

npm i --save chrome-remote-interface
Examples

Example - print the user agent

const CDP =require(‘chrome-remote-interface‘);

...

launchChrome().then(async chrome =>{  const version = await CDP.Version({port: chrome.port});  console.log(version[‘User-Agent‘]);});

Results in something like: HeadlessChrome/60.0.3082.0

Example - check if the site has a web app manifest

const CDP =require(‘chrome-remote-interface‘);

...

(async function(){

const chrome = await launchChrome();const protocol = await CDP({port: chrome.port});

// Extract the DevTools protocol domains we need and enable them.// See API docs: https://chromedevtools.github.io/devtools-protocol/const{Page}= protocol;await Page.enable();

Page.navigate({url:‘https://www.chromestatus.com/‘});

// Wait for window.onload before doing stuff.Page.loadEventFired(async ()=>{  const manifest = await Page.getAppManifest();

  if(manifest.url){    console.log(‘Manifest: ‘+ manifest.url);    console.log(manifest.data);  }else{    console.log(‘Site has no app manifest‘);  }

  protocol.close();  chrome.kill();// Kill Chrome.});

})();

Example - extract the <title> of the page using DOM APIs.

const CDP =require(‘chrome-remote-interface‘);

...

(async function(){

const chrome = await launchChrome();const protocol = await CDP({port: chrome.port});

// Extract the DevTools protocol domains we need and enable them.// See API docs: https://chromedevtools.github.io/devtools-protocol/const{Page,Runtime}= protocol;await Promise.all([Page.enable(),Runtime.enable()]);

Page.navigate({url:‘https://www.chromestatus.com/‘});

// Wait for window.onload before doing stuff.Page.loadEventFired(async ()=>{  const js ="document.querySelector(‘title‘).textContent";  // Evaluate the JS expression in the page.  const result = await Runtime.evaluate({expression: js});

  console.log(‘Title of page: ‘+ result.result.value);

  protocol.close();  chrome.kill();// Kill Chrome.});

})();

Using Selenium, WebDriver, and ChromeDriver

Right now, Selenium opens a full instance of Chrome. In other words, it‘s an automated solution but not completely headless. However, Selenium can be configured to run headless Chrome with a little work. I recommend Running Selenium with Headless Chrome if you want the full instructions on how to set things up yourself, but I‘ve dropped in some examples below to get you started.

Using ChromeDriver

ChromeDriver 2.32 uses Chrome 61 and works well with headless Chrome.

Install:

npm i --save-dev selenium-webdriver chromedriver

Example:

const fs =require(‘fs‘);const webdriver =require(‘selenium-webdriver‘);const chromedriver =require(‘chromedriver‘);

const chromeCapabilities = webdriver.Capabilities.chrome();chromeCapabilities.set(‘chromeOptions‘,{args:[‘--headless‘]});

const driver =new webdriver.Builder()  .forBrowser(‘chrome‘)  .withCapabilities(chromeCapabilities)  .build();

// Navigate to google.com, enter a search.driver.get(‘https://www.google.com/‘);driver.findElement({name:‘q‘}).sendKeys(‘webdriver‘);driver.findElement({name:‘btnG‘}).click();driver.wait(webdriver.until.titleIs(‘webdriver - Google Search‘),1000);

// Take screenshot of results page. Save to disk.driver.takeScreenshot().then(base64png =>{  fs.writeFileSync(‘screenshot.png‘,newBuffer(base64png,‘base64‘));});

driver.quit();

Using WebDriverIO

WebDriverIO is a higher level API on top of Selenium WebDriver.

Install:

npm i --save-dev webdriverio chromedriver

Example: filter CSS features on chromestatus.com

const webdriverio =require(‘webdriverio‘);const chromedriver =require(‘chromedriver‘);

const PORT =9515;

chromedriver.start([  ‘--url-base=wd/hub‘,  `--port=${PORT}`,  ‘--verbose‘]);

(async ()=>{

const opts ={  port: PORT,  desiredCapabilities:{    browserName:‘chrome‘,    chromeOptions:{args:[‘--headless‘]}  }};

const browser = webdriverio.remote(opts).init();

await browser.url(‘https://www.chromestatus.com/features‘);

const title = await browser.getTitle();console.log(`Title: ${title}`);

await browser.waitForText(‘.num-features‘,3000);let numFeatures = await browser.getText(‘.num-features‘);console.log(`Chrome has ${numFeatures} total features`);

await browser.setValue(‘input[type="search"]‘,‘CSS‘);console.log(‘Filtering features...‘);await browser.pause(1000);

numFeatures = await browser.getText(‘.num-features‘);console.log(`Chrome has ${numFeatures} CSS features`);

const buffer = await browser.saveScreenshot(‘screenshot.png‘);console.log(‘Saved screenshot...‘);

chromedriver.stop();browser.end();

})();

Further resources

Here are some useful resources to get you started:

Docs

Tools

Demos

  • "The Headless Web" - Paul Kinlan‘s great blog post on using Headless with api.ai.

FAQ

原文地址:https://www.cnblogs.com/chaoxiZ/p/9664150.html

时间: 2024-07-31 14:33:12

Headless Chrome入门的相关文章

【转】利用 selenium 的 webdrive 驱动 headless chrome

1.参考 使用 headless chrome进行测试 2.概念 Headless模式解决了什么问题: 自动化工具例如 selenium 利用有头浏览器进行测试,面临效率和稳定性的影响,所以出现了 Headless Browser, 3年前,无头浏览器 PhantomJS 已经如火如荼出现了,紧跟着 NightmareJS 也成为一名巨星.无头浏览器带来巨大便利性:页面爬虫.自动化测试.WebAutomation... 用过PhantomJS的都知道,它的环境是运行在一个封闭的沙盒里面,在环境内

Headless Chrome long image capture issue

原文引用https://www.dazhuanlan.com/2019/08/26/5d6300778d22d/ The problem Recently I had received complaint about my capture service not export complete image. It seems that this problem only occurs when the page's is extremely long. The broken image is l

Serverless 实战——使用 Rendertron 搭建 Headless Chrome 渲染

为什么需要 Rendertron? 传统的 Web 页面,通常是服务端渲染的,而随着 SPA(Single-Page Application) 尤其是 React.Vue.Angular 为代表的前端框架的流行,越来越多的 Web App 使用的是客户端渲染. 使用客户端渲染有着诸多优势,比如节省后端资源.局部刷新.前后端分离等等,但也带来了一些挑战,比如本文要解决的 SEO 问题. 对于服务端渲染的页面,服务端可以直接将内容通过 HTML 的形式返回,搜索引擎爬虫可以轻易的获取页面内容,而对于

selenium(六)Headless Chrome/Firefox--PhantomJS停止支持后,使用无界面模式。

简介: 以前都用PhantomJS来进行无界面模式的自动化测试,或者爬取某些动态页面. 但是最近selenium更新以后,'Selenium support for PhantomJS has been deprecated, please use headless '提示不支持PhantomJs,请使用headless模式. 好吧,我们还是继续使用firefox chrome的headless模式吧. 一:版本确认 1.windows下 selenium  3.9.0 我使用这个版本的sele

基于headless chrome的游戏资源下载实现 (初版)

上周介绍了实现前端资源下载的思路,今天给一个简单的初版代码. 首先 基于express启动一个服务端容器,用于处理前端路由和后段逻辑处理,目录结构如下: 其中gameDir是游戏存放的地址,node_modules是存放用到插件的module,server内部目录结构如下 app.js是程序的启动代码 common 存放公用的方法 public 存放静态资源 routes 存放express路由信息 config存放一些配置信息 downLoadGame存放游戏下载的逻辑代码 view是ejs试

Puppeteer之爬虫入门

译者按: 本文通过简单的例子介绍如何使用Puppeteer来爬取网页数据,特别是用谷歌开发者工具获取元素选择器值得学习. 原文: A Guide to Automating & Scraping the Web with JavaScript (Chrome + Puppeteer + Node JS) 译者: Fundebug 为了保证可读性,本文采用意译而非直译.另外,本文版权归原作者所有,翻译仅用于学习. 我们将会学到什么? 在这篇文章,你讲会学到如何使用JavaScript自动化抓取网页

Selenium+Headless Firefox

背景 今天本地调试基于Selenium+PhantomJS的动态爬虫程序顺利结束后,着手部署到服务器上,刚买的热乎的京东云,噼里啪啦一顿安装环境,最后跑的时候报了这么个错误: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead 运用我考了五遍才飘过的六级英语定睛一看,这个意思是说,新版本的Selenium

headless&amp;unittest

为什么要使用 headless 测试? headless broswer 可以给测试带来显著好处: 对于 UI 自动化测试,少了真实浏览器加载 css,js 以及渲染页面的工作.无头测试要比真实浏览器快的多. 可以在无界面的服务器或 CI 上运行测试,减少了外界的干扰,使自动化测试更稳定. 在一台机器上可以模拟运行多个无头浏览器,方便进行并发测试 什么是 Headless Chrome? Headless Chrome 是 Chrome 浏览器的无界面形态,可以在不打开浏览器的前提下,使用所有

前端开发入门到实战:把HTML转成PDF的4个方案及实现

在本文中,我将展示如何使用 Node.js.Puppeteer.headless Chrome 和 Docker 从样式复杂的 React 页面生成 PDF 文档. 背景:几个月前,一个客户要求我们开发一个功能,用户可以得到 PDF 格式的 React 页面内容.该页面基本上是患者病例的报告和数据可视化结果,其中包含许多 SVG.另外还有一些特殊的请求来操纵布局,并对 HTML 元素进行一些重新排列.因此与原始的 React 页面相比,PDF 中应该有不同的样式和额外的内容. 由于这个任务比用简