Headless Chrome long image capture issue

Original source: https://www.dazhuanlan.com/2019/08/26/5d6300778d22d/

The problem

Recently I received a complaint that my capture service was not exporting complete images. The problem seems to occur only when the page is extremely long.

The broken image is like this:

Chromium’s limit

So I Googled the problem and found a lot of issues on GitHub describing the same thing. Reading through this issue, I learned that the problem is caused by a Chromium limit.

Since a typical server has no GPU, Headless Chrome has to use the software renderer, that is, it uses the CPU to compute the pixels.

Chromium’s compositor has a maximum texture size when using the software GL backend; this limit is 16384px. So a large image will not be rendered completely.
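As a rough illustration of that limit, here is a minimal sketch of a check that flags pages likely to be truncated. The helper name `exceedsTextureLimit` is hypothetical, and the idea that the limit applies to device pixels (CSS height multiplied by `deviceScaleFactor`) is my assumption, not something the issue states.

```javascript
// Limit quoted above for the software GL backend.
const MAX_TEXTURE_SIZE = 16384;

// Hypothetical helper: true when a single full-page capture of this height
// would exceed the limit. Assumes the limit is measured in device pixels.
function exceedsTextureLimit(cssHeight, deviceScaleFactor = 1) {
  return Math.ceil(cssHeight * deviceScaleFactor) > MAX_TEXTURE_SIZE;
}
```

A page that fails this check is a candidate for the sliced capture described in the next section.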

How to solve it

The solution is simple: cut the page into pieces, capture these fragments in order, and composite the pieces into a whole image.

The code below uses Puppeteer’s API; it’s fine to replace it with another library that speaks CDP (the Chrome DevTools Protocol).
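The slicing step on its own can be sketched as a small pure function. The name `planSlices` is hypothetical; it only computes where each fragment starts and how tall it is, with the last fragment taking whatever height remains.

```javascript
// Hypothetical helper: split a page of totalHeight into vertical slices,
// each at most maxSliceHeight tall, listed from top to bottom.
function planSlices(totalHeight, maxSliceHeight) {
  if (!(maxSliceHeight > 0)) {
    throw new RangeError('maxSliceHeight must be positive');
  }
  const slices = [];
  for (let y = 0; y < totalHeight; y += maxSliceHeight) {
    slices.push({ y, height: Math.min(totalHeight - y, maxSliceHeight) });
  }
  return slices;
}
```

For example, `planSlices(15000, 7000)` yields three slices starting at 0, 7000 and 14000, with heights 7000, 7000 and 1000.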

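const sharp = require('sharp');

// This snippet runs inside an async function; `page` and `fileData`
// come from the surrounding capture service.
await page.setViewport({ width: 1440, height: 1024 });
// page._client is a private Puppeteer property and may change between versions.
const { contentSize } = await page._client.send('Page.getLayoutMetrics');
// MAGIC NUMBER, DO NOT MODIFY THIS OR YOU WILL BE FIRED
const maxScreenshotHeight = 7000;
if (contentSize.height >= maxScreenshotHeight) {
  let image;
  let lastBuffer;

  for (let ypos = 0; ypos < contentSize.height; ypos += maxScreenshotHeight) {
    // The last slice may be shorter than maxScreenshotHeight.
    const height = Math.min(contentSize.height - ypos, maxScreenshotHeight);
    // Capture one slice of the page, starting at ypos.
    const buffer = await page.screenshot({
      clip: {
        x: 0,
        y: ypos,
        width: contentSize.width,
        height
      }
    });
    if (ypos === 0) {
      // The first slice becomes the initial canvas.
      image = sharp(buffer);
      lastBuffer = await image.toBuffer();
    } else {
      // Grow the canvas downwards by the slice height, then paste the slice at ypos.
      image = sharp(lastBuffer);
      image = image.extend({ top: 0, bottom: height, left: 0, right: 0 });
      // overlayWith is the older sharp API; newer releases use composite() instead.
      image = image.overlayWith(buffer, { top: ypos, left: 0 });
      lastBuffer = await image.toBuffer();
    }
  }
  fileData = lastBuffer;
}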

I use sharp for image processing, because it’s recommended in the GitHub issue.
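The stitching step can also be done in one pass instead of re-encoding the growing image on every iteration. The following is only a sketch, not the method from the GitHub issue: it assumes a sharp release that provides `composite()` (0.22 or later), a `deviceScaleFactor` of 1 so that screenshot pixels match CSS pixels, and integer `width` and `totalHeight`. The names `buildCompositeInputs` and `stitch` are hypothetical.

```javascript
// Hypothetical helper: place slice i at vertical offset i * sliceHeight.
function buildCompositeInputs(buffers, sliceHeight) {
  return buffers.map((input, i) => ({ input, top: i * sliceHeight, left: 0 }));
}

// Hypothetical helper: paste all slices onto one blank white canvas.
async function stitch(buffers, width, totalHeight, sliceHeight) {
  const sharp = require('sharp'); // loaded lazily; assumes sharp is installed
  return sharp({
    create: {
      width,
      height: totalHeight,
      channels: 4,
      background: { r: 255, g: 255, b: 255, alpha: 1 }
    }
  })
    .composite(buildCompositeInputs(buffers, sliceHeight))
    .png()
    .toBuffer();
}
```

With the loop above, the slice buffers would be collected into an array and passed as `stitch(buffers, contentSize.width, contentSize.height, maxScreenshotHeight)`.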

Future

This approach may no longer be necessary, according to this Chromium issue.

Original article: https://www.cnblogs.com/petewell/p/11410472.html

Date: 2024-07-31 09:31:12
