Puppeteer Full-Page Screenshot Not Working? Possible Fixes and Alternatives

🤔

TL;DR: Puppeteer is an amazing library, but it has some caveats. One of them is that full-page screenshots don't work reliably out of the box. This post explores some of the problems behind this gap in the library and some potential solutions. Our recommendation is to use a managed screenshot service, but keep reading if you want to learn more.

Puppeteer is currently the best tool in the ecosystem for running a headless browser. Not only does it provide an expressive API to control Chrome and perform many kinds of browser operations, but it's perhaps the only solution that can operate a browser (in this case Chrome/Chromium) with a high degree of accuracy and consistency. Both qualities are especially important when building automated solutions.

However, Puppeteer doesn't perform perfectly when it comes to taking screenshots.

One aspect in particular still doesn't work very well: Puppeteer's full-page screenshot functionality can be inaccurate and inconsistent across different rendering scenarios.

In this post, we explain why Puppeteer's full-page screenshot functionality may not be working in your particular case.

Understanding Puppeteer screenshot functionality

Before we explain where Puppeteer fails and describe some possible solutions, it's important to understand how Puppeteer's screenshot functionality works.

For starters, Puppeteer is just a protocol library that lets you control a Chrome or Chromium browser through a high-level API. This means that when you're using Puppeteer, there are always two parts in play:

  1. Puppeteer
  2. Chrome or Chromium

Understanding this is important because some defects in the screenshotting process originate in the browser and others in the process of controlling it. As you will see later, different versions of Puppeteer and Chrome or Chromium can yield different results when taking full-page screenshots, and different implementations of the page.screenshot method can produce different results as well.
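Because results depend on both parts, it can help to record exactly which combination produced each screenshot. A minimal sketch, assuming you already have a Puppeteer `browser` object; `browser.version()` and `browser.userAgent()` are standard Puppeteer `Browser` methods, while `describeEnvironment` is a hypothetical helper and the Puppeteer version string is simply passed in by the caller:

```javascript
// Hypothetical helper: collect version details to store alongside each screenshot,
// so a rendering change can be traced back to a specific Puppeteer/Chrome pair.
async function describeEnvironment(browser, puppeteerVersion) {
  return {
    puppeteer: puppeteerVersion, // supplied by the caller (e.g. from your lockfile)
    browser: await browser.version(), // browser product and version string
    userAgent: await browser.userAgent(),
  };
}
```

Logging this object next to each capture makes it much easier to tell a regression in your own code apart from a change in the browser.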

Now that we know there are two parts in play, how exactly are Puppeteer screenshots taken?

It's actually a fairly thin abstraction on top of the Chrome DevTools Protocol. After you instruct Puppeteer to take a screenshot, the library determines the layout values based on the parameters you passed, and then it uses an underlying DevTools method, Page.captureScreenshot, to take the screenshot: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-captureScreenshot
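To see how thin that layer is, you can send the same DevTools command yourself. The sketch below assumes `session` is a CDP session obtained from a Puppeteer page (for example via `page.target().createCDPSession()`); `buildCaptureParams` and `captureViaCdp` are hypothetical helper names, and the clip size is whatever you pass in:

```javascript
// Build parameters for the DevTools method Page.captureScreenshot.
// `clip` is a rectangle in CSS pixels; `scale` is the capture scale factor.
function buildCaptureParams({ width, height }, format = "png") {
  return {
    format,
    clip: { x: 0, y: 0, width, height, scale: 1 },
  };
}

// Hypothetical helper: send the raw protocol command and decode the result.
// Page.captureScreenshot returns the image as a base64-encoded string in `data`.
async function captureViaCdp(session, size) {
  const { data } = await session.send("Page.captureScreenshot", buildCaptureParams(size));
  return Buffer.from(data, "base64");
}
```

Nothing in this call scrolls, waits for images, or adjusts the layout; the renderer returns whatever it has painted for that rectangle at that moment.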

In theory, this should yield an accurate screenshot, but when you tell Puppeteer to take a full-page screenshot, nothing is done to accommodate accuracy. The browser captures the view within a given set of clip coordinates and returns things as the renderer saw them at the moment of the instruction. Although simple, this process never behaves the way the page does when you view it on your screen, and this is the main reason why Puppeteer full-page screenshots don't work.
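For a full-page capture, the clip rectangle is typically derived from the document's content size, which the DevTools method Page.getLayoutMetrics reports. The sketch below is a simplification, not Puppeteer's actual source: it assumes the documented response shape, preferring `cssContentSize` and falling back to the older `contentSize` field, and `fullPageClip` is a hypothetical name:

```javascript
// Turn a Page.getLayoutMetrics response into a full-page clip rectangle.
// The result describes the document size at one instant: content that appears
// later (lazy-loaded images, scroll-triggered sections) is not accounted for.
function fullPageClip(metrics) {
  const size = metrics.cssContentSize || metrics.contentSize;
  return {
    x: 0,
    y: 0,
    width: Math.ceil(size.width),
    height: Math.ceil(size.height),
    scale: 1,
  };
}
```

With a CDP session, `fullPageClip(await session.send("Page.getLayoutMetrics"))` yields a rectangle you could pass as `clip` to Page.captureScreenshot.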

Problems with Puppeteer's full-page screenshot functionality: why your full-page screenshots are not working

Viewport unit issues (a.k.a. vh height or 100% height)

One common problem with Puppeteer's full-page screenshots is the use of viewport units in the layout. In a real-world scenario, when you view a website on a real screen, viewport units make sense because they express a size based on the actual dimensions of a physical display.

The problem with viewport units in Puppeteer is that although the library can simulate the viewport size, it can't take a full-page screenshot without setting the viewport height to 100% of the layout's height. Because of this, when you instruct Puppeteer to take a full-page screenshot of a website that uses vh units or 100% height on an element, the resulting screenshot will usually show that element stretched to the full height of the page.
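A little arithmetic shows why stretching the viewport distorts such layouts. Assume a hypothetical page with one hero section sized at 100vh followed by 2000px of fixed-height content; `documentHeight` is an illustrative helper, not a browser API:

```javascript
// Height of a hypothetical page: a hero sized in vh plus fixed-height content.
// 1vh is 1% of the viewport height, so the hero grows with the viewport.
function documentHeight(viewportHeight, heroVh, restPx) {
  return (heroVh / 100) * viewportHeight + restPx;
}

const normal = documentHeight(800, 100, 2000); // 800px hero + 2000px content
// Resizing the viewport to the measured page height makes the hero that tall,
// which in turn pushes the document even taller.
const stretched = documentHeight(normal, 100, 2000);
console.log(normal, stretched);
```

With an 800px viewport the page is 2800px tall; once the viewport is resized to 2800px, the hero alone fills 2800px and the page grows to 4800px, which is the kind of stretched result described in the issue below.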

See the example described in this GitHub issue: https://github.com/puppeteer/puppeteer/issues/959

Possible solution (Puppeteer 1.19.0 and below):

Before Puppeteer 1.20.0, it was possible to work around this problem by setting the viewport to a reasonable height (800 or 900 pixels) and instructing Puppeteer to capture beyond the fold while keeping the viewport height fixed:

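// This strategy only works in Puppeteer 1.19.0 and below.
const puppeteer = require("puppeteer");

// Returns the scroll height of the page body. This function is only an example
// and it's definitely not guaranteed to work in every case, since page heights
// can change at scroll time or based on other conditions.
async function realHeight(page) {
  return page.evaluate(() => document.body.scrollHeight);
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const pageWidth = 1440;
  const viewportHeight = 800;

  // Keep the viewport at a "normal" height so vh-based elements keep their size.
  await page.setViewport({
    width: pageWidth,
    height: viewportHeight,
  });

  // Placeholder URL: replace it with the page you want to capture.
  await page.goto("https://example.com", { waitUntil: "networkidle0" });

  // Measure after navigation, once the content exists (note the await).
  const pageRealHeight = await realHeight(page);

  // Clip to the full page height while the viewport height stays fixed.
  await page.screenshot({
    path: "./screenshot.jpg",
    type: "jpeg",
    clip: {
      x: 0,
      y: 0,
      width: pageWidth,
      height: pageRealHeight,
    },
  });

  await browser.close();
})();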
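A different technique that keeps the viewport height fixed regardless of version is to capture the page in viewport-sized slices and stitch them together afterward. The sketch below only plans the slices; the capture loop and the stitching step (which needs an image library) are left out, and `planSlices` is a hypothetical helper:

```javascript
// Split a page of `totalHeight` CSS pixels into viewport-sized slices.
// Each slice would be captured after scrolling to its `y` offset, keeping the
// viewport at `viewportHeight` so vh-based elements are not stretched.
function planSlices(totalHeight, width, viewportHeight) {
  const slices = [];
  for (let y = 0; y < totalHeight; y += viewportHeight) {
    slices.push({
      x: 0,
      y,
      width,
      height: Math.min(viewportHeight, totalHeight - y),
    });
  }
  return slices;
}

console.log(planSlices(2000, 1440, 800));
```

For a 2000px page and an 800px viewport this gives slices at y = 0, 800, and 1600, with the last one only 400px tall. Be aware that sticky headers and other fixed-position elements will appear in every slice, so stitching is not a drop-in replacement.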

Lazy loading and dynamic, scroll-dependent elements

Lazy loading is a technique many websites use for efficiency. It lets the browser request resources only when the elements that depend on them come into view. Lazy loading not only optimizes network usage but also saves computing resources by deferring rendering work until it is needed.

The problem with lazy loading is that it generally affects screenshot operations, since the renderer's default behavior is to capture the page exactly as it currently sees it. This means that images or elements that have not been loaded may not show up in the screenshots. The same is true for scroll-dependent interactions such as on-scroll animations.
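Many lazy loaders decide when to load an element by checking whether it intersects the viewport, often through the IntersectionObserver API. The simplified check below is our own illustration (it ignores root margins and thresholds) and shows why an element far below the fold never loads if the page is never scrolled:

```javascript
// Simplified visibility test: does an element's vertical extent overlap the
// viewport at the current scroll position? (All values in CSS pixels.)
function intersectsViewport(elementTop, elementHeight, scrollY, viewportHeight) {
  const elementBottom = elementTop + elementHeight;
  const viewportBottom = scrollY + viewportHeight;
  return elementBottom > scrollY && elementTop < viewportBottom;
}

// An image 3000px down the page, with an 800px viewport:
console.log(intersectsViewport(3000, 400, 0, 800)); // false: no load is triggered
console.log(intersectsViewport(3000, 400, 2500, 800)); // true once scrolled near it
```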

There are many ways to fix this, but there's no one-size-fits-all solution. In general, you will need to instruct Puppeteer to scroll before taking the screenshot, or set the viewport to the height of the page. Sometimes you will have to do both, and sometimes you might need to tamper with the DOM or script execution to force elements to show up.

Possible solution:

As said above, there's no one-size-fits-all solution, but we advise taking the following basic measures to avoid lazy-loading problems:

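// Assumes an async context and an existing Puppeteer `page`.

// Small helper: pause for `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scroll down one screen to trigger the first lazy-loaded elements.
await page.evaluate(() => {
    window.scrollBy({
        top: 800,
        behavior: "smooth",
    });
});

// Arbitrary wait to allow things to load
await wait(1000);

// Scroll far down to trigger anything further below.
await page.evaluate(() => {
    window.scrollBy({
        top: 30000,
        behavior: "smooth",
    });
});

// Another arbitrary wait to allow more things to load
await wait(1000);

// Scroll back to top
await page.evaluate(() => {
    window.scrollTo({
        top: 0,
        behavior: "smooth",
    });
});

// Give the smooth scroll time to finish before resizing the viewport.
await wait(1000);

// A full-height viewport to force any missing elements to reveal themselves.
// We set it to 10000 here, but it can be set to the real calculated height or
// something even larger like 20000 or 30000.
await page.setViewport({ width: 1440, height: 10000 });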
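An alternative to a couple of large jumps is to scroll in viewport-sized steps, pausing at each one so every section gets a chance to intersect the viewport. A sketch, assuming a Puppeteer `page` (only `page.evaluate` is used); `autoScroll`, the step size, the delay, and the `maxSteps` cap for infinite-scroll pages are hypothetical choices to tune per site:

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: scroll a Puppeteer page step by step, re-measuring the
// height each time because lazy-loaded content can make the page grow.
async function autoScroll(page, { step = 800, delayMs = 250, maxSteps = 100 } = {}) {
  let y = 0;
  let steps = 0;
  let height = await page.evaluate(() => document.body.scrollHeight);
  while (y < height && steps < maxSteps) {
    await page.evaluate((top) => window.scrollTo(0, top), y);
    await wait(delayMs);
    height = await page.evaluate(() => document.body.scrollHeight);
    y += step;
    steps += 1;
  }
  // Return to the top so the capture starts at the beginning of the page.
  await page.evaluate(() => window.scrollTo(0, 0));
  return steps;
}
```

Re-measuring the height inside the loop is the key design choice: pages that append content as you scroll keep extending the loop until `maxSteps` stops it.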

Extremely heavy pages and/or complicated layouts

Unfortunately, not all websites are built equally, and there will always be websites with odd caveats and implementations. In our experience running GetScreenshot and screenshotting thousands of websites for Waveguide, no single technique gives you 100% coverage of all the website-specific issues that can arise when taking full-page screenshots with Puppeteer.

We have found that there are two classes of websites that are especially troublesome for Puppeteer:

1) Super heavy web pages that load massive images or very large resources

Puppeteer doesn't handle very large websites well. Although this is highly dependent on the host environment, very large websites will often exhaust the allocated memory and crash the browser. You can always experiment with larger memory allocations and more powerful environments. However, if you're dealing with a website that has a memory leak or a DOM issue, there is probably very little you can do about it on Puppeteer's side.
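Part of the cost is simply the size of the final image. As a rough estimate, an uncompressed RGBA bitmap needs 4 bytes per device pixel, so tall full-page captures grow quickly. The helper below is illustrative only and ignores the memory the page itself consumes:

```javascript
// Rough size of an uncompressed RGBA bitmap (4 bytes per pixel).
// deviceScaleFactor multiplies both dimensions (e.g. 2 for "retina" captures).
function estimateBitmapBytes(width, height, deviceScaleFactor = 1) {
  const w = Math.ceil(width * deviceScaleFactor);
  const h = Math.ceil(height * deviceScaleFactor);
  return w * h * 4;
}

// Convert bytes to mebibytes with one decimal place.
const mb = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
console.log(mb(estimateBitmapBytes(1440, 30000))); // 1x capture
console.log(mb(estimateBitmapBytes(1440, 30000, 2))); // 2x capture
```

A 1440×30000 page works out to 172,800,000 bytes (about 165 MiB) at 1x and roughly 659 MiB at 2x, before counting anything else the browser holds in memory.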

2) Complicated layouts

As we saw before, things like viewport units can affect the resulting screenshot. But many other implementation details on the website side can cause problems for the renderer. For example, if a website uses a special animation technique, there is no guarantee that the resulting screenshot will reflect the final state of the animation as seen on a real display.
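One common mitigation, which is our own illustration rather than a guaranteed fix, is to inject a stylesheet that collapses CSS animation and transition timings to zero so elements jump to their end state before the capture. `page.addStyleTag` is a standard Puppeteer method; the CSS and the `freezeAnimations` name are assumptions, and animations driven by JavaScript rather than CSS are not affected:

```javascript
// CSS that makes CSS animations and transitions finish immediately.
// If the site uses animation-fill-mode: forwards, the final keyframe is kept;
// otherwise the element falls back to its un-animated style.
const FREEZE_CSS = `
  *, *::before, *::after {
    animation-duration: 0s !important;
    animation-delay: 0s !important;
    transition-duration: 0s !important;
    transition-delay: 0s !important;
  }
`;

// Hypothetical helper: inject the stylesheet into a Puppeteer page.
async function freezeAnimations(page) {
  await page.addStyleTag({ content: FREEZE_CSS });
}
```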

Also, if a website relies on heavy JavaScript layout optimization, there is no guarantee that the headless browser will render it with the same layout intent as a browser on a real screen.

It's also important to acknowledge that there are Chromium-specific problems. For example, the bare Chromium binary doesn't include the codecs necessary to render MP4 video and other similar formats.
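Before relying on video content in a capture, you can ask the browser whether it can decode a given format. `HTMLMediaElement.canPlayType` is a standard DOM method that returns `""`, `"maybe"`, or `"probably"`; the `canPlayMp4` helper and the H.264 codec string are assumptions for illustration:

```javascript
// Hypothetical helper: ask the page's browser whether it can decode H.264 MP4.
// canPlayType returns "" (no), "maybe", or "probably".
async function canPlayMp4(page) {
  const answer = await page.evaluate(() =>
    document.createElement("video").canPlayType('video/mp4; codecs="avc1.42E01E"')
  );
  return answer !== "";
}
```

If this returns false, one option is to point Puppeteer's `executablePath` launch option at a Google Chrome installation, which ships with proprietary codecs that bare Chromium builds omit.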

What to do if nothing works?

We personally love Puppeteer, and we're happy with the steady progress the library has made in the past year. It's also exciting to see other solutions like Playwright, which could become a great alternative to Puppeteer.

However, we think that using Puppeteer as a screenshot service is still very much a work in progress.

We built GetScreenshot to distill all our knowledge into an easy and affordable screenshot API based on Puppeteer. We're not going to lie: we believe you should be using a managed service like ours and let us do the research and optimizations. We have mastered several rendering techniques that go deep into DOM and JS manipulation to achieve accurate screenshots.

We would love for you to use our solution and let us solve this problem while you focus on your core problems, but if you still want to dig deeper and master Puppeteer for your own screenshot use cases, here are some useful things you can do:

  • Become familiar with the Puppeteer project. There are many screenshot-related issues in its tracker, and it's generally a good idea to follow them, since the Puppeteer team often posts updates on specific issues.
  • Become familiar with the different versions of Puppeteer and Chrome and how they may affect your use cases. If you're comfortable using older versions of Puppeteer, you may be able to avoid problems caused by regressions or changes in functionality.
  • Understand the potential benefits of real-time DOM manipulation. At GetScreenshot, we solve many of these problems by actively manipulating the DOM to create a rendered version that preserves the layout's intent. We don't currently share or explain these techniques, since we believe they are part of our competitive advantage in such a crowded space. They are not simple, but you can start by exploring things like hot-swapping images and stylesheets, and deactivating certain features that are added via attributes or browser-specific APIs. It's also a good idea to explore the Blink rendering engine's implementation, since it can give you clues about how Chromium displays the DOM, and there are some ways to modify how the rendering engine works.
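As a starting point for the image hot-swapping idea in the last bullet, the sketch below forces lazy images to load eagerly. It is our own illustration, not how GetScreenshot does it: it assumes images use either the native `loading="lazy"` attribute or a `data-src` attribute (a convention of some lazy-loading libraries, not a standard), and `eagerizeImages` is a hypothetical name:

```javascript
// Hypothetical helper meant to run inside the page (e.g. via page.$$eval).
// It must not reference outer variables, because Puppeteer serializes it.
function eagerizeImages(images) {
  let swapped = 0;
  for (const img of images) {
    // Native lazy loading: ask the browser to load the image right away.
    if (img.loading === "lazy") img.loading = "eager";
    // Library-style lazy loading: copy the real URL from data-src into src.
    if (img.dataset && img.dataset.src) {
      img.src = img.dataset.src;
      swapped += 1;
    }
  }
  return swapped;
}
```

With Puppeteer, `await page.$$eval("img", eagerizeImages)` runs it against every `<img>` on the page; you would still need to give the new image requests time to finish before taking the screenshot.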

An afterword on GetScreenshot

As we said before, we would really love for you to give GetScreenshot a chance for your screenshot needs. We have a post explaining why we believe screenshotting makes more sense as a managed service than as something you build yourself:

Why you shouldn't build your own Puppeteer Screenshot API Solution (A case for Puppeteer as a Service)

We believe we have a compelling offering with unique features such as Webhooks, Email Delivery, and a Zapier Integration. We also believe we have the best pricing on the market (5 USD for 2500 screenshots), which is at least 3X cheaper than many other solutions that do little or nothing to optimize for accurate rendering.

Regardless of which route you choose to screenshot your websites, we would love to hear your thoughts and questions. Please feel free to drop me a note any time at jj@rasterwise.com.