Generating Screenshots of URLs using Google's secret magic API

The Problem

Being a developer, I have always wondered what would be the best way to capture screenshots of URLs. There are a lot of use cases for this. Consider the following ones:

  • When a link is posted, we could show a thumbnail of it.
  • When a URL is entered by the user, we can fetch a screenshot so they can confirm it's the intended page.
  • Testing the URL on different browsers.
  • Testing if the URL is accessible by the public.

And so on. Getting a screenshot of a rendered URL requires a lot of work:

  1. You need a complete headless server that's connected to the internet.
  2. The server should have a browser that's compatible with the web standards.
  3. There are also screen resolution requirements - mobile or desktop.

When I looked at the available options, here's what I found:

Having said everything above, I gave up on the idea of getting this working and started looking into other things. Later, I started building my website. Once it was up and running, I had to run some performance tests. Naturally, my first stop was Google's PageSpeed Insights API, which gives any webmaster a comprehensive report on how Google assesses their page. The report also includes a screenshot of how Google sees the webpage.

Here's how my website looks according to the service:

Google Page Speed Output

Luckily, I found something interesting there. The report that I get programmatically has a screenshot key, which holds the image above. Now let's dive into how this works.

Contents

  1. The Problem
  2. How "Stuff" Works
    1. API Endpoint
    2. Percent Encoded URL
    3. Request & Response
  3. Limitations
  4. Full Source Code
  5. Client Side
  6. Server Side PHP
  7. Summary

How "Stuff" Works

Google's PageSpeed Insights API doesn't require authentication! This is a huge bonus for developers like us. More good news: you don't need to be the website's owner to take a screenshot of it. This means that you can take a screenshot of any publicly accessible URL.

API Endpoint

All we need to do is call the API's runPagespeed method using the following URL:

https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=  

If you look at it closely, you have three parameters.

  • screenshot=true: Required. Without it, the report doesn't include the screenshot.
  • strategy=mobile: Optional. If specified, the API renders the mobile version of the URL. If you need the desktop version, remove this parameter or set it to strategy=desktop.
  • url=: Required. This is where you send the URL. Make sure that it is a percent encoded URL.
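
As a minimal sketch, the three parameters above can be assembled into a request URL with a small helper. The endpoint is the one shown above; the constant name, the function name `buildPagespeedUrl`, and its `useMobile` flag are hypothetical and only there for illustration.

```javascript
// Endpoint taken from the article. The helper below is a hypothetical
// illustration, not part of any Google library.
const PAGESPEED_ENDPOINT =
  "https://www.googleapis.com/pagespeedonline/v1/runPagespeed";

function buildPagespeedUrl(targetUrl, useMobile) {
  // screenshot=true is always needed to get the screenshot key.
  let requestUrl = PAGESPEED_ENDPOINT + "?screenshot=true";
  // strategy=mobile is optional; it is left out for the desktop version.
  if (useMobile) {
    requestUrl += "&strategy=mobile";
  }
  // The target URL must be percent encoded before it is appended.
  return requestUrl + "&url=" + encodeURIComponent(targetUrl);
}

console.log(buildPagespeedUrl("https://praveen.science/", true));
```

Keeping the parameter logic in one place makes it harder to forget the encoding step when the target URL comes from user input.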

Percent Encoded URL

Creating a percent encoded URL is very simple. Most server side and client side languages provide a function for it. For example, in JavaScript, you can do it using encodeURIComponent():

// Encodes characters such as ?,=,/,&,:
console.log(encodeURIComponent("https://praveen.science/"));  
// Outputs: "https%3A%2F%2Fpraveen.science%2F"

On the server side, say in PHP, you can use rawurlencode() or the following custom function:

function encodeURIComponent($str) {  
    $url = array('%21'=>'!', '%2A'=>'*', '%27'=>"'", '%28'=>'(', '%29'=>')');
    return strtr(rawurlencode($str), $url);
}

This function behaves exactly as encodeURIComponent is defined:

encodeURIComponent escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )

There's more to this topic in PHP here at Stack Overflow. Do have a look at the answer by Gumbo and others.
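
Back in JavaScript, the quoted rule is easy to check for yourself. The snippet below is only a small demonstration using the built-in encodeURIComponent() and decodeURIComponent(); the sample strings are made up.

```javascript
// Letters, digits and - _ . ! ~ * ' ( ) are left untouched.
const untouched = "AZaz09-_.!~*'()";
console.log(encodeURIComponent(untouched) === untouched); // true

// Characters with a special meaning in URLs are escaped.
const encoded = encodeURIComponent("?=/&:");
console.log(encoded); // %3F%3D%2F%26%3A

// decodeURIComponent() reverses the encoding.
console.log(decodeURIComponent(encoded)); // ?=/&:
```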

Request & Response

To capture a screenshot of my website, for example, I would send an HTTP GET request to:

https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=https%3A%2F%2Fpraveen.science%2F  

It does take some time to render this URL, so patience is a virtue.

Response

If you look at the response closely, you can find a key named screenshot. If you expand it, you will further find:

Screenshot

  • data
  • mime_type
  • height and width
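
Since not every page renders successfully, it may be worth checking that these keys are present before using them. Here is a minimal sketch, assuming the report has already been parsed from JSON; the function name `extractScreenshot` and the sample values are hypothetical.

```javascript
// Returns the screenshot object from a parsed report, or null when the
// report has no usable screenshot. The function name is hypothetical.
function extractScreenshot(report) {
  const candidate = report && report.screenshot;
  if (!candidate || typeof candidate.data !== "string" || !candidate.mime_type) {
    return null;
  }
  return candidate;
}

// Made-up, heavily trimmed report; real data values are far longer.
const sampleReport = {
  screenshot: { data: "_9j_4AAQ", mime_type: "image/jpeg", width: 320, height: 240 }
};

const found = extractScreenshot(sampleReport);
console.log(found.mime_type, found.width + "x" + found.height);
console.log(extractScreenshot({})); // null
```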

Using these keys, you can construct the image, but be aware of a few quirks in how Google encodes it. The data key holds the base64 version of the image in its URL-safe form, where two characters of the standard alphabet are swapped out. To get the complete data-uri scheme:

  • All the _ should be changed to /.
  • All the - should be changed to +.
  • Then you should prefix the result with data:, followed by mime_type's value and ;base64,.

Now you have got the data-uri scheme version of the image. This can be used in JavaScript to create a client side image and set the source.
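
The conversion can be sketched as a small standalone function. The field names data and mime_type come from the response; the function name `screenshotToDataUri` and the sample value are hypothetical, and the sample is a truncated string rather than a complete image.

```javascript
// Turns the API's screenshot object into a data URI. The function name
// is hypothetical; data and mime_type are the keys from the response.
function screenshotToDataUri(screenshot) {
  const standardBase64 = screenshot.data
    .replace(/_/g, "/")  // every _ becomes /
    .replace(/-/g, "+"); // every - becomes +
  return "data:" + screenshot.mime_type + ";base64," + standardBase64;
}

// Made-up, truncated sample; not a complete image.
const sampleScreenshot = { data: "_9j_4AAQ-w", mime_type: "image/jpeg" };
console.log(screenshotToDataUri(sampleScreenshot));
// data:image/jpeg;base64,/9j/4AAQ+w
```

The returned string can be assigned directly to the src attribute of an img element.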

Limitations

This API isn't perfect.

  • Image width is 320px, which is definitely not great for high-resolution screenshots.
  • Web fonts can be challenging. If you are using Google fonts, Google seems to render most of them.
  • There's no way to pass authentication or cookie data - so you just get the "public" view of the page.
  • Similarly, no POST data - although GET is fine.
  • Plugins like Flash & Java may not work. Then again, who's going to use Flash nowadays?
  • Complex JavaScript pages won't necessarily work. Sorry SPA (Angular and React JS) folks!
  • It's a bit slow to generate the report. But it's worth the wait.
  • Only one rendering - so you can't use it to see how Firefox compares to Chrome.

Overall, not a perfect solution, but for quickly generating a screenshot without needing to install anything, it's pretty good.

Full Source Code

Client Side

<!DOCTYPE html>  
<html lang="en">  
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <title>Get Screenshot</title>
    <script src="https://code.jquery.com/jquery-2.2.4.js"></script>
    <script>
      $(function () {
        // Get the URL.
        var url = "https://praveen.science/";
        // Prepare the URL.
        url = encodeURIComponent(url);
        // Hit the Google Page Speed API.
        $.get("https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + url, function (data) {
          // Get the screenshot data.
          var screenshot = data.screenshot;
          // Convert the Google's Data to Data URI scheme.
          var imageData = screenshot.data.replace(/_/g, "/").replace(/-/g, "+");
          // Build the Data URI.
          var dataURI = "data:" + screenshot.mime_type + ";base64," + imageData;
          // Set the image's source.
          $("img").attr("src", dataURI);
        });
      });
    </script>
  </head>
  <body>
    <h1>Hard Coded Screenshot of my Website:</h1>
    <img src="//placehold.it/300x50?text=Loading+Screenshot..." alt="Screenshot" />
  </body>
</html>  

See the Pen Screenshot using Google PageSpeed Insights API by Praveen Kumar (@praveenscience) on CodePen.

Server Side PHP

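<?php
  // A small proxy: takes a URL via GET, hits the Google PageSpeed Insights API and outputs the screenshot.
  // Check if the URL parameter for our proxy is set.
  if (!empty($_GET['url'])) {
    // Make sure the given value is a URL.
    if (filter_var($_GET['url'], FILTER_VALIDATE_URL)) {
      // Percent encode the URL before appending it to the query string.
      $encodedUrl = rawurlencode($_GET['url']);

      // Hit the Google PageSpeed Insights API.
      // Catch: Your server needs to allow file_get_contents() to open URLs (allow_url_fopen) to make this run. Or you need to use cURL.
      $googlePagespeedResponse = file_get_contents("https://www.googleapis.com/pagespeedonline/v2/runPagespeed?screenshot=true&url={$encodedUrl}");

      // Convert the JSON response into an array (null if the request failed).
      $googlePagespeedObject = ($googlePagespeedResponse === false) ? null : json_decode($googlePagespeedResponse, true);

      if (isset($googlePagespeedObject['screenshot']['data'])) {
        // Grab the screenshot data.
        $screenshot = $googlePagespeedObject['screenshot']['data'];
        // Replace Google's anomalies.
        $screenshot = str_replace(array('_','-'), array('/','+'), $screenshot);
        // Use the reported MIME type, falling back to JPEG.
        $mimeType = isset($googlePagespeedObject['screenshot']['mime_type']) ? $googlePagespeedObject['screenshot']['mime_type'] : 'image/jpeg';

        // Build the Data URI scheme and spit out an <img /> tag.
        echo "<img src=\"data:" . htmlspecialchars($mimeType) . ";base64,{$screenshot}\" alt=\"Screenshot\" />";
      } else {
        // The request failed or the report has no screenshot.
        echo "Could not fetch a screenshot for the given URL.";
      }
    } else {
      // If not a valid URL.
      echo "Given URL is not valid.";
    }
  } else {
    // URL not set.
    echo "You need to specify the URL.";
  }
?>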

Summary

I'm not sure how long Google's PageSpeed Insights API will remain free and open to use, so make hay while the sun shines. I hope this article was useful. Do let me know what you think about it in the comments below. See you in my next article.


