Generating Screenshots of URLs using Google's secret magic API
The Problem
As a developer, I have always wondered what the best way to capture screenshots of URLs would be. There are many use cases for this. Consider the following:
- When a link is posted, we could show a thumbnail of it.
- When a user enters a URL, we can fetch a screenshot to confirm that it's the intended page.
- Testing the URL on different browsers.
- Testing whether the URL is publicly accessible.
And so on. Getting a screenshot of a rendered URL requires a lot of work:
- You need a complete headless server that's connected to the internet.
- The server should have a browser that's compliant with web standards.
- There are also screen resolution requirements: mobile or desktop.
When I looked at the available options, here's what I found:
- python-webkit2png, a Python script that takes screenshots (browsershots) using WebKit, but doesn't run on a headless server.
- Screen capture with PhantomJS, which requires installing complex libraries.
- Or, if you are rich, pay for a similar service. Sorry guys, I am not that rich.
Having said all that, I gave up on getting this working and started looking into other things. Later, I started building my website. Once it was up and running, I had to do some performance tests. Naturally, my first stop was Google's PageSpeed Insights API, which allows any webmaster to get a comprehensive report on how Google assesses their page. It also includes a screenshot of how Google sees the webpage.
Here's how my website looks according to the service:
Luckily, I found something interesting there. The report that I get programmatically has a screenshot key, which holds the images above. Now let's dive into how this works.
How "Stuff" Works
Google's PageSpeed Insights API doesn't require authentication! This is a huge bonus for developers like us. Another piece of good news is that you don't need to own a website to take a screenshot of it, which means you can take a screenshot of literally any public URL.
API Endpoint
All we need to do is call the runPagespeed method of the API using the following URL:
https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=
If you look at it closely, there are three parameters:
- screenshot=true: Required. Obviously, you need this to get the screenshot.
- strategy=mobile: Optional. If specified, it captures the mobile version of the URL. If you need the desktop version, remove this parameter.
- url=: Required. This is where you send the URL. Make sure the URL is percent-encoded.
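To make the three parameters concrete, here is a minimal JavaScript sketch that assembles the request URL. The function name buildPagespeedUrl is hypothetical, and the sketch assumes the v1 endpoint shown above still accepts these parameters; passing mobile: false simply leaves out strategy, which is how the text above asks for the desktop version.

```javascript
// Hypothetical helper: assembles the runPagespeed request URL described above.
// Assumes the v1 endpoint and the parameter names used in this article.
const PAGESPEED_ENDPOINT =
  "https://www.googleapis.com/pagespeedonline/v1/runPagespeed";

function buildPagespeedUrl(targetUrl, { mobile = true } = {}) {
  // screenshot=true is needed to get the screenshot key back.
  const params = ["screenshot=true"];
  // strategy=mobile is optional; omitting it requests the desktop rendering.
  if (mobile) {
    params.push("strategy=mobile");
  }
  // url= is required and must be percent-encoded.
  params.push("url=" + encodeURIComponent(targetUrl));
  return PAGESPEED_ENDPOINT + "?" + params.join("&");
}

console.log(buildPagespeedUrl("https://praveen.science/"));
// https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=https%3A%2F%2Fpraveen.science%2F
```

The encoding is done with encodeURIComponent rather than URLSearchParams on purpose: URLSearchParams uses form encoding (spaces become +), while the article's examples use encodeURIComponent.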
Percent-Encoded URL
Creating a percent-encoded URL is very simple. Most server-side and client-side languages provide a function for it. For example, in JavaScript, you can use encodeURIComponent():
// Encodes characters such as ?,=,/,&,:
console.log(encodeURIComponent("https://praveen.science/"));
// Outputs: "https%3A%2F%2Fpraveen.science%2F"
On the server side, say in PHP, you can use rawurlencode() or the following custom function:
function encodeURIComponent($str) {
    // rawurlencode() also escapes ! * ' ( ), so turn those back into literals.
    $revert = array('%21'=>'!', '%2A'=>'*', '%27'=>"'", '%28'=>'(', '%29'=>')');
    return strtr(rawurlencode($str), $revert);
}
This function behaves exactly as encodeURIComponent is defined: encodeURIComponent escapes all characters except alphabetic characters, decimal digits, and - _ . ! ~ * ' ( )
There's more on this topic for PHP on Stack Overflow. Do have a look at the answer by Gumbo and others.
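You can check that list of unescaped characters yourself. The JavaScript sketch below runs encodeURIComponent over every printable ASCII character and keeps the ones that come back unchanged. It also includes an illustrative RFC 3986 style encoder that additionally escapes ! ' ( ) *, which is how PHP's rawurlencode treats them and exactly the difference the custom PHP function above undoes. Both helper names are hypothetical.

```javascript
// Collect the printable ASCII characters that encodeURIComponent leaves as-is.
function unescapedByEncodeURIComponent() {
  const kept = [];
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (encodeURIComponent(ch) === ch) {
      kept.push(ch);
    }
  }
  return kept;
}

// Illustrative RFC 3986 style encoder: like encodeURIComponent, but also
// escapes ! ' ( ) *, matching how PHP's rawurlencode handles them.
function rfc3986Encode(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Everything that survives apart from letters and digits.
const symbols = unescapedByEncodeURIComponent().filter(
  (ch) => !/[A-Za-z0-9]/.test(ch)
);
console.log(symbols.join(" ")); // ! ' ( ) * - . _ ~
console.log(rfc3986Encode("it's (fun)!")); // it%27s%20%28fun%29%21
```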
Request & Response
For example, to capture a screenshot of my website, I would send an HTTP GET request to:
https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=https%3A%2F%2Fpraveen.science%2F
It takes some time to render the page, so patience is a virtue.
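Since the response can take a while, it helps to wrap the request in an async function. This is a minimal sketch assuming a runtime with a global fetch (modern browsers, or Node.js 18 and later). fetchPagespeedReport is a hypothetical name, and the fetchImpl parameter exists only so the function can be exercised without hitting the network.

```javascript
// Hypothetical wrapper around the HTTP GET request shown above.
// Assumes a global fetch (browsers, Node.js 18+) unless fetchImpl is given.
async function fetchPagespeedReport(targetUrl, fetchImpl = globalThis.fetch) {
  const requestUrl =
    "https://www.googleapis.com/pagespeedonline/v1/runPagespeed" +
    "?screenshot=true&strategy=mobile&url=" +
    encodeURIComponent(targetUrl);

  const response = await fetchImpl(requestUrl);
  // Treat any non-2xx status as a failure instead of parsing an error body.
  if (!response.ok) {
    throw new Error("PageSpeed request failed with HTTP " + response.status);
  }
  // The report is JSON; the screenshot sits under its screenshot key.
  return response.json();
}
```

Calling fetchPagespeedReport("https://praveen.science/") resolves to the parsed report, so report.screenshot holds the keys described below; network errors and non-2xx responses reject the promise.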
If you look at the response closely, you will find a key named screenshot. If you expand it, you will find:
- data
- mime_type
- height
- width
Using the above, you can construct the image, but be aware of a few quirks in what Google sends us. The data key holds the image as base64, but written with the URL-safe base64 alphabet, which uses _ and - in place of / and +. To get a complete data URI:
- Change every _ to /.
- Change every - to +.
- Then prefix the result with data:, followed by the mime_type value and ;base64, (for example, data:image/jpeg;base64,).
Now you have a data URI version of the image. You can use it in JavaScript to create a client-side image and set its source.
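The conversion steps above fit in a small JavaScript function. This sketch assumes the screenshot object has the data and mime_type keys described earlier; screenshotToDataUri is a hypothetical name. Restoring missing = padding is an extra precaution I have added in case the data ever arrives unpadded; it does nothing when the length is already a multiple of four.

```javascript
// Converts the screenshot object from the PageSpeed report into a data URI.
// Assumes it has `data` (URL-safe base64) and `mime_type` keys.
function screenshotToDataUri(screenshot) {
  // Undo the URL-safe alphabet: "_" becomes "/" and "-" becomes "+".
  let base64 = screenshot.data.replace(/_/g, "/").replace(/-/g, "+");
  // Precaution: pad to a multiple of 4 characters (no-op if already padded).
  while (base64.length % 4 !== 0) {
    base64 += "=";
  }
  // Prefix with "data:", the MIME type and ";base64,".
  return "data:" + screenshot.mime_type + ";base64," + base64;
}

console.log(screenshotToDataUri({ data: "_-8", mime_type: "image/jpeg" }));
// data:image/jpeg;base64,/+8=
```

In the browser you would then assign the result to an image, for example document.querySelector("img").src = screenshotToDataUri(report.screenshot).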
Limitations
This API isn't perfect.
- Image width is 320px, which is definitely not great for high-resolution screenshots.
- Web fonts can be challenging. If you are using Google Fonts, Google seems to render most of them.
- There's no way to pass authentication or cookie data - so you just get the "public" view of the page.
- Similarly, no POST data - although GET is fine.
- Plugins like Flash and Java may not work. Anyway, who uses Flash nowadays?
- Complex JavaScript pages won't necessarily render correctly. Sorry, SPA (Angular and React) folks!
- It's a bit slow to generate the report, but it's worth the wait.
- Only one rendering engine, so you can't use it to see how Firefox compares to Chrome.
Overall, not a perfect solution, but for quickly generating a screenshot without needing to install anything, it's pretty good.
Full Source Code
Client Side
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width" />
<title>Get Screenshot</title>
<script src="https://code.jquery.com/jquery-2.2.4.js"></script>
<script>
$(function () {
    // Get the URL.
    var url = "https://praveen.science/";
    // Prepare the URL (percent-encode it).
    url = encodeURIComponent(url);
    // Hit the Google PageSpeed Insights API.
    $.get("https://www.googleapis.com/pagespeedonline/v1/runPagespeed?screenshot=true&strategy=mobile&url=" + url, function (data) {
        // Get the screenshot data.
        var screenshot = data.screenshot;
        // Convert Google's URL-safe base64 back to standard base64.
        var imageData = screenshot.data.replace(/_/g, "/").replace(/-/g, "+");
        // Build the Data URI.
        var dataURI = "data:" + screenshot.mime_type + ";base64," + imageData;
        // Set the image's source.
        $("img").attr("src", dataURI);
    }).fail(function () {
        // Let the reader know if the request did not succeed.
        $("img").attr("alt", "Could not load the screenshot.");
    });
});
</script>
</head>
<body>
<h1>Hard Coded Screenshot of my Website:</h1>
<img src="//placehold.it/300x50?text=Loading+Screenshot..." alt="Screenshot" />
</body>
</html>
See the Pen Screenshot using Google PageSpeed Insights API by Praveen Kumar (@praveenscience) on CodePen.
Server Side PHP
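<?php
// A small proxy: take ?url=... from our own GET request, ask the Google
// PageSpeed Insights API for a screenshot and print it as an <img /> tag.
// Check if the URL parameter for our proxy is set.
if (!empty($_GET['url'])) {
    // Make sure the given value is a URL.
    if (filter_var($_GET['url'], FILTER_VALIDATE_URL)) {
        // Hit the Google PageSpeed Insights API. PHP has already decoded $_GET,
        // so the URL must be percent-encoded again before sending it.
        // Catch: file_get_contents() needs allow_url_fopen enabled to fetch remote URLs. Or you need to use cURL.
        $apiUrl = "https://www.googleapis.com/pagespeedonline/v2/runPagespeed?screenshot=true&url=" . rawurlencode($_GET['url']);
        $googlePagespeedResponse = file_get_contents($apiUrl);
        if ($googlePagespeedResponse === false) {
            echo "Could not reach the Google PageSpeed Insights API.";
        } else {
            // Convert the JSON response into an array.
            $googlePagespeedObject = json_decode($googlePagespeedResponse, true);
            if (empty($googlePagespeedObject['screenshot']['data'])) {
                echo "No screenshot found in the response.";
            } else {
                // Grab the screenshot data and its MIME type.
                $screenshot = $googlePagespeedObject['screenshot']['data'];
                $mimeType = htmlspecialchars($googlePagespeedObject['screenshot']['mime_type'] ?? 'image/jpeg');
                // Replace Google's anomalies (URL-safe base64 back to standard base64).
                $screenshot = str_replace(array('_', '-'), array('/', '+'), $screenshot);
                // Build the Data URI scheme and spit out an <img /> tag.
                echo "<img src=\"data:{$mimeType};base64,{$screenshot}\" alt=\"Screenshot\" />";
            }
        }
    } else {
        // If not a valid URL.
        echo "Given URL is not valid.";
    }
} else {
    // URL not set.
    echo "You need to specify the URL.";
}
?>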
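If your server runs Node.js rather than PHP, the same proxy idea could look roughly like the sketch below. It is an illustration, not a port of any official tool: handleScreenshotRequest is a hypothetical name, it assumes Node.js 18 or later for the global fetch, it uses the v1 endpoint from the rest of this article, and it validates input with the WHATWG URL constructor as a stand-in for FILTER_VALIDATE_URL. It percent-encodes the target URL and takes the MIME type from the response. It returns a status and a body instead of writing to a response, so it can be tested without a running server.

```javascript
// Hypothetical Node.js counterpart of the PHP proxy above.
// Assumes Node.js 18+ for the global fetch; fetchImpl can be swapped for tests.
// Returns { status, body } instead of writing to a response directly.
async function handleScreenshotRequest(target, fetchImpl = globalThis.fetch) {
  if (!target) {
    return { status: 400, body: "You need to specify the URL." };
  }

  // Stand-in for FILTER_VALIDATE_URL: accept only absolute http(s) URLs.
  let parsed;
  try {
    parsed = new URL(target);
  } catch {
    return { status: 400, body: "Given URL is not valid." };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400, body: "Given URL is not valid." };
  }

  // Ask the PageSpeed Insights API for a screenshot; the URL must be percent-encoded.
  const response = await fetchImpl(
    "https://www.googleapis.com/pagespeedonline/v1/runPagespeed" +
      "?screenshot=true&url=" +
      encodeURIComponent(target)
  );
  if (!response.ok) {
    return { status: 502, body: "PageSpeed request failed." };
  }

  const report = await response.json();
  const shot = report.screenshot;
  if (!shot || !shot.data) {
    return { status: 502, body: "No screenshot in the response." };
  }

  // Only trust image/* MIME types in the generated HTML; otherwise assume JPEG.
  const mime = /^image\/[a-z0-9.+-]+$/i.test(shot.mime_type) ? shot.mime_type : "image/jpeg";
  // URL-safe base64 back to standard base64.
  const data = shot.data.replace(/_/g, "/").replace(/-/g, "+");
  return {
    status: 200,
    body: '<img src="data:' + mime + ";base64," + data + '" alt="Screenshot" />',
  };
}
```

To serve it, you could call handleScreenshotRequest from an http.createServer callback, reading the query value with new URL(req.url, "http://localhost").searchParams.get("url") and writing the returned status and body.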
Summary
I'm not sure how long Google's PageSpeed Insights API will stay free and open to use, so make hay while the sun shines. I hope this article was useful. Do let me know what you think in the comments below. See you in my next article!