My quest to make a better Kindle Direct Publishing report using their “nonpublic” API data.

As I’ve lamented previously, Amazon’s Kindle Direct Publishing reporting tools (if that’s what you want to call them) leave much to be desired. Right off the bat, the data is capped at 90 days, which kind of just sucks. Most of the reports export to Excel without any meaningful sums, and the one graph they do provide shows unit sales and total sales over a short time period. That graph is nice by itself, but it’s really the only functionality present.

I dug in a little and found the JSON the report page retrieves every time you tweak the filter settings in the drop-downs. Lo and behold, the URL for that API has easy-to-change query string parameters. The best part is that you can request data going back to the beginning of time, so I’m no longer bound by the weird 90-day restriction imposed by the front end.
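Since the whole trick is changing query string parameters, it helps to build the report URL from its parts. The helper below is a hypothetical sketch, not anything Amazon provides: the parameter names are the ones visible in the report page’s request, their meanings are inferred, and `_` is assumed to be a cache-busting value (a placeholder is sent, just as the original request URL does). It is meant for Node.js 10+, where `URLSearchParams` is a global.

[sourcecode lang="javascript"]
'use strict';

// Hypothetical helper: assembles the KDP report-data URL for a date range.
// Parameter names are copied from the report page's request; their meanings
// are inferred, not documented by Amazon.
function buildReportUrl(opts) {
  var params = new URLSearchParams({
    customerID: opts.customerId,
    sessionID: 'sessionK',
    type: 'OBR',
    marketplaces: 'all',
    asins: opts.asins,
    startDate: opts.startDate, // YYYY-MM-DD
    // Defaults to today's date in UTC (YYYY-MM-DD), which can differ from
    // your local date near midnight.
    endDate: opts.endDate || new Date().toISOString().slice(0, 10),
    // Assumed cache-busting parameter; a fixed placeholder is sent here.
    _: '0000000000'
  });
  return 'https://kdp.amazon.com/reports/data?' + params.toString();
}
[/sourcecode]

For example, `buildReportUrl({ customerId: 'YOUR_CUSTOMER_ID_HERE', asins: 'YOUR_ASIN_GOES_HERE', startDate: '2014-01-01' })` asks for everything since January 1, 2014, well past the 90-day window.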


I wanted to make a quick-and-dirty dashboard using this data, but I quickly ran into a problem: I couldn’t consume the API endpoint from the front end, because the browser’s same-origin policy blocks cross-domain requests. So I figured I’d consume the endpoint from Node.js using Request and pass the data back to the browser as JSON from my own domain.

The first big snag I hit was that Amazon requires you to be logged in when you make a request to that API endpoint (understandably). Since I’m not technically using a real, public API, there’s no easy way around this.

PhantomJS to the rescue

Up to this point, I’ve only used PhantomJS to run automated front-end tests. It works great for that, but you may not know that it’s a full headless browser you can use for pretty much anything, including visiting a web page and interacting with it.

I set up a fairly simple PhantomJS script (it’s JavaScript, but Phantom runs it, not Node.js) that opens the API URL, lands on Amazon’s sign-in prompt, fills in my email address and password, and submits the form. It then waits for the sign-in to finish loading and requests the API URL again, this time logged in. Finally, it logs the raw JSON that comes back to stdout.
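The flow above assumes the second request really returns the report. A small check could tell the JSON apart from a bounce back to the sign-in page. These are hypothetical helpers: the sign-in test relies on the `ap_signin_form` id that the sign-in form uses, which Amazon could change at any time. They stick to ES5, so they would run in PhantomJS (against `page.content`) as well as in Node.js.

[sourcecode lang="javascript"]
'use strict';

// Hypothetical helper: true if the HTML looks like Amazon's sign-in page.
// Assumes the form still carries the id "ap_signin_form".
function looksLikeSignInPage(html) {
  return html.indexOf('ap_signin_form') !== -1;
}

// Hypothetical helper: returns the parsed report data, or null when the text
// is not valid JSON (for example, a sign-in or error page came back).
function tryParseReport(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return null;
  }
}
[/sourcecode]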

I can test it out by simply executing the Phantom CLI and pointing it to the script I wrote:

$ phantomjs amazon.js

This prints the JSON from the API right to the screen. So far, so good! Here is the code for the amazon.js Phantom script:

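[sourcecode lang="javascript"]
/* jshint node: true */
/* global phantom */
'use strict';

// Run with `phantomjs amazon.js`. moment must be installed where PhantomJS's
// require() can find it (e.g. a node_modules folder next to this script).
var page = require('webpage').create(),
  moment = require('moment'),
  loadInProgress = false,
  interval = 0,
  startDate = '2014-01-01',
  endDate = moment().format('YYYY-MM-DD'),
  url = 'https://kdp.amazon.com/reports/data?customerID=YOUR_CUSTOMER_ID_HERE&sessionID=sessionK&type=OBR&marketplaces=all&asins=YOUR_ASIN_GOES_HERE&startDate=' + startDate + '&endDate=' + endDate + '&_=0000000000';

// Track page loads so we know when the sign-in form submission has finished.
page.onLoadStarted = function() {
  loadInProgress = true;
};

page.onLoadFinished = function() {
  loadInProgress = false;
};

// Identify as desktop Chrome so Amazon serves its regular sign-in page.
page.settings.userAgent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11';

// We're not logged in yet, so this first request lands on the sign-in page.
page.open(url, function(status) {
  if (status !== 'success') {
    console.log('error');
    phantom.exit(1);
    return;
  }

  // Fill in and submit the sign-in form from inside the page context.
  page.evaluate(function() {
    document.getElementById('ap_email').value = 'YOUR_EMAIL@HERE.COM';
    document.getElementById('ap_password').value = 'YOURPASSWORD';
    document.getElementById('ap_signin_form').submit();
  });

  // The submit starts a navigation; mark it as loading so the first poll
  // can't slip through before onLoadStarted fires.
  loadInProgress = true;

  // Poll every 50 ms until the sign-in request has finished loading.
  interval = setInterval(function() {
    if (!loadInProgress) {
      clearInterval(interval);

      // Now that we're logged in, request the report data again.
      page.open(url, function(status) {
        if (status === 'success') {
          // plainText drops the HTML wrapper PhantomJS puts around the JSON.
          console.log(page.plainText);
          phantom.exit(0);
        } else {
          console.log('error');
          phantom.exit(1);
        }
      });
    }
  }, 50);
});
[/sourcecode]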

Execute the Phantom shell command from Node.js

Now that I have a command I can run from the command line, I need to consume its output from a small Node app. I wrote a small Express app that serves up the HTML for the dashboard itself and also provides an endpoint that returns the JSON from Amazon’s endpoint.

The endpoint executes the Phantom command and captures its stdout using Node’s child_process module. The captured output is passed to a callback, which uses JSON.parse(data) to turn the JSON string into a JavaScript object before serving it with Express’s res.json() method. Here’s the code for the Node module that executes the Phantom command and returns the captured data:

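[sourcecode lang="javascript"]
/* jshint node: true */
'use strict';

var exec = require('child_process').exec;

// Runs the PhantomJS script and passes its stdout (the KDP JSON string) to
// callback, or false if phantomjs failed. The script path is relative to the
// directory the Node app is started from.
module.exports = function(callback) {
  // A larger maxBuffer: a report reaching back to the beginning of time can
  // exceed exec's default stdout limit, which would be reported as an error.
  exec('phantomjs amazon/amazon.js', { maxBuffer: 10 * 1024 * 1024 }, function(error, stdout) {
    if (error) {
      callback(false);
    } else {
      callback(stdout);
    }
  });
};
[/sourcecode]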
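A /data route built on the module above could look roughly like the sketch below. It parses the captured stdout and answers with an error status when Phantom fails or prints something that isn’t JSON. `handleData` and the 502 status are my own illustrative choices; `res.status()` and `res.json()` are standard Express response methods. Taking the runner and `res` as arguments keeps it testable without a live Express server.

[sourcecode lang="javascript"]
'use strict';

// Hypothetical /data handler. runPhantom is the module above: it calls back
// with stdout as a string, or false if phantomjs failed. res is an Express
// response object.
function handleData(runPhantom, res) {
  runPhantom(function(output) {
    if (output === false) {
      // phantomjs exited with an error.
      res.status(502).json({ error: 'Could not fetch KDP data' });
      return;
    }
    var data;
    try {
      // stdout is a JSON string; parse it into a JavaScript object.
      data = JSON.parse(output);
    } catch (e) {
      // Non-JSON output, e.g. the script printed 'error' or a sign-in page.
      res.status(502).json({ error: 'Unexpected response from KDP' });
      return;
    }
    res.json(data);
  });
}
[/sourcecode]

Wired into Express, that might be `app.get('/data', function(req, res) { handleData(require('./kdp-data'), res); });`, where `./kdp-data` is a hypothetical path to the module above.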

Using the JSON that’s returned from my own endpoint, I can then use jQuery to do whatever I want with it: parse it, loop through it, calculate totals, draw bar graphs, etc.
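As an example of the totals that become easy once the data is served locally, here is a plain-JavaScript grouping helper. The shape of the KDP JSON isn’t documented, so this assumes an array of flat records; `totalsBy` is a hypothetical name, and the field names in the usage (`month`, `units`) are placeholders to swap for whatever the real response contains.

[sourcecode lang="javascript"]
'use strict';

// Hypothetical helper: sums a numeric field per group, e.g. units per month.
// Values are coerced with Number(); anything non-numeric counts as 0.
function totalsBy(rows, groupKey, valueKey) {
  var totals = {};
  rows.forEach(function(row) {
    var group = row[groupKey];
    var value = Number(row[valueKey]) || 0;
    totals[group] = (totals[group] || 0) + value;
  });
  return totals;
}
[/sourcecode]

On the dashboard page, that could be fed from `$.getJSON('/data', function(rows) { var perMonth = totalsBy(rows, 'month', 'units'); })`, with `perMonth` handed to whatever draws the bar graph.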


Summary

I wrote a super small Express web app that serves up the HTML for a dashboard. It also provides a /data endpoint that the front-end code calls via AJAX to retrieve the KDP data. That endpoint runs a shell command that launches a PhantomJS script, which connects to Amazon.com, logs in, and retrieves and returns the API data.