Stop! Using! Bad! Numbers!

(exclamation points, on the other hand…)

There’s a conversation I’ve had before. Among authors, I fall on the weak end of the “Boo, Piracy!” side, and I especially fall on the extremely strong end of “Boo, DRM!” The basic gist of the conversation was something like this: “Well, you might not think it’s a big deal now, but wait until you see your book on the piracy sites, with all those downloads listed.” Well. Okay. I’ve waited. Now here I am. I am officially a published author. I officially have to worry about whether my sales numbers will be good enough, and whether they’ll justify another contract. I have, in fact, lost sleep over this.

You know what makes my head hurt when I worry about sales numbers and contract renewals? The fact that one of my local Borders didn’t get its shipment of my books for two weeks, and the worry that this might be more than just a local error. You know what will sell my books faster, and more reliably, than spending 20 hours a week filing takedown notices? People finding my book in Target when they stop by to get lightbulbs.

My guess is that maybe, maybe, 1% of the people who download my book will actually read it, and maybe, maybe, 1% of the ones who actually read it would have purchased it. I absolutely despise these “estimates” of the cost of piracy that just take the number of downloads and multiply it by the cost of the book, because that has no basis in reality.
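The gap between the two ways of counting can be made concrete in a few lines of Python. This is only a sketch: the two 1% rates are the guesses above, while the download count and cover price are hypothetical numbers picked for illustration.

```python
def naive_loss(downloads: int, price: float) -> float:
    """Count every download as a lost full-price sale."""
    return downloads * price


def discounted_loss(downloads: int, price: float,
                    read_rate: float = 0.01, would_buy_rate: float = 0.01) -> float:
    """Count only downloaders who read the book AND would otherwise have bought it."""
    return downloads * read_rate * would_buy_rate * price


DOWNLOADS = 10_000  # hypothetical download count
PRICE = 7.99        # hypothetical cover price in dollars

print(f"naive:      ${naive_loss(DOWNLOADS, PRICE):,.2f}")
print(f"discounted: ${discounted_loss(DOWNLOADS, PRICE):,.2f}")
```

With 1% of 1% converting, the naive figure is 10,000 times larger than the discounted one, whatever download count and price you plug in.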

The most recent such estimate has hit the Twitter/authorosphere by way of Publishers Weekly, in which Attributor estimates that piracy costs the industry “as much as” $3 billion in lost sales. Already, this has been turned into “pirates cost the industry $3 billion!” Sugar plums dance in heads as people imagine what their sales would be like with another 6 zeroes attached to the end.

But the study (you can read the whole thing here) isn’t worth the paper it isn’t printed on, and its findings have been lied about by the very people who ran the study. It is so egregious that I am angry just thinking about it.

So let’s start with first things first: Note the source. Attributor is not a scientific outfit. They are not economists who have been trained to determine this sort of thing. What “Attributor” is, is a fee-charging service that tries to stamp out piracy for you. This means that Attributor has an incentive–a financial one–to convince authors and publishers that there is money to be made in stamping out piracy. Beware anyone with murky motivations.

Now let’s move on to the methodology.

Attributor estimated the cost of piracy at $3 billion using the following methodology:

  1. It used the titles that it was tracking: that is, the titles where people had paid it money to hunt down and remove illegal copies. These titles are not listed in its methodology, but Publishers Weekly listed titles like “Girl with the Dragon Tattoo” and “Angels and Demons.” Not precisely representative of book downloads in general.
  2. Somehow, it figured out what “market share” each potential hosting site represented. The methodology does not explain how it figured that.
  3. Four of those sites show how many “downloads” a title has. Using the estimated market share from step 2, Attributor stated that these four sites represented 36.4% of all downloads. So it estimated total downloads by taking the number of downloads from those four sites and dividing that number by 0.364. This gave them 9 million downloads.
  4. Attributor looked up prices for these books, and multiplied price by downloads. This gave them a figure of $380 million.
  5. It then estimated that the 913 titles it was tracking represented 13.5% of the book publishing market. Again, no explanation is given as to how they measured this. By number of titles? (Not possible; there are more than 10,000 books available for purchase.) By percentage of books sold, per BookScan? I don’t really know where they got this number, but it’s pretty clear that the titles listed by Publishers Weekly are very, very popular titles, and I’m not sure it’s fair to extrapolate from one set of books to the other, especially since their own findings showed that download rates varied across different types of books. In any event, they took $380 million and divided it by 0.135, which gave them $2.8 billion.
  6. They added $200,000,000 to the number to make it nice and round. No, I’m not joking. That gives you an idea about precisely how scientifically accurate this study is.
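The arithmetic in steps 3 through 6 can be re-created in a few lines of Python. This is a sketch built only from the figures quoted above: the four sites’ raw download counts and the per-title prices aren’t given here, so the step 4 total of $380 million is taken as reported. The alternative market shares in the final loop are hypothetical values, chosen only to show how much the result depends on that one unexplained number.

```python
# Figures quoted in the post; everything else below is illustrative.
FOUR_SITE_SHARE = 0.364  # step 3: claimed share of all downloads held by the four sites
TRACKED_SHARE = 0.135    # step 5: claimed share of the market for the 913 titles
ROUND_UP = 200e6         # step 6: amount added to reach the headline figure


def extrapolate_downloads(four_site_downloads: float) -> float:
    """Step 3: scale the visible download counts up to 'all' downloads."""
    return four_site_downloads / FOUR_SITE_SHARE


def price_times_downloads(downloads: dict[str, float], prices: dict[str, float]) -> float:
    """Step 4: treat every download as one full-price sale."""
    return sum(count * prices[title] for title, count in downloads.items())


def extrapolate_market(tracked_total: float, tracked_share: float = TRACKED_SHARE) -> float:
    """Step 5: scale the tracked titles up to the whole market."""
    return tracked_total / tracked_share


step4 = 380e6                      # reported result of step 4
step5 = extrapolate_market(step4)  # roughly $2.8 billion
headline = step5 + ROUND_UP        # roughly $3 billion

print(f"step 5:   ${step5 / 1e9:.2f} billion")
print(f"headline: ${headline / 1e9:.2f} billion")

# Hypothetical alternative shares, NOT from the study: the estimate
# moves in inverse proportion to whatever share you assume.
for share in (0.10, 0.135, 0.20, 0.30):
    print(f"share {share:.3f} -> ${extrapolate_market(step4, share) / 1e9:.2f} billion")
```

Because step 5 divides by the assumed market share, halving that share doubles the estimate, and the methodology offers no explanation for the 13.5% figure in the first place.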

These numbers are useless. The study’s own methodology acknowledges that these numbers do not even attempt to estimate financial loss:

(study here; page 5).

Which, of course, is why Attributor, in announcing its findings, put it thusly:

You know what I call that?

I call that dishonesty. The numbers themselves are drawn from nowhere, are unexplained, and rest on estimates that the study’s own methodology acknowledges make them useless for determining loss. But Attributor, which makes its money from publishers scared of piracy, has itself used those numbers to claim something it can’t actually claim, and those numbers are now being disseminated around the web by people who treat them as fact.

Piracy is bad. But you know what? So is a dishonest representation of those findings, especially when those findings then become part of the debate about what should be done about piracy.

I am firmly opposed to piracy. But I am also firmly opposed to lies about piracy, and this is a lie, both in the “damned statistics” meaning of the word, and in the “knowing misstatement of the truth” sense of the word.

Shame on you, Attributor, for your misleading press release, and for your blog post stating in no uncertain terms what you, yourself, internally said you hadn’t even attempted to estimate.

Courtney Milan writes historical romances, which might lead people to think that she could be cool. In reality, she's about four different kinds of geeky. At present, this blog is where Courtney applies semi-dormant geek skills to publishing.

17 thoughts on “Stop! Using! Bad! Numbers!”

  1. That’s why I didn’t retweet that info. Those numbers sounded too off-the-wall for me. I also don’t have time to send off a lot of take-down letters, just to have the work put back up the next day.

    Thanks for rooting out the facts!

  2. Thank you.

    No, actually. Thank! You!

    Every one of my stories is pirated a dozen places. There are actually a few places where you can download 15 or 20 of them at once. No one can tell me I don’t understand how it feels. I get it! It stings!

    However, that doesn’t make shoddy math and ridiculous research okay. People take free stuff. They take it because it’s there, whether they’ll use it or not, and I have yet to have anyone show me conclusive proof that a large majority of those people are lost sales.

    I hate “studies” like this, especially when they’re used as an excuse to whip authors into a frenzy and enable the continued spread of misinformation.

    And there’s MY rant for the morning.

  3. Thank you, Bree. I’m not trying to say that piracy is no problem, or that it causes no harm.

    I am trying to say that I would really, really welcome a study that was actually scientific, and based on economic fact, rather than one that was based on hyperbole.

    I also want to add that the study is SO unscientific, that they haven’t even explained what time period their $3 billion estimate applies to. From their methodology, it appears that this is an estimate for only a 90-day period in 2009. Does this mean that they think publishing lost $12 billion to piracy last year? (But clearly not, as the report starts out saying that piracy represents around 10% of publishing industry sales, comparing their number to another year’s publishing figures.) Did they think that figure simply too outlandish to report? What the heck is up with that?

    Why can’t we have people who are reasonable conduct a study, who aren’t trying to PROVE something with their numbers?

  4. Why can’t we have people who are reasonable conduct a study, who aren’t trying to PROVE something with their numbers?

    I would find that fascinating. I certainly think piracy is going to have a growing impact going forward, and that it’s the sort of thing that could become a massive, massive problem if we don’t understand it. And maybe that’s what makes me most furious about these studies–they are looking for a specific answer and so they go straight to it, and any real data that could be used to form intelligent, useful strategies is lost in the shuffle.

  5. The one reasonable analysis I’ve seen of the real impact of piracy in publishing is courtesy of Carolyn Jewel, who found a guy doing real research and got him to show her his findings (hehe).

    She posted about this on the Dear Author discussion of this last October, but her input was mostly ignored amidst the emotional firestorm of “OMG pirates!” She re-posted the information on her own blog here:

    I advise anyone who is interested in a real look at the “Impact of P2P and Free Distribution on Book Sales” (that’s what the study was called) to check out Carolyn’s summary of the findings. You can has good numbers!

  6. Yes, it’s a shady, self-serving study with questionable metrics and nonsensical conclusions. That said, I still want to keelhaul all the pirates and torch all the sites that foster piracy.

  7. That said, I still want to keelhaul all the pirates and torch all the sites that foster piracy.

    I don’t, and I really don’t approve of this kind of language.

    And you know why?

    I don’t think that’s appropriate punishment for the measure of harm.

    Sorry. I know you don’t really mean that we should tie ropes to their legs and toss them in the water, to have their faces ripped to shreds by the barnacles on the bottom of a boat, assuming they don’t drown first. But when I write a post saying “stop with the hyperbole,” I’m also going to call it out when it occurs in the comments.

    This part of the hyperbolic talk is just as bad. I don’t think someone should be keelhauled if they rob a bank of a million bucks and shoot two innocent bystanders. I certainly won’t support it because someone clicked “download.”

    Pirates should be subject to reasonable civil penalties, and if appropriate, criminal ones, that are commensurate with the harm inflicted.

    Someone not giving me my twenty cents in royalties just doesn’t justify physical harm.


  8. Actually, Theresa, let me put it another way.

    I don’t think the problem with piracy is that we need bigger punishments. The punishments we have now are quite, quite massive–many of these downloaders could get socked with damages of millions of dollars.

    The problem is that these punishments are too hard to enforce.

    We need the equivalent of a traffic ticket for downloaders–a fine, easy to levy, of a small amount (say $100), where the enforcement agency has an incentive to both levy the fines and collect the money.

    That would do more to combat piracy than actually keelhauling one a year.

  9. Hi Courtney,

    I work for Attributor and just came across your blog. We certainly struck a nerve with you and I’m not sure we’ll ever agree on the study, but I wanted to clarify a few things. You can always reach me at rich(at) too.

    While not 100% relevant, I wanted to point out that we offer a free service at which is a partnership with the Creative Commons to allow bloggers and freelancers to see who is reusing their content. We have also started the Fair Syndication Consortium to move past the takedown mentality and help newspapers and other publishers collect a fair share of revenue made from their work as it is reused across the Internet.

    In response to some of your points

    1) None of our customers’ titles were included in the study. We grabbed what we believe is representative of the industry and most were frontlist titles.

    2) The market share is based on the 52,000 successful takedowns we have sent since our service was launched in July ’09. This is mentioned in a footnote but we should have made this more explicit.

    3) The projection to total U.S. books was indeed tricky and it’s not perfect. The 913 titles were from publishers whose *entire catalog* represented 13.5% of U.S. book sales. So we assumed that those 913 titles represented these publishers’ entire catalogs. We believe this is a conservative approach but invite other ideas or approaches.

    I’m happy to engage further – as you note, we were pretty transparent on our methodology and were clear about these numbers representing potential losses.

    We’ve been in this business for a little over six months and, by any measure, it’s a big and growing problem.

    Thanks for letting me respond!

  10. Rich, thanks for stopping by. I do have a number of questions.

    First, can you explain for me how you translate take down notifications into market share for download? At best, the number of take down notifications you send to a site shows the relative percentage of UPLOADS, or, more likely, the relative persistence of uploaders. It doesn’t in any way quantify the relative percentage of DOWNLOADS those sites represent.

    Second, how did you determine what percentage of book sales a particular catalog represented?

    Third, why did Attributor title its blog post with a claim that was clearly disclaimed in the text of the study itself? Because, you know, when your press release says “Online Piracy Costs Publishers Nearly $3 Billion,” that seems pretty misleading, especially since people so rarely read the fine print.

    Fourth, given your stated objectives, don’t you think it’s a little awkward to release a piracy study where you provide clearly labeled links to pages where people can download the copyrighted materials in question?

    Finally, does the person who conducted the study have any formal training in either economics or statistical methods?

  11. Hi Courtney – happy to respond!

    1) We are definitely using the uploads as our market share figure, because only four of the sites publish download figures, eliminating a straight apples-to-apples comparison. We also considered using web traffic as a proxy but found that it would have inflated the numbers substantially given Rapidshare’s enormous web traffic. In fact, many we spoke to thought the results were too conservative because of this.

    2) We debated whether to disclose the source for this, but chose to keep it confidential to avoid embarrassing any specific publishers.

    3) Mea culpa on the headline. It needed to be short 🙂

    4) We did this to increase transparency so everyone can see for themselves. We’re planning to remove these at the end of the day from the post. In reality, it’s very easy to find these books.

    5) Yes indeed, and thanks for asking. We have a Ph.D. in statistics on our staff.

    I hope this helps, and I appreciate that you dug into this. Most of the questions are unfortunately much less thought out.

  12. Wow. Attributor’s response to their misleading title is “It needed to be short”?

    I’m glad Attributor weighed in on this since it’s their work that’s under discussion. While I appreciate the clarifications, I don’t see that all that much has been clarified.

    I’m really surprised that they think frontlist titles are representative of books that get pirated. As an author myself, it’s two of my oldest OOP books that I see pirated the most. In fact, it’s the backlist of the really famous authors that downloaders seem to be most interested in.

    The fact is, there isn’t yet a sufficiently rigorous study of the issue, and the ONLY person to attempt a valid study that might hold up to scrutiny has come to the opposite conclusion from Attributor.

  13. I always enjoy it when people who don’t know you personally somehow think that they can talk circles around you, CM. It’s fun to watch them fail.

    I agree with Carolyn. I fail to see how shortening the title would make it less misleading.

  14. @Carolyn,

    Please do send me a copy of the study you reference. If it’s from Brian O’Leary, then I have it and agree with one of its main conclusions: that the peer-to-peer threat is overstated. Our study was exclusively focused on the one-click hosting sites (e.g. Rapidshare).

  15. I have to laugh — I wrote a review once (for the Journal of the American Statistical Association) of a book called “Lies, Damned Lies, and Statistics.” (My most persistent memory of that experience is that I got the author’s address wrong because I didn’t know then that there was a Halifax in England as well as Nova Scotia…)

    The one aspect to Attributor’s study that worries me: sampling methodology. As we know from political surveys, the better the sampling methodology the more likely the conclusions about the entire group will prove to be accurate. What makes this study particularly tricky is that sampling has taken place at two times: first when the pirating sites decide which books to uh, offer, and second when Attributor decides which of those pirates and their “offerings” to track.

    I’m not a statistician, obviously. But common sense would suggest that the pirates are going to select primarily bestsellers to pirate. (Which turns your nightmare, Courtney, into a bad joke: Good news: your book is selling very well. Bad news: it’s selling well enough to be pirated.)

    That’s not to say that there isn’t somewhere a pirated copy available of less-than-best sellers.

    I have no doubt that the statistics (i.e., the actual math) used by the Attributor study are consistent and appropriate. But sampling theory is more than getting your chi-square tests right. This is a complicated problem, and when a company has a motive for its research, getting the result to appear both transparent and devoid of bias is that much harder. (Which is why Consumer Reports is so valuable.)

    I know this has been mentioned already and I apologize for restating the obvious, but I’m not convinced that people willing to download pirated material would actually pay for that material if the pirated stuff were not available. Therefore, any monetary conclusion of the effect of piracy on publishing would have to have a third level of sampling: sampling the downloaders. It doesn’t sound like Attributor did that. (The fact that the downloaders are stealing intellectual property might make it a bit harder to get them to cooperate with a study. That’s why sampling experts get paid well…)
