Amazon Tagging

I notice that Amazon is jumping on the tagging bandwagon. They’re using word frequency statistics from books in their “Search Inside” program to auto-generate tags that they’re calling Statistically Improbable Phrases:

Amazon.com’s Statistically Improbable Phrases, or “SIPs”, are the most distinctive phrases in the text of books in the Search Inside! program. To identify SIPs, our computers scan the text of all books in Search Inside. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside books, that phrase is a SIP in that book.
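Amazon doesn’t publish the actual formula, but the idea in that description can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: phrases are limited to two-word pairs, the score is a simple ratio of the phrase’s rate in the book to its smoothed rate across the corpus, and the names bigrams and improbable_phrases are made up for the example.

```python
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def improbable_phrases(book, corpus, top_n=3):
    """Rank a book's two-word phrases by how over-represented they are.

    Assumption: the score is the phrase's relative frequency in the book
    divided by its smoothed relative frequency across the corpus. This is
    a guess at the spirit of SIPs, not Amazon's real scoring rule.
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(bigrams(text))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        book_rate = count / book_total
        # Add-one smoothing keeps the ratio finite for phrases the
        # corpus has never seen.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate

    ranked = sorted(scores, key=scores.get, reverse=True)
    return [" ".join(phrase) for phrase in ranked[:top_n]]
```

Calling improbable_phrases(book_text, all_texts) returns the book’s highest-scoring phrases, where all_texts is a list of strings that would normally include the book itself. A real system would presumably handle longer phrases, punctuation, and a far larger corpus.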

The phrases are listed on a given book’s Amazon detail page, just below the “Product Details” area, and they’re linked to lists of other books that use the same phrase.

For example, “productive writer” and “tormented writer” each appear three times in If on a winter’s night a traveler. Those phrases don’t appear together in any other book in Amazon’s sampling, but they’re each used once in a handful of titles. It’s sort of funny that “productive writer” largely appears in books about writing, while “tormented writer” appears mostly in books about writers.

Word frequency statistics are further broken down on a book’s Text Stats page, which includes a chart of the book’s 100 most frequently used words presented in a format that seems a little familiar.
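A most-frequent-words list like that is easy to reproduce at home. Here is a minimal sketch, assuming a plain-text copy of the book and a crude definition of a word (runs of letters and apostrophes); the function name top_words is my own, and Amazon’s tokenizing is surely different.

```python
import re
from collections import Counter


def top_words(text, n=100):
    """Return the n most common words in text as (word, count) pairs."""
    # Crude tokenizer: lowercase everything, keep letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)
```

Running it on two translations of the same book and comparing the lists side by side is one way to do the comparison yourself.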

Another way to get your word count kicks: Compare statistics from different translations of the same book.
