Tag: data

  • The Unbearable Lightness of Amazon Sales Rank

    How does a book climb to the top of Amazon charts? Many seek the secrets of the big river algorithm: authors, artists, entrepreneurs, scammers, publishers, used booksellers, scholars of literature, and me, nucky of nickywebsite.com.

    Discussing proprietary algorithms is cyber-divination, trying to figure out the future from hints, spreadsheets, the stars, shadows on a cave’s wall. The code exists, but none of us will ever see it even though it controls us like God.

    In this post, I consider the complex calculation of Amazon Sales Ranking, a.k.a. Amazon Best Seller Rank, a.k.a. the most popular products and numbers, featuring classic books and numbers from 1 to 100.

    Amazon data isn’t good for making business decisions. It is suitable for seeing what’s on the minds of a large group of people. They are sad. They want to color.

    Do You Have What It Takes to Be a Best Seller?

    The impetus for writing this piece was a Google data class that suggested students analyze a Kaggle dataset of Amazon’s Top 50 Bestselling Books 2009 – 2019, then imagine a business owner and offer them advice with this data.

    Thousands of Kaggle projects used this dataset and presented this list of books as if they were the most popular of the decade. The analysts seem to have no idea what they’re looking at: a theme of the 21st century.

    I offer all business owners the free advice: you are not Amazon and never will be. Amazon can use this data to make decisions, but your puny business cannot. Amazon can buy in massive quantities, getting products literally for a dime a dozen. Your little baby business probably pays wholesale costs like a chump.

    You’re not Amazon!

    Caveat: on the off chance someone at Amazon is reading this, you are Amazon, and you alone have permission to use sales rank to make business decisions. But guess what you already do.

    I chuckle, imagining the hubris of taking this spreadsheet to a bookstore and suggesting they “implement” this data. This dataset is six years old, incomplete, and poorly documented.

    It lacks key identifying information about the books (ISBN); there’s no column denoting the book’s ranking within the yearly Top 50 (Sales Rank). Unlabeled, duplicate records could correspond to format (hardcover/paperback, not listed), or records might have been best sellers in multiple years or just be mistakes. Price data was pulled at the book’s lowest ever Amazon price, including sales and $0 promotions, so it’s ineffective for price analysis; there’s no standard for handling series books, and some are counted twice with collections (ex. Divergent/Insurgent is counted as one record, Divergent as another). I could go on!
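    The repeated records can at least be sorted into two piles before anyone draws conclusions from them. What follows is a minimal sketch, not a cleaning recipe: the column names (Name, Author, Year) are an assumption about the Kaggle file, and the rows are made-up placeholders rather than real entries.

```python
from collections import defaultdict

# Hypothetical sample rows; the column names (Name, Author, Year) are an
# assumption about the Kaggle file, and the values are placeholders.
rows = [
    {"Name": "Book A", "Author": "Author One", "Year": 2014},
    {"Name": "Book A", "Author": "Author One", "Year": 2015},
    {"Name": "Book B", "Author": "Author Two", "Year": 2016},
    {"Name": "Book C", "Author": "Author Three", "Year": 2013},
    {"Name": "Book C", "Author": "Author Three", "Year": 2013},
]

def years_by_title(records):
    """Group the years in which each (title, author) pair appears."""
    seen = defaultdict(list)
    for r in records:
        seen[(r["Name"], r["Author"])].append(r["Year"])
    return dict(seen)

def classify_repeats(records):
    """Split repeated titles into multi-year entries and same-year duplicates."""
    multi_year, same_year = [], []
    for key, years in years_by_title(records).items():
        if len(years) < 2:
            continue  # appears once, nothing to explain
        if len(set(years)) == len(years):
            multi_year.append(key)  # one row per year: a repeat best seller
        else:
            same_year.append(key)   # same year twice: format variant or mistake
    return multi_year, same_year

multi_year, same_year = classify_repeats(rows)
print("Best sellers in several years:", multi_year)
print("Same-year repeats to inspect by hand:", same_year)
```

    Without an ISBN or a format column, the second pile can't be resolved by code. Someone has to look at each record.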

    Ironically, best sellers like these are some of the worst books to sell because you’re competing with Amazon. There are tons of copies; sometimes, you can find them for free in the little free libraries, and even when priced at $5 or $6, most people would prefer to get a new copy from Amazon for $8 or $10. It would be foolish for any small business to compete for the same buyers.

    Things get much more interesting if the data is framed solely as a story about what books people bought and read between 2009 and 2019.

    Desires, Books and Machines

    This data hints at the way Amazon sells books.

    Judging by the book type classification, Amazon buyers prefer non-fiction.

    What if a book was defined as a solution to a problem? Not a thing you read for fun, but something you read to fix something else.

    Try a visualization with me.

    Imagine you’re buying stuff on Amazon. Replenishing essentials, paper towels, smoothie powder, a bulk package of “knockoff” Mr. Clean Magic Erasers that work just like the real ones from a reseller in Montana who buys them from China. You meant to order this earlier. You’re almost out.

    And as you do this, you feel an inescapable feeling. A profound dread for the future and a certainty about life’s meaninglessness: something is wrong with the world and you’re not entirely sure why you’re still in it.

    One-day shipping is too fast. How is that sustainable?

    Why are people going out at 3 AM to deliver smoothie powder?

    Why are melamine foam sponges $4 at the store when you can buy them directly from the manufacturer for $0.10 each? And no way do I believe it’s because of the extraordinary product-pitching abilities of that weird, silent, bald man, Mr Clean.

    And why do I always have to clean up so much shit? Why are the messes so much worse that we need magic to clean them?

    And why does it seem like I’m always broke despite all my money-saving strategies? Why does going to a job five days a week, scrimping and saving, still not make enough money?

    Why doesn’t the government do something?

    How much longer will any of this last?

    Why am I lonely?

    Why am I even alive?

    Is there a God?

    What is the point of any of this?

    Then you happen to see a book that addresses one or multiple of these concerns that you were just ambiently ruminating about, and it promises to solve your problem. Wow! It’s like magic. This is the self-help genre.

    Amazon continues to be one of the most profitable companies in the world because it has identified that self-help books are a low-cost product that can be subtly advertised to people who don’t typically buy books. This will increase cart totals by a few percentage points on a mass scale because most cart totals are less than $100.

    Classics of the genre are still on the best-seller list. How to Win Friends & Influence People (1936) turned 75 in the period surveyed. It was written in a halcyon time before interstate highways or mass air travel, when Americans believed you could go out, make friends, and influence people. Now consider the stark difference in the optimism of that title and the outlooks expressed in popular self-help titles from the 2010s.

    Things seem bleak.

    • 12 Rules for Life: An Antidote to Chaos
    • Arguing with Idiots: How to Stop Small Minds and Big Government
    • Calm the F*ck Down: An Irreverent Adult Coloring Book
    • Can’t Hurt Me: Master Your Mind and Defy the Odds
    • Delivering Happiness: A Path to Profits, Passion, and Purpose
    • Divine Soul Mind Body Healing and Transmission System: The Divine Way to Heal You, Humanity, Mother Earth, and All…
    • Eat This Not That! Supermarket Survival Guide: The No-Diet Weight Loss Solution
    • Girl, Wash Your Face: Stop Believing the Lies About Who You Are So You Can Become Who You Were Meant to Be
    • It’s Not Supposed to Be This Way: Finding Unexpected Strength When Disappointments Leave You Shattered
    • Make Your Bed: Little Things That Can Change Your Life…And Maybe the World
    • One Thousand Gifts: A Dare to Live Fully Right Where You Are
    • Option B: Facing Adversity, Building Resilience, and Finding Joy
    • Quiet: The Power of Introverts in a World That Can’t Stop Talking
    • Radical: Taking Back Your Faith from the American Dream
    • Ship of Fools: How a Selfish Ruling Class Is Bringing America to the Brink of Revolution
    • Soul Healing Miracles: Ancient and New Sacred Wisdom, Knowledge, and Practical Techniques for Healing the Spiritual…
    • The 4 Hour Body: An Uncommon Guide to Rapid Fat Loss, Incredible Sex and Becoming Superhuman
    • The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
    • The Total Money Makeover: Classic Edition: A Proven Plan for Financial Fitness
    • Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers
    • What Happened
    • You Are a Badass: How to Stop Doubting Your Greatness and Start Living an Awesome Life

    Read these titles as imperatives. Liminal, direct messaging telling you what to do, how you feel, why everything is wrong, and even offering an actionable antidote to all of life’s problems.

    People need help making the bed, washing their face, surviving the chaos, responding to impossible changes, comprehending the heinous greed of the ruling class, accepting their position in life, and both simultaneously caring enough to make things better and subtly not giving a f*ck because you need to focus on yourself. You are a badass. You can use the tools of titans to make yourself happy and profit!

    One needn’t be a certified Schizoanalyst to notice a pattern in these titles. In aggregate, these books seem like they’re written for people with intense feelings and disordered minds. A sense of dread permeates our culture. Then, these feelings are commodified as add-ons to our carts.

    The practical purpose of a metric tracking the best-selling products is to sell more products.

    Coloring Reality

    A whimsical and playful illustration of the Eye of Providence atop a pyramid, resembling a coloring book style but instead of crayon coloring, surrounded by delivery drones | Credit DALL-E, ChatGPT

    Amazon categorizes coloring books as non-fiction. I disagree. Consider this book.

    Unicorn Coloring Book: For Kids Ages 4-8 (US Edition) (Silly Bear Coloring Books)

    What is non-fiction about this? Unicorns? Silly bears making coloring books? Indeed, that’s fun, fantastic fiction!

    But perhaps it’s because coloring is an act that exists in reality. Coloring is a non-fictional act. The book’s purpose is to be colored, so its content is ancillary. That’s true.

    Out of 190 non-fiction books, 13 are coloring books, and 8 are marked for adults. Perhaps that’s because the designs are complex, or maybe because they have swears in them (specifically, Calm the F*ck Down: An Irreverent Adult Coloring Book).
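    Counts like these can be reproduced with a simple title filter. This is a rough sketch under stated assumptions: the column names (Name, Genre) and the "Non Fiction" label are guesses about the spreadsheet, the four rows are a tiny stand-in for the real data, and matching on the words "coloring" and "adult" in the title is a heuristic of my choosing, not a documented method.

```python
# Hypothetical sample rows; "Name" and "Genre" are assumed column names and
# "Non Fiction" is an assumed label. The real file has far more records.
rows = [
    {"Name": "Unicorn Coloring Book: For Kids Ages 4-8", "Genre": "Non Fiction"},
    {"Name": "Calm the F*ck Down: An Irreverent Adult Coloring Book", "Genre": "Non Fiction"},
    {"Name": "Quiet: The Power of Introverts in a World That Can't Stop Talking", "Genre": "Non Fiction"},
    {"Name": "Divergent", "Genre": "Fiction"},
]

def count_coloring_books(records):
    """Return (non-fiction count, coloring book count, adult coloring book count)."""
    non_fiction = [r for r in records if r["Genre"] == "Non Fiction"]
    # Heuristic: a coloring book is any non-fiction title containing "coloring".
    coloring = [r for r in non_fiction if "coloring" in r["Name"].lower()]
    # Heuristic: an adult coloring book also says "adult" in its title.
    adult = [r for r in coloring if "adult" in r["Name"].lower()]
    return len(non_fiction), len(coloring), len(adult)

non_fiction_total, coloring_total, adult_total = count_coloring_books(rows)
print(non_fiction_total, coloring_total, adult_total)
```

    A keyword match will miss any coloring book that doesn't say so in its title, so treat the result as a floor, not a census.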

    I took a screenshot of my spreadsheet | Credit: Me

    I recall the “adult coloring” book trend of the 2010s; even Chuck Palahniuk tried to capitalize on it. I always found it sad that people needed the “Adult” labeling to permit them to color. To rephrase a Mitch Hedberg joke, “Any book is an adult coloring book if an adult colors the pages.”

    Some coloring books are marked as “adult” but feature topics that seem fun for a child to color. Owls and cats are great images to color in for all ages.

    Thankfully, most children will ignore this label.

    In a few instances, the coloring books were about topics that seemed overly complex for a child but presented too simply for an adult to understand.

    Issues like anxiety, feminism, and meditative practices of East Asia are essential topics, and introducing them with a coloring book seems like an excellent idea for a young reader. One would hope an adult might consider a book on the topics with more words.

    Personally, I’d pick the coloring book about the owls.

    There’s a motivational journal on the list too. And while it’s not classified this way, this is the true adult’s coloring book. A book that teaches you cantrips to complete tasks and projects you don’t want to finish.

    What if one wants to draw in one’s motivation journal? And what if they’re going to color it in, too?

    These labels imply that coloring is an act only some people are allowed to do, only sometimes. If you’re an adult, you can color a little bit after a long day of stressful work. But only if there’s a purpose to it, like relieving stress or considering a complex and nuanced ideological framing.

    But I’m an adult, and you’re probably an adult, and I wish we all knew that this is untrue. You can color for any reason you want.

    Color the cats. Color the owls. Color code your annotations of Gender Trouble. Draw in your notebook and color in the drawings to motivate yourself.

    As I write this on January 25th, 2025, four adult coloring books are on the Top 50 Amazon Best Sellers list. All four of them look fun for everyone.

    Scratching the Surface

    What is the true scope of this data? We know “a lot” of people bought these books, but how many? Here is a question that’s easy to articulate but difficult to answer: “How many copies of 1984 have been sold?” This might remain forever unknown.

    There’s more to consider about Amazon’s Sales Rank and its effect on books in the 21st century. I hope to consider topics like the downward price pressure on books, the trending increase in product reviews over time, how tastes have changed in the last six years, and more in future posts.

    But 2000 words is enough for now because, honestly, I convinced myself to color some pictures I drew in my journal with some fine-tipped markers I bought on Amazon!


  • Data Analysis of My 2023 Reading

    Three years ago, I started a reading database. In 2023, I started learning about the data analysis process and, indeed, spent the best parts of the year reading and learning about data.

    While many posts try to convince their reader to build a “Second Brain,” this post has no such Frankensteinian ambitions. Instead, I will show you my virtual brain, and we will consider what happens when you habitually use the e-brain and how it impacts reading and writing.

    The Big Numbers

    First, how many books did I read in the year? The 21st century adds significant complexity to an otherwise simple question. The books were primarily digital, borrowed, electronic, or audio. I could not pile the books. I often think about book piles as I write a newsletter about my reading piles, but there is no pile. The Pile Is A Lie.

    So, without a database, this simple question would be difficult to answer because most books were not printed on physical paper.

    Counting big numbers showed me interesting things about the veracity of truth and the grand, epistemological nature of knowledge. Far out, number stuff.

    Number of Books – VERIFIED TRUSTWORTHY!

    One hundred eighty-three books is a strongly verifiable data point. I opened many books and went from the first to the last page. It averages 3.5 books a week. Of course, this is all self-reported data. I could have forgotten to log a book, or I could be making all this stuff up! But to me, 183 is a strongly knowable number.

    What’s interesting about this number is that I cared a lot about it from 1/1/2023 to 12/31/2023, but today, it is one of the least insightful numbers because the collection is complete. Creating the records drives the collection, but once it’s done, the records become self-evident.

    Number of Pages – UNTRUSTWORTHY!

    Here, things get vaguer and much more imprecise. I am not confident I read 50,574 pages this year for a few reasons. About 40% of the books I read were non-fiction, and those books include indexes and footnotes, valuable tools that show the skeleton of a good, well-argued book (or the amorphous blob of a poorly written one). But these pages make the numbers go up arbitrarily.

    Furthermore, page count information online is not reliable. I’ve decided to track this more closely this year. I’ve found wrongly reported numbers for numerous books. I’ve held a copy of a book in my hands and seen a different number on Amazon. Various editions, layouts, and formats all impact a quantifiable fact. But despite it being knowable information, it is not always known. Amazon has consistent errors, and this imprecision must somehow affect their warehouse usage and shipping costs. While exact page counts of books are mostly useless, random information, I find this imprecision hints at the fallibility of Big Data and monopoly capital.
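    For what it's worth, here is the arithmetic behind the big numbers as a small sketch. The only inputs are the two totals reported above; treating the year as 52 weeks and 365 days is my simplifying assumption. Anything derived from the page total inherits its untrustworthiness.

```python
books = 183
pages = 50_574  # self-reported total; the post itself flags it as unreliable

# Assumption: a year is treated as 52 weeks and 365 days.
books_per_week = books / 52
pages_per_book = pages / books
pages_per_day = pages / 365

print(f"{books_per_week:.1f} books per week")   # 3.5 books per week
print(f"{pages_per_book:.0f} pages per book")   # 276 pages per book
print(f"{pages_per_day:.0f} pages per day")     # 139 pages per day
```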

    Authors Read – VERIFIED TRUSTWORTHY!

    In 2022, I read 73 authors and 91 books in total. Thus, I set the goal to read 100 authors. Achieving this goal was one of the drives pushing me to read this much in a year. Again, this is easy to count. I limited each book (record) to one author field, so multi-author books list both authors but count as one author.

    Something I realized binging Elmore Leonard novels was that getting the hang of an author’s style dramatically increases the speed at which one reads their books. I could start pounding these out in a day, even with other stuff to do! I think this quality explains why readers can voraciously consume an author like Brandon Sanderson, who’s written dozens of books and well over a million pages.

    Fiction Vs. Non-Fiction – UNTRUSTWORTHY!

    A new addition to my database, [Fiction: Y/N?]. The checkbox asks the user to consider what is true and what is false.

    I read Mike Tyson’s memoir, Undisputed Truth, which approaches this question from the title. Mike admits to a lot in the book, and there are undoubtedly truths in it. I marked this non-fiction.

    But a book I marked fiction, The Siberian Job, begins with an introduction stating how the book is fact but barely fictionalized because those involved started getting death threats. The book might be total bullshit, but there are also a lot of truths about privatization in the USSR.

    Or take one like James Ellroy’s essay collection, Destination Morgue, which is half fiction, half non-fiction. The author recounts his childhood improprieties of huffing paint and breaking into houses, but the book also includes fictionalized accounts of him as a police officer solving murders.

    That’s to say nothing of the obvious roman à clef, French for a book about real life that probably got the author a vicious enemy. No Longer Human by Osamu Dazai and Queer by William S. Burroughs are two such examples.

    To binarize fact/fiction is to oversimplify reality into an abstraction that doesn’t always capture narratives’ strange, contradictory nature.


    Genre & Format

    Genre –

    A pie chart that doubles as a nifty abstraction for my brain.

    Categorizing what I read has made me keener on the contours of subcategorizations. Most bookstores shelve crime, mystery, and true crime together. Because they’re my most represented reading, I break them apart.

    Format –

    I analyzed the information’s format independently of its genre. In other words, how did I consume the book? Did I read it in physical or digital form? Did I own it or borrow it?

    There’s a clear insight from this chart. 64.1% of reading was done with the library. Using digital library apps makes reading cost-free. Most of the books I borrowed were in audio format. These are too expensive to buy individually, so a cost-free way of accessing this information doubled my consumption.

    Multi-modal consumption made reading and writing easier. Borrowing the audio and the digital copies allowed me to copy and paste text or highlight useful passages.

    Next year, I plan to correlate genre and medium and see which books I prefer in which mediums. I currently need to gain the technical skills to do that.
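    One low-skill way to start is a cross-tabulation: count how many books fall into each genre and format pair, then pick the most common format per genre. This is counting, not statistical correlation, and the sketch below is only an illustration. The field names and sample records are hypothetical and are not taken from the actual database.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
books = [
    {"genre": "Crime", "format": "Audio (library)"},
    {"genre": "Crime", "format": "Audio (library)"},
    {"genre": "Crime", "format": "Print (owned)"},
    {"genre": "History", "format": "Ebook (library)"},
    {"genre": "History", "format": "Ebook (library)"},
    {"genre": "History", "format": "Audio (library)"},
]

def cross_tabulate(records):
    """Count how many books fall into each (genre, format) pair."""
    return Counter((r["genre"], r["format"]) for r in records)

def preferred_format(records):
    """For each genre, return the format that appears most often."""
    best = {}
    for (genre, fmt), n in cross_tabulate(records).items():
        # Keep the first format seen on ties; replace only on a strictly higher count.
        if genre not in best or n > best[genre][1]:
            best[genre] = (fmt, n)
    return {genre: fmt for genre, (fmt, _) in best.items()}

table = cross_tabulate(books)
for (genre, fmt), n in sorted(table.items()):
    print(f"{genre:10} {fmt:18} {n}")

favorites = preferred_format(books)
print(favorites)
```

    The same table also answers the library question: summing the counts for the library formats and dividing by the total gives a share like the 64.1% figure.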


    Age and Quality of Information

    Publication year

    This tells me the age of the information consumed. Most of them are very new. My most represented year is 2023, with 34 books, and books read from the 21st century (137) outnumber books from any previous century (46).

    This graph prompts the most straightforward conclusion about how to broaden my reading. I only read things from the 20th and 21st centuries. I don’t read any classics. In 2024, I intend to read something written in the 19th century.

    How did I read so many new books? I think there are three reasons.

    1. Advanced Reader Copies
      • Requesting ARCs explains why 2023 is the year most represented in the data. With a Goodreads account and a website, I could receive free copies in exchange for a review. ARCs are an efficient way to read contemporary books.
    2. Library Power User
      • Again, the library keeps more recent materials in its collection. Because I use the library, I am more likely to read more current work.
    3. Publishing Press Reader
      • I’m also subject to advertising for new books. Because I read blogs like Shelf Awareness, Booklist, and CrimeReads, I am recommended new books more frequently than old ones.

    Quality of information

    This is arguably the most subjective data point captured. I grade a book on a 5-star scale and am not a harsh grader. I like most books I read, with the overwhelming majority getting five stars. Rating the books is my least favorite part of this process, as it feels arbitrary and inconsequential. A star more or less, so what?

    I grade the information on whether I found it relevant. Did I enjoy the thrill of the story? How does it compare to others in the genre? Comparing unlike things is the problem with this category. How do I compare a reference book on sentence structures to an exposé of an FBI coverup of the Osage massacre to a literary masterpiece to an enjoyable potboiler thriller? I can’t! I could rank the books I read this year, but that would be excruciatingly dull for me and the reader. If you want that, feel free to print out the blog post, cut it up, and rank it accordingly. Which one was better? WGAF?

    I’ve also noticed this is the field I’m most likely to revise. If I pick a number in a bad mood, it’s lower than one in a good mood. It’s capricious, but my awareness keeps it relevant enough to keep recording. Perhaps it will be fun to compare year after year after a while.


    Narrative Voice

    My favorite category in the database is narrative voice. It asks, how is the story told, whatever the story is?

    Traditionally, there’s first person (I, me), second person (you, yourself), and third person (he, his name), and it’s a way to define fictional tellings. Who Says? by Lisa Zeidner inspired me to look closer at narrative voice and how an author chooses to tell a story. Books can have more than one narrative voice. The bar has multiple colors overlapping to show this.

    This year, I noticed more texture in non-fiction tellings. These are terms, or flavors, that I mostly made up. Historical would be citing multiple sources; Essay is opinionated writing; How To is a descriptive guide; Reporting is interviewing first-hand sources; Fandom is writing about a thing one has passionately engaged with for multiple years. I intend to write up something longer about this column in 2024.


    Takeaways

    I have internalized some big takeaways as I am steadfast about continuing and deepening this project in 2024.

    Libraries vs. Book Buying

    I started using the library with regularity in 2022, but I still bought books. I quit my job in 2023, so I “stopped” buying books (I buy fewer anyway). This has dramatically increased my reading, and I now think buying books wastes time more than it wastes money. The biggest takeaway for me in 2023: buying books cuts into reading time. When browsing the library, the evaluation is so much simpler. “Does this look cool?” When buying a book, I ask about ten questions, from “Is this a reference text?” to “Do I want to lug this around to more apartments?” Hours wasted browsing used bookstores were better spent walking and listening to books. Last year, I saved thousands of dollars not buying books. This year, I am thinking of it differently. I’ve internalized library usage, and I’ll never buy books in significant numbers again (lol, says the addict, time will tell).

    It Is Surprisingly Easy To Read New Stuff For Free in 2023

    This year, I realized what many book reviewers already knew: reading new books in exchange for a review is a valuable service readers provide. Publishers and authors need this, and asking for free virtual copies is not an imposition on them. Without 5-star reviews, new books are not aggregated on buying markets like Amazon, Kindle, Barnes and Noble, or even library apps like Hoopla. So, in 2024, I’ll keep asking for more free stuff!

    Querying Time vs. Inputting Time

    A subtler trend, but I realized this year I spent as much time querying the database as creating it. Intuitively, cataloging should have taken more time in 2023: more records = more time. However, I streamlined my process for faster entry. So, writing 183 lines in a spreadsheet didn’t take much time at all, especially compared to the time spent reading the books. How often I referred to my notes from this year compared to previous ones surprised me. I referenced the database in bookstores, online conversations, and while writing. I’m comfortable with this data now. I also created the dashboards at the end of 2022 and was able to use them throughout 2023. Through reflection, I could comprehend the iterative progress of these milestones.

    Methodology and Tools

    I used an Airtable database to input, track, and filter the information. Here’s an Interface displaying these visualizations. [link]


    Here’s the entire list of all 183 books considered: