Tom Owad’s Data Mining 101: Finding Subversives with Amazon Wishlists is a superb read (via Boing Boing):
“Using a pair of 5-year-old computers, two home DSL connections, 42 hours of computer time, and 5 man hours, I now had documents describing the reading preferences of 260,000 U.S. citizens.
I downloaded all the files to an external 120 GB Firewire drive in UFS format. The raw data occupied little more than 5 GB. I initially wanted to move all the files into a single directory to facilitate searching, but as the directory contents exceeded 100,000 items, the speed became glacially slow, so I kept the data divided into chunks of 25,000 wishlists.”
The sad part is, I can’t even get my wishlist out of Amazon without some furious hacking.
That’s why I’m using Library Thing (thanks, Steve!). Sure, you could mine it all day, but at least I can get at my data by exporting my catalog as a CSV file. Turning that catalog into an OPML list is the next step (an export function would be nice, Tim)…
Tim Spalding is the developer. Great job, Tim!
If you like / love books you *have* to check out Library Thing.
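For the CSV-to-OPML step mentioned above, here is a rough sketch in Python. I'm assuming the export has `title` and `author` columns (the real column names may well differ), the sample rows are made up, and the `author` attribute on each outline is just my own choice of extra attribute, not something Library Thing or the OPML spec prescribes.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_opml(csv_text, list_title="My books"):
    """Turn CSV text with 'title' and 'author' columns into an OPML string."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")

    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumed; adjust them to match the real export.
        title = (row.get("title") or "").strip()
        if not title:
            continue  # skip rows that have no title
        outline = ET.SubElement(body, "outline", text=title)
        author = (row.get("author") or "").strip()
        if author:
            outline.set("author", author)

    return ET.tostring(opml, encoding="unicode")


# Made-up sample rows standing in for an exported catalog.
sample = "title,author\nExample Book One,A. Author\nExample Book Two,B. Writer\n"
opml_text = csv_to_opml(sample)
print(opml_text)
```

To run it against a real export, read the file into a string first and pass that to `csv_to_opml` in place of `sample`.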