Big data! I don’t know how many times lately I have read or heard that computer science students need to work with big data. But what is big data, and where do you get it? If you have ever tried to build fake data, you know it can be hard. This is especially true if you want the data to be “real” by some definition of real. Fortunately, there is a huge amount of data on the Internet. The US Government has some great collections of data available in many formats, often including Excel, comma-delimited text files, HTML, and others. Below are a few of my favorite data sources.
- File A: Top 1000 Names [XLS – 132k]
- File B: Surnames Occurring 100 or more times [ZIP – 357k] (151,671 records)
Want some large text files for analysis and projects? Take a look at the large collection of free books at Project Gutenberg. There are books there in many languages, by the way!
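If you want a quick first project with one of those text files, counting the most common words is a nice place to start. Here is a minimal sketch in Python. The function name `top_words` and the file name `book.txt` are my own placeholders, not anything Project Gutenberg provides, and the word-splitting rule is a deliberately simple assumption that only handles unaccented English letters and apostrophes.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in text as (word, count) pairs."""
    # Lowercase everything, then pull out runs of letters and apostrophes.
    # This is a rough rule: it ignores digits and accented characters.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical usage with a plain-text book you have already downloaded:
# with open("book.txt", encoding="utf-8") as f:
#     print(top_words(f.read()))

# Small in-memory example so the sketch runs without any download.
print(top_words("the cat and the hat and the bat", 2))
```

For books in other languages, the regular expression would need to change (for example, to `\w+`) so that accented letters are not dropped.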
There are a couple of other links in the comments, and I will be updating this post over the weekend. I really hope more of you will add your favorite online data sets. Thanks for the comments!