Those involved in data wrangling would have easier jobs if the processing power of a single machine could keep up with the size of the data they need to process. It hasn’t, and over the past decade the new field of Big Data has emerged, focused on reliably processing massive amounts of data. Processing large amounts of data is not new in itself; what is new this time around is that it’s being done on commodity hardware with open-source software tools.
For this Safari Books Online bibliography, we have selected the most useful books for data analysis. They start from high-level concepts of business intelligence, data analysis, and data mining, and work their way down to the tools needed for number crunching: mathematical toolkits, machine learning, and natural language processing. We then cover Cloud Services and Infrastructure and Amazon Web Services. Finally, the Hadoop and NoSQL sections list the Big Data tools that can be deployed locally or in the cloud.