Big Data Book Reviews

January 10, 2018

Learning Spark SQL

  • Level Ent.
  • Level Mid.
  • Level Adv.

Published in Sep. 2017. I have just started reading it.


  • Level Ent.
  • Level Mid.

There are very few books about Apache Flink. Besides the official documentation, this is a good one for people who want to learn Flink quickly. Published in early 2017, it covers most of Flink's core topics with examples. Some examples lack detail and explanation, such as those in the ML section, but it is a good book in general.


Hadoop: The Definitive Guide

  • Level Ent.
  • Level Mid.
  • Level Adv.

This is a really good Hadoop book to recommend. I have read both the 2nd and 3rd editions. The latest 3rd edition is based on Hadoop 1.0. It covers almost everything about Hadoop, including YARN. The author also has a GitHub site to share the code. Here is my book note


Hadoop In Practice

  • Level Mid.
  • Level Adv.

This is a pretty good book, especially the data science chapter. The Hive part is a little older than in other recent Hadoop books. The reading experience is also good. I like the way it presents a number of “TECHNIQUE” items. It touches on some newer big data tools that other books do not cover, such as Cloudera Crunch. There are no comments in the source code (at the publisher's request), but there are enough explanations inline in the book.


Hadoop Real-World Solutions Cookbook

  • Level Mid.
  • Level Adv.

The code has no comments; the explanation is given below it instead. I do not really like this approach. If the comments were put in the code, the book might have fewer pages to read. In addition, there are logic mistakes in the book, I think because of copy-and-paste errors. I found at least three to five of them after reading hundreds of pages, e.g. on p. 143 “hashset” should be “hashmap”. Starting from chapter 7, the book looks good and goes deep, which requires knowledge of data mining and graph processing. This is a good tool reference book anyway.


Hadoop 实战

  • Level Ent.
  • Level Mid.

This is a Chinese book that has the same name as the one below but is totally different. It covers the majority of Hadoop components and is reader-friendly. I only read the 1st edition, so the content is a little out of date. The 2nd edition is also on the shelf right now. Generally, it is just an introduction and lacks detail and advanced techniques.


Hadoop In Action

  • Level Ent.
  • Level Mid.

I have a hard copy of this. The book is a little old, as it is based on Hadoop 0.19. It covers the majority of Hadoop components. There is also a Chinese version.


Hadoop MapReduce Cookbook

  • Level Ent.
  • Level Mid.
  • Level Adv.

Some sample Hadoop commands lack the necessary spaces between the command and its parameters. Ch. 8 provides some data analytics implementations using Java and MapReduce, at a level of detail I did not see in other books. This part is worth spending more time on.


MapReduce Design Patterns

  • Level Mid.
  • Level Adv.

The topic is really focused. The patterns are not as exciting in description as Java design patterns. There is little value here if you have already read the other books below. There are typos and mistakes. I could not find the source code either.


Programming Pig

  • Level Mid.
  • Level Adv.

This is a small book about Pig, around 200 pages. It covers everything. The parts on extending Pig with UDFs lack enough examples. Also, these parts are a little hard to read. I have also read the translated edition, which is so-so. You will not find more Pig examples anywhere else. However, I believe there could and should be another book that covers more practical examples and hands-on scripts.


Hadoop MapReduce Internals

  • Level Adv.

This book explains how map and reduce are implemented at the source code level. It covers lots of details that other books never mention. It can help with reading the source code. This is the kind of book that helps with understanding rather than with practice. There are few code samples in the book. The figures and comparison tables in this book are really good for reading and understanding.


HBase Administration Cookbook

  • Level Mid.
  • Level Adv.

This book is for HBase administrators and developers, and will even help Hadoop administrators. You are not required to have HBase experience, but are expected to have a basic understanding of Hadoop and MapReduce. This is a very practical toolkit book for HBase admins. It does not say much about the API and focuses on administration only.


HBase: The Definitive Guide

  • Level Ent.
  • Level Mid.
  • Level Adv.

This is a really good HBase book to recommend. This is the 1st edition, and it shows you how Apache HBase can fulfill your needs. As the open source implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. The author also has a GitHub site to share the code. I am still reading it, and it is a little hard.


Big Data: A Revolution That Will Transform How We Live, Work, and Think

  • Level Ent.

It is one of the few big data books that uses real examples to show what revolution big data brings. The characteristics of big data it describes are really impressive. This book motivates readers to explore the value behind big data. It is a good book to encourage people to explore the big data area.


Instant Apache Hive Essentials How-to

  • Level Ent.
  • Level Mid.

The book offers a fast way to learn to query data using Hive in a few hours. This is better than searching the Apache Confluence wiki for fragmented help documents, especially for new Hive users. The book is short and easy to understand. The author also gives a complexity level for each chapter so that users at different levels can quickly pick up what they need.


Tableau Your Data

  • Level Ent.
  • Level Mid.

This is a good Tableau guide for data visualization based on the latest version of the software. It is detail-oriented and covers lots of details, especially on server deployment and security. The case study chapter is also a good reference.
