Pig

Hive vs. Pig

Both are top-level Apache projects for processing data in Hadoop. Here I try to compare the two. Below is a picture I found (I cannot find the original link, but there is a mirror here). In addition, I want to share some more differences. To be honest, I prefer Pig. MapReduce, Hive, and Pig are like Java, Ruby, and Python. Pig: Pig is easier to install than Hive.


SQL in MySQL and Pig Comparison

This comparison uses MySQL 5.1.x and Pig 0.8 as examples. Two sample files are used, as follows.

00. Prepared Files

    cat /tmp/data_file_1
    zhangsan	23	1
    lisi	24	1
    wangmazi	30	1
    meinv	18	0
    dama	55	0

    cat /tmp/data_file_2
    1	a
    23	bb
    50	ccc
    30	dddd
    66	eeeee

01. Load Files

1) MySQL (the tables need to be created first).

    CREATE TABLE TMP_TABLE(USER VARCHAR(32), AGE INT, IS_MALE BOOLEAN);
    CREATE TABLE TMP_TABLE_2(AGE INT, OPTIONS VARCHAR(50)); -- used for the join
    LOAD DATA LOCAL INFILE '/tmp/data_file_1' INTO TABLE TMP_TABLE;
    LOAD DATA LOCAL INFILE '/tmp/data_file_2' INTO TABLE TMP_TABLE_2;

2) Pig

    tmp_table = LOAD '/tmp/data_file_1' USING PigStorage('\t') AS (user:chararray, age:int, is_male:int);
    tmp_table_2 = LOAD '/tmp/data_file_2' USING PigStorage('\t') AS (age:int, options:chararray);

02.
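To make the load step more concrete, here is a rough Python sketch of what the Pig LOAD statements do conceptually: split each line on tabs and cast every field to the type declared in the AS clause. It is only an illustration, not how Pig is implemented. The helper names load_rows and to_int are made up for this example, the sample data is assumed to be tab-separated (as PigStorage('\t') suggests), and the choice to turn a field that cannot be cast into None is meant to mirror Pig producing a null in that case.

```python
from typing import Callable, List, Optional, Tuple


def to_int(field: str) -> Optional[int]:
    """Cast a text field to int; return None if it is not a valid integer."""
    try:
        return int(field)
    except ValueError:
        return None


def load_rows(text: str, casts: List[Callable]) -> List[Tuple]:
    """Hypothetical stand-in for LOAD ... USING PigStorage('\\t') AS (...).

    `casts` plays the role of the AS schema: one cast function per column.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad short rows so missing trailing columns become None.
        fields += [None] * (len(casts) - len(fields))
        rows.append(tuple(
            cast(field) if field is not None else None
            for cast, field in zip(casts, fields)
        ))
    return rows


# Assumed tab-separated contents of /tmp/data_file_1 and /tmp/data_file_2.
DATA_FILE_1 = "zhangsan\t23\t1\nlisi\t24\t1\nwangmazi\t30\t1\nmeinv\t18\t0\ndama\t55\t0\n"
DATA_FILE_2 = "1\ta\n23\tbb\n50\tccc\n30\tdddd\n66\teeeee\n"

# (user:chararray, age:int, is_male:int)
tmp_table = load_rows(DATA_FILE_1, [str, to_int, to_int])
# (age:int, options:chararray)
tmp_table_2 = load_rows(DATA_FILE_2, [to_int, str])

for row in tmp_table:
    print(row)
```

The main difference from the MySQL side is that no table has to exist beforehand: the schema is attached when the file is read, which is why the Pig version needs no CREATE TABLE step.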
