Module-11D


Module 11D : Hands On : Apache Pig Statements (Length 8 Minutes)

1. ForEach statement

2. Example 1 : Data projecting and foreach statement

3. Example 2 : Projection using schema

4. Example 3 : Another way of selecting columns using two dots ..

Data transformation is done using the following major operators.

foreach : Suppose you have 60 records in a relation. With 'foreach' you can apply an operation to each record of that relation.

Syntax : 

alias  = FOREACH { block };

myData = foreach categories generate *; -- Selects all the columns from categories and generates a new relation, myData

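alias : The name of a relation, which is an outer bag.

block : For a relation, the block is written as: alias GENERATE expression [AS schema], expression [AS schema] ...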
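As a rough sketch of how the block is usually written for a relation, the statements below list several comma-separated expressions after GENERATE and name each result with AS. The relation name renamed and the field names id and catName are illustrative assumptions; the file path is the one used in the examples of this module.

```pig
-- Load the sample file without a schema (path taken from this module's examples).
categories = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',');

-- General shape: new_alias = FOREACH existing_alias GENERATE expr [AS name], expr [AS name] ...;
-- 'renamed', 'id' and 'catName' are hypothetical names chosen for this sketch.
renamed = FOREACH categories GENERATE $0 AS id, $2 AS catName;

-- DESCRIBE prints the schema of the new relation.
DESCRIBE renamed;
```

Naming the generated fields with AS lets later statements refer to them by name instead of by position.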

Example 1 : Projection using foreach (HandsOn)

categories = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',');

myData = foreach categories generate *;

dump myData;

(1,2,Football) (2,2,Soccer) (3,2,Baseball & Softball)

myDataSelected = foreach categories generate $0,$1; -- Select only the first two columns, by position

dump myDataSelected;

(1,2) (2,2) (3,2)
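Because categories was loaded without a schema, Pig treats every field as a bytearray. A minimal sketch, assuming the first two columns hold integers as the output above suggests, casts the positional fields inside GENERATE. The names typedCat, id and subId are illustrative, not part of the original hands-on.

```pig
-- Same schema-less load as in Example 1.
categories = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',');

-- Cast each positional field and give it a name (hypothetical names for this sketch).
typedCat = FOREACH categories GENERATE (int)$0 AS id, (int)$1 AS subId;

-- DESCRIBE shows the types that result from the casts.
DESCRIBE typedCat;
```

Casting at projection time is one way to get typed columns without declaring a schema in the LOAD statement, which is the approach the next example takes.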

Example 2 : Projection using schema (HandsOn)

categories2 = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',') AS (id:int, subId:int, catName:chararray);

selectedCat = foreach categories2 generate subId,catName; -- Select only two columns, using column names

DUMP selectedCat;

(2,Football) (2,Soccer) (2,Baseball & Softball)

subtract = foreach categories2 generate id-subId; -- This shows that you can also use an expression in generate

dump subtract;

(-1) (0) (1)
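The expression above produces an unnamed column. As a hedged sketch, the statements below combine a plain column, an arithmetic expression and a built-in function call in one GENERATE, naming the computed columns with AS. The names diffs, diff and catUpper are assumptions introduced for illustration.

```pig
-- Same load with schema as in Example 2.
categories2 = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',') AS (id:int, subId:int, catName:chararray);

-- Mix a plain column, an arithmetic expression and the built-in UPPER function.
-- 'diffs', 'diff' and 'catUpper' are hypothetical names chosen for this sketch.
diffs = FOREACH categories2 GENERATE id, id - subId AS diff, UPPER(catName) AS catUpper;

DUMP diffs;
```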

So you can refer to columns in a relation either by position ($0, $1, ...) or, when a schema is defined, by column name (id, subId, catName).

Example 3 : Another way of selecting columns using two dots ..  (HandsOn) 

selectedCat3 = foreach categories2 generate id..catName; -- Select all the columns from id through catName, inclusive

DUMP selectedCat3;

selectedCat4 = foreach categories2 generate subId..; -- Select subId and all the columns that come after it

DUMP selectedCat4;

selectedCat5 = foreach categories2 generate ..catName; -- Select all the columns up to and including catName

DUMP selectedCat5;
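The two-dots range also accepts column positions, and a position can be mixed with a column name. The sketch below assumes the same three-column schema as categories2; the relation names firstTwo, fromSecond and mixed are illustrative.

```pig
-- Same load with schema as in Example 2.
categories2 = LOAD '/user/cloudera/Training/pig/cat.txt' USING PigStorage(',') AS (id:int, subId:int, catName:chararray);

-- Positional range: columns $0 through $1, the same columns as id..subId.
firstTwo = FOREACH categories2 GENERATE $0..$1;

-- Open-ended positional range: column $1 and everything after it, like subId.. above.
fromSecond = FOREACH categories2 GENERATE $1..;

-- A column name and a position can be combined in one range.
mixed = FOREACH categories2 GENERATE id..$1;

DUMP firstTwo;
DUMP fromSecond;
DUMP mixed;
```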