module-11_pig

Introduction : 

        Apache PIG is a very important component of Hadoop Eco-System. It is very well matured component and being used in production. Apache PIG helps you write Data Flow engine , which can process data stored in HDFS (Hadoop Distributed File System). With the help of Apache Pig you can avoid writing MapReduce Jobs. With the help of Pig Latin script you can write a long series of data operations and following are the activities can be completed using 

 WebServerLogs --> Extract Relevant Info --> Apply Transformation (e.g. Mapping between page content and userid, aggregate, join,sort etc.) --> Load in Hive Table

          

Other Benefits : 

Components of Apache Pig : Pig has following components.

Execution Mode : Pig can be executed with the any of the below mode.