Tajo Stage flow and StorageManager

Tajo's planner splits a query into multiple stages. Each stage resembles a MapReduce job in a Hive query. Unlike Hive, however, Tajo writes intermediate data to the local disk rather than to HDFS. The following flow explains how a Tajo worker runs each task internally.
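The stage model described above can be made concrete with a minimal Java sketch, assuming a toy word-count query split into two stages. All class and method names here (StageFlowSketch, runScanStage, runAggregateStage, NUM_PARTITIONS) are hypothetical and are not Tajo's planner or worker API. The first stage hash-partitions its output into intermediate files under a local temporary directory instead of HDFS, and the second stage reads those files back, much like the shuffle between two MapReduce jobs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical two-stage pipeline (not Tajo's actual classes): stage 1 writes
 * hash-partitioned intermediate files to a local temp directory, and stage 2
 * reads each partition back from local disk and aggregates it.
 */
public class StageFlowSketch {

    static final int NUM_PARTITIONS = 2;

    // Stage 1: tokenize every input line, route each word to a partition by
    // hash, and write one intermediate file per partition to local disk.
    static List<Path> runScanStage(List<String> lines, Path localDir) throws IOException {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < NUM_PARTITIONS; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) continue;
                int p = Math.floorMod(word.hashCode(), NUM_PARTITIONS);
                buckets.get(p).add(word);
            }
        }
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < NUM_PARTITIONS; i++) {
            Path file = localDir.resolve("partition-" + i);
            Files.write(file, buckets.get(i), StandardCharsets.UTF_8);
            files.add(file);
        }
        return files;
    }

    // Stage 2: read each partition file from local disk and count words.
    // A given word always lands in exactly one partition.
    static Map<String, Integer> runAggregateStage(List<Path> files) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        for (Path file : files) {
            for (String word : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path localDir = Files.createTempDirectory("stage-intermediate");
        List<String> input = List.of("a b a", "c b a");
        List<Path> intermediate = runScanStage(input, localDir);
        System.out.println(runAggregateStage(intermediate));
    }
}
```

Running main prints {a=3, b=2, c=1}. As with MapReduce map output, keeping short-lived intermediate data on local disk avoids HDFS replication overhead; the cost is that if a node's local disk is lost, that stage's output has to be recomputed.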

Thanks to its storage abstraction layer, Tajo supports various storage systems such as HDFS, HBase, S3, and Swift. The following diagram is a class diagram of Tajo's storage package.

[Figure: Class diagram of Tajo's storage package]
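The idea of a storage abstraction layer can be illustrated with a small Java sketch, assuming a design where query code depends only on a common interface and a registry picks the backend by URI scheme. Storage, InMemoryStorage, and StorageRegistry are hypothetical names for illustration only; they do not reproduce the classes in the diagram or the API of Tajo's StorageManager, and the in-memory backend stands in for real HDFS, HBase, S3, or Swift clients.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical storage abstraction (not Tajo's real API): callers use only
 * the Storage interface, and a registry selects the backend by URI scheme.
 */
public class StorageAbstractionSketch {

    // Common contract that every backend implements.
    interface Storage {
        String name();
        void append(URI path, String row);
        List<String> scan(URI path);
    }

    // In-memory stand-in for a real backend; a real implementation would
    // call the HDFS, HBase, S3, or Swift client here instead.
    static class InMemoryStorage implements Storage {
        private final String name;
        private final Map<String, List<String>> tables = new HashMap<>();

        InMemoryStorage(String name) { this.name = name; }

        public String name() { return name; }

        public void append(URI path, String row) {
            tables.computeIfAbsent(path.getPath(), k -> new ArrayList<>()).add(row);
        }

        public List<String> scan(URI path) {
            return tables.getOrDefault(path.getPath(), List.of());
        }
    }

    // Maps a URI scheme ("hdfs", "s3", ...) to the backend that serves it.
    static class StorageRegistry {
        private final Map<String, Storage> byScheme = new HashMap<>();

        void register(String scheme, Storage storage) {
            byScheme.put(scheme, storage);
        }

        Storage forPath(URI path) {
            Storage s = byScheme.get(path.getScheme());
            if (s == null) {
                throw new IllegalArgumentException("No storage registered for " + path);
            }
            return s;
        }
    }

    public static void main(String[] args) {
        StorageRegistry registry = new StorageRegistry();
        for (String scheme : List.of("hdfs", "hbase", "s3", "swift")) {
            registry.register(scheme, new InMemoryStorage(scheme));
        }
        URI table = URI.create("s3://bucket/warehouse/lineitem");
        Storage storage = registry.forPath(table);
        storage.append(table, "1|apple");
        System.out.println(storage.name() + " -> " + storage.scan(table));
    }
}
```

Running main prints s3 -> [1|apple]. In this design, supporting another storage system means implementing the interface and registering its scheme, without changing the code that executes queries.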