Access Control Mechanisms in Big Data Processing

Yenumula B Reddy


Hadoop, Big Data, Security, MapReduce, Kerberos, Authentication


Hadoop provides a distributed file system, the Hadoop Distributed File System (HDFS), and a MapReduce framework. Its core properties are fault tolerance, high throughput, and support for files of arbitrary size. These rest on a shared-nothing architecture, massively parallel task execution, and the key/value pair as the basic data structure. HDFS is a shared, multi-tenant service and is used to store sensitive data. Currently, HDFS is used on private clusters behind firewalls and requires strong authentication and authorization (access control) to protect sensitive private and public data. A Hadoop job is partitioned and distributed for execution on nodes other than the node on which the client authenticated and submitted the job. Further, tasks from different users are executed on the same compute node, and the system scales to thousands of servers and runs many concurrent tasks. Therefore, the full execution path of the system requires authentication checks at multiple points. The Kerberos authentication mechanism helps meet these security requirements as a supplement to relying on trusted users. Special access control mechanisms may be required for highly sensitive data to keep attackers out. The proposed model supports access control for highly sensitive data in Big Data processing.
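
The need for authentication checks at multiple points can be illustrated with a minimal sketch. It is neither Kerberos nor the model proposed in the paper: it assumes a hypothetical shared cluster key and a simple HMAC-signed job token, so that a node other than the one where the client logged in can still verify who submitted a task and whether that user is authorized. The names CLUSTER_KEY, issue_token, verify_token, and run_task are illustrative assumptions only.

```python
import hashlib
import hmac

# Hypothetical shared secret for the sketch. In a Kerberos-backed cluster,
# keys and tickets would come from the key distribution center instead.
CLUSTER_KEY = b"demo-cluster-key"


def issue_token(user: str, job_id: str, key: bytes = CLUSTER_KEY) -> str:
    """Sign (user, job_id) once, on the node where the client authenticated."""
    message = f"{user}:{job_id}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_token(user: str, job_id: str, token: str,
                 key: bytes = CLUSTER_KEY) -> bool:
    """Recompute the signature on the executing node and compare safely."""
    expected = issue_token(user, job_id, key)
    return hmac.compare_digest(expected, token)


def run_task(node: str, user: str, job_id: str, token: str, acl: set) -> str:
    """Check authentication first, then authorization, before running a task."""
    if not verify_token(user, job_id, token):
        return f"{node}: rejected (authentication failed)"
    if user not in acl:
        return f"{node}: rejected (not authorized)"
    return f"{node}: ran task for {user}"


if __name__ == "__main__":
    token = issue_token("alice", "job-1")      # issued at submission time
    allowed = {"alice"}                        # hypothetical access control list
    print(run_task("node-2", "alice", "job-1", token, allowed))
    print(run_task("node-3", "bob", "job-1", token, allowed))
```

In this sketch every node repeats both checks, so a task that lands on a node far from the submission point is still tied to an authenticated and authorized user.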
