Enable a Kerberos-secured Hadoop cluster with Cloudera Manager

I set up a secured Hadoop cluster for P&G with Cloudera Manager, and this
document records how to enable a Kerberos-secured cluster with Cloudera
Manager. First we need a cluster that contains a Kerberos KDC and Kerberos
clients.

1. Install the KDC server
Note: only one server runs this; the KDC is installed on a single server only.

sudo yum -y install krb5-server krb5-libs krb5-workstation krb5-auth-dialog openldap-clients

This command installs the Kerberos server and, via krb5-workstation, some useful client commands.

2. Modify /var/kerberos/krb5kdc/kdc.conf

[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88
 
[realms]
 PG.COM = {
  #master_key_type = aes256-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
  max_renewable_life = 7d
 }

PG.COM used to be EXAMPLE.COM, and I added max_renewable_life = 7d; you can set it longer than this (for example, 4w means 4 weeks).

3. Modify /etc/krb5.conf

[libdefaults]
default_realm = PG.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 259200
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = rc4-hmac
default_tkt_enctypes = rc4-hmac
permitted_enctypes = rc4-hmac
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
 PG.COM = {
  kdc = pg-dmp-master2.hadoop
  admin_server = pg-dmp-master2.hadoop
 }

Change EXAMPLE.COM to PG.COM, and in the [realms] section point kdc to the
node where I installed the KDC and admin_server to the server where kadmin is installed.

4. Create the realm

kdb5_util create -s -r PG.COM

This creates the working realm named PG.COM (the -s option stashes the master key so the KDC can start without prompting for it).
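
A quick way to confirm the realm database was actually created is to list the KDC directory; a minimal sketch, assuming the default CentOS paths used in this install:

ls -a /var/kerberos/krb5kdc/
# you should see, roughly: kadm5.acl  kdc.conf  principal  principal.kadm5  principal.kadm5.lock  principal.ok
# plus the stash file written by -s, typically named .k5.PG.COM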

5. Create the admin user principal in PG.COM on the KDC server

kadmin.local -q "addprinc root/admin"

I use root/admin as the admin user; you will be prompted to type a password
for it. Note that root/admin is not the same as root: both can exist in
Kerberos's database, and they are two separate principals.
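
To confirm the admin principal really exists, you can list all principals in the database; a short sketch using kadmin.local on the KDC server:

# list every principal currently stored in the KDC database
kadmin.local -q "listprincs"
# root/admin@PG.COM should appear in the output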

6. Edit /var/kerberos/krb5kdc/kadm5.acl

*/admin@PG.COM

This defines who is an admin in the PG.COM realm: every principal ending with /admin@PG.COM is a Kerberos admin.

7. Check all the Kerberos configuration files and ensure there are no errors.

/etc/krb5.conf
/var/kerberos/krb5kdc/kdc.conf
/var/kerberos/krb5kdc/kadm5.acl

8. Now start the KDC and kadmin services on the KDC server

service krb5kdc start
service kadmin start

This starts the KDC and kadmin services on the KDC server.
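
If the KDC host may reboot, it is worth enabling both services at boot as well; a sketch assuming the same CentOS 6 style init scripts used above:

# start the KDC and kadmin automatically on boot
chkconfig krb5kdc on
chkconfig kadmin on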

9. Log in to all the other nodes of this cluster and run this

yum install krb5-workstation krb5-libs krb5-auth-dialog openldap-clients cyrus-sasl-plain cyrus-sasl-gssapi

This installs the Kerberos client packages on all nodes.
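
If the cluster has many nodes, a small SSH loop saves time; a sketch with hypothetical hostnames (node01 to node10 in the hadoop domain), assuming passwordless SSH as root:

# install the Kerberos client packages on every node and push the same /etc/krb5.conf
for host in node{01..10}.hadoop; do
  ssh root@$host "yum -y install krb5-workstation krb5-libs krb5-auth-dialog openldap-clients cyrus-sasl-plain cyrus-sasl-gssapi"
  scp /etc/krb5.conf root@$host:/etc/krb5.conf
done

The Enable Kerberos wizard in Cloudera Manager can also manage krb5.conf for you if you let it, in which case the scp line is unnecessary.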

Then go back to Cloudera Manager, open the cluster's Actions dropdown, find
Enable Kerberos, answer all of the wizard's questions, and click
Next->Next->Next... until it ends.

When the whole procedure is done, you have a Kerberos-secured Hadoop cluster.
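
A quick way to verify that security is really on is to try HDFS with and without a ticket; a sketch assuming a user principal such as dmp@PG.COM has already been added (see the questions below):

# without a ticket this should fail with a GSS / no valid credentials error
hdfs dfs -ls /
# after getting a ticket the same command should succeed
kinit dmp@PG.COM
hdfs dfs -ls /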

Some more questions came up after installing the Kerberos-secured cluster.

How can I add a new user principal?

kadmin.local -q "addprinc username@PG.COM"

Note that when you add a normal user to the cluster, you should run the
command above on the kadmin (KDC) server; username@PG.COM is a normal user in
Kerberos. If you want to add an admin user, you should use
username/admin@PG.COM, where /admin is what you defined in kadm5.acl.
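
For example, a short sketch of adding both kinds of principal on the KDC server (dmp is just an illustrative username):

# a normal user principal
kadmin.local -q "addprinc dmp@PG.COM"
# an admin principal, matched by the */admin@PG.COM rule in kadm5.acl
kadmin.local -q "addprinc dmp/admin@PG.COM"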

How should I administer HDFS or YARN? In other words, how do I act as the hdfs or yarn user?

Cloudera Manager automatically creates several users on the cluster such as
hdfs, yarn, hbase..., and these users are all defined as nologin accounts
without any password. They are the admin users of Hadoop, but they have no
password-based principals in the Kerberos database, so you cannot request a ticket like this:

# kinit hdfs
kinit: Client not found in Kerberos database while getting initial credentials
# kinit hdfs@PG.COM
kinit: Client not found in Kerberos database while getting initial credentials

But you can obtain a ticket using the keytabs of these cluster users, like this:

 kinit -kt hdfs.keytab hdfs/current_server@PG.COM

current_server is the hostname of the server you are currently logged in to
and from which you want to access HDFS or YARN. In the cluster, the
hdfs.keytab is different on each node, so you must write the command as above, with the local hostname in the principal.
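
If you are not sure which principal a keytab contains, klist can show you; a sketch where the keytab path under the Cloudera Manager agent's process directory is only an assumption and differs per node and per process run:

# list the principals stored in a keytab
klist -kt hdfs.keytab
# on a CM-managed node the HDFS keytab usually lives somewhere like
# /var/run/cloudera-scm-agent/process/<nnn>-hdfs-DATANODE/hdfs.keytab
kinit -kt hdfs.keytab hdfs/$(hostname -f)@PG.COM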

When I run a smoke test like Pi, it goes wrong, even though I have already added a new user principal to the KDC database.

17/04/24 11:18:35 INFO mapreduce.Job: Job job_1493003216756_0004 running in uber mode : false
17/04/24 11:18:35 INFO mapreduce.Job:  map 0% reduce 0%
17/04/24 11:18:35 INFO mapreduce.Job: Job job_1493003216756_0004 failed with state FAILED due to: Application application_1493003216756_0004 failed 2 times due to AM Container for appattempt_1493003216756_0004_000002 exited with  exitCode: -1000
For more detailed output, check application tracking page:http://pg-dmp-master1.hadoop:8088/proxy/application_1493003216756_0004/Then, click on links to logs of each attempt.
Diagnostics: Application application_1493003216756_0004 initialization failed (exitCode=255) with output: main : command provided 0
main : run as user is dmp
main : requested yarn user is dmp
User dmp not found

Failing this attempt. Failing the application.
17/04/24 11:18:35 INFO mapreduce.Job: Counters: 0
Job Finished in 1.094 seconds
java.io.FileNotFoundException: File does not exist: hdfs://PG-dmp-HA/user/dmp/QuasiMonteCarlo_1493003913376_864529226/out/reduce-out
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1257)
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1249)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1249)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1817)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1841)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

In an unsecured cluster, Hadoop distributes all the containers to the nodes
and starts them as an existing local user such as yarn. But in a secured
cluster, a job's containers are distributed and run as the user who submitted
the job. So this error means you only added a principal, but there is no such
Linux user on the nodes, so the container executor cannot start the
ApplicationMaster, mappers, or reducers. You must add this user on each node
to solve the problem. Simply use the Linux command

useradd dmp

on every node in the cluster and on the client machines. A quicker way to add
the user to all nodes is to use OpenLDAP instead of adding the user manually.
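
Without LDAP, a simple loop keeps the manual work down; a sketch with the same hypothetical hostnames as before, using a fixed UID so the user is identical on every node:

# create the dmp user with the same UID everywhere
for host in node{01..10}.hadoop; do
  ssh root@$host "useradd -u 2000 dmp"
done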

Finally, we should cover some basic Kerberos theory.

There are three phases required for a client to access a service
– Authentication
– Authorization
– Service request

Client sends a Ticket-Granting Ticket (TGT) request to AS

AS checks its database to authenticate the client
  – Authentication is typically done by checking LDAP/Active Directory
  – If valid, the AS sends a Ticket-Granting Ticket (TGT) to the client

Client uses this TGT to request a service ticket from the TGS
  – A service ticket is validation that a client can access a service

TGS verifies whether the client is permitted to use the requested service
  – If access is granted, the TGS sends a service ticket to the client

Client can then use the service
  – The service can validate the client with info from the service ticket



The kinit program is used to obtain a ticket from Kerberos
klist lists the tickets in your credential cache
kdestroy explicitly deletes your tickets
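
Putting the three tools together, a typical session looks roughly like this (dmp is the user principal used above):

# get a TGT for the dmp user (prompts for the password)
kinit dmp@PG.COM
# show the tickets currently held in the credential cache
klist
# throw the tickets away when finished
kdestroy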

(Pictures and their comments are taken from the Cloudera Administrator Training Course, copyright cloudera.com.)
