#########
Hadoop 2
#########
Overview
========
===== ==========================
Port  Description
===== ==========================
50070 hadoop namenode web
54310 hadoop namenode
50010 hadoop datanode
50020 hadoop datanode
50075 hadoop datanode
8030  hadoop resourcemanager
8031  hadoop resourcemanager
8032  hadoop resourcemanager
8033  hadoop resourcemanager
8088  hadoop resourcemanager web
13562 hadoop nodemanager
8040  hadoop nodemanager
34568 hadoop nodemanager
8042  hadoop nodemanager
19888 hadoop job history web
10020 hadoop job history
10033 hadoop job history
===== ==========================
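These ports are referenced throughout the configuration below. A quick sanity check for which of them are already bound on a node (a sketch, assuming a Linux host with iproute2 installed):
.. code-block:: bash
# show all TCP ports the Hadoop JVMs are listening on
ss -tlnp | grep java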
Installation
============
* Edit ``etc/hadoop/core-site.xml`` to configure the temporary directory and the location of the NameNode
.. code-block:: xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/local/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:54310</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>hdfs</value>
</property>
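To verify that the new ``core-site.xml`` is actually picked up, ``hdfs getconf`` can print individual keys (a quick check, assuming the hadoop user and ``/opt/hadoop`` layout used below):
.. code-block:: bash
# should print hdfs://127.0.0.1:54310
su - hadoop -c '/opt/hadoop/bin/hdfs getconf -confKey fs.defaultFS'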
* Edit ``etc/hadoop/mapred-site.xml`` to set the location of the job tracker and its working directories
.. code-block:: xml
<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:54311</value>
</property>
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>16</value>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobtracker.expire.trackers.interval</name>
  <value>600000</value>
  <description>Expert: The time-interval, in milliseconds, after which
  a tasktracker is declared 'lost' if it doesn't send heartbeats.</description>
</property>
<property>
  <name>mapreduce.jobtracker.restart.recover</name>
  <value>false</value>
  <description>"true" to enable (job) recovery upon restart,
  "false" to start afresh.</description>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>10</value>
  <description>The number of streams to merge at once while sorting
  files. This determines the number of open file handles.</description>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>
  <description>The total amount of buffer memory to use while sorting
  files, in megabytes. By default, gives each merge stream 1MB, which
  should minimize seeks.</description>
</property>
<property>
  <name>mapreduce.tasktracker.http.threads</name>
  <value>40</value>
  <description>The number of worker threads for the HTTP server. This is
  used for map output fetching.</description>
</property>
<property>
  <name>mapreduce.task.timeout</name>
  <value>600000</value>
  <description>The number of milliseconds before a task will be
  terminated if it neither reads an input, writes an output, nor
  updates its status string. A value of 0 disables the timeout.</description>
</property>
<property>
  <name>mapreduce.task.tmp.dir</name>
  <value>./tmp</value>
  <description>To set the value of tmp directory for map and reduce tasks.
  If the value is an absolute path, it is directly assigned. Otherwise, it is
  prepended with task's working directory. The java tasks are executed with
  option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
  streaming are set with environment variable,
  TMPDIR='the absolute path of the tmp dir'</description>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>false</value>
  <description>Should the job outputs be compressed?</description>
</property>
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>false</value>
  <description>Whether to use SSL for the Shuffle HTTP endpoints.</description>
</property>
* Edit ``etc/hadoop/hdfs-site.xml`` to set the working directories of the NameNode and DataNode and the replication factor
.. code-block:: xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/data/hadoop/data-node</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/data/hadoop/name-node</value>
</property>
<property>
  <name>dfs.permissions.supergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <description>The access time for an HDFS file is precise up to this value.
  The default value is 1 hour. Setting a value of 0 disables
  access times for HDFS.</description>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
  <description>If "true", enable permission checking in HDFS.
  If "false", permission checking is turned off,
  but all other behavior is unchanged.
  Switching from one parameter value to the other does not change the mode,
  owner or group of files or directories.</description>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1048576</value>
  <description>Minimum block size in bytes, enforced by the Namenode at create
  time. This prevents the accidental creation of files with tiny block
  sizes (and thus many blocks), which can degrade performance.</description>
</property>
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
  <description>The default block size for new files, in bytes.
  You can use the following suffixes (case insensitive):
  k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa) to specify the size (such as 128k, 512m, 1g, etc.),
  or provide the complete size in bytes (such as 134217728 for 128 MB).</description>
</property>
<property>
  <name>dfs.namenode.fs-limits.max-blocks-per-file</name>
  <value>1048576</value>
  <description>Maximum number of blocks per file, enforced by the Namenode on
  write. This prevents the creation of extremely large files which can
  degrade performance.</description>
</property>
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Determines datanode heartbeat interval in seconds.</description>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>10</value>
  <description>The number of server threads for the namenode.</description>
</property>
<property>
  <name>dfs.namenode.name.dir.restore</name>
  <value>false</value>
  <description>Set to true to enable NameNode to attempt recovering a
  previously failed dfs.namenode.name.dir. When enabled, a recovery of any
  failed directory is attempted during checkpoint.</description>
</property>
<property>
  <name>dfs.image.compress</name>
  <value>false</value>
  <description>Should the dfs image be compressed?</description>
</property>
<property>
  <name>dfs.image.transfer.bandwidthPerSec</name>
  <value>0</value>
  <description>Maximum bandwidth used for image transfer in bytes per second.
  This can help keep normal namenode operations responsive during
  checkpointing. The maximum bandwidth and timeout in
  dfs.image.transfer.timeout should be set such that normal image
  transfers can complete successfully.
  A default value of 0 indicates that throttling is disabled.</description>
</property>
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
  <description>Specifies the maximum number of threads to use for transferring data
  in and out of the DN.</description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>false</value>
  <description>Whether automatic failover is enabled. See the HDFS High
  Availability documentation for details on automatic HA
  configuration.</description>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>false</value>
  <description>Enable WebHDFS (REST API) in Namenodes and Datanodes.</description>
</property>
<property>
  <name>dfs.https.enable</name>
  <value>false</value>
  <description>Decides whether HTTPS (SSL) is supported on HDFS.</description>
</property>
* Edit ``etc/hadoop/yarn-site.xml``
.. code-block:: xml
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>[% HADOOP_MASTER %]:8031</value>
  <description>host is the hostname of the resource manager and
  port is the port on which the NodeManagers contact the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>127.0.0.1:8030</value>
  <description>host is the hostname of the resourcemanager and port is the port
  on which the Applications in the cluster talk to the Resource Manager.</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  <description>In case you do not want to use the default scheduler.</description>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/hadoop/nm</value>
  <description>The local directories used by the NodeManager.</description>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>127.0.0.1:8040</value>
  <description>The NodeManagers bind to this port.</description>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>10240</value>
  <description>The amount of memory available on the NodeManager, in MB.</description>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/app-logs</value>
  <description>Directory on HDFS where the application logs are moved to.</description>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <description>The directories used by NodeManagers as log directories.</description>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>Shuffle service that needs to be set for MapReduce to run.</description>
</property>
* Create a hadoop user with an SSH key
.. code-block:: bash
useradd -d /opt/hadoop hadoop
chown -R hadoop /opt/hadoop
su - hadoop
ssh-keygen
cat .ssh/id_rsa.pub > .ssh/authorized_keys
chmod 400 .ssh/authorized_keys
ssh localhost
* Format the HDFS
.. code-block:: bash
su - hadoop -c '/opt/hadoop/bin/hdfs namenode -format -force'
* Start the services
.. code-block:: bash
su - hadoop -c '/opt/hadoop/sbin/hadoop-daemon.sh start namenode && /opt/hadoop/sbin/hadoop-daemon.sh start datanode && /opt/hadoop/sbin/yarn-daemon.sh start resourcemanager && /opt/hadoop/sbin/yarn-daemon.sh start nodemanager'
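A quick way to confirm that all four daemons came up is ``jps`` from the JDK (a simple sanity check, run as the hadoop user):
.. code-block:: bash
# expect NameNode, DataNode, ResourceManager and NodeManager processes
su - hadoop -c jps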
Check status
============
* HDFS
.. code-block:: bash
/opt/hadoop/bin/hdfs dfsadmin -report
* YARN
.. code-block:: bash
/opt/hadoop/bin/yarn node -list
* Test Namenode
.. code-block:: bash
su - hadoop -c '/opt/hadoop/bin/hadoop fs -mkdir /user'
su - hadoop -c '/opt/hadoop/bin/hadoop fs -mkdir /user/hadoop'
* Test Datanode
.. code-block:: bash
su - hadoop -c '/opt/hadoop/bin/hadoop fs -put /opt/hadoop/etc/hadoop/hadoop-env.sh /user/hadoop/hadoop-env'
* Test YARN
.. code-block:: bash
su - hadoop -c "/opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10"
Configure High Availability
===========================
* Edit ``core-site.xml``
.. code-block:: xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
* Edit ``hdfs-site.xml``
.. code-block:: xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>hadoop_master:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>hadoop_master2:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>hadoop_master:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>hadoop_master2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://hadoop_master:8485;hadoop_master2:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/opt/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>Whether automatic failover is enabled. See the HDFS High
  Availability documentation for details on automatic HA
  configuration.</description>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
* Edit ``yarn-site.xml``
.. code-block:: xml
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>mycluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>hadoop_master</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>hadoop_master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>hadoop_master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>hadoop_master2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
* Start the ZooKeeper Failover Controller (zkfc) in addition to the normal ZooKeeper service
.. code-block:: bash
/opt/hadoop/sbin/hadoop-daemon.sh --script /opt/hadoop/bin/hdfs start zkfc
* Start the HDFS JournalNode on the NameNode servers
.. code-block:: bash
/opt/hadoop/sbin/hadoop-daemon.sh start journalnode
* Initialize the failover state in ZooKeeper, then format and start the NameNode
.. code-block:: bash
/opt/hadoop/bin/hdfs zkfc -formatZK
/opt/hadoop/bin/hdfs namenode -format -force
/opt/hadoop/sbin/hadoop-daemon.sh start namenode
* Check the status
.. code-block:: bash
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2
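With automatic failover enabled, a simple end-to-end test (a sketch, assuming nn1 on hadoop_master is currently active) is to stop the active NameNode and watch the ZKFCs promote the standby:
.. code-block:: bash
# on hadoop_master: stop the active NameNode
su - hadoop -c '/opt/hadoop/sbin/hadoop-daemon.sh stop namenode'
sleep 10
# nn2 should now report "active"
/opt/hadoop/bin/hdfs haadmin -getServiceState nn2
# bring nn1 back; it rejoins as standby
su - hadoop -c '/opt/hadoop/sbin/hadoop-daemon.sh start namenode'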
Convert a single NameNode to HA
===============================
.. code-block:: bash
/opt/hadoop/bin/hdfs namenode -bootstrapStandby
/opt/hadoop/bin/hdfs namenode -initializeSharedEdits
Configure Capacity Scheduler
============================
* The CapacityScheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee.
* Make sure ``org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler`` is set as ``yarn.resourcemanager.scheduler.class`` in ``etc/hadoop/yarn-site.xml``
* Configure resources for three queues: a, b and default (job submission to a and b is restricted to the Unix groups group_a and group_b)
* Edit ``etc/hadoop/capacity-scheduler.xml``
.. code-block:: xml
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b,default</value>
  <description>The queues at this level (root is the root queue).</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>30</value>
  <description>Queue a target capacity.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <value>1</value>
  <description>Queue a user limit factor.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
  <value>100</value>
  <description>The maximum capacity of queue a.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.state</name>
  <value>RUNNING</value>
  <description>The state of queue a. State can be one of RUNNING or STOPPED.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
  <value>group_a</value>
  <description>The ACL of who can submit jobs to queue a.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>30</value>
  <description>Queue b target capacity.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.user-limit-factor</name>
  <value>1</value>
  <description>Queue b user limit factor.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.maximum-capacity</name>
  <value>100</value>
  <description>The maximum capacity of queue b.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.state</name>
  <value>RUNNING</value>
  <description>The state of queue b. State can be one of RUNNING or STOPPED.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.acl_submit_applications</name>
  <value>group_b</value>
  <description>The ACL of who can submit jobs to queue b.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>40</value>
  <description>Default queue target capacity.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
  <value>1</value>
  <description>Default queue user limit factor.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
  <description>The maximum capacity of the default queue.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.state</name>
  <value>RUNNING</value>
  <description>The state of the default queue. State can be one of RUNNING or STOPPED.</description>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
  <value>*</value>
  <description>The ACL of who can submit jobs to the default queue.</description>
</property>
* Refresh queues
.. code-block:: bash
bin/yarn rmadmin -refreshQueues
bin/hadoop queue -list
* Submit a test job
.. code-block:: bash
bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi -Dmapred.job.queue.name=a 2 10
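To confirm the job really ran in queue ``a``, the application list shows the queue for each application (a quick check; assumes the ``pi`` test job from above has already been submitted):
.. code-block:: bash
# the Queue column should show "a" for the test job
bin/yarn application -list -appStates ALL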
Configure Fair Scheduler
========================
* All jobs get, on average, an equal share of resources over time
* Make sure ``org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler`` is set as ``yarn.resourcemanager.scheduler.class`` in ``etc/hadoop/yarn-site.xml``
* A pool has the same name as the user
* Create file ``etc/hadoop/fair-scheduler.xml``
.. code-block:: xml
5
90
20
2.0
3
2
* Restart the ResourceManager so the new allocation file is picked up
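A minimal way to do the restart with the same scripts and layout used above (assuming ``/opt/hadoop`` and the hadoop user):
.. code-block:: bash
su - hadoop -c '/opt/hadoop/sbin/yarn-daemon.sh stop resourcemanager && /opt/hadoop/sbin/yarn-daemon.sh start resourcemanager'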
HDFS NFS Gateway
================
* Edit ``hdfs-site.xml``
.. code-block:: xml
<property>
  <name>dfs.nfs3.dump.dir</name>
  <value>/data/hadoop/.hdfs-nfs</value>
</property>
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>* rw</value>
</property>
* Edit ``core-site.xml``
.. code-block:: xml
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
  <description>The 'nfsserver' user is allowed to proxy all members of the 'nfs-users1' and
  'nfs-users2' groups. Set this to '*' to allow the nfsserver user to proxy any group.</description>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>*</value>
  <description>This is the host where the nfs gateway is running. Set this to '*' to allow
  requests from any hosts to be proxied.</description>
</property>
* Start the daemons
.. code-block:: bash
systemctl stop rpcbind
sbin/hadoop-daemon.sh start portmap
su - nfsserver -c "sbin/hadoop-daemon.sh start nfs3"
* Test nfs mount
.. code-block:: bash
rpcinfo -p localhost
showmount -e
mount -t nfs -o vers=3,proto=tcp,nolock localhost:/ /mnt
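If the mount succeeds, a couple of quick smoke tests (a sketch; the write path assumes the /user/hadoop directory created in the NameNode test above):
.. code-block:: bash
# the HDFS root should be visible, e.g. the /user directory
ls -l /mnt
# simple write test through the NFS gateway
cp /etc/hosts /mnt/user/hadoop/
umount /mnt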
Troubleshooting
===============
* ``java.lang.IllegalArgumentException: Illegal capacity of -1.0 for queue`` -> you have not defined a capacity for the queue, e.g. ``yarn.scheduler.capacity.root.$QUEUENAME.capacity``
* ``org.apache.hadoop.util.Shell$ExitCodeException: chmod: cannot access `/user/myuser1544460269/.staging/job_local1544460269_0001': No such file or directory`` -> set ``mapreduce.framework.name`` to ``yarn`` in ``mapred-site.xml``
* The datanode says ``Initialization failed for Block pool`` -> the NameNode was formatted or changed afterwards; delete the contents of ``dfs.data.dir`` on the DataNode