
hadoop (2) The version hadoop 0.23.0 on Ubuntu

2012-09-19 

1. Single Node
MapReduce Tarball
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative -DskipTests=true

error message:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2.1:assembly (default-cli) on project hadoop-mapreduce: Error reading assemblies: No assembly descriptors found. -> [Help 1]

solution:
>mvn package -Pdist -DskipTests=true -Dtar
>vi conf/yarn-env.sh
export HADOOP_MAPRED_HOME=/usr/local/hadoop-0.23.0

>vi conf/core-site.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>No description</description>
    <final>true</final>
</property>
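
Note that this snippet, like the mapred-site.xml and yarn-site.xml snippets below, has to sit inside the file's top-level `<configuration>` element, which the excerpts omit. A minimal core-site.xml with the value above would look like:

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: every <property> must be wrapped in <configuration> -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <final>true</final>
  </property>
</configuration>
```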

>vi conf/mapred-site.xml
<property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>${hadoop.tmp.dir}/mapred/temp</value>
    <description>No description</description>
    <final>true</final>
</property>
<property>
    <name>mapreduce.cluster.local.dir</name>
    <value>${hadoop.tmp.dir}/mapred/local</value>
    <description>No description</description>
    <final>true</final>
</property>

>vi conf/yarn-site.xml
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>0.0.0.0:8025</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>0.0.0.0:8030</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>0.0.0.0:8040</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager. </description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/tmp/nm-local-dir</value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:0</value>
    <description>the nodemanagers bind to this port</description>
  </property> 
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>directory on hdfs where the application logs are moved to </description>
  </property>
   <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/tmp/logs</value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>

Some default configuration:
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/yarn-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Create Symlinks
>cd $HADOOP_COMMON_HOME/share/hadoop/common/lib/
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-jobclient-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-common-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-shuffle-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-core-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-api-*-SNAPSHOT.jar .
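
The seven `ln -s` commands above can be collapsed into one glob loop. The sketch below dry-runs the idea on throwaway directories with stand-in jars, so it is safe to execute anywhere; in a real install you would point HADOOP_MAPRED_HOME and HADOOP_COMMON_HOME at /usr/local/hadoop-0.23.0 and drop the scaffolding.

```shell
#!/bin/sh
set -e
# Throwaway stand-ins for the real install dirs (assumption for the dry run).
HADOOP_MAPRED_HOME=$(mktemp -d)
HADOOP_COMMON_HOME=$(mktemp -d)
mkdir -p "$HADOOP_MAPRED_HOME/modules" \
         "$HADOOP_COMMON_HOME/share/hadoop/common/lib"
# Fake module jars; a real build drops these under modules/.
touch "$HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-0.23.0-SNAPSHOT.jar" \
      "$HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-0.23.0-SNAPSHOT.jar"
# The actual linking step: one loop instead of seven ln -s commands.
cd "$HADOOP_COMMON_HOME/share/hadoop/common/lib"
for jar in "$HADOOP_MAPRED_HOME"/modules/*-SNAPSHOT.jar; do
    ln -sf "$jar" .
done
ls
```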

Running daemons
Run Resourcemanager and NodeManager
>cd $HADOOP_MAPRED_HOME
>bin/yarn-daemon.sh start resourcemanager
>bin/yarn-daemon.sh start nodemanager

Run the example
>$HADOOP_COMMON_HOME/bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar randomwriter out
>bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar grep input output 'YARN[a-zA-Z.]+'
>cat output/*
1       YARNtestforfun
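
The last argument to the grep example, 'YARN[a-zA-Z.]+', is an ordinary regular expression (the Hadoop grep example uses Java regex, which agrees with egrep for this pattern). You can preview locally what it would match:

```shell
# Preview the pattern on a local file; only 'YARN' followed by one or more
# letters/dots matches, so 'YARN!' and lowercase 'yarn' do not.
printf 'YARNtestforfun\nhello yarn\nYARN!\n' > /tmp/yarn-pattern-demo.txt
grep -Eo 'YARN[a-zA-Z.]+' /tmp/yarn-pattern-demo.txt
# → YARNtestforfun
```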

http://192.168.56.101:8088/cluster
http://192.168.56.101:9999/node

2. Cluster
>cd conf
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-common/core-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/yarn-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

export HADOOP_PREFIX_HOME=/usr/local/hadoop-0.23.0
export YARN_HOME=/usr/local/hadoop-0.23.0

>sbin/start-all.sh
>$HADOOP_PREFIX_HOME/bin/hdfs namenode --config $HADOOP_CONF_DIR

>$YARN_HOME/bin/yarn historyserver --config $HADOOP_CONF_DIR
http://192.168.56.101:19888/jobhistory

>$YARN_HOME/bin/yarn resourcemanager --config $HADOOP_CONF_DIR
>$YARN_HOME/bin/yarn nodemanager --config $HADOOP_CONF_DIR
>$YARN_HOME/bin/yarn proxyserver --config $HADOOP_CONF_DIR

>sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode

It seems to work, but the web UI is not available.

I will prepare some slave machines and try some examples.

References:
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
http://hadoop.apache.org/common/docs/r0.19.2/cn/cluster_setup.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html

