Nutch Framework Video Tutorial 20
Lecture 20 (22 minutes)
1. Monitoring a single cluster spanning multiple subnets with Ganglia in unicast mode
vi /etc/ganglia/gmetad.conf
data_source "hadoop-cluster" 10 host6

/etc/init.d/gmetad restart

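The data_source line names the cluster, a polling interval in seconds, and the gmond host(s) gmetad polls. A hedged check of the setup above (host6 and gmetad's default interactive port 8651 are assumptions from this tutorial):

```shell
# gmetad serves an XML dump of everything it collects on its interactive
# TCP port (8651 by default); from a machine that can reach host6:
#   nc -w 5 host6 8651 | grep 'CLUSTER NAME="hadoop-cluster"'
# The data_source line itself can be sanity-checked locally:
line='data_source "hadoop-cluster" 10 host6'
echo "$line" | grep -Eq '^data_source "[^"]+" [0-9]+ .+' \
  && echo "data_source OK"
```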
On every node of the cluster, apply the following configuration:
vi /etc/ganglia/gmond.conf
Specify the cluster name:
cluster {
  name = "hadoop-cluster"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
Specify the node that receives the data:
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = host6
  port = 8649
  ttl = 1
}
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
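Since the same cluster/udp sections go onto every node, they can be generated once as a fragment and applied everywhere. A minimal sketch; the scratch path is an assumption, host6 as the unicast target comes from this setup:

```shell
# Write the unicast gmond sections to a scratch file so the identical
# fragment can be applied on every node before restarting the monitor.
OUT=/tmp/gmond-unicast.conf
cat > "$OUT" <<'EOF'
cluster {
  name = "hadoop-cluster"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  host = host6
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
}
EOF
grep -c 'port = 8649' "$OUT"   # both channels use port 8649, so prints 2
```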
2. Configuring the Hadoop cluster to use the unicast address
vi conf/hadoop-metrics2.properties
Set the contents to:
# versions after 0.20 use the ganglia31 classes
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=host6
datanode.sink.ganglia.servers=host6
jobtracker.sink.ganglia.servers=host6
tasktracker.sink.ganglia.servers=host6
maptask.sink.ganglia.servers=host6
reducetask.sink.ganglia.servers=host6
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=host6
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=host6
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=host6

Copy the configuration file to the other nodes of the cluster, then restart the cluster.
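The copy step can be scripted. A sketch: the node list and the Hadoop 1.1.2 paths are assumptions taken from the hosts in this tutorial, and the scp line is left commented so the loop is safe to dry-run:

```shell
# Push hadoop-metrics2.properties to the other nodes (dry-run version).
CONF=conf/hadoop-metrics2.properties
for n in host2 host8 host226 host138; do
  echo "would copy $CONF to $n"
  # scp "$CONF" hadoop@$n:/home/hadoop/hadoop-1.1.2/conf/
done
# then restart the cluster, e.g. stop-all.sh && start-all.sh
```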
3. Expanding the cluster with nodes on 3 different subnets
Re-add host226 to the cluster and add the new node host138.
Add host226 and host138 to the include files on host6 and host8.
Add host226 and host138 to the slaves files on host6 and host8.
On the new node host138, do the following:
Set the hostname:
vi /etc/hostname
Map the hostnames to IP addresses:
vi /etc/hosts
Add the user and group:
addgroup hadoop
adduser --ingroup hadoop hadoop
Change the permissions of the temporary directory:
chmod 777 /tmp
On host2 and host8, set up SSH login to host138:
ssh-copy-id -i .ssh/id_rsa.pub hadoop@host138
On host2, copy the Hadoop files to host138:
scp -r /home/hadoop/hadoop-1.1.2 hadoop@host138:/home/hadoop/hadoop-1.1.2
If the cluster is already running, execute the following commands on host226 and host138 to add the nodes dynamically:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
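Whether the two nodes actually joined can be read off the NameNode's live-node report (`hadoop dfsadmin -report` in Hadoop 1.x, run on the master). A sketch of the grep; sample_report is a made-up stand-in for the real output:

```shell
# On host6 the live report would come from: hadoop dfsadmin -report
sample_report='Datanodes available: 4 (4 total, 0 dead)
Name: host226:50010
Name: host138:50010'
for n in host226 host138; do
  echo "$sample_report" | grep -q "Name: $n" && echo "$n joined"
done
# prints: host226 joined / host138 joined
```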
4. Configuring host138
Install the data collection service on host138:
Create the user and group:
addgroup ganglia
adduser --ingroup ganglia ganglia
Install:
apt-get install ganglia-monitor
Configure gmond:
vi /etc/ganglia/gmond.conf
Specify the cluster name:
cluster {
  name = "hadoop-cluster"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
Specify the node that receives the data:
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = host6
  port = 8649
  ttl = 1
}
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 239.2.11.71
}
/etc/init.d/ganglia-monitor restart
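Finally, each gmond also serves its current metric set as XML over TCP on the data port (8649 here), so host6 can confirm that host138 is reporting. The nc query is a hedged suggestion; the xml variable is a trimmed stand-in for the real output:

```shell
# From host6:  nc -w 5 localhost 8649 | grep 'HOST NAME="host138"'
xml='<GANGLIA_XML><CLUSTER NAME="hadoop-cluster"><HOST NAME="host138"/></CLUSTER></GANGLIA_XML>'
echo "$xml" | grep -q 'HOST NAME="host138"' && echo "host138 reporting"
```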