Hadoop HDFS NameNode Source Code Analysis (Part 1)
RandomAccessFile
Instances of this class support both reading and writing to a random access file. A random access file behaves like a large array of bytes stored in the file system. There is a kind of cursor, or index into the implied array, called the file pointer; input operations read bytes starting at the file pointer and advance the file pointer past the bytes read. If the random access file is created in read/write mode, then output operations are also available; output operations write bytes starting at the file pointer and advance the file pointer past the bytes written. Output operations that write past the current end of the implied array cause the array to be extended. The file pointer can be read by the getFilePointer method and set by the seek method.
In general, if any of the read routines in this class reaches end-of-file before the desired number of bytes has been read, an EOFException (a kind of IOException) is thrown. If a byte cannot be read for any reason other than end-of-file, an IOException other than EOFException is thrown. In particular, an IOException may be thrown if the stream has been closed.
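A minimal sketch (not Hadoop code) of the file-pointer behavior described above: writes advance the pointer, seek() repositions it, and a read advances it past the byte just read.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Demonstrates the file pointer semantics of RandomAccessFile.
public class RafDemo {
    public static int demo() throws IOException {
        File tmp = File.createTempFile("raf", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(new byte[] {10, 20, 30, 40}); // file pointer is now 4
            raf.seek(2);                            // reposition the pointer
            int b = raf.read();                     // reads 30; pointer moves to 3
            return b;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints 30
    }
}
```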
Related algorithms
Algorithm for generating the namespaceID
FSImage.new
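A sketch of the namespaceID generation logic (the truncated reference above is presumably to FSImage's newNamespaceID; that method name is an assumption here): draw a random positive 31-bit integer, retrying so that 0 — which means "unset" — is never issued.

```java
import java.util.Random;

// Illustrative reconstruction, not the exact Hadoop source.
public class NamespaceIdSketch {
    public static int newNamespaceID(long seed) {
        Random r = new Random(seed);   // Hadoop seeds with the current time
        int newID = 0;
        while (newID == 0) {
            newID = r.nextInt(0x7FFFFFFF); // use 31 bits only; retry on 0
        }
        return newID;
    }
}
```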
a). loadNamesystem — load the name system
FSNamesystem.initialize
BlockManager
Keeps information related to the blocks stored in the Hadoop cluster. This class is a helper class for FSNamesystem and requires several methods to be called with the lock held on FSNamesystem.
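A minimal sketch (not Hadoop code) of this "helper called with the owner's lock held" pattern: the namesystem owns the lock and acquires it before delegating to the helper, whose methods assume the lock is already held.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Class and method names here are illustrative, not the Hadoop classes.
public class LockPatternSketch {
    static class Namesystem {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final BlockHelper blockHelper = new BlockHelper();

        int getBlockCount() {
            lock.readLock().lock();   // lock acquired here, not in the helper
            try {
                return blockHelper.countBlocksUnsafe();
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    static class BlockHelper {
        final Map<Long, String> blocks = new HashMap<>();
        // Caller must hold the namesystem lock.
        int countBlocksUnsafe() { return blocks.size(); }
    }

    public static int demo() {
        Namesystem ns = new Namesystem();
        ns.blockHelper.blocks.put(1L, "blk_1");
        return ns.getBlockCount();
    }
}
```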
(A).FSNamesystem.activate
? /**
??
?
?
?*/MetricsServlet
/**
 * A servlet to print out the running configuration data.
 */
ConfServlet
Using httpServer
name.node:this
name.node.address:localhost/127.0.0.1:60914
name.system.image:org.apache.hadoop.hdfs.server.namenode.FSImage@244f74
name.conf: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml
addInternalServlet
/**
 * Serve delegation tokens over http for use in hftp.
 */
getDelegationToken, /getDelegationToken, DelegationTokenServlet.class
/**
 * This class is used in Namesystem's web server to do fsck on namenode.
 */
fsck, /fsck, FsckServlet.class
/**
 * This class is used in Namesystem's jetty to retrieve a file.
 * Typically used by the Secondary NameNode to retrieve image and
 * edit file for periodic checkpointing.
 */
getimage, /getimage, GetImageServlet.class
/**
 * Obtain meta-information about a filesystem.
 * @see org.apache.hadoop.hdfs.HftpFileSystem
 */
listPaths, /listPaths/*, ListPathsServlet.class
/**
 * Redirect queries about the hosted filesystem to an appropriate datanode.
 * @see org.apache.hadoop.hdfs.HftpFileSystem
 */
data, /data/*, FileDataServlet.class
/** Redirect file checksum queries to an appropriate datanode. */
checksum, /fileChecksum/*, FileChecksumServlets.RedirectServlet.class
/** Servlets for file checksum */
contentSummary, /contentSummary/*, ContentSummaryServlet.class
this.httpServer.start();
(C).Server.start
d). Start the datanodes
DefaultUri: hdfs://localhost:46620
// Set up the right ports for the datanodes
"dfs.datanode.address", "127.0.0.1:0"
"dfs.datanode.http.address", "127.0.0.1:0"
"dfs.datanode.ipc.address", "127.0.0.1:0"
Directories:
/data/dfs/data/data1
/data/dfs/data/data2
for (int i = curDatanodesNum; i < curDatanodesNum + numDataNodes; i++) {
DataNode.startDataNode — register the datanode
FSDatasetInterface
/**
 * This is an interface for the underlying storage that stores blocks for
 * a data node.
 * Examples are the FSDataset (which stores blocks on dirs) and
 * SimulatedFSDataset (which simulates data).
 */
class SimulatedFSDataset implements FSConstants, FSDatasetInterface, Configurable
/**
 * This class implements a simulated FSDataset.
 *
 * Blocks that are created are recorded but their data (plus their CRCs) are
 * discarded.
 * Fixed data is returned when blocks are read; a null CRC meta file is
 * created for such data.
 *
 * This FSDataset does not remember any block information across its
 * restarts; it does however offer an operation to inject blocks
 * (see TestInjectionForSimulatedStorage for a usage example of injection).
 *
 * Note the synchronization is coarse grained - it is at each method.
 *
 * crc: checksum stream
 */
FSDataset.FSDataset
/**************************************************
 * FSDataset manages a set of data blocks.  Each block
 * has a unique name and an extent on disk.
 **************************************************/
DataXceiverServer
/**
 * Server used for receiving/sending a block of data.
 * This is created to listen for requests from clients or
 * other DataNodes. This small server does not use the
 * Hadoop IPC mechanism.
 */
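A minimal sketch (not Hadoop code) of the DataXceiverServer idea: a plain ServerSocket accept loop — no Hadoop IPC — handing each connection to a worker thread. Binding to port 0, as in the `dfs.datanode.address` settings below, lets the OS pick an ephemeral port.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Names here are illustrative, not the Hadoop classes.
public class XceiverSketch {
    public static int echoOnce() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = ephemeral
            Thread acceptor = new Thread(() -> {
                try (Socket s = server.accept();
                     InputStream in = s.getInputStream();
                     OutputStream out = s.getOutputStream()) {
                    out.write(in.read()); // echo a single byte back
                } catch (IOException ignored) { }
            });
            acceptor.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.getOutputStream().write(42);
                int echoed = client.getInputStream().read();
                acceptor.join();
                return echoed;
            }
        }
    }
}
```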
DataXceiver
/**
 * Thread for processing incoming/outgoing data stream.
 */
DataStorage
/**
 * Data storage information file.
 */
Storage
/**
 * Storage information file.
 * <p>
 * Local storage information is stored in a separate file VERSION.
 * It contains the type of the node,
 * the storage layout version, the namespace id, and
 * the fs state creation time.
 * <p>
 * Local storage can reside in multiple directories.
 * Each directory should contain the same VERSION file as the others.
 * During startup Hadoop servers (name-node and data-nodes) read their local
 * storage information from them.
 * <p>
 * The servers hold a lock for each storage directory while they run so that
 * other nodes are not able to start up sharing the same storage.
 * The locks are released when the servers stop (normally or abnormally).
 */
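A sketch of the per-directory lock described above: Storage holds a file lock on an "in_use.lock" file in each storage directory so a second process cannot start up sharing the same storage. The code below is illustrative, not the actual Storage implementation.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Takes a FileLock on in_use.lock inside a storage directory.
public class StorageLockSketch {
    public static boolean tryLockDir(File dir) throws Exception {
        File lockFile = new File(dir, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rws");
        FileLock lock = raf.getChannel().tryLock(); // null if held by another process
        return lock != null;
    }

    public static boolean demo() throws Exception {
        File dir = java.nio.file.Files.createTempDirectory("storage").toFile();
        dir.deleteOnExit();
        return tryLockDir(dir); // an uncontended directory locks successfully
    }
}
```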
/**
 * This method starts the data node with the specified conf.
 */
DataNode.startDataNode
DataBlockScanner
Performs two types of scanning:
1. Gets block files from the data directories and reconciles the difference between the blocks on the disk and in memory in FSDataset.
2. Scans the data directories for block files and verifies that the files are not corrupt.
This keeps track of blocks and their last verification times. Currently it does not modify the metadata for a block.
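The first scan type above boils down to a set difference between block IDs found on disk and those recorded in memory. A minimal sketch (not Hadoop code):

```java
import java.util.HashSet;
import java.util.Set;

// Diffs on-disk block IDs against the in-memory view; the reverse
// diff (in memory but missing on disk) is computed the same way.
public class BlockDiffSketch {
    public static Set<Long> missingFromMemory(Set<Long> onDisk, Set<Long> inMemory) {
        Set<Long> diff = new HashSet<>(onDisk);
        diff.removeAll(inMemory);
        return diff;
    }

    public static int demo() {
        Set<Long> disk = new HashSet<>(java.util.Arrays.asList(1L, 2L, 3L));
        Set<Long> mem = new HashSet<>(java.util.Arrays.asList(1L, 2L));
        return missingFromMemory(disk, mem).size(); // only block 3 is unaccounted for
    }
}
```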
DataNode.runDatanodeDaemon
This corresponds to the NameNode's register method: the first part is a version check, the second is the main registration logic.
FSNamesystem.registerDatanode
/////////////////////////////////////////////////////////
//
// These methods are called by datanodes
//
/////////////////////////////////////////////////////////
Register Datanode.
The purpose of registration is to identify whether the new datanode serves a new data storage, and will report new data block copies, which the namenode was not aware of; or the datanode is a replacement node for the data storage that was previously served by a different or the same (in terms of host:port) datanode. The data storages are distinguished by their storageIDs. When a new data storage is reported the namenode issues a new unique storageID.
Finally, the namenode returns its namespaceID as the registrationID for the datanodes. namespaceID is a persistent attribute of the name space. The registrationID is checked every time the datanode is communicating with the namenode. Datanodes with inappropriate registrationID are rejected. If the namenode stops, and then restarts it can restore its namespaceID and will continue serving the datanodes that has previously registered with the namenode without restarting the whole cluster.
ReplicaState — block replica states (a replica can transition through these while being written):
FINALIZED(0) — replica is finalized; the state when the replica is not modified.
RBW(1) — replica is being written to.
RWR(2) — replica is waiting to be recovered.
RUR(3) — replica is under recovery.
TEMPORARY(4) — temporary replica: created for replication and relocation only.
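The replica states quoted above can be written out as a Java enum with their numeric codes (a reconstruction for illustration, not the version-exact Hadoop source):

```java
// Block replica states with the numeric codes from the source comment.
public enum ReplicaState {
    FINALIZED(0), // replica is finalized; not modified
    RBW(1),       // replica being written
    RWR(2),       // replica waiting to be recovered
    RUR(3),       // replica under recovery
    TEMPORARY(4); // created for replication/relocation only

    private final int value;
    ReplicaState(int value) { this.value = value; }
    public int getValue() { return value; }
}
```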
BlockInfo
/**
 * Internal class for block metadata.
 */
BlockInfo.triplets
? /**
? ?* This array contains triplets of references.
? ?* For each i-th datanode the block belongs to
? ?* triplets[3*i] is the reference to the DatanodeDescriptor
? ?* and triplets[3*i+1] and triplets[3*i+2] are references?
? ?* to the previous and the next blocks, respectively, in the?
? ?* list of blocks belonging to this data-node.
? ?*/
此数组包含引用3个引用。每一个第i?datanode的triplets[3*i]?指向DatanodeDescriptor和triplets[3*?I +1]和triplets[3*?I +2]引用到以前的和下一个块,分别属7于这个数据节点的块列表。
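A sketch of the triplets layout described above: for datanode index i, slot 3*i holds the DatanodeDescriptor reference and slots 3*i+1 / 3*i+2 link the previous/next block in that datanode's block list. Plain Objects stand in for the real descriptor and block types.

```java
// Illustrative indexing into a BlockInfo-style triplets array.
public class TripletsSketch {
    static Object[] triplets = new Object[3 * 2]; // room for 2 datanodes

    static Object getDatanode(int i) { return triplets[3 * i]; }
    static Object getPrevious(int i) { return triplets[3 * i + 1]; }
    static Object getNext(int i)     { return triplets[3 * i + 2]; }

    public static String demo() {
        triplets[3] = "datanode-1"; // slot 3*1: second datanode's descriptor
        triplets[4] = "prevBlock";  // slot 3*1+1: previous block in its list
        triplets[5] = "nextBlock";  // slot 3*1+2: next block in its list
        return getDatanode(1) + "/" + getNext(1);
    }
}
```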