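Installing and Deploying Linpack
Required software packages:
1> An MPI runtime environment; here we use mpich2-1.5.tar.gz
2> The GotoBLAS matrix (BLAS) library; we use GotoBLAS2-1.13.tar.gz
3> The Linpack benchmark package: hpl-2.1.tar.gz
Installation procedure:
1> Installing the GotoBLAS2 linear algebra library
Check the CPU architecture: cat /proc/cpuinfo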
[root@compute-0 ~]# cat /proc/cpuinfo
processor: 0
vendor_id: AuthenticAMD
cpu family: 16
model: 5
model name: AMD Athlon(tm) II X4 620 Processor
# Note: mine is an AMD architecture; if yours is Intel, it is probably the CORE2 architecture
stepping: 2
cpu MHz: 2600.147
cache size: 512 KB
fpu: yes
fpu_exception: yes
cpuid level: 5
wp: yes
flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc up rep_good tsc_reliable nonstop_tsc unfair_spinlock pni cx16 x2apic popcnt hypervisor lahf_lm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
bogomips: 5200.29
TLB size: 1024 4K pages
clflush size: 64
cache_alignment: 64
address sizes: 40 bits physical, 48 bits virtual
power management:
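Only two of those fields matter when picking a GotoBLAS2 target: the vendor and the model name. The helper below is a minimal sketch for extracting them; `cpu_summary` is a hypothetical name, and the optional file argument is an assumption added so the function can also be tried on a saved copy of the output.

```shell
# cpu_summary [FILE]: print "vendor | model name" for the first CPU entry.
# FILE defaults to /proc/cpuinfo (Linux only). The function name is illustrative.
cpu_summary() {
    awk '
        # Strip everything up to and including the first ":" and keep the value.
        /^vendor_id/  && vendor == "" { sub(/^[^:]*:[ \t]*/, ""); vendor = $0; next }
        /^model name/ && model  == "" { sub(/^[^:]*:[ \t]*/, ""); model  = $0; next }
        END { print vendor " | " model }
    ' "${1:-/proc/cpuinfo}"
}
```

Called without arguments on the machine shown above, it should report the AuthenticAMD vendor and the Athlon II X4 620 model name on one line.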
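Edit Makefile.rule. (Note: editing Makefile.rule lets you tailor the build to your own hardware platform, which can improve performance considerably.
The target architecture, the compiler choice, the threading settings, and so on are all set in this file.)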
#
# Beginning of user configuration
#
# This library's version
VERSION = 1.13
# You can specify the target architecture, otherwise it's
# automatically detected.
# TARGET = PENRYN
# If you want to support multiple architecture in one binary
# DYNAMIC_ARCH = 1
# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
CC = gcc
# Fortran compiler. Default is g77.
# FC = gfortran
# Even you can specify cross compiler
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran
# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
BINARY=64
# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi threaded, specify USE_THREAD = 1
USE_THREAD = 1
# If you're going to use this library with OpenMP, please comment it in.
USE_OPENMP = 1
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# NUM_THREADS = 24
# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1
# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1
# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
# INTERFACE64 = 1
# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
# NO_WARMUP = 1
# If you want to disable CPU/Memory affinity on Linux.
# NO_AFFINITY = 1
# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1
# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1
# Theads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this mumber by GOTO_THREAD_TIMEOUT
# CCOMMON_OPT += -DTHREAD_TIMEOUT=26
# Using special device driver for mapping physically contigous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1
# If you need to synchronize FP CSR between threads (for x86/x86_64 only).
# CONSISTENT_FPCSR = 1
# If you need santy check by comparing reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1
# Common Optimization Flag; -O2 is enough.
COMMON_OPT += -O2
# Profiling flags
COMMON_PROF = -pg
#
# End of user configuration
#
Run make. On success it prints:
GotoBLAS build complete.
OS ... Linux
Architecture ... x86_64
BINARY ... 64bit
C compiler ... GCC (command line : gcc)
Fortran compiler ... G77 (command line : g77)
Library Name ... libgoto2_barcelona-r1.13.a (Single threaded)
Several new files appear in the gotoblas2 directory. libgoto2.a and libgoto2.so are the two library files used later.
lrwxrwxrwx 1 root root 23 Mar 28 22:14 libgoto2.a -> libgoto2_athlon-r1.13.a
-rw-r--r-- 1 root root 5235402 Mar 28 22:18 libgoto2_athlon-r1.13.a
-rwxr-xr-x 1 root root 2503038 Mar 28 22:18 libgoto2_athlon-r1.13.so
lrwxrwxrwx 1 root root 24 Mar 28 22:18 libgoto2.so -> libgoto2_athlon-r1.13.so
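2> Installing the MPI runtime environment
Omitted.
3> Installing HPL
1. From the setup directory, copy the Make file that matches your system up one
level, into the HPL top-level directory. My machine is AMD, so I copied
Make.Linux_ATHLON_CBLAS; on Intel you would copy Make.Linux_PII_CBLAS.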
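The copy step can be wrapped in a small function that fails early when the template name is mistyped. This is a minimal sketch; `copy_hpl_makefile` is a hypothetical helper, and the example path is the one used in this walkthrough.

```shell
# copy_hpl_makefile TOPDIR ARCH: copy setup/Make.ARCH into the HPL top-level directory.
# Returns 1 without copying anything if the template does not exist.
copy_hpl_makefile() {
    topdir=$1
    arch=$2
    src="$topdir/setup/Make.$arch"
    if [ ! -f "$src" ]; then
        echo "no such template: $src" >&2
        return 1
    fi
    cp "$src" "$topdir/Make.$arch"
}

# Example:
#   copy_hpl_makefile /home/houqd/hpl-2.1 Linux_ATHLON_CBLAS
```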
2. Edit the Make.Linux_ATHLON_CBLAS file
#
# ----------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------
#
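# Path to the HPL directory. Keep comments on their own line: make treats
# spaces between a value and a trailing "#" as part of the value.
TOPdir = /home/houqd/hpl-2.1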
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
# ----------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
# Path to MPI
MPdir = /home/houqd/mpich2.1.5
MPinc = -I$(MPdir)/src/include
# Check this path against your own MPI installation; the library location may differ.
MPlib = $(MPdir)/lib/.libs/libmpich.a
#
# ----------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
# GotoBLAS2 installation directory
LAdir = /home/houqd/gotoblas2
LAinc =
LAlib = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so
#
# ----------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------
#
# Path to mpicc
CC = /usr/local/bin/mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
LINKER = /usr/local/bin/mpicc
LINKFLAGS = $(CCFLAGS)
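A wrong MPlib or LAlib path only shows up as a link error late in the build, so it can help to confirm that the referenced files exist before running make. A minimal sketch; `check_files` is a hypothetical helper, and the example paths are the ones used in this walkthrough.

```shell
# check_files FILE...: report every argument that is not an existing regular file.
# Symbolic links such as libgoto2.a are followed.
# Returns 0 when all files exist, 1 otherwise.
check_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_files /home/houqd/mpich2.1.5/lib/.libs/libmpich.a \
#               /home/houqd/gotoblas2/libgoto2.a
```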
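3. Run make arch=Linux_ATHLON_CBLAS
When the build finishes, the test files HPL.dat and xhpl are generated
under Linux_ATHLON_CBLAS in the bin directory,
and the library file libhpl.a is generated
under Linux_ATHLON_CBLAS in the lib directory. A completed benchmark run then displays: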
T/V N NB P Q Time Gflops
--------------------------------------------
WR00R2R4 35 4 4 1 0.61 5.008e-05
HPL_pdgesv() start time Thu Mar 28 20:47:20 2013
HPL_pdgesv() end time Thu Mar 28 20:47:21 2013
--------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0164438 ...... PASSED
================================================================================
Finished 864 tests with the following results:
864 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------
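The benchmark is launched from the bin/Linux_ATHLON_CBLAS directory with:
mpirun -np 4 ./xhpl
An MPI program may compile successfully and still fail at run time with an error like this: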
[root@Master Linux_ATHLON_CBLAS]# mpirun -np 4 xhpl
[proxy:0:0@Master.Hadoop] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
[proxy:0:0@Master.Hadoop] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
[proxy:0:0@Master.Hadoop] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
[proxy:0:0@Master.Hadoop] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 255
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
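This error most likely means the program was named without a path, so it was searched for in PATH, which normally does not include the current directory.
Launching the program with an explicit path (a leading "./" or an absolute path) fixes it, for example:
$ mpicc -o cpi cpi.c
$ mpirun -np 4 ./cpi    (the "./" is required)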
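The lookup rule behind this can be reproduced without MPI. The sketch below uses a throwaway script named `demo_prog` (a hypothetical stand-in for xhpl) in a temporary directory, and assumes PATH contains neither "." nor that directory.

```shell
# Create a throwaway executable in a temporary directory and change into it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '#!/bin/sh\necho hello\n' > demo_prog
chmod +x demo_prog

# A bare name is resolved through PATH, so the file in the current
# directory is not found.
if command -v demo_prog >/dev/null 2>&1; then
    bare_lookup="found"
else
    bare_lookup="not found"
fi

# An explicit relative path bypasses the PATH search.
explicit_output=$(./demo_prog)

echo "bare name: $bare_lookup"
echo "./demo_prog: $explicit_output"
```

mpirun passes the program name to execvp, which applies the same PATH search to a name containing no slash.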
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
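The Gflops column follows from N and Time: HPL counts (2/3)*N^3 + (3/2)*N^2 floating-point operations for one solve and divides by the wall time. A minimal sketch of that arithmetic; `hpl_gflops` is a hypothetical helper name, not part of HPL.

```shell
# hpl_gflops N SECONDS: print the Gflops rate for one solve of order N.
# Operation count used: (2/3)*N^3 + (3/2)*N^2.
hpl_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
        printf "%.3e\n", flops / t / 1e9
    }'
}

hpl_gflops 35 0.61
```

For the sample result row shown earlier (N = 35, Time = 0.61) this prints 4.987e-05, close to the reported 5.008e-05. The small gap is consistent with Time being rounded to two decimals in the table.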
The following parameter values will be used:
N : 29 30 34 35
NB : 1 2 3 4
PMAP : Row-major process mapping
P : 2 1 4
Q : 2 4 1
PFACT : Left Crout Right
NBMIN : 2 4
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
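HPL runs one test for every combination of the listed values, so the total is the product of the number of choices per parameter: 4 values of N, 4 of NB, 3 process grids (P x Q), 3 PFACT, 2 NBMIN, 1 NDIV, 3 RFACT, 1 BCAST and 1 DEPTH. That product is the 864 tests reported in the run summary. A small sketch of the arithmetic (`hpl_test_count` is a hypothetical helper):

```shell
# hpl_test_count COUNT...: multiply the number of choices for each parameter.
hpl_test_count() {
    total=1
    for c in "$@"; do
        total=$((total * c))
    done
    echo "$total"
}

# Order: N, NB, grids (P x Q), PFACT, NBMIN, NDIV, RFACT, BCAST, DEPTH
hpl_test_count 4 4 3 3 2 1 3 1 1   # prints 864
```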