Software Required to Build GridPACK

Revision as of 14:37, 19 April 2018 by Bjpalmer (talk | contribs) (Linux Basics)

Building on Specific Platforms

Scripts or instructions for installing these libraries on specific platforms can be found by following the links below. We highly recommend that you find an example that resembles the platform you are intending to use. Use the scripts for that example as the basis for building GridPACK on your own system. Most scripts require only minor modifications to get them working. These usually involve changing file names so that they reflect your local directory structure.

Many clusters use modules to provide software such as compilers, MPI libraries, CMake and Git. If you have such a system, you can load the compilers, MPI and CMake using a set of commands such as

  module purge
  module load gcc
  module load openmpi
  module load cmake

These commands load the requested software and modify environment variables so that binaries and executables are in your path. Many systems may also have modules for Boost and PETSc, but we urge users to be more cautious about using these. These libraries are frequently not built with features that may be needed by individual applications. For example, system builds of Boost often lack MPI support, and PETSc is frequently missing the C++ interface. Users are probably better off using their own library builds. This also guarantees that all libraries are built with the same compiler and version of MPI. We suggest that you create a directory under your home directory to hold all the libraries used to build GridPACK, and that you build and install all libraries (with the possible exception of MPI) in this directory.
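
As a minimal sketch of the suggestion above, the following creates a single directory for all locally built libraries. The variable name GRIDPACK_SW and the default location $HOME/software are assumptions made for illustration; any location under your home directory works.

```shell
# Hypothetical variable name and default path; adjust to your own layout.
GRIDPACK_SW="${GRIDPACK_SW:-$HOME/software}"

# Create the directory if it does not exist yet (no error if it already does).
mkdir -p "$GRIDPACK_SW"

echo "Libraries will be built and installed under: $GRIDPACK_SW"
```

Keeping everything under one directory makes it easier to point the GridPACK configuration at libraries that were all built with the same compiler and MPI.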

Some basic knowledge of Linux is also necessary for downloading libraries and constructing build scripts. This is discussed in more detail at the bottom of this page.

CMake/CTest

GridPACK uses the CMake cross-platform build system. A reasonably modern version should be used. Currently, we require version 2.8.8 or above. You can check which version of CMake is on your machine by typing

 cmake -version
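
If you would rather check the version requirement in a script, the sketch below compares the installed CMake version against 2.8.8. The helper name version_ge is hypothetical, and the comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Assumes 'sort -V' (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.8.8"
if command -v cmake >/dev/null 2>&1; then
    # The first line of the output looks like "cmake version X.Y.Z".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" "$required"; then
        echo "CMake $found is new enough (need $required or above)"
    else
        echo "CMake $found is too old (need $required or above)"
    fi
else
    echo "cmake was not found in your PATH"
fi
```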

CMake projects are designed to be built outside of the source code location. In the top directory of a GridPACK release (GRIDPACK/src) create a subdirectory to use as the location of the build. (In this documentation, GRIDPACK stands for the location of the top level GridPACK directory.) The GRIDPACK/src directory should contain a file called CMakeLists.txt. Configure and build GridPACK in the subdirectory.
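
The layout described above could be set up along the following lines. This is only a sketch: the helper name prepare_build_dir, the subdirectory name build, and the default source path are assumptions, not names required by GridPACK.

```shell
# Hypothetical helper: create a build subdirectory inside a CMake source
# directory and print its path. Fails if CMakeLists.txt is missing.
prepare_build_dir() {
    src_dir=$1
    if [ ! -f "$src_dir/CMakeLists.txt" ]; then
        echo "No CMakeLists.txt found in $src_dir" >&2
        return 1
    fi
    mkdir -p "$src_dir/build" && echo "$src_dir/build"
}

# Placeholder location of the GridPACK source directory.
GRIDPACK_SRC="${GRIDPACK_SRC:-$HOME/GRIDPACK/src}"

# Move into the build directory only if it could be prepared.
build_dir=$(prepare_build_dir "$GRIDPACK_SRC") && cd "$build_dir"
```

From inside the build directory, the configuration is then run with the source directory (..) as the last argument to cmake.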

MPI

A working MPI implementation is required. OpenMPI and MPICH have been used successfully; we have recently used OpenMPI 1.8.3 and MPICH 3.2. Other implementations have also been used successfully. Most MPI installations have compiler wrappers, such as mpicc and mpicxx, that combine the compiler with all the directives needed to link to the MPI libraries. These can be used directly in the GridPACK configuration.

Identify the compilers and mpiexec to the configuration by including cmake options like:

   -D MPI_CXX_COMPILER:STRING='mpicxx' \
   -D MPI_C_COMPILER:STRING='mpicc' \
   -D MPIEXEC:STRING='mpiexec' \

You can check to see if these wrappers are available on your system by typing

 which mpicc

If the wrapper is available, you should see a listing pointing to the location of the wrapper. If you don't, you will need to modify your environment to include it. Note that although mpicc and mpicxx are fairly common names for the compiler wrappers, there is no standard and other implementations of MPI may use something completely different. Check with your system consultant for more information. Depending on the version of MPI you are using, you may be able to find out more information by typing

 mpicxx -v
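
To check all three tools named in the cmake options in one pass, a small loop such as the following can be used. The function name check_mpi_wrappers is hypothetical, and the tool names are the common ones mentioned above; your MPI installation may use different names.

```shell
# Hypothetical helper: report where each common MPI tool is found, if at all.
check_mpi_wrappers() {
    for tool in mpicc mpicxx mpiexec; do
        if tool_path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool: $tool_path"
        else
            echo "$tool: not found in PATH"
        fi
    done
}

check_mpi_wrappers
```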

Other options may be needed by CMake to specify the MPI environment. See the documentation here.

Global Arrays

GridPACK depends heavily on "Global Arrays". The GA libraries used with GridPACK must have the C++ interface enabled and the Fortran interface disabled. The GridPACK configuration is not able to identify additional required libraries if the Fortran interface is enabled or independent BLAS/LAPACK libraries are used.

To configure GridPACK, specify the directory where Global Arrays is installed and any extra libraries that are required:

   -D GA_DIR:PATH=/path/to/ga/install \
   -D GA_EXTRA_LIBS:STRING="..." \
   -D USE_PROGRESS_RANKS:BOOL=FALSE \

The GA_EXTRA_LIBS variable is used to include required libraries not identified in the configuration. The USE_PROGRESS_RANKS variable depends on the runtime used to build GA.

We have used three versions of the GA build to run GridPACK. If you are using GridPACK on a Linux cluster with an Infiniband interconnect, you can use the OpenIB runtime (--with-openib option) for building GA. This is the highest performing version of GA for clusters with Infiniband, although for large calculations you can run into problems with memory allocation. For any system with a working version of MPI, you can also use the MPI two-sided runtime (--with-mpi-ts) or the progress ranks runtime (--with-mpi-pr).

The two-sided runtime is the simplest and is suitable for workstations with a limited number of cores (our experience is that you should limit this runtime to 8 or fewer processors). It provides reasonable performance on a small number of cores but slows down considerably at larger core counts, so it is not recommended for large-scale parallel computation.

The progress ranks runtime is much higher performing and approaches the performance of the OpenIB runtime. It is very reliable and runs on any platform that supports MPI. However, it has one peculiarity: it reserves one MPI process on each SMP node to act as a communication manager. Thus, if you are running your calculation on 2 nodes with 5 processes on each node, the GridPACK application will only see 8 processes (4 on each node). To make sure that the GridPACK build is aware of this, the USE_PROGRESS_RANKS parameter should be set to TRUE when using the progress ranks build of GA.
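
The process accounting for the progress ranks runtime can be written out as a short calculation. The function name visible_processes is hypothetical; it only restates the rule above that one process per node is reserved.

```shell
# Hypothetical helper: number of processes a GridPACK application sees when
# GA is built with the progress ranks runtime.
#   $1 = number of nodes, $2 = MPI processes launched on each node
visible_processes() {
    nodes=$1
    per_node=$2
    # One process on every node is reserved as a communication manager.
    echo $(( nodes * (per_node - 1) ))
}

visible_processes 2 5   # prints 8, matching the example in the text
```

When sizing a job, this means you need to request one more process per node than the number you want the application to use.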

It is also important to build GA with the C++ interface and without the Fortran interface and the BLAS libraries (which are downloaded and built with PETSc). The following options (or their equivalents) should be included in the GA configuration step

   --enable-cxx --without-blas --disable-f77
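
The sketch below shows one way to keep the GA runtime choice and the matching USE_PROGRESS_RANKS setting consistent. It only prints the commands instead of running them. The variables GA_RUNTIME and GA_PREFIX and the install path are assumptions for illustration; --prefix is the usual configure option for choosing an install location.

```shell
# Hypothetical variables; choose pr, ts or openib for GA_RUNTIME.
GA_RUNTIME="${GA_RUNTIME:-pr}"
GA_PREFIX="${GA_PREFIX:-/path/to/ga/install}"

case "$GA_RUNTIME" in
    pr)     ga_flag="--with-mpi-pr"; use_progress_ranks=TRUE ;;
    ts)     ga_flag="--with-mpi-ts"; use_progress_ranks=FALSE ;;
    openib) ga_flag="--with-openib"; use_progress_ranks=FALSE ;;
    *)      ga_flag=""; use_progress_ranks=FALSE
            echo "Unknown GA_RUNTIME: $GA_RUNTIME" >&2 ;;
esac

# Command to run in the GA source directory.
ga_configure="./configure $ga_flag --enable-cxx --without-blas --disable-f77 --prefix=$GA_PREFIX"
echo "$ga_configure"

# Matching option for the GridPACK cmake configuration.
gridpack_option="-D USE_PROGRESS_RANKS:BOOL=$use_progress_ranks"
echo "$gridpack_option"
```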

Boost

The Boost C++ Library is used heavily throughout the GridPACK framework, and a relatively recent version is required. The configuration requires version 1.49 or later, but older versions may work. The Boost installation must include Boost::MPI which must have been built with the same MPI compiler used for GridPACK.

To configure GridPACK one need only specify where Boost is installed, like this

   -D BOOST_ROOT:STRING='/path/to/boost' \

Boost is tied quite closely to the latest features in C++ and problems can be encountered if the version of Boost that you are using was released much later than the compiler. Reverting to an earlier Boost version can sometimes eliminate problems if you are having difficulties building it. The same is true for Boost and CMake. If the CMake version was released earlier than the Boost version, CMake may have problems identifying the libraries in Boost that it needs for GridPACK. Again, going to an earlier version of Boost may fix these issues.
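
Before configuring GridPACK, it can be worth confirming that the Boost installation actually contains the Boost::MPI library. The following is a rough check, assuming the libraries were installed in a lib subdirectory of the Boost prefix and follow the usual libboost_mpi naming; the function name boost_has_mpi is hypothetical.

```shell
# Hypothetical helper: succeeds if any libboost_mpi* file exists under
# the lib directory of the given Boost install prefix.
boost_has_mpi() {
    for lib in "$1"/lib/libboost_mpi*; do
        # If the pattern matched nothing, $lib is the literal pattern
        # and the existence test fails.
        [ -e "$lib" ] && return 0
    done
    return 1
}

# Placeholder path; use the value you pass as BOOST_ROOT.
BOOST_ROOT="${BOOST_ROOT:-/path/to/boost}"
if boost_has_mpi "$BOOST_ROOT"; then
    echo "Boost::MPI library found under $BOOST_ROOT/lib"
else
    echo "No Boost::MPI library found under $BOOST_ROOT/lib"
fi
```

This only confirms that the library file is present. It does not verify that Boost was built with the same MPI compiler as GridPACK.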

PETSc

GridPACK currently relies on the Portable, Extensible Toolkit for Scientific Computation (PETSc) for parallel linear algebra, and linear and nonlinear system solvers. The PETSc interface tends to change a bit as new releases come out, requiring adjustments in any applications that use it. We have currently used PETSc versions 3.4-3.7 with GridPACK.

PETSc is a complicated package with numerous options. PETSc needs to be built with MPI enabled and using the same MPI implementation used for GridPACK. It also needs to use C++ as the base language. Originally, GridPACK could only use PETSc if it was configured for complex support. The current GridPACK release can use either complex or real builds. However, most applications in GridPACK use complex matrices, so it is still preferable to configure PETSc to use complex variables. Refer to the PETSc installation documentation for additional information on configuring PETSc.

Configuring and building PETSc is done in the top level PETSc directory. One of the configuration variables that needs to be set when configuring and building PETSc is PETSC_ARCH. In the example below, PETSC_ARCH was set to 'arch-darwin-cxx-opt'. After the build is complete, there will be a directory beneath the top level directory with whatever name was assigned to PETSC_ARCH. This directory contains the include and lib directories for the PETSc libraries.

The GridPACK configuration must know where PETSc is installed. This is specified by two options as shown below.

   -D PETSC_DIR:STRING='/Users/d3g096/ProjectStuff/petsc-3.4.0' \
   -D PETSC_ARCH:STRING='arch-darwin-cxx-opt' \
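
A quick sanity check of the PETSC_DIR and PETSC_ARCH values could look like the sketch below. The function names are hypothetical. The complex-build check assumes that a complex PETSc build defines PETSC_USE_COMPLEX in the petscconf.h file under the PETSC_ARCH include directory; verify this against your PETSc version.

```shell
# Hypothetical helper: succeeds if $PETSC_DIR/$PETSC_ARCH contains the
# include and lib directories described above.
#   $1 = PETSC_DIR, $2 = PETSC_ARCH
petsc_arch_ok() {
    [ -d "$1/$2/include" ] && [ -d "$1/$2/lib" ]
}

# Hypothetical helper: succeeds if the build appears to use complex scalars.
# Assumption: petscconf.h defines PETSC_USE_COMPLEX for complex builds.
petsc_is_complex() {
    grep -q 'PETSC_USE_COMPLEX' "$1/$2/include/petscconf.h" 2>/dev/null
}

# Placeholder values; use the ones from your own PETSc build.
PETSC_DIR="${PETSC_DIR:-/path/to/petsc}"
PETSC_ARCH="${PETSC_ARCH:-arch-darwin-cxx-opt}"

if petsc_arch_ok "$PETSC_DIR" "$PETSC_ARCH"; then
    if petsc_is_complex "$PETSC_DIR" "$PETSC_ARCH"; then
        echo "PETSc build found (complex scalars)"
    else
        echo "PETSc build found (complex scalars not detected)"
    fi
else
    echo "No PETSc build found at $PETSC_DIR/$PETSC_ARCH"
fi
```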

Currently, the configuration will recognize and adjust the GridPACK build if the PETSc build includes ParMETIS, SuperLU_DIST and/or MUMPS. Many of the example GridPACK applications expect a parallel direct linear solver to be built into PETSc. This is satisfied by including SuperLU_DIST or MUMPS in the PETSc build.

ParMETIS

GridPACK uses ParMETIS to (re)distribute an electrical network over several processors. It needs to be built with the same MPI configuration as Boost and PETSc. GridPACK configuration will find ParMETIS automatically if it has been included in the PETSc build. Otherwise, the GridPACK configuration just needs to know where ParMETIS was installed, which is specified by

   -D PARMETIS_DIR:STRING="/pic/projects/gridpack/software" \

GridPACK requires ParMETIS version 4.0. Older versions will not work. On most systems, it is straightforward to download and build ParMETIS as part of the PETSc build. We highly recommend that you do this to access ParMETIS.

Doxygen

GridPACK uses Doxygen to help document code. Its use is optional. Doxygen documentation can optionally be prepared during the build process. This is enabled if Doxygen is found. Graphviz is necessary for full documentation features.

Linux Basics

Some basic knowledge of Linux is necessary in order to build GridPACK. Familiarity with a Linux editor such as VIM or EMACS is required. Extensive documentation is readily available for both these editors, both online and in books. In addition, you will need to download tarballs (.tar or .tar.gz files) of the Boost, PETSc and GA libraries, uncompress them, and then configure and build the libraries. If you usually download files using a Windows machine, you can download a library tarball using Windows and then copy it to your Linux platform using the WinSCP utility. This will allow you to transfer your files from Windows to Linux in a straightforward way. You may also be able to download directly to a Linux platform by bringing up a browser such as Firefox from the Linux command prompt, going to the appropriate download site and downloading directly to your Linux directory.

Once you have a tarball downloaded to your software directory, the next step is to uncompress the file into its own directory. For example, if you have downloaded the Boost tarball for version 1.65.0, you would see the following file in your directory

   boost_1_65_0.tar.gz

The .tar extension means that all the files in the boost directory have been concatenated into a single file using the Linux tar command. The .gz extension means that the tarball has been further compressed using the gzip command. You can uncompress the file and untar it using the single command

   tar xvf boost_1_65_0.tar.gz

This will produce a directory

   boost_1_65_0

in the same directory that the tarball is located in and will contain all the individual files and subdirectories contained in the Boost library.
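
If you want to try the tar commands without downloading anything, the following self-contained demonstration builds a small throwaway tarball and unpacks it. The archive name demo_1_0 and the directory layout are made up for illustration. The t option lists the contents of an archive without extracting it, which is a useful check before unpacking a real download.

```shell
# Work in a temporary directory so nothing in your software directory changes.
work=$(mktemp -d)

# Create a stand-in "library" directory and pack it into a .tar.gz file.
mkdir -p "$work/src/demo_1_0/doc"
echo "hello" > "$work/src/demo_1_0/README"
tar czf "$work/demo_1_0.tar.gz" -C "$work/src" demo_1_0

# List the contents of the tarball without extracting it.
tar tzf "$work/demo_1_0.tar.gz"

# Extract it into a separate directory, as you would for Boost.
mkdir "$work/software"
tar xzf "$work/demo_1_0.tar.gz" -C "$work/software"
ls "$work/software/demo_1_0"
```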

Once you have the Boost directory, the next step is to cd into it and create a script for building Boost. This would consist of creating a file at the top of the boost_1_65_0 directory with a name such as build.sh that contains the commands for configuring Boost. On a Redhat Linux cluster using the GNU compilers, you would use these lines

   echo "using mpi ;" > ~/user-config.jam
   sh ./bootstrap.sh \
       --prefix="/my_home_directory/software/boost_1_65_0" \
       --without-icu \
       --with-toolset=gcc \
       --without-libraries=python
   ./b2 -a -d+2 link=static stage
   ./b2 -a -d+2 link=static install
   rm ~/user-config.jam

Note that the argument to --prefix is the path to the Boost directory that you are currently in. Once these lines have been copied into the build.sh file, the file needs to be made executable by changing its permissions with the command

   chmod +x build.sh

The script can then be run (in the boost_1_65_0 directory) by typing

   ./build.sh

This will configure and build Boost. It is not strictly necessary to put these commands in a script. They will also work if you type them directly into the Linux command line. However, for such a long set of commands, using a script makes mistakes less likely.
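
One optional refinement, which is not part of the script shown above, is to add set -e near the top of a build script so that it stops at the first command that fails instead of continuing with a broken configuration. The demonstration below uses stand-in commands in place of the real Boost build, with false playing the role of a failing step.

```shell
# Write a small stand-in build script into a temporary directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/build.sh" <<'EOF'
#!/bin/sh
set -e                 # stop at the first failing command
echo "configuring"
false                  # stand-in for a step that fails
echo "building"        # never reached because of set -e
EOF

# Make the script executable and run it, capturing output and exit status.
chmod +x "$demo_dir/build.sh"
if build_output=$("$demo_dir/build.sh"); then
    build_status=0
else
    build_status=$?
fi

echo "output: $build_output"
echo "exit status: $build_status"
```

Only "configuring" is printed by the script, and the non-zero exit status tells you that the build did not complete.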