Build Script

src/ops_hpc/build_opensees_full.sh runs on the VM as root. It installs everything needed to run OpenSees with Intel MPI on Rocky Linux 8.

What it installs

Intel oneAPI — icx (C), icpx (C++), ifx (Fortran), MKL, Intel MPI
MUMPS 5.5.1 — sparse direct solver, built with Intel compilers
OpenSees — sequential, SP (domain decomposition), MP (multi-interpreter)
System commands — opensees, openseessp, openseesmp, opensees_build
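A quick way to confirm the install is to check that each tool resolves on PATH. The helper below is a hypothetical sketch (check_commands is not part of the build script). It prints one line per command and returns the number of missing ones.

```shell
# Post-install sanity check (hypothetical helper, not part of the build script).
# Prints one line per command and returns the number of missing commands.
check_commands() {
  local missing=0 cmd path
  for cmd in "$@"; do
    if path=$(command -v "$cmd" 2>/dev/null); then
      printf 'ok       %-16s %s\n' "$cmd" "$path"
    else
      printf 'MISSING  %s\n' "$cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Source the oneAPI environment first so the compilers are on PATH (the default location is /opt/intel/oneapi/setvars.sh), then run, for example, `check_commands icx icpx ifx mpirun opensees openseessp openseesmp opensees_build`.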

Build steps

Step	What	Time
[1/8]	Add Intel oneAPI repo	<1 min
[2/8]	Install oneAPI + tools (91 packages, 12 GB)	~5 min
[2b]	Kernel tuning (hugepages, shmmax)	<1 min
[3/8]	Source oneAPI environment	<1 min
[4/8]	Python venv + Conan package manager	~2 min
[5/8]	Build MUMPS with Intel compilers	~2 min
[6/8]	Clone OpenSees from GitHub	~1 min
[7/8]	Build OpenSees (3 targets, -j32)	~10 min
[8/8]	Create run scripts + verify	<1 min

Total: ~25-35 minutes on h3-standard-88.
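Steps [1/8] and [2/8] follow Intel's standard YUM setup. The sketch below assumes Intel's published oneAPI.repo contents. The function names and the four component packages are illustrative and do not reflect the script's actual 91-package list. The key is imported before `dnf install`, which matches the GPG item under Known issues.

```shell
# Sketch of steps [1/8]-[2/8]: register Intel's oneAPI YUM repo, import the
# signing key, then install. Function names and packages are illustrative.
INTEL_GPG_KEY_URL="https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB"

# Write the repo definition to $1 (default: the system repo directory).
write_oneapi_repo() {
  local dest="${1:-/etc/yum.repos.d/oneAPI.repo}"
  cat > "$dest" <<EOF
[oneAPI]
name=Intel oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=${INTEL_GPG_KEY_URL}
EOF
}

# Must run as root. The key is imported before dnf runs its signature check.
install_oneapi() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "install_oneapi: must run as root" >&2
    return 1
  fi
  write_oneapi_repo || return 1
  rpm --import "$INTEL_GPG_KEY_URL" || return 1
  # Example components: icx/icpx, ifx, MKL and Intel MPI (assumed selection)
  dnf install -y intel-oneapi-compiler-dpcpp-cpp intel-oneapi-compiler-fortran \
                 intel-oneapi-mkl-devel intel-oneapi-mpi-devel
}
```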

Build location

Everything goes to /opt/opensees/:

/opt/opensees/
    OpenSees/          # source + build
    mumps/             # MUMPS source + build
    buildenv/          # Python venv with Conan

The build path is hardcoded to avoid permission issues with user home directories on GCP (gcloud creates homes with restrictive permissions).
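Creating the fixed tree could look like the sketch below. make_opensees_layout is a hypothetical helper. The 0755 mode is an assumption, chosen so that non-root users can traverse and read the tree.

```shell
# Sketch: create the fixed /opt/opensees layout (hypothetical helper).
OPENSEES_ROOT_DEFAULT=/opt/opensees

make_opensees_layout() {
  local root="${1:-$OPENSEES_ROOT_DEFAULT}" sub
  for sub in OpenSees mumps buildenv; do
    # -p makes this idempotent, so the build can be rerun safely
    mkdir -p "$root/$sub" || return 1
    # 0755: owner rwx, everyone else r-x (assumed; lets any user read the build)
    chmod 0755 "$root/$sub" || return 1
  done
  chmod 0755 "$root"
}
```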

Run scripts

The build creates wrapper scripts in /opt/opensees/ and symlinks them into /usr/local/bin/:

opensees → sequential Tcl interpreter
openseessp → parallel domain decomposition (mpirun -np N OpenSeesSP)
openseesmp → parallel multi-interpreter (mpirun -np N OpenSeesMP)
opensees_build → rebuild all targets
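The symlinking step might be implemented as follows. install_wrappers is a hypothetical name. The source and target directories default to the paths above.

```shell
# Sketch: expose the wrapper scripts as system commands (hypothetical helper).
install_wrappers() {
  local src_dir="${1:-/opt/opensees}" bin_dir="${2:-/usr/local/bin}" name
  for name in opensees openseessp openseesmp opensees_build; do
    if [ ! -f "$src_dir/$name" ]; then
      echo "install_wrappers: $src_dir/$name not found" >&2
      return 1
    fi
    chmod 0755 "$src_dir/$name" || return 1
    # -s symbolic link, -f replace an existing link, -n don't follow an
    # existing link that points at a directory
    ln -sfn "$src_dir/$name" "$bin_dir/$name" || return 1
  done
}
```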

Each wrapper:

Sources Intel oneAPI environment
Reads config from /etc/opensees/opensees.conf
Sets MKL threading (1 thread per rank)
Sets Intel MPI pinning (sequential, one rank per core)
Runs the binary
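A wrapper that follows these five steps could look roughly like the sketch below, which is not the generated script. It assumes the config file uses KEY=VALUE shell syntax, and the OpenSeesSP path is hypothetical. /opt/intel/oneapi/setvars.sh is oneAPI's default install location.

```shell
#!/usr/bin/env bash
# Sketch of an openseessp-style wrapper. Assumptions: opensees.conf is
# KEY=VALUE shell syntax; OPENSEES_SP_BIN is a hypothetical binary path.
ONEAPI_SETVARS="${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}"
OPENSEES_CONF="${OPENSEES_CONF:-/etc/opensees/opensees.conf}"
OPENSEES_SP_BIN="${OPENSEES_SP_BIN:-/opt/opensees/OpenSees/build/bin/OpenSeesSP}"

opensees_env() {
  # 1. Intel oneAPI environment (compilers, MKL, Intel MPI), if installed
  if [ -f "$ONEAPI_SETVARS" ]; then
    . "$ONEAPI_SETVARS" >/dev/null 2>&1
  fi
  # 2. Site config, sourced into this shell so it can set e.g. OPENSEES_SP_BIN
  if [ -f "$OPENSEES_CONF" ]; then
    . "$OPENSEES_CONF" || return 1
  fi
  # 3. MKL: one thread per MPI rank, no dynamic thread adjustment
  export MKL_NUM_THREADS=1 MKL_DYNAMIC=FALSE
  # 4. Intel MPI: one rank pinned per core, single-node shared-memory transport
  export I_MPI_PIN=on I_MPI_PIN_DOMAIN=core I_MPI_FABRICS=shm
  export I_MPI_SHM_HEAP_VSIZE=4096
}

# 5. Run the binary: run_openseessp NP model.tcl [args...]
run_openseessp() {
  local np="$1"
  shift
  opensees_env || return 1
  mpirun -np "$np" "$OPENSEES_SP_BIN" "$@"
}
```

For example, `run_openseessp 44 model.tcl` would launch 44 pinned ranks with MKL limited to one thread each.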

Intel MPI tuning

The run scripts set these environment variables, tuned for h3-standard-88:

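export MKL_NUM_THREADS=1           # 1 MKL thread per MPI rank
export MKL_DYNAMIC=FALSE           # don't let MKL change the thread count
export I_MPI_PIN=on                # enable Intel MPI process pinning
export I_MPI_PIN_DOMAIN=core       # one core per rank
export I_MPI_FABRICS=shm           # shared memory only (single node)
export I_MPI_SHM_HEAP_VSIZE=4096   # 4096 MB (4 GB) virtual shared-memory heap per rank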

Note: I_MPI_PIN_ORDER=scatter was removed. Sequential rank placement (the default) works correctly for both single-job and multi-job scenarios. With scatter, running two analyses at the same time caused 10x slowdowns, because the ranks of both jobs were interleaved across all 4 NUMA nodes (h3-standard-88 has 2 sockets x 44 cores, with SNC2 splitting each socket into 2 NUMA nodes). For multi-job runs, pin each job to its own socket: I_MPI_PIN_PROCESSOR_LIST=0-43 for one job and I_MPI_PIN_PROCESSOR_LIST=44-87 for the other.
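The per-socket CPU lists can be derived instead of typed by hand. This sketch assumes the CPU numbering given above, with socket 0 covering CPUs 0-43 and socket 1 covering CPUs 44-87. socket_cpu_list is a hypothetical helper, and job_a.tcl and job_b.tcl are placeholder models.

```shell
# Sketch: CPU range for one socket, assuming contiguous numbering
# (h3-standard-88: 44 cores per socket). Hypothetical helper.
socket_cpu_list() {
  local socket="$1" cores_per_socket="${2:-44}" first
  first=$((socket * cores_per_socket))
  echo "${first}-$((first + cores_per_socket - 1))"
}

# Example (not run here; assumes the oneAPI environment is loaded):
#   I_MPI_PIN_PROCESSOR_LIST=$(socket_cpu_list 0) mpirun -np 44 OpenSeesSP job_a.tcl &
#   I_MPI_PIN_PROCESSOR_LIST=$(socket_cpu_list 1) mpirun -np 44 OpenSeesSP job_b.tcl &
#   wait
```

`lscpu -p=CPU,SOCKET` shows the actual CPU-to-socket mapping on the VM. Setting I_MPI_DEBUG=4 makes Intel MPI print its pinning map at startup, which is a quick way to confirm that the two jobs do not share cores.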

Known issues

GPG key: The Intel key must be imported with rpm --import before dnf install, otherwise the signature check fails
Python 3.6: The Rocky 8 system Python is too old for Conan; the script auto-detects Python 3.11
-j88: Too many parallel compile jobs; the build is capped at -j32
OpenSeesPy: CMake requires the Python headers even though OpenSeesPy is not built; the script passes Python_INCLUDE_DIRS
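The last three workarounds can be sketched as small shell helpers. The function names, the candidate interpreter list, and the cmake invocation are assumptions. Only Python_INCLUDE_DIRS and the -j32 cap come from the behavior described above.

```shell
# Python: print the first interpreter that is at least 3.<min_minor>.
# Candidate names are assumed (Rocky 8 can install python3.11 as a package).
find_python() {
  local min_minor="$1" cand
  for cand in python3.11 python3.12 python3; do
    command -v "$cand" >/dev/null 2>&1 || continue
    if "$cand" -c "import sys; sys.exit(0 if sys.version_info[:2] >= (3, $min_minor) else 1)"; then
      command -v "$cand"
      return 0
    fi
  done
  return 1
}

# -j cap: use all cores up to a ceiling (32, per the list above).
cap_jobs() {
  local cores="$1" max="${2:-32}"
  if [ "$cores" -gt "$max" ]; then echo "$max"; else echo "$cores"; fi
}

# Python headers: ask the chosen interpreter where its C headers live.
python_include_dir() {
  "$1" -c "import sysconfig; print(sysconfig.get_paths()['include'])"
}

# Example wiring (not run here; paths are illustrative):
#   PY=$(find_python 11)
#   "$PY" -m venv /opt/opensees/buildenv
#   /opt/opensees/buildenv/bin/pip install conan
#   cmake -S /opt/opensees/OpenSees -B /opt/opensees/OpenSees/build \
#         -DPython_INCLUDE_DIRS="$(python_include_dir "$PY")"
#   cmake --build /opt/opensees/OpenSees/build -j "$(cap_jobs "$(nproc)")"
```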