# Build Script
`src/ops_hpc/build_opensees_full.sh` runs on the VM as root. It installs everything needed to run OpenSees with Intel MPI on Rocky Linux 8.
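A script with these preconditions can refuse to start when they are not met. The sketch below is a hypothetical preflight check (the `preflight` name and its optional override arguments are not taken from `build_opensees_full.sh`); it assumes the OS is identified through the standard `/etc/os-release` file, where Rocky Linux reports `ID="rocky"`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: succeed only as root on Rocky Linux 8.
# Both arguments are optional overrides so the logic can be tested without root.
preflight() {
    local uid=${1:-$(id -u)}
    local os_release=${2:-/etc/os-release}

    if [ "$uid" -ne 0 ]; then
        echo "error: must run as root (uid=$uid)" >&2
        return 1
    fi
    if [ ! -r "$os_release" ]; then
        echo "error: cannot read $os_release" >&2
        return 1
    fi

    # os-release is a shell-compatible KEY=value file; source it in a
    # subshell so its variables do not leak into the caller.
    local os_id os_version
    os_id=$( . "$os_release" && echo "${ID:-}" )
    os_version=$( . "$os_release" && echo "${VERSION_ID:-}" )

    # Compare only the major version (e.g. "8.9" -> "8").
    if [ "$os_id" != "rocky" ] || [ "${os_version%%.*}" != "8" ]; then
        echo "error: expected Rocky Linux 8, found ${os_id:-unknown} ${os_version:-?}" >&2
        return 1
    fi
}
```

At the top of the build script this would be used as `preflight || exit 1`.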
## What it installs
- Intel oneAPI: `icx` (C), `icpx` (C++), `ifx` (Fortran), MKL, Intel MPI
- MUMPS 5.5.1: sparse direct solver, built with the Intel compilers
- OpenSees: sequential, SP (domain decomposition), and MP (multi-interpreter) builds
- System commands: `opensees`, `openseessp`, `openseesmp`, `opensees_build`
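After a build, one way to confirm that the system commands and the Intel toolchain are on `PATH` is a loop over the shell builtin `command -v`. This `check_commands` helper is hypothetical and not part of the script; it is only a sketch of such a post-install check.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check: report every command that is not on PATH.
# Returns 0 only if all commands were found.
check_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok      $cmd -> $(command -v "$cmd")"
        else
            echo "MISSING $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_commands opensees openseessp openseesmp opensees_build icx icpx ifx mpirun`.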
## Build steps
| Step | What | Time |
|---|---|---|
| [1/8] | Add Intel oneAPI repo | <1 min |
| [2/8] | Install oneAPI + tools (91 packages, 12 GB) | ~5 min |
| [2b] | Kernel tuning (hugepages, shmmax) | <1 min |
| [3/8] | Source oneAPI environment | <1 min |
| [4/8] | Python venv + Conan package manager | ~2 min |
| [5/8] | Build MUMPS with Intel compilers | ~2 min |
| [6/8] | Clone OpenSees from GitHub | ~1 min |
| [7/8] | Build OpenSees (3 targets, -j32) | ~10 min |
| [8/8] | Create run scripts + verify | <1 min |
Total: ~25-35 minutes on h3-standard-88.
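Step [2b] is summarized only as "hugepages, shmmax". The sketch below shows one way to render such settings as a sysctl drop-in; the policy (`kernel.shmmax` equal to total RAM, a caller-chosen hugepage count) and the file name `90-opensees.conf` are assumptions, not the values the script actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical [2b] helper: print sysctl settings for shmmax and hugepages.
# Assumed policy: kernel.shmmax = total RAM in bytes; hugepage count passed in.
render_sysctl() {
    local total_kb=$1     # total memory in kB (as reported by /proc/meminfo)
    local hugepages=$2    # number of hugepages to reserve
    echo "kernel.shmmax = $(( total_kb * 1024 ))"
    echo "vm.nr_hugepages = $hugepages"
}

# Read MemTotal (in kB) from /proc/meminfo.
mem_total_kb() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}
```

As root, the output would be written to a drop-in and loaded, e.g. `render_sysctl "$(mem_total_kb)" 1024 > /etc/sysctl.d/90-opensees.conf && sysctl --system`.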
## Build location
Everything goes to `/opt/opensees/`:

```
/opt/opensees/
    OpenSees/     # source + build
    mumps/        # MUMPS source + build
    buildenv/     # Python venv with Conan
```
The build path is hardcoded to avoid permission issues with user home directories on GCP (gcloud creates homes with restrictive permissions).
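The layout can be expressed as a few path variables derived from one hardcoded root. In this sketch the variable names and the `make_layout` helper are hypothetical; the optional argument exists only so the logic can be tried outside `/opt`.

```bash
#!/usr/bin/env bash
# Hypothetical path variables mirroring the documented layout.
BUILD_ROOT=${BUILD_ROOT:-/opt/opensees}
OPENSEES_DIR=$BUILD_ROOT/OpenSees   # source + build
MUMPS_DIR=$BUILD_ROOT/mumps         # MUMPS source + build
VENV_DIR=$BUILD_ROOT/buildenv       # Python venv with Conan

# Create the root with 755 permissions so every user can traverse it,
# unlike the restrictive gcloud-created home directories.
make_layout() {
    local root=${1:-$BUILD_ROOT}
    mkdir -p "$root"
    chmod 755 "$root"
}
```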
## Run scripts
The build creates wrapper scripts in `/opt/opensees/` and symlinks them into `/usr/local/bin/`:

- `opensees` → sequential Tcl interpreter
- `openseessp` → parallel domain decomposition (`mpirun -np N OpenSeesSP`)
- `openseesmp` → parallel multi-interpreter (`mpirun -np N OpenSeesMP`)
- `opensees_build` → rebuild all targets
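Installing the wrappers comes down to marking them executable and symlinking them into a directory on `PATH`. The `link_wrappers` helper below is a hypothetical sketch; its defaults match the documented locations and both can be overridden for testing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: link each wrapper from the build root into a bin dir.
link_wrappers() {
    local src_dir=${1:-/opt/opensees}
    local bin_dir=${2:-/usr/local/bin}
    local cmd
    for cmd in opensees openseessp openseesmp opensees_build; do
        chmod 755 "$src_dir/$cmd"
        # -s symbolic link, -f replace an existing link,
        # -n treat an existing link to a directory as a plain file
        ln -sfn "$src_dir/$cmd" "$bin_dir/$cmd"
    done
}
```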
Each wrapper:
- Sources the Intel oneAPI environment
- Reads config from `/etc/opensees/opensees.conf`
- Sets MKL threading (1 thread per rank)
- Sets Intel MPI pinning (sequential, one rank per core)
- Runs the binary
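A minimal sketch of a wrapper that follows these five steps, using `openseessp` as the example. Several details are assumptions: `/opt/intel/oneapi/setvars.sh` is the oneAPI installer's default location, while the binary path under `/opt/opensees/OpenSees/build/`, the `NP` key in the config file, and the `DRY_RUN=1` switch (print the command instead of running it) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical openseessp wrapper: mpirun -np N OpenSeesSP <args>
set -euo pipefail

ONEAPI_SETVARS=${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}      # default install path
CONF=${CONF:-/etc/opensees/opensees.conf}
OPENSEES_SP=${OPENSEES_SP:-/opt/opensees/OpenSees/build/OpenSeesSP} # assumed path

# 1. Intel oneAPI environment. Sourced inside a function so setvars.sh sees
#    no positional arguments from the caller; -e/-u are relaxed while
#    sourcing a third-party script.
load_oneapi() {
    set +eu
    . "$ONEAPI_SETVARS" >/dev/null
    set -eu
}
if [ -f "$ONEAPI_SETVARS" ]; then load_oneapi; fi

# 2. Site config, e.g. a line "NP=88" (assumed key name).
if [ -r "$CONF" ]; then . "$CONF"; fi
NP=${NP:-$(nproc)}

# 3. One MKL thread per MPI rank, no dynamic thread adjustment.
export MKL_NUM_THREADS=1 MKL_DYNAMIC=FALSE

# 4. Intel MPI: pin one rank per core, shared-memory transport only.
export I_MPI_PIN=on I_MPI_PIN_DOMAIN=core I_MPI_FABRICS=shm I_MPI_SHM_HEAP_VSIZE=4096

# 5. Run the binary (or only print the command when DRY_RUN=1).
run_sp() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo mpirun -np "$NP" "$OPENSEES_SP" "$@"
    else
        exec mpirun -np "$NP" "$OPENSEES_SP" "$@"
    fi
}
```

In a real wrapper the file would end with `run_sp "$@"`; it is left out here so the sketch can be sourced without launching MPI.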
## Intel MPI tuning
The run scripts set these environment variables, tuned for single-node runs on h3-standard-88:
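```bash
MKL_NUM_THREADS=1           # 1 MKL thread per MPI rank
MKL_DYNAMIC=FALSE           # don't let MKL reduce the thread count dynamically
I_MPI_PIN=on                # pin ranks to CPUs
I_MPI_PIN_DOMAIN=core       # one core per rank
I_MPI_FABRICS=shm           # shared memory only (single node)
I_MPI_SHM_HEAP_VSIZE=4096   # shared-memory heap: 4096 MB (4 GB) per rank
```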
Note: `I_MPI_PIN_ORDER=scatter` was removed. Sequential rank placement (the default) works correctly for both single-job and multi-job scenarios. Scatter caused 10x slowdowns when running two analyses at the same time, because ranks from both jobs were interleaved across all 4 NUMA nodes (h3-standard-88 has 2 sockets x 44 cores, and SNC2 splits each socket into 2 NUMA nodes). For multi-job runs, pin each job to its own socket: `I_MPI_PIN_PROCESSOR_LIST=0-43` for one job and `I_MPI_PIN_PROCESSOR_LIST=44-87` for the other.
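The per-socket ranges follow directly from the core count. This hypothetical `socket_cpu_list` helper assumes logical CPUs are numbered socket by socket, as the ranges above imply; `lscpu -e=CPU,NODE,SOCKET` shows the actual numbering on a given VM. The job file names in the usage comments are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: logical CPU range for one socket, assuming CPUs are
# numbered socket by socket (0-43 = socket 0, 44-87 = socket 1 on h3-standard-88).
socket_cpu_list() {
    local socket=$1 cores_per_socket=${2:-44}
    local first=$(( socket * cores_per_socket ))
    echo "$first-$(( first + cores_per_socket - 1 ))"
}

# Usage: two simultaneous 44-rank jobs, one per socket (placeholder inputs).
#   I_MPI_PIN_PROCESSOR_LIST=$(socket_cpu_list 0) mpirun -np 44 OpenSeesMP jobA.tcl &
#   I_MPI_PIN_PROCESSOR_LIST=$(socket_cpu_list 1) mpirun -np 44 OpenSeesMP jobB.tcl &
#   wait
```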
## Known issues
- GPG key: the Intel key must be imported with `rpm --import` before `dnf install`, or the signature check fails
- Python 3.6: too old for Conan; the script auto-detects Python 3.11
- `-j88`: too many parallel compiles; capped to `-j32`
- OpenSeesPy: CMake needs Python headers even though OpenSeesPy is not built; the script provides `Python_INCLUDE_DIRS`
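Two of these workarounds can be sketched as small helpers. Both are hypothetical: the minimum version 3.8 and the candidate order in `pick_python` are assumptions (the source only says 3.6 is too old and 3.11 is detected), and `make_jobs` simply reproduces the cap of 32 parallel compiles. `version_ge` relies on GNU `sort -V`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Python and parallel-compile workarounds.

# version_ge A B: succeed if dotted version A >= B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# pick_python [MIN] [CANDIDATES...]: print the path of the first
# interpreter whose major.minor version is >= MIN (3.8 is an assumption;
# Rocky 8's system python3 is 3.6).
pick_python() {
    local min=${1:-3.8}
    shift || true
    local candidates=("$@")
    [ ${#candidates[@]} -gt 0 ] || candidates=(python3.11 python3.9 python3)
    local py ver
    for py in "${candidates[@]}"; do
        command -v "$py" >/dev/null 2>&1 || continue
        ver=$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || continue
        if version_ge "$ver" "$min"; then
            command -v "$py"
            return 0
        fi
    done
    echo "error: no Python >= $min found" >&2
    return 1
}

# make_jobs [CORES] [CAP]: number of parallel compile jobs, capped at CAP.
make_jobs() {
    local cores=${1:-$(nproc)} cap=${2:-32}
    echo $(( cores < cap ? cores : cap ))
}
```

A build could then run `PY=$(pick_python) && "$PY" -m venv /opt/opensees/buildenv` and `cmake --build build -j "$(make_jobs)"`.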