Build Script

src/ops_hpc/build_opensees_full.sh runs on the VM as root. It installs everything needed to run OpenSees with Intel MPI on Rocky Linux 8.

What it installs

  1. Intel oneAPI: icx (C), icpx (C++), ifx (Fortran), MKL, Intel MPI
  2. MUMPS 5.5.1: sparse direct solver, built with Intel compilers
  3. OpenSees: sequential, SP (domain decomposition), MP (multi-interpreter)
  4. System commands: opensees, openseessp, openseesmp, opensees_build

Build steps

Step    What                                           Time
[1/8]   Add Intel oneAPI repo                          <1 min
[2/8]   Install oneAPI + tools (91 packages, 12 GB)    ~5 min
[2b]    Kernel tuning (hugepages, shmmax)              <1 min
[3/8]   Source oneAPI environment                      <1 min
[4/8]   Python venv + Conan package manager            ~2 min
[5/8]   Build MUMPS with Intel compilers               ~2 min
[6/8]   Clone OpenSees from GitHub                     ~1 min
[7/8]   Build OpenSees (3 targets, -j32)               ~10 min
[8/8]   Create run scripts + verify                    <1 min

Total: ~25-35 minutes on h3-standard-88.

Build location

Everything goes to /opt/opensees/:

/opt/opensees/
    OpenSees/          # source + build
    mumps/             # MUMPS source + build
    buildenv/          # Python venv with Conan

The build path is hardcoded to avoid permission issues with user home directories on GCP (gcloud creates homes with restrictive permissions).

Run scripts

The build creates wrapper scripts in /opt/opensees/ and symlinks them into /usr/local/bin/:

  • opensees → sequential Tcl interpreter
  • openseessp → parallel domain decomposition (mpirun -np N OpenSeesSP)
  • openseesmp → parallel multi-interpreter (mpirun -np N OpenSeesMP)
  • opensees_build → rebuild all targets

Each wrapper:

  1. Sources Intel oneAPI environment
  2. Reads config from /etc/opensees/opensees.conf
  3. Sets MKL threading (1 thread per rank)
  4. Sets Intel MPI pinning (sequential, one rank per core)
  5. Runs the binary
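
The five steps above can be sketched as a small bash wrapper. This is an illustrative outline, not the script the build generates: the function names opensees_env and run_sp are hypothetical, /opt/intel/oneapi/setvars.sh is assumed to be the oneAPI install location, and the config file is assumed to contain plain shell KEY=VALUE lines.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wrapper such as openseessp (not the generated script).

# Assumed paths; both can be overridden from the caller's environment.
ONEAPI_SETVARS="${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}"
OPENSEES_CONF="${OPENSEES_CONF:-/etc/opensees/opensees.conf}"

opensees_env() {
    # 1. Source the Intel oneAPI environment, if it is installed.
    if [ -r "$ONEAPI_SETVARS" ]; then
        . "$ONEAPI_SETVARS" >/dev/null 2>&1 || true
    fi
    # 2. Read the site config (assumed to be shell-style KEY=VALUE lines).
    if [ -r "$OPENSEES_CONF" ]; then
        . "$OPENSEES_CONF"
    fi
    # 3. MKL threading: one thread per MPI rank.
    export MKL_NUM_THREADS=1
    export MKL_DYNAMIC=FALSE
    # 4. Intel MPI pinning: one rank per core, shared-memory fabric.
    export I_MPI_PIN=on
    export I_MPI_PIN_DOMAIN=core
    export I_MPI_FABRICS=shm
    export I_MPI_SHM_HEAP_VSIZE=4096
}

# 5. Run the binary. Usage: run_sp N model.tcl [args...]
run_sp() {
    local np="$1"
    shift
    opensees_env
    mpirun -np "$np" OpenSeesSP "$@"
}
```

Keeping the environment setup in its own function means the sequential, SP, and MP wrappers could share it and differ only in the final command.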

Intel MPI tuning

The run scripts set these environment variables, tuned for single-node runs on h3-standard-88:

MKL_NUM_THREADS=1           # 1 MKL thread per MPI rank
MKL_DYNAMIC=FALSE
I_MPI_PIN=on
I_MPI_PIN_DOMAIN=core
I_MPI_FABRICS=shm           # shared memory (single node)
I_MPI_SHM_HEAP_VSIZE=4096   # 4 GB shared memory heap

Note: I_MPI_PIN_ORDER=scatter was removed. Sequential rank placement (the default) works correctly for both single-job and multi-job scenarios. Scatter caused 10x slowdowns when two analyses ran simultaneously, because ranks from both jobs were interleaved across all 4 NUMA nodes (h3-standard-88 = 2 sockets x 44 cores, with SNC2 splitting each socket into 2 NUMA nodes). For multi-job runs, pin each job to its own socket: I_MPI_PIN_PROCESSOR_LIST=0-43 for the first job and I_MPI_PIN_PROCESSOR_LIST=44-87 for the second.
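
As a rough illustration of the per-socket pinning described in the note, the helper below computes the CPU range for a given socket. The function name socket_cpu_list and the model file names are hypothetical, and the sketch assumes CPUs are numbered contiguously per socket (0-43, then 44-87) as stated above.

```shell
#!/usr/bin/env bash
# socket_cpu_list SOCKET [CORES_PER_SOCKET]
# Prints the CPU range of one socket, assuming contiguous numbering per socket.
socket_cpu_list() {
    local socket="$1"
    local cores="${2:-44}"          # 44 cores per socket on h3-standard-88
    local first=$(( socket * cores ))
    local last=$(( first + cores - 1 ))
    echo "${first}-${last}"
}

# Example: two simultaneous analyses, one per socket (model names are placeholders).
#   I_MPI_PIN_PROCESSOR_LIST="$(socket_cpu_list 0)" mpirun -np 44 OpenSeesSP model_a.tcl &
#   I_MPI_PIN_PROCESSOR_LIST="$(socket_cpu_list 1)" mpirun -np 44 OpenSeesSP model_b.tcl &
#   wait
```

One caveat worth checking: Intel's MPI reference indicates that I_MPI_PIN_PROCESSOR_LIST is ignored when I_MPI_PIN_DOMAIN is defined, so the explicit list may not take effect unless I_MPI_PIN_DOMAIN is unset for that run. Running with I_MPI_DEBUG=4 prints the actual rank-to-CPU map, which is a quick way to confirm placement.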

Known issues

  • GPG key: the Intel key must be imported with rpm --import before dnf install, or the signature check fails
  • Python 3.6: too old for Conan, so the script auto-detects Python 3.11
  • -j88: too many parallel compiles on the 88-core VM, so the build is capped at -j32
  • OpenSeesPy: CMake needs the Python headers even though OpenSeesPy is not built, so the script provides Python_INCLUDE_DIRS
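
Two of these workarounds, the job cap and the interpreter detection, can be sketched in a few lines of bash. The helper names cap_jobs and pick_python are hypothetical and the candidate list is an assumption; the actual build script may detect Python differently.

```shell
#!/usr/bin/env bash
# cap_jobs [MAX]: print min(number of CPUs, MAX) for use with make -j.
cap_jobs() {
    local max="${1:-32}"
    local n
    n="$(nproc)"
    if [ "$n" -gt "$max" ]; then
        echo "$max"
    else
        echo "$n"
    fi
}

# pick_python CANDIDATE...: print the path of the first candidate found on PATH.
pick_python() {
    local py
    for py in "$@"; do
        if command -v "$py" >/dev/null 2>&1; then
            command -v "$py"
            return 0
        fi
    done
    return 1   # no candidate available
}

# Example use inside a build script:
#   PYTHON="$(pick_python python3.11)" || { echo "Python 3.11 not found" >&2; exit 1; }
#   make -j"$(cap_jobs 32)"
```

On h3-standard-88, cap_jobs 32 prints 32; on a smaller VM it falls back to the CPU count.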