Troubleshooting¶
Setup issues¶
"No GCP project set"¶
gcloud config set project <YOUR_PROJECT_ID>
"Auth FAILED"¶
gcloud auth application-default login
"H3_CPUS quota = 0"¶
Request a quota increase under IAM & Admin > Quotas in the Cloud Console: filter for H3_CPUS and request 88, the vCPU count of one h3-standard-88 VM. Approval takes anywhere from a few hours to a couple of days.
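To check whether the increase has been applied, the quota can be read from a region description. The sketch below assumes H3_CPUS appears in the `quotas` list of `gcloud compute regions describe <REGION> --format=json` with `metric`, `limit`, and `usage` fields. The helper name `find_quota` and the sample data are illustrative and not part of ops-hpc.

```python
import json


def find_quota(region_description: dict, metric: str = "H3_CPUS"):
    """Return (limit, usage) for a quota metric, or None if it is not listed."""
    for quota in region_description.get("quotas", []):
        if quota.get("metric") == metric:
            return quota.get("limit", 0), quota.get("usage", 0)
    return None


# Illustrative data only. Real input would come from something like:
#   gcloud compute regions describe us-central1 --format=json > region.json
sample = json.loads('{"quotas": [{"metric": "H3_CPUS", "limit": 0.0, "usage": 0.0}]}')

result = find_quota(sample)
if result is None:
    print("H3_CPUS is not listed for this region")
else:
    limit, usage = result
    if limit - usage < 88:
        print("H3_CPUS quota is too low for one h3-standard-88 VM")
```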
"OS Login NOT enabled"¶
OS Login is required for reliable SSH access. Enable it at the project level:
gcloud compute project-info add-metadata --project=<PROJECT_ID> --metadata enable-oslogin=TRUE
After enabling it, your SSH username is your Google account email with special characters such as @ and . replaced by underscores (e.g. user@example.com becomes user_example_com). Update your profile to match:
ops-hpc config set username <your_oslogin_username>
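The substitution rule described above can be sketched in a few lines. This is an approximation based only on that rule, and `guess_oslogin_username` is a hypothetical helper, not part of ops-hpc. The username reported by `gcloud compute os-login describe-profile` is authoritative and may differ, for example for accounts outside your organization.

```python
import re


def guess_oslogin_username(email: str) -> str:
    """Approximate an OS Login username: lowercase the email, then replace
    every character that is not a letter or digit with an underscore."""
    return re.sub(r"[^a-z0-9]", "_", email.strip().lower())


print(guess_oslogin_username("user@example.com"))  # user_example_com
```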
"Permission denied" or wrong SSH username¶
With OS Login enabled, gcloud uses your Google-identity username. Make sure your profile username matches:
ops-hpc config show # check username field
If it doesn't match your OS Login username, update it:
ops-hpc config set username <your_oslogin_username>
"VM service account lacks storage write access"¶
ops-hpc setup should fix this automatically. If it fails, grant the role manually:
gcloud projects add-iam-policy-binding <PROJECT_ID> \
--member="serviceAccount:<NUMBER>-compute@developer.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
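Note that `<NUMBER>` is the numeric project number, not the project ID. It can be read with `gcloud projects describe <PROJECT_ID> --format="value(projectNumber)"`. As a minimal sketch, the member string and the full command can be assembled as follows. The function names and the example project values are hypothetical.

```python
def default_compute_sa_member(project_number) -> str:
    """Build the IAM member string for the default Compute Engine service account."""
    number = str(project_number)
    if not number.isdigit():
        raise ValueError("expected the numeric project number, not the project ID")
    return f"serviceAccount:{number}-compute@developer.gserviceaccount.com"


def grant_command(project_id: str, project_number) -> list:
    """Return the gcloud argv that grants storage write access to that account."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={default_compute_sa_member(project_number)}",
        "--role=roles/storage.objectAdmin",
    ]


# Example values are placeholders, not a real project.
print(" ".join(grant_command("my-project", 123456789012)))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting.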
Build issues¶
"pd-ssd disk type cannot be used by h3-standard-88"¶
Fixed in code. H3 machine types do not support pd-ssd, so the VM now uses a pd-balanced disk.
"GPG check FAILED" during Intel oneAPI install¶
Fixed in code. The build script now pre-imports the Intel GPG key with rpm --import.
Build hangs with no output¶
On Windows, gcloud uses PuTTY for SSH, and PuTTY can buffer output: the build is still running, but nothing is displayed. Monitor it from a second SSH session:
gcloud compute ssh <VM_NAME> --zone=us-central1-a
watch -n 3 'ps aux | grep -E "cmake|make|icx|dnf" | grep -v grep; echo "---"; df -h /'
"cmake --build" crashes with -j88¶
Fixed in code. Parallel jobs capped to 32.
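If you run the build by hand, the same cap can be applied when choosing the job count. The following is a sketch of the idea rather than the build script's actual code. The function names and the `build` directory are assumptions, while `cmake --build <dir> --parallel <jobs>` is standard CMake (3.12 or later).

```python
import os

MAX_BUILD_JOBS = 32  # the cap described above


def build_jobs(cpu_count=None, cap=MAX_BUILD_JOBS) -> int:
    """Use one job per CPU, but never more than the cap and never fewer than 1."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count, cap))


def cmake_build_command(build_dir="build", cpu_count=None) -> list:
    """Return the cmake argv for a capped parallel build."""
    return ["cmake", "--build", build_dir, "--parallel", str(build_jobs(cpu_count))]


print(cmake_build_command(cpu_count=88))
```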
"CMake Error: _Python_INCLUDE_DIR NOTFOUND"¶
Fixed in code. Build script provides Python headers to cmake.
Analysis issues¶
"Permission denied" on .mpco files¶
This is a file ownership mismatch: the gcloud SSH user is not the user that ran OpenSees and wrote the files. The pipeline avoids this by using a consistent remote directory. If you are running manually:
sudo chmod -R 777 /tmp/ops-hpc-analysis/
Analysis times out¶
Extend the deadline in the GUI (+30m / +60m / +120m buttons), or increase the timeout in the jobs file.
"gsutil: AccessDeniedException 403"¶
The VM service account needs the roles/storage.objectAdmin role. Run ops-hpc setup to grant it automatically.
GUI issues¶
GUI opens then closes immediately¶
Run with error capture:
python -c "
import traceback
try:
    from ops_hpc.gui.app import launch_gui
    launch_gui()
except Exception as e:
    traceback.print_exc()
    input('Press Enter')
"
GUI stuck after panic buttons¶
Fixed in code. Panic buttons now check if VM is still alive and reset GUI state accordingly.
No deadline timer showing¶
The deadline timer only appears while an analysis is running, and only if that analysis was started from the GUI (not the CLI).
Cost issues¶
"How much did I spend?"¶
ops-hpc status --usage
For exact billing, check the GCP Billing Console.
Forgot to delete a VM¶
ops-hpc status # check for running VMs
ops-hpc cleanup # delete all OpenSees VMs
Image costs too much¶
Archive the image when you are not running analyses:
ops-hpc image archive # ~$0.65/mo instead of ~$2.50/mo
ops-hpc image restore # bring back when ready (5-15 min)
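As a rough guide to whether archiving is worth the restore delay, the saving can be estimated from the two approximate monthly figures quoted above. The constants simply restate those figures, and actual charges depend on image size and region.

```python
ACTIVE_PER_MONTH = 2.50    # approximate cost of keeping the image active
ARCHIVED_PER_MONTH = 0.65  # approximate cost while archived


def archive_savings(months_archived: float) -> float:
    """Estimated savings in dollars from archiving for the given number of months."""
    return round((ACTIVE_PER_MONTH - ARCHIVED_PER_MONTH) * months_archived, 2)


print(archive_savings(6))  # about $11 over six months
```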