Architecture
Package structure
src/ops_hpc/
    __main__.py                 # Entry point (GUI or CLI dispatch)
    config.py                   # Configuration dataclass
    cli.py                      # argparse CLI dispatcher
    core/                       # GCP SDK wrappers (no UI imports)
        compute.py              # Compute Engine: VM + image CRUD
        storage.py              # Cloud Storage: bucket + blob ops
        monitoring.py           # Monitoring API + SSH fallback
        ssh.py                  # gcloud compute ssh/scp wrappers + OS Login check
        vm_guard.py             # Quadruple-safety VM cleanup
        session_state.py        # Crash recovery state tracking
        download_tracker.py     # Persistent download progress
    commands/                   # One module per CLI command
        setup.py                # Check auth, OS Login, bucket, IAM, quota
        build.py                # Spot VM → build script → image → delete
        run.py                  # VM → analysis → GCS upload → delete
        results.py              # GCS → local download → verify → delete
        status.py               # Show image, VMs, bucket, costs
        image.py                # Image archive/restore/delete
        cleanup.py              # Force-wipe bucket, kill VMs
    gui/                        # PySide6 dashboard
        app.py                  # QApplication launcher
        main_window.py          # Single-window dashboard
        workers.py              # QThread wrappers (introspects signatures)
        widgets/
            status_bar.py       # Top bar indicators
            job_queue.py        # Job queue with load/save
            log_viewer.py       # Scrollable OpenSees output
            vm_monitor.py       # Text-based VM metrics
            panic_buttons.py    # Emergency controls + deadline
            settings_panel.py   # Configuration panel
Design principles
Backend/Frontend separation
All logic lives in core/ and commands/. GUI and CLI are thin wrappers that call the same functions with different callbacks:
# CLI passes print and a prompt that returns a bool, as on_confirm requires
execute(config, on_log=print, on_confirm=lambda msg: input(f"{msg} [y/n]: ").strip().lower() == "y")
# GUI passes Qt signals
execute(config, on_log=worker.log_line.emit, on_confirm=worker.confirm_requested.emit)
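The separation described above can be illustrated with a minimal sketch. The function `execute_demo` and its messages are hypothetical and only stand in for a real command; the point is that the logic never imports a UI toolkit and reports only through the callbacks it is given.

```python
from typing import Callable


def execute_demo(
    vm_name: str,
    on_log: Callable[[str], None] = print,
    on_confirm: Callable[[str], bool] = lambda msg: True,
) -> bool:
    """Hypothetical command: asks before deleting and reports via callbacks."""
    on_log(f"Preparing to delete {vm_name}")
    if not on_confirm(f"Delete {vm_name}?"):
        on_log("Cancelled")
        return False
    on_log(f"Deleted {vm_name}")
    return True


# A test (or a GUI) swaps in its own callbacks without touching the logic.
lines = []
result = execute_demo("vm-1", on_log=lines.append, on_confirm=lambda msg: False)
```

Because the callbacks are ordinary callables, the same function can be driven by `print` and `input` on the command line, by signal emitters in a GUI, or by list appends in a unit test.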
Callback pattern
Commands accept optional callbacks:
- on_log(str): log messages
- on_confirm(str) -> bool: confirmation dialogs
- on_progress(str, int, int): download progress
- on_metrics(dict): VM metrics updates
- on_deadline(Deadline): mutable deadline object
The CommandWorker uses inspect.signature to only pass callbacks the function accepts.
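The signature-based filtering can be sketched as follows. This is not the project's `CommandWorker`; the helper name `call_with_supported` and the sample `status` command are assumptions made for illustration. Only `inspect.signature` comes from the text above.

```python
import inspect
from typing import Any, Callable


def call_with_supported(func: Callable[..., Any], *args: Any, **callbacks: Any) -> Any:
    """Call func, passing only the callbacks its signature can accept."""
    params = inspect.signature(func).parameters
    # A function declaring **kwargs accepts every callback.
    accepts_any = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    accepted = {k: v for k, v in callbacks.items() if accepts_any or k in params}
    return func(*args, **accepted)


def status(config, on_log=print):
    """Hypothetical command that only understands on_log."""
    on_log(f"status for {config}")
    return "ok"


logged = []
outcome = call_with_supported(
    status,
    "demo-config",
    on_log=logged.append,
    on_progress=lambda name, done, total: None,  # dropped: status() has no on_progress
)
```

With this approach a worker can offer every callback it knows about, and each command declares only the ones it needs.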
Thread-safe deadline
class Deadline:
    def extend(self, minutes: int) -> None: ...   # thread-safe, called from GUI
    @property
    def remaining_s(self) -> float: ...           # thread-safe, read from worker
    @property
    def expired(self) -> bool: ...
GUI's extend buttons call deadline.extend(30) from the main thread. The worker's watchdog thread reads deadline.expired every 2 seconds.
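One way such a deadline could be made thread-safe is sketched below. The internals are an assumption: the use of `threading.Lock` and `time.monotonic`, and a constructor taking seconds (suggested by `Deadline(job.timeout_min * 60)` in the analysis pipeline), are not confirmed by the source.

```python
import threading
import time


class Deadline:
    """Sketch of a deadline that one thread extends while another polls it."""

    def __init__(self, seconds: float) -> None:
        self._lock = threading.Lock()
        # monotonic() is unaffected by wall-clock adjustments.
        self._expires_at = time.monotonic() + seconds

    def extend(self, minutes: int) -> None:
        with self._lock:
            self._expires_at += minutes * 60

    @property
    def remaining_s(self) -> float:
        with self._lock:
            return max(0.0, self._expires_at - time.monotonic())

    @property
    def expired(self) -> bool:
        return self.remaining_s <= 0.0


deadline = Deadline(0)          # already at its limit
was_expired = deadline.expired
deadline.extend(30)             # e.g. a GUI "+30 min" button
```

In this sketch `extend` adds to the stored expiry time, so extending a deadline that lapsed long ago may still leave it expired; whether the real class behaves that way is not stated in the source.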
Data flow
Build pipeline
Config.load() → create_instance(spot, hpc-rocky-8)
    → VMGuard.__enter__()
    → wait_for_ssh()
    → scp_upload(build_opensees_full.sh)
    → run_ssh("sudo bash build_opensees_full.sh 88")
    → stop_instance()
    → create_image_from_disk()
    → delete_instance()
    → VMGuard.disarm()
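The enter/disarm pairing in this pipeline suggests a context manager that deletes the VM on exit unless the normal path has already done so. The sketch below shows only that single layer, not the full "quadruple-safety" mechanism of `vm_guard.py`; the constructor arguments and the `delete_fn` callback are hypothetical.

```python
from typing import Callable


class VMGuard:
    """Sketch: delete the VM on exit unless disarm() was called."""

    def __init__(self, vm_name: str, delete_fn: Callable[[str], None]) -> None:
        self.vm_name = vm_name
        self._delete_fn = delete_fn  # hypothetical stand-in for delete_instance
        self._armed = True

    def disarm(self) -> None:
        self._armed = False

    def __enter__(self) -> "VMGuard":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self._armed:
            self._delete_fn(self.vm_name)
        return False  # never swallow the original exception


deleted = []

# Failure path: the build script raises, so the guard cleans up.
try:
    with VMGuard("build-vm", deleted.append):
        raise RuntimeError("build script failed")
except RuntimeError:
    pass

# Success path: the pipeline deleted the VM itself, then disarmed the guard.
with VMGuard("build-vm-2", deleted.append) as guard:
    guard.disarm()
```

Disarming only after the explicit `delete_instance()` step means any failure before that point still triggers cleanup.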
Analysis pipeline
Config.load() → create_instance(from opensees-hpc image)
    → VMGuard.__enter__()
    → wait_for_ssh()
    → for each job:
        → scp_upload(*.tcl, *.cdata, *.pltbg)
        → Deadline(job.timeout_min * 60)
        → run_ssh("openseesmp N main.tcl", stream=True)
        → _start_background_upload(gsutil → GCS)
    → wait for uploads
    → delete_instance()
    → VMGuard.disarm()
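The overlap between running the next job and uploading the previous job's results can be sketched with a thread pool. This is an illustration under assumptions: `run_jobs`, `run_analysis`, and `upload` are hypothetical names, and the source does not say how `_start_background_upload` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_jobs(jobs, run_analysis, upload, max_uploads: int = 2) -> int:
    """Run jobs one at a time; upload each result set in the background."""
    uploads = []
    with ThreadPoolExecutor(max_workers=max_uploads) as pool:
        for job in jobs:
            run_analysis(job)                         # blocks until the job finishes
            uploads.append(pool.submit(upload, job))  # returns immediately
        wait(uploads)                                 # all uploads done before VM deletion
    for future in uploads:
        future.result()                               # re-raise any upload error
    return len(uploads)


events = []
count = run_jobs(
    ["job-a", "job-b"],
    run_analysis=lambda job: events.append(("run", job)),
    upload=lambda job: events.append(("upload", job)),
)
```

Waiting for every upload before the VM is deleted matters because the result files live on the VM's disk until they reach the bucket.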
Results pipeline
poll_and_download():
    → list_result_sets(GCS bucket)
    → for each new set:
        → download_result_set(GCS → local)
        → verify_download(size matching)
        → delete_result_set(GCS)
    → DownloadTracker persists state
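The download, verify, delete sequence can be sketched without any cloud dependency. Here a plain dict stands in for the GCS bucket and a JSON file stands in for `DownloadTracker`. Both are assumptions for illustration, as is the choice to record a set in the tracker before removing the remote copy.

```python
import json
import tempfile
from pathlib import Path


def poll_and_download(bucket: dict, dest: Path, tracker_path: Path) -> list:
    """Download new result sets, verify by size, then delete the remote copy."""
    done = set(json.loads(tracker_path.read_text())) if tracker_path.exists() else set()
    downloaded = []
    for name in sorted(bucket):           # sorted() copies the keys, so deletion is safe
        if name in done:
            continue
        data = bucket[name]
        local = dest / name
        local.write_bytes(data)           # "download"
        if local.stat().st_size != len(data):
            local.unlink()                # size mismatch: keep the remote copy
            continue
        done.add(name)
        tracker_path.write_text(json.dumps(sorted(done)))  # persist progress
        del bucket[name]                  # remote copy removed only after verification
        downloaded.append(name)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    tracker = dest / "tracker.json"
    bucket = {"run-001.tar": b"abc", "run-002.tar": b"defgh"}
    first = poll_and_download(bucket, dest, tracker)
    second = poll_and_download(bucket, dest, tracker)   # nothing new to fetch
    tracked = json.loads(tracker.read_text())
```

Deleting the remote copy only after the size check passes means a truncated download never costs the only copy of a result set.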