Architecture

Package structure

src/ops_hpc/
    __main__.py              # Entry point (GUI or CLI dispatch)
    config.py                # Configuration dataclass
    cli.py                   # argparse CLI dispatcher

    core/                    # GCP SDK wrappers (no UI imports)
        compute.py           # Compute Engine: VM + image CRUD
        storage.py           # Cloud Storage: bucket + blob ops
        monitoring.py        # Monitoring API + SSH fallback
        ssh.py               # gcloud compute ssh/scp wrappers + OS Login check
        vm_guard.py          # Quadruple-safety VM cleanup
        session_state.py     # Crash recovery state tracking
        download_tracker.py  # Persistent download progress

    commands/                # One module per CLI command
        setup.py             # Check auth, OS Login, bucket, IAM, quota
        build.py             # Spot VM → build script → image → delete
        run.py               # VM → analysis → GCS upload → delete
        results.py           # GCS → local download → verify → delete
        status.py            # Show image, VMs, bucket, costs
        image.py             # Image archive/restore/delete
        cleanup.py           # Force-wipe bucket, kill VMs

    gui/                     # PySide6 dashboard
        app.py               # QApplication launcher
        main_window.py       # Single-window dashboard
        workers.py           # QThread wrappers (introspects signatures)
        widgets/
            status_bar.py    # Top bar indicators
            job_queue.py     # Job queue with load/save
            log_viewer.py    # Scrollable OpenSees output
            vm_monitor.py    # Text-based VM metrics
            panic_buttons.py # Emergency controls + deadline
            settings_panel.py # Configuration panel

Design principles

Backend/Frontend separation

All logic lives in core/ and commands/. The GUI and the CLI are thin wrappers that call the same functions and differ only in the callbacks they pass:

# CLI passes print
execute(config, on_log=print, on_confirm=lambda msg: input(f"{msg} [y/n]: ").strip().lower() == "y")

# GUI passes Qt signals
execute(config, on_log=worker.log_line.emit, on_confirm=worker.confirm_requested.emit)

Callback pattern

Commands accept optional callbacks:

  • on_log(str) — log messages
  • on_confirm(str) -> bool — confirmation dialogs
  • on_progress(str, int, int) — download progress
  • on_metrics(dict) — VM metrics updates
  • on_deadline(Deadline) — mutable deadline object

The CommandWorker uses inspect.signature to pass only the callbacks that the target function actually declares, so a command can ignore callbacks it does not need.
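A minimal sketch of that signature-based filtering is shown below. The helper name call_with_accepted and the toy execute function are hypothetical and are not taken from workers.py; the sketch only illustrates how inspect.signature can be used to drop callbacks a function does not declare.

```python
import inspect
from typing import Any, Callable


def call_with_accepted(func: Callable[..., Any], *args: Any, **callbacks: Any) -> Any:
    """Call func, forwarding only the keyword callbacks it declares."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the function accepts every callback.
    takes_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    accepted = {
        name: cb for name, cb in callbacks.items() if takes_var_kw or name in params
    }
    return func(*args, **accepted)


def execute(config: dict, on_log: Callable[[str], None] = print) -> str:
    # Hypothetical command that only knows about on_log.
    on_log(f"running with {config}")
    return "done"


logged: list[str] = []
result = call_with_accepted(
    execute,
    {"zone": "us-central1-a"},
    on_log=logged.append,
    on_progress=lambda name, done, total: None,  # dropped: execute() lacks it
)
```

Without the filter, passing on_progress to this execute would raise a TypeError, which is why filtering lets one worker class drive commands with different callback sets.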

Thread-safe deadline

class Deadline:
    def extend(self, minutes: int) -> None: ...  # thread-safe, called from GUI

    @property
    def remaining_s(self) -> float: ...          # thread-safe, read from worker

    @property
    def expired(self) -> bool: ...

The GUI's extend buttons call deadline.extend(30) from the main thread, while the worker's watchdog thread polls deadline.expired every 2 seconds.
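One way such a deadline could be implemented is sketched below, assuming a plain threading.Lock around a monotonic expiry timestamp. The constructor argument, the watchdog function and its parameters are assumptions for illustration; the project's actual class may differ.

```python
import threading
import time
from typing import Callable


class Deadline:
    """Deadline that one thread can extend while another polls it."""

    def __init__(self, seconds: float) -> None:
        self._lock = threading.Lock()
        self._expires_at = time.monotonic() + seconds

    def extend(self, minutes: int) -> None:
        with self._lock:
            self._expires_at += minutes * 60

    @property
    def remaining_s(self) -> float:
        with self._lock:
            return max(0.0, self._expires_at - time.monotonic())

    @property
    def expired(self) -> bool:
        return self.remaining_s <= 0.0


def watchdog(
    deadline: Deadline,
    on_expired: Callable[[], None],
    stop: threading.Event,
    poll_s: float = 2.0,
) -> None:
    # Poll until the deadline passes or the caller sets the stop event.
    while not stop.wait(poll_s):
        if deadline.expired:
            on_expired()
            return


# Usage: an already-expired deadline trips the watchdog on its first poll.
fired: list[bool] = []
stop = threading.Event()
short = Deadline(0.0)
thread = threading.Thread(
    target=watchdog, args=(short, lambda: fired.append(True), stop, 0.01)
)
thread.start()
thread.join(timeout=2)

# Extending from another thread pushes the expiry forward.
short.extend(30)
```

time.monotonic is used rather than time.time so that wall-clock adjustments on the host cannot shorten or lengthen a running job's budget.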

Data flow

Build pipeline

Config.load() → create_instance(spot, hpc-rocky-8)
  → VMGuard.__enter__()
    → wait_for_ssh()
    → scp_upload(build_opensees_full.sh)
    → run_ssh("sudo bash build_opensees_full.sh 88")
    → stop_instance()
    → create_image_from_disk()
    → delete_instance()
  → VMGuard.disarm()
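The guard's role in this pipeline can be sketched as a context manager that deletes the VM on exit unless it has been disarmed. This is a simplified illustration of only one safety layer; the constructor arguments and the injected delete function are assumptions, and the sketch calls disarm() inside the block, which may not match the real vm_guard.py.

```python
from typing import Callable


class VMGuard:
    """Sketch: delete the VM on exit unless the pipeline disarmed the guard."""

    def __init__(self, name: str, delete_instance: Callable[[str], None]) -> None:
        self._name = name
        self._delete = delete_instance  # injected so the sketch needs no GCP SDK
        self._armed = True

    def disarm(self) -> None:
        self._armed = False

    def __enter__(self) -> "VMGuard":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self._armed:
            self._delete(self._name)
        return False  # never swallow the original exception


deleted: list[str] = []

# Failure path: the build script raises, so the guard deletes the VM.
try:
    with VMGuard("build-vm", deleted.append):
        raise RuntimeError("build script failed")
except RuntimeError:
    pass

# Success path: the pipeline deletes the VM itself, then disarms the guard.
with VMGuard("build-vm-2", deleted.append) as guard:
    deleted.append("explicit:build-vm-2")  # stands in for delete_instance()
    guard.disarm()
```

In both paths the VM is deleted exactly once, which is the property that keeps a crashed build from leaving a billable instance running.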

Analysis pipeline

Config.load() → create_instance(from opensees-hpc image)
  → VMGuard.__enter__()
    → wait_for_ssh()
    → for each job:
        → scp_upload(*.tcl, *.cdata, *.pltbg)
        → Deadline(job.timeout_min * 60)
        → run_ssh("openseesmp N main.tcl", stream=True)
        → _start_background_upload(gsutil → GCS)
    → wait for uploads
    → delete_instance()
  → VMGuard.disarm()
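The overlap between running one job and uploading the previous job's results can be sketched with a thread pool. The function names run_jobs, run_job and upload are hypothetical stand-ins for the SSH and gsutil steps above, and the bucket URL is a placeholder.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


def run_jobs(
    jobs: list[str],
    run_job: Callable[[str], None],
    upload: Callable[[str], str],
) -> list[str]:
    """Run jobs one at a time while earlier results upload in the background."""
    uploaded: list[str] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending: list[Future] = []
        for job in jobs:
            run_job(job)  # blocking: the analysis occupies the VM
            pending.append(pool.submit(upload, job))  # overlaps the next job
        # "wait for uploads": result() blocks and re-raises upload errors,
        # so the caller only deletes the VM after every upload finished.
        for future in pending:
            uploaded.append(future.result())
    return uploaded


ran: list[str] = []


def fake_upload(job: str) -> str:
    # Placeholder for the gsutil copy; returns a made-up destination.
    return f"gs://example-bucket/{job}"


uploaded = run_jobs(["job-a", "job-b"], ran.append, fake_upload)
```

Collecting the futures in submission order keeps the returned list aligned with the job order even when uploads finish out of order.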

Results pipeline

poll_and_download():
  → list_result_sets(GCS bucket)
  → for each new set:
      → download_result_set(GCS → local)
      → verify_download(size matching)
      → delete_result_set(GCS)
  → DownloadTracker persists state
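The download, verify, delete ordering above can be sketched against an in-memory stand-in for the bucket. The dict-based bucket, the JSON tracker file and both signatures are assumptions made so the example runs without the GCP SDK; they are not the project's actual API.

```python
import json
import tempfile
from pathlib import Path


class DownloadTracker:
    """Remembers completed downloads in a JSON file so restarts skip them."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self.done = set(json.loads(path.read_text())) if path.exists() else set()

    def mark_done(self, name: str) -> None:
        self.done.add(name)
        self._path.write_text(json.dumps(sorted(self.done)))


def poll_and_download(
    bucket: dict[str, bytes], dest: Path, tracker: DownloadTracker
) -> list[str]:
    fetched = []
    for name in sorted(bucket):  # sorted() copies keys, so deleting is safe
        if name in tracker.done:
            del bucket[name]  # downloaded before a crash: only cleanup is left
            continue
        target = dest / name
        target.write_bytes(bucket[name])
        if target.stat().st_size != len(bucket[name]):
            continue  # size mismatch: keep the remote copy and retry next poll
        tracker.mark_done(name)  # persist first, then delete the remote copy
        del bucket[name]
        fetched.append(name)
    return fetched


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp)
    tracker = DownloadTracker(dest / "tracker.json")
    bucket = {"job1.tar": b"abc", "job2.tar": b"defgh"}
    first = poll_and_download(bucket, dest, tracker)
    second = poll_and_download(bucket, dest, tracker)
    reloaded = DownloadTracker(dest / "tracker.json").done
```

Deleting the remote copy only after the local file is verified and recorded means an interruption at any step leaves at least one intact copy of each result set.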