Skip to main content

Benchmarking

The standalone benchmark runner (benchmark_runner.py) measures raw GPU throughput and streaming I/O overhead.

Usage

python benchmark_runner.py [OPTIONS]

Arguments

ArgumentTypeDefaultDescription
--configstringconfig.jsonPath to base config
--steps-no-saveint10000Iterations for no-save benchmark
--steps-chunkedint1000Iterations for chunked benchmark
--chunk-sizeint10Chunk size for chunked benchmark
--grid-sizeint2048Phase screen grid size (NxN)
--num-layersint20Number of atmospheric layers

Two Benchmarks

Benchmark 1: No-Save (Raw GPU Throughput)

Runs simulate_turb_no_save() -- pure GPU computation without any frame storage. Measures the maximum achievable throughput.

python benchmark_runner.py --steps-no-save 5000

Benchmark 2: Chunked (Streaming with I/O)

Runs simulate_turb_streaming() -- GPU computation with double-buffered disk writes. Measures real-world throughput including I/O.

python benchmark_runner.py --steps-chunked 500 --chunk-size 50

Examples

# Default benchmarks
python benchmark_runner.py

# Custom grid size and layers
python benchmark_runner.py --grid-size 1024 --num-layers 10

# Quick benchmark with fewer steps
python benchmark_runner.py --steps-no-save 1000 --steps-chunked 200

Output

Results are saved to benchmark_results/<timestamp>/:

benchmark_results/
20240115_143022/
no_save_results.json
chunked_results.json
gpu_memory_no_save.png
gpu_memory_chunked.png
timing_comparison.png
peak_memory_comparison.png
benchmark_summary.txt

Summary Report

The benchmark_summary.txt contains:

  • Timing: Total time, per-step time, steps/second for each benchmark
  • Memory: Peak GPU memory usage
  • I/O Overhead: Percentage slowdown from streaming vs. raw GPU
  • Conclusions: Which mode is recommended for the tested configuration

Interpreting Results

MetricWhat It Tells You
No-save steps/secMaximum GPU throughput
Chunked steps/secReal-world throughput with disk I/O
I/O overhead %How much streaming slows down computation
Peak memory (no-save)Memory needed for pure GPU mode
Peak memory (chunked)Memory needed for streaming mode

A low I/O overhead (under 5%) indicates that double-buffering is effectively hiding disk latency. Higher overhead suggests the disk is a bottleneck -- consider using an SSD or increasing chunk size.