Benchmarking
The standalone benchmark runner (benchmark_runner.py) measures raw GPU throughput and streaming I/O overhead.
Usage
python benchmark_runner.py [OPTIONS]
Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| --config | string | config.json | Path to base config |
| --steps-no-save | int | 10000 | Iterations for no-save benchmark |
| --steps-chunked | int | 1000 | Iterations for chunked benchmark |
| --chunk-size | int | 10 | Chunk size for chunked benchmark |
| --grid-size | int | 2048 | Phase screen grid size (NxN) |
| --num-layers | int | 20 | Number of atmospheric layers |
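The arguments above map onto a standard argparse interface. The following is a minimal sketch of that interface, not the runner's actual source; only the flag names, types, and defaults come from the table:

```python
import argparse

def parse_args(argv=None):
    """Sketch of the documented CLI; defaults match the arguments table."""
    p = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    p.add_argument("--config", default="config.json")
    p.add_argument("--steps-no-save", type=int, default=10000)
    p.add_argument("--steps-chunked", type=int, default=1000)
    p.add_argument("--chunk-size", type=int, default=10)
    p.add_argument("--grid-size", type=int, default=2048)
    p.add_argument("--num-layers", type=int, default=20)
    return p.parse_args(argv)

# overriding one flag leaves the other defaults intact
args = parse_args(["--grid-size", "1024"])
```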
Two Benchmarks
Benchmark 1: No-Save (Raw GPU Throughput)
Runs simulate_turb_no_save() -- pure GPU computation without any frame storage. Measures the maximum achievable throughput.
python benchmark_runner.py --steps-no-save 5000
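Conceptually, the no-save benchmark is a tight timing loop around the simulation step. A minimal sketch of that pattern (not the runner's actual code; `step_fn` here is a stand-in for the GPU step) that reports the same metrics the summary does:

```python
import time

def benchmark_no_save(step_fn, n_steps):
    """Time n_steps calls of step_fn; report total time,
    per-step time, and steps/second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return {"total_s": elapsed,
            "per_step_s": elapsed / n_steps,
            "steps_per_s": n_steps / elapsed}

# Stand-in for a GPU step: any callable works.
stats = benchmark_no_save(lambda: sum(range(1000)), 100)
```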
Benchmark 2: Chunked (Streaming with I/O)
Runs simulate_turb_streaming() -- GPU computation with double-buffered disk writes. Measures real-world throughput including I/O.
python benchmark_runner.py --steps-chunked 500 --chunk-size 50
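Double buffering means the next chunk is computed while the previous one is still being written. A minimal sketch of the idea using a single writer thread and a hand-off queue (illustrative only, assuming this general structure; the runner's actual implementation and file format may differ):

```python
import os
import queue
import tempfile
import threading

def stream_chunks(make_chunk, n_chunks, out_dir):
    """Overlap compute and disk I/O: while the main thread produces the
    next chunk, a writer thread flushes the previous one to disk."""
    q = queue.Queue(maxsize=1)  # at most one finished chunk in flight

    def writer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more chunks
                return
            idx, data = item
            with open(os.path.join(out_dir, f"chunk_{idx:04d}.bin"), "wb") as f:
                f.write(data)

    t = threading.Thread(target=writer)
    t.start()
    for i in range(n_chunks):
        data = make_chunk(i)   # stands in for computing one chunk of frames
        q.put((i, data))       # hand off; next compute overlaps this write
    q.put(None)
    t.join()

# write three 1 KiB chunks into a temporary directory
tmp = tempfile.mkdtemp()
stream_chunks(lambda i: bytes([i]) * 1024, 3, tmp)
```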
Examples
# Default benchmarks
python benchmark_runner.py
# Custom grid size and layers
python benchmark_runner.py --grid-size 1024 --num-layers 10
# Quick benchmark with fewer steps
python benchmark_runner.py --steps-no-save 1000 --steps-chunked 200
Output
Results are saved to benchmark_results/<timestamp>/:
benchmark_results/
20240115_143022/
no_save_results.json
chunked_results.json
gpu_memory_no_save.png
gpu_memory_chunked.png
timing_comparison.png
peak_memory_comparison.png
benchmark_summary.txt
Summary Report
The benchmark_summary.txt contains:
- Timing: Total time, per-step time, steps/second for each benchmark
- Memory: Peak GPU memory usage
- I/O Overhead: Percentage slowdown from streaming vs. raw GPU
- Conclusions: Which mode is recommended for the tested configuration
Interpreting Results
| Metric | What It Tells You |
|---|---|
| No-save steps/sec | Maximum GPU throughput |
| Chunked steps/sec | Real-world throughput with disk I/O |
| I/O overhead % | How much streaming slows down computation |
| Peak memory (no-save) | Memory needed for pure GPU mode |
| Peak memory (chunked) | Memory needed for streaming mode |
A low I/O overhead (under 5%) indicates that double-buffering is effectively hiding disk latency. Higher overhead suggests the disk is a bottleneck -- consider using an SSD or increasing chunk size.
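One natural way to define the I/O overhead is the relative drop in steps/second between the two benchmarks. A small helper under that assumed definition (the runner's exact formula may differ):

```python
def io_overhead_pct(no_save_sps, chunked_sps):
    """Slowdown of streaming relative to raw GPU throughput, in percent.
    Assumed definition: relative drop in steps/second."""
    return 100.0 * (1.0 - chunked_sps / no_save_sps)

# e.g. 950 steps/s streaming vs 1000 steps/s raw -> ~5% overhead
overhead = io_overhead_pct(1000.0, 950.0)
```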