Benchmarking
The standalone benchmark runner (benchmark_runner.py) measures raw GPU throughput and streaming I/O overhead.
Usage
python benchmark_runner.py [OPTIONS]
Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| --config | string | config.json | Path to base config |
| --steps-no-save | int | 10000 | Iterations for no-save benchmark |
| --steps-chunked | int | 1000 | Iterations for chunked benchmark |
| --chunk-size | int | 10 | Chunk size for chunked benchmark |
| --grid-size | int | 2048 | Phase screen grid size (NxN) |
| --num-layers | int | 20 | Number of atmospheric layers |
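The arguments above map onto a standard argparse interface. The following is a minimal sketch of that interface, not the runner's actual source; only the flag names, types, and defaults come from the table:

```python
import argparse

def parse_args(argv=None):
    """Sketch of the documented CLI; defaults match the arguments table."""
    p = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    p.add_argument("--config", default="config.json")
    p.add_argument("--steps-no-save", type=int, default=10000)
    p.add_argument("--steps-chunked", type=int, default=1000)
    p.add_argument("--chunk-size", type=int, default=10)
    p.add_argument("--grid-size", type=int, default=2048)
    p.add_argument("--num-layers", type=int, default=20)
    return p.parse_args(argv)

# overriding one flag leaves the other defaults intact
args = parse_args(["--grid-size", "1024"])
```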
Two Benchmarks
Benchmark 1: No-Save (Raw GPU Throughput)
Runs simulate_turb_no_save() -- pure GPU computation without any frame storage. Measures the maximum achievable throughput.
python benchmark_runner.py --steps-no-save 5000
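Conceptually, the no-save benchmark is a tight timing loop around the simulation step. A minimal sketch of that pattern (not the runner's actual code; `step_fn` here is a stand-in for the GPU step) that reports the same metrics the summary does:

```python
import time

def benchmark_no_save(step_fn, n_steps):
    """Time n_steps calls of step_fn; report total time,
    per-step time, and steps/second."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return {"total_s": elapsed,
            "per_step_s": elapsed / n_steps,
            "steps_per_s": n_steps / elapsed}

# Stand-in for a GPU step: any callable works.
stats = benchmark_no_save(lambda: sum(range(1000)), 100)
```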
Benchmark 2: Chunked (Streaming with I/O)
Runs simulate_turb_streaming() -- GPU computation with double-buffered disk writes. Measures real-world throughput including I/O.
python benchmark_runner.py --steps-chunked 500 --chunk-size 50
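Double buffering means the next chunk is computed while the previous one is still being written. A minimal sketch of the idea using a single writer thread and a hand-off queue (illustrative only, assuming this general structure; the runner's actual implementation and file format may differ):

```python
import os
import queue
import tempfile
import threading

def stream_chunks(make_chunk, n_chunks, out_dir):
    """Overlap compute and disk I/O: while the main thread produces the
    next chunk, a writer thread flushes the previous one to disk."""
    q = queue.Queue(maxsize=1)  # at most one finished chunk in flight

    def writer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more chunks
                return
            idx, data = item
            with open(os.path.join(out_dir, f"chunk_{idx:04d}.bin"), "wb") as f:
                f.write(data)

    t = threading.Thread(target=writer)
    t.start()
    for i in range(n_chunks):
        data = make_chunk(i)   # stands in for computing one chunk of frames
        q.put((i, data))       # hand off; next compute overlaps this write
    q.put(None)
    t.join()

# write three 1 KiB chunks into a temporary directory
tmp = tempfile.mkdtemp()
stream_chunks(lambda i: bytes([i]) * 1024, 3, tmp)
```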
Examples
# Default benchmarks
python benchmark_runner.py
# Custom grid size and layers
python benchmark_runner.py --grid-size 1024 --num-layers 10
# Quick benchmark with fewer steps
python benchmark_runner.py --steps-no-save 1000 --steps-chunked 200
Output
Results are saved to benchmark_results/<timestamp>/:
benchmark_results/
20240115_143022/
no_save_results.json
chunked_results.json
gpu_memory_no_save.png
gpu_memory_chunked.png
timing_comparison.png
peak_memory_comparison.png
benchmark_summary.txt
Summary Report
The benchmark_summary.txt contains:
- Timing: Total time, per-step time, steps/second for each benchmark
- Memory: Peak GPU memory usage
- I/O Overhead: Percentage slowdown from streaming vs. raw GPU
- Conclusions: Which mode is recommended for the tested configuration
Interpreting Results
| Metric | What It Tells You |
|---|---|
| No-save steps/sec | Maximum GPU throughput |
| Chunked steps/sec | Real-world throughput with disk I/O |
| I/O overhead % | How much streaming slows down computation |
| Peak memory (no-save) | Memory needed for pure GPU mode |
| Peak memory (chunked) | Memory needed for streaming mode |
A low I/O overhead (under 5%) indicates that double-buffering is effectively hiding disk latency. Higher overhead suggests the disk is a bottleneck -- consider using an SSD or increasing chunk size.
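One natural way to define the I/O overhead is the relative drop in steps/second between the two benchmarks. A small helper under that assumed definition (the runner's exact formula may differ):

```python
def io_overhead_pct(no_save_sps, chunked_sps):
    """Slowdown of streaming relative to raw GPU throughput, in percent.
    Assumed definition: relative drop in steps/second."""
    return 100.0 * (1.0 - chunked_sps / no_save_sps)

# e.g. 950 steps/s streaming vs 1000 steps/s raw -> ~5% overhead
overhead = io_overhead_pct(1000.0, 950.0)
```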