
Single-File SASA Benchmarks

Large-scale benchmark results for zsasa using the Shrake-Rupley algorithm.

  • Dataset: 2,013 structures (stratified sampling from PDB + AlphaFold)
  • Tools: zsasa (f64/f32, standard/bitmask), FreeSASA (C), RustSASA (Rust)

Note:

  • Absolute execution times are environment-dependent. Relative speedup ratios are the meaningful metric for comparison.
  • All implementations use identical parameters: n_points=100, probe_radius=1.4Å
  • Benchmarks measure SASA calculation time only (file I/O excluded). FreeSASA and RustSASA have unstable I/O timing, so SASA-only measurement is used for fair comparison. See Methodology for details.
  • SASA accuracy validated: mean error <0.001% vs FreeSASA reference. See validation for details.

TL;DR

zsasa's key advantage: Large structures + Multi-threading

[Figures: Speedup at threads=10 (50k+ atoms, n=794); Thread Scaling (50k+ atoms)]

Key Results (50k+ atoms, n=794, threads=10):

  • 1.89x median speedup vs FreeSASA
  • 1.84x median speedup vs RustSASA
  • Bitmask mode: ~15x less memory with competitive speed on large structures

Test Environment

Item | Value
Machine | MacBook Pro
Chip | Apple M4
Cores | 10 (4 performance + 6 efficiency)
Memory | 32 GB
OS | macOS

Overall Statistics (threads=10)

Tool | Structures | Median (ms) | Mean (ms) | P95 (ms)
zsasa_f64 | 2,013 | 8.53 | 84.82 | 243.25
zsasa_f32 | 2,013 | 8.75 | 83.68 | 243.75
zsasa_f64_bitmask | 2,013 | 19.49 | 77.38 | 198.77
zsasa_f32_bitmask | 2,013 | 19.28 | 76.34 | 196.30
FreeSASA | 2,012 | 15.47 | 482.69 | 490.89
RustSASA | 2,013 | 16.29 | 160.29 | 449.29

Note: zsasa_f64 has lower median but bitmask variants have lower mean/P95. This is because bitmask avoids the worst-case memory allocation overhead on very large structures, resulting in more stable performance at the high end.
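
The split between median and mean/P95 is a property of long-tailed timing distributions. A minimal sketch, using made-up timings rather than benchmark data, of how a few slow structures move the mean and P95 while leaving the median untouched. The nearest-rank P95 definition is an assumption; the benchmark scripts may compute percentiles differently.

```python
import statistics


def summarize(times_ms):
    """Return (median, mean, p95) for a list of timings in milliseconds."""
    ordered = sorted(times_ms)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return statistics.median(ordered), statistics.fmean(ordered), ordered[rank - 1]


# Hypothetical timings: 18 typical structures plus 2 very large ones.
fast_median = [8.0] * 18 + [900.0, 1000.0]   # quick on typical inputs, heavy tail
stable_tail = [19.0] * 18 + [400.0, 450.0]   # slower on typical inputs, lighter tail

a = summarize(fast_median)
b = summarize(stable_tail)
print("fast median :", a)
print("stable tail :", b)
```

In this toy data the first tool wins on median and the second wins on mean and P95, which is the same pattern the table shows for standard vs bitmask mode.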

FreeSASA shows 2,012 instead of 2,013 because it was OOM-killed (exit code 137) on the largest structure in the dataset — 9mjn (~3.8M atoms, 316MB PDB). All zsasa variants completed this structure successfully. See #247 for details.


Speedup by Structure Size (threads=10)

[Figure: Speedup by Size and Threads]

Size Bin | Count | vs FreeSASA | vs RustSASA
0-500 | 150 | 1.41x | 1.60x
500-1k | 150 | 1.60x | 1.74x
1k-2k | 150 | 1.58x | 1.81x
2k-3k | 150 | 1.63x | 1.87x
3k-5k | 150 | 1.71x | 1.87x
5k-10k | 150 | 1.81x | 1.89x
10k-20k | 150 | 1.89x | 1.89x
20k-50k | 149 | 1.74x | 1.84x
50k-75k | 150 | 1.85x | 1.84x
75k-100k | 152 | 1.98x | 1.85x
100k-150k | 148 | 1.90x | 1.85x
150k-200k | 150 | 1.90x | 1.85x
200k-500k | 150 | 1.83x | 1.84x
500k+ | 63 | 1.89x | 1.83x

Observations:

  • vs FreeSASA: speedup generally increases with structure size, from 1.41x (0-500) to a peak of 1.98x (75k-100k)
  • vs RustSASA: consistently 1.8-1.9x across medium-to-large structures
  • Small structures (0-500): overhead dominant, but still 1.4-1.6x faster
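
A per-bin speedup table of this kind can be sketched as follows. This assumes that the speedup for one structure is the other tool's time divided by zsasa's time, and that each bin reports the median of those ratios. The actual aggregation in analyze.py is not shown on this page, and the records below are hypothetical.

```python
import bisect
import statistics
from collections import defaultdict

# Lower bin edges in atoms, matching the size bins in the table.
BIN_EDGES = [0, 500, 1_000, 2_000, 3_000, 5_000, 10_000, 20_000, 50_000,
             75_000, 100_000, 150_000, 200_000, 500_000]
BIN_LABELS = ["0-500", "500-1k", "1k-2k", "2k-3k", "3k-5k", "5k-10k", "10k-20k",
              "20k-50k", "50k-75k", "75k-100k", "100k-150k", "150k-200k",
              "200k-500k", "500k+"]


def size_bin(n_atoms):
    """Return the label of the bin [low, high) containing n_atoms (boundary choice assumed)."""
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, n_atoms) - 1]


def median_speedup_by_bin(records):
    """records: iterable of (n_atoms, zsasa_ms, other_ms) tuples."""
    ratios = defaultdict(list)
    for n_atoms, zsasa_ms, other_ms in records:
        ratios[size_bin(n_atoms)].append(other_ms / zsasa_ms)
    return {label: statistics.median(vals) for label, vals in ratios.items()}


# Hypothetical timings, not taken from the benchmark.
records = [(300, 1.0, 1.4), (450, 2.0, 3.0), (480, 2.0, 2.6), (120_000, 100.0, 190.0)]
result = median_speedup_by_bin(records)
for label, ratio in result.items():
    print(f"{label}: {ratio:.2f}x")
```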

Single-Thread Performance (threads=1)

Single-threaded comparison (excluding parallelization effects):

Size Bin | Count | vs FreeSASA | vs RustSASA
0-500 | 150 | 1.09x | 0.89x
500-1k | 150 | 1.25x | 0.93x
1k-2k | 150 | 1.27x | 0.95x
2k-3k | 150 | 1.28x | 0.95x
3k-5k | 150 | 1.30x | 0.96x
5k-10k | 150 | 1.37x | 0.99x
10k-20k | 150 | 1.38x | 1.00x
20k-50k | 149 | 1.40x | 1.02x
50k-75k | 150 | 1.43x | 1.02x
75k-100k | 152 | 1.46x | 1.01x
100k-150k | 148 | 1.45x | 1.01x
150k-200k | 150 | 1.45x | 1.02x
200k-500k | 150 | 1.44x | 1.02x
500k+ | 63 | 1.46x | 1.01x

Observations:

  • vs FreeSASA: 1.4-1.5x on large structures (SIMD optimization)
  • vs RustSASA: close to parity at t=1 (both use SIMD); RustSASA is slightly faster below 10k atoms (0.89-0.99x), and zsasa is slightly ahead above 20k atoms (1.01-1.02x)
  • The gap between t=1 and t=10 speedups shows zsasa's parallel efficiency advantage

Thread Scaling

Median Execution Time by Thread Count

[Figure: Thread Scaling]

Threads | zsasa_f64 (ms) | FreeSASA (ms) | RustSASA (ms) | zsasa_f64_bitmask (ms)
1 | 23.11 | 30.84 | 22.63 | 20.98
4 | 10.32 | 16.73 | 16.82 | 19.80
8 | 9.00 | 15.95 | 16.07 | 19.60
10 | 8.53 | 15.47 | 16.29 | 19.49

Speedup from threads=1 to threads=10:

  • zsasa_f64: 23.11 → 8.53 = 2.71x
  • FreeSASA: 30.84 → 15.47 = 1.99x
  • RustSASA: 22.63 → 16.29 = 1.39x
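
The ratios above can be reproduced directly from the table. The sketch below also derives a rough parallel efficiency (speedup divided by thread count), which is not a figure reported in the benchmark. Dividing by 10 treats all cores as equal, which is a simplification on a chip with 4 performance and 6 efficiency cores.

```python
# Median times (ms) at threads=1 and threads=10, copied from the table above.
MEDIAN_MS = {
    "zsasa_f64": (23.11, 8.53),
    "FreeSASA": (30.84, 15.47),
    "RustSASA": (22.63, 16.29),
}


def scaling(t1_ms, tn_ms, n_threads):
    """Return (speedup, parallel efficiency) going from 1 to n_threads threads."""
    speedup = t1_ms / tn_ms
    return speedup, speedup / n_threads


for tool, (t1, t10) in MEDIAN_MS.items():
    speedup, efficiency = scaling(t1, t10, 10)
    print(f"{tool}: {speedup:.2f}x speedup, {efficiency:.0%} of ideal")
```

These medians cover all 2,013 structures, including small ones where thread startup cost dominates, so they understate scaling on large inputs.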

Key Insight:

  • zsasa_f64 scales best with thread count, maintaining large gains up to t=10
  • RustSASA gains little from extra threads: 1.39x overall, with no further improvement beyond t=4 (parallel efficiency issue)
  • Bitmask variants have low per-structure overhead but limited thread scaling (work is already cheap per atom)

Bitmask Variants

Bitmask mode uses LUT bitmask neighbor lists instead of full float arrays. This dramatically reduces memory usage at the cost of slightly higher per-structure computation time and minor accuracy differences.

When to Use

Mode | Best For | Trade-off
Standard (f64) | Maximum speed, highest accuracy | Higher memory usage
Bitmask (f64) | Large structures, memory-constrained | ~15x less memory for 500k+ atoms; minor accuracy difference

Accuracy note: Bitmask mode uses a fixed cutoff for neighbor detection, which may produce slightly different SASA values compared to standard mode. The difference is negligible for most use cases. See SASA Validation for detailed accuracy analysis.

Overall Performance (threads=10)

Tool | Median (ms) | Mean (ms) | P95 (ms)
zsasa_f64 | 8.53 | 84.82 | 243.25
zsasa_f64_bitmask | 19.49 | 77.38 | 198.77
zsasa_f32 | 8.75 | 83.68 | 243.75
zsasa_f32_bitmask | 19.28 | 76.34 | 196.30

Large Structure Performance (100k+ atoms, threads=10)

On large structures, bitmask variants close the speed gap and become competitive:

Tool | n | Median (ms) | Mean (ms) | P95 (ms)
zsasa_f64 | 512 | 147.25 | 285.98 | 1,051.50
zsasa_f64_bitmask | 512 | 123.95 | 230.15 | 817.08
zsasa_f32 | 512 | 145.25 | 282.09 | 1,038.91
zsasa_f32_bitmask | 512 | 121.90 | 227.25 | 806.76

Key finding: For 100k+ atom structures, bitmask is actually faster than standard mode (16% lower median, 20% lower mean) while using far less memory. The bitmask's fixed-size neighbor storage avoids the allocation overhead that dominates large structures.
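
The percentages in the key finding follow from the f64 rows of the table. A small check, with `percent_lower` as a helper name introduced here for illustration:

```python
def percent_lower(baseline, value):
    """How much lower value is than baseline, as a percentage of baseline."""
    return 100.0 * (baseline - value) / baseline


# zsasa_f64 vs zsasa_f64_bitmask, 100k+ atoms, threads=10 (values from the table).
median_gain = percent_lower(147.25, 123.95)
mean_gain = percent_lower(285.98, 230.15)
print(f"median: {median_gain:.1f}% lower, mean: {mean_gain:.1f}% lower")
```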


Memory Comparison

Peak memory (RSS) measured by hyperfine (threads=1):

[Figures: Memory vs Structure Size; Memory Scatter]

Observations:

  • FreeSASA uses the most memory across all sizes (~15x more than zsasa for 500k+ atoms)
  • RustSASA uses ~5x more than zsasa for large structures
  • zsasa standard and bitmask have similar memory for small structures
  • For 500k+ atoms, bitmask uses significantly less memory than standard mode

Large Structure Analysis

Summary (50k+ atoms, n=794, threads=10)

[Figures: Speedup at threads=10; Thread Scaling]
Comparison | Median Speedup
vs FreeSASA | 1.89x
vs RustSASA | 1.84x

Best Speedup Structures (50k+ atoms)

[Figure: Speedup Comparison]

Maximum Structure Thread Scaling

[Figure: Max Structure Scaling]


Execution Time Distribution

[Figure: SR Scatter Plot]

Observations:

  • Time grows nearly linearly with atom count on the log-scale plot, which indicates that the O(N) neighbor list is effective (all tools use a cell list)
  • zsasa (orange) is consistently lower (faster) across all sizes
  • Gap between tools widens with increasing thread count
  • Few outliers → stable performance

Stability: Outlier Structures

One of zsasa's key advantages is consistent, predictable performance across all structures. FreeSASA and RustSASA exhibit pathological slowdowns on certain structures, where computation time spikes by orders of magnitude. zsasa shows no such behavior.

FreeSASA Pathological Cases

[Figure: FreeSASA Outliers]

On 19 structures, FreeSASA takes 3x–114x longer than zsasa (SASA-only time, threads=1). The worst case (7n9f, 212,962 atoms) shows a 113.9x slowdown. The cause is unknown but appears to be structure-dependent.

RustSASA Pathological Cases

Example: 9gdy (509,160 atoms) — RustSASA takes ~14,000ms vs zsasa's ~50ms at threads=1:

[Figure: 9gdy Outlier]

RustSASA is ~280x slower on this structure, and the slowdown persists across all thread counts. Including wall-clock (I/O) timing reveals even more outlier structures for both FreeSASA and RustSASA.

Key takeaway: zsasa produces stable, predictable timing across all 2,013 structures tested. No pathological cases were observed. This predictability is important for batch processing and pipeline reliability.

FreeSASA OOM on Very Large Structures

FreeSASA was OOM-killed (SIGKILL, exit code 137) when processing 9mjn (~3.8M atoms), the largest structure in the dataset. This was the only failure across all 2,013 structures. All zsasa variants (f64, f32, standard, bitmask) completed this structure without issue, showing that zsasa can process inputs of this size where FreeSASA runs out of memory. (#247)
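
Exit code 137 is the shell convention for death by signal: 128 plus the signal number, here 9 (SIGKILL). A minimal sketch of how a benchmark driver might classify such exits on a POSIX system. The function name is hypothetical, and this is not taken from bench.py. Note that Python's subprocess module reports a signal death as a negative return code instead.

```python
import signal


def describe_exit(returncode):
    """Map a process exit status to a short description (POSIX conventions)."""
    if returncode < 0:
        signum = -returncode          # subprocess convention: -N means signal N
    elif returncode > 128:
        signum = returncode - 128     # shell convention: 128 + N means signal N
    elif returncode == 0:
        return "exited normally"
    else:
        return f"exited with error {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    return f"killed by {name}"


print(describe_exit(137))  # shell-style status for SIGKILL
print(describe_exit(-9))   # subprocess-style status for SIGKILL
```

SIGKILL alone does not prove an out-of-memory kill, and a program may also choose to exit with a status above 128 on its own. Confirming an OOM kill needs the system log or memory monitoring alongside the exit status.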


Per-Bin Sample Results

Thread scaling details on representative structures selected from each size bin.

Bin | Atom Range | Sample Plot
0-500 | 0 – 500 | View
500-1k | 500 – 1,000 | View
1k-2k | 1,000 – 2,000 | View
2k-3k | 2,000 – 3,000 | View
3k-5k | 3,000 – 5,000 | View
5k-10k | 5,000 – 10,000 | View
10k-20k | 10,000 – 20,000 | View
20k-50k | 20,000 – 50,000 | View
50k-75k | 50,000 – 75,000 | View
75k-100k | 75,000 – 100,000 | View
100k-150k | 100,000 – 150,000 | View
150k-200k | 150,000 – 200,000 | View
200k-500k | 200,000 – 500,000 | View
500k+ | 500,000+ | View

Key Takeaways

Why is zsasa faster? SIMD optimization (8-wide distance calculation), multi-threading with work stealing, and spatial hashing for O(1) neighbor lookup.
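
The spatial-hashing idea can be illustrated with a generic cell list. This is a sketch of the general technique in plain Python, not zsasa's implementation: atoms are hashed into cubic cells whose edge equals the cutoff, so every neighbor of an atom lies in its own cell or one of the 26 adjacent cells, and each cell is found with one dictionary lookup. All function names are introduced here for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def build_cell_list(coords, cell_size):
    """Hash each point index into the integer cell that contains it."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cell_size), math.floor(y / cell_size), math.floor(z / cell_size))
        cells[key].append(i)
    return cells


def neighbors_within(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, closer than cutoff."""
    cells = build_cell_list(coords, cutoff)
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        # Only the 27 cells around (cx, cy, cz) can hold neighbors.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)


# Three hypothetical points: the first two are 1.5 apart, the third is far away.
demo_pairs = neighbors_within([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (30.0, 0.0, 0.0)], 4.0)
print(demo_pairs)
```

For SASA the cutoff between two atoms is the sum of their radii plus twice the probe radius, so a real implementation would typically size the cells from the largest atom radius.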

  1. Consistent advantage across all sizes

    • 1.9x vs FreeSASA, 1.8x vs RustSASA on large structures (threads=10)
    • Outperforms FreeSASA at all sizes (even 0-500 atoms)
  2. Best thread scaling

    • 2.71x scaling from t=1 to t=10 (vs 1.99x FreeSASA, 1.39x RustSASA)
  3. Memory-efficient bitmask mode

    • ~15x less memory for 500k+ atom structures
    • Competitive speed with lower P95 latency
  4. Accurate results

    • Mean SASA error <0.001% vs the FreeSASA reference


Methodology

SASA-Only Timing

For fair comparison, we measure SASA calculation time only. File I/O is excluded because FreeSASA and RustSASA exhibit unstable I/O timing.

Total time = File I/O + SASA calculation + Output
                        ^^^^^^^^^^^^^^^^
                        Only this is measured

Measurement method for each implementation:

Implementation | Method
zsasa (all variants) | Internal measurement via --timing option (stderr output)
FreeSASA C | Patched binary outputs SASA calculation time to stderr
RustSASA | Patched binary outputs SASA_TIME_US to stderr
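
The principle behind all three methods is the same: start the clock after the structure is loaded and stop it before output is written. A minimal in-process sketch of that idea, assuming hypothetical `load` and `compute` callables. The real tools time themselves internally and report to stderr, so this is only an illustration. The warmup and run counts default to the values listed under Parameters.

```python
import time


def time_compute_only(load, compute, path, warmup=1, runs=3):
    """Average compute() time in ms, keeping load() outside the timed region."""
    data = load(path)                  # file I/O: deliberately not timed
    for _ in range(warmup):
        compute(data)                  # warmup runs are discarded
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        compute(data)                  # only this call is measured
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples)


# Stand-ins for a parser and a SASA kernel, for demonstration only.
def fake_load(path):
    return list(range(10_000))


def fake_compute(data):
    return sum(x * x for x in data)


avg_ms = time_compute_only(fake_load, fake_compute, "structure.pdb")
print(f"average compute time: {avg_ms:.3f} ms")
```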

Parameters

Parameter | Value | Notes
Algorithm | Shrake-Rupley | Supported by all implementations
n_points | 100 | Number of test points
probe_radius | 1.4 Å | Water molecule radius
Warmup | 1 | Discarded before measurement
Runs | 3 | Average value used
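
To make the roles of n_points and probe_radius concrete, here is a deliberately simple Shrake-Rupley sketch. It is a brute-force O(N²) illustration in plain Python and not how zsasa, FreeSASA or RustSASA are implemented. The golden-spiral point layout is an assumption, since each tool may distribute its test points differently.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return points


def shrake_rupley(atoms, n_points=100, probe_radius=1.4):
    """atoms: list of (x, y, z, radius) in Å. Returns per-atom SASA in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe_radius          # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A test point is buried if it falls inside any other expanded sphere.
            buried = any(
                math.dist(p, (xj, yj, zj)) < rj + probe_radius
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Two hypothetical carbon-sized atoms 1.5 Å apart.
print(shrake_rupley([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]))
```

Each atom's area is resolved in steps of 1/n_points of its expanded sphere, which is why accuracy depends on n_points.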

Tool Variants

Tool | Precision | Neighbor Storage | Notes
zsasa_f64 | f64 | Full array | Default, highest accuracy
zsasa_f32 | f32 | Full array | ~3% faster than f64 at t=1
zsasa_f64_bitmask | f64 | Bitmask | ~15x less memory for 500k+ atoms
zsasa_f32_bitmask | f32 | Bitmask | Lowest memory usage
FreeSASA | f64 | - | C reference implementation
RustSASA | f32 | - | Rust implementation

Stratified Sampling

Stratified sampling from PDB and AlphaFold structures, ~150 structures per bin (14 bins):

Bin | Atom Range | Count
0-500 | 0 – 500 | 150
500-1k | 500 – 1,000 | 150
1k-2k | 1,000 – 2,000 | 150
2k-3k | 2,000 – 3,000 | 150
3k-5k | 3,000 – 5,000 | 150
5k-10k | 5,000 – 10,000 | 150
10k-20k | 10,000 – 20,000 | 150
20k-50k | 20,000 – 50,000 | 149
50k-75k | 50,000 – 75,000 | 150
75k-100k | 75,000 – 100,000 | 152
100k-150k | 100,000 – 150,000 | 148
150k-200k | 150,000 – 200,000 | 150
200k-500k | 200,000 – 500,000 | 150
500k+ | 500,000+ | 63
Total | - | 2,013
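
Stratified sampling by atom count can be sketched as below: group structures into size bins, then draw up to a fixed number from each bin with a seeded generator. This is an illustration of the idea, not the logic of sample.py. The function name, the index layout and the handling of bin boundaries are assumptions, and the seed of 42 simply mirrors the --seed value used in the commands on this page.

```python
import bisect
import random
from collections import defaultdict

# Lower bin edges in atoms, matching the 14 bins in the table.
BIN_EDGES = [0, 500, 1_000, 2_000, 3_000, 5_000, 10_000, 20_000, 50_000,
             75_000, 100_000, 150_000, 200_000, 500_000]


def stratified_sample(atom_counts, bin_edges=BIN_EDGES, per_bin=150, seed=42):
    """atom_counts: dict of structure id -> atom count.

    Returns a dict of bin index -> sorted list of sampled structure ids.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    # Sort first so the result does not depend on dictionary insertion order.
    for struct_id, n_atoms in sorted(atom_counts.items()):
        bins[bisect.bisect_right(bin_edges, n_atoms) - 1].append(struct_id)
    return {
        idx: sorted(rng.sample(ids, min(per_bin, len(ids))))
        for idx, ids in sorted(bins.items())
    }


# Hypothetical index: 400 small structures and 5 very large ones.
index = {f"small{i}": 100 + i for i in range(400)}
index.update({f"huge{i}": 600_000 + i for i in range(5)})
sample = stratified_sample(index)
print({idx: len(ids) for idx, ids in sample.items()})
```

A bin with fewer candidates than the target contributes everything it has, which would be one way a sparse bin such as 500k+ ends up below the per-bin target.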

Running Benchmarks

Setup

# Build Zig binary
zig build -Doptimize=ReleaseFast

# External tools setup (for comparison)
cd benchmarks/external
git clone https://github.com/N283T/freesasa-bench.git
cd freesasa-bench && ./configure --enable-threads && make && cd ..
git clone --recursive https://github.com/N283T/rustsasa-bench.git
cd rustsasa-bench && cargo build --release --features cli && cd ..

Index & Sample Generation

# Create index (first time only)
./benchmarks/scripts/build_index.py benchmarks/inputs

# Check distribution
./benchmarks/scripts/sample.py benchmarks/inputs/index.json --analyze

# Generate sample
./benchmarks/scripts/sample.py benchmarks/inputs/index.json \
  --seed 42 \
  -o benchmarks/dataset/sample.json

Running

# Basic usage
./benchmarks/scripts/bench.py --tool zig --algorithm sr --threads 1,4,8,10

# All tools
./benchmarks/scripts/bench.py --tool zig --algorithm sr --threads 1,4,8,10
./benchmarks/scripts/bench.py --tool freesasa --algorithm sr --threads 1,4,8,10
./benchmarks/scripts/bench.py --tool rustsasa --algorithm sr --threads 1,4,8,10

# With bitmask mode (Zig only)
./benchmarks/scripts/bench.py --tool zig --algorithm sr --bitmask --threads 1,4,8,10

# With f32 precision (Zig only)
./benchmarks/scripts/bench.py --tool zig --algorithm sr --precision f32 --threads 1,4,8,10

Analysis & Visualization

# Summary tables (SASA-only metric, default)
./benchmarks/scripts/analyze.py summary

# Summary tables (wall-clock metric)
./benchmarks/scripts/analyze.py summary --metric wall

# Generate all plots
./benchmarks/scripts/analyze.py all

# Individual plot types
./benchmarks/scripts/analyze.py scatter # Atoms vs time scatter
./benchmarks/scripts/analyze.py threads # Thread scaling
./benchmarks/scripts/analyze.py grid # Speedup grid by size/threads
./benchmarks/scripts/analyze.py samples # Per-bin sample plots
./benchmarks/scripts/analyze.py large # Large structure analysis
./benchmarks/scripts/analyze.py memory # Peak memory comparison
./benchmarks/scripts/analyze.py speedup # Best speedup structures
./benchmarks/scripts/analyze.py outliers # Outlier advantage plots

# Export to CSV
./benchmarks/scripts/analyze.py export-csv

Scripts

Script | Purpose
build_index.py | Create atom count index from all input files
sample.py | Stratified sampling from index
bench.py | Run benchmarks (single-file mode)
analyze.py | Analyze results and generate plots

Notes

  1. Initial runs are slow: due to file cache and warmup effects
  2. Thread count depends on CPU: optimal when matching physical core count
  3. External tools require patches: SASA-only timing requires modified binaries

Appendix: Lee-Richards (LR) Algorithm

zsasa also supports the Lee-Richards (LR) algorithm, which uses slice-based numerical integration instead of sphere test points.

Key Differences from Shrake-Rupley

Property | Shrake-Rupley (SR) | Lee-Richards (LR)
Method | Test points on sphere | Slice integration
Speed | Faster | 3-5x slower
Accuracy | Depends on n_points | Depends on n_slices
Supported by | zsasa, FreeSASA, RustSASA | zsasa, FreeSASA

LR benchmarks are not included in this comparison because RustSASA does not support LR, making a three-way comparison impossible. For LR-specific analysis, see the analyze_lr.py script.

# Run LR benchmark
./benchmarks/scripts/bench_lr.py --tool zig --algorithm lr --n-slices 20

# Analyze LR results
./benchmarks/scripts/analyze_lr.py summary