Batch Processing Benchmarks

Throughput benchmarks for processing complete proteome datasets using hyperfine timing.

Note: Measures total wall-clock time for batch processing a directory of PDB files. All tools use native multi-threading. Test points: 128 (required for Lahuta bitmask support).

TL;DR

| Dataset | Structures | zsasa_bitmask (f32) | vs FreeSASA | vs RustSASA | vs Lahuta BM | RSS |
|---|---|---|---|---|---|---|
| E. coli | 4,370 | 1.42s | 8.9x faster | 3.7x faster | 1.4x faster | 43 MB |
| Human | 23,586 | 14.04s | 9.9x faster | 3.9x faster | 1.6x faster | 73 MB |
| SwissProt | 550,122 | 4m 02s | 8.0x faster | 2.7x faster | 1.3x faster | 157 MB |

  • zsasa_bitmask (LUT bitmask) is the fastest tool across all datasets, using 3.9–7.2x less memory than RustSASA
  • Memory stays low (about 157 MB) even at 550K structures, while RustSASA grows to 1.1 GB and Lahuta BM to 2.2 GB

Test Environment

| Item | Value |
|---|---|
| Machine | MacBook Pro |
| Chip | Apple M4 (10 cores: 4P + 6E) |
| Memory | 32 GB |
| OS | macOS 15.3.2 (Darwin 24.6.0) |

SwissProt additionally tested on M2 Max (96 GB) — see SwissProt section.

Tools Compared

| Tool | Language | Description |
|---|---|---|
| zsasa | Zig | f64/f32 precision, standard neighbor lists |
| zsasa_bitmask | Zig | f64/f32 precision, LUT bitmask neighbor lists |
| Lahuta | C++ | Standard neighbor lists |
| Lahuta Bitmask | C++ | LUT bitmask neighbor lists |
| RustSASA | Rust | Native multi-threading |
| FreeSASA | C | No native batch mode; custom wrapper (freesasa_batch.cc) |

E. coli Proteome (4,370 structures)

Dataset: AlphaFold E. coli K-12 proteome (UP000000625_83333_ECOLI_v6), PDB format.

10-Thread Comparison

Benchmark: warmup=3, runs=10, threads=10.

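| Tool | Time (s) | Std Dev | files/sec | vs FreeSASA | vs RustSASA | vs Lahuta BM | RSS (MB) |
|---|---|---|---|---|---|---|---|
| zsasa_bitmask (f32) | 1.42 | ±0.013 | 3,085 | 8.9x | 3.7x | 1.4x | 43 |
| zsasa_bitmask (f64) | 1.42 | ±0.008 | 3,072 | 8.9x | 3.7x | 1.4x | 45 |
| Lahuta Bitmask | 2.01 | ±0.004 | 2,172 | 6.3x | 2.6x | baseline | 291 |
| zsasa (f32) | 4.09 | ±0.346 | 1,069 | 3.1x | 1.3x | 0.5x | 40 |
| zsasa (f64) | 4.08 | ±0.034 | 1,071 | 3.1x | 1.3x | 0.5x | 43 |
| RustSASA | 5.24 | ±0.035 | 834 | 2.4x | baseline | 0.4x | 169 |
| Lahuta | 6.70 | ±0.032 | 652 | 1.9x | 0.8x | 0.3x | 315 |
| FreeSASA | 12.60 | ±0.049 | 347 | baseline | 0.4x | 0.2x | 467 |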

Key findings:

  • zsasa_bitmask (f32) processes 4,370 structures in 1.42s: 3.7x faster than RustSASA and 1.4x faster than Lahuta Bitmask
  • Memory: zsasa uses 40–45 MB vs RustSASA 169 MB (3.8x less) and Lahuta BM 291 MB (6.5x less)
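
The speedup and throughput columns follow directly from the wall-clock times. The sketch below recomputes them for the E. coli run, assuming speedup is baseline time divided by tool time and throughput is structures divided by seconds; the function names are illustrative and are not taken from the benchmark scripts. Recomputing from the rounded 1.42 s gives roughly 3,077 files/sec rather than the table's 3,085, presumably because the table was derived from the unrounded mean.

```python
# Wall-clock times in seconds, copied from the E. coli 10-thread table.
N_STRUCTURES = 4370
times_s = {
    "zsasa_bitmask (f32)": 1.42,
    "Lahuta Bitmask": 2.01,
    "RustSASA": 5.24,
    "FreeSASA": 12.60,
}


def throughput(n_files: int, seconds: float) -> float:
    """Files processed per second of wall-clock time."""
    return n_files / seconds


def speedup(baseline_s: float, tool_s: float) -> float:
    """How many times faster the tool is than the baseline."""
    return baseline_s / tool_s


fastest = times_s["zsasa_bitmask (f32)"]
for baseline in ("FreeSASA", "RustSASA", "Lahuta Bitmask"):
    print(f"vs {baseline}: {speedup(times_s[baseline], fastest):.1f}x")
print(f"{throughput(N_STRUCTURES, fastest):.0f} files/sec")
```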

Thread Scaling

Benchmark: warmup=3, runs=3, threads=1,8,10.

[Figures: E. coli run time at 1, 8, and 10 threads; E. coli memory at 1, 8, and 10 threads]

Human Proteome (23,586 structures)

Dataset: AlphaFold Human proteome (UP000005640_9606_HUMAN_v6), PDB format.

Benchmark: warmup=3, runs=10, threads=10.

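| Tool | Time (s) | Std Dev | files/sec | vs FreeSASA | vs RustSASA | vs Lahuta BM | RSS (MB) |
|---|---|---|---|---|---|---|---|
| zsasa_bitmask (f32) | 14.04 | ±0.073 | 1,680 | 9.9x | 3.9x | 1.6x | 73 |
| zsasa_bitmask (f64) | 14.85 | ±0.579 | 1,589 | 9.3x | 3.6x | 1.5x | 77 |
| Lahuta Bitmask | 22.67 | ±0.250 | 1,040 | 6.1x | 2.4x | baseline | 1,415 |
| zsasa (f32) | 39.82 | ±0.997 | 592 | 3.5x | 1.4x | 0.6x | 70 |
| zsasa (f64) | 42.33 | ±1.483 | 557 | 3.3x | 1.3x | 0.5x | 75 |
| RustSASA | 54.16 | ±0.307 | 435 | 2.6x | baseline | 0.4x | 334 |
| Lahuta | 78.64 | ±0.674 | 300 | 1.8x | 0.7x | 0.3x | 1,077 |
| FreeSASA | 138.77 | ±0.664 | 170 | baseline | 0.4x | 0.2x | 1,913 |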

Key findings:

  • zsasa_bitmask (f32) processes 23,586 structures in 14.04s: 3.9x faster than RustSASA and 1.6x faster than Lahuta Bitmask
  • Memory: zsasa uses 70–77 MB vs RustSASA 334 MB (4.5x less) and Lahuta BM 1,415 MB (19x less)

[Figures: Human proteome run time (10 threads) and memory (10 threads)]

SwissProt (550,122 structures)

Dataset: SwissProt PDB v6, PDB format. The largest benchmark at 550K structures.

M2 Max (96 GB) — Compute Bound

| Item | Value |
|---|---|
| Chip | Apple M2 Max (12 cores) |
| Memory | 96 GB |

With sufficient RAM, all file data stays in the OS page cache and performance is purely compute-bound.

Benchmark: warmup=3, runs=3, threads=10.

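| Tool | Time | Std Dev | files/sec | vs FreeSASA | vs RustSASA | vs Lahuta BM | RSS (MB) |
|---|---|---|---|---|---|---|---|
| zsasa_bitmask (f32) | 4m 02s | ±1.86 | 2,269 | 8.0x | 2.7x | 1.3x | 157 |
| zsasa_bitmask (f64) | 4m 07s | ±1.49 | 2,229 | 7.9x | 2.7x | 1.3x | 162 |
| Lahuta Bitmask | 5m 12s | ±3.32 | 1,761 | 6.2x | 2.1x | baseline | 2,187 |
| zsasa (f32) | 10m 39s | ±0.57 | 861 | 3.0x | 1.03x | 0.5x | 154 |
| zsasa (f64) | 10m 41s | ±1.43 | 858 | 3.0x | 1.03x | 0.5x | 159 |
| RustSASA | 10m 58s | ±1.30 | 835 | 2.9x | baseline | 0.5x | 1,131 |
| Lahuta | 16m 34s | ±2.51 | 553 | 2.0x | 0.66x | 0.3x | 1,873 |
| FreeSASA | 32m 21s | ±8.95 | 283 | baseline | 0.34x | 0.2x | 2,875 |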

[Figures: SwissProt on M2 Max, run time (10 threads) and memory (10 threads)]

M4 (32 GB) — I/O Bound (mmap)

When the dataset exceeds available RAM, mmap page faults become the bottleneck and performance converges across tools.

Benchmark: warmup=3, runs=3, threads=10.

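| Tool | Time | Std Dev | files/sec | vs FreeSASA | vs RustSASA | vs Lahuta BM | RSS (MB) |
|---|---|---|---|---|---|---|---|
| zsasa_bitmask (f32) | 11m 05s | ±6.48 | 828 | 2.9x | 2.4x | 1.0x | 157 |
| zsasa_bitmask (f64) | 11m 07s | ±2.14 | 824 | 2.9x | 2.4x | 1.0x | 161 |
| Lahuta Bitmask | 11m 08s | ±9.91 | 823 | 2.8x | 2.4x | baseline | 2,152 |
| zsasa (f32) | 16m 02s | ±11.87 | 572 | 2.0x | 1.6x | 0.7x | 154 |
| zsasa (f64) | 16m 11s | ±3.65 | 567 | 2.0x | 1.6x | 0.7x | 159 |
| Lahuta | 22m 11s | ±5.21 | 413 | 1.4x | 1.2x | 0.5x | 1,820 |
| RustSASA | 26m 16s | ±8.05 | 349 | 1.2x | baseline | 0.4x | 1,131 |
| FreeSASA | 31m 42s | ±7.67 | 289 | baseline | 0.8x | 0.4x | 2,440 |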

[Figures: SwissProt on M4, run time (10 threads) and memory (10 threads)]

Key findings:

  • M2 Max (96 GB): zsasa_bitmask completes 550K structures in about 4 minutes: 2.7x faster than RustSASA and 1.3x faster than Lahuta BM
  • M4 (32 GB): I/O-bound; zsasa_bitmask and Lahuta Bitmask converge at ~11 min (both 2.4x faster than RustSASA)
  • Memory: zsasa_bitmask uses ~157 MB on both machines, vs RustSASA 1.1 GB (7.2x less) and Lahuta BM 2.2 GB (14x less)

Summary

Performance (10 threads)

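| Dataset | Structures | zsasa_bitmask (f32) | vs RustSASA | vs Lahuta BM | RSS |
|---|---|---|---|---|---|
| E. coli | 4,370 | 1.42s | 3.7x | 1.4x | 43 MB |
| Human | 23,586 | 14.04s | 3.9x | 1.6x | 73 MB |
| SwissProt (96 GB) | 550,122 | 4m 02s | 2.7x | 1.3x | 157 MB |
| SwissProt (32 GB) | 550,122 | 11m 05s | 2.4x | 1.0x | 157 MB |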

Memory Efficiency (10 threads)

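| Dataset | zsasa_bitmask (f32) | RustSASA | Lahuta BM | zsasa ratio |
|---|---|---|---|---|
| E. coli | 43 MB | 169 MB | 291 MB | 3.9x less than RustSASA |
| Human | 73 MB | 334 MB | 1,415 MB | 4.6x less than RustSASA |
| SwissProt | 157 MB | 1,131 MB | 2,187 MB | 7.2x less than RustSASA |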

The memory advantage over RustSASA increases with dataset size. zsasa_bitmask has virtually no memory overhead compared to standard zsasa (about 3 MB), while Lahuta Bitmask uses roughly 7–19x more memory than zsasa_bitmask in the table above.
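
As a quick check of the ratios, the following sketch divides each competitor's RSS by the zsasa_bitmask (f32) RSS from the 10-thread results (43, 73, and 157 MB). The dictionary layout and helper name are illustrative only.

```python
# Peak RSS in MB at 10 threads, copied from the result tables.
rss_mb = {
    "E. coli": {"zsasa_bitmask": 43, "RustSASA": 169, "Lahuta BM": 291},
    "Human": {"zsasa_bitmask": 73, "RustSASA": 334, "Lahuta BM": 1415},
    "SwissProt": {"zsasa_bitmask": 157, "RustSASA": 1131, "Lahuta BM": 2187},
}


def memory_ratio(other_mb: float, ours_mb: float) -> float:
    """How many times more memory the other tool uses."""
    return other_mb / ours_mb


for dataset, row in rss_mb.items():
    ours = row["zsasa_bitmask"]
    rust = memory_ratio(row["RustSASA"], ours)
    lahuta = memory_ratio(row["Lahuta BM"], ours)
    print(f"{dataset}: RustSASA {rust:.1f}x, Lahuta BM {lahuta:.1f}x")
```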

Bitmask Variants

The _bitmask variants use LUT (look-up table) bitmask neighbor lists instead of per-atom arrays:

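  • zsasa_bitmask: ~2.5–3x faster than regular zsasa in the compute-bound runs, minimal memory overhead (~3 MB)
  • Lahuta Bitmask: ~3.2–3.5x faster than regular Lahuta in the compute-bound runs, but with a far larger memory footprint than zsasa_bitmask (291–2,187 MB RSS)
  • zsasa_bitmask is 1.3–1.6x faster than Lahuta Bitmask in every compute-bound run; the two converge (1.0x) on the I/O-bound M4 SwissProt run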
  • Requires n_points ≥ 128

Datasets Larger Than Available RAM

zsasa uses mmap for file I/O. When the dataset fits in RAM, all file data stays in the OS page cache and performance depends purely on compute speed. When the dataset exceeds available RAM, mmap page faults must read from storage, and performance becomes storage I/O-bound regardless of compute efficiency.

In this regime, multiple worker threads fault pages in random order, so effective read throughput drops well below the SSD's sequential maximum. This is not specific to zsasa — any mmap-based tool (including Lahuta and RustSASA) hits the same bottleneck, and bitmask variants converge to similar wall-clock times (as seen in the M4 32 GB results).

zsasa's low memory footprint helps here: less RSS means more RAM is available for the OS page cache, which can reduce page fault frequency.
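
To illustrate the access pattern described here, the following is a minimal Python sketch of reading a file through a read-only memory mapping. It is not zsasa's code (zsasa is written in Zig); the function name and the tiny stand-in PDB file are hypothetical. The point is only that no data is read when the mapping is created: each page is fetched on first touch, from the page cache if it is resident and from storage otherwise.

```python
import mmap
import os
import tempfile


def count_atom_records(path: str) -> int:
    """Count lines starting with ATOM by scanning a read-only mapping."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            count = 0
            # Each page is faulted in on first access: served from the OS
            # page cache when resident, read from storage otherwise.
            for line in iter(view.readline, b""):
                if line.startswith(b"ATOM"):
                    count += 1
            return count


# Tiny stand-in for a PDB file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("wb", suffix=".pdb", delete=False) as tmp:
    tmp.write(
        b"HEADER    DEMO\n"
        b"ATOM      1  N   MET A   1\n"
        b"ATOM      2  CA  MET A   1\n"
        b"END\n"
    )
    demo_path = tmp.name

n_atoms = count_atom_records(demo_path)
os.remove(demo_path)
print(n_atoms)
```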

Methodology

Uses hyperfine for timing:

  1. Warmup runs (default 3) to eliminate cold-start effects
  2. Multiple timed runs for statistical reliability (3–10 runs)
  3. IQR outlier filtering applied when ≥5 runs
  4. Reports mean and stddev after filtering
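
Steps 3 and 4 can be sketched as follows. The exact outlier rule is not stated above, so this assumes the common 1.5×IQR fences and Python's default quartile method; the function names and sample timings are illustrative and do not come from the benchmark scripts.

```python
import statistics


def iqr_filter(times, k=1.5, min_runs=5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]; skip when too few runs."""
    if len(times) < min_runs:
        return list(times)
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if low <= t <= high]


def summarize(times):
    """Return (mean, sample stddev, runs kept) after outlier filtering."""
    kept = iqr_filter(times)
    mean = statistics.mean(kept)
    stddev = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return mean, stddev, len(kept)


# Ten made-up timings in seconds with one obvious outlier (2.10).
runs = [1.41, 1.42, 1.42, 1.43, 1.41, 1.42, 1.43, 1.42, 1.42, 2.10]
mean, stddev, kept = summarize(runs)
print(f"mean={mean:.3f}s stddev={stddev:.3f}s kept={kept}/{len(runs)}")
```

With only three runs, as in the SwissProt benchmarks, the filter is skipped and all timings contribute to the mean.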

Tool Configurations

| Tool | Configurations |
|---|---|
| zsasa (Zig) | f64/f32 precision, standard and bitmask variants |
| Lahuta (C++) | Standard and bitmask variants |
| FreeSASA (C) | Sequential batch wrapper (sasa_batch.cpp) |
| RustSASA (Rust) | Native multi-threading |

Notes

  1. Test points: All benchmarks use 128 test points (required for Lahuta bitmask support).
  2. Input format: All tools use PDB input for fair comparison. zsasa supports JSON input which would be faster.
  3. FreeSASA: The CLI only supports single-threaded batch processing. A custom wrapper processes files sequentially.
  4. SwissProt (M4): I/O-bound due to mmap on 32 GB RAM — bitmask tools converge at ~11 min.

Running Benchmarks

# E. coli proteome (10 threads, 10 runs)
./benchmarks/scripts/bench_batch.py \
-i benchmarks/UP000000625_83333_ECOLI_v6/pdb \
-n ecoli --runs 10 --threads 10 -N 128

# Human proteome (10 threads, 10 runs)
./benchmarks/scripts/bench_batch.py \
-i benchmarks/UP000005640_9606_HUMAN_v6/pdb \
-n human --runs 10 --threads 10 -N 128

# SwissProt (large dataset, 3 runs)
./benchmarks/scripts/bench_batch.py \
-i /path/to/swissprot_pdb_v6 \
-n swissprot --runs 3 --threads 10 -N 128
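
How bench_batch.py invokes hyperfine is not shown on this page. As a rough illustration of what one timed measurement could look like, the sketch below builds a hyperfine command line using hyperfine's `--warmup`, `--runs`, and `--export-json` options. The helper name, the JSON file name, and the tool command string are placeholders, not the script's actual values.

```python
import shlex


def hyperfine_command(tool_cmd, warmup=3, runs=10, export_json="result.json"):
    """Build an argument list that times tool_cmd with hyperfine."""
    return [
        "hyperfine",
        "--warmup", str(warmup),        # untimed runs before measuring
        "--runs", str(runs),            # number of timed runs
        "--export-json", export_json,   # machine-readable timing results
        tool_cmd,                       # the command being benchmarked
    ]


# Placeholder command; the real tool invocation depends on each tool's CLI.
cmd = hyperfine_command("./my_sasa_tool --threads 10 input_dir output_dir")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where hyperfine is installed.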

Options

| Option | Description | Default |
|---|---|---|
| --input, -i | Input directory (PDB files) | (required) |
| --name, -n | Benchmark name | (required) |
| --threads, -T | Thread counts: 1,8,10 or 1-10 | 1,8 |
| --runs, -r | Number of benchmark runs | 3 |
| --warmup, -w | Number of warmup runs | 3 |
| --n-points, -N | Number of sphere test points | 100 |
| --tool, -t | Tool: zig, zig_bitmask, freesasa, rustsasa, lahuta, lahuta_bitmask (repeatable) | all |
| --output, -o | Output directory | results/batch/<N>/<name> |
| --dry-run | Show commands without running | false |
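
The `--threads` option accepts either a comma-separated list or a range. One way such a spec could be expanded is sketched below; `parse_thread_counts` is a hypothetical helper, and the script's real parsing may differ (for example in how it treats mixed specs such as `1,4-6`).

```python
def parse_thread_counts(spec: str) -> list[int]:
    """Expand a spec such as '1,8,10' or '1-10' into a list of thread counts."""
    counts: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. '1-10' -> 1, 2, ..., 10
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            counts.extend(range(start, end + 1))
        else:
            counts.append(int(part))
    if any(c < 1 for c in counts):
        raise ValueError("thread counts must be positive")
    return counts


print(parse_thread_counts("1,8,10"))
print(parse_thread_counts("1-10"))
```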

Analysis

./benchmarks/scripts/analyze_batch.py summary -N 128           # Summary table
./benchmarks/scripts/analyze_batch.py summary -N 128 -n ecoli # Specific benchmark
./benchmarks/scripts/analyze_batch.py plot -N 128 -n ecoli # Time charts
./benchmarks/scripts/analyze_batch.py memory -N 128 -n ecoli # Memory charts
./benchmarks/scripts/analyze_batch.py all -N 128 # Everything