
Classifiers

Classifiers assign van der Waals radii and polarity classes to atoms based on residue and atom names. The choice of classifier directly affects SASA values, so understanding the differences is important.

Quick Recommendation

  • CCD (default for calc/batch): ProtOr-compatible hybridization-based radii (Tsai et al. 1999), extended with CCD bond topology analysis for non-standard residues. Uses united-atom radii (implicit H). Recommended for crystal structures.
  • ProtOr: Alias for CCD. Accepted for backward compatibility with FreeSASA.
  • NACCESS (default for traj): Use for MD trajectories with explicit hydrogens, or when reproducing NACCESS-based studies.
  • OONS: Larger aliphatic carbon radii (2.00 Å). Use when reproducing or comparing with OONS-based studies.

How Classifiers Work

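Classifiers try the following lookups in order for each atom:

  1. Residue-specific match: exact match of the (residue name, atom name) pair
  2. ANY fallback: generic atom definitions shared across residues (NACCESS and OONS only)
  3. Element estimation: if no match is found, estimate the radius from the element symbol

The CCD classifier has no ANY step. For residues outside its built-in table, it derives radii from CCD bond topology before falling back to the element (see CCD Classifier).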
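The lookup order above can be sketched in a few lines of Python. This is a minimal illustration of a NACCESS-style classifier (the only kind with an ANY step), not zsasa's implementation: the residue and ANY tables are tiny excerpts using the NACCESS values from the reference table, while the element radii and the 1.80 Å default are illustrative placeholders.

```python
from typing import Optional

# Step 1: residue-specific (residue name, atom name) -> radius in Å
# (excerpt; values from the NACCESS column of the reference table)
RESIDUE_TABLE = {
    ("ALA", "CB"): 1.87,
    ("SER", "OG"): 1.40,
}

# Step 2: generic "ANY" entries shared across residues (hypothetical excerpt)
ANY_TABLE = {"CA": 1.87, "N": 1.65, "O": 1.40}

# Step 3: element-based estimates (illustrative placeholder values)
ELEMENT_RADII = {"H": 1.10, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
DEFAULT_RADIUS = 1.80  # hypothetical default for unknown elements


def lookup_radius(residue: str, atom: str, element: Optional[str] = None) -> float:
    """Try each lookup level in order and return the first radius found."""
    key = (residue.upper(), atom.upper())
    if key in RESIDUE_TABLE:  # 1. exact (residue, atom) match
        return RESIDUE_TABLE[key]
    if key[1] in ANY_TABLE:  # 2. ANY fallback
        return ANY_TABLE[key[1]]
    # 3. element estimation: use the supplied element, else guess from the
    #    first letter of the atom name (a deliberately naive heuristic)
    symbol = element or next((ch for ch in atom if ch.isalpha()), "")
    return ELEMENT_RADII.get(symbol.capitalize(), DEFAULT_RADIUS)


print(lookup_radius("ALA", "CB"))  # 1.87 (residue-specific)
print(lookup_radius("GLY", "CA"))  # 1.87 (ANY fallback)
print(lookup_radius("LIG", "O5"))  # 1.52 (element estimate)
```

The naive first-letter guess in step 3 is exactly where "CA" (Cα vs calcium) becomes ambiguous, which is why an explicit element is preferred when available.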

Classifier Comparison

Radii Reference Table

All classifiers use united-atom radii (implicit hydrogens) — heavy atom radii already account for the volume of attached hydrogen atoms. The CCD column differentiates radii by hybridization state and implicit hydrogen count, following Tsai et al. 1999.

Explicit hydrogens and united-atom radii

CCD and ProtOr use united-atom radii where heavy atom radii already include the contribution of implicit hydrogens (e.g., C sp3 with 3 implicit H = 1.88 Å). Using --include-hydrogens with CCD/ProtOr causes double-counting and inaccurate SASA values. A warning is printed if this combination is detected.

For MD trajectories with explicit hydrogens, use zsasa traj, which defaults to NACCESS. NACCESS handles explicit H atoms via its element-based fallback (H = 1.10 Å).

| Element | Context | ProtOr / CCD (Å) | NACCESS (Å) | OONS (Å) | Polarity |
|---|---|---|---|---|---|
| Carbon | sp2, 0 H: carbonyl (backbone C, ASN CG, etc.) | 1.61 | 1.76 | 1.55 | apolar (ProtOr/CCD, NACCESS); polar (OONS) |
| Carbon | sp2, 0 H: aromatic without H (PHE CG, TRP CE2, etc.) | 1.61 | 1.76 | 1.75 | apolar (all) |
| Carbon | sp2, 1+ H: aromatic CH (PHE CD1, TYR CE1, etc.) | 1.76 | 1.76 | 1.75 | apolar (all) |
| Carbon | sp3, 1–3 H: aliphatic (CA, CB, CG, CD, etc.) | 1.88 | 1.87 | 2.00 | apolar (all) |
| Nitrogen | all (amide, amino, imino, etc.) | 1.64 | 1.50–1.65 | 1.55 | polar (all) |
| Oxygen | sp2, 0 H: carbonyl (C=O) | 1.42 | 1.40 | 1.40 | polar (all) |
| Oxygen | sp3, 0–1 H: hydroxyl, ether | 1.46 | 1.40 | 1.40 | polar (all) |
| Oxygen | sp3, 2 H: water | 1.46 | 1.40 | 1.40 | polar (all) |
| Sulfur | sp3: thiol, thioether | 1.77 | 1.85 | 2.00 | polar (ProtOr/CCD, OONS); apolar (NACCESS) |
| Selenium | sp3: MSE, SEC | 1.90 | 1.80 | 1.90 | polar (ProtOr/CCD, OONS); apolar (NACCESS) |
| Phosphorus | sp3: nucleic acid backbone | 1.80 | 1.90 | 1.80 | polar (ProtOr/CCD, OONS); apolar (NACCESS) |

Sulfur, Selenium, and Phosphorus polarity

NACCESS treats S, Se, and P as apolar, while ProtOr, OONS, and CCD treat them as polar. This affects polar/nonpolar SASA partitioning for CYS, MET, MSE, SEC residues and nucleic acid backbones.
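To see how the polarity convention alone shifts the partition, the sketch below keeps the per-atom SASA values fixed (in practice the radii, and therefore the areas, also differ between classifiers) and changes only how the MET sulfur is classified. The per-atom areas are invented for illustration.

```python
# Invented per-atom SASA values (Å²) for a MET side chain, for illustration only.
MET_SIDECHAIN_SASA = {"CB": 10.0, "CG": 20.0, "SD": 30.0, "CE": 40.0}

# Sulfur polarity per classifier, as listed above. The carbons here are all
# aliphatic and therefore apolar under every classifier.
SULFUR_POLAR = {"ccd": True, "oons": True, "naccess": False}


def partition(sasa: dict[str, float], classifier: str) -> tuple[float, float]:
    """Split side-chain SASA into (polar, apolar) totals."""
    polar = apolar = 0.0
    for atom, area in sasa.items():
        if atom.startswith("S") and SULFUR_POLAR[classifier]:
            polar += area
        else:
            apolar += area
    return polar, apolar


for name in ("ccd", "naccess"):
    print(name, partition(MET_SIDECHAIN_SASA, name))
# ccd (30.0, 70.0)
# naccess (0.0, 100.0)
```

With these numbers, 30% of the side-chain area counts as polar under CCD and none of it does under NACCESS.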

Key Differences

| Property | CCD (= ProtOr) | NACCESS | OONS |
|---|---|---|---|
| Basis | Hybridization state | Atom type | Atom type |
| ANY fallback | No | Yes | Yes |
| S, Se, P polarity | Polar | Apolar | Polar |
| Carbonyl C polarity | Apolar | Apolar | Polar |
| Non-standard residues | CCD bond topology | Element fallback | Element fallback |
| Reference | Tsai et al. 1999 + wwPDB CCD | Hubbard & Thornton 1993 | Ooi et al. 1987 |

ProtOr is an alias for CCD

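--classifier=protor and --classifier=ccd select the same classifier internally, so they produce identical results. ProtOr is accepted for backward compatibility with FreeSASA. There is no reason to prefer ProtOr over CCD: CCD is a strict superset of FreeSASA's ProtOr, with the same radii for standard residues plus CCD-derived radii for non-standard ones.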

Usage

# CCD (default for PDB/mmCIF; auto-includes HETATM)
zsasa calc --classifier=ccd structure.cif output.json

# NACCESS
zsasa calc --classifier=naccess structure.cif output.json

# OONS
zsasa calc --classifier=oons structure.cif output.json

# CCD with external dictionary (for non-standard residues)
zsasa calc --classifier=ccd --ccd=components.zsdc structure.cif output.json

# Custom config file
zsasa calc --config=my_radii.toml structure.cif output.json

Custom Classifier

You can define custom atom radii using a config file.

TOML Format

name = "my-classifier"

[types]
C_ALI = { radius = 1.87, class = "apolar" }
C_CAR = { radius = 1.76, class = "apolar" }
N = { radius = 1.65, class = "polar" }
O = { radius = 1.40, class = "polar" }
S = { radius = 1.85, class = "apolar" }

[[atoms]]
residue = "ANY"
atom = "CA"
type = "C_ALI"

[[atoms]]
residue = "ALA"
atom = "CB"
type = "C_ALI"

FreeSASA Format

name: my-classifier

types:
C_ALI 1.87 apolar
C_CAR 1.76 apolar
O 1.40 polar

atoms:
ANY CA C_ALI
ALA CB C_ALI
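The FreeSASA-style layout is simple enough to parse line by line. This is a sketch assuming one entry per line, whitespace-separated fields, and '#' starting a comment; it is not FreeSASA's or zsasa's parser.

```python
EXAMPLE = """\
name: my-classifier

types:
C_ALI 1.87 apolar
C_CAR 1.76 apolar
O 1.40 polar

atoms:
ANY CA C_ALI
ALA CB C_ALI
"""


def parse_freesasa_config(text: str):
    """Return (name, types, atoms) from a FreeSASA-style classifier file."""
    name = None
    types: dict[str, tuple[float, str]] = {}
    atoms: dict[tuple[str, str], str] = {}
    section = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line in ("types:", "atoms:"):
            section = line[:-1]  # "types" or "atoms"
        elif section == "types":
            type_name, radius, polarity = line.split()
            types[type_name] = (float(radius), polarity)
        elif section == "atoms":
            residue, atom, type_name = line.split()
            atoms[(residue, atom)] = type_name
    return name, types, atoms


name, types, atoms = parse_freesasa_config(EXAMPLE)
print(name)                          # my-classifier
print(types[atoms[("ALA", "CB")]])   # (1.87, 'apolar')
```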

The format is auto-detected by file extension: .toml for TOML, all others for FreeSASA format.

CCD Classifier

The CCD classifier is the default classifier for zsasa calc and batch (--classifier=protor is an alias; zsasa traj defaults to NACCESS). It derives van der Waals radii from the wwPDB Chemical Component Dictionary (CCD) bond topology, using the same hybridization-based radii as ProtOr (Tsai et al. 1999). NACCESS and OONS cover only a fixed set of residues (mainly standard amino acids and nucleotides), whereas the CCD classifier can assign hybridization-aware radii to any chemical component: ligands, modified residues, cofactors, post-translational modifications, and more.

Why CCD?

NACCESS and OONS have a fixed set of known residues. When they encounter a non-standard residue (e.g., HEM, ATP, NAG), they fall back to generic element-based radii that ignore hybridization. This can lead to inaccurate SASA values for structures containing ligands or modified residues.

The CCD classifier solves this by analyzing bond topology (single, double, aromatic bonds and hydrogen count) to determine the hybridization state of each atom, then mapping it to the corresponding ProtOr-compatible radius. This gives you hybridization-aware radii for any component with CCD data.

How It Works

The CCD classifier uses a 3-level lookup:

  1. Hardcoded table (compile-time, O(1)): Pre-compiled ProtOr radii for 45 common residues — the 20 standard amino acids, selenoamino acids (SEC, MSE), non-standard amino acids (PYL, ASX, GLX), post-translationally modified residues (HYP, MLY, SEP, TPO), modified nucleotide (PSU), nucleotides (A, C, G, I, T, U, DA, DC, DG, DI, DT, DU), capping groups (ACE, NH2), and water (HOH).
  2. Runtime CCD analysis: For non-standard residues, bond topology from CCD data is analyzed at runtime to derive hybridization-aware radii. CCD data comes from two sources:
    • Inline CCD: _chem_comp_atom / _chem_comp_bond loops embedded in mmCIF structure files
    • External CCD: a separate CCD dictionary loaded via --ccd=<path> (CIF or ZSDC format)
  3. Element fallback: If no CCD data is available, a generic van der Waals radius based on the element symbol is used.
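
The three levels can be sketched as a chain of lookups. In this hypothetical sketch the CCD step is reduced to a precomputed per-atom radius map (the bond-topology analysis itself is described under Hybridization Analysis); the two hardcoded entries use the ProtOr values from the reference table, and the element radii and the 1.80 Å default are illustrative placeholders.

```python
# Level 1: compile-time table (two-entry excerpt using ProtOr radii).
HARDCODED = {("ALA", "CB"): 1.88, ("ALA", "C"): 1.61}

# Level 3: element fallback (illustrative placeholder values).
ELEMENT_FALLBACK = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def classify(
    residue: str,
    atom: str,
    element: str,
    ccd_radii: dict[str, dict[str, float]],
) -> tuple[float, str]:
    """Return (radius, source) using the 3-level lookup order."""
    if (residue, atom) in HARDCODED:  # 1. hardcoded table
        return HARDCODED[(residue, atom)], "hardcoded"
    component = ccd_radii.get(residue)  # 2. radii derived from inline/external CCD
    if component is not None and atom in component:
        return component[atom], "ccd"
    return ELEMENT_FALLBACK.get(element, 1.80), "element"  # 3. element fallback


# "LIG" is a hypothetical ligand whose CCD entry yielded these radii.
ccd = {"LIG": {"C1": 1.76, "O2": 1.46}}
print(classify("ALA", "CB", "C", ccd))  # (1.88, 'hardcoded')
print(classify("LIG", "O2", "O", ccd))  # (1.46, 'ccd')
print(classify("UNK", "O2", "O", ccd))  # (1.52, 'element')
```
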
Inline CCD vs External CCD

Inline CCD refers to the _chem_comp_atom and _chem_comp_bond data loops that many mmCIF files include alongside coordinate data. These describe the bond topology for each chemical component present in the structure.

External CCD is a separate dictionary file (the full wwPDB CCD or subsets of it) that you provide via --ccd=<path>. This works with both mmCIF and PDB input formats.
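To illustrate what inline CCD provides, the sketch below pulls (component, atom 1, atom 2, bond order) rows out of a _chem_comp_bond loop. It is deliberately simplified (one row per line, no multi-line or whitespace-containing values); real files are better handled with a full CIF parser such as gemmi.

```python
def parse_chem_comp_bonds(cif_text: str) -> list[tuple[str, str, str, str]]:
    """Return (comp_id, atom_id_1, atom_id_2, value_order) for each bond row."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    bonds = []
    i = 0
    while i < len(lines):
        # A bond loop is "loop_" followed by _chem_comp_bond.* tag lines.
        if lines[i] == "loop_" and i + 1 < len(lines) and lines[i + 1].startswith("_chem_comp_bond."):
            i += 1
            tags = []
            while i < len(lines) and lines[i].startswith("_chem_comp_bond."):
                tags.append(lines[i].split(".", 1)[1])
                i += 1
            # Data rows continue until a blank line, comment, tag, or new block.
            while i < len(lines) and lines[i] and not lines[i].startswith(("#", "_", "loop_", "data_")):
                values = [v.strip('"') for v in lines[i].split()]  # unquote names like "C1'"
                row = dict(zip(tags, values))
                bonds.append((row["comp_id"], row["atom_id_1"], row["atom_id_2"], row["value_order"]))
                i += 1
        else:
            i += 1
    return bonds


SAMPLE = """\
data_ALA
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
ALA N  CA SING N
ALA C  O  DOUB N
ALA CA CB SING N
#
"""
print(parse_chem_comp_bonds(SAMPLE))
# [('ALA', 'N', 'CA', 'SING'), ('ALA', 'C', 'O', 'DOUB'), ('ALA', 'CA', 'CB', 'SING')]
```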

Input Format Support

| Input format | Hardcoded table | Inline CCD | External CCD (--ccd=) |
|---|---|---|---|
| mmCIF | ✅ | ✅ auto-extracted | ✅ |
| PDB | ✅ | Not available (PDB format has no bond topology) | ✅ |

  • mmCIF input: Inline CCD data (if present) is automatically parsed. No extra flags needed.
  • PDB input: PDB format does not include bond topology, so inline CCD is not available. For non-standard residues, provide an external CCD dictionary with --ccd=<path> to get the same hybridization-aware radii.
# mmCIF — inline CCD is automatically used
zsasa calc --classifier=ccd structure.cif output.json

# PDB — use external CCD for non-standard residues
zsasa calc --classifier=ccd --ccd=HEM.cif structure.pdb output.json

# PDB with full CCD dictionary (covers all ~35,000 components)
zsasa calc --classifier=ccd --ccd=components.zsdc structure.pdb output.json
PDB users

If your PDB file contains only standard amino acids and nucleotides, no external dictionary is needed — the hardcoded table covers all standard residues. For non-standard residues, provide --ccd=<path> to get hybridization-aware radii.

Auto-Including HETATM

When using --classifier=ccd, HETATM records are included automatically without needing --include-hetatm. This is because the CCD classifier is specifically designed to handle non-standard residues (ligands, modified amino acids, etc.) that are recorded as HETATM in PDB/mmCIF files.
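The record-selection rule can be sketched as a filter over PDB lines. This is a hypothetical helper, not zsasa's parser; it relies only on the PDB convention that the record name occupies columns 1-6 ("ATOM  " or "HETATM").

```python
def select_records(pdb_lines: list[str], classifier: str, include_hetatm: bool = False) -> list[str]:
    """Keep ATOM records, plus HETATM records when the classifier is CCD
    or --include-hetatm was requested."""
    keep_hetatm = include_hetatm or classifier.lower() == "ccd"
    selected = []
    for line in pdb_lines:
        record = line[:6]  # PDB record name, columns 1-6
        if record == "ATOM  " or (keep_hetatm and record == "HETATM"):
            selected.append(line)
    return selected


lines = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM    2 FE   HEM A 101       0.000   0.000   0.000  1.00  0.00          FE",
]
print(len(select_records(lines, "naccess")))  # 1
print(len(select_records(lines, "ccd")))      # 2
```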

External CCD Dictionary

You can provide CCD data for specific components or the entire wwPDB CCD:

# Single component CIF (downloadable from RCSB)
zsasa calc --classifier=ccd --ccd=HEM.cif structure.pdb output.json

# Multiple components concatenated
cat HEM.cif PO4.cif ATP.cif > my_ligands.cif
zsasa calc --classifier=ccd --ccd=my_ligands.cif structure.pdb output.json

# Gzipped CIF
zsasa calc --classifier=ccd --ccd=components.cif.gz structure.pdb output.json

# Pre-compiled binary format (faster loading)
zsasa calc --classifier=ccd --ccd=components.zsdc structure.pdb output.json

Component CIF files can be downloaded from RCSB Ligand Expo, e.g.:

  • https://files.rcsb.org/ligands/view/HEM.cif
  • https://files.rcsb.org/ligands/view/ATP.cif
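Concatenating component files works because each component is its own data_ block in CIF. The sketch below lists those block names from a plain or gzip-compressed CIF file using only the standard library; it illustrates the file layout and is not how zsasa reads dictionaries.

```python
import gzip
from pathlib import Path


def list_components(path: str) -> list[str]:
    """Return the data_ block names (component IDs) in a .cif or .cif.gz file."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", encoding="utf-8") as fh:
        return [line[len("data_"):].strip() for line in fh if line.startswith("data_")]
```

For the my_ligands.cif example above, this would be expected to return one ID per concatenated file, e.g. ["HEM", "PO4", "ATP"].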

compile-dict Subcommand

The compile-dict subcommand converts a CCD dictionary from CIF text to compact binary ZSDC format for faster loading:

# Download the full CCD dictionary (~35,000 components)
wget https://files.wwpdb.org/pub/pdb/data/monomers/components.cif.gz

# Compile to binary ZSDC format
zsasa compile-dict components.cif.gz -o components.zsdc

# Use with CCD classifier
zsasa calc --classifier=ccd --ccd=components.zsdc structure.pdb output.json

Supported input formats for compile-dict:

  • .cif — CIF text
  • .cif.gz — gzip-compressed CIF text

Hybridization Analysis

The CCD classifier determines van der Waals radii through the following process:

  1. Parse bond graph: extract atoms and bonds from the CCD _chem_comp_atom / _chem_comp_bond loops
  2. Analyze bonds: for each non-hydrogen atom, count its single, double, and aromatic bonds to heavy atoms and its attached hydrogens
  3. Determine hybridization: classify the atom as sp, sp2, or sp3 based on its bond pattern
  4. Map to ProtOr radius: each (element, hybridization, hydrogen count) combination maps to a specific ProtOr-compatible radius

For example, an aromatic CH carbon like PHE CD1 (two aromatic bonds to heavy atoms, one implicit hydrogen) is classified as sp2 with implicit H, receiving a radius of 1.76 Å. A backbone carbonyl carbon (a double bond to O, single bonds to CA and to the next residue's N, no hydrogen) is also sp2, but because it carries no hydrogen it receives 1.61 Å. A typical aliphatic carbon like ALA CB (one single bond to CA, three implicit hydrogens) is sp3, receiving 1.88 Å.
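The four steps can be sketched as follows. The radii come from the ProtOr/CCD column of the reference table. The hybridization rules (a triple bond or two double bonds means sp, any double or aromatic bond means sp2, otherwise sp3) are a simplifying assumption, and bonds are given as (atom 1, atom 2, value_order, aromatic flag) tuples in the style of _chem_comp_bond. Returning None for elements outside the table stands in for the element fallback.

```python
from collections import defaultdict
from typing import Optional


def hybridization(orders: list[str]) -> str:
    """Classify from bond orders to heavy atoms ('SING', 'DOUB', 'TRIP', 'AROM')."""
    if "TRIP" in orders or orders.count("DOUB") >= 2:
        return "sp"
    if "DOUB" in orders or "AROM" in orders:
        return "sp2"
    return "sp3"


def protor_radius(element: str, hybrid: str, n_h: int) -> Optional[float]:
    """Map (element, hybridization, H count) to a ProtOr/CCD radius in Å."""
    if element == "C":
        if hybrid == "sp3":
            return 1.88
        return 1.76 if n_h > 0 else 1.61  # sp/sp2: CH vs carbon without H
    if element == "O":
        return 1.42 if hybrid == "sp2" else 1.46
    return {"N": 1.64, "S": 1.77, "SE": 1.90, "P": 1.80}.get(element)


def assign_radii(elements: dict[str, str], bonds: list[tuple[str, str, str, str]]) -> dict[str, Optional[float]]:
    """Steps 2-4: count bonds and hydrogens per heavy atom, then map to radii."""
    heavy_orders: dict[str, list[str]] = defaultdict(list)
    h_count: dict[str, int] = defaultdict(int)
    for a1, a2, order, aromatic in bonds:
        for atom, partner in ((a1, a2), (a2, a1)):
            if elements[partner] == "H":
                h_count[atom] += 1  # becomes implicit H in the united-atom model
            else:
                heavy_orders[atom].append("AROM" if aromatic == "Y" else order)
    return {
        atom: protor_radius(el, hybridization(heavy_orders[atom]), h_count[atom])
        for atom, el in elements.items()
        if el != "H"
    }


# ALA fragment with hydrogens listed explicitly, as in CCD entries
ala = {"N": "N", "CA": "C", "C": "C", "O": "O", "CB": "C",
       "HA": "H", "HB1": "H", "HB2": "H", "HB3": "H"}
ala_bonds = [("N", "CA", "SING", "N"), ("CA", "C", "SING", "N"),
             ("C", "O", "DOUB", "N"), ("CA", "CB", "SING", "N"),
             ("CA", "HA", "SING", "N"), ("CB", "HB1", "SING", "N"),
             ("CB", "HB2", "SING", "N"), ("CB", "HB3", "SING", "N")]
r = assign_radii(ala, ala_bonds)
print(r["CB"], r["C"], r["O"])  # 1.88 1.61 1.42

# PHE ring fragment around CD1 (aromatic flag "Y" on ring bonds)
ring = {"CG": "C", "CD1": "C", "CE1": "C", "HD1": "H"}
ring_bonds = [("CG", "CD1", "DOUB", "Y"), ("CD1", "CE1", "SING", "Y"), ("CD1", "HD1", "SING", "N")]
print(assign_radii(ring, ring_bonds)["CD1"])  # 1.76
```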

Supported Residues

Amino Acids

All 20 standard amino acids are supported by all classifiers. Additional residues vary by classifier:

  • All classifiers: SEC (selenocysteine), MSE (selenomethionine)
  • CCD and OONS: PYL (pyrrolysine)
  • CCD only: HYP (hydroxyproline), MLY (N-dimethyllysine), SEP (phosphoserine), TPO (phosphothreonine)

Nucleic Acids

  • RNA: A, C, G, I, T, U
  • DNA: DA, DC, DG, DI, DT, DU
  • Modified: PSU (pseudouridine, CCD only)

CCD Coverage

With an external CCD dictionary or inline CCD data from mmCIF files, the CCD classifier can handle any of the ~35,000 components in the wwPDB Chemical Component Dictionary — including ligands, cofactors, modified residues, and post-translational modifications.

Handling Unknown Atoms

When a classifier cannot find a matching (residue, atom name) pair:

  1. NACCESS/OONS check the ANY fallback entries
  2. If still unmatched, the element is extracted from the atom name and a generic van der Waals radius is assigned

The CCD classifier avoids this element fallback for non-standard residues by deriving hybridization-aware radii from CCD bond topology. To maximize coverage:

  • Use mmCIF input (inline CCD is auto-extracted)
  • Or provide --ccd=<path> for PDB input

To avoid ambiguity (e.g., "CA" = Carbon-alpha vs Calcium), include an element field (atomic numbers) in JSON input:

{
"atom_name": ["CA", "CA"],
"element": [6, 20]
}
  • Atomic number 6 = Carbon (Cα)
  • Atomic number 20 = Calcium (metal ion)
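A sketch of the disambiguation, assuming the JSON element field holds atomic numbers as shown: when the field is present it decides the element outright; without it, a name-based guess (here, the first letter of the atom name, a hypothetical heuristic) cannot tell Cα from calcium. The periodic-table excerpt is intentionally tiny.

```python
import json
from typing import Optional

# Tiny periodic-table excerpt; extend as needed.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 15: "P", 16: "S", 20: "Ca", 26: "Fe", 34: "Se"}


def resolve_element(atom_name: str, atomic_number: Optional[int] = None) -> str:
    """Prefer the explicit atomic number; otherwise guess from the atom name."""
    if atomic_number is not None:
        return SYMBOLS[atomic_number]
    # Name-based guess: first alphabetic character. This maps "CA" to carbon
    # even when the atom is a calcium ion, which is why the element field helps.
    return next((ch for ch in atom_name if ch.isalpha()), "").upper()


data = json.loads('{"atom_name": ["CA", "CA"], "element": [6, 20]}')
print([resolve_element(n, z) for n, z in zip(data["atom_name"], data["element"])])  # ['C', 'Ca']
print(resolve_element("CA"))  # C
```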

References

  • Hubbard, S. J.; Thornton, J. M. NACCESS, Computer Program. UCL, 1993.
  • Tsai, J.; Taylor, R.; Chothia, C.; Gerstein, M. The Packing Density in Proteins. J. Mol. Biol. 1999, 290(1), 253-266.
  • Ooi, T.; Oobatake, M.; Némethy, G.; Scheraga, H. A. Accessible Surface Areas. Proc. Natl. Acad. Sci. 1987, 84(10), 3086-3090.
  • Mantina, M. et al. Consistent van der Waals Radii. J. Phys. Chem. A 2009, 113(19), 5806-5812.