
prd Schema

  • Primary Key: prd_id
  • Tables: 17

brief_summary

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| docid | bigint | Serial counter (unique integer) to represent the row id. |
| name | text | |
| formula | text | |
| description | text | |
| pdbx_initial_date | date | Date the entry was first included in the PDB. |
| pdbx_modified_date | date | Modification date of an entry. |
| update_date | timestamp without time zone | Entry update date (within the RDB). |
| canonical_smiles | text | [pmb] Canonical SMILES string generated from the PRDCC block via ccd2rdmol. |
| chem_comp_id | text | [pmb] Chemical component ID linking to cc.brief_summary (from pdbx_reference_molecule). |
| keywords | text[] | Array of keywords. |

RDKit Integration

The prd pipeline automatically sets up the RDKit PostgreSQL cartridge for chemical structure searching, as it does for the cc schema. The setup includes:

  1. RDKit extension (CREATE EXTENSION IF NOT EXISTS rdkit)
  2. mol column on prd.brief_summary -- stores RDKit molecule objects generated from canonical SMILES
  3. GiST index on mol for fast substructure and similarity searches
  4. RDKit descriptor columns -- rdkit_mw, rdkit_logp, rdkit_tpsa, rdkit_hba, rdkit_hbd, rdkit_rotbonds, rdkit_rings, rdkit_formula

To set up RDKit on an existing database:

```shell
pixi run pmb setup-rdkit
```

Example Queries

```sql
-- Find PRD entries by substructure
SELECT prd_id, name FROM prd.brief_summary WHERE mol @> 'c1ccccc1'::mol;

-- PRD entries with molecular weight > 500
SELECT prd_id, name, rdkit_mw FROM prd.brief_summary
WHERE rdkit_mw > 500 ORDER BY rdkit_mw DESC;
```

SMILES Coverage

SMILES are generated from PRDCC files (the _chem_comp_atom / _chem_comp_bond data). Coverage therefore depends on how the molecule is represented in wwPDB (the represent_as value in pdbx_reference_molecule):

| represent_as | Entries | SMILES in prd.brief_summary | Reason |
|---|---|---|---|
| polymer | ~649 | Yes | PRDCC file exists with full atom/bond data |
| branched | ~153 | Yes | PRDCC file exists with full atom/bond data |
| single molecule | ~373 | No | No PRDCC file -- structure is defined by a single CCD entry |

"single molecule" entries are PRDs whose structure is fully described by one CCD component (chem_comp_id in pdbx_reference_molecule). wwPDB does not generate separate PRDCC files for these because the structure already exists in CCD. Their SMILES can be found via the linked CCD entry:

```sql
-- Get SMILES for single-molecule PRDs via CCD
-- (alias "ccd" avoids reusing the schema name "cc" as a table alias)
SELECT bs.prd_id, pm.chem_comp_id, ccd.canonical_smiles
FROM prd.brief_summary bs
JOIN prd.pdbx_reference_molecule pm USING (prd_id)
JOIN cc.brief_summary ccd ON pm.chem_comp_id = ccd.comp_id
WHERE bs.canonical_smiles IS NULL;
```

Tip: The chem.compounds table combines both sources, so you can search all compounds (cc + prd) with SMILES in one place without worrying about this distinction.

chem_comp

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| formula | text | The formula for the chemical component. Formulae are written according to the following rules: (1) Only recognized element symbols may be used. (2) Each element symbol is followed by a 'count' number. A count of '1' may be omitted. (3) A space or parenthesis must separate each cluster of (element symbol + count), but in general parentheses are not used. (4) The order of elements depends on whether carbon is present or not. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetic order of their symbol. This is the 'Hill' system used by Chemical Abstracts. |
| formula_weight | double precision | Formula mass in daltons of the chemical component. |
| id | text | The value of _chem_comp.id must uniquely identify each item in the CHEM_COMP list. For protein polymer entities, this is the three-letter code for the amino acid. For nucleic acid polymer entities, this is the one-letter code for the base. |
| name | text | The full name of the component. |
| type | text | For standard polymer components, the type of the monomer. Note that monomers that will form polymers are of three types: linking monomers, monomers with some type of N-terminal (or 5') cap and monomers with some type of C-terminal (or 3') cap. |
| pdbx_release_status | text | This data item holds the current release status for the component. |
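
The Hill ordering rules in the formula description can be reproduced or checked with a few lines of code. The following is a minimal Python sketch, not part of pmb: parse_formula, hill_formula, formula_weight and the small ATOMIC_WEIGHTS table are hypothetical helpers. It assumes element symbols in formula strings use standard capitalization (for example Cl), and the computed weight may differ slightly from the stored formula_weight, since the atomic-weight table wwPDB uses is not stated here.

```python
import re
from collections import Counter
from typing import Mapping

# Standard atomic weights for a few common elements (illustrative subset only;
# formula_weight raises KeyError for any element not listed here).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                  "P": 30.974, "S": 32.06}

# One cluster = element symbol (capital + optional lower-case letter) + optional count.
_CLUSTER = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a formula such as 'C6 H12 O6' into element counts (missing count = 1)."""
    counts: Counter = Counter()
    for symbol, n in _CLUSTER.findall(formula):
        counts[symbol] += int(n) if n else 1
    return counts


def hill_formula(counts: Mapping[str, int]) -> str:
    """Format element counts in Hill order, space-separated, omitting counts of 1."""
    norm: Counter = Counter()
    for symbol, n in counts.items():
        if n > 0:
            norm[symbol.capitalize()] += n  # e.g. type_symbol 'CL' -> 'Cl'
    if "C" in norm:
        # Carbon present: C, then H, then the remaining elements alphabetically.
        order = ["C"] + (["H"] if "H" in norm else [])
        order += sorted(s for s in norm if s not in ("C", "H"))
    else:
        # No carbon: purely alphabetical, H included.
        order = sorted(norm)
    return " ".join(s if norm[s] == 1 else f"{s}{norm[s]}" for s in order)


def formula_weight(formula: str) -> float:
    """Approximate formula mass in daltons from the ATOMIC_WEIGHTS subset."""
    return sum(ATOMIC_WEIGHTS[s] * n for s, n in parse_formula(formula).items())
```

For example, hill_formula({"O": 1, "C": 2, "H": 6}) gives "C2 H6 O", and passing Counter(...) over a component's chem_comp_atom.type_symbol values should give a string in the same style as the formula column, assuming hydrogens are listed as atoms.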

chem_comp_atom

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| alt_atom_id | text | An alternative identifier for the atom. This data item would be used in cases where alternative nomenclatures exist for labelling atoms in a group. |
| atom_id | text | The value of _chem_comp_atom.atom_id must uniquely identify each atom in each monomer in the CHEM_COMP_ATOM list. The atom identifiers need not be unique over all atoms in the data block; they need only be unique for each atom in a component. Note that this item need not be a number; it can be any unique identifier. |
| charge | integer | The net integer charge assigned to this atom. This is the formal charge assignment normally found in chemical diagrams. |
| model_Cartn_x | double precision | The x component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| model_Cartn_y | double precision | The y component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| model_Cartn_z | double precision | The z component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| type_symbol | text | The code used to identify the atom species representing this atom type. Normally this code is the element symbol. |
| pdbx_align | integer | Atom name alignment offset in PDB atom field. |
| pdbx_ordinal | integer | Ordinal index for the component atom list. |
| pdbx_component_atom_id | text | The atom identifier in the subcomponent where a larger component has been divided into subcomponents. |
| pdbx_component_comp_id | text | The component identifier for the subcomponent where a larger component has been divided into subcomponents. |
| pdbx_model_Cartn_x_ideal | double precision | An alternative x component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_model_Cartn_y_ideal | double precision | An alternative y component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_model_Cartn_z_ideal | double precision | An alternative z component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_stereo_config | text | The chiral configuration of the atom that is a chiral center. |
| pdbx_aromatic_flag | text | A flag indicating an aromatic atom. |
| pdbx_leaving_atom_flag | text | A flag indicating a leaving atom. |
| pdbx_residue_numbering | integer | Preferred residue numbering in the BIRD definition. |
| pdbx_polymer_type | text | Whether the atom is in a polymer or non-polymer subcomponent of the BIRD definition. |
| pdbx_ref_id | text | A reference to _pdbx_reference_entity_list.ref_entity_id. |
| pdbx_component_id | integer | A reference to _pdbx_reference_entity_list.component_id. |
| pdbx_backbone_atom_flag | text | A flag indicating the backbone atoms in polypeptide units. |
| pdbx_n_terminal_atom_flag | text | A flag indicating the N-terminal group atoms in polypeptide units. |
| pdbx_c_terminal_atom_flag | text | A flag indicating the C-terminal group atoms in polypeptide units. |
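
Because a BIRD molecule can be divided into subcomponents, the per-atom columns pdbx_residue_numbering, pdbx_component_comp_id and pdbx_ordinal should be enough to recover a residue-by-residue breakdown of one PRD's atoms. The Python sketch below assumes the rows for a single prd_id have already been fetched into the hypothetical AtomRow tuple; both AtomRow and atoms_by_subcomponent are illustrative names, not something pmb provides.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class AtomRow(NamedTuple):
    """Hypothetical container for a subset of prd.chem_comp_atom columns."""
    atom_id: str
    pdbx_ordinal: int
    pdbx_residue_numbering: Optional[int]
    pdbx_component_comp_id: Optional[str]


GroupKey = Tuple[Optional[int], Optional[str]]


def atoms_by_subcomponent(rows: Iterable[AtomRow]) -> Dict[GroupKey, List[str]]:
    """Group atom IDs by (residue number, subcomponent ID) in pdbx_ordinal order.

    Groups appear in the order of their first atom, because dicts preserve
    insertion order.
    """
    groups: Dict[GroupKey, List[str]] = {}
    for row in sorted(rows, key=lambda r: r.pdbx_ordinal):
        key = (row.pdbx_residue_numbering, row.pdbx_component_comp_id)
        groups.setdefault(key, []).append(row.atom_id)
    return groups
```

With hypothetical rows AtomRow("N1", 3, 2, "GLY"), AtomRow("N", 1, 1, "ALA") and AtomRow("CA", 2, 1, "ALA"), the result maps (1, "ALA") to ["N", "CA"] and (2, "GLY") to ["N1"].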

chem_comp_bond

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| atom_id_1 | text | The ID of the first of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category. |
| atom_id_2 | text | The ID of the second of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| value_order | text | The value that should be taken as the target for the chemical bond associated with the specified atoms, expressed as a bond order. |
| pdbx_ordinal | integer | Ordinal index for the component bond list. |
| pdbx_stereo_config | text | Stereochemical configuration across a double bond. |
| pdbx_aromatic_flag | text | A flag indicating an aromatic bond. |
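
Bonds reference atoms by atom_id, which (per the chem_comp_atom description) is unique only within a component, so matching bonds to atoms should use (comp_id, atom_id) within one prd_id. A minimal Python sketch, assuming the ideal coordinates (pdbx_model_Cartn_*_ideal) and bond rows for one PRD are already loaded; ideal_bond_lengths and its input shapes are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[float, float, float]
AtomKey = Tuple[str, str]  # (comp_id, atom_id)


def ideal_bond_lengths(
    atoms: Dict[AtomKey, Coord],
    bonds: Iterable[Tuple[str, str, str, str]],
) -> List[Tuple[str, str, str, float]]:
    """Return (atom_id_1, atom_id_2, value_order, length in angstroms) per bond.

    `bonds` holds (comp_id, atom_id_1, atom_id_2, value_order) rows from
    chem_comp_bond; bonds whose atoms lack ideal coordinates are skipped.
    """
    lengths = []
    for comp_id, a1, a2, order in bonds:
        p1 = atoms.get((comp_id, a1))
        p2 = atoms.get((comp_id, a2))
        if p1 is None or p2 is None:
            continue
        # Coordinates are orthogonal angstroms, so Euclidean distance applies.
        lengths.append((a1, a2, order, math.dist(p1, p2)))
    return lengths
```

Keying on (comp_id, atom_id) rather than atom_id alone keeps the lookup correct even if the same atom name appears in more than one component.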

pdbx_chem_comp_descriptor

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| descriptor | text | This data item contains the descriptor value for this component. |
| type | text | This data item contains the descriptor type. |
| program | text | This data item contains the name of the program or library used to compute the descriptor. |
| program_version | text | This data item contains the version of the program or library used to compute the descriptor. |
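
A component can carry several descriptors of different types from different programs, so a consumer usually has to choose one. The Python sketch below picks a descriptor by an ordered list of preferred types, then preferred programs. The type strings are assumed to follow CCD conventions (such as SMILES, SMILES_CANONICAL, InChI, InChIKey) and should be checked against the actual data; DescriptorRow, pick_descriptor and the program names in the example are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional, Sequence


class DescriptorRow(NamedTuple):
    """Hypothetical container for prd.pdbx_chem_comp_descriptor columns."""
    comp_id: str
    descriptor: str
    type: str
    program: str
    program_version: str


def pick_descriptor(
    rows: Iterable[DescriptorRow],
    type_preference: Sequence[str],
    program_preference: Sequence[str] = (),
) -> Optional[str]:
    """Return the descriptor whose type (then program) ranks best, or None."""
    candidates = [r for r in rows if r.type in type_preference]
    if not candidates:
        return None

    def rank(row: DescriptorRow) -> tuple:
        # Earlier entries in each preference list win; unlisted programs rank last.
        t = type_preference.index(row.type)
        p = (program_preference.index(row.program)
             if row.program in program_preference else len(program_preference))
        return (t, p)

    return min(candidates, key=rank).descriptor
```

Ties keep the first matching row, so the result is deterministic for a given row order.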

pdbx_chem_comp_identifier

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| identifier | text | This data item contains the identifier value for this component. |
| type | text | This data item contains the identifier type. |
| program | text | This data item contains the name of the program or library used to compute the identifier. |
| program_version | text | This data item contains the version of the program or library used to compute the identifier. |

pdbx_prd_audit

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| date | date | The date associated with this audit record. |
| processing_site | text | An identifier for the wwPDB site creating or modifying the molecule. |
| action_type | text | The action associated with this audit record. |

pdbx_reference_entity_link

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| link_id | integer | The value of _pdbx_reference_entity_link.link_id uniquely identifies linkages between entities within a molecule. |
| ref_entity_id_1 | text | The reference entity id of the first of the two entities joined by the linkage. This data item is a pointer to _pdbx_reference_entity_list.ref_entity_id in the PDBX_REFERENCE_ENTITY_LIST category. |
| ref_entity_id_2 | text | The reference entity id of the second of the two entities joined by the linkage. This data item is a pointer to _pdbx_reference_entity_list.ref_entity_id in the PDBX_REFERENCE_ENTITY_LIST category. |
| entity_seq_num_1 | integer | For a polymer entity, the sequence number in the first of the two entities containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| entity_seq_num_2 | integer | For a polymer entity, the sequence number in the second of the two entities containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_1 | text | The component identifier in the first of the two entities containing the linkage. For polymer entities, this data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. For non-polymer entities, this data item is a pointer to _pdbx_reference_entity_nonpoly.chem_comp_id in the PDBX_REFERENCE_ENTITY_NONPOLY category. |
| comp_id_2 | text | The component identifier in the second of the two entities containing the linkage. For polymer entities, this data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. For non-polymer entities, this data item is a pointer to _pdbx_reference_entity_nonpoly.chem_comp_id in the PDBX_REFERENCE_ENTITY_NONPOLY category. |
| atom_id_1 | text | The atom identifier/name in the first of the two entities containing the linkage. |
| atom_id_2 | text | The atom identifier/name in the second of the two entities containing the linkage. |
| value_order | text | The bond order target for the chemical linkage. |
| component_1 | integer | The entity component identifier for the first of two entities containing the linkage. |
| component_2 | integer | The entity component identifier for the second of two entities containing the linkage. |
| link_class | text | A code indicating the entity types involved in the linkage. |
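
Each row of pdbx_reference_entity_link joins two entities of the same reference molecule (for example, a peptide entity and a non-polymer entity). A minimal Python sketch, assuming the ref_entity_id values and link pairs for a single prd_id are already loaded, groups entities that are connected through such links with a breadth-first search; more than one group would mean some entities are not joined by any recorded linkage. linked_groups is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def linked_groups(
    entity_ids: Iterable[str],
    links: Iterable[Tuple[str, str]],
) -> List[Set[str]]:
    """Partition one PRD's entities into groups connected by linkages.

    entity_ids: ref_entity_id values (pdbx_reference_entity_list) for the PRD.
    links: (ref_entity_id_1, ref_entity_id_2) pairs (pdbx_reference_entity_link).
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in entity_ids:
        if start in seen:
            continue
        # Breadth-first search collects every entity reachable via links.
        group: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            entity = queue.popleft()
            group.add(entity)
            for other in neighbours[entity]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(group)
    return groups
```

Links are treated as undirected, so the order of ref_entity_id_1 and ref_entity_id_2 in a row does not matter.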

pdbx_reference_entity_list

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_list.ref_entity_id uniquely identifies a constituent entity within this reference molecule. |
| type | text | Defines the polymer characteristic of the entity. |
| details | text | Additional details about this entity. |
| component_id | integer | The component number of this entity within the molecule. |

pdbx_reference_entity_nonpoly

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_nonpoly.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| name | text | A name of the non-polymer entity. |
| chem_comp_id | text | For non-polymer entities, the identifier corresponding to the chemical definition for the molecule. |

pdbx_reference_entity_poly

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_poly.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| type | text | The type of the polymer. |
| db_code | text | The database code for this source information. |
| db_name | text | The database name for this source information. |

pdbx_reference_entity_poly_link

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| link_id | integer | The value of _pdbx_reference_entity_poly_link.link_id uniquely identifies a linkage within a polymer entity. |
| ref_entity_id | text | The reference entity id of the polymer entity containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly.ref_entity_id in the PDBX_REFERENCE_ENTITY_POLY category. |
| component_id | integer | The component identifier of the entity containing the linkage. |
| entity_seq_num_1 | integer | For a polymer entity, the sequence number in the first of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| entity_seq_num_2 | integer | For a polymer entity, the sequence number in the second of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_1 | text | The component identifier in the first of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_2 | text | The component identifier in the second of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| atom_id_1 | text | The atom identifier/name in the first of the two components making the linkage. |
| atom_id_2 | text | The atom identifier/name in the second of the two components making the linkage. |
| value_order | text | The bond order target for the non-standard linkage. |

pdbx_reference_entity_poly_seq

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_poly_seq.ref_entity_id is a reference to _pdbx_reference_entity_poly.ref_entity_id in PDBX_REFERENCE_ENTITY_POLY category. |
| mon_id | text | This data item is the chemical component identifier of the monomer. |
| parent_mon_id | text | This data item is the chemical component identifier for the parent component corresponding to this monomer. |
| num | integer | The value of _pdbx_reference_entity_poly_seq.num must uniquely and sequentially identify a record in the PDBX_REFERENCE_ENTITY_POLY_SEQ list. This value conforms to author numbering conventions and does not map directly to the numbering conventions used for _entity_poly_seq.num. |
| observed | text | A flag to indicate that this monomer is observed in the instance example. |
| hetero | text | A flag indicating sequence heterogeneity at this monomer position. |
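
Since num sequentially identifies positions, the monomer sequence of a polymer entity can be rebuilt by ordering its rows on num. A minimal Python sketch, assuming the (num, mon_id) pairs for a single (prd_id, ref_entity_id) are already fetched, and assuming (as in the analogous _entity_poly_seq category) that heterogeneous positions appear as several rows sharing one num; ordered_sequence is a hypothetical helper.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def ordered_sequence(rows: Iterable[Tuple[int, str]]) -> List[str]:
    """Order (num, mon_id) rows for one (prd_id, ref_entity_id) into a sequence.

    Rows sharing a num (sequence heterogeneity) are merged as 'A/B'.
    """
    ordered = sorted(rows)  # sorts by num, then mon_id
    return [
        "/".join(mon_id for _, mon_id in group)
        for _, group in groupby(ordered, key=lambda row: row[0])
    ]
```

groupby only merges adjacent items, which is why the rows are sorted on num first; in SQL the equivalent ordering is an ORDER BY num on the same rows.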

pdbx_reference_entity_sequence

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_sequence.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| type | text | The monomer type for the sequence. |
| NRP_flag | text | A flag to indicate a non-ribosomal entity. |

pdbx_reference_entity_src_nat

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_src_nat.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| ordinal | integer | The value of _pdbx_reference_entity_src_nat.ordinal distinguishes source details for this entity. |
| organism_scientific | text | The scientific name of the organism from which the entity was isolated. |
| strain | text | The strain of the organism from which the entity was isolated. |
| taxid | text | The NCBI TaxId of the organism from which the entity was isolated. |
| db_code | text | The database code for this source information. |
| db_name | text | The database name for this source information. |
| source | text | The data source for this information. |

pdbx_reference_entity_subcomponents

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| seq | text | The subcomponent sequence for the entity. |
| chem_comp_id | text | For entities represented as single molecules, the identifier corresponding to the chemical definition for the molecule. |

pdbx_reference_molecule

| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| formula_weight | double precision | Formula mass in daltons of the entity. |
| formula | text | The formula for the reference entity. Formulae are written according to the rules: 1. Only recognised element symbols may be used. 2. Each element symbol is followed by a 'count' number. A count of '1' may be omitted. 3. A space or parenthesis must separate each element symbol and its count, but in general parentheses are not used. 4. The order of elements depends on whether or not carbon is present. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetic order of their symbol. This is the 'Hill' system used by Chemical Abstracts. |
| type | text | Defines the structural classification of the entity. |
| type_evidence_code | text | Evidence for the assignment of _pdbx_reference_molecule.type. |
| class | text | Broadly defines the function of the entity. |
| class_evidence_code | text | Evidence for the assignment of _pdbx_reference_molecule.class. |
| name | text | A name of the entity. |
| represent_as | text | Defines how this entity is represented in PDB data files. |
| chem_comp_id | text | For entities represented as single molecules, the identifier corresponding to the chemical definition for the molecule. |
| compound_details | text | Special details about this molecule. |
| description | text | Description of this molecule. |
| representative_PDB_id_code | text | The PDB accession code for the entry containing a representative example of this molecule. |
| release_status | text | Defines the current PDB release status for this molecule definition. |
| replaces | text | Assigns the identifiers of the reference molecules that have been replaced by this reference molecule. Multiple molecule identifier codes should be separated by commas. |
| replaced_by | text | Assigns the identifier of the reference molecule that has replaced this molecule. |
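
The replaces and replaced_by columns record the replacement history of PRD definitions. The Python sketch below splits the comma-separated replaces value and follows replaced_by links from an obsolete PRD to the definition that currently supersedes it. It assumes a prd_id to replaced_by mapping has already been built from pdbx_reference_molecule, and it treats NULL, empty strings and the mmCIF placeholders "?" and "." as "not replaced" in case such values survive loading; parse_replaces, resolve_current and the PRD IDs in the example are hypothetical.

```python
from typing import Dict, List, Optional

# NULL, empty, and mmCIF unknown/inapplicable markers are all treated as "no value".
_EMPTY = {None, "", "?", "."}


def parse_replaces(value: Optional[str]) -> List[str]:
    """Split the comma-separated replaces column into individual PRD IDs."""
    if value in _EMPTY:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def resolve_current(prd_id: str, replaced_by: Dict[str, Optional[str]]) -> str:
    """Follow replaced_by links until reaching a PRD that has not been replaced.

    replaced_by maps prd_id -> replaced_by for rows of pdbx_reference_molecule;
    a cycle in the links raises ValueError instead of looping forever.
    """
    seen = {prd_id}
    current = prd_id
    while replaced_by.get(current) not in _EMPTY:
        current = replaced_by[current]
        if current in seen:
            raise ValueError(f"replacement cycle involving {current}")
        seen.add(current)
    return current
```

For instance, with the placeholder mapping {"PRD_A": "PRD_B", "PRD_B": "PRD_C", "PRD_C": None}, resolve_current("PRD_A", ...) returns "PRD_C", and an ID absent from the mapping is returned unchanged.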