prd Schema
- Primary Key: prd_id
- Tables: 17
brief_summary
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| docid | bigint | Serial counter (unique integer) to represent the row id. |
| name | text | |
| formula | text | |
| description | text | |
| pdbx_initial_date | date | Inclusion date to the PDB of an entry |
| pdbx_modified_date | date | Modification date of an entry |
| update_date | timestamp without time zone | Entry update date (within the RDB). |
| canonical_smiles | text | [pmb] Canonical SMILES string generated from PRDCC block via ccd2rdmol |
| chem_comp_id | text | [pmb] Chemical component ID linking to cc.brief_summary (from pdbx_reference_molecule) |
| keywords | text[] | Array of keywords. |
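
Because keywords is a text[] column, it can be filtered and aggregated with PostgreSQL array operators. A minimal sketch follows; the keyword value 'antibiotic' is a hypothetical example and may not match the casing or vocabulary actually stored.

```sql
-- Entries tagged with a given keyword (the value is illustrative only)
SELECT prd_id, name, keywords
FROM prd.brief_summary
WHERE keywords @> ARRAY['antibiotic']::text[]
ORDER BY prd_id;

-- Most frequent keywords across all entries
SELECT kw, count(*) AS n_entries
FROM prd.brief_summary, unnest(keywords) AS kw
GROUP BY kw
ORDER BY n_entries DESC
LIMIT 20;
```

Running the second query first is a quick way to see which keyword spellings exist before filtering on one.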
RDKit Integration
The prd pipeline automatically sets up the RDKit PostgreSQL cartridge for chemical structure searching, as the cc schema does. This includes:
- RDKit extension (CREATE EXTENSION IF NOT EXISTS rdkit)
- mol column on prd.brief_summary -- stores RDKit molecule objects generated from canonical SMILES
- GiST index on mol for fast substructure and similarity searches
- RDKit descriptor columns -- rdkit_mw, rdkit_logp, rdkit_tpsa, rdkit_hba, rdkit_hbd, rdkit_rotbonds, rdkit_rings, rdkit_formula
To set up RDKit on an existing database:
pixi run pmb setup-rdkit
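
For orientation, the pieces listed above correspond roughly to the SQL below. This is a sketch, not the actual pmb implementation: the index name is made up, the column type chosen for rdkit_mw is an assumption, only one descriptor column is shown, and the real command may differ in ordering and error handling. mol_from_smiles and mol_amw are functions provided by the RDKit cartridge.

```sql
CREATE EXTENSION IF NOT EXISTS rdkit;

-- Column types here are assumptions; the pipeline may define them differently
ALTER TABLE prd.brief_summary ADD COLUMN IF NOT EXISTS mol mol;
ALTER TABLE prd.brief_summary ADD COLUMN IF NOT EXISTS rdkit_mw double precision;

-- mol_from_smiles returns NULL for SMILES that RDKit cannot parse
UPDATE prd.brief_summary
SET mol = mol_from_smiles(canonical_smiles::cstring)
WHERE canonical_smiles IS NOT NULL;

-- One descriptor as an example: average molecular weight
UPDATE prd.brief_summary
SET rdkit_mw = mol_amw(mol)
WHERE mol IS NOT NULL;

-- Hypothetical index name
CREATE INDEX IF NOT EXISTS prd_brief_summary_mol_idx
    ON prd.brief_summary USING gist (mol);
```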
Example Queries
-- Find PRD entries by substructure
SELECT prd_id, name FROM prd.brief_summary WHERE mol @> 'c1ccccc1'::mol;
-- PRD entries with molecular weight > 500
SELECT prd_id, name, rdkit_mw FROM prd.brief_summary
WHERE rdkit_mw > 500 ORDER BY rdkit_mw DESC;
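
The integration notes also mention similarity searches, which the two queries above do not cover. A hedged sketch using the cartridge's Morgan fingerprints and Tanimoto similarity follows. The query SMILES (aspirin) and the radius of 2 are arbitrary choices, and because fingerprints are computed on the fly, this form scans the table rather than using the GiST index on mol.

```sql
-- Ten PRD entries most similar to a query molecule
SELECT prd_id, name,
       tanimoto_sml(
           morganbv_fp(mol, 2),
           morganbv_fp(mol_from_smiles('CC(=O)Oc1ccccc1C(=O)O'::cstring), 2)
       ) AS similarity
FROM prd.brief_summary
WHERE mol IS NOT NULL
ORDER BY similarity DESC
LIMIT 10;
```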
SMILES Coverage
SMILES are generated from PRDCC files (the _chem_comp_atom / _chem_comp_bond data). Coverage depends on how the molecule is represented in wwPDB:
| represent_as | Entries | SMILES in prd.brief_summary | Reason |
|---|---|---|---|
| polymer | ~649 | Yes | PRDCC file exists with full atom/bond data |
| branched | ~153 | Yes | PRDCC file exists with full atom/bond data |
| single molecule | ~373 | No | No PRDCC file -- structure is defined by a single CCD entry |
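
The counts in this table are approximate and will drift as wwPDB releases new entries. A sketch for recomputing them on a loaded database, assuming every PRD has one row in both brief_summary and pdbx_reference_molecule:

```sql
-- count(column) ignores NULLs, so with_smiles counts only populated SMILES
SELECT pm.represent_as,
       count(*) AS entries,
       count(bs.canonical_smiles) AS with_smiles
FROM prd.pdbx_reference_molecule pm
JOIN prd.brief_summary bs USING (prd_id)
GROUP BY pm.represent_as
ORDER BY entries DESC;
```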
"single molecule" entries are PRDs whose structure is fully described by one CCD component (chem_comp_id in pdbx_reference_molecule). wwPDB does not generate separate PRDCC files for these because the structure already exists in CCD. Their SMILES can be found via the linked CCD entry:
-- Get SMILES for single-molecule PRDs via CCD
SELECT bs.prd_id, pm.chem_comp_id, cc.canonical_smiles
FROM prd.brief_summary bs
JOIN prd.pdbx_reference_molecule pm USING (prd_id)
JOIN cc.brief_summary cc ON pm.chem_comp_id = cc.comp_id
WHERE bs.canonical_smiles IS NULL;
The chem.compounds table combines both sources, so you can search all compounds (cc + prd) with SMILES in one place without worrying about this distinction.
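
The columns of chem.compounds are not documented in this section, so the sketch below applies the same fallback directly to the two brief_summary tables. It assumes that cc.brief_summary exposes comp_id and canonical_smiles (as in the query above) and that prd.brief_summary.chem_comp_id is populated for single-molecule entries.

```sql
-- One SMILES per PRD: prefer the PRDCC-derived value, else the linked CCD entry
SELECT bs.prd_id,
       bs.name,
       COALESCE(bs.canonical_smiles, cc.canonical_smiles) AS smiles,
       CASE WHEN bs.canonical_smiles IS NOT NULL THEN 'PRDCC'
            WHEN cc.canonical_smiles IS NOT NULL THEN 'CCD'
       END AS smiles_source
FROM prd.brief_summary bs
LEFT JOIN cc.brief_summary cc ON bs.chem_comp_id = cc.comp_id;
```

Rows where smiles_source is NULL have no SMILES from either source.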
chem_comp
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| formula | text | The formula for the chemical component. Formulae are written according to the following rules: (1) Only recognized element symbols may be used. (2) Each element symbol is followed by a 'count' number. A count of '1' may be omitted. (3) A space or parenthesis must separate each cluster of (element symbol + count), but in general parentheses are not used. (4) The order of elements depends on whether carbon is present or not. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetic order of their symbol. This is the 'Hill' system used by Chemical Abstracts. |
| formula_weight | double precision | Formula mass in daltons of the chemical component. |
| id | text | The value of _chem_comp.id must uniquely identify each item in the CHEM_COMP list. For protein polymer entities, this is the three-letter code for the amino acid. For nucleic acid polymer entities, this is the one-letter code for the base. |
| name | text | The full name of the component. |
| type | text | For standard polymer components, the type of the monomer. Note that monomers that will form polymers are of three types: linking monomers, monomers with some type of N-terminal (or 5') cap and monomers with some type of C-terminal (or 3') cap. |
| pdbx_release_status | text | This data item holds the current release status for the component. |
chem_comp_atom
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| alt_atom_id | text | An alternative identifier for the atom. This data item would be used in cases where alternative nomenclatures exist for labelling atoms in a group. |
| atom_id | text | The value of _chem_comp_atom.atom_id must uniquely identify each atom in each monomer in the CHEM_COMP_ATOM list. The atom identifiers need not be unique over all atoms in the data block; they need only be unique for each atom in a component. Note that this item need not be a number; it can be any unique identifier. |
| charge | integer | The net integer charge assigned to this atom. This is the formal charge assignment normally found in chemical diagrams. |
| model_Cartn_x | double precision | The x component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| model_Cartn_y | double precision | The y component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| model_Cartn_z | double precision | The z component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| type_symbol | text | The code used to identify the atom species representing this atom type. Normally this code is the element symbol. |
| pdbx_align | integer | Atom name alignment offset in PDB atom field. |
| pdbx_ordinal | integer | Ordinal index for the component atom list. |
| pdbx_component_atom_id | text | The atom identifier in the subcomponent where a larger component has been divided into subcomponents. |
| pdbx_component_comp_id | text | The component identifier for the subcomponent where a larger component has been divided into subcomponents. |
| pdbx_model_Cartn_x_ideal | double precision | An alternative x component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_model_Cartn_y_ideal | double precision | An alternative y component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_model_Cartn_z_ideal | double precision | An alternative z component of the coordinates for this atom in this component specified as orthogonal angstroms. |
| pdbx_stereo_config | text | The chiral configuration of the atom that is a chiral center. |
| pdbx_aromatic_flag | text | A flag indicating an aromatic atom. |
| pdbx_leaving_atom_flag | text | A flag indicating a leaving atom. |
| pdbx_residue_numbering | integer | Preferred residue numbering in the BIRD definition. |
| pdbx_polymer_type | text | Is the atom in a polymer or non-polymer subcomponent in the BIRD definition. |
| pdbx_ref_id | text | A reference to _pdbx_reference_entity_list.ref_entity_id |
| pdbx_component_id | integer | A reference to _pdbx_reference_entity_list.component_id |
| pdbx_backbone_atom_flag | text | A flag indicating the backbone atoms in polypeptide units. |
| pdbx_n_terminal_atom_flag | text | A flag indicating the N-terminal group atoms in polypeptide units. |
| pdbx_c_terminal_atom_flag | text | A flag indicating the C-terminal group atoms in polypeptide units. |
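
As an example of working with the per-atom rows, the sketch below summarizes atom counts and net formal charge per entry. Treating every atom whose type_symbol is not 'H' as a heavy atom is a simplification (deuterium, if present as 'D', would be counted as heavy).

```sql
SELECT prd_id,
       count(*) AS n_atoms,
       count(*) FILTER (WHERE type_symbol <> 'H') AS n_heavy_atoms,
       sum(charge) AS net_formal_charge
FROM prd.chem_comp_atom
GROUP BY prd_id
ORDER BY n_heavy_atoms DESC
LIMIT 10;
```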
chem_comp_bond
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| atom_id_1 | text | The ID of the first of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category. |
| atom_id_2 | text | The ID of the second of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| value_order | text | The value that should be taken as the target for the chemical bond associated with the specified atoms, expressed as a bond order. |
| pdbx_ordinal | integer | Ordinal index for the component bond list. |
| pdbx_stereo_config | text | Stereochemical configuration across a double bond. |
| pdbx_aromatic_flag | text | A flag indicating an aromatic bond. |
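
Since atom_id_1 and atom_id_2 point into chem_comp_atom, a bond list with element symbols can be built by joining the atom table twice. A sketch; the identifier 'PRD_000001' is used only as a placeholder and should be replaced with an entry that exists in your database.

```sql
SELECT b.pdbx_ordinal,
       b.atom_id_1, a1.type_symbol AS element_1,
       b.atom_id_2, a2.type_symbol AS element_2,
       b.value_order
FROM prd.chem_comp_bond b
JOIN prd.chem_comp_atom a1
  ON a1.prd_id = b.prd_id AND a1.comp_id = b.comp_id AND a1.atom_id = b.atom_id_1
JOIN prd.chem_comp_atom a2
  ON a2.prd_id = b.prd_id AND a2.comp_id = b.comp_id AND a2.atom_id = b.atom_id_2
WHERE b.prd_id = 'PRD_000001'  -- placeholder identifier
ORDER BY b.pdbx_ordinal;
```

Atom identifiers are only unique within a component, which is why the join includes prd_id and comp_id as well as the atom name.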
pdbx_chem_comp_descriptor
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| descriptor | text | This data item contains the descriptor value for this component. |
| type | text | This data item contains the descriptor type. |
| program | text | This data item contains the name of the program or library used to compute the descriptor. |
| program_version | text | This data item contains the version of the program or library used to compute the descriptor. |
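
The values stored in type and program are not listed here, so a reasonable first step is to inspect what is present before filtering. A sketch; the 'InChIKey' value in the second query is an assumption based on the descriptor types commonly used by wwPDB and should be checked against the output of the first.

```sql
-- Which descriptor types and programs are present?
SELECT d.type, d.program, count(*) AS n
FROM prd.pdbx_chem_comp_descriptor d
GROUP BY d.type, d.program
ORDER BY n DESC;

-- Descriptors of one type (the type string is an assumption)
SELECT d.prd_id, d.comp_id, d.descriptor
FROM prd.pdbx_chem_comp_descriptor d
WHERE d.type = 'InChIKey';
```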
pdbx_chem_comp_identifier
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| comp_id | text | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. |
| identifier | text | This data item contains the identifier value for this component. |
| type | text | This data item contains the identifier type. |
| program | text | This data item contains the name of the program or library used to compute the identifier. |
| program_version | text | This data item contains the version of the program or library used to compute the identifier. |
pdbx_prd_audit
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| date | date | The date associated with this audit record. |
| processing_site | text | An identifier for the wwPDB site creating or modifying the molecule. |
| action_type | text | The action associated with this audit record. |
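
An entry can have several audit records. A sketch that keeps only the most recent one per entry, using PostgreSQL's DISTINCT ON:

```sql
-- DISTINCT ON keeps the first row per prd_id in the ORDER BY order,
-- i.e. the record with the latest date
SELECT DISTINCT ON (a.prd_id)
       a.prd_id, a.date, a.action_type, a.processing_site
FROM prd.pdbx_prd_audit a
ORDER BY a.prd_id, a.date DESC;
```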
pdbx_reference_entity_link
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| link_id | integer | The value of _pdbx_reference_entity_link.link_id uniquely identifies linkages between entities within a molecule. |
| ref_entity_id_1 | text | The reference entity id of the first of the two entities joined by the linkage. This data item is a pointer to _pdbx_reference_entity_list.ref_entity_id in the PDBX_REFERENCE_ENTITY_LIST category. |
| ref_entity_id_2 | text | The reference entity id of the second of the two entities joined by the linkage. This data item is a pointer to _pdbx_reference_entity_list.ref_entity_id in the PDBX_REFERENCE_ENTITY_LIST category. |
| entity_seq_num_1 | integer | For a polymer entity, the sequence number in the first of the two entities containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| entity_seq_num_2 | integer | For a polymer entity, the sequence number in the second of the two entities containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_1 | text | The component identifier in the first of the two entities containing the linkage. For polymer entities, this data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. For non-polymer entities, this data item is a pointer to _pdbx_reference_entity_nonpoly.chem_comp_id in the PDBX_REFERENCE_ENTITY_NONPOLY category. |
| comp_id_2 | text | The component identifier in the second of the two entities containing the linkage. For polymer entities, this data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. For non-polymer entities, this data item is a pointer to _pdbx_reference_entity_nonpoly.chem_comp_id in the PDBX_REFERENCE_ENTITY_NONPOLY category. |
| atom_id_1 | text | The atom identifier/name in the first of the two entities containing the linkage. |
| atom_id_2 | text | The atom identifier/name in the second of the two entities containing the linkage. |
| value_order | text | The bond order target for the chemical linkage. |
| component_1 | integer | The entity component identifier for the first of two entities containing the linkage. |
| component_2 | integer | The entity component identifier for the second of two entities containing the linkage. |
| link_class | text | A code indicating the entity types involved in the linkage. |
pdbx_reference_entity_list
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_list.ref_entity_id is a unique identifier for a constituent entity within this reference molecule. |
| type | text | Defines the polymer characteristic of the entity. |
| details | text | Additional details about this entity. |
| component_id | integer | The component number of this entity within the molecule. |
pdbx_reference_entity_nonpoly
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_nonpoly.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| name | text | A name of the non-polymer entity. |
| chem_comp_id | text | For non-polymer entities, the identifier corresponding to the chemical definition for the molecule. |
pdbx_reference_entity_poly
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_poly.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| type | text | The type of the polymer. |
| db_code | text | The database code for this source information |
| db_name | text | The database name for this source information |
pdbx_reference_entity_poly_link
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| link_id | integer | The value of _pdbx_reference_entity_poly_link.link_id uniquely identifies a linkage within a polymer entity. |
| ref_entity_id | text | The reference entity id of the polymer entity containing the linkage. This data item is a pointer to _pdbx_reference_entity_poly.ref_entity_id in the PDBX_REFERENCE_ENTITY_POLY category. |
| component_id | integer | The entity component identifier of the entity containing the linkage. |
| entity_seq_num_1 | integer | For a polymer entity, the sequence number in the first of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| entity_seq_num_2 | integer | For a polymer entity, the sequence number in the second of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.num in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_1 | text | The component identifier in the first of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| comp_id_2 | text | The component identifier in the second of the two components making the linkage. This data item is a pointer to _pdbx_reference_entity_poly_seq.mon_id in the PDBX_REFERENCE_ENTITY_POLY_SEQ category. |
| atom_id_1 | text | The atom identifier/name in the first of the two components making the linkage. |
| atom_id_2 | text | The atom identifier/name in the second of the two components making the linkage. |
| value_order | text | The bond order target for the non-standard linkage. |
pdbx_reference_entity_poly_seq
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_poly_seq.ref_entity_id is a reference to _pdbx_reference_entity_poly.ref_entity_id in PDBX_REFERENCE_ENTITY_POLY category. |
| mon_id | text | This data item is the chemical component identifier of monomer. |
| parent_mon_id | text | This data item is the chemical component identifier for the parent component corresponding to this monomer. |
| num | integer | The value of _pdbx_reference_entity_poly_seq.num must uniquely and sequentially identify a record in the PDBX_REFERENCE_ENTITY_POLY_SEQ list. This value conforms to author numbering conventions and does not map directly to the numbering conventions used for _entity_poly_seq.num. |
| observed | text | A flag to indicate that this monomer is observed in the instance example. |
| hetero | text | A flag to indicate sequence heterogeneity at this monomer position. |
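
Because each row holds one monomer and num gives its position, a readable sequence per polymer entity can be assembled with an ordered aggregate. A sketch; positions flagged as heterogeneous may contribute more than one monomer, which this query does not handle specially.

```sql
SELECT s.prd_id,
       s.ref_entity_id,
       string_agg(s.mon_id, '-' ORDER BY s.num) AS monomer_sequence,
       count(*) AS n_monomers
FROM prd.pdbx_reference_entity_poly_seq s
GROUP BY s.prd_id, s.ref_entity_id
ORDER BY s.prd_id, s.ref_entity_id;
```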
pdbx_reference_entity_sequence
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_sequence.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| type | text | The monomer type for the sequence. |
| NRP_flag | text | A flag to indicate a non-ribosomal entity. |
pdbx_reference_entity_src_nat
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| ref_entity_id | text | The value of _pdbx_reference_entity_src_nat.ref_entity_id is a reference to _pdbx_reference_entity_list.ref_entity_id in PDBX_REFERENCE_ENTITY_LIST category. |
| ordinal | integer | The value of _pdbx_reference_entity_src_nat.ordinal distinguishes source details for this entity. |
| organism_scientific | text | The scientific name of the organism from which the entity was isolated. |
| strain | text | The strain of the organism from which the entity was isolated. |
| taxid | text | The NCBI TaxId of the organism from which the entity was isolated. |
| db_code | text | The database code for this source information |
| db_name | text | The database name for this source information |
| source | text | The data source for this information. |
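
A sketch of how the source-organism data might be summarized, counting distinct PRD entries per organism:

```sql
SELECT n.organism_scientific,
       n.taxid,
       count(DISTINCT n.prd_id) AS n_prd
FROM prd.pdbx_reference_entity_src_nat n
WHERE n.organism_scientific IS NOT NULL
GROUP BY n.organism_scientific, n.taxid
ORDER BY n_prd DESC
LIMIT 20;
```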
pdbx_reference_entity_subcomponents
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| seq | text | The subcomponent sequence for the entity. |
| chem_comp_id | text | For entities represented as single molecules, the identifier corresponding to the chemical definition for the molecule. |
pdbx_reference_molecule
| Column | Type | Description |
|---|---|---|
| prd_id | text | PRDID of an entry. All tables/categories refer back to the PRDID in the brief_summary table. |
| formula_weight | double precision | Formula mass in daltons of the entity. |
| formula | text | The formula for the reference entity. Formulae are written according to the rules: 1. Only recognised element symbols may be used. 2. Each element symbol is followed by a 'count' number. A count of '1' may be omitted. 3. A space or parenthesis must separate each element symbol and its count, but in general parentheses are not used. 4. The order of elements depends on whether or not carbon is present. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetic order of their symbol. This is the 'Hill' system used by Chemical Abstracts. |
| type | text | Defines the structural classification of the entity. |
| type_evidence_code | text | Evidence for the assignment of _pdbx_reference_molecule.type |
| class | text | Broadly defines the function of the entity. |
| class_evidence_code | text | Evidence for the assignment of _pdbx_reference_molecule.class |
| name | text | A name of the entity. |
| represent_as | text | Defines how this entity is represented in PDB data files. |
| chem_comp_id | text | For entities represented as single molecules, the identifier corresponding to the chemical definition for the molecule. |
| compound_details | text | Special details about this molecule. |
| description | text | Description of this molecule. |
| representative_PDB_id_code | text | The PDB accession code for the entry containing a representative example of this molecule. |
| release_status | text | Defines the current PDB release status for this molecule definition. |
| replaces | text | Assigns the identifiers of the reference molecules that have been replaced by this reference molecule. Multiple molecule identifier codes should be separated by commas. |
| replaced_by | text | Assigns the identifier of the reference molecule that has replaced this molecule. |
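
Two sketches of typical lookups against this table: a breakdown of entries by functional class and structural type, and a list of superseded entries with their replacements. No specific class or type values are assumed.

```sql
-- Distribution of entries by class and type
SELECT pm.class, pm.type, count(*) AS n
FROM prd.pdbx_reference_molecule pm
GROUP BY pm.class, pm.type
ORDER BY n DESC;

-- Superseded entries and the entries that replaced them
SELECT pm.prd_id, pm.name, pm.replaced_by
FROM prd.pdbx_reference_molecule pm
WHERE pm.replaced_by IS NOT NULL
ORDER BY pm.prd_id;
```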