cc Schema

  • Primary Key: comp_id
  • Tables: 12

RDKit Integration

The cc pipeline automatically sets up the RDKit PostgreSQL cartridge for chemical structure searching. This includes:

  1. RDKit extension (CREATE EXTENSION IF NOT EXISTS rdkit)
  2. mol column on cc.brief_summary -- stores RDKit molecule objects generated from canonical SMILES
  3. Chemical search SQL functions:
    • similar_compounds(smiles, threshold) -- Tanimoto similarity search
    • substructure_search(smarts) -- substructure matching
    • Additional helper functions for chemical queries

To set up RDKit on an existing database without re-running the full cc pipeline:

pixi run pmb setup-rdkit

Note: RDKit requires the postgresql-rdkit extension to be installed on the PostgreSQL server. The Docker-based setup includes this by default.
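
To confirm that the cartridge is usable after setup, a quick check along the following lines may help. This is a minimal sketch rather than part of the pipeline: pg_extension is a standard PostgreSQL catalog and rdkit_version() is provided by the RDKit cartridge, while the last query assumes the mol column on cc.brief_summary has already been created and populated.

```sql
-- Is the extension registered in the current database?
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'rdkit';

-- Which RDKit build is the cartridge linked against?
SELECT rdkit_version();

-- How many rows have a parsed molecule?
-- (assumes cc.brief_summary.mol exists; count(mol) skips NULLs)
SELECT count(*) AS total_rows,
       count(mol) AS rows_with_mol
FROM cc.brief_summary;
```

If the first query returns no rows, the extension has not been created in this database.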

Example Queries

-- Substructure search: find compounds containing a benzene ring
SELECT comp_id, name FROM cc.brief_summary WHERE mol @> 'c1ccccc1'::mol;

-- Similarity search: find compounds similar to aspirin (Tanimoto > 0.5)
SELECT * FROM similar_compounds('CC(=O)Oc1ccccc1C(O)=O', 0.5);
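
The helpers listed above can also be combined with plain cartridge operators. The sketch below rests on a few assumptions: the result columns of substructure_search are not documented on this page, so SELECT * is used; the Morgan fingerprint in the last query is one possible choice and may differ from whatever similar_compounds uses internally; and the SMARTS pattern is only an illustration.

```sql
-- SMARTS query through the helper: carboxylic acid group
SELECT * FROM substructure_search('[CX3](=O)[OX2H1]');

-- The same pattern with the cartridge operator; ::qmol parses it as SMARTS
SELECT comp_id, name
FROM cc.brief_summary
WHERE mol @> '[CX3](=O)[OX2H1]'::qmol
LIMIT 20;

-- Ranked similarity to aspirin using Morgan bit-vector fingerprints
-- (fingerprint choice is an assumption, not necessarily the pipeline's)
SELECT comp_id,
       name,
       tanimoto_sml(
           morganbv_fp(mol),
           morganbv_fp(mol_from_smiles('CC(=O)Oc1ccccc1C(O)=O'::cstring))
       ) AS similarity
FROM cc.brief_summary
WHERE mol IS NOT NULL
ORDER BY similarity DESC
LIMIT 10;
```

Computing fingerprints on the fly scans every row, so this form suits ad hoc exploration more than repeated use.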

See the SQL Examples page for more chemical search examples.

brief_summary

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
docid | bigint | Serial counter (unique integer) to represent the row id.
pdbx_initial_date | date | Date that the entry was defined in the dictionary.
release_date | date | Date at which the entry was officially released.
pdbx_modified_date | date | Modification date of an entry.
update_date | timestamp without time zone | Entry update date (within the RDB).
name | text | Name of an entry.
formula | text | Chemical formula of an entry.
pdbx_synonyms | text[] | Known synonyms of an entry.
identifier | text | Identifier of an entry.
smiles | text[] | SMILES representations of an entry.
inchi | text[] | InChI representations of an entry.
canonical_smiles | text |
keywords | text[] | Array of keywords.
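
Several brief_summary columns are PostgreSQL arrays, which changes how they are filtered. The following is a small sketch of array matching and date filtering; the literal values ('ASPIRIN', the cutoff date) are placeholders chosen for illustration, and the exact casing of stored synonyms is an assumption.

```sql
-- Exact match against any element of the synonyms array
SELECT comp_id, name, formula
FROM cc.brief_summary
WHERE 'ASPIRIN' = ANY (pdbx_synonyms);

-- Components released on or after a given date, newest first
SELECT comp_id, name, release_date
FROM cc.brief_summary
WHERE release_date >= DATE '2024-01-01'
ORDER BY release_date DESC
LIMIT 20;
```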

chem_comp

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
formula | text | The formula for the chemical component. Formulae are written according to the following rules: (1) Only recognized element symbols may be used. (2) Each element symbol is followed by a 'count' number. A count of '1' may be omitted. (3) A space or parenthesis must separate each cluster of (element symbol + count), but in general parentheses are not used. (4) The order of elements depends on whether carbon is present or not. If carbon is present, the order should be: C, then H, then the other elements in alphabetical order of their symbol. If carbon is not present, the elements are listed purely in alphabetical order of their symbol. This is the 'Hill' system used by Chemical Abstracts.
formula_weight | double precision | Formula mass in daltons of the chemical component.
id | text | The value of _chem_comp.id must uniquely identify each item in the CHEM_COMP list. For protein polymer entities, this is the three-letter code for the amino acid. For nucleic acid polymer entities, this is the one-letter code for the base.
mon_nstd_parent_comp_id | text | The identifier for the parent component of the nonstandard component. May be a comma-separated list if this component is derived from multiple components. Items in this list indirectly point to _chem_comp.id in the CHEM_COMP category.
name | text | The full name of the component.
one_letter_code | text | For standard polymer components, the one-letter code for the component. For non-standard polymer components, the one-letter code for the parent component if this exists; otherwise, the one-letter code should be given as 'X'. Components derived from multiple parent components are described by a sequence of one-letter codes.
three_letter_code | text | For standard polymer components, the common three-letter code for the component. Non-standard polymer components and non-polymer components are also assigned three-letter codes. For ambiguous polymer components, the three-letter code should be given as 'UNK'. Ambiguous ions are assigned the code 'UNX'. Ambiguous non-polymer components are assigned the code 'UNL'.
type | text | For standard polymer components, the type of the monomer. Note that monomers that will form polymers are of three types: linking monomers, monomers with some type of N-terminal (or 5') cap and monomers with some type of C-terminal (or 3') cap.
pdbx_synonyms | text | Synonym list for the component.
pdbx_type | text | A preliminary classification used by PDB.
pdbx_ambiguous_flag | text | A preliminary classification used by PDB to indicate that the chemistry of this component, while described as clearly as possible, is still ambiguous. Software tools may not be able to process this component definition.
pdbx_replaced_by | text | Identifies the _chem_comp.id of the component that has replaced this component.
pdbx_replaces | text | Identifies the _chem_comp.id values of the components which have been replaced by this component. Multiple id codes should be separated by commas.
pdbx_formal_charge | integer | The net integer charge assigned to this component. This is the formal charge assignment normally found in chemical diagrams.
pdbx_subcomponent_list | text | The list of subcomponents contained in this component.
pdbx_model_coordinates_details | text | This data item provides additional details about the model coordinates in the component definition.
pdbx_model_coordinates_db_code | text | This data item identifies the PDB database code from which the heavy atom model coordinates were obtained.
pdbx_ideal_coordinates_details | text | This data item identifies the source of the ideal coordinates in the component definition.
pdbx_ideal_coordinates_missing_flag | text | This data item identifies if ideal coordinates are missing in this definition.
pdbx_model_coordinates_missing_flag | text | This data item identifies if model coordinates are missing in this definition.
pdbx_initial_date | date | Date component was added to database.
pdbx_modified_date | date | Date component was last modified.
pdbx_release_status | text | This data item holds the current release status for the component.
pdbx_processing_site | text | This data item identifies the deposition site that processed this chemical component definition.
pdbx_pcm | text | A flag to indicate if the CCD can be used to represent a protein modification.
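
As an illustration of how these columns might be queried, the sketch below lists superseded components and summarizes components by type. It assumes the table lives in the cc schema alongside brief_summary (only cc.brief_summary is shown explicitly on this page).

```sql
-- Components that have been superseded, with their replacement ID
SELECT comp_id, name, pdbx_replaced_by
FROM cc.chem_comp
WHERE pdbx_replaced_by IS NOT NULL
ORDER BY comp_id;

-- Number of components and mean formula weight per component type
-- (round() with a digit count needs numeric, hence the cast)
SELECT type,
       count(*) AS n_components,
       round(avg(formula_weight)::numeric, 1) AS mean_formula_weight
FROM cc.chem_comp
GROUP BY type
ORDER BY n_components DESC;
```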

chem_comp_atom

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
alt_atom_id | text | An alternative identifier for the atom. This data item would be used in cases where alternative nomenclatures exist for labelling atoms in a group.
atom_id | text | The value of _chem_comp_atom.atom_id must uniquely identify each atom in each monomer in the CHEM_COMP_ATOM list. The atom identifiers need not be unique over all atoms in the data block; they need only be unique for each atom in a component. Note that this item need not be a number; it can be any unique identifier.
charge | integer | The net integer charge assigned to this atom. This is the formal charge assignment normally found in chemical diagrams.
model_Cartn_x | double precision | The x component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list.
model_Cartn_y | double precision | The y component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list.
model_Cartn_z | double precision | The z component of the coordinates for this atom in this component specified as orthogonal angstroms. The choice of reference axis frame for the coordinates is arbitrary. The set of coordinates input for the entity here is intended to correspond to the atomic model used to generate restraints for structure refinement, not to atom sites in the ATOM_SITE list.
type_symbol | text | The code used to identify the atom species representing this atom type. Normally this code is the element symbol.
pdbx_align | integer | Atom name alignment offset in PDB atom field.
pdbx_ordinal | integer | Ordinal index for the component atom list.
pdbx_component_atom_id | text | The atom identifier in the subcomponent where a larger component has been divided into subcomponents.
pdbx_component_comp_id | text | The component identifier for the subcomponent where a larger component has been divided into subcomponents.
pdbx_model_Cartn_x_ideal | double precision | An alternative x component of the coordinates for this atom in this component specified as orthogonal angstroms.
pdbx_model_Cartn_y_ideal | double precision | An alternative y component of the coordinates for this atom in this component specified as orthogonal angstroms.
pdbx_model_Cartn_z_ideal | double precision | An alternative z component of the coordinates for this atom in this component specified as orthogonal angstroms.
pdbx_stereo_config | text | The chiral configuration of the atom that is a chiral center.
pdbx_aromatic_flag | text | A flag indicating an aromatic atom.
pdbx_leaving_atom_flag | text | A flag indicating a leaving atom.
pdbx_residue_numbering | integer | Preferred residue numbering in the BIRD definition.
pdbx_polymer_type | text | Indicates whether the atom is in a polymer or non-polymer subcomponent in the BIRD definition.
pdbx_component_id | integer | A reference to _pdbx_reference_entity_list.component_id.
pdbx_backbone_atom_flag | text | A flag indicating the backbone atoms in polypeptide units.
pdbx_n_terminal_atom_flag | text | A flag indicating the N-terminal group atoms in polypeptide units.
pdbx_c_terminal_atom_flag | text | A flag indicating the C-terminal group atoms in polypeptide units.
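
Per-atom rows lend themselves to simple aggregation. The sketch below summarizes the element composition of one component and lists its leaving atoms. The cc schema qualifier, the component IDs ('ATP', 'ALA'), and the flag value 'Y' are assumptions made for illustration.

```sql
-- Element composition and summed formal charge for one component
SELECT type_symbol,
       count(*)    AS n_atoms,
       sum(charge) AS summed_charge
FROM cc.chem_comp_atom
WHERE comp_id = 'ATP'
GROUP BY type_symbol
ORDER BY type_symbol;

-- Atoms flagged as leaving atoms, in dictionary order
-- (assumes the flag is stored as 'Y'/'N')
SELECT atom_id, type_symbol
FROM cc.chem_comp_atom
WHERE comp_id = 'ALA'
  AND pdbx_leaving_atom_flag = 'Y'
ORDER BY pdbx_ordinal;
```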

chem_comp_bond

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
atom_id_1 | text | The ID of the first of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category.
atom_id_2 | text | The ID of the second of the two atoms that define the bond. This data item is a pointer to _chem_comp_atom.atom_id in the CHEM_COMP_ATOM category.
value_order | text | The value that should be taken as the target for the chemical bond associated with the specified atoms, expressed as a bond order.
pdbx_ordinal | integer | Ordinal index for the component bond list.
pdbx_stereo_config | text | Stereochemical configuration across a double bond.
pdbx_aromatic_flag | text | A flag indicating an aromatic bond.
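
Because atom_id_1 and atom_id_2 point at chem_comp_atom.atom_id, bonds can be joined to atoms twice to derive per-bond geometry. The following is a sketch under stated assumptions: both tables are in the cc schema, 'ATP' is just an example ID, and the mixed-case coordinate columns were created with quoted names (drop the double quotes if your database folded them to lowercase).

```sql
-- Bond lengths (in angstroms) computed from the ideal coordinates
SELECT b.atom_id_1,
       b.atom_id_2,
       b.value_order,
       sqrt(
           power(a1."pdbx_model_Cartn_x_ideal" - a2."pdbx_model_Cartn_x_ideal", 2)
         + power(a1."pdbx_model_Cartn_y_ideal" - a2."pdbx_model_Cartn_y_ideal", 2)
         + power(a1."pdbx_model_Cartn_z_ideal" - a2."pdbx_model_Cartn_z_ideal", 2)
       ) AS ideal_length
FROM cc.chem_comp_bond AS b
JOIN cc.chem_comp_atom AS a1
  ON a1.comp_id = b.comp_id AND a1.atom_id = b.atom_id_1
JOIN cc.chem_comp_atom AS a2
  ON a2.comp_id = b.comp_id AND a2.atom_id = b.atom_id_2
WHERE b.comp_id = 'ATP'
ORDER BY b.pdbx_ordinal;
```

Rows with missing ideal coordinates yield NULL for ideal_length, since arithmetic on NULL propagates.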

pdbx_chem_comp_atom_related

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
related_comp_id | text | The related chemical component on which this chemical component is based.
ordinal | integer | An ordinal index for this category.
atom_id | text | The atom identifier/name for the atom mapping.
related_atom_id | text | The atom identifier/name for the atom mapping in the related chemical component.
related_type | text | Describes the type of relationship.

pdbx_chem_comp_audit

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
date | date | The date associated with this audit record.
processing_site | text | An identifier for the wwPDB site creating or modifying the component.
action_type | text | The action associated with this audit record.
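
An audit trail like this is often reduced to the latest record per component. One way to sketch that in PostgreSQL is DISTINCT ON; the cc schema qualifier is an assumption, and the date column is double-quoted purely as a precaution because it shares its name with a type.

```sql
-- Most recent audit record for each component
SELECT DISTINCT ON (comp_id)
       comp_id,
       "date" AS last_action_date,
       action_type,
       processing_site
FROM cc.pdbx_chem_comp_audit
ORDER BY comp_id, "date" DESC;
```

DISTINCT ON keeps the first row of each comp_id group in the ORDER BY sequence, so sorting by date descending selects the newest record.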

pdbx_chem_comp_descriptor

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
descriptor | text | This data item contains the descriptor value for this component.
type | text | This data item contains the descriptor type.
program | text | This data item contains the name of the program or library used to compute the descriptor.
program_version | text | This data item contains the version of the program or library used to compute the descriptor.
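
Each component can carry several descriptors, one per combination of type and generating program. The sketch below lists them for a single component and then counts rows per type; the cc schema qualifier and the example ID 'ATP' are assumptions, and the set of type values present depends on the loaded data.

```sql
-- All descriptors recorded for one component
SELECT type, program, program_version, descriptor
FROM cc.pdbx_chem_comp_descriptor
WHERE comp_id = 'ATP'
ORDER BY type, program;

-- Which descriptor types exist, and how many rows each has
SELECT type, count(*) AS n_rows
FROM cc.pdbx_chem_comp_descriptor
GROUP BY type
ORDER BY n_rows DESC;
```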

pdbx_chem_comp_feature

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
type | text | The component feature type.
value | text | The component feature value.
source | text | The information source for the component feature.

pdbx_chem_comp_identifier

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
identifier | text | This data item contains the identifier value for this component.
type | text | This data item contains the identifier type.
program | text | This data item contains the name of the program or library used to compute the identifier.
program_version | text | This data item contains the version of the program or library used to compute the identifier.

pdbx_chem_comp_pcm

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
pcm_id | integer | An ordinal index for this category.
modified_residue_id | text | Chemical component identifier for the amino acid residue that is being modified.
type | text | The type of protein modification.
category | text | The category of protein modification.
position | text | The position of the modification on the amino acid.
polypeptide_position | text | The position of the modification on the polypeptide.
comp_id_linking_atom | text | The atom on the modification group that covalently links the modification to the residue that is being modified. This is only added when the protein modification is linked and so the amino acid group and the modification group are described by separate CCDs.
modified_residue_id_linking_atom | text | The atom on the polypeptide residue group that covalently links the modification to the residue that is being modified. This is only added when the protein modification is linked and so the amino acid group and the modification group are described by separate CCDs.
uniprot_specific_ptm_accession | text | The UniProt PTM accession code that is an exact match for the protein modification.
uniprot_generic_ptm_accession | text | The UniProt PTM accession code that describes the group of PTMs of which this protein modification is a member.
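
To see which components describe modifications of a particular residue, the table can be filtered on modified_residue_id and joined to chem_comp for names. This is a sketch only: the cc schema qualifier and the example residue 'LYS' are assumptions, and position is double-quoted as a precaution because it is also a SQL keyword.

```sql
-- Protein modifications recorded for lysine, with component names
SELECT p.comp_id,
       c.name,
       p.type,
       p.category,
       p."position",
       p.uniprot_specific_ptm_accession
FROM cc.pdbx_chem_comp_pcm AS p
JOIN cc.chem_comp AS c
  ON c.comp_id = p.comp_id
WHERE p.modified_residue_id = 'LYS'
ORDER BY p.comp_id, p.pcm_id;
```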

pdbx_chem_comp_related

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
related_comp_id | text | The related chemical component on which this chemical component is based.
relationship_type | text | Describes the type of relationship.

pdbx_chem_comp_synonyms

Column | Type | Description
comp_id | text | Chemical Component ID of an entry. All tables/categories refer back to this ID in the brief_summary table.
ordinal | integer | An ordinal index for this category.
name | text | The synonym of this particular chemical component.
provenance | text | The provenance of this synonym.
type | text | The type of this synonym.
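
Unlike the pdbx_synonyms array on brief_summary, this table stores one synonym per row, which makes pattern matching straightforward. A minimal sketch, assuming the table is in the cc schema and using 'adenosine' as an arbitrary search term:

```sql
-- Case-insensitive synonym search, one row per matching synonym
SELECT s.comp_id,
       s.name AS synonym,
       s.provenance,
       s.type
FROM cc.pdbx_chem_comp_synonyms AS s
WHERE s.name ILIKE '%adenosine%'
ORDER BY s.comp_id, s.ordinal;
```

A leading wildcard prevents a plain B-tree index from being used, so expect a sequential scan unless a trigram index is available.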