chem Schema
- Primary Key: (source, id)
- Tables: 1
The chem schema provides a unified view of chemical compounds from both the cc (Chemical Component Dictionary) and prd (BIRD Reference Dictionary) schemas. This enables cross-schema chemical searches without needing to query each source separately.
Why This Table Exists
The prd.brief_summary table only has canonical_smiles for PRD entries that have their own PRDCC file (~802 of ~1175 entries). The remaining ~373 entries are "single molecule" PRDs whose structure is defined by a CCD component -- they don't have PRDCC files, so prd.brief_summary.canonical_smiles is NULL. Their SMILES exist in cc.brief_summary instead.
The chem.compounds table solves this by combining all compounds from both sources into one searchable table. You don't need to worry about which source has the SMILES -- just query chem.compounds.
See PRD SMILES Coverage for details.
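To make the gap concrete, here is a minimal sketch that uses Python's sqlite3 module with two in-memory databases attached as cc and prd, standing in for the real PostgreSQL schemas. The column names comp_id, prd_id, chem_comp_id, and canonical_smiles come from this page, but the exact layout of the two brief_summary tables is an assumption, and every row is made-up toy data with hypothetical identifiers. It shows the fallback join you would otherwise need to resolve SMILES for a "single molecule" PRD.

```python
import sqlite3

# Stand-in for the real database: SQLite with two attached in-memory
# databases named after the cc and prd schemas.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS cc")
conn.execute("ATTACH DATABASE ':memory:' AS prd")

# Assumed minimal layouts; the real tables have more columns.
conn.execute(
    "CREATE TABLE cc.brief_summary (comp_id TEXT PRIMARY KEY, canonical_smiles TEXT)"
)
conn.execute(
    "CREATE TABLE prd.brief_summary "
    "(prd_id TEXT PRIMARY KEY, chem_comp_id TEXT, canonical_smiles TEXT)"
)

# Toy rows; all identifiers are hypothetical.
conn.execute(
    "INSERT INTO cc.brief_summary VALUES ('XX1', 'CC(=O)Oc1ccccc1C(O)=O')"
)
conn.executemany(
    "INSERT INTO prd.brief_summary VALUES (?, ?, ?)",
    [
        ("PRD_900001", None, "c1ccccc1"),  # has its own SMILES
        ("PRD_900002", "XX1", None),       # single-molecule PRD: SMILES is NULL
    ],
)

# Without a unified table, resolving SMILES needs a fallback join to cc.
rows = conn.execute(
    """
    SELECT p.prd_id,
           COALESCE(p.canonical_smiles, c.canonical_smiles) AS smiles
    FROM prd.brief_summary AS p
    LEFT JOIN cc.brief_summary AS c ON c.comp_id = p.chem_comp_id
    ORDER BY p.prd_id
    """
).fetchall()
print(rows)
```

With chem.compounds in place, this join is unnecessary: the CCD component behind the second entry is directly searchable as a source = 'cc' row.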
How It Works
The chem.compounds table is populated by the pmb compounds command, which:
- Extracts compounds from cc.brief_summary (~50k CCD entries)
- Extracts compounds from prd.brief_summary (~802 BIRD entries with SMILES)
- Combines them into a single table with a source column ('cc' or 'prd') to distinguish origin
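The steps above can be sketched as a single UNION ALL. This is an illustration of the idea, not the actual pmb compounds implementation: it runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the brief_summary layouts (including the name column) are assumptions, the rows are toy data with hypothetical identifiers, and the cc_comp_ids column, RDKit columns, and constraints are left out.

```python
import sqlite3

# SQLite stand-in: one attached in-memory database per schema.
conn = sqlite3.connect(":memory:")
for schema in ("cc", "prd", "chem"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")

# Assumed minimal source tables with toy rows.
conn.execute(
    "CREATE TABLE cc.brief_summary (comp_id TEXT, name TEXT, canonical_smiles TEXT)"
)
conn.execute(
    "CREATE TABLE prd.brief_summary (prd_id TEXT, name TEXT, canonical_smiles TEXT)"
)
conn.executemany(
    "INSERT INTO cc.brief_summary VALUES (?, ?, ?)",
    [
        ("XX1", "toy component one", "CCO"),
        ("XX2", "toy component two", "c1ccccc1"),
    ],
)
conn.executemany(
    "INSERT INTO prd.brief_summary VALUES (?, ?, ?)",
    [
        ("PRD_900001", "toy BIRD entry with SMILES", "CC(=O)O"),
        ("PRD_900002", "toy single-molecule entry", None),
    ],
)

# Simplified target table (constraints and extra columns omitted).
conn.execute(
    "CREATE TABLE chem.compounds "
    "(id TEXT, source TEXT, canonical_smiles TEXT, name TEXT)"
)

# Steps 1-3: take every CCD entry, take the PRD entries that have
# SMILES, and tag each row with the schema it came from.
conn.execute(
    """
    INSERT INTO chem.compounds (id, source, canonical_smiles, name)
    SELECT comp_id, 'cc', canonical_smiles, name
    FROM cc.brief_summary
    UNION ALL
    SELECT prd_id, 'prd', canonical_smiles, name
    FROM prd.brief_summary
    WHERE canonical_smiles IS NOT NULL
    """
)

counts = dict(
    conn.execute("SELECT source, COUNT(*) FROM chem.compounds GROUP BY source")
)
print(counts)
```

In this toy run the PRD row without SMILES is skipped, which mirrors the ~802 of ~1175 coverage described above.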
The table is also automatically refreshed after pmb update when the cc or prd pipelines run.
"Single molecule" PRDs are already included as source = 'cc' entries (via their chem_comp_id), so all PRD-related compounds are searchable in this table.
RDKit Integration
Like the cc and prd schemas, the chem schema has full RDKit support:
- mol column -- stores RDKit molecule objects generated from canonical SMILES
- GiST index on mol for fast substructure and similarity searches
- RDKit descriptor columns -- molecular weight, LogP, TPSA, HBA, HBD, rotatable bonds, rings, formula
- Chemical search SQL functions:
  - chem.similar_compounds(smiles, threshold) -- Tanimoto similarity search across all sources
  - chem.substructure_search(smarts) -- substructure matching across all sources
Example Queries
-- Find all compounds (cc + prd) similar to aspirin
SELECT * FROM chem.similar_compounds('CC(=O)Oc1ccccc1C(O)=O', 0.5);
-- Substructure search across all sources
SELECT id, source, name, canonical_smiles
FROM chem.compounds
WHERE mol @> 'c1ccccc1'::mol;
-- Compare compound counts by source
SELECT source, COUNT(*) FROM chem.compounds GROUP BY source;
-- Find PRD compounds that share a CCD component
SELECT id, name, cc_comp_ids
FROM chem.compounds
WHERE source = 'prd' AND cc_comp_ids IS NOT NULL;
compounds
| Column | Type | Description |
|---|---|---|
| id | text | Compound identifier (comp_id for cc, prd_id for prd) |
| source | text | Source schema: 'cc' or 'prd' (CHECK constraint enforced) |
| canonical_smiles | text | Canonical SMILES string |
| name | text | Compound name |
| formula | text | Molecular formula |
| cc_comp_ids | text[] | Associated CCD comp_ids (self-referential for cc, linked chem_comp_id for prd) |
The mol column and RDKit descriptor columns (rdkit_mw, rdkit_logp, rdkit_tpsa, rdkit_hba, rdkit_hbd, rdkit_rotbonds, rdkit_rings, rdkit_formula) are added automatically by pmb setup-rdkit and are not shown in the table above.
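As a rough illustration of what the (source, id) primary key and the CHECK constraint on source imply, the sketch below recreates a cut-down version of the table in SQLite via Python's sqlite3 module. The exact constraint expression is an assumption, cc_comp_ids is omitted because SQLite has no array type, and the identifiers and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE compounds (
        id               TEXT NOT NULL,
        source           TEXT NOT NULL CHECK (source IN ('cc', 'prd')),
        canonical_smiles TEXT,
        name             TEXT,
        formula          TEXT,
        PRIMARY KEY (source, id)
    )
    """
)


def try_insert(comp_id: str, source: str) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO compounds (id, source) VALUES (?, ?)",
            (comp_id, source),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("XX1", "cc"),   # new (source, id) pair
    try_insert("XX1", "prd"),  # same id under a different source
    try_insert("XX1", "cc"),   # duplicate (source, id) pair
    try_insert("XX2", "pdb"),  # source outside ('cc', 'prd')
]
print(results)
```

Because the key is composite, an id only has to be unique within its source, so lookups by id alone should also filter on source.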