Installation
This guide walks you through setting up pdb-mine-builder from scratch.
Pixi (recommended)
Pixi manages all dependencies (Python, PostgreSQL, RDKit, and CLI tools) in a single isolated environment.
Prerequisites
| Requirement | Version | Purpose |
|---|---|---|
| Pixi | Latest | Package manager |
| rsync | Any | Data synchronization from PDBj servers |
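Before cloning, it can help to confirm that both prerequisites are on your PATH. The following is a minimal sketch, not part of pdb-mine-builder; `missing_tools` is a hypothetical helper name introduced only for illustration.

```shell
# Print the name of every given command that cannot be found on PATH.
# Usage: missing_tools pixi rsync   (one missing name per line, nothing if all exist)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools pixi rsync)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```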
Setup
git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
pixi install
cp config.example.yml config.yml # Edit with your data paths
Once pixi install completes, Python, PostgreSQL, and RDKit are all available inside the project's isolated Pixi environment.
pip (alternative)
pip installs the Python package only. You must provide PostgreSQL (17+) and the RDKit PostgreSQL cartridge separately. Database management commands (pixi run db-*) are not available with this method, so use your own PostgreSQL instance.
pip install pdbminebuilder
Then create a config file and point it to your PostgreSQL instance:
curl -O https://raw.githubusercontent.com/N283T/pdb-mine-builder/main/config.example.yml
cp config.example.yml config.yml # Edit constring and data paths
pmb --help
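Because this method relies on your own server, you may want to confirm it meets the PostgreSQL 17+ requirement. The sketch below assumes psql is installed and that the standard PGHOST, PGPORT, PGUSER, and PGDATABASE variables point at your instance; `pg_is_17_plus` is a hypothetical helper, not a pmb command.

```shell
# True when a server_version_num value (e.g. 170002) is PostgreSQL 17 or newer.
pg_is_17_plus() {
  [ "$1" -ge 170000 ]
}

# Ask the server for its numeric version. -w prevents a password prompt,
# -A and -t strip alignment and headers so only the number is printed.
if command -v psql >/dev/null 2>&1; then
  ver=$(psql -w -At -c 'SHOW server_version_num' 2>/dev/null || true)
  if [ -n "$ver" ] && pg_is_17_plus "$ver"; then
    echo "PostgreSQL version OK ($ver)"
  else
    echo "PostgreSQL 17+ not detected (got: ${ver:-no response})"
  fi
fi
```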
conda + pip (alternative)
Database management commands (pixi run db-*) are not available. Use your own PostgreSQL instance.
Use conda to install rdkit-postgresql, then pip for the Python package:
conda create -n pmb python=3.12 rdkit-postgresql -c conda-forge
conda activate pmb
pip install pdbminebuilder
Then create a config file:
curl -O https://raw.githubusercontent.com/N283T/pdb-mine-builder/main/config.example.yml
cp config.example.yml config.yml # Edit constring and data paths
pmb --help
Docker / Podman (alternative)
Data files must be mounted as volumes. The pmb-data volume or a host directory bind mount is required.
Docker or Podman can run PostgreSQL+RDKit and pmb together without installing anything else.
git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
cp config.example.yml config.yml # Edit data paths if needed
docker compose -f docker/docker-compose.yml up -d # Start PostgreSQL+RDKit and pmb
Run pipelines with docker compose run:
# Sync data from PDBj
docker compose -f docker/docker-compose.yml run --rm pmb sync pdbj
# Load data
docker compose -f docker/docker-compose.yml run --rm pmb load pdbj --force
# Check stats
docker compose -f docker/docker-compose.yml run --rm pmb stats
Podman users can replace docker with podman; the same Dockerfile and compose files work with both.
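Since every pipeline command repeats the same compose prefix, a small shell function can shorten it. This is only a sketch: `pmbc`, `CONTAINER_ENGINE`, and `PMB_DRY_RUN` are hypothetical names introduced here and are not defined by the project.

```shell
# Run a pmb subcommand in the compose service.
# CONTAINER_ENGINE (assumed name) selects docker or podman; default is docker.
# Setting PMB_DRY_RUN (assumed name) prints the command instead of running it.
pmbc() {
  ${PMB_DRY_RUN:+echo} "${CONTAINER_ENGINE:-docker}" compose \
    -f docker/docker-compose.yml run --rm pmb "$@"
}

# Examples (run from the repository root):
#   pmbc sync pdbj
#   pmbc load pdbj --force
#   CONTAINER_ENGINE=podman pmbc stats
```

Setting PMB_DRY_RUN=1 is a quick way to see which command would be executed before running it.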
Environment Variables
Copy the example environment file and customize it:
cp .env.example .env
The default .env.example contains:
# PostgreSQL connection
PGPORT=5433
PGHOST=localhost
PGDATA=postgres_data_5433
PGUSER=pdbj
PGDATABASE=pmb
# Data directory (PDBj data root)
DATA_DIR=/path/to/pdb/data
Edit DATA_DIR to point to where you want PDBj data stored on disk.
Default PostgreSQL settings are also defined in pixi.toml under [activation.env]. The .env file overrides those defaults.
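To preview which values your .env file supplies and where a default would apply, a short script can read it directly. This sketch does not reproduce how Pixi or pmb load the file; it only inspects plain KEY=value lines. `env_value` is a hypothetical helper, and the fallback values are the ones shown in .env.example above.

```shell
# Print the value of KEY from an env file, or DEFAULT if the key is
# absent or empty. Handles plain KEY=value lines only (no quoting, no export).
env_value() {
  key=$1; file=$2; default=$3; value=
  if [ -f "$file" ]; then
    # If the key appears more than once, the last occurrence wins.
    value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
  fi
  printf '%s\n' "${value:-$default}"
}

echo "port:     $(env_value PGPORT .env 5433)"
echo "database: $(env_value PGDATABASE .env pmb)"
echo "data dir: $(env_value DATA_DIR .env '(unset)')"
```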
PostgreSQL Setup
Initialize and start PostgreSQL:
# Initialize the data directory
pixi run db-init
# Start PostgreSQL
pixi run db-start
# Verify it is running
pixi run db-status
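In scripted setups the server may need a moment after db-start before it accepts connections. The sketch below polls with pg_isready, a standard PostgreSQL client tool; it assumes pg_isready is on your PATH (for example inside pixi shell) and uses the port from .env.example as a fallback. `wait_for` is a hypothetical helper, not a project command.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS command [args...]
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Poll up to 10 times, one second apart. -q suppresses pg_isready's output.
if command -v pg_isready >/dev/null 2>&1; then
  wait_for 10 1 pg_isready -q -h "${PGHOST:-localhost}" -p "${PGPORT:-5433}" \
    && echo "PostgreSQL is accepting connections" \
    || echo "PostgreSQL did not become ready"
fi
```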
To stop PostgreSQL later:
pixi run db-stop
RDKit Extension
The RDKit PostgreSQL extension enables chemical structure searches (substructure, similarity, etc.) on the cc (Chemical Components) schema.
The extension is automatically configured when you run the cc pipeline. No manual setup is needed in most cases.
To set up RDKit independently (for example, before loading data):
pixi run pmb setup-rdkit
The initial CREATE EXTENSION rdkit requires superuser privileges. If auto-setup fails, run the SQL script manually:
psql -d pmb -f scripts/init_rdkit.sql
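To check whether the extension ended up installed, you can query the pg_extension catalog. This is a sketch that assumes psql can reach the pmb database with your current connection settings; `describe_rdkit` is a hypothetical helper used only to format the result.

```shell
# Turn an extension version string (possibly empty) into a status message.
describe_rdkit() {
  if [ -n "$1" ]; then
    echo "rdkit extension installed (version $1)"
  else
    echo "rdkit extension not found"
  fi
}

# pg_extension lists the extensions installed in the current database.
# An empty result also occurs when the server cannot be reached.
if command -v psql >/dev/null 2>&1; then
  version=$(psql -w -At -d "${PGDATABASE:-pmb}" \
    -c "SELECT extversion FROM pg_extension WHERE extname = 'rdkit';" \
    2>/dev/null || true)
  describe_rdkit "$version"
fi
```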
Verify Installation
Confirm everything is working:
pixi run pmb --help
pixi run pmb --version
The first command should print the CLI help listing all available commands; the second should print the installed version.