
Installation

This guide walks you through setting up pdb-mine-builder from scratch.

Pixi manages all dependencies — Python, PostgreSQL, RDKit, and CLI tools — in a single isolated environment.

Prerequisites

| Requirement | Version | Purpose |
|---|---|---|
| Pixi | Latest | Package manager |
| rsync | Any | Data synchronization from PDBj servers |
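Before running the setup, you can confirm that both prerequisites are on your PATH. The sketch below is a small POSIX shell helper. The function name `require_tools` is hypothetical and is not part of pdb-mine-builder. It reports every missing command and returns non-zero if any are absent:

```shell
# Hypothetical helper (not shipped with pdb-mine-builder):
# print each command that is not found on PATH; return 1 if any are missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this guide, run `require_tools pixi rsync`. An exit status of zero means both tools were found.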

Setup

git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
pixi install
cp config.example.yml config.yml # Edit with your data paths

pixi install installs all dependencies, including Python, PostgreSQL, and RDKit, into an isolated Pixi environment.
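If you automate these steps in a provisioning script, re-running the cp line overwrites an already edited config.yml. A minimal guard is sketched below as a hypothetical shell function that is not part of the project. It copies the example file only when config.yml does not exist yet:

```shell
# Hypothetical helper: create config.yml from config.example.yml only if it
# does not exist, so re-running setup never clobbers an edited config.
init_config() {
  if [ -f config.yml ]; then
    echo "config.yml already exists; leaving it unchanged" >&2
    return 0
  fi
  cp config.example.yml config.yml
}
```

Call `init_config` from the repository root in place of the cp line.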

pip (alternative)

Warning: pip installs the Python package only. You must provide PostgreSQL (17+) and the RDKit PostgreSQL cartridge separately. Database management commands (pixi run db-*) are not available; use your own PostgreSQL instance.

pip install pdbminebuilder

Then create a config file and point it to your PostgreSQL instance:

curl -O https://raw.githubusercontent.com/N283T/pdb-mine-builder/main/config.example.yml
cp config.example.yml config.yml # Edit constring and data paths
pmb --help
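Because pip does not provide the database, you may want to check your server before running pmb. The sketch below assumes that psql reaches the same server your constring points to, through the standard libpq variables (PGHOST, PGPORT, PGUSER, PGDATABASE). The function names are hypothetical. Two PostgreSQL facts make the check work. On PostgreSQL 17 or newer, `server_version_num` is 170000 or higher. The `pg_available_extensions` view lists rdkit only if the cartridge is installed on that server.

```shell
# Hypothetical helpers; assume psql reaches the target server via libpq env vars.

# Succeed if a server_version_num value (e.g. 170002) is PostgreSQL 17 or newer.
pg_version_ok() {
  [ "$1" -ge 170000 ]
}

# Query the server and check both requirements of the pip install path.
check_server() {
  num=$(psql -tAc "SHOW server_version_num") || return 1
  if ! pg_version_ok "$num"; then
    echo "PostgreSQL 17+ required (server_version_num=$num)" >&2
    return 1
  fi
  rdkit=$(psql -tAc "SELECT 1 FROM pg_available_extensions WHERE name = 'rdkit'") || return 1
  if [ "$rdkit" != "1" ]; then
    echo "RDKit cartridge is not installed on this server" >&2
    return 1
  fi
  echo "server OK"
}
```

Run `check_server` after setting the PG* variables to match your constring.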

conda + pip (alternative)

Warning: Database management commands (pixi run db-*) are not available. Use your own PostgreSQL instance.

Use conda to install rdkit-postgresql, then pip for the Python package:

conda create -n pmb python=3.12 rdkit-postgresql -c conda-forge
conda activate pmb
pip install pdbminebuilder

Then create a config file:

curl -O https://raw.githubusercontent.com/N283T/pdb-mine-builder/main/config.example.yml
cp config.example.yml config.yml # Edit constring and data paths
pmb --help
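In this setup the cartridge comes from conda. The PostgreSQL server you use should therefore normally be the one installed into the pmb environment, because a separately installed system PostgreSQL looks for extensions in its own directory. The rough check below looks for `rdkit.control` under the server's shared directory. It assumes that a pg_config from the activated pmb environment is on PATH, which is an assumption about the conda package layout. The function name is hypothetical.

```shell
# Hypothetical helper: check whether a PostgreSQL installation ships the RDKit
# extension by looking for rdkit.control under <sharedir>/extension.
# Defaults to the sharedir reported by the pg_config found on PATH.
has_rdkit_extension() {
  sharedir=${1:-$(pg_config --sharedir)} || return 1
  [ -f "$sharedir/extension/rdkit.control" ]
}
```

After `conda activate pmb`, `has_rdkit_extension && echo "rdkit available"` should print the message if the cartridge landed where that environment's server expects it.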

Docker / Podman (alternative)

Warning: Data files must be mounted as volumes. The pmb-data volume or a host directory bind mount is required.

Docker or Podman can run PostgreSQL+RDKit and pmb together without installing anything else on the host.

git clone https://github.com/N283T/pdb-mine-builder.git
cd pdb-mine-builder
cp config.example.yml config.yml # Edit data paths if needed
docker compose -f docker/docker-compose.yml up -d # Start PostgreSQL+RDKit and pmb

Run pipelines with docker compose run:

# Sync data from PDBj
docker compose -f docker/docker-compose.yml run --rm pmb sync pdbj

# Load data
docker compose -f docker/docker-compose.yml run --rm pmb load pdbj --force

# Check stats
docker compose -f docker/docker-compose.yml run --rm pmb stats
Tip: Podman users can replace docker with podman; the same Dockerfile and compose files work with both.
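The compose commands shown above all share the same prefix. To avoid retyping it, you could wrap it in a shell function. The sketch below is hypothetical: the function name `pmb_compose` and the `PMB_ENGINE` variable are not part of the project. It defaults to docker and assumes that your Podman installation provides the `podman compose` subcommand when you select podman.

```shell
# Hypothetical wrapper around the documented
# "docker compose -f docker/docker-compose.yml run --rm pmb ..." commands.
# PMB_ENGINE selects the container engine: docker (default) or podman.
pmb_compose() {
  engine=${PMB_ENGINE:-docker}
  "$engine" compose -f docker/docker-compose.yml run --rm pmb "$@"
}
```

With the function defined, `pmb_compose sync pdbj` and `pmb_compose load pdbj --force` are equivalent to the commands above. `PMB_ENGINE=podman pmb_compose stats` runs the same command through Podman.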

Environment Variables

Copy the example environment file and customize it:

cp .env.example .env

The default .env.example contains:

# PostgreSQL connection
PGPORT=5433
PGHOST=localhost
PGDATA=postgres_data_5433
PGUSER=pdbj
PGDATABASE=pmb

# Data directory (PDBj data root)
DATA_DIR=/path/to/pdb/data

Edit DATA_DIR to point to where you want PDBj data stored on disk.
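A common mistake is leaving the placeholder path in place. The hypothetical check below is not part of pdb-mine-builder. It sources .env as shell code, which works for the simple KEY=VALUE lines in .env.example. It then fails if DATA_DIR is unset, still `/path/to/pdb/data`, or not an existing directory. Treating a missing directory as an error is a choice made in this sketch; create the directory with `mkdir -p` if the check reports it.

```shell
# Hypothetical check: source .env, then confirm DATA_DIR was changed from the
# example placeholder and points at an existing directory.
check_env() {
  envfile=${1:-.env}
  # "." searches PATH for names without a slash, so force a relative path.
  case $envfile in
    */*) ;;
    *) envfile=./$envfile ;;
  esac
  [ -f "$envfile" ] || { echo "$envfile not found" >&2; return 1; }
  set -a            # export every variable assigned while sourcing
  . "$envfile"
  set +a
  case ${DATA_DIR:-} in
    "" | /path/to/pdb/data)
      echo "DATA_DIR is unset or still the example placeholder" >&2
      return 1
      ;;
  esac
  if [ ! -d "$DATA_DIR" ]; then
    echo "DATA_DIR does not exist: $DATA_DIR (create it with mkdir -p)" >&2
    return 1
  fi
  echo "DATA_DIR=$DATA_DIR"
}
```

Run `check_env` from the repository root after editing .env.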

Tip: Default PostgreSQL settings are also defined in pixi.toml under [activation.env]. The .env file overrides those defaults.

PostgreSQL Setup

Initialize and start PostgreSQL:

# Initialize the data directory
pixi run db-init

# Start PostgreSQL
pixi run db-start

# Verify it is running
pixi run db-status
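When scripting these steps, you may want to wait until the server accepts connections before loading data. The sketch below is a generic retry helper with a hypothetical name. The intended probe is PostgreSQL's standard `pg_isready` client utility. This assumes two things: that pg_isready is available in the Pixi environment alongside the server, and that `pixi run` applies the PG* defaults from pixi.toml, which pg_isready reads like any libpq client.

```shell
# Hypothetical retry helper: run a command until it succeeds, up to MAX tries,
# sleeping WAIT_DELAY seconds (default 1) between attempts.
wait_until() {
  max=$1
  shift
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$max" ]; then
      echo "gave up after $max attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "${WAIT_DELAY:-1}"
  done
}
```

For example, `pixi run db-start && wait_until 30 pixi run pg_isready -q` gives the server up to about 30 seconds to come up.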

To stop PostgreSQL later:

pixi run db-stop

RDKit Extension

The RDKit PostgreSQL extension enables chemical structure searches (substructure, similarity, etc.) on the cc (Chemical Components) schema.

The extension is automatically configured when you run the cc pipeline. No manual setup is needed in most cases.

To set up RDKit independently (for example, before loading data):

pixi run pmb setup-rdkit
Note: The initial CREATE EXTENSION rdkit requires superuser privileges. If auto-setup fails, run the SQL script manually as a PostgreSQL superuser:

psql -d pmb -f scripts/init_rdkit.sql
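To confirm that the extension is actually installed in the database, you can query the standard `pg_extension` catalog. The sketch below is a hypothetical helper. It prints the installed RDKit extension version, or fails if the extension is missing. The `PSQL` variable is an assumption of this sketch. It lets you route the query through another client command, such as `pixi run psql`, and it is deliberately left unquoted so that such a multi-word command is split into separate words.

```shell
# Hypothetical check: print the installed RDKit extension version, or fail.
# PSQL lets you swap in another client command, e.g. PSQL="pixi run psql";
# it is left unquoted on purpose so multi-word commands split into words.
rdkit_version_installed() {
  db=${1:-pmb}
  ver=$(${PSQL:-psql} -d "$db" -tAc \
    "SELECT extversion FROM pg_extension WHERE extname = 'rdkit'") || return 1
  if [ -z "$ver" ]; then
    echo "rdkit extension is not installed in database $db" >&2
    return 1
  fi
  echo "$ver"
}
```

Running `PSQL="pixi run psql" rdkit_version_installed pmb` after the cc pipeline or `pmb setup-rdkit` should print a version string.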

Verify Installation

Confirm everything is working:

pixi run pmb --help
pixi run pmb --version

You should see the CLI help output listing all available commands.

Next Steps