# Optional: enable logging to see what's happening behind the scenes
from loguru import logger
logger.enable("planetarypy")

Exploring the PDS Product Catalog
The planetarypy.catalog module provides access to a comprehensive catalog of data products across the entire NASA Planetary Data System (PDS) archive. It is built by ingesting instrument definitions from the MillionConcepts pdr-tests repository, which has cataloged ~200 instruments across 60+ missions.
The catalog follows the mission.instrument.product dotted key convention, consistent with planetarypy’s existing PDS index system.
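As a small illustration of the dotted key convention, the sketch below splits a key into its three parts. The `ProductKey` type and `parse_product_key` helper are hypothetical names made up for this example; they are not part of planetarypy.

```python
from typing import NamedTuple


class ProductKey(NamedTuple):
    """Hypothetical container for the three parts of a dotted key."""
    mission: str
    instrument: str
    product: str


def parse_product_key(key: str) -> ProductKey:
    """Split a 'mission.instrument.product' key into its three parts."""
    parts = key.split(".")
    # Require exactly three non-empty components.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'mission.instrument.product', got {key!r}")
    return ProductKey(*parts)


key = parse_product_key("cassini.iss.edr_sat")
print(key.mission, key.instrument, key.product)
```

Note that underscores inside a component (as in `edr_sat`) are left alone; only the dots separate the levels.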
Building the catalog
The first step is to build the catalog database. This clones the pdr-tests repository (sparse checkout, only the definitions folder) and parses all instrument definitions into a local DuckDB database.
This only needs to be done once. Subsequent calls skip the build if the database already exists (pass force=True to rebuild).
from planetarypy.catalog import (
build_catalog,
list_missions,
list_instruments,
list_products,
example_products,
search,
summary,
ambiguous_mappings,
)

stats = build_catalog()
stats

Catalog overview
The summary() function gives a bird’s-eye view of what the catalog contains, grouped by mission.
summary()

Retrieving product details
The example_products() function returns a DataFrame with sample product entries for a given product type, including PDS archive URLs, file lists, and product IDs.
# Get sample products for Cassini ISS Saturn EDRs
example_products("cassini.iss.edr_sat")

# LRO Diviner EDR products
example_products("lro.diviner.edr")

# Cassini RADAR BIDR products
example_products("cassini.radar.bidr")

Searching across the catalog
The search() function lets you find products by keyword across missions, instruments, product types, and product IDs.
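To make the idea of a keyword search across several fields concrete, here is a minimal sketch over in-memory records. It assumes a case-insensitive substring match, which may differ from how search() actually matches; the `RECORDS` list, its field names, and the placeholder product IDs are all hypothetical.

```python
# Hypothetical records; keys come from this page, product IDs are placeholders.
RECORDS = [
    {"mission": "cassini", "instrument": "iss", "product_type": "edr_sat",
     "product_id": "EXAMPLE-0001"},
    {"mission": "cassini", "instrument": "radar", "product_type": "bidr",
     "product_id": "EXAMPLE-0002"},
    {"mission": "lro", "instrument": "diviner", "product_type": "edr",
     "product_id": "EXAMPLE-0003"},
]


def keyword_search(records, term):
    """Return records where any field contains `term` (case-insensitive)."""
    needle = term.lower()
    return [
        rec for rec in records
        if any(needle in str(value).lower() for value in rec.values())
    ]


for rec in keyword_search(RECORDS, "Cassini"):
    print(rec["mission"], rec["instrument"], rec["product_type"])
```

The calls that follow use the real search() function against the built catalog.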
search("hirise")

# Search for spectrometer-related products
search("spectrometer")

# Find anything related to Rosetta
search("rosetta")

Direct database access
For more advanced queries, you can get a DuckDB connection and write SQL directly. The catalog provides a convenient catalog view that joins instruments, product types, and products.
from planetarypy.catalog import get_catalog
con = get_catalog()

# Which missions have the most product types?
con.sql("""
SELECT mission, COUNT(DISTINCT product_key) as n_product_types
FROM catalog
GROUP BY mission
ORDER BY n_product_types DESC
LIMIT 10
""").fetchdf()

# Find all product types that have attached labels
con.sql("""
SELECT mission, instrument, product_key
FROM catalog
WHERE label_type = 'A'
GROUP BY mission, instrument, product_key
ORDER BY mission, instrument
LIMIT 20
""").fetchdf()

# Explore the URL hosting patterns across the archive
con.sql("""
SELECT
CASE
WHEN url_stem LIKE '%pdsimage2.wr.usgs.gov%' THEN 'USGS Imaging'
WHEN url_stem LIKE '%pds-imaging.jpl.nasa.gov%' THEN 'JPL Imaging'
WHEN url_stem LIKE '%planetarydata.jpl.nasa.gov%' THEN 'JPL Planetary Data'
WHEN url_stem LIKE '%pds-atmospheres.nmsu.edu%' THEN 'Atmospheres (NMSU)'
WHEN url_stem LIKE '%pds-geosciences.wustl.edu%' THEN 'Geosciences (WashU)'
WHEN url_stem LIKE '%pds-rings.seti.org%' THEN 'Rings (SETI)'
WHEN url_stem LIKE '%s3.%amazonaws.com%' THEN 'AWS S3 Mirror'
WHEN url_stem LIKE '%hirise-pds.lpl.arizona.edu%' THEN 'HiRISE (Arizona)'
WHEN url_stem LIKE '%pds-smallbodies%' THEN 'Small Bodies'
WHEN url_stem LIKE '%archives.esac.esa.int%' THEN 'ESA PSA'
ELSE 'Other'
END as hosting_node,
COUNT(*) as n_products
FROM products
WHERE url_stem IS NOT NULL AND url_stem != ''
GROUP BY hosting_node
ORDER BY n_products DESC
""").fetchdf()

con.close()

Mission mapping review
The catalog maps pdr-tests folder names (like diviner, gal_ssi, vg_iss) to proper mission/instrument tuples. Most mappings are automatic (split on underscore) or handled by a curated manual map.
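The two-stage mapping described above might look roughly like the sketch below: consult a manual map first, then fall back to splitting on a single underscore, and flag everything else as ambiguous. The entries in `MANUAL_MAP` and `MISSION_ALIASES` are assumptions chosen for illustration, and `map_folder` is a hypothetical helper, not the catalog's actual implementation.

```python
# Hypothetical curated entries for folder names that cannot be split.
MANUAL_MAP = {"diviner": ("lro", "diviner")}

# Hypothetical expansion of mission abbreviations (assumed, not from the catalog).
MISSION_ALIASES = {"gal": "galileo", "vg": "voyager"}


def map_folder(name):
    """Map a folder name to (mission, instrument), or None if ambiguous."""
    if name in MANUAL_MAP:
        return MANUAL_MAP[name]
    parts = name.split("_")
    # Only a clean two-part split is treated as unambiguous here.
    if len(parts) == 2 and all(parts):
        mission, instrument = parts
        return (MISSION_ALIASES.get(mission, mission), instrument)
    return None


for folder in ["diviner", "gal_ssi", "vg_iss", "some_odd_name"]:
    print(folder, "->", map_folder(folder))
```

In this sketch, a name with no underscore or more than one underscore that is missing from the manual map comes back as None, which is the kind of case a review step would need to surface.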
Use ambiguous_mappings() to check if any folder names could not be confidently assigned.
ambiguous_mappings()

URL health notes
Not all product URLs in the catalog are still valid. An initial check found the following status for each hosting node:
| Hosting Node | Status |
|---|---|
| USGS Imaging (pdsimage2.wr.usgs.gov/Missions/*) | Broken – all 404 |
| JPL Imaging (pds-imaging.jpl.nasa.gov) | Migrated to planetarydata.jpl.nasa.gov (redirects work) |
| Atmospheres (NMSU) | Healthy |
| Geosciences (WashU) | Healthy |
| Rings (SETI) | Healthy (some path restructuring) |
| AWS S3 mirrors | Healthy |
| HiRISE (Arizona) | Healthy |
You can validate URLs in the catalog using the validation module:
# Uncomment to run URL validation (takes a few minutes, checks sample URLs per product type)
# from planetarypy.catalog._validation import validate_urls
# from planetarypy.config import config
# counts = validate_urls(config.storage_root, sample_size=2)
# print(counts)

Relationship to PDS indexes

The catalog module complements and integrates with the existing planetarypy.pds index system:

| | PDS Indexes (planetarypy.pds) | PDS Catalog (planetarypy.catalog) |
|---|---|---|
| Scope | ~90 curated index files | ~2000 product types across 200+ instruments |
| Data | Full observation-level metadata (DataFrames with 30-140 columns) | Product-level metadata (URLs, file lists, product IDs) |
| Source | PDS archive .lbl + .tab files | pdr-tests repository definitions |
| Storage | Parquet files | DuckDB database |
| Use case | Query specific observations | Discover and download data across the PDS |

Integration: The catalog’s Tier 2 resolution automatically uses PDS indexes when available. For instruments like CTX, HiRISE, Cassini ISS, Galileo SSI, LROC, and many more, this means you can resolve and download any product by ID; the catalog handles the URL construction, volume lookup, and archive-specific path conventions behind the scenes.
Rebuilding the catalog
To update the catalog with the latest definitions from pdr-tests, use force=True:
build_catalog(force=True)

This will re-clone the repository and rebuild the DuckDB database from scratch.
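The skip-unless-forced behavior can be pictured with the following minimal sketch. It assumes the only check is whether the database file exists; `build_once` and the `catalog.db` filename are hypothetical stand-ins, and the real build_catalog() may do more.

```python
import tempfile
from pathlib import Path


def build_once(db_path: Path, force: bool = False) -> str:
    """Create the file unless it already exists; rebuild when force=True."""
    if db_path.exists() and not force:
        return "skipped"
    # Stand-in for the real work (cloning definitions, parsing, writing tables).
    db_path.write_text("placeholder catalog contents")
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "catalog.db"  # hypothetical filename
    results = [
        build_once(db),              # first call builds
        build_once(db),              # second call finds the file and skips
        build_once(db, force=True),  # forced call rebuilds
    ]

print(results)
```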