Exploring the PDS Product Catalog

The planetarypy.catalog module provides access to a comprehensive catalog of data products across the entire NASA Planetary Data System (PDS) archive. It is built by ingesting instrument definitions from the MillionConcepts pdr-tests repository, which has cataloged ~200 instruments across 60+ missions.

The catalog follows the mission.instrument.product dotted key convention, consistent with planetarypy’s existing PDS index system.
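As a rough illustration of the convention, the sketch below splits a dotted key into its three parts. `parse_key` and `ProductKey` are hypothetical names written for this example and are not part of planetarypy; the sketch also assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class ProductKey(NamedTuple):
    """The three parts of a dotted key (hypothetical container)."""
    mission: str
    instrument: str
    product: str


def parse_key(dotted: str) -> ProductKey:
    """Split 'mission.instrument.product' into its parts (hypothetical helper)."""
    parts = dotted.split(".")
    # Require exactly three non-empty parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected mission.instrument.product, got {dotted!r}")
    return ProductKey(*parts)


key = parse_key("cassini.iss.edr_sat")
print(key.mission, key.instrument, key.product)
```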

Building the catalog

The first step is to build the catalog database. This clones the pdr-tests repository (sparse checkout, only the definitions folder) and parses all instrument definitions into a local DuckDB database.

This only needs to be done once. Subsequent calls skip the build if the database already exists (pass force=True to rebuild).

# Optional: enable logging to see what's happening behind the scenes
from loguru import logger
logger.enable("planetarypy")
from planetarypy.catalog import (
    build_catalog,
    list_missions,
    list_instruments,
    list_products,
    example_products,
    search,
    summary,
    ambiguous_mappings,
)
stats = build_catalog()
stats
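The build-once behaviour described above can be pictured with a small stand-alone sketch. `build_once` is a hypothetical stand-in, not the code behind build_catalog(); it writes a placeholder file where the real build would create the database.

```python
import tempfile
from pathlib import Path


def build_once(db_path: Path, force: bool = False) -> str:
    """Create db_path unless it already exists (hypothetical stand-in for a real build)."""
    if db_path.exists() and not force:
        return "skipped"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    # A real build would parse definitions here; this only writes a placeholder.
    db_path.write_text("placeholder for the real database contents\n")
    return "built"


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "catalog.duckdb"
    results = [build_once(db), build_once(db), build_once(db, force=True)]

print(results)  # ['built', 'skipped', 'built']
```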

Catalog overview

The summary() function gives a bird’s-eye view of what the catalog contains, grouped by mission.

summary()

Retrieving product details

The example_products() function returns a DataFrame with sample product entries for a given product type, including PDS archive URLs, file lists, and product IDs.

# Get sample products for Cassini ISS Saturn EDRs
example_products("cassini.iss.edr_sat")
# LRO Diviner EDR products
example_products("lro.diviner.edr")
# Cassini RADAR BIDR products
example_products("cassini.radar.bidr")

Searching across the catalog

The search() function lets you find products by keyword across missions, instruments, product types, and product IDs.

search("hirise")
# Search for spectrometer-related products
search("spectrometer")
# Find anything related to Rosetta
search("rosetta")
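To give a feel for what a keyword search across those four fields involves, here is a minimal sketch over a few in-memory rows. The rows, the `keyword_search` helper, and the choice of case-insensitive substring matching are all assumptions made for illustration; the real search() may match differently.

```python
# Hypothetical rows with placeholder product IDs; the real catalog lives in DuckDB.
ROWS = [
    {"mission": "mro", "instrument": "hirise", "product_key": "edr", "product_id": "SAMPLE_ID_1"},
    {"mission": "cassini", "instrument": "iss", "product_key": "edr_sat", "product_id": "SAMPLE_ID_2"},
    {"mission": "lro", "instrument": "diviner", "product_key": "edr", "product_id": "SAMPLE_ID_3"},
]

FIELDS = ("mission", "instrument", "product_key", "product_id")


def keyword_search(rows, term):
    """Return rows where any searched field contains term, ignoring case."""
    needle = term.lower()
    return [row for row in rows if any(needle in row[field].lower() for field in FIELDS)]


hits = keyword_search(ROWS, "HiRISE")
print([f"{r['mission']}.{r['instrument']}.{r['product_key']}" for r in hits])
```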

Downloading products

The catalog isn’t just for discovery; you can also download actual PDS data products directly. The fetch_product() function resolves a product to its remote URLs and downloads the files to a local directory.

Resolution tiers

Product resolution works in three tiers:

1. Catalog lookup (Tier 1): For the ~1,948 sample products in the catalog database, the URL is known directly. This always works for products returned by example_products().
2. Index lookup (Tier 2): For arbitrary product IDs, the system looks up the product in a PDS cumulative index (if one is registered for the instrument). This covers millions of additional products for 58 product types across 29 instruments on 15 missions.
3. Pattern-based (Tier 3): For product types where all samples share the same URL directory, new product IDs can be resolved without an index.
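The three tiers can be sketched as a simple fallback chain. This is an illustration under stated assumptions, not the library's implementation: `resolve` is a hypothetical function, the tiers are assumed to be tried in numerical order, the index is modelled as a plain callable, and the Tier 3 URL is formed by appending the product ID to the shared directory (real products also need file extensions).

```python
from typing import Callable, Dict, Optional, Tuple


def resolve(
    product_id: str,
    catalog_urls: Dict[str, str],
    index_lookup: Optional[Callable[[str], Optional[str]]] = None,
    fixed_stem: Optional[str] = None,
) -> Tuple[int, str]:
    """Return (tier, url) for product_id, trying the three tiers in order."""
    # Tier 1: exact match among the catalog's sample products.
    if product_id in catalog_urls:
        return 1, catalog_urls[product_id]
    # Tier 2: ask a cumulative index, if one is registered.
    if index_lookup is not None:
        url = index_lookup(product_id)
        if url is not None:
            return 2, url
    # Tier 3: all samples share one directory, so build the URL from it.
    if fixed_stem is not None:
        return 3, f"{fixed_stem.rstrip('/')}/{product_id}"
    raise LookupError(f"cannot resolve {product_id!r}")


# Placeholder data on example.com; none of these are real PDS URLs.
samples = {"SAMPLE_A": "https://example.com/vol1/SAMPLE_A"}
fake_index = {"INDEXED_B": "https://example.com/vol7/INDEXED_B"}.get

print(resolve("SAMPLE_A", samples, fake_index))
print(resolve("INDEXED_B", samples, fake_index))
print(resolve("NEW_C", samples, None, "https://example.com/fixed/"))
```

A product ID that matches none of the tiers raises LookupError in this sketch, which mirrors the case where only the catalog's sample products can be fetched.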

from planetarypy.catalog import fetch_product, get_product_urls

Inspecting product URLs without downloading

Use get_product_urls() to see all files and their full URLs:

# First, let's find a sample product to work with
products = example_products("cassini.iss.edr_sat")
sample_pid = products.iloc[0]["product_id"]
print(f"Sample product ID: {sample_pid}")
# Get all file URLs without downloading
get_product_urls("cassini.iss.edr_sat", sample_pid)

Downloading a product

fetch_product() downloads the product files and returns the local directory path. Files are cached — subsequent calls skip already-downloaded files.

# Download a product — returns the local directory
# Uses dotted key: "mission.instrument.product_type"
# local_dir = fetch_product("cassini.iss.edr_sat", sample_pid)
# print(f"Downloaded to: {local_dir}")
# print(f"Files: {list(local_dir.iterdir())}")

# Options:
#   label_only=True  — download only the label file
#   force=True       — re-download even if files exist
#   files=["specific_file.IMG"]  — download specific files only

# Separate arguments also work:
# local_dir = fetch_product("cassini", sample_pid, instrument="iss", product_key="edr_sat")
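The caching behaviour (skip files that already exist unless force=True) can be sketched without touching the network. `fetch_files` and `fake_download` below are hypothetical helpers written for this example; they are not how fetch_product() is implemented, and the URLs are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_files(
    urls: Iterable[str],
    dest: Path,
    download: Callable[[str], bytes],
    force: bool = False,
) -> List[str]:
    """Download each URL into dest unless the file is already there."""
    dest.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if target.exists() and not force:
            continue  # cached by an earlier call
        target.write_bytes(download(url))
        fetched.append(target.name)
    return fetched


def fake_download(url: str) -> bytes:
    """Stand-in for an HTTP GET so the sketch runs offline."""
    return b"stand-in for remote bytes"


urls = ["https://example.com/data/A.LBL", "https://example.com/data/A.IMG"]
with tempfile.TemporaryDirectory() as tmp:
    first = fetch_files(urls, Path(tmp), fake_download)
    second = fetch_files(urls, Path(tmp), fake_download)
    forced = fetch_files(urls, Path(tmp), fake_download, force=True)

print(first, second, forced)
```

The second call returns an empty list because both files are already on disk; the forced call downloads them again.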

Index-backed resolution (Tier 2)

For instruments with PDS cumulative indexes, you can resolve any product ID — not just the sample products in the catalog. This works for CTX, HiRISE, Cassini ISS, Galileo SSI, LROC, Diviner, CRISM, and others.

The system automatically falls back to the index when the product isn’t found in the catalog samples.

from planetarypy.catalog._index_resolver import list_indexed_products, has_index

# Which product types have index-backed resolution?
for mission, instrument, product_key, index_key in list_indexed_products():
    print(f"  {mission}.{instrument}.{product_key} -> {index_key}")
# For indexed product types, you can resolve arbitrary product IDs:
# (This requires the PDS index to be downloaded; the first call may take a moment)
# Example: get URLs for a specific CTX observation
# get_product_urls("mro.ctx.edr", "P02_001916_2221_XI_42N027W")
# Example: download a specific HiRISE product
# local_dir = fetch_product("mro.hirise.edr", "PSP_001330_2530_RED0_0")
# Check if a product type has index support
print(f"CTX EDR has index: {has_index('mro', 'ctx', 'edr')}")
print(f"LORRI EDR has index: {has_index('new_horizons', 'lorri', 'edr')}")

Direct database access

For more advanced queries, you can get a DuckDB connection and write SQL directly. The catalog provides a convenient catalog view that joins instruments, product types, and products.

from planetarypy.catalog import get_catalog

con = get_catalog()
# Which missions have the most product types?
con.sql("""
    SELECT mission, COUNT(DISTINCT product_key) as n_product_types
    FROM catalog
    GROUP BY mission
    ORDER BY n_product_types DESC
    LIMIT 10
""").fetchdf()
# Find all product types that have attached labels
con.sql("""
    SELECT mission, instrument, product_key
    FROM catalog
    WHERE label_type = 'A'
    GROUP BY mission, instrument, product_key
    ORDER BY mission, instrument
    LIMIT 20
""").fetchdf()
# Explore the URL hosting patterns across the archive
con.sql("""
    SELECT
        CASE
            WHEN url_stem LIKE '%pdsimage2.wr.usgs.gov%' THEN 'USGS Imaging'
            WHEN url_stem LIKE '%pds-imaging.jpl.nasa.gov%' THEN 'JPL Imaging'
            WHEN url_stem LIKE '%planetarydata.jpl.nasa.gov%' THEN 'JPL Planetary Data'
            WHEN url_stem LIKE '%pds-atmospheres.nmsu.edu%' THEN 'Atmospheres (NMSU)'
            WHEN url_stem LIKE '%pds-geosciences.wustl.edu%' THEN 'Geosciences (WashU)'
            WHEN url_stem LIKE '%pds-rings.seti.org%' THEN 'Rings (SETI)'
            WHEN url_stem LIKE '%s3.%amazonaws.com%' THEN 'AWS S3 Mirror'
            WHEN url_stem LIKE '%hirise-pds.lpl.arizona.edu%' THEN 'HiRISE (Arizona)'
            WHEN url_stem LIKE '%pds-smallbodies%' THEN 'Small Bodies'
            WHEN url_stem LIKE '%archives.esac.esa.int%' THEN 'ESA PSA'
            ELSE 'Other'
        END as hosting_node,
        COUNT(*) as n_products
    FROM products
    WHERE url_stem IS NOT NULL AND url_stem != ''
    GROUP BY hosting_node
    ORDER BY n_products DESC
""").fetchdf()
con.close()
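If it helps to picture what a joined catalog view looks like, the sketch below builds one in memory. It uses Python's built-in sqlite3 as a stand-in for DuckDB so that it runs without extra dependencies. The table layouts, join keys, and rows are assumptions chosen to fit the column names used in the queries above; they are not the catalog's actual schema.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.executescript("""
    -- Assumed table layouts, for illustration only.
    CREATE TABLE instruments (mission TEXT, instrument TEXT);
    CREATE TABLE product_types (mission TEXT, instrument TEXT, product_key TEXT, label_type TEXT);
    CREATE TABLE products (mission TEXT, instrument TEXT, product_key TEXT, product_id TEXT, url_stem TEXT);

    -- Placeholder rows; label_type values other than 'A' are made up.
    INSERT INTO instruments VALUES ('cassini', 'iss'), ('lro', 'diviner');
    INSERT INTO product_types VALUES
        ('cassini', 'iss', 'edr_sat', 'D'),
        ('cassini', 'iss', 'other_type', 'D'),
        ('lro', 'diviner', 'edr', 'A');
    INSERT INTO products VALUES
        ('cassini', 'iss', 'edr_sat', 'SAMPLE_1', 'https://example.com/a'),
        ('cassini', 'iss', 'other_type', 'SAMPLE_2', 'https://example.com/b'),
        ('lro', 'diviner', 'edr', 'SAMPLE_3', 'https://example.com/c');

    -- One row per product, carrying its instrument and product-type columns.
    CREATE VIEW catalog AS
    SELECT i.mission, i.instrument, t.product_key, t.label_type, p.product_id, p.url_stem
    FROM instruments AS i
    JOIN product_types AS t
      ON t.mission = i.mission AND t.instrument = i.instrument
    JOIN products AS p
      ON p.mission = t.mission AND p.instrument = t.instrument AND p.product_key = t.product_key;
""")

rows = demo.execute("""
    SELECT mission, COUNT(DISTINCT product_key) AS n_product_types
    FROM catalog
    GROUP BY mission
    ORDER BY n_product_types DESC
""").fetchall()
demo.close()

print(rows)  # [('cassini', 2), ('lro', 1)]
```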

Mission mapping review

The catalog maps pdr-tests folder names (like diviner, gal_ssi, vg_iss) to proper mission/instrument tuples. Most mappings are automatic (split on underscore) or handled by a curated manual map.

Use ambiguous_mappings() to check if any folder names could not be confidently assigned.

ambiguous_mappings()
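The mapping logic described above (a curated map plus an underscore split) could look roughly like the following. `MANUAL_MAP` and `map_folder` are hypothetical, the mission names in the map are assumptions, `cassini_iss` is an invented folder name, and checking the manual map before the automatic split is a design choice of this sketch.

```python
from typing import Optional, Tuple

# Hypothetical curated overrides for names that do not split cleanly.
MANUAL_MAP = {
    "diviner": ("lro", "diviner"),
    "gal_ssi": ("galileo", "ssi"),
    "vg_iss": ("voyager", "iss"),
}


def map_folder(name: str) -> Optional[Tuple[str, str]]:
    """Return (mission, instrument), or None when the name is ambiguous."""
    if name in MANUAL_MAP:
        return MANUAL_MAP[name]
    # Automatic case: split on the first underscore.
    mission, sep, instrument = name.partition("_")
    if sep and mission and instrument:
        return mission, instrument
    return None  # no underscore and no manual entry


for folder in ("diviner", "gal_ssi", "cassini_iss", "mystery"):
    print(folder, "->", map_folder(folder))
```

In this sketch, names that return None play the role of the entries reported by ambiguous_mappings().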

URL health notes

Not all product URLs in the catalog are still valid. During initial research, the following was found:

| Hosting Node | Status |
|---|---|
| USGS Imaging (pdsimage2.wr.usgs.gov/Missions/*) | Broken – all 404 |
| JPL Imaging (pds-imaging.jpl.nasa.gov) | Migrated to planetarydata.jpl.nasa.gov (redirects work) |
| Atmospheres (NMSU) | Healthy |
| Geosciences (WashU) | Healthy |
| Rings (SETI) | Healthy (some path restructuring) |
| AWS S3 mirrors | Healthy |
| HiRISE (Arizona) | Healthy |

You can validate URLs in the catalog using the validation module:

# Uncomment to run URL validation (takes a few minutes, checks sample URLs per product type)
# from planetarypy.catalog._validation import validate_urls
# from planetarypy.config import config
# counts = validate_urls(config.storage_root, sample_size=2)
# print(counts)
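For readers curious what sampling-based URL validation involves, here is a minimal sketch using only the standard library. `head_status` and `validate_sample` are hypothetical helpers, not the contents of the _validation module; the demo swaps in a fake status function so it runs without network access.

```python
import random
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 on a network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a missing file
    except OSError:
        return 0  # DNS failure, timeout, refused connection, ...


def validate_sample(
    urls_by_type: Dict[str, List[str]],
    sample_size: int = 2,
    status_of: Callable[[str], int] = head_status,
    seed: int = 0,
) -> Dict[str, int]:
    """Check up to sample_size URLs per product type and tally the outcomes."""
    rng = random.Random(seed)
    counts = {"ok": 0, "broken": 0}
    for urls in urls_by_type.values():
        for url in rng.sample(urls, min(sample_size, len(urls))):
            counts["ok" if 200 <= status_of(url) < 400 else "broken"] += 1
    return counts


def fake_status(url: str) -> int:
    """Offline stand-in: pretend URLs containing 'broken' return 404."""
    return 404 if "broken" in url else 200


demo_urls = {
    "a.b.c": ["https://example.com/ok/1", "https://example.com/ok/2", "https://example.com/ok/3"],
    "d.e.f": ["https://example.com/broken/1"],
}
demo_counts = validate_sample(demo_urls, sample_size=2, status_of=fake_status)
print(demo_counts)  # {'ok': 2, 'broken': 1}
```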

Relationship to PDS indexes

The catalog module complements and integrates with the existing planetarypy.pds index system:

| | PDS Indexes (planetarypy.pds) | PDS Catalog (planetarypy.catalog) |
|---|---|---|
| Scope | ~90 curated index files | ~2000 product types across 200+ instruments |
| Data | Full observation-level metadata (DataFrames with 30-140 columns) | Product-level metadata (URLs, file lists, product IDs) |
| Source | PDS archive .lbl + .tab files | pdr-tests repository definitions |
| Storage | Parquet files | DuckDB database |
| Use case | Query specific observations | Discover and download data across the PDS |

Integration: The catalog’s Tier 2 resolution automatically uses PDS indexes when available. For instruments like CTX, HiRISE, Cassini ISS, Galileo SSI, LROC, and many more, this means you can resolve and download any product by ID; the catalog handles the URL construction, volume lookup, and archive-specific path conventions behind the scenes.

Rebuilding the catalog

To update the catalog with the latest definitions from pdr-tests, use force=True:

build_catalog(force=True)

This will re-clone the repository and rebuild the DuckDB database from scratch.

Fetchability analysis

Not all product types can resolve arbitrary product IDs. The fetchability classifier examines URL patterns and categorizes each type:

- fixed: All samples share the same url_stem, so any product ID can be resolved via pattern (Tier 3)
- indexed: Variable url_stem but a PDS cumulative index is registered (Tier 2)
- unfetchable: Variable url_stem and no index, so only sample products from the catalog work (Tier 1)
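The three categories follow directly from two facts about a product type: whether its sample url_stem values all agree, and whether an index is registered. A minimal sketch of that decision, with `classify` as a hypothetical function that is not the library's classifier, and placeholder URLs:

```python
from typing import Iterable


def classify(url_stems: Iterable[str], has_index: bool) -> str:
    """Return 'fixed', 'indexed', or 'unfetchable' for one product type."""
    distinct = set(url_stems)
    if not distinct:
        raise ValueError("need at least one sample url_stem")
    if len(distinct) == 1:
        return "fixed"  # one shared directory, so Tier 3 applies
    # Variable directories: only an index can locate arbitrary products.
    return "indexed" if has_index else "unfetchable"


print(classify(["https://example.com/a/", "https://example.com/a/"], has_index=False))
print(classify(["https://example.com/v1/", "https://example.com/v2/"], has_index=True))
print(classify(["https://example.com/v1/", "https://example.com/v2/"], has_index=False))
```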

from planetarypy.catalog._pattern_resolver import fetchability_summary
fetchability_summary()

Investigating unfetchable products

Some product types have variable URL paths (volume-based, date-based) that prevent constructing URLs from just a product ID. Here are examples:

# Unfetchable WITH PID-encoded path info:
# The product ID contains date/orbit info that appears in the URL path.
# In theory, a custom derivation rule could reconstruct the path.
get_product_urls("cassini.caps.edr", "ION_200107518_U1")
# Second sample: different volume/date path, same instrument
get_product_urls("cassini.caps.edr", "ELS_200929918_U3")
# Unfetchable WITHOUT PID-encoded path info:
# The URL contains a volume ID (COCDA_0040) that cannot be derived from
# the product ID. An index file would be needed to resolve arbitrary products.
example_products("cassini.cda.cda_stat")
# These sample products work (Tier 1 catalog match)
get_product_urls("cassini.cda.cda_stat", "CDA_STAT")
# Classify a specific product type
from planetarypy.catalog._pattern_resolver import classify_product_type
caps = classify_product_type("cassini", "caps", "edr")
print(f"Status: {caps.status}")
print(f"PID contains variable info: {caps.pid_contains_variable}")
print(f"Reason: {caps.reason}")
print()
cda = classify_product_type("cassini", "cda", "cda_stat")
print(f"Status: {cda.status}")
print(f"PID contains variable info: {cda.pid_contains_variable}")
print(f"Reason: {cda.reason}")
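One way to think about the "PID contains variable info" distinction is as a check for overlap between the product ID and the URL path. The heuristic below is purely illustrative: `pid_overlaps_path` is a hypothetical helper, the four-character prefix rule is an arbitrary choice, and the URL stems are invented placeholders rather than real archive paths. It is not how pid_contains_variable is computed.

```python
import re


def pid_overlaps_path(product_id: str, url_stem: str, min_len: int = 4) -> bool:
    """True if the start of a digit run in the product ID appears in a URL path segment."""
    segments = [segment for segment in url_stem.split("/") if segment]
    for run in re.findall(r"\d+", product_id):
        if len(run) < min_len:
            continue  # too short to be a meaningful date or orbit number
        prefix = run[:min_len]
        if any(prefix in segment for segment in segments):
            return True
    return False


# Invented stems: the first encodes a year that also starts the PID's digit run.
print(pid_overlaps_path("ION_200107518_U1", "https://example.com/caps/2001/075/"))
# The second has a volume ID with no counterpart in the product ID.
print(pid_overlaps_path("CDA_STAT", "https://example.com/COCDA_0040/"))
```

A True result suggests a derivation rule might be able to rebuild the path from the ID; False suggests that an index is the only route to arbitrary products.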