Pipeline IO

IO infrastructure for producing the catalog.

source

get_ground_projection_root


def get_ground_projection_root(
    
)->Path | None:

source

get_data_root


def get_data_root(
    
)->Path | None:

source

set_database_path


def set_database_path(
    dbfolder, # Path to where planet4 will store clustering results by default.
)->None:

Write the database path into the config file.


source

get_config


def get_config(
    
)->ConfigParser | None: # Dict-like object with the content of the config file, or None if the config file does not exist.

Read the config file and return its contents as a ConfigParser.
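The "return None when the file is missing" behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `read_config_sketch` and the file name `example.ini` are hypothetical, and only the `[planet4_db] dbname` entry is taken from this page.

```python
from __future__ import annotations

import tempfile
from configparser import ConfigParser
from pathlib import Path


def read_config_sketch(configpath: Path) -> ConfigParser | None:
    """Return the parsed config, or None if the file does not exist.

    Hypothetical helper; the real get_config takes no arguments and
    locates the config file itself.
    """
    if not configpath.exists():
        return None
    config = ConfigParser()
    config.read(configpath)
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.ini"  # assumed file name
    missing = read_config_sketch(path)  # file not written yet

    # Write a config with the section/key mentioned for resolve_dbname.
    path.write_text("[planet4_db]\ndbname = /data/p4.parquet\n")
    loaded = read_config_sketch(path)
    dbname = loaded["planet4_db"]["dbname"]
```

Callers therefore need to check for `None` before indexing into the returned object.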


source

get_random_tile


def get_random_tile(
    db:str | None=None, # Path to Parquet database. Resolved from config if *None*.
    tolerance:float=0.5, # Fractional tolerance around the mean (default 0.5 = within 50%).
)->dict: # Keys: ``image_id``, ``image_name``, ``markings``, ``avg``.

Pick a random tile with a near-average marking count.
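One plausible reading of "near-average" is sketched below using only the standard library: count markings per tile, compute the mean, and choose randomly among tiles whose count lies within `avg * (1 ± tolerance)`. The function name, the input shape (one `(image_id, image_name)` pair per marking), and the error on an empty candidate set are assumptions; the real function reads from the Parquet database.

```python
import random
from collections import Counter


def pick_near_average_tile(rows, tolerance=0.5, rng=None):
    """Pick a random tile whose marking count is close to the mean.

    rows: iterable of (image_id, image_name) pairs, one per marking
    (an assumed input shape for this sketch).
    """
    rng = rng or random.Random()
    counts = Counter(rows)  # markings per tile
    if not counts:
        raise ValueError("no markings given")
    avg = sum(counts.values()) / len(counts)
    low, high = avg * (1 - tolerance), avg * (1 + tolerance)
    candidates = [tile for tile, n in counts.items() if low <= n <= high]
    if not candidates:
        raise ValueError("no tile within tolerance of the mean")
    image_id, image_name = rng.choice(candidates)
    return {
        "image_id": image_id,
        "image_name": image_name,
        "markings": counts[(image_id, image_name)],
        "avg": avg,
    }


# Three tiles with 2, 4, and 12 markings: the mean is 6, so with the
# default tolerance only counts between 3 and 9 qualify.
example_rows = (
    [("A", "obs1")] * 2 + [("B", "obs1")] * 4 + [("C", "obs2")] * 12
)
tile = pick_near_average_tile(example_rows, rng=random.Random(0))
```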


source

get_db_stats


def get_db_stats(
    db:str | None=None, # Path to Parquet database. Resolved from config if *None*.
    top:int=10, # Number of top tiles/obsids to include.
)->dict: # Keys: ``n_markings``, ``n_tiles``, ``n_obsids``, ``avg_per_tile``, ``avg_per_obsid``, ``type_counts`` (dict), ``top_tiles`` (DataFrame), ``top_obsids`` (DataFrame).

Compute summary statistics for a raw Planet Four database.
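The shape of the returned dictionary can be illustrated with a small standard-library sketch. It assumes each marking is a record with `image_id` (tile), `image_name` (treated here as the obsid), and `marking` (type) fields; those column names are assumptions. Unlike the real function, the sketch returns plain lists of `(key, count)` pairs for `top_tiles` and `top_obsids` rather than DataFrames.

```python
from collections import Counter


def db_stats_sketch(rows, top=10):
    """Summarise marking records; a stand-in for the DataFrame-based version."""
    rows = list(rows)
    tile_counts = Counter(r["image_id"] for r in rows)    # markings per tile
    obsid_counts = Counter(r["image_name"] for r in rows)  # markings per obsid
    type_counts = Counter(r["marking"] for r in rows)     # markings per type
    n_markings = len(rows)
    return {
        "n_markings": n_markings,
        "n_tiles": len(tile_counts),
        "n_obsids": len(obsid_counts),
        "avg_per_tile": n_markings / len(tile_counts) if tile_counts else 0.0,
        "avg_per_obsid": n_markings / len(obsid_counts) if obsid_counts else 0.0,
        "type_counts": dict(type_counts),
        "top_tiles": tile_counts.most_common(top),
        "top_obsids": obsid_counts.most_common(top),
    }


example_rows = [
    {"image_id": "t1", "image_name": "o1", "marking": "fan"},
    {"image_id": "t1", "image_name": "o1", "marking": "blotch"},
    {"image_id": "t2", "image_name": "o1", "marking": "fan"},
    {"image_id": "t3", "image_name": "o2", "marking": "fan"},
]
stats = db_stats_sketch(example_rows, top=2)
```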


source

resolve_dbname


def resolve_dbname(
    db:str | None=None, # Explicit path to the Parquet database. If *None*, the path is read from ``[planet4_db] dbname`` in the config file.
)->str: # Resolved database path.

Resolve the database path from an explicit argument or the config file.
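The resolution order (explicit argument first, then `[planet4_db] dbname` from the config) might look roughly like the sketch below. The function name, the explicit `config` parameter, and the `ValueError` raised when neither source yields a path are assumptions made to keep the example self-contained.

```python
from __future__ import annotations

from configparser import ConfigParser


def resolve_dbname_sketch(db: str | None, config: ConfigParser | None) -> str:
    """Return the explicit path if given, else the configured one."""
    if db is not None:
        return str(db)
    # Fall back to the config entry named in the parameter description.
    if config is None or not config.has_option("planet4_db", "dbname"):
        # Assumed failure mode; the real function may behave differently.
        raise ValueError("no database path given and none found in config")
    return config.get("planet4_db", "dbname")


example_config = ConfigParser()
example_config.read_string("[planet4_db]\ndbname = /data/p4.parquet\n")

from_config = resolve_dbname_sketch(None, example_config)
explicit = resolve_dbname_sketch("other.parquet", example_config)
```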


source

check_and_pad_id


def check_and_pad_id(
    imgid, # The ID of the individual image.
)->str | None: # The padded image ID if it was provided, otherwise None.

Check the image ID and pad it if necessary.
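As an illustration of the padding step, the sketch below left-fills a short ID from a fixed template. The template `APF0000000` (the prefix `APF` followed by seven characters) is an assumption about the ID format and is not stated on this page; `pad_image_id_sketch` is a hypothetical name.

```python
from __future__ import annotations

ID_TEMPLATE = "APF0000000"  # assumed ID format, not confirmed by this page


def pad_image_id_sketch(imgid: str | None) -> str | None:
    """Left-fill a short image ID from the template; pass None through."""
    if imgid is None:
        return None
    if len(imgid) < len(ID_TEMPLATE):
        # Keep the leading part of the template that the short ID lacks.
        imgid = ID_TEMPLATE[: len(ID_TEMPLATE) - len(imgid)] + imgid
    return imgid


short = pad_image_id_sketch("pbr")
full = pad_image_id_sketch("APF0000pbr")
nothing = pad_image_id_sketch(None)
```

Under this assumption a short ID such as `pbr` and its full form map to the same string, so either can be used for lookups.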


source

PathManager


def PathManager(
    id_:str='', datapath:str='clustering', suffix:str='.csv', obsid:str='', cut:float=0.5, extra_path:str=''
):

Manage file paths and folders related to the analysis pipeline.

Level definitions:

* L0 : Raw output of Planet Four
* L1A : Clustering of Blotches and Fans on their own
* L1B : Clustered blotches and fans combined into final fans, final blotches, and fnotches that need to have a cut applied for the decision between fans or blotches.
* L1C : Derived database where a cut has been applied for fnotches to become either fan or blotch.
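The step from L1B to L1C can be pictured as a simple threshold on each fnotch. The sketch below assumes a fnotch carries a score between 0 and 1 where higher values favour the fan interpretation, and that scores above `cut` become fans; both the score semantics and the direction of the comparison are assumptions, not taken from this page.

```python
def apply_fnotch_cut(fnotch_value: float, cut: float = 0.5) -> str:
    """Classify one fnotch as 'fan' or 'blotch' by thresholding its score.

    Assumes higher scores mean 'more fan-like' (hypothetical semantics).
    """
    return "fan" if fnotch_value > cut else "blotch"


# The same fnotches under two different cuts.
scores = [0.2, 0.55, 0.9]
default_cut = [apply_fnotch_cut(s) for s in scores]
strict_cut = [apply_fnotch_cut(s, cut=0.8) for s in scores]
```

The `cut` parameter of `PathManager` (default 0.5) presumably selects which such derived L1C product the paths refer to.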


source

DBManager


def DBManager(
    dbname:NoneType=None, # Filename of database file to use.
    obsid:NoneType=None
):

Access class for database activities.

Provides easy access to frequently used data items.