# Catalog Production


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L44"
target="_blank" style="float:right; font-size:smaller">source</a>

### execute_in_parallel

``` python

def execute_in_parallel(
    func:Callable, # The function to be executed for each element
    iterable:Iterable, # The iterable over which to execute the function
    max_workers:int | None=None, # Number of parallel workers. Defaults to ProcessPoolExecutor default.
    description:str='Processing', # Label for the tqdm progress bar.
): # Successful results in submission order (skipping failures).

```

*Execute a function in parallel over an iterable with per-item error
handling.*

Unlike `pool.map`, individual failures do not abort the entire batch.
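The per-item error handling can be sketched as follows. This is a simplified stand-in, not the actual p4tools implementation: it uses a `ThreadPoolExecutor` for a self-contained demo, whereas the real function defaults to a `ProcessPoolExecutor`, and it omits the tqdm progress bar.

``` python
from concurrent.futures import ThreadPoolExecutor

def execute_in_parallel_sketch(func, iterable, max_workers=None):
    """Collect results in submission order, skipping items that raise."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, item) for item in iterable]
        for future in futures:  # iterate in submission order
            try:
                results.append(future.result())
            except Exception as exc:
                print(f"Item failed, skipping: {exc!r}")
    return results

def reciprocal(x):
    return 1 / x

# 1/0 raises ZeroDivisionError; only that result is dropped
print(execute_in_parallel_sketch(reciprocal, [1, 2, 0, 4]))  # [1.0, 0.5, 0.25]
```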

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L105"
target="_blank" style="float:right; font-size:smaller">source</a>

### blotch_id_generator

``` python

def blotch_id_generator(
    
)->Generator:

```

*Generator for blotch marking IDs (prefix ‘B’).*

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L100"
target="_blank" style="float:right; font-size:smaller">source</a>

### fan_id_generator

``` python

def fan_id_generator(
    
)->Generator:

```

*Generator for fan marking IDs (prefix ‘F’).*

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L83"
target="_blank" style="float:right; font-size:smaller">source</a>

### marking_id_generator

``` python

def marking_id_generator(
    prefix:str, # Single-character prefix, e.g. "F" for fans or "B" for blotches.
)->Generator:

```

*Generator for unique marking IDs with the given prefix.*
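A minimal sketch of such a generator, assuming a zero-padded counter scheme; the exact ID format produced by p4tools may differ:

``` python
import itertools

def marking_id_generator_sketch(prefix):
    # Hypothetical ID scheme: prefix + zero-padded running counter.
    for counter in itertools.count(1):
        yield f"{prefix}{counter:06d}"

fan_ids = marking_id_generator_sketch("F")
print(next(fan_ids), next(fan_ids))  # F000001 F000002
```

`fan_id_generator` and `blotch_id_generator` above are then simply this generator with the prefixes "F" and "B" pre-applied.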

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/clis.py#L338"
target="_blank" style="float:right; font-size:smaller">source</a>

### cluster_obsid

``` python

def cluster_obsid(
    obsid:NoneType=None, # HiRISE obsid (= Planet Four image_name)
    savedir:NoneType=None, # Top directory path where the catalog will be stored. Created if it does not exist yet.
    imgid:NoneType=None, # Convenience parameter: if `obsid` is None, this `image_id` is used to look up the respective `obsid` via the TileID class.
    dbname:NoneType=None, # Path to the database file.
):

```

*Cluster all image_ids for a given obsid (= image_name).*

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L203"
target="_blank" style="float:right; font-size:smaller">source</a>

### cluster_obsid_parallel

``` python

def cluster_obsid_parallel(
    obsids:list, # List of obsids to cluster
    savedir:str, # Path to the directory where clustering results will be saved
    dbname:str, # The database name
): # Successful results.

```

*Apply the clustering algorithm to multiple obsids in parallel.*

Individual failures are logged and skipped instead of aborting the
batch.

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L179"
target="_blank" style="float:right; font-size:smaller">source</a>

### fnotch_obsid_parallel

``` python

def fnotch_obsid_parallel(
    obsids:list, # List of obsids to fnotch
    savedir:str, # Path to the directory where results will be saved
): # Successful results.

```

*Apply fnotching to multiple obsids in parallel.*

Individual failures are logged and skipped instead of aborting the
batch.

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L144"
target="_blank" style="float:right; font-size:smaller">source</a>

### fnotch_obsid

``` python

def fnotch_obsid(
    obsid:NoneType=None, # The observation ID to be processed.
    savedir:NoneType=None, # The directory where the results will be saved.
    fnotch_via_obsid:bool=False, # If True, fnotching is done per observation ID (obsid); if False, per image ID.
    imgid:NoneType=None, # The image ID to be processed. This parameter is currently not used in the function.
): # The observation ID that was processed.

```

*Perform fnotching on HiRISE images based on observation ID or image
ID.*

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L248"
target="_blank" style="float:right; font-size:smaller">source</a>

### add_marking_ids

``` python

def add_marking_ids(
    path, # Path to L1A image_id clustering result directory
    fan_id, # Generator for fan marking IDs
    blotch_id, # Generator for blotch marking IDs
):

```

*Add marking_ids for catalog to cluster results.*

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L229"
target="_blank" style="float:right; font-size:smaller">source</a>

### get_L1A_paths

``` python

def get_L1A_paths(
    obsid, # HiRISE observation ID
    datapath, # Top-level catalog/clustering directory
): # L1A directories for each tile within the obsid

```

*Return all L1A result directories for an obsid.*
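The idea can be sketched with `pathlib`; the directory layout assumed here (one per-tile subdirectory under `<datapath>/<obsid>/`) and all names in the demo are hypothetical, and the real p4tools layout may differ:

``` python
import tempfile
from pathlib import Path

def get_L1A_paths_sketch(obsid, datapath):
    # Assumed layout: <datapath>/<obsid>/<tile>/ for each tile's L1A results.
    return sorted(p for p in Path(datapath, obsid).glob("*") if p.is_dir())

# Demo with a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    for tile in ("tile1", "tile2"):
        Path(tmp, "some_obsid", tile).mkdir(parents=True)
    paths = get_L1A_paths_sketch("some_obsid", tmp)
    print([p.name for p in paths])  # ['tile1', 'tile2']
```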

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L273"
target="_blank" style="float:right; font-size:smaller">source</a>

### create_roi_file

``` python

def create_roi_file(
    obsids, # List of HiRISE obsids
    roi_name, # Name for ROI
    datapath, # Path to the top folder with the clustering output data.
):

```

*Create a Region of Interest (ROI) file based on a list of obsids.*

For more structured analysis, we can create a summary file for a list of
obsids belonging to a ROI. The alternative is to determine which ROI
each final object belongs to and add that as a column in the final
catalog.
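A toy sketch of the ROI-summary idea, writing one row per obsid tagged with the ROI name; the real function aggregates the clustering output instead, and the file layout shown here is purely illustrative:

``` python
import csv
import tempfile
from pathlib import Path

def create_roi_file_sketch(obsids, roi_name, datapath):
    # Hypothetical output: a CSV with one row per obsid, tagged by ROI name.
    out = Path(datapath) / f"{roi_name}_obsids.csv"
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["roi_name", "obsid"])
        for obsid in obsids:
            writer.writerow([roi_name, obsid])
    return out

with tempfile.TemporaryDirectory() as tmp:
    path = create_roi_file_sketch(["obsid_a", "obsid_b"], "test_roi", tmp)
    print(path.read_text())
```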

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L327"
target="_blank" style="float:right; font-size:smaller">source</a>

### ReleaseManager

``` python

def ReleaseManager(
    version, obsids:NoneType=None, overwrite:bool=False, dbname:NoneType=None
):

```

*Class to manage releases and find relevant files.*

Parameters:

- `version` (str): Version string for this catalog. Same as `datapath`
  in other P4 code.
- `obsids` (iterable, optional): Iterable of obsids to use for the
  catalog file. Default is to use the full list of the default database,
  which is Seasons 2 and 3 at this point.
- `overwrite` (bool, optional): Switch to control whether already
  existing result folders for an obsid are overwritten. Default: False.

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/catalog.py#L1112"
target="_blank" style="float:right; font-size:smaller">source</a>

### read_csvfiles_into_lists_of_frames

``` python

def read_csvfiles_into_lists_of_frames(
    folders
):

```

*Read CSV files from the given folders into lists of DataFrames.*

This function iterates over a list of folders, reads the CSV files
within them, and categorizes the results into two lists, 'fan' and
'blotch', based on whether the filename ends with 'fans.csv' or
'blotch.csv'.

Parameters:

- `folders` (list of pathlib.Path): A list of folder paths to search
  for CSV files.

Returns:

- dict: A dictionary with two keys, 'fan' and 'blotch', each containing
  a list of pandas DataFrames read from the CSV files.
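The suffix-based categorization can be sketched as below. To keep the demo dependency-free it collects file paths rather than pandas DataFrames, which is where it differs from the real function:

``` python
import tempfile
from pathlib import Path

def categorize_csv_paths(folders):
    # Sort CSV paths into 'fan' and 'blotch' buckets by filename suffix.
    # The real function additionally reads each file into a DataFrame.
    buckets = {"fan": [], "blotch": []}
    for folder in folders:
        for path in sorted(Path(folder).glob("*.csv")):
            if path.name.endswith("fans.csv"):
                buckets["fan"].append(path)
            elif path.name.endswith("blotch.csv"):
                buckets["blotch"].append(path)
    return buckets

with tempfile.TemporaryDirectory() as tmp:
    for name in ("tile1_fans.csv", "tile1_blotch.csv"):
        Path(tmp, name).write_text("x\n1\n")
    buckets = categorize_csv_paths([tmp])
    print({k: [p.name for p in v] for k, v in buckets.items()})
```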

To create the catalog from the Planet4 raw dataset, we use the pipeline
in the following simple way.

``` python
from pathlib import Path

# `io` and `LOGGER` are assumed to be imported from p4tools
newpath = Path("../../../../Data/Downselected.parq")  # point this at the dataset you wish to catalog
db = io.DBManager(newpath)

# Set the logger to level 10 (DEBUG) so that all INFO messages are printed
LOGGER.setLevel(10)
```

``` python
# The ReleaseManager is our main object; it stores important data such as the
# path to the database as well as the name of the folder where results are stored.
rm = ReleaseManager("p4tools_test", dbname=db.dbname, overwrite=False)

# Check which images still need to be processed and have not been analysed yet.
rm.check_for_todo()
rm.todo
```

``` python
# Finally, launch the production pipeline
rm.launch_catalog_production()
```
