Tile-activity classification

Classifies each HiRISE observation's distribution of per-tile marking counts as bimodal (two families of tiles, busy and not busy) or unimodal (uniformly busy or uniformly quiet). Each tile receives a busy=True/False label, and surface coverage statistics are computed on the busy tiles only.

ClassifyConfig


def ClassifyConfig(
    delta_bic:float=10.0, min_tiles:int=20, frac_zero_high:float=0.85, frac_zero_low:float=0.15, random_state:int=42,
    global_threshold:float | None=None
)->None:

Decision thresholds for the per-obsid classifier.
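Overriding individual thresholds while keeping the other defaults can be sketched as follows. This assumes ClassifyConfig behaves like a standard dataclass, which the signature suggests but does not confirm. The class `ClassifyConfigSketch` below is a hypothetical stand-in that copies only the documented field names and defaults; it is not the real class.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ClassifyConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    delta_bic: float = 10.0
    min_tiles: int = 20
    frac_zero_high: float = 0.85
    frac_zero_low: float = 0.15
    random_state: int = 42
    global_threshold: Optional[float] = None


default = ClassifyConfigSketch()

# Derive a variant without mutating the default instance.
strict = replace(default, delta_bic=20.0, min_tiles=30)

print(default)
print(strict)
```

Deriving a variant with `dataclasses.replace` keeps the untouched thresholds at their documented defaults, which makes it easy to compare runs that differ in one setting.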

Pattern


def Pattern(
    *args, **kwds
):

Classification of an obsid’s tile-count distribution.

Data loading

Defaults to the public v3.1 fan/blotch parquets via p4tools.io. The per-tile coverage CSV is external and supplied as an explicit path.

load_v3p1_data


def load_v3p1_data(
    coverage_csv:str | pathlib.Path, # Path to a CSV with columns ``[obsid, tile_id, Coverage]`` (e.g. ``FnotchCoverage_Full_v3.1.csv``).
    version:str='v3.1', # Catalog version forwarded to ``p4tools.io.get_*_catalog``.
    fan:pandas.DataFrame | None=None, blotch:pandas.DataFrame | None=None
)->P4Data:

Load the v3.1 fan, blotch, and per-tile coverage frames.

P4Data


def P4Data(
    fan:DataFrame, blotch:DataFrame, coverage:DataFrame
)->None:

Container for the three v3.1 dataframes used by this module.

Per-tile marking counts

count_markings_per_tile


def count_markings_per_tile(
    data:P4Data
)->DataFrame:

One row per (obsid, tile_id) with total marking counts.

Tiles present in data.coverage but absent from the catalogs appear with n_markings = 0. Returned columns: [obsid, tile_id, n_fans, n_blotches, n_markings, Coverage].
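The zero-fill behaviour described above can be sketched with plain pandas. This is an illustration under assumptions, not the module's implementation: it assumes the fan and blotch frames hold one row per marking with `obsid` and `tile_id` columns, and the helper name `count_per_tile_sketch` is hypothetical.

```python
import pandas as pd


def count_per_tile_sketch(fan, blotch, coverage):
    """Hypothetical helper: count catalog rows per (obsid, tile_id)."""
    keys = ["obsid", "tile_id"]
    n_fans = fan.groupby(keys).size().rename("n_fans").reset_index()
    n_blotches = blotch.groupby(keys).size().rename("n_blotches").reset_index()
    # Left-join onto coverage so tiles without catalog entries are kept.
    out = coverage.merge(n_fans, on=keys, how="left").merge(
        n_blotches, on=keys, how="left"
    )
    # Tiles missing from a catalog get a count of zero instead of NaN.
    cols = ["n_fans", "n_blotches"]
    out[cols] = out[cols].fillna(0).astype(int)
    out["n_markings"] = out["n_fans"] + out["n_blotches"]
    return out[keys + ["n_fans", "n_blotches", "n_markings", "Coverage"]]


# Toy data: tile t3 has coverage information but no markings at all.
fan = pd.DataFrame({"obsid": ["A", "A", "A"], "tile_id": ["t1", "t1", "t2"]})
blotch = pd.DataFrame({"obsid": ["A"], "tile_id": ["t1"]})
coverage = pd.DataFrame(
    {
        "obsid": ["A", "A", "A"],
        "tile_id": ["t1", "t2", "t3"],
        "Coverage": [0.4, 0.1, 0.0],
    }
)
counts = count_per_tile_sketch(fan, blotch, coverage)
print(counts)
```

Joining from the coverage frame outward is what keeps empty tiles in the result; starting from the catalogs would silently drop them and bias the fraction of zero-count tiles.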

Per-obsid classification

classify_obsid


def classify_obsid(
    counts:ndarray, obsid:str='',
    config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None),
    global_threshold:float | None=None
)->ObsidClassification:

Classify one obsid’s tile-count distribution.

global_threshold overrides config.global_threshold for the forced-unimodal small-obsid path.
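The field names bic1, bic2, cluster_means and the delta_bic threshold suggest a BIC comparison between a one-component and a two-component fit. A minimal sketch of that idea follows, assuming Gaussian components fitted to the raw counts; neither assumption is confirmed by this page. The hand-rolled EM loop stands in for whatever fitting routine the module uses, all function names are hypothetical, and the role of frac_zero_high and frac_zero_low is not modelled.

```python
import numpy as np


def bic_one_gaussian(x):
    """BIC of a single Gaussian fitted by maximum likelihood (2 parameters)."""
    mu, var = x.mean(), x.var() + 1e-6
    loglik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return 2 * np.log(x.size) - 2 * loglik


def _weighted_density(x, w, mu, var):
    """Weighted Gaussian density per point and component, shape (n, 2)."""
    z = (x[:, None] - mu) ** 2 / var
    return w * np.exp(-0.5 * z) / np.sqrt(2 * np.pi * var)


def bic_two_gaussians(x, n_iter=200):
    """BIC and means of a two-Gaussian mixture fitted by EM (5 parameters)."""
    mu = np.array([x.min(), x.max()], dtype=float)  # start at the extremes
    var = np.full(2, x.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each tile.
        dens = _weighted_density(x, w, mu, var)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: update weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    total = np.maximum(_weighted_density(x, w, mu, var).sum(axis=1), 1e-300)
    bic = 5 * np.log(x.size) - 2 * np.log(total).sum()
    return bic, (float(mu[0]), float(mu[1]))


# Synthetic obsid: 60 quiet tiles and 40 busy tiles.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(1, 60), rng.poisson(50, 40)]).astype(float)

delta_bic = 10.0  # documented ClassifyConfig default
bic1 = bic_one_gaussian(counts)
bic2, cluster_means = bic_two_gaussians(counts)
is_bimodal = (bic1 - bic2) > delta_bic
frac_zero = float(np.mean(counts == 0))
print(is_bimodal, cluster_means, frac_zero)
```

Requiring the two-component BIC to beat the one-component BIC by a margin, rather than by any amount, guards against calling a distribution bimodal on the strength of a marginal improvement in fit.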

ObsidClassification


def ObsidClassification(
    obsid:str, pattern:Pattern, n_tiles:int, frac_zero:float, mean_count:float, bic1:float, bic2:float,
    cluster_means:tuple[float, float] | None, busy_mask:ndarray
)->None:

Result of classifying a single HiRISE observation’s tile distribution.

All-obsid classification

Classification runs in two passes. The first pass classifies every obsid and collects the split-points of the bimodal ones to learn a global threshold T. The second pass re-classifies only the small obsids that were forced to unimodal, using T.
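The two passes can be sketched as below. Several details are assumptions made for illustration only: the split-point is taken as the midpoint of the two cluster means, T as the median of the split-points, and a small obsid is called uniformly busy when its mean count reaches T. The page does not state how the module defines any of these, and the dictionary layout is hypothetical.

```python
import numpy as np

min_tiles = 20  # documented ClassifyConfig default

# Hypothetical first-pass results: obsid -> (n_tiles, cluster_means or None).
first_pass = {
    "obs_a": (120, (1.0, 41.0)),
    "obs_b": (200, (2.0, 58.0)),
    "obs_c": (90, (0.5, 49.5)),
    "obs_small": (8, None),  # fewer than min_tiles, so forced unimodal
}

# Pass 1: collect one split-point per bimodal obsid.
# Assumption: the split-point is the midpoint of the two cluster means.
split_points = [
    0.5 * (means[0] + means[1])
    for _, means in first_pass.values()
    if means is not None
]

# Assumption: the global threshold T is the median of the split-points.
T = float(np.median(split_points))

# Pass 2: revisit only the small, forced-unimodal obsids.
small_counts = np.array([0, 3, 27, 60, 1, 0, 35, 2], dtype=float)
is_busy = bool(small_counts.mean() >= T)
busy_mask = np.full(small_counts.size, is_busy)
print(T, is_busy)
```

Borrowing the threshold from the well-populated obsids gives the small ones a reference scale that their own handful of tiles could not supply.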

classify_all


def classify_all(
    tile_counts:DataFrame,
    config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None)
)->tuple: # ``[obsid, tile_id, n_fans, n_blotches, n_markings, Coverage, pattern, busy]``.

Classify every obsid and label every tile.

Coverage on busy tiles

Vectorised aggregation across the cap (no Python-level loop over obsids).

summary


def summary(
    labeled:DataFrame, params:dict
)->dict:

Cap-wide one-line summary suitable for printing or logging.

coverage_on_busy


def coverage_on_busy(
    labeled:DataFrame
)->DataFrame:

Per-obsid coverage statistics, split by busy/quiet.
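A vectorised split of this kind can be sketched with pandas masking and groupby. The input columns `obsid`, `busy` and `Coverage` follow the labeled frame documented on this page; the output column names and the choice of statistics (tile count and mean coverage) are assumptions for illustration, not the function's actual output.

```python
import pandas as pd

# Toy labeled frame: obsid A has busy tiles, obsid B has none.
labeled = pd.DataFrame(
    {
        "obsid": ["A", "A", "A", "B", "B"],
        "busy": [True, True, False, False, False],
        "Coverage": [0.5, 0.3, 0.0, 0.1, 0.0],
    }
)

# Mask coverage so each series keeps values only for its own tile class.
busy_cov = labeled["Coverage"].where(labeled["busy"])
quiet_cov = labeled["Coverage"].where(~labeled["busy"])

# One grouped reduction per statistic; no Python loop over obsids.
per_obsid = pd.DataFrame(
    {
        "n_busy": labeled["busy"].groupby(labeled["obsid"]).sum(),
        "mean_cov_busy": busy_cov.groupby(labeled["obsid"]).mean(),
        "mean_cov_quiet": quiet_cov.groupby(labeled["obsid"]).mean(),
    }
)
print(per_obsid)
```

An obsid with no busy tiles gets NaN rather than zero for its busy-tile mean, which keeps "no busy tiles" distinguishable from "busy tiles with zero coverage".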

CLI

End-to-end pipeline writing per-tile labels and per-obsid summary CSVs.

main


def main(
    
)->None: # pragma: no cover

CLI entry; prints the cap-wide summary.

run_pipeline


def run_pipeline(
    coverage_csv:str | pathlib.Path, out_dir:str | pathlib.Path='outputs',
    config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None),
    version:str='v3.1'
)->dict:

Run the full classification pipeline and write CSVs to out_dir.

Manual entry point so python -m p4tools.classify_by_activity --coverage-csv ... works: