Tile-activity classification
Assigns busy=True/False labels to tiles and computes surface coverage statistics restricted to busy tiles.
ClassifyConfig
def ClassifyConfig(
delta_bic:float=10.0, min_tiles:int=20, frac_zero_high:float=0.85, frac_zero_low:float=0.15, random_state:int=42,
global_threshold:float | None=None
)->None:
Decision thresholds for the per-obsid classifier.
Pattern
def Pattern(
*args, **kwds
):
Classification of an obsid’s tile-count distribution.
Data loading
Defaults to the public v3.1 fan/blotch parquets via p4tools.io. The per-tile coverage CSV is external and supplied as an explicit path.
load_v3p1_data
def load_v3p1_data(
coverage_csv:str | pathlib.Path, # Path to a CSV with columns ``[obsid, tile_id, Coverage]`` (e.g. ``FnotchCoverage_Full_v3.1.csv``).
version:str='v3.1', # Catalog version forwarded to ``p4tools.io.get_*_catalog``.
fan:pandas.DataFrame | None=None, blotch:pandas.DataFrame | None=None
)->P4Data:
Load the v3.1 fan, blotch, and per-tile coverage frames.
P4Data
def P4Data(
fan:DataFrame, blotch:DataFrame, coverage:DataFrame
)->None:
Container for the three v3.1 dataframes used by this module.
Per-tile marking counts
count_markings_per_tile
def count_markings_per_tile(
data:P4Data
)->DataFrame:
One row per (obsid, tile_id) with total marking counts.
Tiles present in data.coverage but absent from the catalogs appear with n_markings = 0. Returned columns: [obsid, tile_id, n_fans, n_blotches, n_markings, Coverage].
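The zero-filling behaviour described above can be illustrated with a left join onto the coverage frame. This is a minimal sketch, not the library's implementation: count_markings_sketch is a hypothetical stand-in, and it assumes the fan and blotch catalogs each carry obsid and tile_id columns with one row per marking.

```python
import pandas as pd


def count_markings_sketch(fan: pd.DataFrame, blotch: pd.DataFrame,
                          coverage: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for count_markings_per_tile (illustration only)."""
    keys = ["obsid", "tile_id"]
    # Count catalog rows per (obsid, tile_id) for each marking type.
    n_fans = fan.groupby(keys).size().rename("n_fans").reset_index()
    n_blotches = blotch.groupby(keys).size().rename("n_blotches").reset_index()
    # Left-join onto coverage so tiles without any markings are kept.
    out = coverage.merge(n_fans, on=keys, how="left")
    out = out.merge(n_blotches, on=keys, how="left")
    # Tiles missing from a catalog get a count of zero instead of NaN.
    out[["n_fans", "n_blotches"]] = out[["n_fans", "n_blotches"]].fillna(0).astype(int)
    out["n_markings"] = out["n_fans"] + out["n_blotches"]
    return out[["obsid", "tile_id", "n_fans", "n_blotches", "n_markings", "Coverage"]]


# Toy data: tile t3 has coverage but no markings in either catalog.
coverage = pd.DataFrame({"obsid": ["A", "A", "A"],
                         "tile_id": ["t1", "t2", "t3"],
                         "Coverage": [0.9, 0.8, 0.7]})
fan = pd.DataFrame({"obsid": ["A", "A", "A"], "tile_id": ["t1", "t1", "t2"]})
blotch = pd.DataFrame({"obsid": ["A"], "tile_id": ["t1"]})

counts = count_markings_sketch(fan, blotch, coverage)
print(counts)
```

The left join is what keeps t3 in the result; an inner join would silently drop every tile without markings and bias any zero-fraction statistic computed later.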
Per-obsid classification
classify_obsid
def classify_obsid(
counts:ndarray, obsid:str='',
config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None),
global_threshold:float | None=None
)->ObsidClassification:
Classify one obsid’s tile-count distribution.
global_threshold overrides config.global_threshold for the forced-unimodal small-obsid path.
ObsidClassification
def ObsidClassification(
obsid:str, pattern:Pattern, n_tiles:int, frac_zero:float, mean_count:float, bic1:float, bic2:float,
cluster_means:tuple[float, float] | None, busy_mask:ndarray
)->None:
Result of classifying a single HiRISE observation’s tile distribution.
All-obsid classification
First pass: classify every obsid; gather bimodal split-points to learn the global threshold T. Second pass: only the small-obsid forced-unimodal entries are re-classified with T.
classify_all
def classify_all(
tile_counts:DataFrame,
config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None)
)->tuple: # ``[obsid, tile_id, n_fans, n_blotches, n_markings, Coverage, pattern, busy]``.
Classify every obsid and label every tile.
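The two-pass scheme described for this section can be sketched in plain Python. Everything below is an assumption made for illustration: first_pass is a toy stand-in for classify_obsid that treats every sufficiently large obsid as bimodal and splits at the largest gap, the pattern strings are hypothetical names, and taking the median of the split-points as T is a guess at how the global threshold might be learned. The source does not specify any of these details.

```python
from statistics import median

MIN_TILES = 20  # mirrors the ClassifyConfig.min_tiles default


def first_pass(counts):
    """Toy per-obsid classifier: returns (pattern, split_point or None)."""
    if len(counts) < MIN_TILES:
        # Too few tiles to fit anything: forced unimodal, no split-point yet.
        return "forced_unimodal", None
    ordered = sorted(counts)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    i = gaps.index(max(gaps))
    # Split halfway across the largest gap in the sorted counts.
    return "bimodal", (ordered[i] + ordered[i + 1]) / 2


def classify_all_sketch(obsid_counts):
    """Pass 1 gathers split-points; pass 2 applies T to small obsids only."""
    first = {obsid: first_pass(c) for obsid, c in obsid_counts.items()}
    splits = [split for _, split in first.values() if split is not None]
    T = median(splits) if splits else None  # assumed aggregation rule
    busy = {}
    for obsid, counts in obsid_counts.items():
        _, split = first[obsid]
        threshold = split if split is not None else T
        busy[obsid] = [threshold is not None and n > threshold for n in counts]
    return T, busy


obsid_counts = {
    "big_1": [0] * 10 + [10] * 10,   # split-point 5.0
    "big_2": [1] * 10 + [21] * 10,   # split-point 11.0
    "small": [0, 9, 2],              # fewer than MIN_TILES tiles
}
T, busy = classify_all_sketch(obsid_counts)
print(T, busy["small"])
```

Only the small obsid depends on T; the large obsids keep the labels from their own split-points, which matches the statement that the second pass re-classifies only the forced-unimodal entries.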
Coverage on busy tiles
Vectorised aggregation across the cap (no Python-level loop over obsids).
summary
def summary(
labeled:DataFrame, params:dict
)->dict:
Cap-wide one-line summary suitable for printing or logging.
coverage_on_busy
def coverage_on_busy(
labeled:DataFrame
)->DataFrame:
Per-obsid coverage statistics, split by busy/quiet.
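A vectorised busy/quiet split of this kind can be expressed as a single pandas groupby. The sketch below is an illustration under assumptions: the output column names n_tiles and mean_coverage are hypothetical, and the statistics actually returned by coverage_on_busy may differ.

```python
import pandas as pd

# Toy labeled frame with the columns the aggregation needs.
labeled = pd.DataFrame({
    "obsid": ["A", "A", "A", "B", "B"],
    "busy": [True, False, True, False, False],
    "Coverage": [1.0, 0.5, 0.5, 0.25, 0.75],
})

# One groupby over (obsid, busy): no Python-level loop over obsids.
stats = (
    labeled.groupby(["obsid", "busy"])["Coverage"]
    .agg(n_tiles="count", mean_coverage="mean")
    .reset_index()
)
print(stats)
```

Obsids with no busy tiles (B in the toy data) simply have no busy=True row in this long format; a caller who needs one row per obsid would have to pivot and decide how to fill the missing side.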
CLI
End-to-end pipeline writing per-tile labels and per-obsid summary CSVs.
main
def main(
)->None: # pragma: no cover
CLI entry; prints the cap-wide summary.
run_pipeline
def run_pipeline(
coverage_csv:str | pathlib.Path, out_dir:str | pathlib.Path='outputs',
config:ClassifyConfig=ClassifyConfig(delta_bic=10.0, min_tiles=20, frac_zero_high=0.85, frac_zero_low=0.15, random_state=42, global_threshold=None),
version:str='v3.1'
)->dict:
Run the full classification pipeline and write CSVs to out_dir.
A manual entry point is provided so that python -m p4tools.classify_by_activity --coverage-csv ... works.
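For orientation, an argparse-based parser for such an entry point might look like the following. This is a hedged sketch: only the --coverage-csv flag appears in the documentation above, build_parser is a hypothetical helper, and the real main may define further options or parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only --coverage-csv is confirmed by the docs."""
    parser = argparse.ArgumentParser(prog="p4tools.classify_by_activity")
    parser.add_argument(
        "--coverage-csv",
        required=True,
        help="Path to the per-tile coverage CSV.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(["--coverage-csv", "FnotchCoverage_Full_v3.1.csv"])
print(args.coverage_csv)
```

In a module run with python -m, the parsed path would then be handed to run_pipeline as its coverage_csv argument.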