Pre-clustering classification cleaning
NaN sweep — drop incomplete markings
filter_nan_required
def filter_nan_required(
df:DataFrame
)->DataFrame:
Drop fans/blotches missing any required column.
Required columns: fans need x, y, distance, angle and spread; blotches need x, y, radius_1 and radius_2. Rows whose marking is neither fan nor blotch pass through untouched.
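The rule above can be sketched in a few lines of pandas. This is an illustrative re-implementation, not the p4tools source: the function name `filter_nan_required_sketch`, the `REQUIRED` mapping and the demo frame are made up for the example, and it assumes the marking type lives in a column called `marking`.

```python
import numpy as np
import pandas as pd

# Required columns per marking type, as listed in the text above.
REQUIRED = {
    "fan": ["x", "y", "distance", "angle", "spread"],
    "blotch": ["x", "y", "radius_1", "radius_2"],
}


def filter_nan_required_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: drop fans/blotches with a NaN in a required column."""
    drop = pd.Series(False, index=df.index)
    for marking, cols in REQUIRED.items():
        is_kind = df["marking"] == marking
        # Only rows of this marking type are checked against its column list.
        drop |= is_kind & df[cols].isna().any(axis=1)
    return df[~drop]


demo = pd.DataFrame({
    "marking": ["fan", "fan", "blotch", "blotch", "none"],
    "x": [1.0, 2.0, 3.0, 4.0, np.nan],
    "y": [1.0, 2.0, 3.0, 4.0, np.nan],
    "distance": [10.0, np.nan, np.nan, np.nan, np.nan],
    "angle": [45.0, 30.0, 10.0, 20.0, np.nan],
    "spread": [5.0, 5.0, np.nan, np.nan, np.nan],
    "radius_1": [np.nan, np.nan, 8.0, np.nan, np.nan],
    "radius_2": [np.nan, np.nan, 6.0, 6.0, np.nan],
})
kept = filter_nan_required_sketch(demo)
```

In the demo, the second fan (missing distance) and the second blotch (missing radius_1) are dropped, while the all-NaN `none` row survives because neither column list applies to it.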
Default-marking filter (Zooniverse v1 only)
filter_default_markings
def filter_default_markings(
df:DataFrame
)->DataFrame:
Drop the legacy Zooniverse-v1 auto-spawned default markings.
Verbatim port of legacy planet4.reduction.filter_data steps 2-5:
- Origin-pinned default fan: |x|<eps & |y|<eps & |angle|<eps & distance~10.
- Second-default fan: |angle|~90 & spread~2.017450 & distance~10.
- Origin-pinned 10x10 default ellipse blotch: |x|<eps & |y|<eps & r1~10 & r2~10.
- Origin-pinned none row.
Empirically a no-op for Panoptes-12978 data (the new UI does not auto-spawn these defaults on click-without-drag); kept available for legacy reprocessing.
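A minimal sketch of the four masks listed above, written from the description rather than from the legacy code. Several things here are assumptions: the values of `EPS` and `ATOL` (the text gives neither eps nor the tolerance behind "~"), the reading of "origin-pinned none row" as a `none` marking with x and y at the origin, and the restriction of each mask to its own marking type. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

EPS = 1e-6   # assumed value for "eps"; not stated in the text
ATOL = 1e-3  # assumed tolerance for the "~" comparisons


def filter_default_markings_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the four default-marking masks."""

    def near(values: pd.Series, target: float) -> pd.Series:
        # NaN never compares close, so rows lacking the column are kept.
        close = np.isclose(values.to_numpy(dtype=float), target, atol=ATOL)
        return pd.Series(close, index=df.index)

    at_origin = (df["x"].abs() < EPS) & (df["y"].abs() < EPS)
    is_fan = df["marking"] == "fan"
    is_blotch = df["marking"] == "blotch"
    is_none = df["marking"] == "none"

    default_fan = (
        is_fan & at_origin & (df["angle"].abs() < EPS) & near(df["distance"], 10.0)
    )
    second_default_fan = (
        is_fan
        & near(df["angle"].abs(), 90.0)
        & near(df["spread"], 2.017450)
        & near(df["distance"], 10.0)
    )
    default_blotch = (
        is_blotch & at_origin & near(df["radius_1"], 10.0) & near(df["radius_2"], 10.0)
    )
    origin_none = is_none & at_origin

    return df[~(default_fan | second_default_fan | default_blotch | origin_none)]


nan = np.nan
demo = pd.DataFrame({
    "marking": ["fan", "fan", "fan", "blotch", "blotch", "none", "none"],
    "x": [0.0, 200.0, 200.0, 0.0, 400.0, 0.0, nan],
    "y": [0.0, 300.0, 300.0, 0.0, 100.0, 0.0, nan],
    "angle": [0.0, 90.0, 45.0, 0.0, 0.0, nan, nan],
    "distance": [10.0, 10.0, 80.0, nan, nan, nan, nan],
    "spread": [5.0, 2.017450, 20.0, nan, nan, nan, nan],
    "radius_1": [nan, nan, nan, 10.0, 30.0, nan, nan],
    "radius_2": [nan, nan, nan, 10.0, 12.0, nan, nan],
})
kept = filter_default_markings_sketch(demo)
```

Only the genuinely drawn fan, the genuinely drawn blotch and the `none` row without coordinates survive in this demo.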
Out-of-frame filter — drop markings far outside tile
filter_out_of_frame
def filter_out_of_frame(
df:DataFrame, tolerance_px:int=25
)->DataFrame:
Drop markings whose centre falls more than tolerance_px outside the 840x648 tile. marking == "none" rows are exempted (they record ‘volunteer saw nothing’ and have meaningless coords).
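One way to express this bounds check, as a sketch under stated assumptions: the tile is taken to be 840 px along x and 648 px along y, "outside" is measured from the tile edges, and rows with NaN coordinates are left alone. The function name and demo data are illustrative, not taken from p4tools.

```python
import pandas as pd

# Tile size from the text; mapping 840 to x and 648 to y is an assumption.
TILE_WIDTH, TILE_HEIGHT = 840, 648


def filter_out_of_frame_sketch(df: pd.DataFrame, tolerance_px: int = 25) -> pd.DataFrame:
    """Hypothetical sketch: drop centres more than tolerance_px outside the tile."""
    x, y = df["x"], df["y"]
    outside = (
        (x < -tolerance_px)
        | (x > TILE_WIDTH + tolerance_px)
        | (y < -tolerance_px)
        | (y > TILE_HEIGHT + tolerance_px)
    )
    # "none" rows are exempt: their coordinates carry no meaning.
    exempt = df["marking"] == "none"
    return df[~outside | exempt]


demo = pd.DataFrame({
    "marking": ["fan", "fan", "fan", "blotch", "none"],
    "x": [100.0, -10.0, -30.0, 860.0, -999.0],
    "y": [100.0, 50.0, 50.0, 700.0, -999.0],
})
kept = filter_out_of_frame_sketch(demo)
```

The fan at x = -10 is within the 25 px margin and stays; the fan at x = -30 and the blotch at y = 700 are dropped; the `none` row stays regardless of its coordinates.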
Blotch geometry canonicalisation
canonicalize_blotch_geometry
def canonicalize_blotch_geometry(
df:DataFrame
)->DataFrame:
Make blotch ellipses canonical: radius_1 >= radius_2, angle in [0, 180).
Verbatim port of legacy planet4.reduction.convert_ellipse_angles: where radius_1 < radius_2 we swap the radii and add 90 deg to the angle, then take angle % 180 (ellipse symmetry).
Modifies the dataframe in place and returns it.
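The swap-and-rotate rule can be sketched as follows. This is written from the description above, not copied from the legacy code; restricting the operation to rows with `marking == "blotch"` is an assumption, and the function name is hypothetical. Like the documented function, the sketch mutates its argument and returns it.

```python
import numpy as np
import pandas as pd


def canonicalize_blotch_geometry_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: enforce radius_1 >= radius_2 and angle in [0, 180)."""
    is_blotch = df["marking"] == "blotch"
    swap = is_blotch & (df["radius_1"] < df["radius_2"])

    # Swap the radii where the first axis is the shorter one ...
    old_r1 = df.loc[swap, "radius_1"].copy()
    df.loc[swap, "radius_1"] = df.loc[swap, "radius_2"]
    df.loc[swap, "radius_2"] = old_r1
    # ... and rotate by 90 degrees so the ellipse itself is unchanged.
    df.loc[swap, "angle"] = df.loc[swap, "angle"] + 90

    # An ellipse is symmetric under a 180 degree rotation.
    df.loc[is_blotch, "angle"] = df.loc[is_blotch, "angle"] % 180
    return df


demo = pd.DataFrame({
    "marking": ["blotch", "blotch", "fan"],
    "radius_1": [5.0, 10.0, np.nan],
    "radius_2": [10.0, 5.0, np.nan],
    "angle": [120.0, 200.0, 270.0],
})
out = canonicalize_blotch_geometry_sketch(demo)
```

The first blotch has its radii swapped and its angle becomes (120 + 90) % 180 = 30; the second only has its angle folded to 20; the fan row is untouched.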
Fan angle canonicalisation
canonicalize_fan_angles
def canonicalize_fan_angles(
df:DataFrame
)->DataFrame:
Fold fan angles into [0, 360).
Verbatim port of legacy planet4.reduction.normalize_fan_angles. Empirically a no-op for Panoptes-12978 (the UI already produces fan angles in [0, 360)); kept for legacy reprocessing.
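Folding into [0, 360) amounts to a modulo, as in this small sketch. It is an illustration of the stated behaviour, not the legacy `normalize_fan_angles` code; applying it only to fan rows and returning a copy are both assumptions, and the function name is hypothetical.

```python
import pandas as pd


def canonicalize_fan_angles_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: fold fan angles into [0, 360)."""
    out = df.copy()
    is_fan = out["marking"] == "fan"
    # The result of % takes the sign of the divisor, so negatives fold upward.
    out.loc[is_fan, "angle"] = out.loc[is_fan, "angle"] % 360
    return out


demo = pd.DataFrame({
    "marking": ["fan", "fan", "fan", "fan", "blotch"],
    "angle": [-90.0, 370.0, 45.0, 360.0, -10.0],
})
folded = canonicalize_fan_angles_sketch(demo)
```

Angles already inside [0, 360) come back unchanged, which is consistent with the step being a no-op on data that is already canonical.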
Angular components for clustering
compute_angle_components
def compute_angle_components(
df:DataFrame
)->DataFrame:
Add x_angle = cos(deg2rad(angle)) and y_angle = sin(deg2rad(angle)).
These are the angular features the catalog reduction (clustering) reads directly: production.dbscan clusters fans on (x_angle, y_angle) and blotches on y_angle. Must run after the blotch and fan angle canonicalisations so the components reflect the canonical-quadrant angle.
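The two formulas translate directly to NumPy, as sketched below. The function name and the demo frame are illustrative, and returning a new frame via `assign` is an assumption about behaviour the text does not specify. The demo also shows one reason to work with components: 1 deg and 359 deg are far apart as raw numbers but close together in (x_angle, y_angle) space.

```python
import numpy as np
import pandas as pd


def compute_angle_components_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: add unit-circle components of the angle column."""
    radians = np.deg2rad(df["angle"])
    return df.assign(x_angle=np.cos(radians), y_angle=np.sin(radians))


demo = pd.DataFrame({"angle": [0.0, 90.0, 180.0, 1.0, 359.0]})
out = compute_angle_components_sketch(demo)

# Euclidean gap between the 1 deg and 359 deg rows in component space.
gap = np.hypot(
    out["x_angle"].iloc[3] - out["x_angle"].iloc[4],
    out["y_angle"].iloc[3] - out["y_angle"].iloc[4],
)
```

Here `gap` equals 2·sin(1°), roughly 0.035, even though the raw angles differ by 358.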
Orchestrator — dispatch per raw source
clean_classifications
def clean_classifications(
df:DataFrame, source:Literal='panoptes', out_of_frame_tolerance_px:int=25
)->DataFrame:
Top-level orchestrator. Dispatches the right cleanup steps per raw source.
Steps run, in order:
1. [filter_nan_required](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_nan_required) (always)
2. [filter_default_markings](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_default_markings) (zooniverse_v1 only)
3. [filter_out_of_frame](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_out_of_frame) (always)
4. [canonicalize_blotch_geometry](https://michaelaye.github.io/p4tools/production.cleaning.html#canonicalize_blotch_geometry) (always)
5. [canonicalize_fan_angles](https://michaelaye.github.io/p4tools/production.cleaning.html#canonicalize_fan_angles) (zooniverse_v1 only; Panoptes is already canonical)
6. [compute_angle_components](https://michaelaye.github.io/p4tools/production.cleaning.html#compute_angle_components) (always)
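The per-source dispatch can be pictured as a small plan builder. The sketch below only resolves which step names would run for a given source; it does not call p4tools. `plan_steps` is a hypothetical helper, and rejecting unknown sources with a `ValueError` is an assumption, since the text does not say how other values are handled.

```python
from typing import List, Literal

Source = Literal["panoptes", "zooniverse_v1"]


def plan_steps(source: Source = "panoptes") -> List[str]:
    """Hypothetical helper: ordered step names that run for a raw source."""
    if source not in ("panoptes", "zooniverse_v1"):
        # Assumed behaviour; the documented function may handle this differently.
        raise ValueError(f"unknown source: {source!r}")

    legacy = source == "zooniverse_v1"
    plan = [
        ("filter_nan_required", True),
        ("filter_default_markings", legacy),
        ("filter_out_of_frame", True),
        ("canonicalize_blotch_geometry", True),
        ("canonicalize_fan_angles", legacy),
        ("compute_angle_components", True),
    ]
    return [name for name, enabled in plan if enabled]


panoptes_plan = plan_steps("panoptes")
legacy_plan = plan_steps("zooniverse_v1")
```

For Panoptes data the plan has four steps; for legacy Zooniverse v1 data it has all six, with the two legacy-only steps slotted into their documented positions.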