# Pre-clustering classification cleaning


## NaN sweep — drop incomplete markings

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L51"
target="_blank" style="float:right; font-size:smaller">source</a>

### filter_nan_required

``` python

def filter_nan_required(
    df:DataFrame
)->DataFrame:

```

*Drop fans/blotches missing any required column.*

Required columns: fans need `x, y, distance, angle, spread`; blotches need
`x, y, radius_1, radius_2`. Rows whose `marking` is neither `fan` nor
`blotch` pass through untouched.
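The rule above can be sketched with plain pandas. This is an illustrative stand-in, not the p4tools implementation: the helper name `drop_incomplete` and the demo frame are hypothetical, and only the column lists come from the description above.

```python
import numpy as np
import pandas as pd

# Required columns per marking type, as listed above.
REQUIRED = {
    "fan": ["x", "y", "distance", "angle", "spread"],
    "blotch": ["x", "y", "radius_1", "radius_2"],
}

def drop_incomplete(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: drop fans/blotches missing a required column."""
    bad = pd.Series(False, index=df.index)
    for marking, cols in REQUIRED.items():
        is_kind = df["marking"] == marking
        # A row is bad if it is of this kind and any required column is NaN.
        bad |= is_kind & df[cols].isna().any(axis=1)
    return df[~bad]

nan = np.nan
demo = pd.DataFrame({
    "marking":  ["fan", "fan", "blotch", "none"],
    "x":        [10.0, 20.0, 30.0, nan],
    "y":        [10.0, 20.0, 30.0, nan],
    "distance": [50.0, nan, nan, nan],   # second fan lacks distance
    "angle":    [45.0, 90.0, 10.0, nan],
    "spread":   [20.0, 15.0, nan, nan],
    "radius_1": [nan, nan, 12.0, nan],
    "radius_2": [nan, nan, 8.0, nan],
})
kept = drop_incomplete(demo)
```

Here the second fan is dropped, the blotch survives even though its fan-only columns are NaN, and the `none` row passes through.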

## Default-marking filter (Zooniverse v1 only)

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L65"
target="_blank" style="float:right; font-size:smaller">source</a>

### filter_default_markings

``` python

def filter_default_markings(
    df:DataFrame
)->DataFrame:

```

*Drop the legacy Zooniverse-v1 auto-spawned default markings.*

Verbatim port of legacy `planet4.reduction.filter_data` steps 2-5:

- Origin-pinned default fan:
  `|x|<eps & |y|<eps & |angle|<eps & distance~10`.
- Second-default fan: `|angle|~90 & spread~2.017450 & distance~10`.
- Origin-pinned 10x10 default ellipse blotch:
  `|x|<eps & |y|<eps & r1~10 & r2~10`.
- Origin-pinned `none` row.

Empirically a no-op for Panoptes-12978 data (the new UI does not
auto-spawn these defaults on click-without-drag); kept available for
legacy reprocessing.
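A minimal sketch of the four patterns listed above, written as one boolean mask. Several details are assumptions rather than facts about the library: the tolerance `EPS`, the helper name `default_marking_mask`, and the choice to restrict the fan and blotch patterns to rows of the matching `marking` type.

```python
import numpy as np
import pandas as pd

EPS = 1e-3  # assumed tolerance; the real value is not given on this page

def default_marking_mask(df: pd.DataFrame, eps: float = EPS) -> pd.Series:
    """Hypothetical mask that is True for rows matching a default pattern."""

    def near(values: pd.Series, target: float) -> pd.Series:
        # NaN never counts as "near", so rows lacking the column fall through.
        close = np.isclose(values.to_numpy(dtype=float), target, rtol=0.0, atol=eps)
        return pd.Series(close, index=df.index)

    at_origin = (df["x"].abs() < eps) & (df["y"].abs() < eps)
    is_fan = df["marking"] == "fan"
    is_blotch = df["marking"] == "blotch"

    origin_fan = is_fan & at_origin & (df["angle"].abs() < eps) & near(df["distance"], 10)
    second_fan = (is_fan & near(df["angle"].abs(), 90)
                  & near(df["spread"], 2.017450) & near(df["distance"], 10))
    origin_blotch = (is_blotch & at_origin
                     & near(df["radius_1"], 10) & near(df["radius_2"], 10))
    origin_none = (df["marking"] == "none") & at_origin
    return origin_fan | second_fan | origin_blotch | origin_none

nan = np.nan
demo = pd.DataFrame({
    "marking":  ["fan", "fan", "blotch", "none", "fan", "blotch"],
    "x":        [0.0, 100.0, 0.0, 0.0, 300.0, 400.0],
    "y":        [0.0, 200.0, 0.0, 0.0, 250.0, 300.0],
    "angle":    [0.0, 90.0, 0.0, nan, 45.0, 30.0],
    "distance": [10.0, 10.0, nan, nan, 60.0, nan],
    "spread":   [5.0, 2.017450, nan, nan, 20.0, nan],
    "radius_1": [nan, nan, 10.0, nan, nan, 10.0],
    "radius_2": [nan, nan, 10.0, nan, nan, 10.0],
})
mask = default_marking_mask(demo)
cleaned = demo[~mask]
```

The last two rows are genuine markings: the final blotch has 10x10 radii but is not pinned to the origin, so it is kept.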

## Out-of-frame filter — drop markings far outside tile

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L109"
target="_blank" style="float:right; font-size:smaller">source</a>

### filter_out_of_frame

``` python

def filter_out_of_frame(
    df:DataFrame, tolerance_px:int=25
)->DataFrame:

```

*Drop markings whose centre falls more than `tolerance_px` outside the
840x648 tile.*

`marking == "none"` rows are exempt: they record ‘volunteer saw nothing’
and their coordinates are meaningless.
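One way to express this check is a mask over the centre coordinates. The sketch below assumes the centre lives in the `x` and `y` columns and that a marking exactly `tolerance_px` outside the tile is still kept; the helper name `in_frame_mask` is hypothetical.

```python
import numpy as np
import pandas as pd

TILE_WIDTH, TILE_HEIGHT = 840, 648  # tile size stated above

def in_frame_mask(df: pd.DataFrame, tolerance_px: int = 25) -> pd.Series:
    """Hypothetical mask: True for rows that the filter would keep."""
    # Series.between is inclusive on both ends by default.
    inside_x = df["x"].between(-tolerance_px, TILE_WIDTH + tolerance_px)
    inside_y = df["y"].between(-tolerance_px, TILE_HEIGHT + tolerance_px)
    # "none" rows are exempt regardless of their (meaningless) coordinates.
    return (inside_x & inside_y) | (df["marking"] == "none")

demo = pd.DataFrame({
    "marking": ["fan", "fan", "blotch", "blotch", "none"],
    "x": [100.0, 860.0, 870.0, 100.0, np.nan],
    "y": [100.0, 100.0, 100.0, -30.0, np.nan],
})
keep = in_frame_mask(demo)
cleaned = demo[keep]
```

With the default tolerance, `x = 860` is 20 px past the right edge and survives, while `x = 870` and `y = -30` are dropped.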

## Blotch geometry canonicalisation

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L130"
target="_blank" style="float:right; font-size:smaller">source</a>

### canonicalize_blotch_geometry

``` python

def canonicalize_blotch_geometry(
    df:DataFrame
)->DataFrame:

```

*Make blotch ellipses canonical: `radius_1 >= radius_2`, angle in
`[0, 180)`.*

Verbatim port of legacy `planet4.reduction.convert_ellipse_angles`:
where `radius_1 < radius_2` we swap the radii **and** add 90 deg to the
angle, then take `angle % 180` (ellipse symmetry).

Modifies the dataframe in place and returns it.
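The swap-and-rotate rule can be sketched as follows. This is an illustration of the described behaviour under the assumption that only rows with `marking == "blotch"` are touched; `canonical_blotch_sketch` is a hypothetical name, not the library function.

```python
import numpy as np
import pandas as pd

def canonical_blotch_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: enforce radius_1 >= radius_2, angle in [0, 180)."""
    is_blotch = df["marking"] == "blotch"
    swap = is_blotch & (df["radius_1"] < df["radius_2"])
    # Swap the radii; .to_numpy() stops pandas from re-aligning on column labels.
    df.loc[swap, ["radius_1", "radius_2"]] = (
        df.loc[swap, ["radius_2", "radius_1"]].to_numpy()
    )
    # Swapping the axes rotates the ellipse description by 90 degrees.
    df.loc[swap, "angle"] += 90
    # An ellipse is unchanged by a 180 degree rotation, so fold the angle.
    df.loc[is_blotch, "angle"] %= 180
    return df  # same object, modified in place

demo = pd.DataFrame({
    "marking":  ["blotch", "blotch", "fan"],
    "radius_1": [5.0, 10.0, np.nan],
    "radius_2": [10.0, 5.0, np.nan],
    "angle":    [100.0, 200.0, 270.0],
})
out = canonical_blotch_sketch(demo)
```

The first blotch has its radii swapped and its angle becomes `(100 + 90) % 180 = 10`; the second only needs the fold, `200 % 180 = 20`; the fan row is left alone.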

## Fan angle canonicalisation

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L156"
target="_blank" style="float:right; font-size:smaller">source</a>

### canonicalize_fan_angles

``` python

def canonicalize_fan_angles(
    df:DataFrame
)->DataFrame:

```

*Fold fan angles into `[0, 360)`.*

Verbatim port of legacy `planet4.reduction.normalize_fan_angles`.
Empirically a no-op for Panoptes-12978 (the UI already produces fan
angles in `[0, 360)`); kept for legacy reprocessing.
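Folding into `[0, 360)` is commonly done with a modulo, which in pandas and NumPy takes the sign of the divisor and therefore maps negative angles to positive ones. The sketch below assumes that approach and that only fan rows are affected; it is not taken from the legacy code.

```python
import pandas as pd

def fold_fan_angles_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: fold fan angles into [0, 360)."""
    is_fan = df["marking"] == "fan"
    # Modulo with a positive divisor yields a non-negative result.
    df.loc[is_fan, "angle"] %= 360
    return df

demo = pd.DataFrame({
    "marking": ["fan", "fan", "fan", "blotch"],
    "angle":   [-90.0, 450.0, 360.0, -90.0],
})
out = fold_fan_angles_sketch(demo)
```

The blotch row keeps its `-90` here because blotch angles are handled by the separate blotch canonicalisation.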

## Angular components for clustering

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L169"
target="_blank" style="float:right; font-size:smaller">source</a>

### compute_angle_components

``` python

def compute_angle_components(
    df:DataFrame
)->DataFrame:

```

*Add `x_angle = cos(deg2rad(angle))` and
`y_angle = sin(deg2rad(angle))`.*

These are the angular features the catalog reduction (clustering) reads
directly: `production.dbscan` clusters fans on `(x_angle, y_angle)` and
blotches on `y_angle`. Must run **after** the blotch and fan angle
canonicalisations so the components reflect the canonical-quadrant
angle.
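The formulas above translate directly to NumPy. The sketch below returns a new frame via `assign`; whether the library function works in place is not stated here, and the name `add_angle_components_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd

def add_angle_components_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: add unit-circle components of `angle` (degrees)."""
    radians = np.deg2rad(df["angle"])
    return df.assign(x_angle=np.cos(radians), y_angle=np.sin(radians))

demo = pd.DataFrame({"angle": [0.0, 90.0, 359.0, 1.0]})
out = add_angle_components_sketch(demo)

# Distance on the unit circle between the 359 degree and 1 degree rows.
gap = np.hypot(
    out.loc[2, "x_angle"] - out.loc[3, "x_angle"],
    out.loc[2, "y_angle"] - out.loc[3, "y_angle"],
)
```

Working with components rather than raw degrees removes the wrap-around: 359 and 1 differ by 358 as numbers, yet their `(x_angle, y_angle)` points are only about 0.035 apart, so a distance-based clustering treats them as neighbours.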

## Orchestrator — dispatch per raw source

------------------------------------------------------------------------

<a
href="https://github.com/michaelaye/p4tools/blob/main/p4tools/production/cleaning.py#L184"
target="_blank" style="float:right; font-size:smaller">source</a>

### clean_classifications

``` python

def clean_classifications(
    df:DataFrame, source:Literal='panoptes', out_of_frame_tolerance_px:int=25
)->DataFrame:

```

*Top-level orchestrator. Dispatches the right cleanup steps per raw
source.*

Steps run, in order:

1.  [`filter_nan_required`](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_nan_required)
    (always)
2.  [`filter_default_markings`](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_default_markings)
    (zooniverse_v1 only)
3.  [`filter_out_of_frame`](https://michaelaye.github.io/p4tools/production.cleaning.html#filter_out_of_frame)
    (always)
4.  [`canonicalize_blotch_geometry`](https://michaelaye.github.io/p4tools/production.cleaning.html#canonicalize_blotch_geometry)
    (always)
5.  [`canonicalize_fan_angles`](https://michaelaye.github.io/p4tools/production.cleaning.html#canonicalize_fan_angles)
    (zooniverse_v1 only; Panoptes angles are already canonical)
6.  [`compute_angle_components`](https://michaelaye.github.io/p4tools/production.cleaning.html#compute_angle_components)
    (always)
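The per-source dispatch in the list above can be pictured as a small table of steps, each tagged with the sources it applies to. This sketch only resolves step names; it does not call p4tools. The source string `"zooniverse_v1"` is taken from the list labels and is assumed, not confirmed, to be the literal accepted by `clean_classifications`.

```python
from typing import List, Optional, Set, Tuple

# (step name, sources it applies to); None means the step always runs.
STEPS: List[Tuple[str, Optional[Set[str]]]] = [
    ("filter_nan_required", None),
    ("filter_default_markings", {"zooniverse_v1"}),
    ("filter_out_of_frame", None),
    ("canonicalize_blotch_geometry", None),
    ("canonicalize_fan_angles", {"zooniverse_v1"}),
    ("compute_angle_components", None),
]

def steps_for(source: str) -> List[str]:
    """Hypothetical helper: names of the steps that run for `source`, in order."""
    return [name for name, only in STEPS if only is None or source in only]

panoptes_steps = steps_for("panoptes")
legacy_steps = steps_for("zooniverse_v1")
```

For `"panoptes"` this yields the four always-on steps; for `"zooniverse_v1"` all six run, with the angle components computed last so they see the canonical angles.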
