Base

Base class for distributors.

BaseDistributor

Bases: BaseEstimator

The base class for distributors.

A distributor sets the proportion of samples to be generated inside each cluster and between clusters.

Warning: This class should not be used directly. Use the derived classes instead.

fit(X, y, labels=None, neighbors=None)

Generate the intra-label and inter-label distribution.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | InputData | Matrix containing the data which have to be sampled. | required |
| y | Targets | Corresponding label for each sample in X. | required |
| labels | Labels \| None | Cluster label of each sample. | None |
| neighbors | Neighbors \| None | An array that contains all neighboring pairs. Each row is a unique neighboring pair. | None |

Returns:

| Type | Description |
|---|---|
| Self | The object itself. |

Source code in src/clover/distribution/base.py
def fit(
    self: Self,
    X: InputData,
    y: Targets,
    labels: Labels | None = None,
    neighbors: Neighbors | None = None,
) -> Self:
    """Generate the intra-label and inter-label distribution.

    Args:
        X:
            Matrix containing the data which have to be sampled.

        y:
            Corresponding label for each sample in X.
        labels:
            Labels of each sample.
        neighbors:
            An array that contains all neighboring pairs. Each row is
            a unique neighboring pair.

    Returns:
        The object itself.
    """
    # Check data
    X, y = check_X_y(X, y, dtype=None)

    # Set statistics
    counts = Counter(y)
    self.majority_class_labels_ = [
        class_label
        for class_label, class_label_count in counts.items()
        if class_label_count == max(counts.values())
    ]
    self.unique_cluster_labels_ = np.unique(labels) if labels is not None else np.array(0, dtype=int)
    self.unique_class_labels_ = np.unique(y)
    self.n_samples_ = len(X)

    # Set default attributes
    self.labels_ = np.repeat(0, len(X)) if labels is None else check_array(labels, ensure_2d=False)
    self.neighbors_ = np.empty((0, 2), dtype=int) if neighbors is None else check_array(neighbors, ensure_2d=False)
    self.intra_distribution_: IntraDistribution = {
        (0, class_label): 1.0 for class_label in np.unique(y) if class_label not in self.majority_class_labels_
    }
    self.inter_distribution_: InterDistribution = {}

    # Fit distributor
    self._fit(X, y, labels, neighbors)

    # Validate fitting procedure
    self._validate_fitting()

    return self
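
The default statistics that `fit` sets before calling `_fit` can be traced on a toy target vector. The following is a minimal sketch using only the standard library rather than NumPy; the helper name `default_distributions` is hypothetical and not part of clover. It assumes, as in the listing above, that every class tied for the largest count is treated as a majority class and that each remaining class receives weight 1.0 in the single default cluster `0`.

```python
from collections import Counter


def default_distributions(y):
    """Hypothetical helper mirroring the defaults set in `fit` above."""
    counts = Counter(y)
    max_count = max(counts.values())
    # Every class tied for the largest count is treated as a majority class.
    majority = [label for label, count in counts.items() if count == max_count]
    # Each non-majority class gets all of its samples from cluster 0.
    intra = {(0, label): 1.0 for label in sorted(counts) if label not in majority}
    # No samples are generated between clusters by default.
    inter = {}
    return majority, intra, inter


y = [0, 0, 0, 0, 1, 1, 2]
majority, intra, inter = default_distributions(y)
print(majority)  # [0]
print(intra)     # {(0, 1): 1.0, (0, 2): 1.0}
print(inter)     # {}
```

The keys of the intra-label distribution are `(cluster_label, class_label)` pairs, which is why the default uses `0` as the first element when no cluster labels are supplied.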

fit_distribute(X, y, labels, neighbors)

Fit the distributor and return the intra-label and inter-label distributions.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| X | InputData | Matrix containing the data which have to be sampled. | required |
| y | Targets | Corresponding label for each sample in X. | required |
| labels | Labels \| None | Cluster label of each sample. | required |
| neighbors | Neighbors \| None | An array that contains all neighboring pairs. Each row is a unique neighboring pair. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| distributions | tuple[IntraDistribution, InterDistribution] | A tuple with the two distributions. |

Source code in src/clover/distribution/base.py
def fit_distribute(
    self: Self,
    X: InputData,
    y: Targets,
    labels: Labels | None,
    neighbors: Neighbors | None,
) -> tuple[IntraDistribution, InterDistribution]:
    """Return the intra-label and inter-label distribution.

    Args:
        X:
            Matrix containing the data which have to be sampled.
        y:
            Corresponding label for each sample in X.
        labels:
            Labels of each sample.
        neighbors:
            An array that contains all neighboring pairs. Each row is
            a unique neighboring pair.

    Returns:
        distributions:
            A tuple with the two distributions.
    """
    self.fit(X, y, labels, neighbors)
    return self.intra_distribution_, self.inter_distribution_
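
The relationship between `fit`, the `_fit` hook and `fit_distribute` can be illustrated with a self-contained sketch. Both classes below are hypothetical stand-ins written with the standard library only; they are not clover's classes, they skip input checking and the validation step, and the proportional weighting rule in `ToyProportionalDistributor` is an assumption chosen for illustration, not a rule taken from the library.

```python
from collections import Counter


class ToyBaseDistributor:
    """Hypothetical stand-in mirroring the fit / fit_distribute flow above."""

    def fit(self, X, y, labels=None, neighbors=None):
        counts = Counter(y)
        top = max(counts.values())
        self.majority_class_labels_ = [c for c, n in counts.items() if n == top]
        # Defaults: one cluster (0) per minority class, nothing between clusters.
        self.intra_distribution_ = {
            (0, c): 1.0 for c in sorted(counts) if c not in self.majority_class_labels_
        }
        self.inter_distribution_ = {}
        self._fit(X, y, labels, neighbors)  # hook overridden by subclasses
        return self

    def _fit(self, X, y, labels, neighbors):
        return self

    def fit_distribute(self, X, y, labels, neighbors):
        self.fit(X, y, labels, neighbors)
        return self.intra_distribution_, self.inter_distribution_


class ToyProportionalDistributor(ToyBaseDistributor):
    """Illustrative rule: weight each cluster by its share of a minority class."""

    def _fit(self, X, y, labels, neighbors):
        if labels is None:
            return self  # keep the single-cluster default
        # Count minority samples per (cluster, class) pair.
        pair_counts = Counter(
            (cluster, cls)
            for cluster, cls in zip(labels, y)
            if cls not in self.majority_class_labels_
        )
        # Total minority samples per class, summed over clusters.
        class_totals = Counter(cls for _, cls in pair_counts.elements())
        self.intra_distribution_ = {
            (cluster, cls): n / class_totals[cls]
            for (cluster, cls), n in pair_counts.items()
        }
        return self


X = [[float(i)] for i in range(9)]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
intra, inter = ToyProportionalDistributor().fit_distribute(X, y, labels, None)
print(intra)  # {(0, 1): 0.25, (1, 1): 0.75}
print(inter)  # {}
```

Unlike `fit`, `fit_distribute` has no defaults for `labels` and `neighbors`, so the sketch passes `None` explicitly for the neighbors argument.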