Transform method#

The transform method alows user to convert original dataframe into it’s numerical representation. Thanks to that, we can use our metric with external libraries such as scipy or scikit-learn. More on that in External Api Compatibility section.

Transform#

It is done by calling the transform method on a fitted Gower instance.

import numpy as np

from gower_metric import Gower

data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)

feature_types = {
   0: "ratio_scale_interval",
   1: "categorical_nominal",
}

gower = Gower(feature_types=feature_types).fit(data)

transformed_data = gower.transform(data)

All numerical, ratio scale interval features are intact. Binary features are represented as 0 and 1, depending on the value. Categorical nominal and ordinal features are encoded using ordinal encoding from scikit-learn.

Fit transform#

For convenience, Gower also implements fit_transform method, which combines fit and transform in one call.

import numpy as np

from gower_metric import Gower

data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)

feature_types = {
   0: "ratio_scale_interval",
   1: "categorical_nominal",
}

gower = Gower(feature_types=feature_types)

transformed_data = gower.fit_transform(data)

Warning

Under the hood, the fit method learns the mapping of categorical ordinal values to their numerical representation. Therefore, calling transform on the same data before and after re-fitting the instance may result in different numerical representations or even NaN values. The same applies to the fit_transform method. If your data does not contain any categorical ordinal features, this warning may not apply (we have not implemented tests for this scenario).