Advanced examples
This section shows how to weight features, calculate similarities instead of distances, apply Podani's optimization, and more.
Calculating similarities
The package can calculate similarities instead of distances.
import numpy as np
from gower_metric import Gower

data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)

# Feature types are keyed by column index.
feature_types = {
    0: "ratio_scale_interval",
    1: "categorical_nominal",
}

gower = Gower(feature_types=feature_types).fit(data)

# Similarity between the first two rows.
similarity = gower.similarity(data[0], data[1])
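To make the relationship between similarity and distance concrete, here is a minimal hand-written sketch of the textbook Gower similarity for the same four rows. It does not call gower_metric; the helper name `gower_similarity` is hypothetical, and range normalisation for the numeric feature is an assumption that may not match the library's defaults.

```python
# Textbook Gower similarity, written by hand (NOT the gower_metric implementation).
rows = [[1, "a"], [2, "b"], [3, "a"], [4, "c"]]


def gower_similarity(x, y, rows):
    """Average the per-feature similarities of two rows."""
    scores = []
    for k, (a, b) in enumerate(zip(x, y)):
        if isinstance(a, str):
            # Nominal feature: 1 on an exact match, 0 otherwise.
            scores.append(1.0 if a == b else 0.0)
        else:
            # Numeric feature: 1 minus the range-normalised absolute difference
            # (assumed scaling; a constant column counts as a perfect match).
            column = [row[k] for row in rows]
            value_range = max(column) - min(column)
            scores.append(1.0 - abs(a - b) / value_range if value_range else 1.0)
    return sum(scores) / len(scores)


similarity = gower_similarity(rows[0], rows[1], rows)
distance = 1.0 - similarity  # the matching Gower distance
print(similarity, distance)
```

For the first two rows the numeric feature contributes 1 - 1/3 and the nominal feature contributes 0, so the sketch gives a similarity of about 0.33 and a distance of about 0.67.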
Passing weights
You can assign a weight to each feature when calculating distances or similarities. This gives some features more importance than others.
import numpy as np
from gower_metric import Gower

data = np.array([[1, 'a', 3.5], [2, 'b', 4.0], [3, 'a', 2.5], [4, 'c', 5.0]], dtype=object)

feature_types = {
    0: "ratio_scale_interval",
    1: "categorical_nominal",
    2: "ratio_scale_interval",
}

# Per-feature weights, keyed by column index.
weights = {
    0: 0.5,
    1: 2.0,
    2: 1.0,
}

gower = Gower(feature_types=feature_types, feature_weights=weights).fit(data)
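The effect of the weights can be sketched by hand as a weighted mean of per-feature distances. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, numeric features are assumed to be range-normalised, and weights are assumed to be normalised by their sum.

```python
# Hand-written weighted Gower distance (NOT the gower_metric implementation).
rows = [[1, "a", 3.5], [2, "b", 4.0], [3, "a", 2.5], [4, "c", 5.0]]
weights = {0: 0.5, 1: 2.0, 2: 1.0}


def per_feature_distances(x, y, rows):
    """Return one distance in [0, 1] per feature."""
    distances = []
    for k, (a, b) in enumerate(zip(x, y)):
        if isinstance(a, str):
            # Nominal feature: 0 on a match, 1 on a mismatch.
            distances.append(0.0 if a == b else 1.0)
        else:
            # Numeric feature: absolute difference divided by the column range.
            column = [row[k] for row in rows]
            value_range = max(column) - min(column)
            distances.append(abs(a - b) / value_range if value_range else 0.0)
    return distances


def weighted_gower_distance(x, y, rows, weights):
    """Weighted mean of the per-feature distances."""
    distances = per_feature_distances(x, y, rows)
    total = sum(weights[k] * d for k, d in enumerate(distances))
    return total / sum(weights.values())


equal_weights = {0: 1.0, 1: 1.0, 2: 1.0}
unweighted = weighted_gower_distance(rows[0], rows[1], rows, equal_weights)
weighted = weighted_gower_distance(rows[0], rows[1], rows, weights)
print(unweighted, weighted)
```

Because rows 0 and 1 disagree on the categorical feature, which carries the largest weight, the weighted distance in this sketch is larger than the unweighted one.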
Categorical ordinal example
Categorical ordinal data needs one extra step: you must provide a mapping that defines the order of the categories.
import numpy as np
from gower_metric import Gower

data = np.array([
    [1, 'low', 3.5],
    [2, 'medium', 4.0],
    [3, 'high', 2.5],
    [4, 'medium', 5.0],
], dtype=object)

feature_types = {
    0: "ratio_scale_interval",
    1: "categorical_ordinal",
    2: "ratio_scale_interval",
}

# Category order for the ordinal column, keyed by column index.
ordinal_mappings = {
    1: ['low', 'medium', 'high'],
}

gower = Gower(feature_types=feature_types, categorical_ordinal_values_order=ordinal_mappings)
gower.fit(data)
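The reason the order matters can be sketched by hand: once each category has a rank, two ordinal values can be compared by how far apart their ranks are. The snippet below is a simple rank-based illustration with a hypothetical helper `ordinal_distance`; it is an assumption about the general idea, not the formula the library applies.

```python
# Simple rank-based ordinal distance (an illustration, NOT the library's formula).
order = ["low", "medium", "high"]

# Map each category to its position in the declared order.
rank = {category: position for position, category in enumerate(order)}


def ordinal_distance(a, b):
    """Rank difference scaled to [0, 1] by the largest possible rank gap."""
    return abs(rank[a] - rank[b]) / (len(order) - 1)


print(ordinal_distance("low", "medium"))
print(ordinal_distance("low", "high"))
```

With this mapping, 'low' is closer to 'medium' than to 'high', which a nominal comparison (match or mismatch only) cannot express.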
More class functionality
Beyond the examples above, the class exposes further options, mostly (but not only) for numerical data.
import numpy as np
from gower_metric import Gower

data = np.array([
    [1, 'low', 3.5],
    [2, 'medium', 4.0],
    [3, 'high', 2.5],
    [4, 'medium', 5.0],
], dtype=object)

feature_types = {
    0: "ratio_scale_interval",
    1: "categorical_ordinal",
    2: "ratio_scale_interval",
}

ordinal_mappings = {
    1: ['low', 'medium', 'high'],
}

gower = Gower(
    feature_types=feature_types,
    categorical_ordinal_values_order=ordinal_mappings,
    categorical_ordinal_calculation_type="podani",
    scale="iqr",
    missing_strategy="max_dist",
    scale_window="kde",
    scale_window_type="silverman",
    conditional_distances=True,
)
gower.fit(data)
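As a rough intuition for two of these options, the sketch below scales a numeric difference by the interquartile range instead of the full range and treats a missing value as the maximum distance. This is one plausible reading of `scale="iqr"` and `missing_strategy="max_dist"`, written by hand with a hypothetical helper; capping at 1.0 and using `None` for missing values are assumptions, and the library may behave differently.

```python
# Hand-written sketch of IQR scaling with a max-distance rule for missing values.
# Assumptions: distances are capped at 1.0 and None marks a missing value.
from statistics import quantiles

column = [3.5, 4.0, 2.5, 5.0]  # third feature of the example data

# Quartiles with linear interpolation between the observed values.
q1, _, q3 = quantiles(column, n=4, method="inclusive")
iqr = q3 - q1


def iqr_scaled_distance(a, b):
    """Absolute difference divided by the IQR, capped at 1.0."""
    if a is None or b is None:
        # Missing value: fall back to the maximum distance.
        return 1.0
    return min(abs(a - b) / iqr, 1.0)


print(iqr)
print(iqr_scaled_distance(3.5, 4.0))
print(iqr_scaled_distance(None, 4.0))
```

Scaling by the IQR makes the per-feature distance less sensitive to outliers than scaling by the full range, at the cost of needing a cap for pairs that lie further apart than the IQR.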