Matrix support#
We also provide endpoint support for calculating various types of matrices. We use joblib library to parallelize computations and speed up the whole process. By default, backend is set to loky.
import numpy as np
from gower_metric import Gower
data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)
feature_types = {
0: "ratio_scale_interval",
1: "categorical_nominal",
}
gower = Gower(feature_types=feature_types).fit(data)
distance_matrix = gower.matrix(data)
similarity_matrix = gower.matrix(data, matrix_type='similarity')
Endpoint creates a square matrix where each element (i, j) represents the distance or similarity between data points i and j. You can specify the type of matrix you want to compute using the matrix_type parameter, which can be either distance or similarity. By default, it computes the distance matrix.
Sparse matrix support#
On top of that, user can also compute sparse matrices using SciPy’s sparse matrix format. This is particularly useful when dealing with large datasets where many distances or similarities are zero, allowing for more efficient storage and computation.
import numpy as np
from gower_metric import Gower
data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)
feature_types = {
0: "ratio_scale_interval",
1: "categorical_nominal",
}
gower = Gower(feature_types=feature_types).fit(data)
sparse_distance_matrix = gower.matrix(data, convert_to_sparse=True, sparse_type="csc")
sparse_similarity_matrix = gower.matrix(data, matrix_type='similarity', convert_to_sparse=True, sparse_type="coo")
Here, the convert_to_sparse parameter is set to True to indicate that we want the output in a sparse format. The sparse_type parameter allows you to choose the specific type of sparse matrix representation, such as csc (Compressed Sparse Column), csr (Compressed Sparse Row) , or coo (Coordinate List).
By default, n_jobs is set to -1, which means that all available CPU cores will be used for parallel computation. You can adjust this parameter based on your system’s capabilities and the size of your dataset. Default sparse type is csr.
Custom created matrix#
User can also create matrix by hand and fill it with values.
import numpy as np
from gower_metric import Gower
data = np.array([[1, 'a'], [2, 'b'], [3, 'a'], [4, 'c']], dtype=object)
feature_types = {
0: "ratio_scale_interval",
1: "categorical_nominal",
}
gower = Gower(feature_types=feature_types).fit(data)
matrix = np.zeros((data.shape[0], data.shape[0]))
for i in range(data.shape[0]):
for j in range(data.shape[0]):
matrix[i, j] = gower(data[i], data[j])
This approach allows for more flexibility, as you can define your own logic for populating the matrix based on specific criteria or conditions. It could be improved by joblib parallelization if needed.