| Title: | Fuzzy Unsupervised and Semi-Supervised Clustering |
| Version: | 0.1.0 |
| Description: | Methods for distance-based fuzzy unsupervised and semi-supervised clustering, including fuzzy and possibilistic models based on the alternating optimization (AO) algorithm. The package introduces a vectorized estimation framework for prototype-based fuzzy clustering algorithms, enabling modular algorithm design and extensibility. It also supports storage and retrieval of intermediate AO optimization results for downstream analysis and processing. For more details see Kmita et al. (2024) <doi:10.1109/TFUZZ.2024.3370768>. |
| License: | MIT + file LICENSE |
| LazyData: | true |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.1.0) |
| Imports: | rdist |
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-05-29 11:52:20 UTC; user |
| Author: | Kamil Kmita |
| Maintainer: | Kamil Kmita <kamil.kmita17@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-02 08:30:02 UTC |
Fuzzy C-Means clustering model
Description
Fits a Fuzzy C-Means (FCM) clustering model using the Alternating Optimization algorithm.
Usage
FCM(
X,
C,
U = NULL,
max_iter = 200,
conv_criterion = 1e-04,
function_dist = rdist::cdist,
store_history = FALSE
)
Arguments
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
Value
An object of class fcm containing:
- U
An N \times C membership matrix.
- V
A C \times p matrix of cluster prototypes.
- function_dist
The distance function used by the model.
- counter
Number of iterations performed until convergence.
- U_history
If store_history = TRUE, a list of length counter containing membership matrices estimated at each iteration; otherwise NULL.
- V_history
If store_history = TRUE, a list of length counter containing prototype matrices estimated at each iteration; otherwise NULL.
- Phi_history
If store_history = TRUE, a list of length counter containing phi-weight matrices estimated at each iteration; otherwise NULL.
References
Bezdek, J. C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Springer US. https://doi.org/10.1007/978-1-4757-0450-1
Examples
X <- matrix(rnorm(100), ncol = 2)
model_fcm <- fussclust::FCM(
X = X,
C = 2
)
print(model_fcm$V)
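The Alternating Optimization loop named in the Description can be sketched in base R. This is only an illustration of the classic FCM updates from Bezdek (1981), not the package's implementation: the function name fcm_sketch is hypothetical, the fuzzifier is assumed to be m = 2, the distance is assumed Euclidean, and convergence is checked on the largest change in U, which may differ from the criterion FCM() actually applies.

```r
# Illustrative base-R sketch of Alternating Optimization for FCM.
# Assumptions: fuzzifier m = 2, Euclidean distance, C >= 2,
# random initial memberships, convergence measured on U.
fcm_sketch <- function(X, C, max_iter = 200, conv_criterion = 1e-4) {
  N <- nrow(X)
  # Random initial membership matrix whose rows sum to 1
  U <- matrix(runif(N * C), nrow = N, ncol = C)
  U <- U / rowSums(U)
  for (counter in seq_len(max_iter)) {
    # Prototype step: weighted means of observations with weights u^2
    Phi <- U^2
    V <- (t(Phi) %*% X) / colSums(Phi)
    # Squared Euclidean distances between observations and prototypes (N x C)
    D2 <- sapply(seq_len(C), function(k) rowSums(sweep(X, 2, V[k, ])^2))
    D2 <- pmax(D2, .Machine$double.eps)  # guard against division by zero
    # Membership step: inverse squared distances, normalised per row
    U_new <- (1 / D2) / rowSums(1 / D2)
    converged <- max(abs(U_new - U)) < conv_criterion
    U <- U_new
    if (converged) break
  }
  list(U = U, V = V, counter = counter)
}

set.seed(1)
X_demo <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
                matrix(rnorm(40, 5, 0.1), ncol = 2))
fit_demo <- fcm_sketch(X_demo, C = 2)
print(fit_demo$V)
```

The returned list mirrors the U, V and counter components listed under Value, which makes it easy to compare the sketch against a fitted fcm object on the same data.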
Possibilistic C-Means clustering model
Description
Fits a Possibilistic C-Means (PCM) clustering model using the Alternating Optimization algorithm.
Usage
PCM(
X,
C,
U = NULL,
gammas = NULL,
initFCM = NULL,
max_iter = 200,
conv_criterion = 1e-04,
function_dist = rdist::cdist,
store_history = FALSE
)
Arguments
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
gammas |
Optional vector of cluster-specific gamma hyperparameters.
If If |
initFCM |
Optional fitted Fuzzy C-Means model used to initialize
cluster-specific gamma hyperparameters via weighted averaging.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
Value
An object of class pcm containing:
- U
An N \times C membership matrix.
- V
A C \times p matrix of cluster prototypes.
- function_dist
The distance function used by the model.
- counter
Number of iterations performed until convergence.
- gammas
Vector of cluster-specific gamma hyperparameters.
- U_history
If store_history = TRUE, a list of length counter containing membership matrices estimated at each iteration; otherwise NULL.
- V_history
If store_history = TRUE, a list of length counter containing prototype matrices estimated at each iteration; otherwise NULL.
- Phi_history
If store_history = TRUE, a list of length counter containing phi-weight matrices estimated at each iteration; otherwise NULL.
References
Krishnapuram, R., & Keller, J. (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2), 98–110. https://doi.org/10.1109/91.227387
Examples
X <- matrix(rnorm(100), ncol = 2)
model_pcm <- fussclust::PCM(
X = X,
C = 2,
initFCM = TRUE
)
print(model_pcm$V)
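The function_dist argument described above expects a function of two matrices that returns unsquared distances. A base-R stand-in with that contract might look like the sketch below. The name euclid_dist is hypothetical, and it is an assumption that the package calls function_dist with the feature matrix first and the prototype matrix second, with no further arguments.

```r
# Hypothetical base-R Euclidean distance between rows of X and rows of V.
# Returns an nrow(X) x nrow(V) matrix of unsquared distances.
euclid_dist <- function(X, V) {
  X <- as.matrix(X)
  V <- as.matrix(V)
  D <- matrix(0, nrow = nrow(X), ncol = nrow(V))
  for (k in seq_len(nrow(V))) {
    # Subtract prototype k from every row, then take the Euclidean norm
    D[, k] <- sqrt(rowSums(sweep(X, 2, V[k, ])^2))
  }
  D
}

X_demo <- matrix(c(0, 0, 1, 1, 2, 2), ncol = 2, byrow = TRUE)
V_demo <- matrix(c(0, 0, 2, 2), ncol = 2, byrow = TRUE)
euclid_dist(X_demo, V_demo)
```

Note the explicit sqrt(): the distances are returned unsquared, as the argument description requires.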
Semi-Supervised Fuzzy C-Means clustering model
Description
Fits a Semi-Supervised Fuzzy C-Means (SSFCM) clustering model using the Alternating Optimization algorithm.
Usage
SSFCM(
X,
C,
U = NULL,
max_iter = 200,
conv_criterion = 1e-04,
function_dist = rdist::cdist,
store_history = FALSE,
alpha = NULL,
superF = NULL
)
Arguments
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
alpha |
Positive scaling factor regulating the impact of partial supervision. |
superF |
Binary supervision matrix of the same dimensions as |
Value
An object of class ssfcm containing:
- U
An N \times C membership matrix.
- V
A C \times p matrix of cluster prototypes.
- function_dist
The distance function used by the model.
- counter
Number of iterations performed until convergence.
- alpha
Value of the scaling factor.
- U_history
If store_history = TRUE, a list of length counter containing membership matrices estimated at each iteration; otherwise NULL.
- V_history
If store_history = TRUE, a list of length counter containing prototype matrices estimated at each iteration; otherwise NULL.
- Phi_history
If store_history = TRUE, a list of length counter containing phi-weight matrices estimated at each iteration; otherwise NULL.
References
Kmita, K., Kaczmarek-Majer, K., & Hryniewicz, O. (2024). Explainable Impact of Partial Supervision in Semi-Supervised Fuzzy Clustering. IEEE Transactions on Fuzzy Systems, 1–10. https://doi.org/10.1109/TFUZZ.2024.3370768
Examples
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_ssfcm <- SSFCM(
X = X,
C = 2,
superF = superF,
alpha = 1
)
print(model_ssfcm$V)
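In practice the supervision matrix is often derived from a partially observed label vector rather than filled in by row ranges as above. A minimal base-R sketch, assuming labels are integers in 1..C and NA marks an unsupervised observation (the labels vector here is made up for illustration):

```r
# Hypothetical partial labels: NA marks an unsupervised observation.
labels <- c(1, 2, NA, 1, NA, 2)
C <- 2

# One-hot encode the known labels; unsupervised rows stay all-zero.
superF_demo <- matrix(0, nrow = length(labels), ncol = C)
known <- which(!is.na(labels))
superF_demo[cbind(known, labels[known])] <- 1
superF_demo
```

Each supervised row then contains a single 1 in the column of its class, matching the structure used in the example above.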
Semi-Supervised Possibilistic C-Means clustering model
Description
Fits a Semi-Supervised Possibilistic C-Means (SSPCM) clustering model using the Alternating Optimization algorithm.
Usage
SSPCM(
X,
C,
U = NULL,
gammas = NULL,
initFCM = NULL,
max_iter = 200,
conv_criterion = 1e-04,
function_dist = rdist::cdist,
store_history = FALSE,
alpha = NULL,
superF = NULL
)
Arguments
X |
A numeric feature matrix. |
C |
Integer specifying the number of clusters. |
U |
Optional initial membership matrix.
Primarily intended for reproducibility purposes.
If |
gammas |
Optional vector of cluster-specific gamma hyperparameters.
If If |
initFCM |
Optional fitted Fuzzy C-Means model used to initialize
cluster-specific gamma hyperparameters via weighted averaging.
If |
max_iter |
Maximum number of iterations.
Defaults to 200. |
conv_criterion |
Convergence threshold used at the end of each iteration of the Alternating Optimization algorithm. |
function_dist |
Optional distance function.
The function must accept two matrices. For the Euclidean distance, the returned distances should not be squared.
Defaults to rdist::cdist. |
store_history |
Logical indicating whether optimization
histories should be stored. If |
alpha |
Positive scaling factor regulating the impact of partial supervision. |
superF |
Binary supervision matrix of the same dimensions as |
Value
An object of class sspcm containing:
- U
An N \times C typicalities matrix.
- V
A C \times p matrix of cluster prototypes.
- function_dist
The distance function used by the model.
- counter
Number of iterations performed until convergence.
- gammas
Vector of cluster-specific gamma hyperparameters.
- alpha
Value of the scaling factor.
- U_history
If store_history = TRUE, a list of length counter containing membership matrices estimated at each iteration; otherwise NULL.
- V_history
If store_history = TRUE, a list of length counter containing prototype matrices estimated at each iteration; otherwise NULL.
- Phi_history
If store_history = TRUE, a list of length counter containing phi-weight matrices estimated at each iteration; otherwise NULL.
References
Kmita, K., Kaczmarek-Majer, K., & Hryniewicz, O. (2024). Explainable Impact of Partial Supervision in Semi-Supervised Fuzzy Clustering. IEEE Transactions on Fuzzy Systems, 1–10. https://doi.org/10.1109/TFUZZ.2024.3370768
Examples
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_sspcm <- SSPCM(
X = X,
C = 2,
superF = superF,
alpha = 1
)
print(model_sspcm$V)
Initialization matrix to analyze underimpact in iris data.
Description
This dataset provides a concrete initialization of the membership matrix for the iris data that exhibits the phenomenon of underimpact of partial supervision in semi-supervised fuzzy clustering.
Usage
data(U_underimpact)
Format
A matrix of size 150 x 3.
Calculates data evidence matrix E from distances matrix D.
Description
Calculates data evidence matrix E from distances matrix D.
Usage
calculate_evidence(D)
Arguments
D |
Distances matrix of size N x c. |
Value
Matrix of size N x c.
Creates DHE (stands for "distances horizontally exploded") and DVE (stands for "distances vertically exploded") matrices.
Description
Creates DHE (stands for "distances horizontally exploded") and DVE (stands for "distances vertically exploded") matrices.
Usage
dheve(A, vertical)
Arguments
A |
Matrix of size N x c. |
vertical |
Boolean switch.
If |
Value
Matrix of size Nc x c
Estimated T matrix with typicalities in unsupervised case.
Description
Estimated T matrix with typicalities in unsupervised case.
Usage
estimate_T(D, gammas)
Arguments
D |
Distances matrix of size N x c. |
gammas |
a c-vector of cluster-specific gamma hyperparameters. |
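For orientation, the classic unsupervised typicality update of Krishnapuram and Keller (1993) can be sketched in base R as follows. This assumes a fuzzifier of m = 2 and unsquared input distances; typicality_sketch is a hypothetical name, and estimate_T() may differ in its details.

```r
# Sketch of the classic PCM typicality update (fuzzifier m = 2 assumed).
# D: N x c matrix of unsquared distances; gammas: length-c vector.
typicality_sketch <- function(D, gammas) {
  # Divide column k of the squared distances by gammas[k]
  ratio <- sweep(D^2, 2, gammas, "/")
  1 / (1 + ratio)
}

D_demo <- matrix(c(0, 1, 2, 2, 1, 0), ncol = 2)
typicality_sketch(D_demo, gammas = c(1, 4))
```

Unlike fuzzy memberships, the rows of the result need not sum to 1: each typicality depends only on the distance to its own prototype and on that cluster's gamma.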
Estimated U matrix with memberships in semi-supervised case.
Description
Estimated U matrix with memberships in semi-supervised case.
Usage
estimate_U(D, superF, alpha)
Arguments
D |
Distances matrix of size N x c. |
superF |
Binary supervision matrix of size N x c. |
alpha |
Scaling factor, a floating point > 0 regulating the impact of partial supervision. |
Equation to calculate clusters' prototypes matrix \hat{V}.
Description
Equation to calculate clusters' prototypes matrix \hat{V}.
Usage
estimate_V(Phi, X)
Arguments
Phi |
Matrix with weights of size N x c. |
X |
Matrix with predictors of size N x p. |
Value
Clusters' prototypes matrix of size c x p.
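Given the stated dimensions (Phi is N x c, X is N x p, the result is c x p), a natural reading is that each prototype is a Phi-weighted mean of the observations. The base-R sketch below shows that computation under this assumption; prototypes_sketch is a hypothetical name and is not guaranteed to match estimate_V() exactly.

```r
# Weighted-mean prototypes: row k is the Phi[, k]-weighted mean of the rows of X.
prototypes_sketch <- function(Phi, X) {
  # t(Phi) %*% X is c x p; dividing by colSums(Phi) rescales row k by its total weight
  (t(Phi) %*% X) / colSums(Phi)
}

Phi_demo <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 2)
X_demo <- matrix(c(0, 2, 10, 12, 1, 3, 5, 7), ncol = 2)
prototypes_sketch(Phi_demo, X_demo)
```

With these crisp weights the first prototype is the mean of observations 1 and 2 and the second is the mean of observations 3 and 4.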
Estimated T matrix with typicalities in semi-supervised case.
Description
Estimated T matrix with typicalities in semi-supervised case.
Usage
estimate_super_T(D, superF, alpha, gammas, b = 1)
Arguments
D |
Distances matrix of size N x c. |
superF |
Binary supervision matrix of size N x c. |
alpha |
Scaling factor, a floating point > 0 regulating the impact of partial supervision. |
gammas |
a c-vector of cluster-specific gamma hyperparameters. |
b |
a scalar weighting the contribution of possibilistic membership in SPFCM (semi-supervised possibilistic fuzzy c-means) model. It is set to 1 by default for other semi-supervised models. |
Aggregates elements of DHE and DVE matrices in a step to build evidence matrix E.
Description
Aggregates elements of DHE and DVE matrices in a step to build evidence matrix E.
Usage
gamma_fcm(dhe, dve)
Arguments
dhe |
DHE matrix of size Nc x c. |
dve |
DVE matrix of size Nc x c. |
Value
Matrix of size Nc x 1.
Initialization procedure to calculate values of gamma hyperparameters.
Description
Initialization procedure to calculate values of gamma hyperparameters.
Usage
init_gamma(.model, .X)
Arguments
.model |
estimated model of class |
.X |
features matrix of size N x p |
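A common way to initialize the gamma hyperparameters, following Krishnapuram and Keller (1993), is a membership-weighted average of squared distances within each cluster. The sketch below assumes a fuzzifier of m = 2 and a scaling constant K = 1; gamma_sketch, U and D are illustrative names, and init_gamma() may compute the values differently.

```r
# Weighted-average gamma initialization (m = 2 and K = 1 assumed).
# U: N x c membership matrix; D: N x c matrix of unsquared distances.
gamma_sketch <- function(U, D, K = 1) {
  W <- U^2
  K * colSums(W * D^2) / colSums(W)
}

U_demo <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), ncol = 2)
D_demo <- matrix(c(1, 3, 5, 5, 5, 5, 2, 2), ncol = 2)
gamma_sketch(U_demo, D_demo)
```

The result is one gamma per cluster, here the average squared distance of each cluster's own members to its prototype.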
Predict method for ssfcm objects
Description
Predicts cluster memberships for new observations using a fitted Semi-Supervised Fuzzy C-Means model.
Usage
## S3 method for class 'ssfcm'
predict(object, X, ...)
Arguments
object |
An object of class |
X |
A numeric matrix of new observations with |
... |
Additional arguments. Currently ignored. |
Value
A matrix of size N \times C containing predicted
cluster memberships, where C is the number of clusters.
Examples
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_ssfcm <- SSFCM(
X = X,
C = 2,
superF = superF,
alpha = 1
)
predict(model_ssfcm, matrix(rnorm(2), ncol = 2))
Predict method for sspcm objects
Description
Predicts cluster memberships for new observations using a fitted Semi-Supervised Possibilistic C-Means model.
Usage
## S3 method for class 'sspcm'
predict(object, X, ...)
Arguments
object |
An object of class |
X |
A numeric matrix of new observations with |
... |
Additional arguments. Currently ignored. |
Value
A matrix of size N \times C containing predicted
cluster memberships, where C is the number of clusters.
Examples
X <- matrix(rnorm(100), ncol = 2)
superF <- matrix(0, nrow = nrow(X), ncol = 2)
superF[1:10, 1] <- 1
superF[11:20, 2] <- 1
model_sspcm <- SSPCM(
X = X,
C = 2,
superF = superF,
initFCM = TRUE,
alpha = 1
)
predict(model_sspcm, matrix(rnorm(2), ncol = 2))
Binary supervision structure to reconstruct the issue of underimpact of partial supervision.
Description
This dataset provides a concrete supervision structure:
- 'superF' matrix of size 150 x 3 with partial supervision,
- 'ind' vector with indices of unsupervised observations,
- 'tind' vector with indices of observations selected to be in the test dataset,
- 'tclass' vector with class membership of the observations selected to be in the test dataset.
This supervision structure is meant to reproduce a particular realization of the phenomenon of underimpact of partial supervision specific to the iris dataset.
Usage
data(superFstruct_underimpact)
Format
A list with: a matrix of size 150 x 3, and three vectors.
Rearranges elements of input matrix from a block matrix with vertical blocks (column vectors) to a block matrix with horizontal blocks (row vectors).
Description
Rearranges elements of input matrix from a block matrix with vertical blocks (column vectors) to a block matrix with horizontal blocks (row vectors).
Usage
xi_fcm(A, c)
Arguments
A |
Matrix of size Nc x 1. |
c |
Number of columns in the wanted matrix. Associated with the number of clusters. |
Value
Matrix of size N x c.
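The rearrangement described above can be pictured with a plain reshape. The sketch assumes that the Nc x 1 input stacks N consecutive blocks of length c, one block per observation, so that each block becomes one row of the result; the actual ordering used by xi_fcm() is not stated here, and xi_sketch is a hypothetical name.

```r
# Reshape a stacked Nc x 1 column into an N x c matrix, one block per row.
# Assumption: entries are ordered observation by observation.
xi_sketch <- function(A, c) {
  matrix(as.vector(A), ncol = c, byrow = TRUE)
}

A_demo <- matrix(1:6, ncol = 1)  # N = 3 blocks, each of length c = 2
xi_sketch(A_demo, c = 2)
```

Under that assumption, entries 1 and 2 form the first row, entries 3 and 4 the second, and entries 5 and 6 the third.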