Week-7 (10/23) K-Means.

Lemniscate

In quantitative analysis, a lemniscate is a mathematical curve that resembles a figure-eight (∞).

In data analysis, it can describe a relationship between two variables whose plot traces a balanced, symmetric figure-eight. This shape signals a complex, entangled relationship between the variables, which frequently requires specialized approaches for proper modeling and interpretation. It is an important idea to understand when dealing with non-linear relationships and improving the accuracy of statistical models across a variety of analytical domains.
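As a concrete example, the lemniscate of Bernoulli (one common figure-eight curve) can be traced from its standard parametric equations. This is a minimal sketch; the scale parameter `a` is arbitrary:

```python
import numpy as np

# Parametric form of the lemniscate of Bernoulli:
#   x(t) = a*cos(t) / (1 + sin(t)^2)
#   y(t) = a*sin(t)*cos(t) / (1 + sin(t)^2)
# 'a' controls the half-width of the figure-eight.
a = 1.0
t = np.linspace(0, 2 * np.pi, 400)
x = a * np.cos(t) / (1 + np.sin(t) ** 2)
y = a * np.sin(t) * np.cos(t) / (1 + np.sin(t) ** 2)

# The curve is symmetric about both axes, which produces the ∞ shape;
# plotting (x, y) with matplotlib would show the two lobes.
```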

DBSCAN

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a popular unsupervised clustering algorithm in data analysis.

It groups data points according to their density, defining clusters as regions with many nearby points and labeling isolated points as noise. It does not require specifying the number of clusters in advance, making it suitable for a wide range of data types and sizes. DBSCAN distinguishes between core points (in dense areas), border points (on cluster edges), and noise points. It handles irregularly shaped data effectively and is excellent for locating clusters in spatial or density-related datasets, such as identifying hotspots in geographic data.
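This behavior can be seen with scikit-learn's `DBSCAN`. The toy data and the `eps`/`min_samples` values below are illustrative choices, not recommendations:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus one isolated point.
dense_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
dense_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
outlier = np.array([[10.0, -10.0]])  # far from everything -> noise
X = np.vstack([dense_a, dense_b, outlier])

# eps: neighborhood radius; min_samples: points needed to form a dense core.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Noise points get the label -1; count only real clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Note that the number of clusters (2 here) falls out of the density structure; it was never passed in.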

Clustering

Clustering is a technique that groups data points based on their similarity. It aims to discover patterns, structures, or correlations in data. By combining related data points into clusters, it helps surface insights and trends and simplifies complex datasets and decision-making.

    1. K-Means: Divides data into ‘k’ clusters.
    2. Hierarchical: Organizes data into a tree-like structure.
    3. DBSCAN: Identifies clusters based on data point density.
    4. Mean Shift: Finds density peaks in the data. 
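All four methods listed above are available in scikit-learn, so they can be run side by side on the same data. This is a rough sketch on made-up toy data; the parameters are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, MeanShift

rng = np.random.default_rng(42)
# Two well-separated blobs of 25 points each.
X = np.vstack([rng.normal(0.0, 0.2, (25, 2)),
               rng.normal(3.0, 0.2, (25, 2))])

models = {
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "Hierarchical": AgglomerativeClustering(n_clusters=2),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "Mean Shift": MeanShift(),
}
# fit_predict returns one cluster label per data point for each method.
results = {name: m.fit_predict(X) for name, m in models.items()}
```

Note that K-Means and Hierarchical need the cluster count up front, while DBSCAN and Mean Shift infer it from the data's density.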

K-means

K-means is a popular clustering technique in machine learning; when K (the number of clusters) is set to 2, it partitions the data into two distinct groups.

The technique begins by selecting two starting cluster centers at random. It then computes the (typically Euclidean) distance between each data point and the cluster centers and assigns each point to the cluster with the closest center. This process is repeated until the assignment of points to clusters barely changes.

The approach attempts to minimize within-cluster variation by bringing data points inside the same cluster as close together as possible. It optimizes by recalculating cluster centers as the mean of each cluster's data points. Because K-means can converge to a local minimum, producing different results on different runs, it is typical to run it several times and choose the best result.
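The loop described above (assign points to the nearest center, then recompute each center as the cluster mean) can be sketched from scratch in NumPy. This is a minimal illustration of Lloyd's algorithm, not a production implementation (it omits the multiple-restart strategy mentioned above and does not guard against empty clusters):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means sketch: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center.
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # assignments stopped changing
            break
        centers = new_centers
    return centers, labels
```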

The K=2 case is handy for binary clustering, such as categorizing email as spam or not spam, in short a yes-or-no split. It is also used in market segmentation, detecting customer preferences, and any other situation where data naturally divides into two groups, making it a key tool.

With K=4, K-means separates a dataset into four independent clusters. It works by assigning each data point to the nearest cluster center based on similarity and then recalculating the centers as the mean of the data points in each cluster. This method is effective for categorizing data into four separate groups, which can help with customer segmentation, image compression, or any other situation where sorting data into four relevant categories is critical for analysis and decision-making.
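A K=4 run looks the same as K=2, just with a different cluster count. This sketch uses scikit-learn's `KMeans` on made-up toy data with four blobs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Four tight blobs of 20 points each, at the corners of a square.
blob_centers = np.array([[0, 0], [0, 8], [8, 0], [8, 8]])
X = np.vstack([rng.normal(c, 0.3, (20, 2)) for c in blob_centers])

# n_init=10 reruns the algorithm from 10 random starts and keeps the best,
# mitigating the local-minimum issue discussed above.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = km.labels_          # one of four cluster ids (0-3) per point
found_centers = km.cluster_centers_
```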
