K-means use cases

Mathew Thomas
Mar 15, 2022

Let’s get started!

Clustering is the process of splitting a population or set of data points into groups so that data points in the same group are more similar to each other than to data points in other groups. In other words, the goal is to sort data points with similar characteristics into clusters. The data points are unlabelled, so clustering relies on unsupervised machine learning algorithms. The fundamental logic behind a clustering algorithm is to assign each data point to a cluster by analysing its features. There are various types of clustering algorithm; this article discusses k-means.

 

Digging up the past.

James MacQueen coined the term "k-means" in 1967 in a paper titled "Some Methods for Classification and Analysis of Multivariate Observations." The standard algorithm was first proposed by Stuart Lloyd at Bell Labs in 1957 as a technique for pulse-code modulation. E. W. Forgy published essentially the same method in 1965, which is why it is commonly referred to as the Lloyd-Forgy algorithm.

 

K-Means?

“k” is a number: a variable that represents the number of clusters needed. For example, k = 2 refers to two clusters. The algorithm iteratively assigns each data point to one of the k groups based on the attributes provided. In the reference image below, k = 2 and two clusters have been found in the source dataset.

The outputs of executing a k-means on a dataset are:

  • k centroids: one centroid for each of the k clusters identified in the dataset.
  • Labels for the complete dataset: each data point is assigned to exactly one of the clusters.
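
To make the iterative assignment and the two outputs concrete, here is a minimal sketch of the standard algorithm in plain Python. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration, and the starting centroids are simply k data points picked at random; a production library would typically do more (smarter initialisation, several restarts).

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random (Forgy-style initialisation).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point gets the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # the k centroids
print(labels)     # one cluster index per data point
```

The function returns exactly the two outputs listed above: the list of centroids and a label for every data point. Which cluster receives index 0 depends on the random starting centroids, so only the grouping, not the numbering, is meaningful.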

 

Where can you see it?

The k-means clustering algorithm works well with a small number of dimensions when the features are numeric and continuous. It also works well in small scenarios where the data points are randomly distributed.

The following are some use cases of the k-means algorithm:

Document Classification:

Documents are grouped into several categories based on tags, subjects, and document content. This is a relatively common classification problem, and k-means is an excellent technique for it. Initial processing is required to represent each document as a vector, and term frequency is used to find the frequently used terms that help tell documents apart. The document vectors are then clustered to identify what the documents in each group have in common.
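
The vectorisation step described above could look roughly like the sketch below. The helper name `term_frequency_vectors`, the whitespace tokenisation, and the three sample documents are all assumptions for illustration; real pipelines usually also remove stop words and apply weighting such as TF-IDF.

```python
from collections import Counter


def term_frequency_vectors(documents):
    """Return (vocabulary, vectors), one relative term-frequency vector per document."""
    # Naive tokenisation: lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in documents]
    # Shared vocabulary, sorted so every vector uses the same column order.
    vocabulary = sorted({term for tokens in tokenised for term in tokens})
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[term] / total for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents.
documents = [
    "disk scheduling and disk caching",
    "deadlock and starvation",
    "disk caching",
]
vocabulary, vectors = term_frequency_vectors(documents)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Each document becomes a numeric vector of the same length, which is the form a k-means implementation expects as input.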

Delivery Store Optimization:

K-means can be combined with a genetic algorithm to optimise goods delivery using truck drones: k-means finds the ideal number of launch locations, and the genetic algorithm solves the truck route as a travelling salesman problem.

 

Identifying Crime Localities:

When crime data is available for specific locales within a city, clustering it can reveal crime-prone areas. The category of crime, the area of the crime, and the relationship between the two provide qualitative insight into which parts of a city or locality are most affected.

 

Insurance Fraud Detection:

Machine learning plays an important role in fraud detection and has a wide range of applications in the automotive, healthcare, and insurance industries. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that signal fraudulent patterns. Because insurance fraud can cost a company millions of dollars, the ability to detect it is critical.
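
The proximity check could be sketched as follows. Everything here is a simplifying assumption: the centroids are made-up values standing in for the result of running k-means on historical fraudulent claims, the two features and their 0-to-1 scaling are hypothetical, and the fixed distance threshold is only one of several possible ways to decide what counts as "close".

```python
import math


def nearest_centroid(claim, centroids):
    """Return (index, distance) of the centroid closest to `claim`."""
    distances = [math.dist(claim, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def flag_for_review(claim, fraud_centroids, threshold):
    """True when the claim lies within `threshold` of any fraud-cluster centroid."""
    _, distance = nearest_centroid(claim, fraud_centroids)
    return distance <= threshold


# Hypothetical centroids, as if k-means had been run on historical fraudulent claims.
# Assumed features, already scaled to 0..1: claim amount, days since policy start.
fraud_centroids = [(0.9, 0.1), (0.7, 0.05)]

close_claim = (0.85, 0.12)  # sits near the first fraud cluster
far_claim = (0.2, 0.8)      # far from both fraud clusters
print(flag_for_review(close_claim, fraud_centroids, threshold=0.1))
print(flag_for_review(far_claim, fraud_centroids, threshold=0.1))
```

A flagged claim is only a candidate for manual review; being near a fraud cluster does not by itself establish that the claim is fraudulent.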

