Article published in: Scientific journal «Студенческий» No. 24(44)

Journal section: Information Technology


Bibliographic description:

Akhmedyarov R., Kouros B. OVERVIEW OF MACHINE LEARNING ALGORITHMS // Студенческий: electronic scientific journal. 2018. No. 24(44). URL: https://sibac.info/journal/student/44/124920 (accessed: 10.08.2025).

OVERVIEW OF MACHINE LEARNING ALGORITHMS

Akhmedyarov Rustem

Master Student, Department of Computer Engineering and Telecommunications, IITU,

Republic of Kazakhstan, Almaty

Kouros Basiri

Associate professor, Department of Computer Engineering and Telecommunications, IITU,

Republic of Kazakhstan, Almaty

Abstract. In this paper, we review machine learning algorithms and their use in everyday life. Although ML dates back to around 1960, it has grown rapidly only in recent decades. Today machine learning, data science and artificial intelligence are becoming more popular because they can be useful in all spheres of people’s lives.

Machine learning algorithms can be used to solve different kinds of tasks, such as classification and regression. Today ML algorithms are widely used in various areas of our lives. Thanks to machine learning, a self-driving car is no longer science fiction. Machine learning algorithms are also useful in prediction, recommendation systems and image analysis.

Introduction

Machine learning is an exciting part of computer science and engineering. ML can extract meaningful information by discovering hidden patterns in data. Thanks to ML algorithms, computers are able to learn and improve their performance. Even more exciting is that in some cases a computer can find more useful information in patterns than humans can. Today there are large amounts of data that machine learning can readily convert into knowledge. Thanks to the development of open source machine learning libraries, the entry threshold to this field of computer science is lower than it was many years ago.

In this work we review the most popular and useful algorithms and consider cases in which such algorithms are used in real life.

Machine learning algorithms are usually divided into two main groups: supervised and unsupervised. Supervised algorithms require a data scientist with an understanding of machine learning to provide the correct inputs together with the desired outputs. The trainer decides which features the model should take into account and use to develop predictions.

Unsupervised algorithms do not need to be trained with desired outcome data. Instead, they work iteratively over the data to find structure and arrive at conclusions. A related iterative approach, deep learning, is usually applied to more complicated processing tasks than classical supervised learning systems handle, including image recognition, speech-to-text and natural language generation, and it can be trained with or without labels. Its neural networks work by combing through millions of examples of training data and automatically finding correlations between many elements.

Supervised algorithms

The main aim of supervised algorithms is to learn from a training dataset and then make predictions that give the desired output.

Supervised learning problems can be categorized as classification or regression. In classification, the desired output is one of a set of defined categories, such as “bird”, “cat” or “dog”. In regression, the algorithm returns a numerical target for each instance, such as how much revenue a new marketing campaign will generate.

Linear regression is one of the most popular and well-known supervised algorithms. It models the output as a weighted sum of the input variables. Predictive modeling is primarily concerned with minimizing the error of a model, that is, making the most accurate predictions possible, at the expense of explainability. To that end, machine learning borrows and reuses algorithms from many different fields, including statistics.
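As a minimal sketch of what minimizing the error means, a linear regression with a single input can be fitted in closed form by ordinary least squares. The function name `fit_line` and the sample data are illustrative assumptions and do not come from the original study.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance of x and y, and variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

The closed form only works when the inputs are not all identical, because otherwise the variance of x is zero.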

Logistic regression is a classification algorithm. It is similar to linear regression in that the aim is to find the values of the coefficients that weight each input variable. Unlike linear regression, the prediction for the output is transformed using a non-linear function called the logistic function. In its basic form, logistic regression predicts a dichotomous (two-class) outcome; multinomial logistic regression extends it to more than two classes.
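The transformation by the logistic function can be sketched as follows. The coefficients below are hypothetical values chosen for illustration; in practice they would be learned from training data, a step this sketch does not cover.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one instance."""
    # Weighted sum of the inputs, as in linear regression ...
    z = bias + sum(w * x for w, x in zip(weights, features))
    # ... followed by the non-linear logistic transformation.
    return logistic(z)


def predict_class(weights, bias, features, threshold=0.5):
    """Dichotomous prediction: 1 if the probability reaches the threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical coefficients, not fitted to any real data.
weights, bias = [1.5, -2.0], 0.25
print(predict_proba(weights, bias, [2.0, 1.0]))
print(predict_class(weights, bias, [2.0, 1.0]))
```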

Picture 1. Decision Tree representation

Decision Tree. Decision trees are an important type of predictive algorithm. Each internal node of the tree represents a split on an input variable, and the leaf nodes contain an output variable (y), which is the predicted category. Predictions are made by walking the splits of the tree until arriving at a leaf node and outputting the class value at that leaf node. Decision trees are fast to train and fast at making predictions.
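Walking the splits until a leaf is reached can be sketched with a tree stored as nested dictionaries. The tree below is built by hand with hypothetical feature names and thresholds; learning the splits from data is outside the scope of this sketch.

```python
def predict(node, instance):
    """Walk the splits from the root until a leaf node is reached."""
    while "label" not in node:
        # Go left when the tested feature is at or below the threshold.
        if instance[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


# Hypothetical hand-built tree; leaves carry the output category.
tree = {
    "feature": "weight_kg", "threshold": 1.0,
    "left": {"label": "bird"},
    "right": {
        "feature": "barks", "threshold": 0.5,
        "left": {"label": "cat"},
        "right": {"label": "dog"},
    },
}

print(predict(tree, {"weight_kg": 4.0, "barks": 0}))
```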

Naive Bayes is an algorithm used to solve classification tasks. It uses Bayes’ theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. The model is easy to build and is helpful when working with very large data sets.
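The independence assumption means that the score of a class is its prior probability multiplied by one likelihood per feature. Below is a minimal sketch for categorical features. The add-one (Laplace) smoothing and the toy weather data are assumptions added for illustration and are not taken from the original text.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior plus one log likelihood per feature (independence).
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("sunny", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train(rows, labels)
print(predict(model, ("rainy", "cool")))
```

Working with sums of logarithms rather than products of probabilities keeps the scores numerically stable when there are many features.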

K-Nearest Neighbors. It can be applied to both classification and regression problems, but it is more commonly used for classification. K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k neighbors: a case is assigned to the class most common among its K nearest neighbors, as measured by a distance function. KNN can require a lot of memory or space to store all of the data, but it only performs a calculation (or learns) when a prediction is needed, just in time. You can also update and curate your training instances over time to keep predictions accurate.
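The majority vote can be sketched in a few lines. Euclidean distance is assumed here as the distance function, and the points and class names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote of its k nearest training cases."""
    # No training step: all stored cases are ranked at prediction time.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical stored cases forming two groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
```

Because every prediction ranks all stored cases, the cost of a single query grows with the size of the training set, which is the memory and time trade-off described above.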

Unsupervised algorithms

Unsupervised learning is a class of machine learning techniques for finding patterns in datasets. The data given to an unsupervised algorithm are not labelled, which means only the input variables (X) are given, with no corresponding output variables. In unsupervised learning, the algorithms are left to themselves to discover interesting structures in the data.

K-means clustering is a type of unsupervised learning that is used when you have unlabeled data. The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Data points are clustered based on feature similarity. The results of the K-means clustering algorithm are:

The centroids of the K clusters, which can be used to label new data
Labels for the training data (each data point is assigned to a single cluster)

The K-means clustering algorithm is used to find groups which have not been explicitly labeled in the data. This can be used to confirm business assumptions about what types of groups exist or to identify unknown groups in complex data sets.
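The iterative assignment described above can be sketched as two alternating steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. In this sketch the initial centroids are passed in by the caller so that the result is deterministic; the function names and the data are illustrative assumptions.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, initial_centroids, max_iter=100):
    """Return the final centroids and a cluster label for every point."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical data with two visible groups, K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, labels = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(labels)
# The centroids can then label new data.
print(nearest((0.0, 0.5), centroids))
```

The result depends on the initial centroids, so practical implementations usually run the algorithm several times from different starting points.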

Hierarchical clustering involves creating clusters that have a predetermined ordering from top to bottom. There are two categories of hierarchical clustering: divisive and agglomerative. In the divisive, or top-down, method we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters. We then proceed recursively on each cluster until there is one cluster for each observation. In the agglomerative, or bottom-up, method we (1) assign each observation to its own cluster, (2) compute the similarity between each pair of clusters and (3) join the two most similar clusters. Finally, we repeat steps 2 and 3 until only a single cluster is left. The related algorithm is shown below.
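A minimal sketch of the agglomerative procedure follows. It measures similarity by Euclidean distance (smaller means more similar) and uses single linkage, the distance between the closest pair of members, to compare clusters. Both choices, as well as the optional `n_clusters` stopping point, are assumptions made for illustration.

```python
import math


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until `n_clusters` remain."""
    # Step 1: every observation starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        # Step 2: find the most similar (closest) pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Step 3: join that pair and record the merge.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical observations.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
```

The recorded merges, read in order, give the bottom-to-top ordering of the hierarchy.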

The Self-Organizing Map is one of the most popular neural network models. It belongs to the category of competitive learning networks. The Self-Organizing Map is based on unsupervised learning, which means that no human intervention is needed during the learning and that little needs to be known about the characteristics of the input data. We could, for example, use the SOM for clustering data without knowing the class memberships of the input data. The SOM can be used to detect features inherent to the problem and thus has also been called SOFM, the Self-Organizing Feature Map.

Dimensionality reduction. It looks a lot like compression: the aim is to reduce the complexity of the data while keeping as much of the relevant structure as possible. If you take a simple 128 x 128 x 3 image (height x width x RGB channels), that’s 49,152 dimensions of data. If you’re able to reduce the dimensionality of the space in which these images live without destroying too much of the meaningful content in the images, then you’ve done a good job at dimensionality reduction.
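The text does not name a specific technique, so the sketch below uses principal component analysis (PCA) as one common example. It finds the direction of greatest variance by power iteration and projects two-dimensional points onto it, reducing them to one dimension. The function names and data are hypothetical, and the starting vector is assumed not to be orthogonal to the principal direction.

```python
import math

# Dimension count of the image example: height x width x channels.
print(128 * 128 * 3)


def first_principal_component(rows, iterations=100):
    """Return the feature means and the unit direction of greatest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by the covariance and normalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(rows, means, v):
    """Reduce each row to a single coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v)))
            for r in rows]


# Hypothetical 2-D points lying close to a straight line.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
means, v = first_principal_component(rows)
reduced = project(rows, means, v)
print(v)
print(reduced)
```

Each point is now described by one number instead of two, and because the points lie close to a line, little of their structure is lost.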

ML algorithms usage examples

Online recommendations. Machine learning allows retailers and online services to offer you personalized recommendations based on your previous purchases or activity (Google search, Amazon, etc.).
Better customer service and delivery systems. In large companies where response time is limited by staff resources, machine learning can help ease some of the burden. Smart machines can decipher the intent and meaning behind emails and delivery notes to prioritise tasks and ensure sustained satisfaction.
Tracking price changes. The price of retail items tends to fluctuate over a certain period of time. Machine learning is helping ecommerce companies track patterns in these fluctuations and set their prices according to demand.
Voice recognition systems such as Siri and Cortana use machine learning and deep neural networks to imitate human interaction. As they progress, these apps will learn to ‘understand’ the nuances and semantics of our language.
Google Maps analyzes the speed of traffic using anonymous location data from smartphones. Using such data, Google can suggest the fastest routes.
PayPal uses machine learning algorithms to detect and combat fraud. By implementing deep learning techniques, PayPal can analyze vast quantities of customer data and evaluate risk in a far more efficient manner.

