An Introduction to Machine Learning

Mamta Singhal
Published in DataDrivenInvestor
5 min read · Nov 8, 2020

The words ‘machine learning’ can sound either very exciting or very complex to a beginner. For me, it went from something that looked complex on the surface to something exciting as I explored it over time. Before you start studying machine learning algorithms, it is important to understand what machine learning means in simple terms.

Machine learning has been growing steadily over the last two decades, and you have probably been using it without even knowing it. Tagging your friends in Facebook photos, searching the web and getting results ranked by page relevance, or having your email client, such as Yahoo or Gmail, move spam into the spam folder are all classic examples of machine learning.

In very simple terms, machine learning is the science of getting computers to learn what to do without being explicitly programmed.

Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms studied in the machine learning field: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

Take the example of your email client filtering out spam. When you click the Spam button to report an email, the client learns to better filter out similar emails, including ones you haven’t reported yourself. In this example, the task T is filtering out spam, the experience E is watching you label emails as spam or not spam, and the performance measure P is the fraction of emails correctly classified as spam or not spam. The performance P improves with experience E.
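To make T, E, and P a little more concrete, here is a minimal sketch in Python. The word-counting filter, the function names, and the example emails are all made up for illustration; real email clients use far more sophisticated models. The idea is only to show P being measured before and after the filter sees some experience E.

```python
from collections import Counter

def train(labeled_emails):
    """Experience E: count how often each word appears in spam and in non-spam emails."""
    spam_words, ham_words = Counter(), Counter()
    for text, reported_spam in labeled_emails:
        target = spam_words if reported_spam else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words

def is_spam(text, model):
    """Task T: flag an email as spam when its words were seen more often in spam."""
    spam_words, ham_words = model
    words = text.lower().split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score

def accuracy(model, test_emails):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(is_spam(text, model) == label for text, label in test_emails)
    return correct / len(test_emails)

# Hypothetical emails the user has labeled (True means "reported as spam").
reports = [
    ("win a free prize now", True),
    ("meeting agenda for monday", False),
    ("claim your free money", True),
    ("lunch on monday", False),
]

# Hypothetical emails used only to measure performance.
test_emails = [
    ("free prize inside", True),
    ("free money waiting", True),
    ("monday meeting notes", False),
    ("agenda for lunch", False),
]

p_before = accuracy(train([]), test_emails)       # no experience yet
p_after = accuracy(train(reports), test_emails)   # after seeing the reports
print(p_before, p_after)
```

On this toy data, the untrained filter marks nothing as spam, so it gets only the two non-spam emails right. After training on the four reports it gets all four test emails right, which is exactly P improving with E.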

In this blog, I will present a basic understanding of the main types of machine learning algorithms. Machine learning tasks are broadly classified into two types: supervised learning and unsupervised learning. In supervised learning, we teach the machine using examples with known answers; in unsupervised learning, the machine finds patterns in the data by itself. There are other types of machine learning that I will cover in one of my next blogs, but for now, let's just learn about these two.

Supervised Learning

In supervised learning, we are given a data set and already know what the correct output should look like, on the assumption that there is a relationship between the input and the output. The goal of supervised learning is to learn the relationship between the input and output values and use it to predict the output for new input values.

Supervised learning problems are categorized into “regression” and “classification” problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1:

If we are given data about the size of houses on the real estate market, we can try to predict their prices. Price as a function of size is a continuous output, so this is a regression problem.
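As a rough sketch of what this regression could look like in Python, the snippet below fits a straight line to a handful of houses using ordinary least squares. The sizes, prices, and the function name fit_line are invented for illustration, and a straight line is only an assumption about how price depends on size.

```python
def fit_line(sizes, prices):
    """Least-squares fit of price = slope * size + intercept."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    # Covariance of size and price, and variance of size (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data: house size in square feet and sale price.
sizes = [1000, 1500, 2000, 2500]
prices = [200000, 290000, 410000, 500000]

slope, intercept = fit_line(sizes, prices)

# Predict the price of a house we have not seen before.
predicted_price = slope * 1800 + intercept
print(round(predicted_price))
```

The prediction can be any number on a continuous scale, which is what makes this a regression problem rather than a classification problem.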

We could turn this example into a classification problem by instead making our output about whether the house “sells for more or less than the asking price.” Here we are classifying the houses based on price into two discrete categories.
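Here is a small Python sketch of that classification version. It uses a nearest-neighbor rule, which is just one simple choice of classifier that I picked for illustration; the house sizes, the labels, and the function name are all hypothetical.

```python
def nearest_neighbor_label(size, labeled_houses):
    """Return the label of the training house whose size is closest to `size`."""
    closest_size, closest_label = min(
        labeled_houses, key=lambda house: abs(house[0] - size)
    )
    return closest_label

# Made-up training data: (size in square feet, sold above or below asking price).
labeled_houses = [
    (900, "below"),
    (1100, "below"),
    (1900, "above"),
    (2300, "above"),
]

label_small = nearest_neighbor_label(1000, labeled_houses)
label_large = nearest_neighbor_label(2000, labeled_houses)
print(label_small, label_large)
```

Notice that the output is now one of two fixed categories, "above" or "below", instead of a number on a continuous scale.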

Example 2:

(a) Regression — Given a picture of a person, we have to predict their age based on the given picture.

(b) Classification — Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

For more examples of regression and classification, you may want to visit my blogs on simple regression and decision trees, respectively.

Unsupervised Learning

Unsupervised learning allows us to approach problems without any given output. We are just given unlabeled data and can derive structure from it, even though we don’t necessarily know the effect of the variables on the results.

That means in unsupervised learning, we’re given data that looks different from the data in supervised learning: it either has no labels at all or every point has the same label. So we’re given the data set, and we’re not told what to do with it or what each data point is. Instead, we’re just told: here is a data set. Can you find some structure in the data?

Let's take the example of social network analysis. Given knowledge about which friends you email the most, or given your Facebook friends, we can automatically identify cohesive groups of friends, that is, groups of people who all know each other.

Another example is market segmentation. Many companies have huge databases of customer information. Can you look at this customer data set and automatically discover market segments, grouping your customers into those segments so that you can sell or market to each segment more efficiently?

Again, this is unsupervised learning because we have all this customer data, but we don’t know in advance what the market segments are, and we don’t know in advance which customers belong to segment one, which to segment two, and so on. We have to let the algorithm discover all this just from the data.

All of these are examples of clustering, which is just one type of Unsupervised Learning.
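To give a feel for how clustering can discover segments on its own, here is a minimal Python sketch of k-means, one common clustering algorithm, applied to a single made-up feature (annual spend per customer). The data, the starting centers, and the function name k_means_1d are assumptions for illustration only; real segmentation would use many more customers and features.

```python
def k_means_1d(values, centers, iterations=10):
    """Simple k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up annual spend per customer, with no labels attached.
annual_spend = [120, 150, 130, 900, 950, 1000]

centers, segments = k_means_1d(annual_spend, centers=[100, 500])
print(segments)
```

We never told the algorithm which customer belongs to which segment. It only received the numbers and a guess for two starting centers, and on this toy data it separates the low spenders from the high spenders by itself.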

In my next blogs, I will try to cover more about the other types of machine learning.
