Dimensionality Reduction

The dimension of a tabular dataset is the number of columns it has. If we have a CSV/XLSX file with 14 columns, we say the dataset is 14-dimensional.

At times it becomes difficult to deal with datasets that have a huge number of features/columns. The more dimensions there are, the longer algorithms take to run on the dataset and the higher their time and space complexity become, so performance suffers. This problem has a popular name: the "Curse of Dimensionality".
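To see this cost directly, here is a minimal timing sketch, assuming NumPy and scikit-learn are installed; the row count and column counts are arbitrary choices for illustration, and exact timings will vary by machine:

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n_rows = 2000

# The same nearest-neighbour query on the same number of rows gets
# slower as we add columns.
for n_cols in (10, 100, 1000):
    X = rng.random((n_rows, n_cols))
    start = time.perf_counter()
    NearestNeighbors(n_neighbors=5).fit(X).kneighbors(X)
    print(f"{n_cols:4d} columns: {time.perf_counter() - start:.3f} s")
```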

In such cases we try to shrink the dataset from its original number of columns to a smaller number, so that it becomes easier to handle and analyze.

Say we have a dataset of 30 columns. If we could somehow reduce it to, say, 15 columns without actually losing any information, that would make our life much easier. In machine learning there are certain "dimensionality reduction" techniques which do just that. In reality, however, reducing the number of dimensions (also called feature reduction) does cost us some loss of information, but we can afford that if the loss is negligible. For an even more intuitive understanding of this lowering of dimensions, I recommend the Essence of Linear Algebra video series by 3Blue1Brown on YouTube.
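To make the 30-to-15 example concrete, here is a minimal sketch assuming scikit-learn is available. The 30-column matrix is fabricated so that its columns are correlated, which is exactly the situation where dimensionality reduction loses little information:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Fabricated 30-column dataset: 10 independent columns plus 20 noisy
# linear combinations of them, so the information really lives in
# fewer directions than the raw column count suggests.
base = rng.random((200, 10))
X = np.hstack([base, base @ rng.random((10, 20)) + 0.01 * rng.random((200, 20))])

pca = PCA(n_components=15)
X_reduced = pca.fit_transform(X)

print(X.shape)                              # (200, 30)
print(X_reduced.shape)                      # (200, 15)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: little information lost
```

The explained-variance ratio is PCA's way of quantifying the "negligible loss" mentioned above: if the first 15 components capture, say, 99% of the variance, dropping down to 15 columns costs us almost nothing.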

A few of the popular methods for reducing dimensions are Principal Component Analysis (PCA), t-SNE (t-distributed Stochastic Neighbor Embedding), and feature selection. We have discussed each of these methods in detail in their respective articles.
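As a quick taste before those articles, here is a minimal t-SNE sketch, again assuming scikit-learn; the digits dataset is just a convenient built-in example of a higher-dimensional dataset:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# The digits dataset has 64 columns (one per pixel of an 8x8 image);
# t-SNE squeezes it down to 2 columns suitable for a scatter plot.
X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(X.shape, "->", X_2d.shape)  # (1797, 64) -> (1797, 2)
```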
