Machine Learning and Data Science have undoubtedly become two of the hottest topics in technology over the past few years. They can solve problems that, a decade or so ago, few would have imagined a computer could solve. Applications of Machine Learning include voice recognition, image recognition, malware filtering, online fraud detection, search engine result ranking, product recommendations, and more.
Machine Learning and AI are used in almost every aspect of the digital world, such as e-commerce, human resource management, healthcare, and cybersecurity.
Machine Learning algorithms are complex and involve a lot of mathematics, so writing them from scratch would be a tedious process.
Luckily for us, there are plenty of open-source libraries that can help us write our first Machine Learning project with ease. I will be listing only a few of the well-known and widely used libraries. Now, on to the list.
Python libraries for Machine Learning
Today we will be looking at the most widely used Python libraries for Machine Learning projects. I have chosen Python because it is one of the most popular open-source programming languages.
Most people prefer Python because it is easy to read, simple to understand, very powerful, has lots of Machine Learning libraries, and has a huge, active community.
New Python libraries are created every day, as the language is easy for even a “non-coder” to understand and use.
TensorFlow
TensorFlow is an open-source library created by Google Brain. It was open-sourced in 2015 and has since been well received by data scientists across the world.
TensorFlow was created for conducting machine learning and deep neural network research, but today it is used in a wide variety of other domains as well. It is a software library for numerical computation using data flow graphs.
There is no doubt that TensorFlow has become a favourite among beginners, as it is easy to use and provides multiple APIs (Application Programming Interfaces). However, every library has its pros and cons, and the same is the case with TensorFlow.
- Graph visualizations in TensorFlow are better than those of other libraries like Torch and Theano.
- Since it is backed by Google, TensorFlow has the advantage of frequent updates and new features.
- As of now, TensorFlow's GPU support is largely limited to Nvidia GPUs (via CUDA). Since Machine Learning tasks are complex, we often require GPUs to train our models in a reasonable time, so this is a disadvantage for users of other GPUs.
- TensorFlow has historically lagged behind some libraries in computational speed, and its early versions scaled less well beyond a single machine than Microsoft Cognitive Toolkit (CNTK), although distributed training support has improved in later releases.
- Its low-level API is not very user-friendly, which is one reason for the rise of libraries like Keras, which uses TensorFlow as its backend and is modular and extensible.
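To illustrate the "numerical computation using data flow graphs" idea described above, here is a minimal sketch using TensorFlow 2's eager-execution API; the tensor values are made up for illustration:

```python
import tensorflow as tf

# two constant tensors
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [0.5]])

# matrix multiplication, executed eagerly
# row 1: 1*1.0 + 2*0.5 = 2.0, row 2: 3*1.0 + 4*0.5 = 5.0
c = tf.matmul(a, b)
print(c.numpy())
```

In TensorFlow 2 this runs eagerly by default; the same operations can also be compiled into a graph with `tf.function` for performance.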
Keras
Keras is an open-source neural network library written in Python. It is very user-friendly and can run on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, etc. TensorFlow has included Keras in its core library since 2017, and since then Keras comes packaged together with TensorFlow when we install it on our computers.
In Keras, we do not need to create common network building blocks such as layers, activation functions, and optimizers from scratch. They are already implemented, and using them is as simple as calling a Python function with the right parameters.
Keras supports convolutional neural networks and recurrent neural networks in addition to standard neural networks, along with utility layers such as dropout regularisation, batch normalization, and pooling. It also allows training on Graphics Processing Units (GPUs, via CUDA) and Tensor Processing Units (TPUs), which is much faster than training on a Central Processing Unit (CPU).
Now, let us look at a simple Keras model, the Sequential model, and how to implement it as shown in the official Keras documentation.
```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()

# stacking layers
model.add(Dense(units=64, activation='relu', input_dim=100))
model.add(Dense(units=10, activation='softmax'))

# compiling
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# fitting the model
model.fit(x_train, y_train, epochs=5, batch_size=32)

# training on a single batch
model.train_on_batch(x_batch, y_batch)

# evaluation
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

# predictions
classes = model.predict(x_test, batch_size=128)
```
NumPy
NumPy is an array-processing package for Python, suited to mathematical and scientific computation on multidimensional arrays. Machine Learning requires computation on high-dimensional matrices and vectors, and NumPy executes these tasks much faster than plain Python code. It is the fundamental package for scientific computing with Python.
NumPy also provides linear algebra, Fourier transform, and random number capabilities. One of its most powerful features is that arbitrary data types can be defined, which allows NumPy to integrate seamlessly with a wide variety of databases. Now let us look at the most common array operations in NumPy:
- Array creation – Arrays are created in NumPy using numpy.array(), which can build arrays of any dimension.
- Fill Array with a range of values – To create an array and initialize it with evenly spaced values, we use numpy.arange().
- Addition of Arrays – To add two arrays or matrices, we can use the numpy.add() function.
- Subtraction of Arrays – To find the difference of two arrays or matrices element-wise, we can use the numpy.subtract() function.
- Multiply element-wise – To multiply two arrays or matrices element-wise, we can use the numpy.multiply() function.
- Dot product – To find the dot product of two matrices or vectors, we can use the numpy.dot() function.
Pandas
Pandas is a Berkeley Software Distribution (BSD)-licensed open-source library that provides easy-to-use data structures and data analysis tools for Python. Unlike R, Python was not designed primarily for data analysis and modeling, although it is a great language for data preparation. This is where pandas comes into play: it lets us carry out our entire data analysis workflow in Python, without switching to other languages like R for this purpose.
Pandas has been one of the key steps towards making Python a first-class statistical modeling language, alongside other tools such as statsmodels and scikit-learn. The main features of pandas are as follows.
- Pandas is a fast and efficient DataFrame object for data manipulation.
- It has different tools for reading and writing data in a variety of formats like CSV, text files, Microsoft Excel, SQL databases, and the most important of all, HDF5 format.
- It has integrated tools for handling missing data. Real-world datasets often contain missing values, in clusters and otherwise, and pandas provides tools to handle NaN values and missing data.
- Pandas can reshape and pivot datasets efficiently.
- We can insert and delete columns easily in Pandas.
- We can easily merge and join datasets as per our convenience.
- Pandas can group and resample time series data easily. Time series analysis is an important part of Deep Learning with Recurrent Neural Networks.
- Pandas is highly optimized for performance, with critical code paths written in Cython.
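A small sketch of a few of these features; the column names and values below are invented for illustration:

```python
import pandas as pd
import numpy as np

# a small DataFrame with a missing value
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "sales": [100, 200, np.nan, 400],
})

# handling missing data: replace NaN with 0
df["sales"] = df["sales"].fillna(0)

# inserting and deleting columns
df["tax"] = df["sales"] * 0.1
df = df.drop(columns=["tax"])

# grouping: total sales per city
totals = df.groupby("city")["sales"].sum()
print(totals)
```

The same groupby machinery works on datetime indexes (via resample), which is what makes pandas convenient for time series work.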
Scikit-learn
Scikit-learn is a free, open-source machine learning library built in Python. It features the various techniques and algorithms required for building machine learning projects: classification, regression, and clustering algorithms for both supervised and unsupervised learning, including support vector machines, random forests, k-means, and more.
Scikit-learn is designed to interoperate with numerical and scientific libraries such as NumPy and SciPy. It also works well with matplotlib, a visualization library.
Scikit-learn is built on top of some already existing libraries, so we need to ensure that the following packages are installed before we can install scikit-learn:
- NumPy
- SciPy
- matplotlib (for plotting purposes)
If we have the above-mentioned packages already installed in our system, we can install scikit-learn using pip or conda.
```shell
# with pip
pip install -U scikit-learn

# or with conda
conda install scikit-learn
```
Scikit-learn provides us with all sorts of algorithms that we need to build machine learning projects, so we do not need to write them from scratch, which would be very time-consuming. Instead, we can focus on other aspects, such as improving the accuracy of our model. In short, scikit-learn is the go-to package for all our machine learning needs. Here is a demonstration of a simple classifier built with scikit-learn.
```python
# splitting the dataset into train and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0, train_size=0.7)

# feature scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# training
from sklearn.svm import SVC
classifier = SVC(random_state=0)
classifier.fit(X_train, Y_train)
y_pred = classifier.predict(X_test)

# evaluation
from sklearn.metrics import confusion_matrix, f1_score
cm = confusion_matrix(Y_test, y_pred)
f1 = f1_score(Y_test, y_pred, average='micro')
print(cm)
print(f1)
```
To check out the entire project along with the dataset, head over to my GitHub repository.
This concludes our list of the “Top 5 Python Libraries for Machine Learning”. If you use other Python libraries, please let us know in the comments section so that we can add them to our list.