NumPy in Python: An essential tutorial for beginners

Mamta Singhal
Published in DataDrivenInvestor
9 min read · Oct 24, 2020

One of the main reasons Python is so widely used in machine learning and AI is its rich choice of libraries. These libraries give data scientists and developers a foundation to build on, so they don’t have to code their projects from scratch and can instead concentrate on complex algorithms and large datasets.

One such library is the “NumPy” library. It is short for Numerical Python and is the fundamental package for scientific computing with Python. NumPy is typically much faster than equivalent code written with plain Python lists and loops, because its core operations run in compiled code. It comes with a great number of built-in functions. It provides a high-performance multidimensional array object and tools for working with these arrays. These arrays are called NumPy arrays.

In this article, I am going to show you some ways you can work with the NumPy Arrays.

What is NumPy Array?

A NumPy array is a multi-dimensional grid of values, most often numerical ones (integers or floats). NumPy arrays are different from Python lists, which allow arbitrary data types in the same list: all elements of a NumPy array must be of the same data type (they are homogeneous). This means that an array holds either integers or floats, but not both at the same time. If you mix the two when creating an array, NumPy converts the integers to floats so that every element shares one type.
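Here is a small sketch of that homogeneity rule in action. The variable names are my own, chosen only for illustration:

```python
import numpy as np

# Mixing integers and a float: NumPy stores everything with one common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed)        # every value is stored as a float
print(mixed.dtype)  # float64

# A plain Python list, by contrast, keeps each element's own type.
py_list = [1, 2, 3.5]
print([type(v).__name__ for v in py_list])  # ['int', 'int', 'float']
```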

Installation of NumPy

To start working with NumPy arrays, we first have to install the NumPy package, as it doesn’t come with Python by default. Use pip to install it from your command prompt:

pip install numpy

To get started, we first have to import the NumPy library into our program.

import numpy as np

This statement loads the NumPy package into memory so that we can start using its functions. Here, we give NumPy a shortened name, ‘np’, to make our code easier to read and work with. Everywhere we would write ‘numpy.function’, we can now write ‘np.function’.

Getting Started

Let’s start working with NumPy arrays.

A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

For example, the coordinates of a point in 3D space [1, 2, 1] have one axis. That axis has 3 elements in it, so we say it has a length of 3. In the example below, the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3.

[[1 , 0 , 2],
[3 , 2 , 1]]

NumPy’s array class is called ndarray. It is also known by the alias array.
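To check the two examples above for yourself, you could build them as arrays and ask NumPy for their axes. This is only a sketch, and the names point and table are my own:

```python
import numpy as np

point = np.array([1, 2, 1])     # one axis of length 3
table = np.array([[1, 0, 2],
                  [3, 2, 1]])   # two axes, of length 2 and 3

print(point.ndim, point.shape)  # 1 (3,)
print(table.ndim, table.shape)  # 2 (2, 3)
print(type(table).__name__)     # ndarray
```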

Different ways of creating a NumPy array:

Create a 1D array:

import numpy as np
nump_arr = np.array([1,2,3])

This will create a one-dimensional array from an already existing Python list:

import numpy as np
py_arr = [1,2,3]
nump_arr = np.array(py_arr)

Create a 2D array:

import numpy as np
nump_arr = np.array([[1,2,3,4,5],
[6,7,8,9,10]])

Another way to create an array in NumPy is by using zeros, ones, or empty functions:

arr_zeros = np.zeros(3)
print(arr_zeros)

arr_ones = np.ones((3,2))
print(arr_ones)

arr_empty = np.empty(5)
print(arr_empty)

arr_full = np.full((2,3),4)
print(arr_full)

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is not initialized: it holds whatever happened to be in memory. By default, the dtype of an array created by these three functions is float64. The function full, as called here, creates a 2 * 3 array filled with 4s; its dtype is inferred from the fill value, which is why the 4s are printed as integers.

Output (the values printed for np.empty are arbitrary and will differ from run to run):
[0. 0. 0.]
[[1. 1.]
[1. 1.]
[1. 1.]]
[0. 0.25 0.5 0.75 1. ]
[[4 4 4]
[4 4 4]]
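All of these creation functions also accept a dtype argument, in case the default is not what you want. A minimal sketch, with variable names of my own choosing:

```python
import numpy as np

default_zeros = np.zeros(3)                    # float64 by default
int_zeros = np.zeros(3, dtype=int)             # ask for integers instead
int_full = np.full((2, 3), 4)                  # dtype inferred from the fill value 4
float_full = np.full((2, 3), 4, dtype=float)   # same shape, stored as floats

print(default_zeros.dtype, int_zeros.dtype)
print(int_full.dtype, float_full.dtype)
```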

Another cool feature is the ability to create different arrays like random arrays:

np.random.rand(2,3)

It will create a 2 * 3 array of random numbers between 0 and 1 (1 itself is never returned):

array([[0.32482924, 0.51608091, 0.21548059],
[0.6142247 , 0.2120958 , 0.39264305]])

while

np.random.rand(2,3)* 100

will create a 2 * 3 array of random numbers between 0 to 100.

Output:
array([[94.20171774, 19.99361621, 90.99119916],
[60.80955174, 48.8203693 , 84.60342955]])

You can also define the size of the array in a different way:

np.random.randint(10,size=(2,3))

It creates a 2 * 3 array of random integers between 0 and 9; the upper bound, 10, is not inclusive.

Output:
array([[5, 3, 3],
[5, 5, 5]])
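Random arrays change on every run, which can make results hard to reproduce. One option, assuming NumPy 1.17 or later, is the newer Generator interface created with np.random.default_rng, used here in place of the np.random.rand and np.random.randint calls above. The seed 42 is an arbitrary value picked for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)            # seeded generator (42 is arbitrary)
floats = rng.random((2, 3))                # floats in [0, 1)
ints = rng.integers(0, 10, size=(2, 3))    # integers from 0 to 9

# A second generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(42)
repeat = rng_again.random((2, 3))
print(np.array_equal(floats, repeat))      # True
```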

To create sequences of numbers, NumPy provides the arange function which is analogous to the Python built-in range but returns an array.

np.arange(0,200,10)
Output:
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120,
130, 140, 150, 160, 170, 180, 190])

This function returns a one-dimensional array with values from 0 up to 200, in steps of 10. Here it is again important to note that 200 is not inclusive.

When arange is used with floating-point arguments, it is generally not possible to predict the number of elements obtained, due to finite floating-point precision. For that case there is the function linspace, which takes the number of elements we want instead of the step:

np.linspace(0,2,9)
Output:
array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
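To see why linspace is the safer choice for floats, here is a small sketch. The start, stop, and step values are just ones I picked to illustrate the point:

```python
import numpy as np

# With a float step, the length of the result depends on rounding error,
# so the array may end up with one element more or fewer than you expect.
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped), stepped)

# linspace fixes the number of points up front; retstep=True also
# returns the spacing that NumPy computed between them.
points, step = np.linspace(1, 1.3, 4, retstep=True)
print(len(points), step)
```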

Inspecting Arrays

The more important attributes of an ndarray object are:

ndarray.ndim: the number of axes (dimensions) of the array.

ndarray.shape: the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, the shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

ndarray.size: the total number of elements of the array. This is equal to the product of the elements of shape.

ndarray.dtype: an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally, NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.

ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

ndarray.data: the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.

Now that you have created an array, you can inspect its size, shape, and data type through the above-mentioned attributes.

import numpy as np
a = np.arange(15).reshape(3, 5)
a
Output:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])

a.shape
Output:
(3, 5)

a.ndim
Output:
2

a.dtype.name
Output:
'int64'

a.itemsize
Output:
8

a.size
Output:
15

type(a)
Output:
<class 'numpy.ndarray'>

b = np.array([6, 7, 8])
b
Output:
array([6, 7, 8])

type(b)
Output:
<class 'numpy.ndarray'>

b = np.array([1.2, 3.5, 5.1])
b.dtype
Output:
dtype('float64')

If you need to convert the data type, you can use array.astype(dtype), and if you need to convert a NumPy array to a Python list, there is a command for that too: array.tolist().
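A short sketch of both conversions. The array prices is made up for the example:

```python
import numpy as np

prices = np.array([1.7, 2.2, 3.9])

as_ints = prices.astype(int)   # new array; the fractional part is dropped
as_list = prices.tolist()      # plain Python list of Python floats

print(as_ints)        # [1 2 3]
print(as_list)        # [1.7, 2.2, 3.9]
print(prices.dtype)   # float64, the original array is unchanged
```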

Indexing and Slicing

Indexing and slicing NumPy arrays works very similarly to working with Python lists:

array[0]

will return the element at index 0 (for a 2D array, the whole first row), and

array[2,4]

will return the element at row index 2 and column index 4 of a 2D array.

You can also select the first five elements, for example, by using a colon (:).

array[0:5]

will return the first five elements (index 0–4) and

array[0:5,4]

will return the elements in rows 0–4 of column 4 in a 2D array.

You can use

array[:2]

to get elements from the beginning until index 2 (not including index 2) or

array[2:]

to return the elements from index 2 until the end of the array.

array[:,1]

will return the elements at index 1 on all rows, again in a 2D array.

So, the indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in this example, it tells NumPy to include all rows.

Some more examples of indexing and slicing:

import numpy as np
def f(x,y):
    return 10*x+y

b = np.fromfunction(f,(5,4),dtype=int)
b
Output:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23],
[30, 31, 32, 33],
[40, 41, 42, 43]])
b[2,3]
Output:
23
b[0:5, 1] # each row in the second column of b
Output:
array([ 1, 11, 21, 31, 41])
b[ : ,1] # equivalent to the previous example
Output:
array([ 1, 11, 21, 31, 41])
b[1:3, : ] # each column in the 2nd and 3rd row of b
Output:
array([[10, 11, 12, 13],
[20, 21, 22, 23]])

Shaping and Sorting

To sort a NumPy array, we use the np.sort(array, axis, kind, order) function, which returns a sorted copy of the array.

Example:

np.sort(array1,axis = 1, kind = 'mergesort') # sort the array along axis 1

Some more examples:

import numpy as np
a = np.array([[1,9,3],
[8,5,6],
[3,4,2],
[4,3,7]]) # Creates 2D Array
np_sorted = a[:,a[1].argsort()] # Reorders the columns so that the second row is sorted
print(np_sorted)
Output:
[[9 3 1]
[5 6 8]
[4 2 3]
[3 7 4]]
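The axis argument decides which direction gets sorted, and there is also an in-place variant, the sort() method of the array itself. A minimal sketch with a small array of my own:

```python
import numpy as np

a = np.array([[1, 9, 3],
              [8, 5, 6]])

rows_sorted = np.sort(a, axis=1)   # sort within each row, returns a copy
cols_sorted = np.sort(a, axis=0)   # sort within each column, returns a copy
print(rows_sorted)                 # [[1 3 9] [5 6 8]]
print(cols_sorted)                 # [[1 5 3] [8 9 6]]

untouched = a.copy()               # np.sort left a as it was
a.sort(axis=1)                     # the method sorts a itself, in place
print(a)                           # [[1 3 9] [5 6 8]]
```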

Some functions used in shaping are:

a.flatten() or a.ravel()

It will flatten a 2-dimensional array to a 1-dimensional array. Here, flatten() always returns a copy of the array, while ravel() returns a view of the original array whenever possible. If you modify the array returned by ravel(), it may modify the entries in the original array, but if you modify the entries in an array returned from flatten(), this will never happen. ravel() will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
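The difference is easy to see by writing into both results. This sketch uses a small contiguous array, for which ravel() does return a view:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

flat_copy = a.flatten()   # independent copy
flat_view = a.ravel()     # view onto the same memory for this array

flat_copy[0] = 100        # does not touch a
flat_view[1] = 200        # writes through to a[0, 1]

print(a)                               # [[  1 200] [  3   4]]
print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True
```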

And to give the array a new shape, you can use

array.reshape(x,y)

It returns the array reshaped to x rows and y columns; x * y must equal the number of elements in the array.

Example:

import numpy as np
a = np.arange(1001,1241,10).reshape(6,4)
a
Output:
array([[1001, 1011, 1021, 1031],
[1041, 1051, 1061, 1071],
[1081, 1091, 1101, 1111],
[1121, 1131, 1141, 1151],
[1161, 1171, 1181, 1191],
[1201, 1211, 1221, 1231]])

This creates the values from 1001 up to (but not including) 1241 in steps of 10 and reshapes them into a 6 * 4 array.
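Two details of reshape are worth trying out: you can pass -1 for one dimension and let NumPy work it out, and a shape that does not match the number of elements raises an error. A small sketch, using an array of 24 elements like the one above:

```python
import numpy as np

a = np.arange(24)

b = a.reshape(6, -1)   # -1 means "infer this dimension", here 4
print(b.shape)         # (6, 4)

try:
    a.reshape(5, 5)    # 25 slots for 24 elements cannot work
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```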

Joining and Splitting

You can use np.concatenate((array1,array2),axis=0) to combine two NumPy arrays: this will add array 2 as rows to the end of array 1, while np.concatenate((array1,array2),axis=1) will add array 2 as columns to the end of array 1. np.split(array,2) will split the array into two equal sub-arrays, and np.hsplit(array,5) will split the array horizontally (column-wise) into five equal sub-arrays; to split at the 5th index instead, pass a list of indices, as in np.hsplit(array,[5]).
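Here is how joining and splitting look on two small arrays. The arrays and the split positions are my own choices for this sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

stacked_rows = np.concatenate((a, b), axis=0)   # shape (4, 3)
stacked_cols = np.concatenate((a, b), axis=1)   # shape (2, 6)

top, bottom = np.split(stacked_rows, 2)         # two equal parts along axis 0
thirds = np.hsplit(stacked_cols, 3)             # an integer: three equal column blocks
left, right = np.hsplit(stacked_cols, [2])      # a list: cut before column index 2

print(top.shape, bottom.shape)     # (2, 3) (2, 3)
print([t.shape for t in thirds])   # [(2, 2), (2, 2), (2, 2)]
print(left.shape, right.shape)     # (2, 2) (2, 4)
```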

Adding and Removing Elements

There are, of course, commands to add and remove elements from NumPy arrays. Each of them returns a new array and leaves the original unchanged:

  • np.append(array, values) will append values to the end of the array (the array is flattened first unless an axis is given)
  • np.insert(array, 4, values) will insert values into the array before index 4
  • np.delete(array, 3, axis=0) will delete the row at index 3 of the array
  • np.delete(array, 6, axis=1) will delete the column at index 6 of the array
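The commands above can be tried on a tiny array like this. It is only a sketch, and the indices are smaller than in the list so that they fit a 2 * 3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

appended_flat = np.append(arr, [7, 8])             # no axis: result is 1D
appended_row = np.append(arr, [[7, 8, 9]], axis=0) # add a third row
inserted = np.insert(np.array([10, 20, 30]), 1, 15)  # insert 15 before index 1
without_row = np.delete(arr, 0, axis=0)            # drop row 0
without_col = np.delete(arr, 1, axis=1)            # drop column 1

print(appended_flat)   # [1 2 3 4 5 6 7 8]
print(inserted)        # [10 15 20 30]
print(without_col)     # [[1 3] [4 6]]
print(arr.shape)       # (2, 3), arr itself was never modified
```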

Basic Arithmetic with NumPy Arrays

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the NumPy module:

import numpy as np
nump_arr = np.array([[1,2,3,4,5],
[6,7,8,9,10]])
nump_md_arr = np.array([[2,1,4,3,2],[2,3,4,2,2]])
nump_mul = nump_arr * 2
nump_mul
Output:
array([[ 2, 4, 6, 8, 10],
[12, 14, 16, 18, 20]])

nump_add_new = nump_arr + nump_md_arr
nump_add_new
Output:
array([[ 3, 3, 7, 7, 7],
[ 8, 10, 12, 11, 12]])

nump_sub_arr = nump_mul - nump_md_arr
nump_sub_arr
Output:
array([[ 0, 3, 2, 5, 8],
[10, 11, 12, 16, 18]])

Basic arithmetic functions in NumPy:

np.add(array ,1) will add 1 to each element in the array and np.add(array1,array2) will add array 2 to array 1 element by element. The same is true of np.subtract(), np.multiply(), np.divide() and np.power(): all these commands work in exactly the same way as described above. You can also get NumPy to return different values from the array, like:

  • np.sqrt(array) will return the square root of each element in the array
  • np.sin(array) will return the sine of each element in the array
  • np.log(array) will return the natural log of each element in the array
  • np.abs(arr) will return the absolute value of each element in the array
  • np.array_equal(arr1,arr2) will return True if the arrays have the same elements and shape
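A few of these functions in action, on two small arrays I made up for the sketch:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([1.0, 2.0, 3.0])

plus_one = np.add(a, 1)        # same as a + 1
product = np.multiply(a, b)    # same as a * b, element by element
squares = np.power(b, 2)       # same as b ** 2
roots = np.sqrt(a)             # square root of each element

print(plus_one)                    # [ 2.  5. 10.]
print(product)                     # [ 1.  8. 27.]
print(np.array_equal(roots, b))    # True
print(np.array_equal(squares, a))  # True
```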

It is also possible to round the values in an array: np.ceil(array) will round up to the nearest integer, np.floor(array) will round down to the nearest integer, and np.round(array) will round to the nearest integer (values exactly halfway between two integers are rounded to the nearest even one).
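The three rounding functions side by side, on values I picked to include a negative number and two halfway cases. Note that np.round sends a value that lies exactly halfway to the nearest even integer:

```python
import numpy as np

values = np.array([1.2, 2.5, -0.7, 3.5])

rounded_up = np.ceil(values)      # towards positive infinity
rounded_down = np.floor(values)   # towards negative infinity
rounded = np.round(values)        # nearest integer, halves go to the even one

print(rounded_up)
print(rounded_down)
print(rounded)
```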

You can use NumPy methods to get descriptive statistics on NumPy arrays:

  • np.mean(array,axis=0) will return mean along specific axis (0 or 1)
  • array.sum() will return the sum of the array
  • array.min() will return the minimum value of the array
  • array.max(axis=0) will return the maximum value of specific axis
  • np.var(array) will return the variance of the array
  • np.std(array,axis=1) will return the standard deviation of specific axis
  • np.corrcoef(array) will return the correlation coefficient matrix of the array
  • np.median(array) will return the median of the array elements
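To get a feel for the axis argument, here are these statistics on a small 2 * 3 array. The data is invented for the sketch; note that the correlation function is called as np.corrcoef and treats each row as one variable:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_means = np.mean(data, axis=0)   # one mean per column
total = data.sum()                  # sum of all elements
col_max = data.max(axis=0)          # one maximum per column
row_std = np.std(data, axis=1)      # one standard deviation per row
middle = np.median(data)            # median of all elements
corr = np.corrcoef(data)            # 2 x 2 matrix, one row/column per data row

print(col_means)   # [2.5 3.5 4.5]
print(total)       # 21
print(col_max)     # [4 5 6]
print(middle)      # 3.5
print(corr.shape)  # (2, 2)
```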

And this is just the tip of the iceberg. These are only a few of the features and functions, the ones I have tried to list here. NumPy is definitely a library worth exploring for budding Python programmers and learners like me :-). It has gained huge popularity and is considered one of the key Python libraries to use.

Originally published at https://www.numpyninja.com on October 24, 2020.
