Data visualization

This notebook presents the most common use cases:

matplotlib: visualizing the results of computations

pandas: visualizing data loaded from files

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

Simple X[Y1, Y2, ...] plots

Used to visualize the results of computations.

x = np.arange(0,10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

y = x**2
y1 = x**1.8

# Pass x for each curve; plt.plot(x,y,y1) would plot y1 against its index instead
plt.plot(x,y,x,y1)

[<matplotlib.lines.Line2D at 0x7f88014232b0>,
 <matplotlib.lines.Line2D at 0x7f8801423470>]
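When several curves share one figure, a legend helps tell them apart. The following is a minimal sketch, assuming the same x, y and y1 as above; the label strings are illustrative choices, not part of the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x**2
y1 = x**1.8

plt.figure()
# One plot call per curve, each with its own label (labels are illustrative)
plt.plot(x, y, label="x**2")
plt.plot(x, y1, label="x**1.8")
# The legend is built from the labels given above
legend = plt.legend()
```

Plotting each curve in its own call keeps the x/y pairing explicit and gives every line a name the legend can use.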

Change styles

plt.plot(x,y,'r*')

[<matplotlib.lines.Line2D at 0x7f8801a884a8>]
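The style string combines a color, a marker and a line style, so 'r*' means red star markers with no connecting line. As a small sketch under that reading, the two calls below should produce the same look, once with a format string and once with keyword arguments; the particular style chosen here is only an example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x**2

plt.figure()
# Format string: 'g' = green, '--' = dashed line, 'o' = circle markers
line_fmt, = plt.plot(x, y, 'g--o')

plt.figure()
# Same style spelled out with keyword arguments
line_kw, = plt.plot(x, y, color='green', linestyle='--', marker='o')
```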

Zoom, title, axis labels

#Viewing specific portion of graph
plt.xlim(0,4)
plt.ylim(0,10)

#Specifying labels and title
plt.title("TITLE")
plt.xlabel('X LABEL')
plt.ylabel('Y LABEL')
plt.xticks([0,1,2,3,4],['A','B','C','D','E'])

plt.plot(x,y)

[<matplotlib.lines.Line2D at 0x7f88014630b8>]

Scatter plot to visualize features from two classes

feature = np.random.randint(0,10,(10,2))
labels = np.random.randint(0,2,10)
print(feature)
print(labels)

[[7 9]
 [6 7]
 [1 6]
 [0 3]
 [6 5]
 [5 9]
 [1 3]
 [6 1]
 [7 3]
 [4 2]]
[0 0 0 1 0 0 0 1 1 0]

Each point on the plot is (feature[sample,0],feature[sample,1])

#plt.plot(feature[:,0],feature[:,1],'*') #draws the same points, but plt.plot cannot color each point by its label
plt.scatter(feature[:,0],feature[:,1],c=labels)

<matplotlib.collections.PathCollection at 0x7f88010814e0>
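With c=labels the classes are colored but not named. One possible way to get a legend entry per class is to call plt.scatter once per class with a boolean mask. This is a sketch with freshly generated random data (seeded for repeatability), so the points differ from the output shown above; the "class N" label text is an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)             # seeded generator, illustrative data only
feature = rng.integers(0, 10, size=(10, 2))
labels = rng.integers(0, 2, size=10)

plt.figure()
classes = np.unique(labels)
for cls in classes:
    mask = labels == cls                   # select the samples of this class
    plt.scatter(feature[mask, 0], feature[mask, 1], label=f"class {cls}")
legend = plt.legend()
```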

Heatmaps

Used to visualize CNN filters

mat = np.arange(0,100).reshape(10,10)
mat

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

plt.imshow(mat)

<matplotlib.image.AxesImage at 0x7f880195bcf8>

plt.imshow(mat,cmap='coolwarm')
plt.colorbar()

<matplotlib.colorbar.Colorbar at 0x7f88018ea780>

#For non-square matrices, we may have to change the aspect to get the right visualization
mat2 = np.random.randint(0,101,(100,5))
plt.imshow(mat2)

<matplotlib.image.AxesImage at 0x7f880126d630>

#set correct aspect
plt.imshow(mat2,aspect=0.08)

<matplotlib.image.AxesImage at 0x7f8801188b38>
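Instead of tuning a numeric aspect by hand, imshow also accepts aspect="auto", which stretches the image to fill the axes regardless of the matrix shape. A minimal sketch, using a newly generated random matrix of the same shape as mat2:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)             # seeded generator, illustrative data only
mat2 = rng.integers(0, 101, size=(100, 5))

plt.figure()
# aspect="auto" lets the cells become non-square so the image fills the axes
im = plt.imshow(mat2, aspect="auto")
```

A numeric aspect gives control over the exact cell shape; "auto" is the simpler choice when only a readable overview is needed.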

Visualization using Pandas

Use pandas to visualize data loaded from files. You don't have to extract columns or set axis labels manually.

df = pd.read_csv('files/data_csv.csv')
df

df.plot(x='Age',y='Salary',kind='scatter',title="TITLE")

<matplotlib.axes._subplots.AxesSubplot at 0x7f88015e2208>

df.plot(x='Country',y='Salary',kind='bar',title="TITLE")

<matplotlib.axes._subplots.AxesSubplot at 0x7f88015a5240>
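The bar chart above draws one bar per row, so each country appears several times. If one bar per country is wanted, the values can be aggregated first. The sketch below rebuilds the Country and Salary columns from the table printed in this notebook (so it runs without the CSV file) and plots the mean salary per country; using the mean is an assumption about what summary is useful.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Country and Salary columns as shown in the notebook's df output
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# Mean salary per country; NaN entries are skipped by mean()
mean_salary = df.groupby("Country")["Salary"].mean()

# Series.plot returns the matplotlib Axes, which can be customized further
ax = mean_salary.plot(kind="bar", title="TITLE")
ax.set_ylabel("Mean salary")
```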

df1 = pd.read_csv("files/pima-indians-diabetes.csv")
df1.head()

Histograms

df1["Age"].hist(bins=20)

<matplotlib.axes._subplots.AxesSubplot at 0x7ff695179588>

The same histogram drawn directly with matplotlib

plt.hist(df1["Age"].values,bins=20)

(array([173., 127.,  96.,  61.,  41.,  51.,  47.,  39.,  34.,  18.,  21.,
         13.,  15.,  11.,   8.,   8.,   3.,   1.,   0.,   1.]),
 array([21., 24., 27., 30., 33., 36., 39., 42., 45., 48., 51., 54., 57.,
        60., 63., 66., 69., 72., 75., 78., 81.]),
 <a list of 20 Patch objects>)
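The tuple printed above is the return value of plt.hist: the count in each bin, the bin edges (one more than the number of bins) and the drawn bar patches. In the output above the counts sum to 768 and the edges run from 21 to 81 in steps of 3. The sketch below unpacks the same three values and compares them with np.histogram; the ages are synthetic random integers, not the dataset values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
ages = rng.integers(21, 82, size=768)      # synthetic stand-in for df1["Age"].values

plt.figure()
# counts: samples per bin, edges: bin boundaries, patches: the drawn bars
counts, edges, patches = plt.hist(ages, bins=20)

# np.histogram computes the same counts and edges without drawing anything
np_counts, np_edges = np.histogram(ages, bins=20)
```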

Output of df (files/data_csv.csv):
	Country	Age	Salary	Purchased
0	France	44.0	72000.0	No
1	Spain	27.0	48000.0	Yes
2	Germany	30.0	54000.0	No
3	Spain	38.0	61000.0	No
4	Germany	40.0	NaN	Yes
5	France	35.0	58000.0	Yes
6	Spain	NaN	52000.0	No
7	France	48.0	79000.0	Yes
8	Germany	50.0	83000.0	No
9	France	37.0	67000.0	Yes

Output of df1.head() (files/pima-indians-diabetes.csv):
	Number_pregnant	Glucose_concentration	Blood_pressure	Triceps	Insulin	BMI	Pedigree	Age	Class	Group
0	6	0.743719	0.590164	0.353535	0.000000	0.500745	0.234415	50	1	B
1	1	0.427136	0.540984	0.292929	0.000000	0.396423	0.116567	31	0	C
2	8	0.919598	0.524590	0.000000	0.000000	0.347243	0.253629	32	1	B
3	1	0.447236	0.540984	0.232323	0.111111	0.418778	0.038002	21	0	B
4	0	0.688442	0.327869	0.353535	0.198582	0.642325	0.943638	33	1	C