DEFINITIONS
Inference – Using the trained ML model, deduce the class to which a test input belongs.
THEORY

Clearly, a line between the data points of the two classes (X’s and O’s) would serve as a reasonable divider. But what’s the equation of that line? And what does it look like in higher dimensions?

The goal is to find where to draw the thick red line in Fig. 2 above; in other words, we want to maximize the margin. The data points (X’s and O’s) closest to the thin red lines are called the support vectors.
Sklearn’s SVC supports several choices for the kernel function, including (see the snippet after this list):
1). Linear (‘linear’)
2). Polynomial (‘poly’)
3). Radial basis function (‘rbf’)
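To make the kernel choice concrete, the short snippet below shows how each kernel is selected. It is illustrative only: it uses the Sklearn SVC class that the IMPLEMENTATION section introduces, and the hyperparameter values shown are placeholders rather than tuned choices.

from sklearn import svm

# The kernel is chosen via the 'kernel' argument of SVC
linear_clf = svm.SVC(kernel='linear')
poly_clf = svm.SVC(kernel='poly', degree=3)        # polynomial kernel of degree 3
rbf_clf = svm.SVC(kernel='rbf', gamma='scale')     # radial basis function kernel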
Mathematically, we can write the SVM training equation, according to [1]:
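Written in its standard dual (Lagrange-multiplier) form, which should match the description of the terms below even if the notation in [1] differs slightly (here the $\lambda_i$ are the Lagrange multipliers and $C$ is the usual penalty bound on them, neither of which is named elsewhere in this article), the training problem is:

$$
\max_{\lambda}\;\sum_{i=1}^{n}\lambda_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lambda_i\lambda_j\,t_i t_j\,K(\mathbf{x}_i,\mathbf{x}_j)
\quad \text{subject to}\;\; 0 \le \lambda_i \le C,\;\; \sum_{i=1}^{n}\lambda_i t_i = 0 \tag{1}
$$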

In Eq. [1] above, K is the kernel function, x is a matrix containing the inputs we’d like to train on, and t represents the targets; the second (kernel) term is what implicitly maps the inputs into a higher-dimensional space where the classes may become linearly separable. We’ll use the Sklearn [2] library in Python to solve this equation for us. Other packages, such as cvxopt [3], accept the problem in a form similar to Eq. [1], which is simply the standard Lagrange-multiplier (dual) formulation.
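As an illustration of that last point, the sketch below shows one way the dual problem could be handed to cvxopt’s generic quadratic-programming solver. This is an assumption-laden sketch, not the method used in this tutorial: K is assumed to be a precomputed n-by-n kernel matrix, t a numpy vector of +/-1 labels, and the P, q, G, h, A, b names follow cvxopt’s qp() convention.

import numpy as np
from cvxopt import matrix, solvers

def svmDualQP(K, t, C=1.0):
    # Solve the dual SVM problem of Eq. [1] as a generic QP (illustrative sketch)
    n = len(t)
    P = matrix(np.outer(t, t) * K)                      # quadratic term: t_i t_j K(x_i, x_j)
    q = matrix(-np.ones(n))                             # linear term: -sum(lambda_i)
    G = matrix(np.vstack((-np.eye(n), np.eye(n))))      # encode 0 <= lambda_i <= C ...
    h = matrix(np.hstack((np.zeros(n), C*np.ones(n))))  # ... as G*lambda <= h
    A = matrix(t.astype(float).reshape(1, -1))          # equality constraint: sum(lambda_i*t_i) = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol['x'])                           # the Lagrange multipliers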
IMPLEMENTATION
1. Import Libraries
First, we import the sklearn, numpy, matplotlib, and math libraries into our Python program.
from sklearn import svm
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
import math
2. Load Data
Secondly, we’ll load the breast cancer data set and also calculate the number of data points we have. The dataset contains 569 samples.
dataset = load_breast_cancer()
sampleSize = dataset.data.shape[0]       # sample size
trainSize = math.floor(0.9*sampleSize)   # 90% of dataset is used for training
                                         # Thus, remaining 10% used for testing
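As an aside, if a shuffled (rather than sequential) split is preferred, scikit-learn provides a helper for this. The snippet below is an optional alternative and is not used in the rest of the example.

from sklearn.model_selection import train_test_split

# Optional alternative: randomized 90/10 split with a fixed seed for repeatability
XTrain, XTest, yTrain, yTest = train_test_split(
    dataset.data, dataset.target, test_size=0.1, random_state=0)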
3. Select Features
Next, we need to select a couple features to analyze.
# Choose fourth and fifth columns as features 1 and 2, respectively
# Off by one because of zero indexing
feat1Index = 3
feat2Index = feat1Index + 1
feat1Name = dataset['feature_names'][feat1Index]
feat2Name = dataset['feature_names'][feat2Index]
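As a quick sanity check, we can print the selected names; with the scikit-learn breast cancer data, indices 3 and 4 should correspond to the mean area and mean smoothness columns.

# Confirm which physical measurements were selected
print(feat1Name)   # expected: 'mean area'
print(feat2Name)   # expected: 'mean smoothness'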
4. Structure Data for SVM Input
Additionally, we’ll have three sets of variables housing our data to make the example clear. First, we’ll get all of the data, then we’ll designate about 90% of our data for training, and the rest will be reserved for testing. For analysis and plotting purposes later, we further split the data depending on whether the target is malignant or benign (XMal and XBen, respectively).
def sliceData(dataset, start, end, feat1Index, feat2Index):
    # Slices features and output arrays based on indices
    f1 = dataset.data[start:end, feat1Index]
    f2 = dataset.data[start:end, feat2Index]
    y = dataset.target[start:end]   # same as the outcome ("Correct Answers")
    return f1, f2, y

def separateFeaturesViaClasses(f1, f2, y):
    # Creates and returns TWO (2) separate input feature matrices, each
    # pertaining to one of the target classes, as well as
    # ONE (1) input feature matrix pertaining to both target classes
    assert(len(f1) == len(f2) == len(y))
    # Create scatter plot inputs for each class
    X = [[f1[i], f2[i]] for i in range(len(f1))]
    XBen = np.array([X[i] for i in range(len(f1)) if y[i] == 1])   # Class 1 - Benign
    XMal = np.array([X[i] for i in range(len(f1)) if y[i] == 0])   # Class 2 - Malignant
    return X, XBen, XMal

[f1, f2, y] = sliceData(dataset, 0, sampleSize, feat1Index, feat2Index)               # all data
X, XBen, XMal = separateFeaturesViaClasses(f1, f2, y)
[f1Tr, f2Tr, yTr] = sliceData(dataset, 0, trainSize, feat1Index, feat2Index)          # train data
XTr, XBenTr, XMalTr = separateFeaturesViaClasses(f1Tr, f2Tr, yTr)
[f1Te, f2Te, yTe] = sliceData(dataset, trainSize, sampleSize, feat1Index, feat2Index) # test data
XTe, XBenTe, XMalTe = separateFeaturesViaClasses(f1Te, f2Te, yTe)
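For reference, a quick shape check (illustrative only) confirms the split: with 569 samples and a 90% training fraction, we expect roughly 512 training rows and 57 test rows, each with the 2 selected features.

# Sanity check on the shapes of the structured data
print(np.array(XTr).shape)   # expected: (512, 2)
print(np.array(XTe).shape)   # expected: (57, 2)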
5. Invoke SVM Algorithm
To have Python solve Eq. [1] for us, we’ll need to provide our training data set and correct target labels.
# Fit the input parameters to an SVM model. Assume a linear kernel.
# We only want to provide the training data so we'll have some
# left for testing
clf = svm.SVC(kernel='linear')
clf.fit(XTr, yTr)
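Once fit() returns, the trained model exposes the support vectors discussed in the THEORY section; inspecting them is optional but instructive.

# The training points lying closest to the decision boundary
print(clf.support_vectors_.shape)   # (number of support vectors, 2 features)
print(clf.n_support_)               # support vector count per class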
6. Analyze Results
We’ll define the accuracy as the ratio of correct test outputs to the total number of test attempts. We’ll see that we got 2 samples wrong out of about 60 test attempts.
# Now we perform the inference step and analyze accuracy results
modelOutput = clf.predict(XTe)
correctOutput = y[trainSize:]
result = modelOutput == correctOutput
# Get indices for misclassified samples
wrongIndices = [i for i in range(len(result)) if result[i] == False]
xWrong = np.array(XTe)[wrongIndices]
accuracy = sum(result)/len(result)
accuracyStr = "Accuracy is: " + str(round(accuracy*100, 2)) + "%"
print(accuracyStr)
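As a cross-check, the same mean test accuracy can be obtained directly from scikit-learn’s built-in scorer.

# Equivalent accuracy computation using the built-in scorer
print(clf.score(XTe, yTe))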
7. Plot Data
Lastly, we plot our data. We also draw the SVM decision boundary by extracting the line’s slope and intercept from the trained model.
# Calculate the SVM decision boundary for plotting: w[0]*x + w[1]*y + intercept = 0
w = clf.coef_[0]
a = -w[0]/w[1]                       # slope of the boundary line
xx = np.linspace(650, 700)
TERM = -(clf.intercept_[0]/w[1])     # y-intercept of the boundary line
yy = a*xx + TERM
plt.plot(xx, yy)

# Plot data points
plt.scatter(XBenTr[:,0], XBenTr[:,1], label='Benign - Train Data', marker='o', color='blue')
plt.scatter(XBenTe[:,0], XBenTe[:,1], label='Benign - Test Data', marker='o', color='orange')
plt.scatter(XMalTr[:,0], XMalTr[:,1], label='Malignant - Train Data', marker='x', color='blue')
plt.scatter(XMalTe[:,0], XMalTe[:,1], label='Malignant - Test Data', marker='x', color='orange')
plt.scatter(xWrong[:,0], xWrong[:,1], label='Incorrect Test Outputs', marker='+', color='red')
plt.legend()
plt.xlabel(feat1Name)
plt.ylabel(feat2Name)
plt.title("Support Vector Machine Example for Cancer Cell Classification")
plt.text(400, 0.22, accuracyStr, bbox=dict(facecolor='red', alpha=0.5))
plt.show()
Below, we have the plot from our work. We achieved an accuracy of about 96.5%.

NEXT QUESTIONS
In production, we would optimize our accuracy further and consider the computational resources required for the training and inference stages. Here are some questions to consider.
- How does varying the kernel function affect performance?
- How would the code example be modified to accommodate higher dimensions, such as three features? Would that change improve accuracy?
- What features are optimal for the above problem?
- How does the training and inference time grow with the number of features? Does this agree with theoretical estimates?
- What are the optimal values of gamma and C, as defined in [2]? (A grid-search sketch follows this list.)
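For that last question, one common approach is a cross-validated grid search over gamma and C. The sketch below assumes an RBF kernel, and the grid values are placeholders rather than recommendations.

from sklearn.model_selection import GridSearchCV

# Illustrative hyperparameter search over C and gamma
paramGrid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(svm.SVC(kernel='rbf'), paramGrid, cv=5)
search.fit(XTr, yTr)
print(search.best_params_)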
REFERENCES
[1] – S. Marsland, Machine Learning: An Algorithmic Perspective, 2nd Edition.
[2] – https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
[3] – https://cvxopt.org/
Let us know when you’d like to discuss how the learning in this tutorial may be applicable to the technical problem you’re trying to solve. Our fresh view of your problem may give you a different, valuable perspective to consider.