Gradient descent using Python and NumPy
```python
def gradient(x_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(x_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * (np.sum((h - y) * x_norm[:, j][np.newaxis, :]))
        temp[0] = theta[0] - (alpha/m) * (np.sum(h - y))
        temp[1] = theta[1] - (alpha/m) * (np.sum((h - y) * x_norm[:, 1]))
        theta = temp
    return theta

x_norm, mean, std = featurescale(x)
# length of x (number of rows)
m = len(x)
x_norm = np.array([np.ones(m), x_norm])
n, m = np.shape(x_norm)
num_it = 1500
alpha = 0.01
theta = np.zeros(n, float)[:, np.newaxis]
x_norm = x_norm.transpose()
theta = gradient(x_norm, y, theta, alpha, m, n, num_it)
print theta
```
My theta from the above code is 100.2 100.2, but it should be 100.2 61.09, which is what I get in MATLAB, where it is correct.
I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all the equations and operations. In the end the regression boils down to four operations:
- Calculate the hypothesis: h = X * theta
- Calculate the loss: loss = h - y, and maybe the squared cost (loss^2)/2m
- Calculate the gradient: gradient = X' * loss / m
- Update the parameters: theta = theta - alpha * gradient
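The four steps above can be sketched as one vectorized NumPy function. This is a minimal illustration of the listed operations, not the answer's exact code; the names `gradient_step`, `X`, `y`, `theta` are my own:

```python
import numpy as np

def gradient_step(X, y, theta, alpha):
    """One batch gradient-descent step for linear regression.

    X: (m, n) design matrix, y: (m,) targets, theta: (n,) weights.
    """
    m = X.shape[0]
    h = X.dot(theta)                      # 1. hypothesis
    loss = h - y                          # 2. loss (and squared cost)
    cost = np.sum(loss ** 2) / (2 * m)
    grad = X.T.dot(loss) / m              # 3. average gradient
    theta = theta - alpha * grad          # 4. parameter update
    return theta, cost

# tiny usage example: fit y = 2x (bias column of ones plus one feature)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 2.0, 4.0])
theta = np.zeros(2)
for _ in range(5000):
    theta, cost = gradient_step(X, y, theta, alpha=0.1)
print(theta)  # approaches [0, 2]
```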
In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features. Let's have a look at my variation of your code:
```python
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
```
At first I create a small random dataset, which should look like this:
As you can see, I also added the generated regression line and the formula that was calculated by Excel.
One thing you need to take care about is the intuition behind regression using gradient descent: as you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, that is the average of the sum over the gradients, hence the division by m.
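To make that reduction concrete, here is a small sketch (my own illustrative data and names) showing that the vectorized gradient X' * loss / m is exactly the average of the per-example gradients:

```python
import numpy as np

# a small example: m = 4 examples, n = 2 features (bias column + one feature)
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]])
y = np.array([3.0, 5.0, 9.0, 13.0])
theta = np.array([0.5, 0.5])
m = X.shape[0]

loss = X.dot(theta) - y

# per-example gradients, summed and averaged into one update vector
per_example = np.array([loss[i] * X[i] for i in range(m)])
avg_gradient = per_example.sum(axis=0) / m

# vectorized form, as used in the code above
vectorized = np.dot(X.T, loss) / m

print(np.allclose(avg_gradient, vectorized))  # True
```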
The next thing you need to take care of is tracking convergence and adjusting the learning rate. For that matter you should track the cost on every iteration, and maybe even plot it.
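One way to do that tracking is to record the cost per iteration and inspect (or plot) the history afterwards. This is a sketch under my own names and data; the original answer only prints the cost:

```python
import numpy as np

def gradient_descent_with_history(X, y, theta, alpha, num_iterations):
    """Batch gradient descent that records the cost of every iteration,
    so convergence (and the learning rate) can be checked afterwards."""
    m = X.shape[0]
    costs = []
    for _ in range(num_iterations):
        loss = X.dot(theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * np.dot(X.T, loss) / m
    return theta, costs

# exact line y = x + 30, similar in spirit to the generated data above
X = np.column_stack([np.ones(50), np.arange(50, dtype=float)])
y = 30.0 + np.arange(50, dtype=float)
theta, costs = gradient_descent_with_history(X, y, np.ones(2), 0.0005, 2000)

# with a small enough alpha the cost decreases on every iteration;
# the costs list could now be plotted, e.g. with matplotlib
print(costs[0] > costs[-1])  # True
```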
If you run my example, the theta returned will look like this:
```
Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368   1.01108458]
```
This is quite close to the equation that was calculated by Excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.
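If you want to sanity-check a result like this, the closed-form least-squares solution gives nearly the same theta. This is not part of the original answer, just a quick cross-check I'd suggest, using the same data-generation idea with a fixed seed:

```python
import numpy as np
import random

random.seed(0)

# same idea as genData above: y = x + bias + uniform noise
numPoints, bias, variance = 100, 25, 10
X = np.ones((numPoints, 2))
y = np.zeros(numPoints)
for i in range(numPoints):
    X[i][1] = i
    y[i] = (i + bias) + random.uniform(0, 1) * variance

# closed-form solution via least squares; since the noise is
# uniform(0, 1) * 10, its mean 5 is absorbed into the intercept,
# so theta_exact should be roughly [30, 1]
theta_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_exact)
```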