Monthly Archives: November 2014

Testing Logistic Regression on linearly inseparable data

generate data:

x=rnorm(1000,mean=0)
y=rnorm(1000,mean=10)
obs1=data.frame(rbind(cbind(rnorm(1000),rnorm(1000)),cbind(rnorm(1000,mean=10),rnorm(1000,mean=10))),as.factor("Class A"))
obs2=data.frame(rbind(cbind(x,y),cbind(y,x)),as.factor("Class B"))
colnames(obs1) = c("x", "y", "class")
colnames(obs2) = c("x", "y", "class")
df=rbind(obs1,obs2)

make scatter plot:

#scatter plot
#dev.new()
png(file="scatter.png")
plot(obs1$x,obs1$y,col=colors[[1]],xlab="x",ylab="y",main="scatter plot")
points(obs2$x,obs2$y,col=colors[[2]])
dev.off()

see histograms:

for(i in 1:2)
{
  #dev.new()
  png(file=paste("hist-",names[[i]],".png",sep=""))
  hist(obs1[,i],col=colors[[1]])
  hist(obs2[,i],col=colors[[2]],add=TRUE)
  … Continue reading
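The excerpt stops before the classifier is built, so the following is only a minimal sketch of what fitting a plain logistic regression to this kind of data might look like, not the post's own code. It assumes the same XOR-style layout (Class A near (0,0) and (10,10), Class B near (0,10) and (10,0)); the seed, the 0.5 cutoff, and the names `class_a`, `class_b`, `fit`, `pred` and `accuracy` are illustrative assumptions.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
n <- 1000

# Class A sits near (0,0) and (10,10); Class B near (0,10) and (10,0)
class_a <- rbind(cbind(rnorm(n), rnorm(n)),
                 cbind(rnorm(n, mean = 10), rnorm(n, mean = 10)))
class_b <- rbind(cbind(rnorm(n), rnorm(n, mean = 10)),
                 cbind(rnorm(n, mean = 10), rnorm(n)))

df <- data.frame(rbind(class_a, class_b),
                 class = factor(rep(c("Class A", "Class B"), each = 2 * n)))
colnames(df) <- c("x", "y", "class")

# Logistic regression with linear terms only
fit <- glm(class ~ x + y, data = df, family = binomial)

# predict() returns P(class == "Class B"), the second factor level
pred <- ifelse(predict(fit, type = "response") > 0.5, "Class B", "Class A")
accuracy <- mean(pred == df$class)
print(accuracy)
```

Because a straight line can put at most three of the four clusters on the correct side, the in-sample accuracy of a linear decision boundary on this layout cannot rise much above 75%.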

Posted in Software

Testing Logistic Regression

create training data:
visualize:
result:
D = \frac{|\mu_1-\mu_2|}{\sigma_1+\sigma_2}:
histograms of x and y:
build classifier:
Output:
algorithm fails to converge because of perfect separation:
compute AUC:
where:
plot AUC:
result:
LDA:
finally you add some noise to make logistic … Continue reading
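The code for these steps is not part of the excerpt. As a rough illustration only, the sketch below builds hypothetical one-feature training data, evaluates the separation measure D from the formula above, and records the warnings `glm()` raises when the classes are perfectly separated. The data, the seed, and the names `label`, `fit_warnings` and `auc` are assumptions, and the rank-based (Mann-Whitney) AUC is one common formulation, not necessarily the one the post uses.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
n <- 500

# Hypothetical training data: two classes far apart along a single feature
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 10))
label <- factor(rep(c("A", "B"), each = n))

# Separation measure: D = |mu_1 - mu_2| / (sigma_1 + sigma_2)
mu <- tapply(x, label, mean)
sigma <- tapply(x, label, sd)
D <- abs(mu[["A"]] - mu[["B"]]) / (sigma[["A"]] + sigma[["B"]])

# Fit logistic regression and collect warnings instead of printing them
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(label ~ x, family = binomial),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Rank-based AUC using the raw feature as the score, class "B" as positive
pos <- label == "B"
n_pos <- sum(pos)
n_neg <- sum(!pos)
auc <- (sum(rank(x)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(D)
print(fit_warnings)
print(auc)
```

With classes this far apart the maximum-likelihood coefficients are unbounded, which is why `glm()` complains. Adding noise so that the classes overlap gives the fit a finite optimum.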

Posted in Software

Performance in R

Here is a function to compute the (false positive, true positive) pair given the response, the ground truth, the classes, and a threshold:

This function is 100x slower than the one below, which does the same thing:

To benchmark, install the rbenchmark package and use it like below:
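The post's own functions are not reproduced in this excerpt. The sketch below shows one way such a pair might look: an element-by-element loop and a vectorized equivalent. It assumes that `response` holds numeric scores, that `classes` is `c(negative, positive)`, that a score above `threshold` predicts the positive class, and that the pair is returned as rates. The function names `fp_tp_loop` and `fp_tp_vec` and the test data are hypothetical.

```r
# Loop version: inspects one observation at a time
fp_tp_loop <- function(response, truth, classes, threshold) {
  fp <- 0
  tp <- 0
  for (i in seq_along(response)) {
    if (response[i] > threshold) {
      if (truth[i] == classes[2]) {
        tp <- tp + 1
      } else if (truth[i] == classes[1]) {
        fp <- fp + 1
      }
    }
  }
  c(fp = fp / sum(truth == classes[1]), tp = tp / sum(truth == classes[2]))
}

# Vectorized version: same result using logical vectors and sum()
fp_tp_vec <- function(response, truth, classes, threshold) {
  predicted_pos <- response > threshold
  c(fp = sum(predicted_pos & truth == classes[1]) / sum(truth == classes[1]),
    tp = sum(predicted_pos & truth == classes[2]) / sum(truth == classes[2]))
}

# Hypothetical test data
set.seed(7)
classes <- c("neg", "pos")
truth <- sample(classes, 10000, replace = TRUE)
response <- runif(10000)

res_loop <- fp_tp_loop(response, truth, classes, 0.5)
res_vec <- fp_tp_vec(response, truth, classes, 0.5)

# Benchmark only if rbenchmark is installed: install.packages("rbenchmark")
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  print(rbenchmark::benchmark(
    loop = fp_tp_loop(response, truth, classes, 0.5),
    vectorized = fp_tp_vec(response, truth, classes, 0.5),
    replications = 100,
    columns = c("test", "replications", "elapsed", "relative")
  ))
}
```

The `relative` column reports each timing as a multiple of the fastest entry. The ratio between these two sketches depends on the R version and input size, so it need not match the 100x figure quoted for the post's functions.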

Posted in Software