December 27, 2012 Leave a comment Go to comments

Some thing may only happen when you have a large amount of data. I came across this LinkedIn discussion and found one  of the linked post are very interesting as it mention the advantages brought by large data set for ‘scene completion’ in computer vision.  One of the conclusion is ” for nearest neighbor type minimization problems with a non-negative distance function (meaning that the cost function has a lower bound of zero), that distance function will, on average, decrease monotonically with data or sample size.” 

It’s also interesting that as we modify/improve approaches, models and making them complicated, it may not always perform well as compare to simple models based on the “law of large numbers.”.


  1. December 27, 2012 at 11:09 AM

    not sure if you can generalize this statement..for data of not-so-big dimension, this is true..but for data with very high dimension, it might never hold..rather there could be some alternative solutions:
    1. Do some sort of feature hashing and reduce the dimension of the data without disturbing much the correlation of two different data points in the original feature space (you can have a look at this paper:
    2. Take a different route like stochastic gradient descent to solve the objective in a finite amount of time. There are papers that guide you how to parallelize SGD on multiple machines each having multiple cores and with little interaction/message passing among different machines ( This work says that if you have implemented SGD on MR framework then you essentially have to do just a single “reduce” step and still can get very close to the minima of the actual objective.
    3. Now, you can still argue that if the data is of dimension d, you are still making O(d) number of computations in every SGD “iteration”. However, this might not be desirable. Therefore, a prudent approach for solving this kind of problem is to arrange the data in particular “structure” first rather than storing the data in a flat n x d matrix. Such structured data helps reduce the number of computations SGD makes in each iteration. The relevant paper is the one here: (the paper talks about co-ordinate descent but I’m sure that the same idea can be applied to SGD as well). This essentially is a trade-off between pre-processing time and computation time for solving the objective. The more you can pre-process the data, the quicker you will get to the optimum value of the original objective.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: