Machine learning with MapReduce
Holidays are always a good time to slow down and reflect. So this time I'm reading more or less at random, not to solve any problem at hand, but just to enjoy the fruits of other researchers' work. Here are some readings on MapReduce and machine learning.
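Many machine learning algorithms fit Kearns' Statistical Query Model:
linear regression, k-means, Naive Bayes, SVM, EM, PCA, backprop. Each of these can be written (exactly) in summation form: the statistics or gradients the algorithm needs are sums over the training examples. Those sums can be split across processors (map) and then added together (reduce), which leads to a speedup that is roughly linear in the number of processors.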
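To make the summation idea concrete for myself, here is a minimal sketch for the simplest case, fitting a line y = slope * x + intercept by least squares. It is not taken from any of the papers below; the function names (`map_shard`, `reduce_sums`, `fit_line`) and the round-robin sharding are my own assumptions, and the "processors" are just list slices handled one after another.

```python
from functools import reduce


def map_shard(shard):
    """Mapper: partial sums (n, Sx, Sy, Sxx, Sxy) over one shard of (x, y) pairs."""
    n = len(shard)
    sx = sum(x for x, _ in shard)
    sy = sum(y for _, y in shard)
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return (n, sx, sy, sxx, sxy)


def reduce_sums(a, b):
    """Reducer: add two tuples of partial sums element by element."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(data, num_shards=4):
    """Least-squares line fit computed only from sums over the data."""
    # Hypothetical sharding: round-robin slices stand in for separate workers.
    shards = [data[i::num_shards] for i in range(num_shards)]
    n, sx, sy, sxx, sxy = reduce(reduce_sums, map(map_shard, shards))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data lying exactly on y = 3x + 1.
data = [(float(x), 3.0 * x + 1.0) for x in range(10)]
slope, intercept = fit_line(data)
```

The point is that `fit_line` never looks at the data as a whole: each shard only contributes five numbers, so the number of shards changes the amount of work per worker but not the answer.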
Some papers on MapReduce:
– Map-Reduce for Machine Learning on Multicore
– MapReduce: Distributed Computing for Machine Learning, 2006
– Large Language Models in Machine Translation
– Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce
– Parallel implementations of word alignment tool
– Inducing Gazetteers for Named Entity Recognition by Large-scale Clustering of Dependency Relations
– Pairwise Document Similarity in Large Collections with MapReduce
– Aligning needles in a Haystack: Paraphrase acquisition across the web
– Google News Personalization: Scalable Online Collaborative Filtering (assign users to clusters, and assign weights to stories based on the ratings of the users in that cluster)
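
The one-line summary of the last paper (assign users to clusters, then weight stories by the ratings of the users in that cluster) is also easy to play with. What follows is only a toy sketch of that sentence, not the paper's actual method: the cluster assignments are given by hand, and `story_weights_by_cluster` and `recommend` are hypothetical names. Ratings are assumed to be positive numbers.

```python
from collections import Counter, defaultdict


def story_weights_by_cluster(user_cluster, ratings):
    """Sum ratings per (cluster, story), then normalize within each cluster.

    user_cluster: dict mapping user -> cluster id (assumed to be given).
    ratings: iterable of (user, story, rating) with positive ratings.
    """
    totals = defaultdict(Counter)
    for user, story, rating in ratings:
        # In MapReduce terms: emit ((cluster, story), rating), then sum per key.
        totals[user_cluster[user]][story] += rating
    weights = {}
    for cluster, counts in totals.items():
        cluster_total = sum(counts.values())
        weights[cluster] = {s: c / cluster_total for s, c in counts.items()}
    return weights


def recommend(user, user_cluster, weights, top_n=2):
    """Return the highest-weighted stories in the user's cluster."""
    cluster_weights = weights.get(user_cluster[user], {})
    # Sort by descending weight, breaking ties by story id for determinism.
    ranked = sorted(cluster_weights, key=lambda s: (-cluster_weights[s], s))
    return ranked[:top_n]


user_cluster = {"ann": 0, "bob": 0, "cat": 1}
ratings = [("ann", "s1", 1), ("bob", "s1", 1), ("bob", "s2", 1), ("cat", "s3", 1)]
weights = story_weights_by_cluster(user_cluster, ratings)
top_for_ann = recommend("ann", user_cluster, weights)
```

Again the per-cluster weights are just sums over ratings, which is why this kind of computation sits comfortably in a map and reduce pair.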