Archive for May, 2011

Summer Reading Wish List

May 20, 2011

I had been looking forward to this reading group for a long time, and now that it has come together, I unfortunately will be away from the university this summer. Still, it's good to follow along and join some discussions remotely.
Below is the tentative list by Prof. Bouman. I have always been impressed by his passion, and his lectures are just GREAT! I still vividly remember taking one of his courses on image processing.

-05/20- The list is not complete yet. I will update it and add corresponding links for each sub-topic.

# Compressed Sensing
1. D. Donoho – Compressed Sensing

2. Wright et al. – Robust Face Recognition via Sparse Representation
# Image Processing and Vision
1. Peyman Milanfar – A Tour of Modern Image Filtering (an earlier version was titled 'A Tour of Modern Image Processing')
It's a video lecture.
2. Zyg Pizlo – Binocular vision paper
3. Segmentation, state-of-the-art

# High-Dimensional/Sparse Methods
1. Al Hero et al. – Large Scale Correlation Screening

# Dictionary Learning
1. Elad – K-SVD paper
2. Some other dictionary learning papers – possibilities:
a. Wolberg
b. Shapiro
3. Moody – Classification of Transient Signals

# Physics Modeling and Inversion
1. Nicholas Zabaras – manifold learning
2. Marc De Graef – vector tomography for microscopy

# Dirichlet Processes
1. Michael Jordan's Tutorial

# Graphical Models
1. Need to find papers – two major topics:
a. Belief propagation tutorial
b. Uses of graphical models

# Johnson Lindenstrauss Lemma
1. Identify good tutorial

# Regularized Discriminant Analysis
1. Friedman – Regularized Discriminant Analysis
2. Jiang – Eigenfeature Regularization and Extraction in Face Recognition
3. Hasib – Sparse Fisher’s Linear Discriminant Analysis



May 19, 2011

It's always good to watch some videos; I find them a much easier way to catch up on the key ideas, and they're also a way to relax.

I found this one particularly interesting; it explained a lot to me about the "nonlinear" part of the manifold assumption.
-> Differences from clustering methods such as k-means:
Clustering is done not according to "clusters" but according to certain "structures", such as lines, surfaces, etc.
-> The "nonlinear" refers to the assumed manifolds (lines, surfaces, ...) being nonlinear
-> Each manifold subspace need not have the same structure, or even the same dimension (degree of the polynomial)
I think this is the "loosest" point: it has the advantage of modeling complex structures, but the disadvantage of a lack of constraints. Also, estimating the parameters of each "subspace", such as its dimension, may be nontrivial, especially for complex data sets.

Categories: Data Mining

A fun dice problem

May 19, 2011

Came across a fun dice problem:

On average, how many times must a 6-sided die be rolled until all sides appear at least once?

Before tackling it, let's first do a simpler one to get warmed up.

On average, how many times must a 6-sided die be rolled until a 6 turns up?

Let's first compute the probability of getting a 6 on the nth roll. Let X be the random variable representing the number of rolls until a 6 appears. Then the probability that X = n is:

P(X = n) = (5/6)^(n-1) * (1/6)

The expectation of X is the sum of the above probabilities weighted by n:

E[X] = sum_{n=1}^{inf} n * P(X = n)

This works out to:

E[X] = 6
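As a quick sanity check of this expectation, one can truncate the series numerically (a minimal sketch; the function name is mine):

```python
# Numerically check E[X] = sum_{n>=1} n * (5/6)^(n-1) * (1/6)
# by truncating the series at a large number of terms.
def expected_rolls_until_six(terms=10_000):
    """Truncated sum of n * P(X = n) for the geometric distribution."""
    return sum(n * (5 / 6) ** (n - 1) * (1 / 6) for n in range(1, terms + 1))

print(expected_rolls_until_six())  # very close to 6
```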

Now let's come back to our original question and go through it step by step. The first roll always shows some side, so it takes exactly 1 roll to see the first side. After that, we need to roll until we get any of the remaining 5 sides. Since the probability of hitting one of those 5 sides is 5/6, this takes, on average:

1/(5/6) = 6/5 rolls

Then, continuing, we need to roll until we get any of the remaining 4 sides, which requires:

1/(4/6) = 6/4 = 3/2 rolls.

Continuing this process, we finally get:

1 + (6/5) + (6/4) + (6/3) + (6/2) + (6/1) = 147/10 = 14.7

That is, on average, the expected number of rolls needed to see all 6 sides is 14.7, i.e., about 15.
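Both the exact sum and a quick Monte Carlo simulation confirm this (a sketch; the helper name is mine, not from the original problem):

```python
import random
from fractions import Fraction

# Exact expected number of rolls: 1 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1.
exact = sum(Fraction(6, k) for k in range(1, 7))
print(exact)         # 147/10
print(float(exact))  # 14.7

# A quick Monte Carlo check.
def rolls_to_see_all_sides():
    """Roll a fair 6-sided die until every side has appeared; count rolls."""
    seen = set()
    rolls = 0
    while len(seen) < 6:
        seen.add(random.randint(1, 6))
        rolls += 1
    return rolls

trials = 100_000
avg = sum(rolls_to_see_all_sides() for _ in range(trials)) / trials
print(avg)  # should be close to 14.7
```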

Some good resources:

-1- Game Balance Concepts
-2- A Collection of Dice Problems Matthew M. Conroy

Categories: Data Mining

Mining — A process, and the process

May 14, 2011

A nice article with an abstract discussion of the practical principles of data mining. No gold data this time 🙂
I made a brief summary below. I especially like the parts on data preparation, patterns, and retrospection and change.

# 1st Law of Data Mining – “Business Goals Law”:
Business objectives are the origin of every data mining solution
Data mining is not primarily a technology; it is a process, which has one or more business objectives at its heart.

# 2nd Law of Data Mining – “Business Knowledge Law”:
Business knowledge is central to every step of the data mining process
In summary, without business knowledge, not a single step of the data mining process can be effective; there are no “purely technical” steps.

# 3rd Law of Data Mining – “Data Preparation Law”:
Data preparation is more than half of every data mining process
Every change to the data of any sort (including cleaning, large and small transformations, and augmentation) means a change to the problem space which the analysis must explore. The reason that data preparation is important, and forms such a large proportion of data mining effort, is that the data miner is deliberately manipulating the problem space to make it easier for their analytical techniques to find a solution.

# 4th Law of Data Mining – “NFL-DM”:
The right model for a given application can only be discovered by experiment
or “There is No Free Lunch for the Data Miner”
Wolpert’s “No Free Lunch” (NFL) theorem, as applied to machine learning, states that no one bias (as embodied in an algorithm) will be better than any other when averaged across all possible problems (datasets). This is because, if we consider all possible problems, their solutions are evenly distributed, so that an algorithm (or bias) which is advantageous for one subset will be disadvantageous for another.

# 5th Law of Data Mining – “Watkins’ Law”: There are always patterns
The purpose of the data mining process is to reveal the domain rules by combining pattern-discovery technology (data mining algorithms) with the business knowledge required to interpret the results of the algorithms in terms of the domain.

# 6th Law of Data Mining – “Insight Law”:
Data mining amplifies perception in the business domain
Data mining algorithms provide a capability to detect patterns beyond normal human capabilities. The data mining process allows data miners and business experts to integrate this capability into their own problem solving and into business processes.

# 7th Law of Data Mining – "Prediction Law":
Prediction increases information locally by generalisation
What, then, is "prediction" in this sense? What do classification, regression, clustering and association algorithms and their resultant models have in common? The answer lies in "scoring", that is the application of a predictive model to a new example. The model produces a prediction, or score, which is a new piece of information about the example. The available information about the example in question has been increased, locally, on the basis of the patterns found by the algorithm and embodied in the model, that is on the basis of generalisation or induction. It is important to remember that this new information is not "data", in the sense of a "given"; it is information only in the statistical sense.

# 8th Law of Data Mining – "Value Law":
The value of data mining results is not determined by the accuracy or stability of predictive models
In short, the value of a predictive model is not determined by any technical measure.

# 9th Law of Data Mining – “Law of Change”: All patterns are subject to change
Patterns are not simply regularities which exist in the world and are reflected in the data – these regularities may indeed be static in some domains. Rather, the patterns discovered by data mining are part of a perceptual process, an active process in which data mining mediates between the world as described by the data and the understanding of the observer or business expert.
All patterns are subject to change because they reflect not only a changing world but also our changing understanding.

Categories: Data Mining

Model for runners

May 14, 2011
Categories: Uncategorized

Daily Reading List -2011-05-11

May 12, 2011

This is a collection of what I read today while browsing the internet.

# About the metaclass in Python
metaclass: acts as a template for producing classes, a factory of classes.
Some points that need to be clarified:
a) To have a new-style class,
using __metaclass__ = type
is equivalent to subclassing the built-in class object.
If you define your own metaclass, then all the classes defined in the scope of that metaclass will automatically be instances of that metaclass.

b) Use the method resolution order (MRO) to check the order of the superclasses.

c) The instance of a metaclass is a class. Thus, the attributes of the metaclass are only available to the generated class (the one built by the metaclass), not to the instances of that generated class.

d) One advantage of using a metaclass is that the behaviors of a class need not be directly written as code (not directly defined), but can instead be created by calling functions at runtime (when necessary), with dynamic arguments.
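To make points a)-d) a bit more concrete, here is a minimal sketch (using the Python 3 `metaclass=` syntax rather than the Python 2 `__metaclass__` attribute from the notes above; all names are illustrative):

```python
# A toy metaclass: it rewrites the class's attribute names at class-creation
# time, illustrating that behavior can be generated rather than hand-written.
class UpperAttrMeta(type):
    """Metaclass that upper-cases non-dunder attribute names."""
    def __new__(mcls, name, bases, namespace):
        upper = {
            (k if k.startswith("__") else k.upper()): v
            for k, v in namespace.items()
        }
        return super().__new__(mcls, name, bases, upper)

class Config(metaclass=UpperAttrMeta):
    host = "localhost"
    port = 8080

print(Config.HOST)                        # "localhost"
print(hasattr(Config, "host"))            # False: the lowercase name is gone
print(isinstance(Config, UpperAttrMeta))  # True: the class is an *instance* of its metaclass
```

The last line is point c) in action: `Config` itself is the instance of the metaclass, so the metaclass's transformation applies to the class, not to `Config()` objects.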

oop – What is a metaclass in Python? – Stack Overflow
Guide to Python introspection
Metaclass programming in Python
Charming Python: Create declarative mini-languages
A Primer on Python Metaclass Programming – O’Reilly Media
Just a little Python: Stupid Metaclass and Template Tricks
Unifying types and classes in Python 2.2

Categories: Uncategorized

Py Decorator

May 10, 2011

Recently, I was reading some articles on Python decorators. A phone interview motivated me to look into them; before that, I hadn't paid much attention to them. But later on, I found them a very good tool for dynamic programming.

Below are some online resources that I collected. I have tried to arrange them in a Q&A style:

#Some good links:
– Decorator home:
– Computing Thoughts:

#One example code for decorator used for logging information:

#Neat things to do with Python decorators – Ubuntu Forums

Q: How can you show the help information defined in the original function after the function is decorated?
A: Use functools.wraps(). See the Standard Library doc for functools.
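A small sketch of that answer (the function names here are illustrative):

```python
import functools

def logged(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    """Return the sum of a and b."""
    return a + b

print(add.__name__)  # "add" (without wraps it would be "wrapper")
print(add.__doc__)   # "Return the sum of a and b."
```

Without the `functools.wraps` line, `help(add)` would show the wrapper's signature and docstring instead of the original function's.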

Q: An example worth understanding in more depth
Notes: The decorator wraps the inner original function with additional definitions; it can reuse most of the original function or class, and does most of its work at initialization.
Another thing is that we need to distinguish the "decorator" pattern from a "sub-/super-class structure".

Q: oop – What is a metaclass in Python? – Stack Overflow

Q: Understanding Python decorators – Fantastic Answer

Q: A well introduction of decorator

Q: Are there some built-in functions that can be used as decorators?
A: The functools module contains some functions that can be used as decorators, but they aren't built-ins.
As a side note: we can find the list of built-in functions in the Python glossary.
Also, the PythonDecoratorLibrary may give some good examples of using decorators.

Q: Discussion on "a-ha, this looks like a job for decorators"
A: Good examples: Bruce Eckel on Decorators, and Example: A Decorator-Based Build System

An example that I wrote using a decorator for dynamic programming.
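The original snippet isn't reproduced here, but a minimal sketch of what such a decorator for dynamic programming (memoization) might look like (names `memoize` and `fib` are mine, for illustration):

```python
import functools

def memoize(func):
    """Cache results keyed by the positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    """Naive recursion, made linear-time by the cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, instant thanks to memoization
```

This is exactly the "dynamic programming" use mentioned above: the decorator turns an exponential-time recursion into a linear-time one without touching the function body.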

Categories: Programming, Python