## Interesting evolution

Here is a reading list for advanced computer vision in 2010. It is interesting to compare it with the lists from previous years, such as 2003 and 2004.

### 2003:

- Texture synthesis
- Image completion
- Separating style from content
- Semantics of words and pictures
- Multiscale Stochastic Modeling and Estimation
- Space-Time Stereo
- Subspace methods for rigid and non-rigid motions
- Animating human motion
- Fast detection and matching
- Rendering and reconstruction under complex BRDFs
- Kernel methods
- Classification by multiple decision trees

### 2006:

- Estimating scene geometry and discovering objects using a soup of segments
- Level set segmentation
- Video visualization
- Recognition
- Combining detection, recognition and segmentation
- Video object segmentation
- Background cut
- Hashing, kNN in high dimensions
- Recent progress in optical flow computation
- Learning optical flow
- Matting
- Illumination
- Optimization

### 2007:

- From local to global visual similarity in space and in time
- Fast image search
- Sound and motion, in harmony
- Standard brain model for vision
- Multiclass SVM and applications
- CRF/DRF and application to human pose
- Direct visibility of point sets
- Color image understanding
- Globally optimal estimates for geometric reconstruction problems
- Image parsing
- Integral shape matching, inner distance, diffusion distance
- Motion blur

### 2009:

Human vision

Sequence to sequence alignment

Lightfield and natural image matting

Visibility constraints on features of 3D objects

Image and video descriptors

**Image Descriptors**

- Class reading assignment: [SIFT] Lowe, D.G. Distinctive image features from scale-invariant keypoints. IJCV, 2004.

- [GIST] Oliva, A., Torralba, A. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 2001.

- [Shape Context] Belongie S., Malik J., Puzicha J. Shape Matching and Object Recognition Using Shape Contexts. PAMI, 2002.

- [Geometric Blur] Berg A. C., Malik J. Geometric Blur for Template Matching. CVPR, 2001.

- [Local Self-Similarity] Shechtman E., Irani M. Matching Local Self-Similarities across Images and Videos. CVPR, 2007.

- [SURF] Bay H., Ess A., Tuytelaars T., Van Gool L. SURF: Speeded Up Robust Features. CVIU, 2008.

- [LBP] Heikkila M., Pietikainen M., Schmid C. Description of interest regions with local binary patterns. Pattern Recognition, 2009.

**Video Descriptors**

- [Space-time corners]
  - Laptev I. On Space-Time Interest Points. IJCV, 2005.
  - Laptev I., Lindeberg T. Space-time Interest Points. ICCV, 2003.

**Survey / comparison papers for different applications (recognition / matching)**

- Zhang J., Marszalek M., Lazebnik S., Schmid C. Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV, 2007.

- Mikolajczyk K., Schmid C. A performance evaluation of local descriptors. PAMI, 2005.

- Horster E., Greif T., Lienhart R., Slaney M. Comparing local feature descriptors in pLSA-based image models. DAGM, 2008.

Efficient search in large image databases

- Torralba A., Fergus R., Freeman W. T. 80 million tiny images: a large dataset for non-parametric object and scene recognition. PAMI, 2008.

- Torralba A., Fergus R., Weiss Y. Small codes and large databases for recognition. CVPR, 2008.

- Weiss Y., Torralba A., Fergus R. Spectral Hashing. NIPS, 2008.

- Nister D., Stewenius H. Scalable recognition with a vocabulary tree. CVPR, 2006.

- Kumar N., Belhumeur P. N., Nayar S. K. FaceTracer: A Search Engine for Large Collections of Images with Faces. ECCV, 2008.

Exploiting wealth of huge image libraries

- Hays J., Efros A. Scene Completion Using Millions of Photographs. ACM Transactions on Graphics. SIGGRAPH, 2007.

- Simon, I., Seitz, S. M. Scene Segmentation Using the Wisdom of Crowds. ECCV, 2008.

- Bitouk D., Kumar N., Dhillon S., Belhumeur P. N., Nayar S. K. Face Swapping: Automatically Replacing Faces in Photographs. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH), 2008.

- Hays J., Efros A. IM2GPS: estimating geographic information from a single image. CVPR, 2008.

- Agarwal S., Snavely N., Simon I., Seitz S.M. and Szeliski R. Building Rome in a Day. ICCV, 2009.

Dictionaries for sparse representation modeling

- Aharon M., Elad M., Bruckstein A. M. The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process, 2006.

- Bruckstein A. M., Donoho D. L., Elad M. From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images. SIAM review, 2009.

- Aharon M., Elad M., Bruckstein A. M. On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them. Linear Algebra and its Applications, 2006.

- Rubinstein R., Bruckstein A. M. Dictionaries for Sparse Representation Modeling. to appear in the IEEE Proceedings – Special Issue on Applications of Compressive Sensing & Sparse Representation.

Statistics of natural images

- Zhu S. C., Mumford D. Prior learning and Gibbs reaction-diffusion. PAMI, 1997.

- Roth S., Black M. J.
  - Conference version: Fields of experts: A framework for learning image priors. CVPR, 2005.
  - Journal version: Fields of experts. IJCV, 2009.

- Weiss Y., Freeman W. T. What makes a good model of natural images? CVPR, 2007.

Blind deconvolution

Action recognition

Graph cuts

- Kolmogorov V., Zabih R. What Energy Functions can be Minimized via Graph Cuts? PAMI, 2004.

- Boykov Y., Veksler O., Zabih R. Fast Approximate Energy Minimization via Graph Cuts. ICCV, 1999.

- Kolmogorov V., Rother C. Minimizing non-submodular functions with graph cuts – a review. PAMI, 2007.

- Kohli P., Ladicky L., Torr P. Robust Higher Order Potentials for Enforcing Label Consistency. IJCV, 2009.

- Freedman D., Turek M. Graph cuts with many-pixel interactions: theory and applications to shape modeling. Image and Vision Computing, 2010.

### 2010:

**1. Denoising**

- **Image denoising using scale mixtures of Gaussians in the wavelet domain**, J. Portilla, V. Strela, M. Wainwright, E. Simoncelli. In IEEE Trans. Image Processing, 2003.
- **A review of image de-noising methods, with a new one**, A. Buades, B. Coll, J. Morel. In SIAM Journal on Multiscale Modeling and Simulation, 2005.
- **Image denoising by sparse 3-D transform-domain collaborative filtering**, K. Dabov, A. Foi, V. Katkovnik, K. Egiazarian. In IEEE Trans. Image Processing, 2007.
- **A Tour of Modern Image Processing**, P. Milanfar. Invited feature article, in review, IEEE Signal Processing Magazine, 2010.
- **Image Quality Assessment: From Error Visibility to Structural Similarity**, Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli. In IEEE Trans. Image Processing, 2004.

**2. Compressed Sensing**

- **An Introduction To Compressive Sampling**, E.J. Candes, M.B. Wakin. In IEEE Signal Processing Magazine, March 2008.
- **Learning compressed sensing**, Y. Weiss, H.S. Chang, W.T. Freeman. In Snowbird Learning Workshop, 2007.
- **Stable signal recovery from incomplete and inaccurate measurements**, E. Candes, J. Romberg, T. Tao. In Comm. Pure Appl. Math., August 2006.
- **Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging**, M. Lustig, D. Donoho, J.M. Pauly. In Magnetic Resonance in Medicine, 2007.
- **Single-pixel imaging via compressive sampling**, M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, R. Baraniuk. In IEEE Signal Processing Magazine, March 2008.

**3. Super-Resolution (in images)**

- **Improving resolution by image registration**, M. Irani, S. Peleg. In CVGIP: Graphical Models and Image Processing, 1991.
- **Limits on super-resolution and how to break them**, S. Baker, T. Kanade. In PAMI, 2002.
- **Fundamental Limits of Reconstruction-Based Superresolution Algorithms under Local Translation**, Z. Lin, H. Shum. In PAMI, 2004.
- **Example based super-resolution**, W.T. Freeman, T.R. Jones, E.C. Pasztor. In Comp. Graph. Appl., 2002.
- **Super-resolution from a single image**, D. Glasner, S. Bagon, M. Irani. In ICCV, 2009.

**4. Shape from Illumination**

- **Shape from Shading: A Survey**, R. Zhang, P. Tsai, J.E. Cryer, M. Shah. In PAMI, 1999.
- **Optimal Algorithm for Shape from Shading and Path Planning**, R. Kimmel, J.A. Sethian. In Journal of Mathematical Imaging and Vision, 2001.
- **Efficient Belief Propagation for Vision Using Linear Constraint Nodes**, B. Potetz. In CVPR, 2007.
- **Shape from shading using graph cuts**, J.Y. Chang, K.M. Lee, S.U. Lee. In Journal of Pattern Recognition, 2008.

**5. Deep Learning**

- **Learning multiple layers of representation**, G.E. Hinton. In TRENDS in Cognitive Sciences, 2007.
- **Reducing the dimensionality of data with neural networks**, G.E. Hinton, R.R. Salakhutdinov. In Science, July 2006.
- **Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations**, H. Lee, R. Grosse, R. Ranganath, A.Y. Ng. In ICML, 2009.
- **What is the Best Multi-Stage Architecture for Object Recognition?**, K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun. In ICCV, 2009.
- **Exploring Strategies for Training Deep Neural Networks**, H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin. In Journal of Machine Learning Research, 2009.

**6. Random forests**

- **Semantic Texton Forests for Image Categorization and Segmentation**, J. Shotton, M. Johnson, R. Cipolla. In CVPR, 2008.
- **Object Class Segmentation using Random Forests**, F. Schroff, A. Criminisi, A. Zisserman. In BMVC, 2008.
- **Randomized Trees for Real-Time Keypoint Recognition**, V. Lepetit, P. Lagger, P. Fua. In CVPR, 2005.
- **Regression forests for efficient anatomy detection and localization in CT studies**, A. Criminisi, J. Shotton, D. Robertson, E. Konukoglu. In MICCAI MCV, 2010.
- **Fast discriminative visual codebooks using randomized clustering forests**, F. Moosmann, B. Triggs, F. Jurie. In NIPS, 2006.
- **MIForests: multiple-instance learning with randomized trees**, C. Leistner, A. Saffari, H. Bischof. In ECCV, 2010.

**7. Pascal Grand Challenge**

- **Image Classification Using Super-Vector Coding of Local Image Descriptors**, X. Zhou, K. Yu, T. Zhang, T.S. Huang. In ECCV, 2010.
- **Object Detection with Discriminatively Trained Part-Based Models**, P.F. Felzenszwalb, R.B. Girshick, D. McAllester, D. Ramanan. In PAMI, 2010.
- **Unbiased Look at Dataset Bias**, A. Torralba, A. Efros. In CVPR, 2011.
- **Multiple Kernels for Object Detection**, A. Vedaldi, V. Gulshan, M. Varma, A. Zisserman. In ICCV, 2009.
- **The PASCAL Visual Object Classes (VOC) Challenge**, M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman. In IJCV, 2010.

## Notes on CVPR 12

Since coming back from this year’s CVPR, I haven’t had a chance to write anything down. I’ll take this weekend to write some short notes on a few interesting papers.

* Tutorials

There were several nice tutorials this year. More people than expected were interested in ‘deep learning’, so that tutorial had to be moved to a bigger room. I like the graph-cut slides, although unfortunately I didn’t get a chance to attend that tutorial. Since mobile applications are a trending topic, there was also a short tutorial on OpenCV for mobile applications. The presenters provided example code and projects as well as pre-compiled OpenCV for both iOS and Android. On Tuesday (6/19), Qualcomm also gave a lunch talk on FastCV, their package aimed specifically at mobile computer vision.

* The Open Source Award went to “FREAK: Fast Retina Keypoint” by Alexandre Alahi, Raphael Ortiz, and Pierre Vandergheynst. Their abstract says:

the deployment of vision algorithms on smart phones and embedded devices with low memory and computation complexity has even upped the ante: the goal is to make descriptors faster to compute, more compact while remaining robust to scale, rotation and noise. To best address the current requirements, we propose a novel keypoint descriptor inspired by the human visual system and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern.

The source code is here. Detailed information can be found at <http://www.ivpe.com/freak.htm>
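To make the idea of a binary descriptor built from intensity comparisons concrete, here is a minimal sketch. It is not FREAK itself: the sampling pattern is a hypothetical set of random offset pairs rather than a retinal pattern, there is no smoothing, cascade, or orientation handling, and the names `binary_descriptor`, `hamming`, and `pairs` are my own.

```python
import numpy as np

def binary_descriptor(image, keypoint, pairs):
    """Build a bit string by comparing intensities at pairs of offsets around a keypoint."""
    y, x = keypoint
    bits = [image[y + dy1, x + dx1] > image[y + dy2, x + dx2]
            for (dy1, dx1), (dy2, dx2) in pairs]
    return np.array(bits, dtype=np.uint8)

def hamming(a, b):
    """Number of differing bits; this is the matching cost between two descriptors."""
    return int(np.count_nonzero(a != b))

# Hypothetical sampling pattern (an assumption, not the retinal pattern from the paper):
# 64 random pairs of (dy, dx) offsets within 8 pixels of the keypoint.
pattern_rng = np.random.default_rng(0)
pairs = pattern_rng.integers(-8, 9, size=(64, 2, 2))
```

Because each bit depends only on which of two intensities is larger, the descriptor does not change when the image contrast is scaled, and two descriptors can be compared cheaply with the Hamming distance.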

To be continued… 🙂

## Random Forest in Python

milk is a machine learning package written in Python. It has a companion package called milksets, which includes several UCI machine learning datasets.

```python
from milksets import wine

features, labels = wine.load()
```

`features` will be a 2-D numpy.ndarray of features (noSample * noFeatureDim), and `labels` will be a 1-D numpy.ndarray of labels running from 0 through N-1 (independently of how the labels were coded in the original data).

I used milk’s random forest to predict the labels of the wine data. There are three classes, and the feature matrix has shape (178, 13). In the plot, samples with marker ‘0’ are correct predictions and samples with marker ‘x’ are incorrect predictions. The prediction takes some time; the cross-validation accuracy is 0.943820224719.
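For comparison, the same experiment can be sketched with scikit-learn instead of milk. This is a swapped-in library, not the code behind the number above: it assumes scikit-learn is installed, uses its bundled copy of the wine data, and picks 100 trees and 10 folds arbitrarily, so the accuracy will not match exactly.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# scikit-learn's bundled wine data: 178 samples, 13 features, 3 classes
features, labels = load_wine(return_X_y=True)

# Assumed settings: 100 trees, fixed seed for repeatability
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Each sample is predicted by a model trained on the folds that exclude it
predicted = cross_val_predict(clf, features, labels, cv=10)

correct = predicted == labels      # boolean mask: which predictions were right
accuracy = correct.mean()
print("cross-validation accuracy:", accuracy)
```

The boolean `correct` mask is what you would use to draw correct and incorrect predictions with different markers.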

## Mean Shift Segmentation

There are actually two steps in mean-shift image segmentation: mean-shift filtering, followed by merging and eliminating regions to obtain the segmentation. Here’s a paper that states the process well; I found it quite clear and easy to understand. Below are some notes on mean-shift segmentation.

1. It is based on non-parametric density estimation: it makes no assumptions about the probability distribution and places no restriction on the spatial window size (unlike bilateral filtering).

2. It works in the joint spatial-range domain (x, y, f(x, y)). The spatial domain refers to the image coordinates, while the range domain refers to the pixel values, whose dimension is 1 for a gray image, 3 for an RGB color image, and so on.

3. It finds the maxima in the (x, y, f) space; clusters that are close in both space and range correspond to classes.

4. The three parameters (as in EDISON) are:

   - sigmaS: normalization parameter for the spatial domain
   - sigmaR: normalization parameter for the range domain
   - minRegionSize: minimum size (lower bound) for a ‘region’ to be declared a class

To understand the two normalization parameters, sigmaS and sigmaR, think of the window size of the kernel function in kernel density estimation. It controls the ‘range’, or smoothness, of the kernel, that is, how fast the kernel decays. The larger the normalization parameter, the smoother the result in the corresponding domain (spatial or range) and the slower the kernel decays. A value of zero corresponds to a delta function, which concentrates only on the center: the filtering output is the same as the input, and all details (every pixel) are retained. A larger sigmaS smooths the spatial domain, while a larger sigmaR smooths the range (color) domain. From the results I obtained, the output is much more sensitive to sigmaS than to sigmaR.
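The role of the two normalization parameters can be seen in a minimal sketch of the filtering step. This is not EDISON’s implementation: it assumes a Gaussian kernel, a grayscale image, a fixed number of iterations, and a naive all-pairs computation that is only practical for tiny images. The function name and the `n_iter` parameter are my own, and both sigmas must be greater than zero here.

```python
import numpy as np

def mean_shift_filter(image, sigma_s, sigma_r, n_iter=5):
    """Naive mean-shift filtering of a 2-D grayscale image (O(N^2) memory)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Every pixel becomes a point (x, y, f) in the joint spatial-range domain
    points = np.stack([xs.ravel(), ys.ravel(), image.ravel()], axis=1).astype(float)
    # sigma_s normalizes the two spatial axes, sigma_r the range axis
    scale = np.array([sigma_s, sigma_s, sigma_r], dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Normalized offsets between every current estimate and every pixel
        diff = (modes[:, None, :] - points[None, :, :]) / scale
        weights = np.exp(-0.5 * (diff ** 2).sum(axis=2))
        # Move each estimate to the kernel-weighted mean of the pixels
        modes = weights @ points / weights.sum(axis=1, keepdims=True)
    # The filtered image is the range component each pixel has moved to
    return modes[:, 2].reshape(h, w)
```

With a small sigma_r, pixels on the other side of a strong edge get almost zero weight, so the edge survives while noise inside each region is averaged away. The merging step controlled by minRegionSize is not shown.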

The best-known open-source implementation of mean shift, which is also very fast, is EDISON. If you want to use it in MATLAB, there are also some wrappers.

Left: Original, Middle (4, 4, 5), Right: (10, 4, 5)

## Select and Crop Region of the Figure in Python

I wanted to be able to interactively select and crop a region of a figure in Python. Here are some approaches that I found quite useful. You can modify the code and adapt it to your needs.

* http://stackoverflow.com/questions/6136588/image-cropping-using-python

* http://stackoverflow.com/questions/6916054/how-to-crop-a-region-selected-with-mouse-click-using-python

* http://kitchingroup.cheme.cmu.edu/software/python/matplotlib/interacting-with-data-sets

* http://scienceoss.com/interactively-select-points-from-a-plot-in-matplotlib

Surprisingly, matplotlib has a ginput function, just as in MATLAB.
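A minimal sketch of cropping with ginput might look like the following. The helper names `crop_box` and `select_and_crop` are my own, and the code assumes the image is a NumPy array and that an interactive matplotlib backend is available; the click coordinates are simply rounded to the nearest pixel.

```python
import numpy as np

def crop_box(image, p1, p2):
    """Crop `image` to the axis-aligned box spanned by two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    h, w = image.shape[:2]
    # Sort so the two corners can be clicked in any order
    left, right = sorted((int(round(x1)), int(round(x2))))
    top, bottom = sorted((int(round(y1)), int(round(y2))))
    # Clamp to the image bounds
    left, right = max(left, 0), min(right, w)
    top, bottom = max(top, 0), min(bottom, h)
    return image[top:bottom, left:right]

def select_and_crop(image):
    """Show the image, wait for two mouse clicks, and return the cropped region."""
    import matplotlib.pyplot as plt  # imported here so crop_box works without a display
    plt.imshow(image)
    plt.title("Click two opposite corners of the region")
    p1, p2 = plt.ginput(2, timeout=0)  # timeout=0 waits until both clicks arrive
    plt.close()
    return crop_box(image, p1, p2)
```

Calling `select_and_crop(img)` opens a window and returns the selected sub-array; `crop_box` can also be used on its own with coordinates obtained some other way.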

## ImageJ

Here’s the wonderful ImageJ: <http://rsbweb.nih.gov/ij/plugins/index.html>

Sometimes one can really learn something just by reading these fundamental algorithms.

Here’s a version of mean shift, and here’s a scikit-learn (Python) version.

Here’s a nice comparison of different clustering methods.

## How to map a drive from a Mac to a PC

This is what I found recently, which is very useful.

1. In the Finder, click on the Go menu and select Connect to Server.

2. Enter the address of the resource you wish to map,

e.g. smb://www.domain.com/foldername

or, if you have the IP address,

e.g. smb://100.250.95.98/YOURSharedFolderName

3. Enter your network password when prompted.

4. A new icon should appear on the desktop. That is your mapped network drive.