Detect Duplicate Image using Python

This post shows how to detect exact duplicate images in Python using MD5 hashing.
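
To illustrate what "exact" means here, the following minimal sketch hashes a few short byte strings that stand in for decoded pixel data (the sample bytes and the helper name md5_hex are made up for illustration). Identical input gives an identical digest, while a single changed byte gives a different one, so this approach is not expected to catch resized or recompressed copies.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as 32 hex characters."""
    return hashlib.md5(data).hexdigest()

# Hypothetical stand-ins for the raw pixel bytes of three images.
pixels_a = bytes([10, 20, 30, 40])
pixels_b = bytes([10, 20, 30, 40])   # exact copy of pixels_a
pixels_c = bytes([10, 20, 30, 41])   # differs in a single byte

same = md5_hex(pixels_a) == md5_hex(pixels_b)
different = md5_hex(pixels_a) != md5_hex(pixels_c)
print(same, different)
```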

To find exact duplicates, a dictionary stores the hash value of each image. If an image produces a hash value that is already in the dictionary, that image is marked as a duplicate. The image that was read in first with the same hash is called the ‘Representative Image’.
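
The dictionary idea can be sketched without any image files, using (id, bytes) pairs in place of real images. The function name group_by_hash and the sample data are assumptions for illustration only; the full script further down works on .jpg files instead.

```python
import hashlib

def group_by_hash(items):
    """Group (image_id, data) pairs by the MD5 digest of their data.

    Returns (duplicates, bins): the ids whose hash was already seen,
    and a dict mapping each digest to the ids that share it. The first
    id in each bin plays the role of the representative image.
    """
    bins = {}
    duplicates = []
    for image_id, data in items:
        digest = hashlib.md5(data).hexdigest()
        if digest in bins:
            duplicates.append(image_id)
        bins.setdefault(digest, []).append(image_id)
    return duplicates, bins

# Hypothetical sample: "a" and "c" carry identical bytes.
sample = [("a", b"\x01\x02"), ("b", b"\x03\x04"), ("c", b"\x01\x02")]
duplicates, bins = group_by_hash(sample)
representatives = [ids[0] for ids in bins.values() if len(ids) > 1]
print(duplicates, representatives)
```

Here "c" ends up in the duplicate list because "a" was read first and therefore acts as the representative of their shared bin.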

The function findDupImgMd5 returns the list of duplicated images and the hash results (a dictionary that maps each hash value to the ids of the images that fall into that hash bin).

The main program writes the list of duplicated images and the hash results to text files. It also copies each duplicated image, together with its Representative Image, into the DupImg folder.


# FindDupImage.py

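import os, glob
from PIL import Image  # Pillow; the old top-level "import Image" no longer works
import hashlib
import shutil

def Imge_md5hash(im_file):
    # Hash the decoded pixel data, so files with identical pixels match
    im = Image.open(im_file)
    return hashlib.md5(im.tobytes()).hexdigest()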

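def findDupImgMd5(data_path):
    hashdict = {}       # hash value -> ids of the images with that hash
    deletionList = []   # paths of images whose hash was already seen

    for infile in glob.glob(os.path.join(data_path, "*.jpg")):
        fileNP, ext = os.path.splitext(infile)
        ids = os.path.basename(fileNP)  # file name without folder or extension
        hash_result = Imge_md5hash(infile)
        if hash_result in hashdict:
            # Seen before: the first id in this bin is the representative
            deletionList.append(infile)
        # Record every image, not only the duplicates
        hashdict.setdefault(hash_result, []).append(ids)
    return deletionList, hashdict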

if __name__=="__main__":

    folderPath = r"your-image-data-folder"
    deletionList, hashdict = findDupImgMd5(folderPath)

    print("Start to save the data")

    with open("dupImageList.txt", 'w') as ofile:
        for item in deletionList:
            ofile.write(item + "\n")

    with open("hashDupImageDict.txt", 'w') as ofile:
        for k, v in hashdict.items():
            ofile.write(k + "\t" + "\t".join(v) + "\n")

    # Copy the duplicated images to the DupImg folder, renamed with
    # the hash code and the original id

    DesFolder = os.path.join(folderPath, "DupImg")

    if not os.path.exists(DesFolder):
        os.mkdir(DesFolder)

    for k,v in hashdict.items():
        if len(v)>1:
            for vi in v:
                srcP = os.path.join(folderPath, vi+".jpg")
                dstP = os.path.join(DesFolder, k+"_"+vi+".jpg")
                shutil.copyfile(srcP,dstP)
