Good afternoon, all.
The issue:
I have a problem removing watermarks from a collection of images. Two different watermarks appear near each other (see below). One is a red square with white text inside it. The other is a semi-transparent grey sentence. The goal is to prepare the images for machine learning.
Image
Problem solving attempts:
Because the watermarks vary in their location across the image dataset, I attempted the following:
- Made a copy of the image and converted it to the HSV colorspace
- Selected a range of lower and upper values for the region of interest (the values were selected after segmenting the image and constructing a histogram for each channel)
- Built a mask using the cv2.inRange() function
- Used the mask to in-paint the watermark in the original image
For the red square, the first three steps worked perfectly, but the fourth step (in-painting) only sort of worked. The watermark is significantly less visible than before, but still very apparent. For the text, I was unable to get a good enough mask in step 3: the watermark is simply too close in color/pixel intensity to the surrounding area and the text itself.
This is looking more like a machine learning problem, which is fine, but I wanted to exhaust other options beforehand. Any thoughts on how to resolve this issue using machine learning or algorithmic methods?