The Tourist Filter

Deserted nature photos with OpenCV, Python and YOLOv7

The Godafoss waterfall in Iceland is always well visited during the high season.

Landscape shots without people

As a tourist, you sometimes seek solitude in nature, only to find that you must share the "seclusion" with hundreds of other people. Tourist crowds do not suit atmospheric nature photos. A vacation is expensive, there is a lot to see, and not everyone can or wants to be first on the spot early in the morning. Reason enough to wish for a tourist filter for photos.

The problem is well known in the photo community, and the method of choice for solving it is so-called median stacking. All you need is a series of photos taken with a tripod. Unfortunately, median stacking doesn't work well enough in most cases and requires a lot of manual rework. I would like to present algorithms that work much better with the same input data. They are based on combining a neural network for person detection with a median stack.

The basalt columns of Reynisfjara beach in Iceland. Four example pictures from a photo stack show people at different places on the basalt columns. To remove the people, only the areas where there are no people have to be merged.

As input data, a photo series is required that was taken with a tripod under unchanged lighting conditions. The photo series should contain about 30-50 pictures taken at intervals of a few seconds. Care should be taken to ensure that, if possible, every part of the image can be seen in one of the photos without people. The task of the tourist filter is to isolate the areas in each image where there are no people and reconstruct the original image from these areas.

If you want to try it yourself, you can download the "The-Tourist-Filter" archive from GitHub and run it on your own images. The program is written in Python and implements four improved algorithms in addition to median stacking. But first, let's see how the median filter works and why it can remove short-term disturbances from images.


The following command performs median filtering of a photo stack:

python ./ -i ./stack1 -m MEDIAN

Median stacking is based on applying a median filter over time to all pixels of a photo in a series of images. The median of a series of 2n+1 values is determined by first sorting the values by size. The value whose index then lies exactly in the middle of the sorted data is the median. For example, the median of the series { 4300, 9, 30, 1, 3, 10, 17 } is 10, because after sorting, 10 is in the middle of the series:

median { 1, 3, 9, 10, 17, 30, 4300 } = 10

A median filter is particularly suitable for filtering out sporadic outliers from a data series. When computing the mean value, all values of the series contribute to the result with equal weight. The median, however, ignores all values except the one in the middle of the sorted series. Since sporadic outliers end up either at the beginning or at the end after sorting, they are never considered for the median. A median filter is therefore only suitable for eliminating short-term disturbances of the data. A tourist who steps into the frame at some distance is such a disturbance, but only as long as they do not linger for too long. If they stay too long in one place, their presence can no longer be filtered out with a median and ghost images are created.
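To make the idea concrete, here is a minimal sketch of a median stack with NumPy. It is not the code from the archive; the function name `median_stack` and the synthetic test frames are my own assumptions for illustration.

```python
import numpy as np

def median_stack(frames):
    """Per-pixel median over time for a list of equally sized images (hypothetical helper)."""
    stack = np.stack(frames, axis=0)   # shape: (n_frames, height, width[, channels])
    return np.median(stack, axis=0).astype(frames[0].dtype)

# The series from the text: the outlier 4300 dominates the mean but not the median.
series = [4300, 9, 30, 1, 3, 10, 17]
print(np.median(series))   # 10.0
print(np.mean(series))     # about 624.29

# Synthetic stack: uniform gray background, a bright "tourist" pixel in one of seven frames.
frames = [np.full((4, 4), 120, dtype=np.uint8) for _ in range(7)]
frames[2][1, 1] = 255
result = median_stack(frames)   # the single outlier is filtered out
```

With an odd number of frames the median is always one of the recorded values, so no new colors are invented along the time axis.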

Ghost images of tourists cannot be avoided in most median stacks.

Ghost images can hardly be prevented in photos with people, because people tend to stay in one place for longer, especially when in a group. When this happens, the person ultimately becomes the determining factor for the color of the pixel.

The median no longer works if too many tourists obscure parts of the image for too long. In the example, the tourist in the blue jacket obscures the rocks at one point of the image in more than half of all photos.
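The tipping point can be shown with a single pixel. The following sketch uses made-up gray values (90 for the rock, 200 for the blue jacket) and a hypothetical helper `pixel_median`; it only illustrates the majority rule behind the median and is not taken from the program.

```python
import numpy as np

def pixel_median(background, tourist, n_frames, n_occluded):
    """Median of one pixel that shows `tourist` in n_occluded of n_frames photos."""
    values = [tourist] * n_occluded + [background] * (n_frames - n_occluded)
    return int(np.median(values))

rock, blue_jacket = 90, 200   # assumed gray values, for illustration only

# Tourist visible in 3 of 7 photos: the background still holds the middle position.
minority = pixel_median(rock, blue_jacket, n_frames=7, n_occluded=3)

# Tourist visible in 4 of 7 photos: the tourist now holds the middle position.
majority = pixel_median(rock, blue_jacket, n_frames=7, n_occluded=4)

print(minority, majority)   # 90 200
```

As soon as a pixel is covered in more than half of the photos, the median returns the tourist instead of the background.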

For reliable results, one needs methods that can prevent the formation of ghost images during median filtering. So why not eliminate the tourists as the root cause of the problem? Let's take a look at three different algorithms that do exactly this:

Detect people, remove them and perform median stacking

People are detected with YOLOv7 and cut out of the photos. Then the next image is loaded, people are cut out again, and the parts that are missing in the first image are copied from the second photo. This process is repeated for as long as more images are available.

Detect people, cover them with noise and do a median stack

People are detected using YOLOv7 and each individual is occluded with binary noise consisting of randomly distributed black and white pixels. Then a median stack is calculated from the images modified in this way.

Detect people, cover them with OpenCV-inpaint and do a median stack

Persons are detected with YOLOv7 and directly overpainted by the OpenCV inpaint method. Afterwards a median stack is calculated with the modified images.

Detect people, remove them and perform median stacking

The reason why median stacking does not work one hundred percent of the time is that people tend to stay in one place for too long. The median cannot work if the actual subject is only visible a minority of the time. One could try to compose the image from the stack manually in Photoshop, combining section by section the parts of the image where the view is free.

python ./ -i ./stack1 -m CUT

This is exactly what the second algorithm does, only fully automated. We use YOLOv7, a deep-learning based object detector, for this. The YOLO model is trained on the 80 object classes of the COCO dataset. It detects classes such as people, cars, bikes, motorcycles, cats, dogs, backpacks, and more. In short, people and the things they bring along that you don't want to see in a nature image. You don't even have to bother to check what YOLO sees. If YOLO sees something, it doesn't belong in a nature picture and is removed. This is done in all photos, and ideally there is at least one image for each pixel in which nothing man-made obscures the view. The following animation shows the algorithm at work:

Tourist removal with YOLOv7. Top left: Input image with YOLOv7 detection boxes. Top Right: Heatmap of tourists. Bottom left: Filtered photo with current best result. Bottom Right: White parts of the image are the parts that could be reconstructed without tourists.
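The cut-and-fill step can be sketched as follows. This is a simplified illustration, not the code from the archive: the detector is left out, and I assume its output is already available as a list of boxes `(x1, y1, x2, y2)` per photo. The names `person_mask` and `cut_and_fill` are hypothetical.

```python
import numpy as np

def person_mask(shape, boxes):
    """Boolean mask that is True wherever a detection box (x1, y1, x2, y2) covers a pixel."""
    mask = np.zeros(shape[:2], dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    return mask

def cut_and_fill(frames, boxes_per_frame):
    """Compose one image from the person-free areas of each frame, in stack order."""
    result = np.zeros_like(frames[0])
    filled = np.zeros(frames[0].shape[:2], dtype=bool)   # pixels reconstructed so far
    for frame, boxes in zip(frames, boxes_per_frame):
        free = ~person_mask(frame.shape, boxes)
        take = free & ~filled          # person-free here and still missing in the result
        result[take] = frame[take]
        filled |= take
    return result, filled

# Two synthetic photos: a "tourist" (value 255) on the left half, then on the right half.
background = np.full((4, 6), 50, dtype=np.uint8)
f0, f1 = background.copy(), background.copy()
f0[0:4, 0:3] = 255
f1[0:4, 3:6] = 255
result, filled = cut_and_fill([f0, f1], [[(0, 0, 3, 4)], [(3, 0, 6, 4)]])
```

The `filled` mask corresponds to the white areas in the bottom right panel of the animation: pixels that are still `False` at the end were covered by a detection box in every photo.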

The process works, but not well enough. There are some visible problems:

  • The object detector is not perfect and does not detect all people.
  • YOLOv7 does not detect shadows. People are cut out, but their shadows are not.
  • The complete removal of all persons only works if there is at least one photo in which every image area is free. In the stack used here this is not the case.
  • Where people have been cut out, hard transitions occur. This is due to the fact that the brightness of the individual images differs slightly from one another as a result of the changing cloud cover.

The result can be seen in the following image. Compared with a median stack, it is not much better, but we can improve the algorithm further by combining it with median stacking.

Left: Person detection and automated cutting removes most tourists but leaves their shadows in the image.
Right: Median stacking leaves ghost images and does not remove tourists who have been in one place for a long time.

In the animation above, you can see how the tourists are removed from the image literally piece by piece until an optimum is reached at the end of the pass. We now modify the algorithm so that it runs through the photo stack twice. This way, the second pass starts with an almost tourist-free image. All images of this second pass are then stacked at the end using the median method. This smooths the cut edges and removes the shadows. The modified algorithm is capable of removing tourists to a very high degree.

Left: Median stack with ghost images; Center: Tourists cut out after object detection; Right: Tourists cut out combined with median stacking in a 2nd pass.
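The two-pass variant can be sketched like this. It is one possible reading of the description above, not the actual implementation: I assume that in the second pass each photo contributes its person-free areas while the detected areas are taken from the current composite, and that these intermediate images are the ones that get median stacked. Boxes are again assumed to be given as `(x1, y1, x2, y2)`; all names are hypothetical.

```python
import numpy as np

def person_mask(shape, boxes):
    """Boolean mask that is True wherever a detection box (x1, y1, x2, y2) covers a pixel."""
    mask = np.zeros(shape[:2], dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    return mask

def two_pass_median(frames, boxes_per_frame):
    masks = [person_mask(f.shape, b) for f, b in zip(frames, boxes_per_frame)]

    # Pass 1: build an almost tourist-free composite from the person-free areas.
    composite = frames[0].copy()
    for frame, mask in zip(frames, masks):
        composite[~mask] = frame[~mask]

    # Pass 2: start from that composite, update it with every photo
    # and keep a snapshot of each intermediate image.
    snapshots = []
    for frame, mask in zip(frames, masks):
        composite[~mask] = frame[~mask]
        snapshots.append(composite.copy())

    # The median over all snapshots smooths cut edges and drops short-lived shadows.
    return np.median(np.stack(snapshots), axis=0).astype(frames[0].dtype)

# Five synthetic photos: a tourist (255) wanders from column 0 to column 4,
# and the first photo also contains an undetected "shadow" pixel (10).
frames, boxes = [], []
for i in range(5):
    f = np.full((4, 6), 50, dtype=np.uint8)
    f[:, i] = 255
    frames.append(f)
    boxes.append([(i, 0, i + 1, 4)])
frames[0][0, 5] = 10
clean = two_pass_median(frames, boxes)
```

In this toy stack the undetected shadow appears in only one snapshot, so the median discards it just like any other short-term disturbance.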

Detect people, cover them with noise and do a median stack

Person removal with YOLOv7 object detection works well in the procedure described above, but requires two passes. It is not straightforward to stack the processed images directly. One would have to know for each pixel in which photos a tourist obscured the view and ignore those images for that one pixel in the median stack. Even then, the algorithm could still end up with a result that contains holes in places where tourists stayed for too long. Let's look at a method that can remove tourists with just one pass. It can be used with the following command line:

python ./ -i ./stack1 -m NOISE_AND_MEDIAN

This method also detects persons with YOLOv7. However, now each person is covered with binary noise of randomly distributed black and white pixels. This is done because these brightness values are sorted to the beginning or the end during median stacking, which leaves a chance that the middle of the sorted series is still occupied by pixels from photos without tourists.

The brightness values in each of the photos are first modified so that neither the value 0 nor the value 255 occurs in them. This is a virtually invisible reduction of the dynamic range of the image. Then all tourists are covered with randomly distributed black and white pixels. If pixels with a brightness of 0 or 255 remain in the result after the median filtering, it is clear that these pixels belong to places where people were found, because after the initial reduction of the dynamic range these brightness values no longer occur naturally. These pixels are finally overpainted using the OpenCV-Inpaint method. This function was developed to remove small scratches and defects inconspicuously by filling them with pixels in similar colors.

Covering the tourists with binary noise makes the work of the median filter easier. The colors black or white are sorted to the edges during the median filtering and thus do not play a role for the median, which lies in the middle of the sorted pixels.
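Here is a minimal sketch of the noise trick, under a few assumptions of mine: detections are given as boxes `(x1, y1, x2, y2)`, the dynamic range is reduced by simply clipping to 1..254, and the stack has an odd number of photos so that the median is always one of the recorded values. The function name `noise_and_median` is hypothetical and the final inpainting step is left out.

```python
import numpy as np

def noise_and_median(frames, boxes_per_frame, seed=0):
    rng = np.random.default_rng(seed)
    noisy = []
    for frame, boxes in zip(frames, boxes_per_frame):
        img = np.clip(frame, 1, 254)          # reserve 0 and 255 as marker values
        for x1, y1, x2, y2 in boxes:
            noise = rng.choice(np.array([0, 255], dtype=np.uint8),
                               size=(y2 - y1, x2 - x1))
            if img.ndim == 3:                 # same black/white value in all channels
                noise = noise[..., None]
            img[y1:y2, x1:x2] = noise
        noisy.append(img)
    stacked = np.median(np.stack(noisy), axis=0).astype(np.uint8)
    residue = (stacked == 0) | (stacked == 255)   # pixels that still need inpainting
    return stacked, residue

# One pixel covered in 5 of 7 photos: a plain median would return the tourist,
# but three black and two white noise values leave the background in the middle.
print(np.median([0, 0, 0, 255, 255, 100, 100]))   # 100.0

# Synthetic stack: background 100, a box in the top left corner in 5 of 7 photos.
frames = [np.full((4, 4), 100, dtype=np.uint8) for _ in range(7)]
boxes = [[(0, 0, 2, 2)]] * 5 + [[], []]
stacked, residue = noise_and_median(frames, boxes)
```

Every pixel of `stacked` is either the true background or one of the marker values 0 and 255. The `residue` mask marks the latter and could then be handed to the inpainting step described above.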

The result is impressive: the tourists were removed completely, though not entirely without residue. What remains is a region with stronger noise and some dark pixels, which presumably originate from the tourists' dark clothing. The process doesn't work very well when there are a lot of tourists in the image or when some parts of the image have been permanently obscured.

The algorithm removes tourists almost completely except for a few residues if there are enough input images. Problems arise when the input images have too few photos without people or objects such as cars are included in all images.

Detect people, cover them with OpenCV-inpaint and do a median stack

The last algorithm will completely remove tourists, but it cannot completely restore the background. However, the artifacts it creates will be acceptable in many cases. This algorithm is enabled with the INPAINT_AND_MEDIAN option:

python ./ -i ./stack1 -m INPAINT_AND_MEDIAN

This algorithm also starts with a person detection. The found objects are then completely overpainted in each photo using the OpenCV inpaint method. Afterwards, a median filter is applied to the images modified in this way. The so-called "inpainting" is actually only suitable for removing minor scratches and image defects. It cannot properly reconstruct the background behind a person and is qualitatively far behind methods like "Content Aware Fill" from Photoshop. However, if you combine the results of many such inpaintings with photos where the view of the background was unobstructed, then the background can be reconstructed to a certain extent.

The OpenCV inpainting completely hides the tourists. In the photos it almost looks like water drops on the lens.

The person removal works surprisingly well. The inpainting creates pixels that match the background well in terms of color. The results of smaller and larger inpaintings are later combined in the median stack and also supplemented with information from images where the view was clear. The median filter may not be able to reconstruct the background completely correctly but it is very close.

Left: Object detection with inpainting followed by median stacking works very well in nature images.
Right: Even in photo stacks with many people and parked cars, removing people and even the parked cars works, though not without any residues.

Remarkably, this method also works well when many people permanently obscure parts of the subject. The image above on the right contained parked vehicles in addition to a great many people.