Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle blur to the background based on depth? This "portrait mode" effect gives photographs a professional look by simulating the shallow depth-of-field of DSLR cameras. In this tutorial, we'll recreate this effect programmatically using open-source computer vision models, namely SAM2 from Meta and MiDaS from Intel ISL.
To build our pipeline, we'll use:
- Segment Anything Model (SAM2): to segment the subject of interest and separate the foreground from the background.
- Depth Estimation Model (MiDaS): to compute a depth map, enabling depth-based blurring.
- Gaussian Blur: to blur the background, with the blur strength varying based on depth.
Step 1: Setting Up the Environment
To get started, install the following dependencies:
pip install matplotlib samv2 pytest opencv-python timm pillow
Step 2: Loading a Target Image
Choose the image you want to apply the effect to and load it into Python using the Pillow library.
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

image_path = ".jpg"  # replace with the path to your image
img = Image.open(image_path)
img_array = np.array(img)

# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()
Step 3: Initialize the SAM2 Model
To initialize the model, download a pretrained checkpoint. SAM2 offers four variants that trade off accuracy against inference speed: tiny, small, base_plus, and large. In this tutorial, we'll use tiny for faster inference.
Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_
Replace the trailing file name with the variant you want to use; this tutorial uses sam2_hiera_tiny.pt.
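If you'd rather fetch the checkpoint from Python instead of a browser, a minimal sketch is shown below; it assumes that appending the tiny variant's file name, sam2_hiera_tiny.pt, completes the URL above.
import urllib.request

# Assumed full URL: the base URL above plus the tiny variant's file name
checkpoint_url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt"
urllib.request.urlretrieve(checkpoint_url, "sam2_hiera_tiny.pt")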
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

# Build the tiny SAM2 model from its config and checkpoint
model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)
Step 4: Feed the Image into SAM2 and Select the Subject
Set the image in the predictor and provide a few points that lie on the subject you want to isolate. SAM2 then predicts binary masks that separate the subject from the background.
image_predictor.set_image(img_array)

# Point prompts that lie on the subject (label 1 = foreground)
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])
input_label = np.array([1, 1, 1])

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)
sorted_ind = np.argsort(scores)[::-1]
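The highest-scoring mask is the one we'll reuse later to keep the subject sharp. As an optional sanity check, the short snippet below (using only the masks and sorted_ind computed above) overlays that mask on the image.
# Optional: overlay the highest-scoring mask for a quick visual check
best_mask = masks[sorted_ind[0]]
plt.imshow(img_array)
plt.imshow(best_mask, alpha=0.5, cmap="gray")
plt.title("Highest-Scoring SAM2 Mask")
plt.axis("off")
plt.show()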
Step 5: Initialize the Depth Estimation Model
For depth estimation, we use MiDaS by Intel ISL. Similar to SAM2, you can choose from different variants based on accuracy and speed. Note: MiDaS predicts inverse relative depth, meaning larger values correspond to closer objects. We'll invert it in the next step to make it more intuitive.
import torch
import torchvision.transforms as transforms

model_type = "DPT_Large"  # MiDaS v3 - Large (highest accuracy)

# Load the MiDaS model
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()

# Load and preprocess the image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)

# Perform depth estimation
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

prediction = prediction.cpu().numpy()

# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
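DPT_Large is fairly slow on a CPU. If you have a GPU, the inference above can be moved to it; the snippet below is a minimal variation under the assumption that the model and transform objects from this step are already loaded.
# Optional: run MiDaS on a GPU when available (assumes model and transform
# from this step are already in scope)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_batch = transform(img_array).to(device)
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()
prediction = prediction.cpu().numpy()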
Step 6: Apply Depth-Based Gaussian Blur
Here we implement the depth-based blurring with an iterative Gaussian blur approach. Instead of applying a single large kernel, we apply a smaller kernel multiple times for pixels with higher depth values.
import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # Kernel size must be odd for cv2.GaussianBlur
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1

    # Invert the depth map so that larger values correspond to farther objects
    depth_map = np.max(depth_map) - depth_map

    # Normalize depth to the range [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)

    blurred_image = image.copy()
    for repeat in range(1, max_repeats + 1):
        mask = (depth_normalized == repeat)
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]
    return blurred_image

blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)
# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-Based Blurred Image")
plt.axis("off")
plt.show()
Step 7: Combine the Foreground and Background
Finally, use the SAM2 mask to extract the sharp foreground and combine it with the blurred background.
def combine_foreground_background(foreground, background, mask):
    # Broadcast a 2D mask across the color channels
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

# Use the highest-scoring SAM2 mask, resized to the image dimensions
mask = masks[sorted_ind[0]].astype(np.uint8)
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))
foreground = img_array
background = blurred_image
combined_image = combine_foreground_background(foreground, background, mask)
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
Conclusion
With just a few open-source tools, we've recreated the portrait mode effect programmatically. This technique can be extended to photo editing applications, simulated camera effects, or creative projects.
Future Enhancements:
- Use edge detection algorithms for better refinement of the subject's edges (a simple mask-feathering sketch follows this list).
- Experiment with kernel sizes to enhance the blur effect.
- Create a user interface to upload images and select subjects dynamically.
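For example, instead of the hard cut produced by np.where, the binary mask can be feathered and used as an alpha matte. This is not edge detection, just a simple illustrative refinement, sketched below under the assumption that mask, img_array, and blurred_image from the previous steps are still in scope.
def feathered_blend(foreground, background, binary_mask, feather_px=21):
    # Soften the hard mask edge with a Gaussian blur, then alpha-blend
    alpha = cv2.GaussianBlur(binary_mask.astype(np.float32), (feather_px, feather_px), 0)
    alpha = np.clip(alpha, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    blended = alpha * foreground.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return blended.astype(np.uint8)

soft_combined = feathered_blend(img_array, blurred_image, mask)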
Sources:
- Segment Anything Model 2 by Meta (https://github.com/facebookresearch/sam2)
- CPU-compatible implementation of SAM 2 (https://github.com/SauravMaheshkar/samv2/tree/main)
- MiDaS Depth Estimation Model (https://pytorch.org/hub/intelisl_midas_v2/)
Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS at the Indian Institute of Technology (IIT), Kanpur. He is a Machine Learning enthusiast who is passionate about research and the latest advancements in Deep Learning, Computer Vision, and related fields.