Create Portrait Mode Effect with Segment Anything Model 2 (SAM2)


Have you ever admired how smartphone cameras isolate the main subject from the background, adding a subtle blur to the background based on depth? This “portrait mode” effect gives photos a professional look by simulating the shallow depth of field of DSLR cameras. In this tutorial, we’ll recreate this effect programmatically using open-source computer vision models, namely SAM2 from Meta and MiDaS from Intel ISL.

To build our pipeline, we’ll use:

  1. Segment Anything Model (SAM2): To segment the subject of interest and separate the foreground from the background.
  2. Depth Estimation Model (MiDaS): To compute a depth map, enabling depth-based blurring.
  3. Gaussian Blur: To blur the background with an intensity that varies based on depth.

Step 1: Setting Up the Environment

To get started, install the following dependencies:

pip install matplotlib samv2 pytest opencv-python timm pillow

Step 2: Loading a Target Image

Choose an image to apply the effect to and load it into Python using the Pillow library.

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

image_path = ".jpg"  # replace with the path to your image
img = Image.open(image_path)
img_array = np.array(img)

# Display the image
plt.imshow(img)
plt.axis("off")
plt.show()

Step 3: Initialize SAM2

To initialize the model, download the pretrained checkpoint. SAM2 offers four variants that trade off performance and inference speed: tiny, small, base_plus, and large. In this tutorial, we’ll use tiny for faster inference.

Download the model checkpoint from: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_.pt

Insert your desired model type into the filename (for example, sam2_hiera_tiny.pt for the tiny variant).
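
If you prefer to fetch the checkpoint from Python rather than a browser, here is a minimal sketch using urllib (the filename assumes the tiny variant used below):

import urllib.request

checkpoint_url = "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt"
urllib.request.urlretrieve(checkpoint_url, "sam2_hiera_tiny.pt")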

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.utils.misc import variant_to_config_mapping
from sam2.utils.visualization import show_masks

model = build_sam2(
    variant_to_config_mapping["tiny"],
    "sam2_hiera_tiny.pt",
)
image_predictor = SAM2ImagePredictor(model)

Step 4: Feed the Image into SAM and Select the Subject

Set the image in SAM and provide points that lie on the subject you want to isolate. SAM predicts binary masks separating the subject from the background.

image_predictor.set_image(img_array)

# Prompt points (x, y) that lie on the subject; a label of 1 marks a foreground point
input_point = np.array([[2500, 1200], [2500, 1500], [2500, 2000]])
input_label = np.array([1, 1, 1])

masks, scores, logits = image_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=None,
    multimask_output=True,
)
output_mask = show_masks(img_array, masks, scores)

# Sort the candidate masks by predicted quality (best first)
sorted_ind = np.argsort(scores)[::-1]
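
The prompt coordinates above are specific to the example image; for your own photo, it helps to overlay the points on the image and adjust them until they sit on the subject. A minimal sketch, reusing img_array and input_point from above:

# Check that the prompt points fall on the subject
plt.imshow(img_array)
plt.scatter(input_point[:, 0], input_point[:, 1], c="red", marker="*", s=100)
plt.axis("off")
plt.show()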

Step 5: Initialize the Depth Estimation Model

For depth estimation, we use MiDaS by Intel ISL. Similar to SAM, you can choose among different variants based on accuracy and speed. Note: the predicted depth map is inverted, meaning larger values correspond to closer objects. We’ll invert it in a later step for better intuitiveness.

import torch
import torchvision.transforms as transforms

model_type = "DPT_Large"  # MiDaS v3 - Large (highest accuracy)

# Load the MiDaS model
model = torch.hub.load("intel-isl/MiDaS", model_type)
model.eval()

# Load the MiDaS transforms and preprocess the image
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
input_batch = transform(img_array)

# Perform depth estimation
with torch.no_grad():
    prediction = model(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img_array.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

prediction = prediction.cpu().numpy()

# Visualize the depth map
plt.imshow(prediction, cmap="plasma")
plt.colorbar(label="Relative Depth")
plt.title("Depth Map Visualization")
plt.show()
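
The prediction above runs on the CPU by default. If a GPU is available, you can optionally move the MiDaS model and the input batch onto it before the torch.no_grad() block; a minimal sketch, assuming the same variable names as above:

# Optional: run MiDaS on a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_batch = input_batch.to(device)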

Step 6: Apply Depth-Based Gaussian Blur

Here we implement the depth-based blurring using an iterative Gaussian blur approach. Instead of applying a single large kernel, we apply a smaller kernel multiple times to pixels with higher depth values.

import cv2

def apply_depth_based_blur_iterative(image, depth_map, base_kernel_size=7, max_repeats=10):
    # Gaussian kernels must have an odd size
    if base_kernel_size % 2 == 0:
        base_kernel_size += 1

    # Invert the depth map so that larger values correspond to farther pixels
    depth_map = np.max(depth_map) - depth_map

    # Normalize depth to the range [0, max_repeats]
    depth_normalized = cv2.normalize(depth_map, None, 0, max_repeats, cv2.NORM_MINMAX).astype(np.uint8)

    blurred_image = image.copy()

    # Blur iteratively; pixels at each depth level pick up the blur applied at that iteration
    for repeat in range(1, max_repeats + 1):
        mask = (depth_normalized == repeat)
        if np.any(mask):
            blurred_temp = cv2.GaussianBlur(blurred_image, (base_kernel_size, base_kernel_size), 0)
            for c in range(image.shape[2]):
                blurred_image[..., c][mask] = blurred_temp[..., c][mask]

    return blurred_image

blurred_image = apply_depth_based_blur_iterative(img_array, prediction, base_kernel_size=35, max_repeats=20)

# Visualize the result
plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(blurred_image)
plt.title("Depth-based Blurred Image")
plt.axis("off")
plt.show()

Step 7: Combine Foreground and Background

Finally, use the SAM mask to extract the sharp foreground and blend it with the blurred background.

def combine_foreground_background(foreground, background, mask):
    # Broadcast the (H, W) mask across the color channels
    if mask.ndim == 2:
        mask = np.expand_dims(mask, axis=-1)
    return np.where(mask, foreground, background)

# Take the highest-scoring mask and resize it to the image dimensions
mask = masks[sorted_ind[0]].astype(np.uint8)
mask = cv2.resize(mask, (img_array.shape[1], img_array.shape[0]))
foreground = img_array
background = blurred_image

combined_image = combine_foreground_background(foreground, background, mask)

plt.figure(figsize=(20, 10))
plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")

plt.subplot(1, 2, 2)
plt.imshow(combined_image)
plt.title("Final Portrait Mode Effect")
plt.axis("off")
plt.show()
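
To keep the result rather than only displaying it, you can write it to disk with Pillow; a minimal sketch (the output filename is just an example):

# Save the final composited image
Image.fromarray(combined_image.astype(np.uint8)).save("portrait_mode_result.jpg")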

Conclusion

With just a few tools, we’ve recreated the portrait mode effect programmatically. This approach can be extended for photo editing applications, simulating camera effects, or creative projects.

Future Enhancements:

  1. Use edge detection algorithms for better refinement of subject edges (a simple mask-feathering sketch follows this list).
  2. Experiment with kernel sizes to enhance the blur effect.
  3. Create a user interface to upload images and select subjects dynamically.
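
As a simple starting point for the first item, you can feather the SAM mask before compositing so the subject boundary blends gradually instead of cutting off hard. A minimal sketch, reusing mask, img_array, and blurred_image from Step 7 (the kernel size is illustrative):

# Soften the binary mask so the foreground/background transition is gradual
mask_soft = cv2.GaussianBlur(mask.astype(np.float32), (21, 21), 0)[..., None]

# Alpha-blend the sharp foreground with the blurred background
feathered = (mask_soft * img_array + (1 - mask_soft) * blurred_image).astype(np.uint8)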

Sources:

  1. Segment Anything Model 2 by Meta (https://github.com/facebookresearch/sam2)
  2. CPU-compatible implementation of SAM 2 (https://github.com/SauravMaheshkar/samv2/tree/main)
  3. MiDaS Depth Estimation Model (https://pytorch.org/hub/intelisl_midas_v2/)


Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS from the Indian Institute of Technology (IIT), Kanpur. He is a Machine Learning enthusiast. He is passionate about research and the latest advancements in Deep Learning, Computer Vision, and related fields.
