Generating 3D Images and Point Clouds with Python, Open3D, and GLPN for Depth Estimation

3 min readJun 16, 2024

In this blog post, we will explore the process of generating 3D images and point clouds using Python. We’ll utilize the GLPN model for depth estimation and the Open3D library for point cloud generation and visualization. By the end of this tutorial, you'll be able to convert a 2D image into a 3D point cloud and visualize it interactively.

Prerequisites

Before we start, ensure you have the following libraries installed:

pip install torch transformers numpy open3d matplotlib pillow

Step 1: Import Necessary Libraries

First, we import all the necessary libraries for our task:

import matplotlib
matplotlib.use('tkAgg')
from matplotlib import pyplot as plt
from PIL import Image
import torch
from transformers import GLPNImageProcessor, GLPNForDepthEstimation
import numpy as np
import open3d as o3d

Step 2: Load the Depth Estimation Model

We will use the GLPN model for depth estimation. Let's load the model and its feature extractor:

feature_extractor = GLPNImageProcessor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

Step 3: Preprocess the Input Image

Next, we load and preprocess the input image to the required size:

pythonopy codeimage = Image.open('data/Midi.jpeg')
new_height = 480 if image.height > 480 else image.height
new_height -= (new_height % 32)
new_width = int(new_height * image.width / image.height)
diff = new_width % 32
new_width = new_width - diff if diff < 16 else new_width + 32 - diff
new_size = (new_height, new_width)
image = image.resize(new_size)

Step 4: Perform Depth Estimation

With the preprocessed image, we can now perform depth estimation:

inputs = feature_extractor(images=image, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

Step 5: Post-Processing the Depth Map

We apply padding and crop the image to remove border effects:

pad = 16
output = predicted_depth.squeeze().cpu().numpy() * 1000.0
output = output[pad:-pad, pad:-pad]
image = image.crop((pad, pad, image.width - pad, image.height - pad))

Step 6: Visualize the Depth Map

Visualize the original and depth images side by side:

fig, ax = plt.subplots(1, 2)
ax[0].imshow(image)
ax[0].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
ax[1].imshow(output, cmap='plasma')
ax[1].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
plt.tight_layout()
plt.pause(5)

Step 7: Generate the Point Cloud using Open3D

Define a function to create a point cloud from the depth image:

def create_point_cloud_from_depth_image(depth_image, intrinsic, scale=1.0):
    """
    Create a point cloud from a depth image using Open3D.    
Parameters:
    - depth_image: A numpy array containing the depth image.
    - intrinsic: An Open3D camera intrinsic object.
    - scale: A scaling factor to adjust the depth values.    
Returns:
    - A point cloud object.
    """
    
height, width = depth_image.shape
    depth_image = o3d.geometry.Image((depth_image / scale).astype(np.float32))    # Create an RGBD image
    rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color=o3d.geometry.Image(np.zeros((height, width, 3), dtype=np.uint8)),
        depth=depth_image,
        convert_rgb_to_intensity=False
    )    

# Create the point cloud from the RGBD image
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(
        rgbd_image,
        intrinsic
    )    return pcd

Step 8: Define Camera Intrinsic Parameters

Set the intrinsic parameters of the camera used to capture the image

fx = 1000  # Focal length in the x axis
fy = 1000  # Focal length in the y axis
cx = output.shape[1] / 2  # Principal point in the x axis
cy = output.shape[0] / 2  # Principal point in the y axis
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    width=output.shape[1],
    height=output.shape[0],
    fx=fx,
    fy=fy,
    cx=cx,
    cy=cy
)

Step 9: Create and Visualize the Point Cloud

Generate and visualize the point cloud:

# Create the point cloud from the depth image
pcd = create_point_cloud_from_depth_image(output, intrinsic, scale=1.0)
# Visualize the point cloud
o3d.visualization.draw_geometries([pcd])

Optionally, save the point cloud to a file:

# Save the point cloud to a file
# o3d.io.write_point_cloud("output_point_cloud.ply", pcd)

Conclusion

In this tutorial, we’ve covered the entire process of generating a 3D point cloud from a 2D image using the GLPN model for depth estimation and Open3D for point cloud creation and visualization. This method provides an exciting way to explore 3D reconstructions from simple 2D images.

Feel free to experiment with different images and settings to further explore the capabilities of these powerful libraries. Happy coding!