
Pose detect_for_video with GPU delegate crashes when segmentation enabled on Mac #5788

Open
demirhere opened this issue Dec 19, 2024 · 0 comments
Labels
gpu (MediaPipe GPU related issues) · os:macOS (Issues on MacOS) · platform:python (MediaPipe Python issues) · task:pose landmarker (Issues related to Pose Landmarker: Find people and body positions)


demirhere commented Dec 19, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

macOS Sequoia 15.1.1 on Apple M3

MediaPipe Tasks SDK version

3.10.0

Task name (e.g. Image classification, Gesture recognition etc.)

Pose landmarker

Programming Language and version (e.g. C++, Python, Java)

Python 3.10

Describe the actual behavior

Detection crashes the application when segmentation is enabled. If I turn off the segmentation mask, detect_for_video runs fine. The CPU delegate runs fine in all cases.

Describe the expected behaviour

It should not crash.

Standalone code/steps you may have used to try to get what you need

Basic pose detection code using the latest API, with the GPU delegate on a Mac M3.
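A minimal sketch of the delegate choice involved here, mirroring the commented-out guard in the pasted code below. `choose_delegate` is a hypothetical helper, and the returned strings stand in for the `python.BaseOptions.Delegate` enum values:

```python
import platform

def choose_delegate(system: str, enable_segmentation: bool) -> str:
    # Workaround guard: use the GPU delegate on macOS only when
    # segmentation masks are disabled, since enabling both crashes.
    if system == "Darwin" and not enable_segmentation:
        return "GPU"
    return "CPU"

# On the current machine:
delegate = choose_delegate(platform.system(), True)
```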

Other info / Complete Logs

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1734586092.124030 9722600 gl_context.cc:369] GL version: 2.1 (2.1 Metal - 89.3), renderer: Apple M3
INFO: Created TensorFlow Lite delegate for Metal.
2024-12-18 21:28:12 - root - INFO - Model loaded successfully
W0000 00:00:1734586092.236510 9722705 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
E0000 00:00:1734586092.239868 9722700 shader_util.cc:99] Failed to compile shader:
 1 #version 330 
 2 #ifdef GL_ES 
 3 #define DEFAULT_PRECISION(p, t) precision p t; 
 4 #else 
 5 #define DEFAULT_PRECISION(p, t) 
 6 #define lowp 
 7 #define mediump 
 8 #define highp 
 9 #endif  // defined(GL_ES) 
10 #if __VERSION__ < 130
11 #define in attribute
12 #define out varying
13 #endif  // __VERSION__ < 130
14 in vec4 position; in mediump vec4 texture_coordinate; out mediump vec2 sample_coordinate; void main() { gl_Position = position; sample_coordinate = texture_coordinate.xy; }
E0000 00:00:1734586092.239882 9722700 shader_util.cc:106] Error message: ERROR: 0:1: '' :  version '330' is not supported

E0000 00:00:1734586092.239922 9722600 calculator_graph.cc:928] INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Process() for node "mediapipe_tasks_vision_pose_landmarker_poselandmarkergraph__mediapipe_tasks_vision_pose_landmarker_multipleposelandmarksdetectorgraph__mediapipe_tasks_vision_pose_landmarker_singleposelandmarksdetectorgraph__TensorsToSegmentationCalculator" failed: ; RET_CHECK failure (mediapipe/calculators/tensor/tensors_to_segmentation_converter_metal.cc:217) upsample_program_Problem initializing the program.

This happens when detect_for_video is called.

# Imports used by this snippet (surrounding class definition omitted)
import logging
import platform

import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

    def load_model(self):
        logging.info(f"Loading MediaPipe Pose model from {self.model_path}")

        # No GPU support on Windows yet
        # Why no GPU with segmentation on Mac?
        delegate = python.BaseOptions.Delegate.CPU

        #if platform.system() == "Darwin" and not self.enable_segmentation:
        if platform.system() == "Darwin":
            delegate = python.BaseOptions.Delegate.GPU

        base_options = python.BaseOptions(
            model_asset_path=self.model_path,
            delegate=delegate
        )

        options = vision.PoseLandmarkerOptions(
            base_options=base_options,
            num_poses=self.num_poses,
            min_pose_detection_confidence=self.min_pose_detection_confidence,
            min_pose_presence_confidence=self.min_pose_presence_confidence,
            min_tracking_confidence=self.min_tracking_confidence,
            output_segmentation_masks=self.enable_segmentation,
            running_mode=vision.RunningMode.VIDEO
        )
        self.pose_detector = vision.PoseLandmarker.create_from_options(options)
        logging.info("Model loaded successfully")

    def unload_model(self):
        """Unload the pose detection model and free resources"""
        logging.info("Unloading MediaPipe Pose model")
        if self.pose_detector is not None:
            self.pose_detector.close()
            self.pose_detector = None
        logging.info("Model unloaded successfully")


    def pose(self, image, timestamp_ms):
        if self.pose_detector is None:
            self.load_model()

        mp_image = mp.Image(image_format=mp.ImageFormat.SRGBA, data=cv2.cvtColor(image, cv2.COLOR_BGR2RGBA))
        detection_result = self.pose_detector.detect_for_video(mp_image, int(round(timestamp_ms)))
        result = {
            'poses': [],
            'world_poses': [],
            'segmentation_mask': None
        }
        
        if detection_result.pose_landmarks:
            for idx, (pose_landmarks, world_landmarks) in enumerate(zip(
                detection_result.pose_landmarks, 
                detection_result.pose_world_landmarks)):
                
                keypoint_dict = {}
                world_keypoint_dict = {}
                
                for i, landmark_name in enumerate(self.KEYPOINT_NAMES):
                    # Normalized coordinates (x/y flipped to mirror the image)
                    point = pose_landmarks[i]
                    keypoint_dict[landmark_name] = {
                        'x': 1.0 - point.x,
                        'y': 1.0 - point.y,
                        'z': point.z,
                        'confidence': point.visibility
                    }
                    
                    # World coordinates
                    world_point = world_landmarks[i]
                    world_keypoint_dict[landmark_name] = {
                        'x': world_point.x,
                        'y': world_point.y,
                        'z': world_point.z,
                        'confidence': world_point.visibility
                    }
                
                result['poses'].append(keypoint_dict)
                result['world_poses'].append(world_keypoint_dict)
            
            if self.enable_segmentation and detection_result.segmentation_masks:
                # Initialize a combined mask with zeros
                combined_mask = np.zeros_like(detection_result.segmentation_masks[0].numpy_view(), dtype=np.float32)
                
                for mask in detection_result.segmentation_masks:
                    combined_mask += mask.numpy_view()
                    
                # Normalize the combined mask to the range [0, 65535],
                # guarding against division by zero when all masks are empty
                peak = combined_mask.max()
                if peak > 0:
                    combined_mask = combined_mask / peak * 65535
                combined_mask = combined_mask.astype(np.uint16)
                result['segmentation_mask'] = combined_mask
        
        return result
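The mask-combining step above can be sketched in isolation with plain NumPy arrays standing in for MediaPipe's mask objects. `combine_masks` is a hypothetical helper, not part of the reporter's class; it adds a guard for the all-zero case:

```python
import numpy as np

def combine_masks(masks):
    """Sum per-pose float masks and scale to uint16 [0, 65535].

    Plain-NumPy stand-in for the segmentation_masks handling above;
    guards against division by zero when every mask is empty.
    """
    combined = np.zeros_like(masks[0], dtype=np.float32)
    for m in masks:
        combined += m
    peak = combined.max()
    if peak > 0:
        combined = combined / peak * 65535
    return combined.astype(np.uint16)
```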
@kuaashish added the os:macOS, task:pose landmarker, platform:python, and gpu labels on Dec 19, 2024