
Pose detect_for_video with GPU delegate crashes when segmentation enabled on Mac #5788

Open
demirhere opened this issue Dec 19, 2024 · 0 comments
Labels
gpu (MediaPipe GPU related issues) · os:macOS (Issues on MacOS) · platform:python (MediaPipe Python issues) · task:pose landmarker (Issues related to Pose Landmarker: Find people and body positions)


demirhere commented Dec 19, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

macOS Sequoia 15.1.1 on Apple M3

MediaPipe Tasks SDK version

3.10.0

Task name (e.g. Image classification, Gesture recognition etc.)

Pose landmarker

Programming Language and version (e.g. C++, Python, Java)

Python 3.10

Describe the actual behavior

Detection crashes the application when segmentation is enabled. If I turn off the segmentation mask, detect_for_video runs fine. The CPU delegate runs fine in all cases.

Describe the expected behaviour

It should not crash.

Standalone code/steps you may have used to try to get what you need

Basic pose detection code using the latest API, with the GPU delegate on a Mac M3.
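A minimal sketch of the delegate choice involved here, mirroring the commented-out guard in the pasted code below. `choose_delegate` is a hypothetical helper, and the returned strings stand in for the `python.BaseOptions.Delegate` enum values:

```python
import platform

def choose_delegate(system: str, enable_segmentation: bool) -> str:
    # Workaround guard: use the GPU delegate on macOS only when
    # segmentation masks are disabled, since enabling both crashes.
    if system == "Darwin" and not enable_segmentation:
        return "GPU"
    return "CPU"

# On the current machine:
delegate = choose_delegate(platform.system(), True)
```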

Other info / Complete Logs

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1734586092.124030 9722600 gl_context.cc:369] GL version: 2.1 (2.1 Metal - 89.3), renderer: Apple M3
INFO: Created TensorFlow Lite delegate for Metal.
2024-12-18 21:28:12 - root - INFO - Model loaded successfully
W0000 00:00:1734586092.236510 9722705 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
E0000 00:00:1734586092.239868 9722700 shader_util.cc:99] Failed to compile shader:
 1 #version 330 
 2 #ifdef GL_ES 
 3 #define DEFAULT_PRECISION(p, t) precision p t; 
 4 #else 
 5 #define DEFAULT_PRECISION(p, t) 
 6 #define lowp 
 7 #define mediump 
 8 #define highp 
 9 #endif  // defined(GL_ES) 
10 #if __VERSION__ < 130
11 #define in attribute
12 #define out varying
13 #endif  // __VERSION__ < 130
14 in vec4 position; in mediump vec4 texture_coordinate; out mediump vec2 sample_coordinate; void main() { gl_Position = position; sample_coordinate = texture_coordinate.xy; }
E0000 00:00:1734586092.239882 9722700 shader_util.cc:106] Error message: ERROR: 0:1: '' :  version '330' is not supported

E0000 00:00:1734586092.239922 9722600 calculator_graph.cc:928] INTERNAL: CalculatorGraph::Run() failed: 
Calculator::Process() for node "mediapipe_tasks_vision_pose_landmarker_poselandmarkergraph__mediapipe_tasks_vision_pose_landmarker_multipleposelandmarksdetectorgraph__mediapipe_tasks_vision_pose_landmarker_singleposelandmarksdetectorgraph__TensorsToSegmentationCalculator" failed: ; RET_CHECK failure (mediapipe/calculators/tensor/tensors_to_segmentation_converter_metal.cc:217) upsample_program_Problem initializing the program.

This happens when detect_for_video is called.

# Imports used by this snippet (surrounding class definition omitted)
import logging
import platform

import cv2
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

    def load_model(self):
        logging.info(f"Loading MediaPipe Pose model from {self.model_path}")

        # No GPU support on Windows yet
        # Why no GPU with segmentation on Mac?
        delegate = python.BaseOptions.Delegate.CPU

        #if platform.system() == "Darwin" and not self.enable_segmentation:
        if platform.system() == "Darwin":
            delegate = python.BaseOptions.Delegate.GPU

        base_options = python.BaseOptions(
            model_asset_path=self.model_path,
            delegate=delegate
        )

        options = vision.PoseLandmarkerOptions(
            base_options=base_options,
            num_poses=self.num_poses,
            min_pose_detection_confidence=self.min_pose_detection_confidence,
            min_pose_presence_confidence=self.min_pose_presence_confidence,
            min_tracking_confidence=self.min_tracking_confidence,
            output_segmentation_masks=self.enable_segmentation,
            running_mode=vision.RunningMode.VIDEO
        )
        self.pose_detector = vision.PoseLandmarker.create_from_options(options)
        logging.info("Model loaded successfully")

    def unload_model(self):
        """Unload the pose detection model and free resources"""
        logging.info("Unloading MediaPipe Pose model")
        if self.pose_detector is not None:
            self.pose_detector.close()
            self.pose_detector = None
        logging.info("Model unloaded successfully")


    def pose(self, image, timestamp_ms):
        if self.pose_detector is None:
            self.load_model()

        mp_image = mp.Image(image_format=mp.ImageFormat.SRGBA, data=cv2.cvtColor(image, cv2.COLOR_BGR2RGBA))
        detection_result = self.pose_detector.detect_for_video(mp_image, int(round(timestamp_ms)))
        result = {
            'poses': [],
            'world_poses': [],
            'segmentation_mask': None
        }
        
        if detection_result.pose_landmarks:
            for idx, (pose_landmarks, world_landmarks) in enumerate(zip(
                detection_result.pose_landmarks, 
                detection_result.pose_world_landmarks)):
                
                keypoint_dict = {}
                world_keypoint_dict = {}
                
                for i, landmark_name in enumerate(self.KEYPOINT_NAMES):
                    # Normalized coordinates (x/y flipped to mirror the image)
                    point = pose_landmarks[i]
                    keypoint_dict[landmark_name] = {
                        'x': 1.0 - point.x,
                        'y': 1.0 - point.y,
                        'z': point.z,
                        'confidence': point.visibility
                    }
                    
                    # World coordinates
                    world_point = world_landmarks[i]
                    world_keypoint_dict[landmark_name] = {
                        'x': world_point.x,
                        'y': world_point.y,
                        'z': world_point.z,
                        'confidence': world_point.visibility
                    }
                
                result['poses'].append(keypoint_dict)
                result['world_poses'].append(world_keypoint_dict)
            
            if self.enable_segmentation and detection_result.segmentation_masks:
                # Initialize a combined mask with zeros
                combined_mask = np.zeros_like(detection_result.segmentation_masks[0].numpy_view(), dtype=np.float32)
                
                for mask in detection_result.segmentation_masks:
                    combined_mask += mask.numpy_view()
                    
                # Normalize the combined mask to the range [0, 65535],
                # guarding against division by zero when all masks are empty
                peak = combined_mask.max()
                if peak > 0:
                    combined_mask = combined_mask / peak * 65535
                combined_mask = combined_mask.astype(np.uint16)
                result['segmentation_mask'] = combined_mask
        
        return result
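The mask-combining step above can be sketched in isolation with plain NumPy arrays standing in for MediaPipe's mask objects. `combine_masks` is a hypothetical helper, not part of the reporter's class; it adds a guard for the all-zero case:

```python
import numpy as np

def combine_masks(masks):
    """Sum per-pose float masks and scale to uint16 [0, 65535].

    Plain-NumPy stand-in for the segmentation_masks handling above;
    guards against division by zero when every mask is empty.
    """
    combined = np.zeros_like(masks[0], dtype=np.float32)
    for m in masks:
        combined += m
    peak = combined.max()
    if peak > 0:
        combined = combined / peak * 65535
    return combined.astype(np.uint16)
```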
@kuaashish added the os:macOS, task:pose landmarker, platform:python, and gpu labels on Dec 19, 2024