Introduction
- The Segment Anything Model (SAM) proposed by Facebook has had a great influence on computer vision, since segmentation is a fundamental step in many tasks, such as edge detection, face recognition, and autonomous driving. However, SAM has some weaknesses: (1) it does not return semantic information about the regions, (2) in some cases a single instance (e.g. a car) may be split into several parts, and (3) the model cannot process video data.
- In this repository, we implement a segmentation method and a tracking method using YOLOv8 and SAM that address these weaknesses. We name this method Segment Any Video (SAV).
- In seg.py, segmentation is implemented by passing the boxes from the YOLOv8 detector to SAM as prompts. Masks with no semantic information are also returned, and this is the biggest difference from SAM. In track.py, we modified the code from ultralytics/tracker/track.py, which supports ByteTrack and BoTSORT, and then apply instance segmentation to every frame.
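The box-prompt idea can be sketched roughly as follows. This is a minimal illustration, not the code in seg.py: the function names `load_models` and `segment_with_box_prompts`, the `conf_threshold` parameter, and the output dictionary layout are all assumptions made for this example. It relies only on the public `ultralytics` and `segment_anything` APIs.

```python
def load_models(yolo_checkpoint, sam_checkpoint, sam_type="vit_h"):
    """Load a YOLOv8 detector and a SAM predictor (hypothetical helper)."""
    # Imported lazily so the prompting logic below can be read and tested alone.
    from ultralytics import YOLO
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[sam_type](checkpoint=sam_checkpoint)
    return YOLO(yolo_checkpoint), SamPredictor(sam)


def segment_with_box_prompts(image_bgr, detector, predictor, conf_threshold=0.25):
    """Return one labelled mask per detection by prompting SAM with YOLOv8 boxes.

    image_bgr is an OpenCV-style BGR array, e.g. from cv2.imread.
    conf_threshold is an illustrative extra filter, not a flag of seg.py.
    """
    result = detector(image_bgr)[0]            # one Results object per input image
    boxes = result.boxes.xyxy.cpu().numpy()    # (N, 4) boxes as x1, y1, x2, y2
    classes = result.boxes.cls.cpu().numpy()   # class index per box
    scores = result.boxes.conf.cpu().numpy()   # confidence per box

    # SAM computes the image embedding once; each box is then a cheap prompt.
    predictor.set_image(image_bgr, image_format="BGR")

    instances = []
    for box, cls_id, score in zip(boxes, classes, scores):
        if score < conf_threshold:
            continue
        # multimask_output=False asks SAM for a single mask per box prompt.
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        instances.append({
            "label": result.names[int(cls_id)],  # semantic label from YOLOv8
            "score": float(score),
            "box": [float(v) for v in box],
            "mask": masks[0],                    # boolean H x W mask from SAM
        })
    return instances
```

With the checkpoints from the Usage section, this could be called as `detector, predictor = load_models("model/yolov8x.pt", "model/sam_vit_h_4b8939.pth")` followed by `segment_with_box_prompts(cv2.imread(path), detector, predictor)`. Because each mask comes from one detection box, it inherits that detection's class label, which is what plain SAM does not provide.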
Installation
pip install ultralytics
pip install git+https://github.com/facebookresearch/segment-anything.git
Model CheckPoints
Usage
python seg.py --img_path TestImages --save_dir SegOut --sam_checkpoint model/sam_vit_h_4b8939.pth --yolo_checkpoint model/yolov8x.pt
or
python track.py --video_path video.mp4 --save_path video_test.mp4 --sam_checkpoint model/sam_vit_h_4b8939.pth --yolo_checkpoint model/yolov8x.pt --imgsz 1920
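For readers curious about what happens per frame in the video case, here is a rough sketch of tracking followed by box-prompted segmentation. It is not the code in track.py: it uses the public `YOLO.track` method from `ultralytics` in place of the modified tracker code in this repository, and the function name `segment_tracked_frame` and the output layout are assumptions made for this example.

```python
def segment_tracked_frame(frame_bgr, detector, predictor, tracker="bytetrack.yaml"):
    """Track objects in one BGR frame, then prompt SAM with each tracked box.

    detector is an ultralytics YOLO model and predictor a SamPredictor.
    tracker may be "bytetrack.yaml" or "botsort.yaml".
    """
    # persist=True keeps the tracker state between calls, one call per frame.
    result = detector.track(frame_bgr, persist=True, tracker=tracker)[0]
    if result.boxes.id is None:        # no tracks assigned in this frame
        return []

    boxes = result.boxes.xyxy.cpu().numpy()     # (N, 4) boxes as x1, y1, x2, y2
    track_ids = result.boxes.id.cpu().numpy()   # one track id per box
    classes = result.boxes.cls.cpu().numpy()    # class index per box

    # Embed the frame once, then reuse the embedding for every box prompt.
    predictor.set_image(frame_bgr, image_format="BGR")

    tracked = []
    for box, track_id, cls_id in zip(boxes, track_ids, classes):
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        tracked.append({
            "track_id": int(track_id),            # stable across frames
            "label": result.names[int(cls_id)],   # semantic label from YOLOv8
            "mask": masks[0],                     # boolean H x W mask from SAM
        })
    return tracked
```

Calling this function on successive frames read from the video yields masks whose `track_id` stays the same for the same object, so each instance can be drawn in a consistent color before the frame is written to the output file.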
Image Segment Results
The results above show that our method segments the bus, car, and train as intact, semantically labelled objects, while SAM splits them into separate parts.
Video Track Result
Segment and track
Demo
- Our online demo is here.
- Note: Because video segmentation is time-consuming, we did not integrate it into the demo. If you are interested, you can clone this repository and run it on your own GPU machine.
TODO
- Train YOLOv8 models with the Objects365 dataset
- YOLOv8m.pt extract code: 65ge
- YOLOv8n.pt
- YOLOv8s.pt
- YOLOv8l.pt
- YOLOv8x.pt
License
- This code is licensed under the AGPL-3.0 License.
Contact
- Email: [email protected]
- WeChat QR code: