Detection, segmentation, pose, OBB
YOLO family
Detection, segmentation, pose, classification, and oriented bounding boxes (OBB) from a single model family.
Visual search
CLIP, SigLIP
Image and text embeddings in a shared space: search an image archive by similarity to an example image or by a text query.
- → CLIP, open-weight checkpoints
- → SigLIP, stronger on some workloads
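Because images and text land in one embedding space, a single ranking function serves both image queries and text queries. Below is a minimal sketch of brute-force cosine-similarity search, assuming the embeddings were already computed by a CLIP- or SigLIP-style encoder. The 3-dimensional vectors are made-up stand-ins (real embeddings have hundreds of dimensions), and `search` is a hypothetical helper, not an SDK function.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def search(query: list[float], archive: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k archive items most similar to the query embedding, best first."""
    q = normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(emb))))
        for item_id, emb in archive.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical precomputed image embeddings (stand-ins for encoder output).
archive = {
    "truck_gate_01.jpg": [0.9, 0.1, 0.0],
    "forklift_bay_07.jpg": [0.1, 0.9, 0.1],
    "truck_yard_12.jpg": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of a text query such as "a truck at a gate".
text_query = [1.0, 0.0, 0.0]
print(search(text_query, archive, k=2))  # truck_gate_01.jpg first, then truck_yard_12.jpg
```

The linear scan keeps the ranking explicit; for a large archive, an approximate nearest-neighbour index would replace it without changing the idea.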
OCR
PaddleOCR
Full stack: text detection, recognition, angle classifier. Expiry dates, lot codes, container IDs, labels.
- → Detection + recognition + angle
Tracking
ByteTrack, Deep SORT
Multi-object tracking. Composes with any detector in the catalog.
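A tracker composes with any detector because it consumes only boxes and confidence scores. The sketch below is a simplified, ByteTrack-style two-stage association: confident detections are matched to tracks first, then unmatched tracks get a second chance against low-score detections (for example, partially occluded objects) instead of those boxes being discarded. It is an assumption-laden illustration, not the ByteTrack implementation: it uses greedy IoU matching instead of Hungarian assignment, has no Kalman motion model, never retires lost tracks, and the thresholds are hypothetical.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: dict[int, Box], dets: list[Detection], min_iou: float):
    """Greedily pair tracks with detections by descending IoU (one-to-one)."""
    pairs = sorted(
        ((iou(tbox, d.box), tid, di) for tid, tbox in tracks.items() for di, d in enumerate(dets)),
        reverse=True,
    )
    matched, used_t, used_d = [], set(), set()
    for overlap, tid, di in pairs:
        if overlap < min_iou:
            break  # every remaining pair overlaps even less
        if tid in used_t or di in used_d:
            continue
        matched.append((tid, di))
        used_t.add(tid)
        used_d.add(di)
    return matched, used_t, used_d


def update(tracks: dict[int, Box], dets: list[Detection], next_id: int,
           high: float = 0.6, low: float = 0.1, min_iou: float = 0.3) -> int:
    """Run one frame of association; mutates `tracks` and returns the next free track ID."""
    high_dets = [d for d in dets if d.score >= high]
    low_dets = [d for d in dets if low <= d.score < high]

    # Stage 1: confident detections against all tracks.
    m1, used_t, used_d = greedy_match(tracks, high_dets, min_iou)
    for tid, di in m1:
        tracks[tid] = high_dets[di].box

    # Stage 2: tracks left unmatched try the low-score detections.
    remaining = {tid: box for tid, box in tracks.items() if tid not in used_t}
    m2, _, _ = greedy_match(remaining, low_dets, min_iou)
    for tid, di in m2:
        tracks[tid] = low_dets[di].box

    # Unmatched confident detections start new tracks; low-score ones never do.
    for di, d in enumerate(high_dets):
        if di not in used_d:
            tracks[next_id] = d.box
            next_id += 1
    return next_id


tracks: dict[int, Box] = {}
next_id = update(tracks, [Detection((0, 0, 10, 10), 0.9)], 1)
# Next frame: the object shifted and its score dropped (e.g. occlusion).
next_id = update(tracks, [Detection((1, 0, 11, 10), 0.3)], next_id)
print(tracks)  # {1: (1, 0, 11, 10)}
```

In the usage lines, a single-stage tracker with a 0.6 score cutoff would have dropped the second-frame box; the second stage keeps track 1 alive.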
Re-identification
Person re-ID
Embeddings for matching the same subject across cameras. Yards, depots, multi-camera perimeters.
- → OSNet, trained on MSMT17
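Cross-camera matching with re-ID embeddings reduces to comparing per-track vectors and accepting only confident, one-to-one pairs. A minimal sketch, assuming each track has one embedding (for example, OSNet outputs averaged over the track's crops); the vectors, the camera and track names, and the 0.7 threshold are all hypothetical, and a real threshold has to be tuned on site data.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def match_across_cameras(cam_a: dict[str, list[float]], cam_b: dict[str, list[float]],
                         threshold: float = 0.7) -> dict[str, str]:
    """Greedy one-to-one matching of track IDs between two cameras by embedding similarity."""
    candidates = sorted(
        ((cosine(ea, eb), ida, idb) for ida, ea in cam_a.items() for idb, eb in cam_b.items()),
        reverse=True,
    )
    matches: dict[str, str] = {}
    taken_b: set[str] = set()
    for sim, ida, idb in candidates:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if ida in matches or idb in taken_b:
            continue  # each track may be matched at most once
        matches[ida] = idb
        taken_b.add(idb)
    return matches


# Hypothetical per-track embeddings from two cameras.
gate_cam = {"gate-3": [0.9, 0.1, 0.2], "gate-4": [0.1, 0.8, 0.3]}
yard_cam = {
    "yard-17": [0.2, 0.9, 0.2],
    "yard-18": [0.85, 0.15, 0.25],
    "yard-19": [0.0, 0.1, 1.0],
}
print(match_across_cameras(gate_cam, yard_cam))  # {'gate-3': 'yard-18', 'gate-4': 'yard-17'}
```

Track yard-19 stays unmatched because no gate-camera track clears the threshold, which is the intended behaviour for a subject seen by only one camera.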
Face (open-weight)
SCRFD, ArcFace, InsightFace
Open-weight face detection and embedding models the SDK loads and runs.
- → SCRFD (detection)
- → ArcFace, InsightFace buffalo packs (embeddings)
Need a vendor-supported, full-stack face recognition (FR) system with imaged-trained weights, anti-spoofing, and an SLA? See "By invitation" below.
License plate recognition
Plate detection + PaddleOCR
Open-weight plate detectors paired with PaddleOCR for plate text. Works out of the box at gates, checkpoints, and yards.
- → Open-weight plate detectors
- → PaddleOCR for plate text
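After the detector crops the plate and PaddleOCR returns a raw string, a post-processing step can clean up look-alike glyph errors. A minimal sketch, assuming a hypothetical plate layout of three letters followed by four digits and a hand-picked confusion table; real plate formats vary by jurisdiction, and `normalize_plate` is illustrative, not part of PaddleOCR or the SDK.

```python
import re
from typing import Optional

# Look-alike glyph corrections (an assumption; tune per deployment).
TO_DIGIT = {"O": "0", "Q": "0", "D": "0", "I": "1", "Z": "2", "S": "5", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B"}


def normalize_plate(raw: str, pattern: str = "LLLDDDD") -> Optional[str]:
    """Clean OCR output and coerce each character to the class its slot expects.

    `pattern` uses L for a letter slot and D for a digit slot. The default is a
    hypothetical layout, not any specific jurisdiction's format.
    """
    text = re.sub(r"[^A-Z0-9]", "", raw.upper())  # drop spaces, dashes, dots
    if len(text) != len(pattern):
        return None  # wrong length: reject rather than guess
    out = []
    for ch, slot in zip(text, pattern):
        if slot == "D":
            ch = TO_DIGIT.get(ch, ch)
            if not ch.isdigit():
                return None
        else:
            ch = TO_LETTER.get(ch, ch)
            if not ch.isalpha():
                return None
        out.append(ch)
    return "".join(out)


print(normalize_plate("abc-12O5"))  # ABC1205 (OCR read a zero as the letter O)
```

Returning `None` on anything that cannot be coerced keeps bad reads out of downstream lookups instead of silently producing a wrong plate.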
Depth
Depth Anything, DPT
Monocular depth estimation for AR, parallax, and 3D measurement.
- → Depth Anything V2
- → Depth Anything V3
- → DPT
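For 3D measurement, a depth value at a pixel can be back-projected into camera coordinates with the pinhole camera model, X = (u - cx)·Z / fx and Y = (v - cy)·Z / fy. A minimal sketch, assuming the depth is already metric (relative-depth output must first be scaled to meters) and that the camera intrinsics fx, fy, cx, cy are known from calibration; the numbers below are made up.

```python
import math


def back_project(u: float, v: float, z: float,
                 fx: float, fy: float, cx: float, cy: float) -> tuple[float, float, float]:
    """Map pixel (u, v) with metric depth z to a point (X, Y, Z) in the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


# Hypothetical intrinsics for a 1280x720 camera.
fx = fy = 1000.0
cx, cy = 640.0, 360.0

# Two pixels on the same object, with depths read from a metric depth map.
p1 = back_project(540.0, 360.0, 5.0, fx, fy, cx, cy)
p2 = back_project(740.0, 360.0, 5.0, fx, fy, cx, cy)
print(math.dist(p1, p2))  # 1.0 (meters)
```

The 200-pixel gap at 5 m maps to 1 m here; the same gap at 10 m would map to 2 m, which is why the depth estimate, not just the pixel distance, drives the measurement.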
Segmentation specialists
SegFormer, DeepLabV3+
Semantic segmentation families commonly fine-tuned in-house by industrial customers.
Promptable segmentation
FastSAM, MobileSAM, SAM ViT-B
Promptable segmentation for operator-assisted QA and annotation bootstrapping.
- → FastSAM
- → MobileSAM
- → SAM ViT-B
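For annotation bootstrapping, a mask accepted by an operator can be turned into a box label that a detector trains on. A minimal sketch, assuming the mask arrives as a 2D grid of 0/1 values (real outputs are typically NumPy arrays; nested lists keep this dependency-free) and that labels feed a YOLO-format dataset (class, then box center x/y, width, and height, each normalized to [0, 1]). `mask_to_box` and `to_yolo_line` are hypothetical helpers.

```python
from typing import Optional

Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: list[list[int]]) -> Optional[Box]:
    """Tight box around all nonzero pixels; None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, val in enumerate(row) if val]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def to_yolo_line(box: Box, class_id: int, img_w: int, img_h: int) -> str:
    """Format a box as a YOLO label line: class cx cy w h, normalized by image size."""
    x_min, y_min, x_max, y_max = box
    bw = x_max - x_min + 1  # +1 because the edges are inclusive pixel indices
    bh = y_max - y_min + 1
    cx = x_min + bw / 2
    cy = y_min + bh / 2
    return f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} {bw / img_w:.6f} {bh / img_h:.6f}"


# A toy 4x6 mask standing in for a promptable-segmentation output.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
box = mask_to_box(mask)
print(box)  # (1, 1, 3, 2)
print(to_yolo_line(box, 0, 6, 4))  # 0 0.416667 0.500000 0.500000 0.500000
```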
Classification backbones
MobileNet, EfficientNet, ResNet, ConvNeXt
Standard classifier backbones for bring-your-own (BYO) classifier workflows.
- → MobileNetV3
- → EfficientNet
- → ResNet
- → ConvNeXt-Tiny
Image restoration
Real-ESRGAN
4x super-resolution for low-res archives and incident frames.
- → Real-ESRGAN, 4x upscaler
Driving perception
YOLOP
Vehicles, drivable surface, and lane lines from one pass. Fits fleet, depot, and yard cameras.
- → Vehicle detection
- → Drivable area
- → Lane lines
Document layout
PP-Structure
Find titles, paragraphs, tables, and figures in scanned pages. Pairs with PaddleOCR for full document parsing.
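Pairing layout analysis with OCR comes down to putting regions in reading order and joining their text. A minimal sketch, assuming each region has already been reduced to a type, a box, and its recognized text (the `Region` structure is hypothetical, not PP-Structure's output format) and a simple top-to-bottom, left-to-right reading order in which regions whose tops lie within `row_tol` pixels count as one row. Multi-column pages need a smarter ordering than this.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str  # e.g. "title", "text", "table", "figure"
    box: tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    text: str  # OCR output for the region; empty for figures


def reading_order(regions: list[Region], row_tol: int = 15) -> list[Region]:
    """Group regions into rows by top edge, then read each row left to right."""
    ordered = sorted(regions, key=lambda r: r.box[1])
    rows: list[list[Region]] = []
    current: list[Region] = []
    for r in ordered:
        if current and r.box[1] - current[0].box[1] > row_tol:
            rows.append(current)  # top edge too far down: start a new row
            current = []
        current.append(r)
    if current:
        rows.append(current)
    return [r for row in rows for r in sorted(row, key=lambda r: r.box[0])]


def assemble(regions: list[Region]) -> str:
    """Join region text in reading order, skipping regions without text."""
    return "\n\n".join(r.text for r in reading_order(regions) if r.text)


page = [
    Region("text", (310, 298, 550, 420), "EXP 2026-01"),  # right block, slightly higher
    Region("title", (50, 40, 550, 90), "Product Data Sheet"),
    Region("text", (50, 305, 290, 420), "Lot A1234"),  # left block
    Region("figure", (50, 100, 550, 280), ""),
]
print(assemble(page))
```

The row tolerance is what puts the left block before the right one even though the right block's top edge is a few pixels higher.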