Real-time object detection has become increasingly accessible thanks to advances in deep learning architectures and optimized inference engines. This guide walks through building a production-ready detection system using the YOLO (You Only Look Once) family, focusing on YOLOv8.
YOLO Object Detection Pipeline
Why YOLO for Object Detection?
YOLO stands out among object detection algorithms for several reasons:
- Speed: Single-pass detection enables real-time performance
- Accuracy: State-of-the-art results on standard benchmarks
- Versatility: Works well across various object types and scales
- Community: Excellent ecosystem and pre-trained models
Architecture Overview
YOLOv8, a recent iteration of the family, introduces several improvements over its predecessors (a quick way to inspect the network yourself follows the list):
- Anchor-free detection for better generalization
- Enhanced feature pyramid network (FPN)
- Improved loss functions for better training convergence
- Optimized architecture for edge deployment
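These details are easy to sanity-check locally. A minimal sketch that loads the nano checkpoint and prints a layer summary (layer count, parameters, GFLOPs), using the ultralytics package installed in the next section:

from ultralytics import YOLO

# Print a summary of the network: layers, parameters, GFLOPs
model = YOLO('yolov8n.pt')
model.info()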
Implementation
1. Environment Setup
pip install ultralytics opencv-python numpy pillow
2. Basic Object Detection
from ultralytics import YOLO
import cv2
import numpy as np
# Load pre-trained model
model = YOLO('yolov8n.pt') # nano model for speed
# Process video stream
cap = cv2.VideoCapture(0) # 0 for webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Perform detection
    results = model(frame, conf=0.5)

    # Visualize results
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow('YOLOv8 Detection', annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
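plot() is convenient for display, but production code usually consumes the raw detections. A minimal sketch, reusing the model and results objects from the loop above:

# Read raw detections: class names, confidences, and xyxy boxes
for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]
    confidence = float(box.conf)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"{cls_name}: {confidence:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")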
3. Custom Object Detection
Train on your own dataset:
from ultralytics import YOLO
# Load base model
model = YOLO('yolov8n.pt')
# Train on custom dataset
results = model.train(
    data='custom_dataset.yaml',
    epochs=100,
    imgsz=640,
    batch=16,
    device=0,      # GPU index
    workers=8,     # dataloader workers
    patience=50,   # early-stopping patience (epochs without improvement)
    save=True,
    plots=True
)
# Validate
metrics = model.val()
# Export for deployment
model.export(format='onnx') # or 'tflite', 'coreml'
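The data argument points to a dataset YAML that lists image locations and class names. A minimal sketch of custom_dataset.yaml; every path and class name here is a placeholder for your own data:

# custom_dataset.yaml (hypothetical layout; adjust paths and names)
path: /data/custom_dataset   # dataset root
train: images/train          # training images, relative to path
val: images/val              # validation images, relative to path
names:
  0: widget
  1: defect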
4. Advanced Features
Multi-Object Tracking:
from collections import defaultdict
import numpy as np

track_history = defaultdict(lambda: [])

# Inside the capture loop: persist=True carries track IDs across frames
results = model.track(frame, persist=True)

if results[0].boxes.id is not None:
    boxes = results[0].boxes.xywh.cpu()
    track_ids = results[0].boxes.id.int().cpu().tolist()

    for box, track_id in zip(boxes, track_ids):
        x, y, w, h = box
        track = track_history[track_id]
        track.append((float(x), float(y)))  # box center
        if len(track) > 30:  # keep only the last 30 positions
            track.pop(0)

        # Draw tracking line
        points = np.array(track).astype(np.int32).reshape((-1, 1, 2))
        cv2.polylines(frame, [points], False, (230, 230, 230), 2)
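model.track() uses the BoT-SORT tracker by default; ByteTrack also ships with ultralytics and can be selected per call:

# Select the tracker config (botsort.yaml is the default)
results = model.track(frame, persist=True, tracker='bytetrack.yaml')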
Region of Interest (ROI) Detection:
def detect_in_roi(frame, model, roi):
    x, y, w, h = roi
    roi_frame = frame[y:y+h, x:x+w]
    results = model(roi_frame)

    # Adjust coordinates back to full frame
    for box in results[0].boxes:
        box.xyxy[0][0] += x
        box.xyxy[0][1] += y
        box.xyxy[0][2] += x
        box.xyxy[0][3] += y
    return results
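A hypothetical call, restricting detection to a 400x300 window whose top-left corner sits at (100, 50):

# ROI given as (x, y, width, height) in full-frame pixels
roi_results = detect_in_roi(frame, model, roi=(100, 50, 400, 300))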
Performance Optimization
1. Model Selection
Choose a model size based on your accuracy and latency budget. The times below are representative GPU inference latencies per 640x640 frame; actual figures depend heavily on hardware, so measure on your own setup, as sketched after the list:
- YOLOv8n: Fastest, for edge devices (1.8ms)
- YOLOv8s: Balanced (2.1ms)
- YOLOv8m: Better accuracy (3.5ms)
- YOLOv8l: High accuracy (5.8ms)
- YOLOv8x: Best accuracy (8.3ms)
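A minimal timing sketch, assuming frame holds a representative image (for example, one captured in the earlier webcam loop):

import time

# Rough per-frame latency for each model size on the current hardware
for name in ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']:
    m = YOLO(name)
    m(frame, verbose=False)  # warm-up run
    start = time.perf_counter()
    for _ in range(50):
        m(frame, verbose=False)
    print(f"{name}: {(time.perf_counter() - start) / 50 * 1000:.1f} ms/frame")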
2. Inference Optimization
# Move to GPU and fuse Conv2d + BatchNorm layers
model = YOLO('yolov8n.pt')
model.to('cuda')
model.fuse()

# Use half precision (FP16) at inference time
results = model(frame, half=True)

# Warm up with a dummy image at the target input size
_ = model(np.zeros((640, 640, 3), dtype=np.uint8))

# Batch processing of multiple sources in one call
results = model(['frame1.jpg', 'frame2.jpg', 'frame3.jpg'])
3. TensorRT Acceleration
For NVIDIA GPUs:
# Export to TensorRT
model.export(format='engine', device=0)
# Load TensorRT model
trt_model = YOLO('yolov8n.engine')
results = trt_model(frame) # 3-5x faster inference
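half=True is also accepted by export; an FP16 engine is usually smaller and faster than the FP32 default, at a small accuracy cost that is worth verifying:

# Build an FP16 TensorRT engine (re-check mAP afterwards)
model.export(format='engine', device=0, half=True)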
4. Multi-Threading
from threading import Thread
from queue import Queue
class VideoCapture:
    def __init__(self, source):
        self.cap = cv2.VideoCapture(source)
        self.q = Queue(maxsize=3)  # small buffer keeps latency bounded
        self.running = True
        t = Thread(target=self._reader)
        t.daemon = True
        t.start()

    def _reader(self):
        while self.running:
            ret, frame = self.cap.read()
            if not ret:
                break
            if not self.q.full():  # drop frames rather than fall behind
                self.q.put(frame)

    def read(self):
        return self.q.get()

    def release(self):
        self.running = False
        self.cap.release()
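A usage sketch pairing the threaded reader with the detection loop; because full queues drop frames instead of blocking, latency stays bounded when inference runs slower than the camera:

stream = VideoCapture(0)  # threaded webcam reader
while True:
    frame = stream.read()
    results = model(frame, conf=0.5)
    cv2.imshow('Detection', results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
stream.release()
cv2.destroyAllWindows()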
Production Deployment
Docker Container
FROM ultralytics/ultralytics:latest
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
CMD ["python", "detection_server.py"]
REST API with FastAPI
from fastapi import FastAPI, File, UploadFile
from ultralytics import YOLO
import cv2
import numpy as np
app = FastAPI()
model = YOLO('yolov8n.pt')
@app.post("/detect")
async def detect_objects(file: UploadFile = File(...)):
contents = await file.read()
nparr = np.frombuffer(contents, np.uint8)
img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
results = model(img)
detections = []
for box in results[0].boxes:
detections.append({
'class': model.names[int(box.cls)],
'confidence': float(box.conf),
'bbox': box.xyxy[0].tolist()
})
return {'detections': detections}
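Assuming the code above is saved as detection_server.py (the entrypoint the Dockerfile expects), serve it with uvicorn detection_server:app --host 0.0.0.0 --port 8000 and smoke-test it with curl -X POST -F "file=@test.jpg" http://localhost:8000/detect; the port and test image are placeholders.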
Edge Deployment (Raspberry Pi)
# Use lightweight model
model = YOLO('yolov8n.pt')
# Optimize for ARM
model.export(format='ncnn') # NCNN framework for mobile
# Load the exported NCNN model and run inference as usual
ncnn_model = YOLO('yolov8n_ncnn_model')
results = ncnn_model(frame)
Real-World Applications
- Retail Analytics: Customer counting, heat mapping, queue detection
- Security: Intrusion detection, abandoned object detection
- Manufacturing: Defect detection, quality control
- Autonomous Vehicles: Pedestrian detection, traffic sign recognition
- Healthcare: Medical imaging analysis, patient monitoring
Performance Benchmarks
Tested on an NVIDIA RTX 3080:
- Resolution: 1920x1080
- Model: YOLOv8n
- FPS: 145 (TensorRT), 85 (PyTorch)
- Latency: 6.9ms (TensorRT), 11.8ms (PyTorch)
- Accuracy (mAP): 52.3%
Challenges and Solutions
Challenge: False positives in crowded scenes. Solution: raise the confidence threshold, tune the NMS IoU threshold, and add post-processing filters.
Challenge: Small object detection. Solution: use higher-resolution inputs, multi-scale training, and attention mechanisms.
Challenge: Real-time processing on edge devices. Solution: model quantization, pruning, and knowledge distillation (a quantization sketch follows).
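As a concrete instance of the quantization route, ultralytics can export an INT8 TFLite model. int8 is a documented export flag, though INT8 conversion generally relies on representative calibration data, and accuracy should be re-validated afterwards:

# INT8 quantized TFLite export for constrained edge hardware
# (expect some accuracy loss; re-check mAP after quantizing)
model.export(format='tflite', int8=True)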
Conclusion
YOLO and OpenCV provide a powerful combination for building real-time object detection systems. By choosing the right model variant, optimizing inference, and implementing proper deployment strategies, you can achieve both high accuracy and low latency.
The key is balancing accuracy requirements with computational constraints for your specific use case. Start with a baseline implementation, profile performance, and iteratively optimize the bottlenecks.

