Computer Vision Applications with OpenCV and TensorFlow (2025)
Computer vision remains foundational for automation in 2025—from document workflows and retail analytics to robotics and healthcare. This guide covers practical CV tasks, modern architectures, deployment patterns, and production considerations using OpenCV and TensorFlow.
Executive summary
- Use classical CV (OpenCV) for low-latency, deterministic pre/post-processing and simple tasks
- Use deep learning (TensorFlow) for detection/segmentation/recognition; combine with classical CV for robustness
- Focus on reproducible pipelines, drift monitoring, and hardware-aware optimization (CPU/GPU/Edge TPU)
Common tasks and solutions
Object detection (retail loss prevention, shelf analytics)
- Models: EfficientDet, YOLOv8/YOLO-NAS (via TF/TFLite or ONNX)
- Tips: anchor-free variants, small backbones for edge devices, mixed precision
# TensorFlow inference (simplified)
import tensorflow as tf
model = tf.saved_model.load("./detector")
@tf.function
def infer(img):
    # SavedModel signatures are called directly; no training flag at inference
    return model(img)
def preprocess(frame):
    x = tf.image.resize(frame, (640, 640)) / 255.0
    return tf.expand_dims(x, 0)
Semantic/instance segmentation (manufacturing defect detection)
- Models: DeepLabV3+, Mask R-CNN; export to TFLite/EdgeTPU when possible
- Metrics: mIoU, PQ; monitor per-class performance
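For reference, a minimal mIoU computation over integer class masks (predictions and ground truth of identical shape) can be sketched as:
# mIoU sketch: pred and gt are integer class masks of the same shape
import numpy as np
def mean_iou(pred, gt, num_classes):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0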
OCR and document understanding (back-office automation)
- Pipeline: OpenCV binarization → text detection (DB/EAST) → recognition (CRNN/Transformer)
- Use layout models (LayoutLMv3/DocTr) for forms/tables; validate with business rules
# OpenCV pre-processing for OCR
import cv2
def preprocess_for_ocr(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                cv2.THRESH_BINARY, 11, 2)
    den = cv2.fastNlMeansDenoising(thr)
    return den
Tracking (people/vehicle)
- Detector + tracker (DeepSORT/ByteTrack); re-identification for handoff across cameras
- Respect privacy: anonymize faces/plates; sampling and on-device processing
Embeddings and retrieval
- Visual search: CLIP/ViT embeddings → vector DB (Qdrant/Pinecone)
- Multi-modal RAG: combine vision embeddings with text for product support or inventory search
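As a rough sketch of the embedding step, the Hugging Face CLIP checkpoint below is one option; the model name and in-memory cosine search are illustrative, and a vector DB (Qdrant/Pinecone) would store the same L2-normalized vectors:
# Hedged sketch: CLIP image embeddings + in-memory cosine search
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
def embed(images):
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return (feats / feats.norm(dim=-1, keepdim=True)).numpy()  # normalize for cosine
index = embed([Image.open(p) for p in catalog_paths])  # catalog_paths: your image list (assumption)
query = embed([Image.open("query.jpg")])
scores = index @ query.T                                # cosine similarity
top5 = np.argsort(-scores[:, 0])[:5]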
Deployment patterns
- Edge: TFLite/EdgeTPU, quantization, half-precision, fused ops
- Cloud: GPU autoscaling, batching (Triton), A/B for models
- Hybrid: run pre/post on edge, heavy model in cloud; cache results
MLOps considerations
- Dataset versioning; synthetic data augmentation
- Drift monitoring (brightness, blur, class frequency); periodic re-labeling
- Cost: pre-filter frames with classical CV; gate DL inference by motion
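A minimal motion gate with OpenCV frame differencing might look like this (thresholds are illustrative and should be tuned per camera):
# Motion gate sketch: only run the detector when enough pixels changed
import cv2
def should_infer(prev_gray, frame_bgr, min_changed_ratio=0.01):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, gray)
    changed = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)[1]
    ratio = cv2.countNonZero(changed) / changed.size
    return ratio > min_changed_ratio, gray  # return gray for the next call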
Security and privacy
- On-device redaction (blur/anonymize) before upload
- Access controls for video data; retention policies; audit logs
FAQ
Q: When is OpenCV alone sufficient?
A: When tasks are geometric or threshold-based (barcode, simple alignment, morphology). Use DL when variability is high.
Executive Summary
This production-focused guide covers end-to-end Computer Vision (CV) systems in 2025: OpenCV pipelines, modern deep models (detection, segmentation, tracking, OCR), dataset tooling, training/evaluation, and deployment to edge and cloud with monitoring, cost control, and governance.
Image I/O and Preprocessing (OpenCV)
import cv2
img = cv2.imread('image.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img, (640, 640))
blur = cv2.GaussianBlur(resized, (5,5), 0)
norm = (blur/255.0).astype('float32')
Augmentation
import numpy as np
def rand_flip(img):
    return cv2.flip(img, 1) if np.random.rand() < 0.5 else img
Classical Methods
Edges and Contours
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Morphology
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilated = cv2.dilate(edges, kernel, iterations=1)
Feature Extraction (ORB)
orb = cv2.ORB_create(nfeatures=1000)
kp, des = orb.detectAndCompute(img, None)
Object Detection (YOLO/SSD)
# Ultralytics YOLOv8 (example)
from ultralytics import YOLO
model = YOLO('yolov8n.pt')
results = model('image.jpg')
boxes = results[0].boxes.xyxy.cpu().numpy()
SSD in TensorFlow (sketch)
import tensorflow as tf
inputs = tf.keras.Input(shape=(300,300,3))
# ... SSD backbone and heads ...
model = tf.keras.Model(inputs, outputs)
Segmentation (U-Net/DeepLab)
import tensorflow as tf
def unet(input_shape=(256,256,3), num_classes=1):
    inputs = tf.keras.Input(input_shape)
    c1 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(inputs)
    c1 = tf.keras.layers.Conv2D(64, 3, activation='relu', padding='same')(c1)
    p1 = tf.keras.layers.MaxPool2D()(c1)
    # ... more blocks ...
    outputs = tf.keras.layers.Conv2D(num_classes, 1, activation='sigmoid')(c1)
    return tf.keras.Model(inputs, outputs)
Tracking (KCF/CSRT/ByteTrack)
tracker = cv2.legacy.TrackerCSRT_create()
tracker.init(img, (x, y, w, h))
ok, box = tracker.update(next_frame)
# ByteTrack (pseudo)
# 1) run detector → detections
# 2) match with motion model → tracks
OCR (Tesseract/TrOCR)
tesseract image.jpg out --oem 1 --psm 6
# TrOCR via transformers
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
proc = TrOCRProcessor.from_pretrained('microsoft/trocr-base-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-base-printed')
TensorFlow/Keras CNNs
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Input((224,224,3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
PyTorch Models
import torch, torch.nn as nn
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(64, 10)
    def forward(self, x):
        x = self.net(x).view(x.size(0), -1)
        return self.fc(x)
Training Loop
model = SmallCNN(); opt = torch.optim.Adam(model.parameters(), 1e-3)
for x, y in loader:
    yhat = model(x)
    loss = nn.CrossEntropyLoss()(yhat, y)
    opt.zero_grad(); loss.backward(); opt.step()
Datasets and Data Loaders
from torch.utils.data import Dataset, DataLoader
class Images(Dataset):
    def __init__(self, paths): self.paths = paths
    def __len__(self): return len(self.paths)
    def __getitem__(self, i):
        img = cv2.imread(self.paths[i]); img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224)); img = img.transpose(2, 0, 1) / 255.0
        return torch.tensor(img, dtype=torch.float32), 0
loader = DataLoader(Images(paths), batch_size=32, shuffle=True)
Evaluation Metrics (IoU, mAP)
import numpy as np
def iou(boxA, boxB):
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0, xB - xA) * max(0, yB - yA)
    a = (boxA[2]-boxA[0]) * (boxA[3]-boxA[1]); b = (boxB[2]-boxB[0]) * (boxB[3]-boxB[1])
    return inter / (a + b - inter + 1e-9)
# mAP sketch: compute precision-recall per class and average AP
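To make that sketch concrete, a hedged all-point AP for a single class could look like the following; scores, is_tp, and num_gt come from your own detection-to-ground-truth matching, and COCO-style mAP additionally averages over classes and IoU thresholds:
import numpy as np
def average_precision(scores, is_tp, num_gt):
    order = np.argsort(scores)[::-1]          # sort detections by confidence
    tps = np.array(is_tp, dtype=bool)[order]
    tp, fp = np.cumsum(tps), np.cumsum(~tps)
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    # pad, enforce monotonically decreasing precision, integrate over recall
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))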
Deployment: ONNX/TensorRT/TFLite
import onnxruntime as ort
sess = ort.InferenceSession('model.onnx', providers=['CUDAExecutionProvider','CPUExecutionProvider'])
outputs = sess.run(None, { 'input': input_array })
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tl = converter.convert()
KServe/Triton Serving
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata: { name: vision, namespace: ml }
spec:
  predictor:
    triton:
      storageUri: s3://bucket/models/vision
      runtimeVersion: "23.09"
      resources: { limits: { nvidia.com/gpu: 1 } }
REST/gRPC APIs
from fastapi import FastAPI, UploadFile
import numpy as np
import cv2
app = FastAPI()
@app.post('/predict')
async def predict(file: UploadFile):
    img = cv2.imdecode(np.frombuffer(await file.read(), np.uint8), cv2.IMREAD_COLOR)
    # preprocess → run model → postprocess
    return {'boxes': [], 'masks': []}
Monitoring (Prometheus/OTEL)
import client from 'prom-client'
const latency = new client.Histogram({ name: 'cv_latency_seconds', help: 'latency', buckets: [0.01,0.05,0.1,0.2,0.5,1] })
span.setAttributes({ 'model': 'yolov8n', 'res': '640x640', 'ttft_ms': 42 })
Dashboards and Alerts
histogram_quantile(0.95, sum(rate(cv_latency_seconds_bucket[5m])) by (le))
groups:
  - name: cv
    rules:
      - alert: HighLatency
        expr: histogram_quantile(0.95, sum(rate(cv_latency_seconds_bucket[5m])) by (le)) > 0.3
        for: 10m
        labels: { severity: page }
Security and Privacy
# Face blurring
faces = detector.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in faces:
    roi = img[y:y+h, x:x+w]
    img[y:y+h, x:x+w] = cv2.GaussianBlur(roi, (51, 51), 0)
MLOps for CV (Airflow/Dagster)
# Airflow DAG for data refresh → train → eval → deploy
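A minimal Airflow sketch of that DAG might look like this; the Python callables are placeholders for your own pipeline steps:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator
def refresh_data(): ...
def train(): ...
def evaluate(): ...
def deploy(): ...
with DAG(dag_id="cv_retrain", start_date=datetime(2025, 1, 1),
         schedule_interval="@weekly", catchup=False) as dag:
    t1 = PythonOperator(task_id="refresh_data", python_callable=refresh_data)
    t2 = PythonOperator(task_id="train", python_callable=train)
    t3 = PythonOperator(task_id="evaluate", python_callable=evaluate)
    t4 = PythonOperator(task_id="deploy", python_callable=deploy)
    t1 >> t2 >> t3 >> t4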
Call to Action
Need help building and deploying CV systems? We design pipelines, train models, and ship production-ready services with monitoring and privacy.
Extended FAQ (1–120)
- How to improve detection speed? Smaller models, lower input resolution, TensorRT.
- Best augmentation for detection? Random crop/flip, mosaic, color jitter.
- How to annotate fast? Use tools (CVAT, Label Studio); keyboard shortcuts.
- IoU threshold for mAP? Common: 0.5 and 0.5:0.95.
- When to use segmentation? Precise shape/area; instance vs semantic.
- Trackers choice? CSRT for accuracy; KCF for speed; ByteTrack for SOTA with detections.
- Batch size tuning? Max without OOM; watch p95.
- Edge device constraints? Quantize; prune; smaller inputs.
- How to handle blur/low light? Denoise, gamma correction, fine-tune on similar data.
- Camera calibration? Use chessboard patterns; compute intrinsics.
More practical questions on datasets, training, evaluation, deployment, monitoring, and privacy are covered in the extended FAQ sections below.
Datasets and Labeling
# COCO format directory structure
train/
  images/
  annotations/instances_train.json
val/
  images/
  annotations/instances_val.json
# Label Studio quickstart
docker run -it -p 8080:8080 heartexlabs/label-studio:latest
# CVAT (server)
docker compose up -d
Augmentation with Albumentations
import albumentations as A
from albumentations.pytorch import ToTensorV2
train_tfms = A.Compose([
    A.RandomResizedCrop(640, 640, scale=(0.6, 1.0)),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(0.2, 0.2, 0.2, 0.1, p=0.5),
    A.MotionBlur(p=0.2),
    A.Normalize(),
    ToTensorV2()
], bbox_params=A.BboxParams(format='yolo', label_fields=['labels']))
tf.data Pipelines
import tensorflow as tf
def parse(example):
    img = tf.io.read_file(example['image_path'])
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (640, 640))
    img = tf.cast(img, tf.float32) / 255.0
    return img, example['labels']
ds = tf.data.Dataset.from_tensor_slices(records).map(parse, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(tf.data.AUTOTUNE)
PyTorch Lightning Training (Detection)
import pytorch_lightning as pl
class DetModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__(); self.model = model
    def training_step(self, batch, _):
        x, y = batch; out = self.model(x); loss = out['loss']
        self.log('train/loss', loss); return loss
    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), 1e-4)
Segmentation Training (Lightning)
class SegModule(pl.LightningModule):
    def __init__(self, net): super().__init__(); self.net = net
    def training_step(self, batch, _):
        x, y = batch; yhat = self.net(x); loss = dice_bce_loss(yhat, y)
        self.log('train/loss', loss); return loss
mAP Computation over Dataset
# COCO mAP via pycocotools
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
cocoGt = COCO('instances_val.json')
cocoDt = cocoGt.loadRes('results.json')
eval = COCOeval(cocoGt, cocoDt, 'bbox'); eval.evaluate(); eval.accumulate(); eval.summarize()
Benchmarking Scripts
import time
import numpy as np
lat = []
for img in batch_images:
    t0 = time.time(); model(img); lat.append(time.time() - t0)
print('p95', np.percentile(lat, 95))
Video Ingestion (GStreamer)
gst-launch-1.0 filesrc location=input.mp4 ! decodebin ! videoconvert ! videoscale ! video/x-raw,width=640,height=640 ! appsink
Multi-Object Tracking: SORT/DeepSORT/ByteTrack
# SORT skeleton
tracks = []
for frame in frames:
    dets = detect(frame)
    tracks = associate(tracks, dets)  # IOU matching + Kalman predict/update
# DeepSORT uses appearance embeddings for robust matching
# ByteTrack — keep low-score dets for association; improves recall
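To make the associate() step concrete, a hedged greedy IoU association is sketched below; it assumes tracks and detections are dicts with a 'box' key in xyxy format, and omits Kalman predict/update and track lifecycle handling:
def iou_xyxy(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / (area + 1e-9)
def greedy_associate(tracks, dets, iou_thr=0.3):
    matches, unmatched = [], list(range(len(dets)))
    for ti, trk in enumerate(tracks):
        best, best_iou = None, iou_thr
        for di in unmatched:
            v = iou_xyxy(trk["box"], dets[di]["box"])
            if v > best_iou:
                best, best_iou = di, v
        if best is not None:
            matches.append((ti, best)); unmatched.remove(best)
    return matches, unmatched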
Export and Quantization
# ONNX dynamic quantization (post-training)
from onnxruntime.quantization import quantize_dynamic
quantize_dynamic('model.onnx', 'model.int8.onnx', optimize_model=True)
# Post-Training Quantization (PTQ) for TFLite
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_ds
int8 = converter.convert()
TensorRT Pipelines
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16 --workspace=4096 --shapes=input:1x3x640x640
Advanced KServe: Explainer and Transformer
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata: { name: vision, namespace: ml }
spec:
  predictor:
    triton: { storageUri: s3://models/vision }
  explainer:
    alibi: { type: AnchorImages }
  transformer:
    containers:
      - image: registry/vision-preprocess:1.0
Airflow/Dagster Pipelines
# Airflow DAG: ingest → augment → train → eval → deploy
# Dagster job: retrain on data drift
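A minimal Dagster job sketch for the retrain path could look like this; the op bodies are placeholders for your own steps:
from dagster import job, op
@op
def ingest():
    return "dataset-ref"
@op
def train(dataset):
    return "model-ref"
@op
def evaluate(model):
    return {"map": 0.42}
@op
def deploy(metrics):
    pass
@job
def retrain_on_drift():
    deploy(evaluate(train(ingest())))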
Helm Values / Terraform Infra
# values.yaml (gpu deployment)
resources:
  limits: { nvidia.com/gpu: 1, cpu: 2, memory: 8Gi }
  requests: { cpu: 1, memory: 4Gi }
nodeSelector: { gpu: "true" }
resource "aws_eks_node_group" "gpu" {
  instance_types = ["g5.xlarge"]
  scaling_config {
    desired_size = 2
    max_size     = 6
    min_size     = 1
  }
}
Grafana Dashboards / PromQL
# p95 latency
histogram_quantile(0.95, sum(rate(cv_latency_seconds_bucket[5m])) by (le))
# FPS by model
sum by (model) (rate(frames_processed_total[1m]))
Alerting and Runbooks
groups:
  - name: cv-ops
    rules:
      - alert: FrameDrop
        expr: rate(frames_processed_total[1m]) < 10
        for: 10m
        labels: { severity: ticket }
Runbook: FrameDrop
- Check input pipeline (GStreamer)
- Verify GPU utilization and batch size
- Restart transformer pod; warm cache
Extended FAQ (121–260)
- mAP vs F1? Use mAP for detection; F1 for simple classification.
- Label imbalance? Class-balanced sampling; loss weighting.
- Long-tailed classes? Focal loss; re-sampling; fine-tuning.
- Video vs image performance? Batch across frames; reuse pre-processing.
- NMS tuning? IoU threshold and score threshold sweeps.
- Small object detection? Higher input res; anchor tuning; specialized models.
- Panoptic segmentation? Combine instance + semantic outputs.
- Tracking drift? Periodic re-detection; appearance features.
- OCR accuracy? Binarization, deskewing, language models.
- GPU memory OOM? Smaller batch; FP16; gradient checkpointing.
- Dataset versioning? DVC/LakeFS; record hashes.
- Synthetic data? Good for rare classes; label clearly.
- Augmentations too strong? Ablation study; dial back.
- Calibration? Temperature scaling per class.
- Edge camera streams? RTSP; pre-process on device.
- Privacy laws? Blur faces/plates; consent.
- Model zoo sprawl? Registry and owners.
- FPS targets? Define by route; test p95.
- Profiling? Nsight Systems, PyTorch profiler.
- Canary deploys? Shadow first; then small %.
- Data drift detection? KS test on features; alert.
- Retraining cadence? Monthly or on drift.
- Transform latency? Optimize I/O and color conversions.
- Normalize color spaces? Consistent BGR/RGB handling.
- Metadata in outputs? Include confidence and class ids.
- Batch inference? Yes, on GPU for throughput.
- PTQ vs QAT? QAT better accuracy; PTQ faster to ship.
- TensorRT DLA? Use if available for offloading.
- Mixed precision? FP16; validate accuracy.
- Post-processing CPU bound? Vectorize; C++/Rust kernels.
Pose Estimation (MediaPipe/OpenPose)
# MediaPipe Pose
import cv2, mediapipe as mp
mp_pose = mp.solutions.pose
with mp_pose.Pose(static_image_mode=False, model_complexity=1, enable_segmentation=False) as pose:
    cap = cv2.VideoCapture(0)
    while True:
        ok, f = cap.read()
        if not ok: break
        f_rgb = cv2.cvtColor(f, cv2.COLOR_BGR2RGB)
        res = pose.process(f_rgb)
        if res.pose_landmarks:
            for lm in res.pose_landmarks.landmark:
                x, y = int(lm.x * f.shape[1]), int(lm.y * f.shape[0])
                cv2.circle(f, (x, y), 2, (0, 255, 0), -1)
        cv2.imshow('pose', f)
        if cv2.waitKey(1) == 27: break
Keypoint Detection (HRNet)
# pseudo: HRNet inference for keypoints
def infer_keypoints(img):
    tensor = preprocess(img)
    heatmaps = hrnet(tensor)
    kpts = decode_peaks(heatmaps)
    return kpts
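One possible decode_peaks, assuming heatmaps arrive as a NumPy array shaped (num_keypoints, H, W), is a simple per-channel argmax scaled back to the network input resolution:
import numpy as np
def decode_peaks(heatmaps, input_size=(256, 192)):
    K, H, W = heatmaps.shape
    kpts = []
    for k in range(K):
        idx = np.argmax(heatmaps[k])
        y, x = divmod(idx, W)
        conf = float(heatmaps[k, y, x])
        # scale heatmap coordinates back to the network input resolution (H, W)
        kpts.append((x * input_size[1] / W, y * input_size[0] / H, conf))
    return kpts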
Re-Identification (ReID)
# extract embeddings and compare via cosine similarity
emb1 = reid_model(crop1); emb2 = reid_model(crop2)
sim = (emb1 @ emb2.T) / (np.linalg.norm(emb1)*np.linalg.norm(emb2))
Multi-Camera Tracking
# synchronize timestamps, project to common plane, fuse tracks by reID + geometry
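A hedged sketch of the fusion step: per-camera homographies (assumed pre-computed from calibration) map track footpoints onto a shared ground plane, and tracks are merged when they are close in that plane and similar in appearance:
import cv2
import numpy as np
def to_ground(points_xy, H_cam):
    # H_cam: 3x3 homography from image footpoints to the shared ground plane
    pts = np.asarray(points_xy, dtype=np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H_cam).reshape(-1, 2)
def fuse(track_a, track_b, H_a, H_b, max_dist_m=1.0, min_reid_sim=0.5):
    ga = to_ground([track_a["foot_xy"]], H_a)[0]
    gb = to_ground([track_b["foot_xy"]], H_b)[0]
    dist = np.linalg.norm(ga - gb)
    sim = float(track_a["emb"] @ track_b["emb"])  # assumes L2-normalized reID embeddings
    return dist < max_dist_m and sim > min_reid_sim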
3D Vision: Stereo Depth, SfM, COLMAP
# COLMAP pipeline
echo "feature_extractor" && colmap feature_extractor --database_path db.db --image_path images
colmap exhaustive_matcher --database_path db.db
mkdir -p sparse && colmap mapper --database_path db.db --image_path images --output_path sparse
colmap image_undistorter --image_path images --input_path sparse/0 --output_path dense --output_type COLMAP
colmap patch_match_stereo --workspace_path dense --PatchMatchStereo.geom_consistency true
colmap stereo_fusion --workspace_path dense --output_path dense/fused.ply
Camera Calibration and Rectification
import cv2, numpy as np
objp = np.zeros((6*7,3), np.float32); objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
objpoints, imgpoints = [], []
for fname in images:
    img = cv2.imread(fname); gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (7,6), None)
    if ret:
        objpoints.append(objp); imgpoints.append(corners)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
# rectify stereo
R1,R2,P1,P2,Q,roi1,roi2 = cv2.stereoRectify(K1,D1,K2,D2,image_size,R,T)
Geometric Vision (PnP/Essential)
# PnP
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
# Essential matrix
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
Video Pipelines (FFmpeg/GStreamer)
ffmpeg -i input.mp4 -vf scale=640:-1 -r 30 -c:v libx264 -preset fast -crf 22 out.mp4
gst-launch-1.0 rtspsrc location=rtsp://camera ! rtph264depay ! avdec_h264 ! videoconvert ! appsink
Edge Deployment: NVIDIA Jetson
# Dockerfile.jetson
FROM nvcr.io/nvidia/l4t-ml:r35.3.1-py3
RUN python3 -m pip install --no-cache-dir onnxruntime-gpu==1.17.0 opencv-python-headless
COPY app.py /app/app.py
CMD ["python3","/app/app.py"]
# TensorRT conversion via trtexec
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16 --workspace=4096
Mobile: TFLite/NNAPI/CoreML
# Android NNAPI delegate
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path='model.tflite', experimental_delegates=[tf.lite.experimental.load_delegate('libnnapi_delegate.so')])
// iOS CoreML
let model = try! MyVisionModel(configuration: MLModelConfiguration())
Web: onnxruntime-web/WebGL
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
<script>
(async () => {
  const session = await ort.InferenceSession.create('/model.onnx', { executionProviders: ['webgl'] })
  const input = new ort.Tensor('float32', new Float32Array(3*224*224), [1,3,224,224])
  const out = await session.run({ input })
  console.log(out)
})()
</script>
Synthetic Data (Blender)
# blender_python.py (run with blender --python blender_python.py)
import bpy, random
# Load scene, randomize lights/materials/camera; render dataset
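A minimal bpy sketch under those assumptions (an already-assembled scene with the default 'Camera' object; the camera ranges and output paths are illustrative):
# Randomize the camera and render N frames from an existing scene
import bpy, random
cam = bpy.data.objects["Camera"]
for i in range(100):
    cam.location = (random.uniform(-2, 2), random.uniform(-4, -2), random.uniform(1, 2))
    bpy.context.scene.render.filepath = f"/tmp/render_{i:04d}.png"
    bpy.ops.render.render(write_still=True)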
Dataset QA/Curation
import json
bad = []
for ln in open('labels.jsonl'):
    o = json.loads(ln)
    if any(c for c in o['boxes'] if c['x2'] <= c['x1'] or c['y2'] <= c['y1']):
        bad.append(o['id'])
print('invalid boxes', len(bad))
Hyperparameter Sweeps
for lr in [1e-4, 2e-4, 5e-4]:
    for bs in [16, 32]:
        run = train(lr=lr, batch_size=bs)
        log({'lr': lr, 'bs': bs, 'mAP': run.map})
Distributed Training (DDP)
python -m torch.distributed.run --nproc_per_node=4 train.py --epochs 50 --batch 16
# train.py (DDP init)
import torch.distributed as dist
dist.init_process_group(backend='nccl')
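A hedged continuation of the DDP setup, showing per-process device selection, model wrapping, and a distributed sampler (SmallCNN is reused from above; dataset stands in for your own Dataset):
import os
import torch
local_rank = int(os.environ["LOCAL_RANK"])      # set by torch.distributed.run
torch.cuda.set_device(local_rank)
model = SmallCNN().cuda(local_rank)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
sampler = torch.utils.data.distributed.DistributedSampler(dataset)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, sampler=sampler)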
Mixed Precision
scaler = torch.cuda.amp.GradScaler()
for x, y in loader:
    with torch.cuda.amp.autocast():
        yhat = model(x); loss = criterion(yhat, y)
    scaler.scale(loss).backward(); scaler.step(opt); scaler.update(); opt.zero_grad()
Detectron2/MMDetection
# Detectron2 training
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
cfg = get_cfg(); cfg.merge_from_file('configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml')
cfg.DATASETS.TRAIN = ('my_train',); cfg.DATASETS.TEST = ('my_val',)
trainer = DefaultTrainer(cfg); trainer.resume_or_load(resume=False); trainer.train()
# MMDetection config snippet
model = dict(type='FasterRCNN', backbone=dict(type='ResNet', depth=50))
DeepLabv3+ / Mask R-CNN Training
# DeepLabv3+ in PyTorch
import torchvision
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=21)
KServe Transformer/Explainer
# transformer.py
from kserve import Model, ModelServer
class PrePost(Model):
    def __init__(self, name: str): super().__init__(name)
    async def preprocess(self, payload: dict):
        # decode base64 images
        return payload
    async def postprocess(self, infer_output: dict):
        # NMS, thresholding, formatting
        return infer_output
ModelServer().start(models=[PrePost('vision')])
# explainer (alibi anchor image)
gRPC Client
import grpc
# create stub, send image bytes, receive detections
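A hedged client sketch; vision_pb2, vision_pb2_grpc, and the Detect RPC are hypothetical stubs generated from your own .proto, and the endpoint is illustrative:
import grpc
import vision_pb2, vision_pb2_grpc  # hypothetical generated modules
channel = grpc.insecure_channel("inference:8001")
stub = vision_pb2_grpc.DetectorStub(channel)
with open("image.jpg", "rb") as f:
    resp = stub.Detect(vision_pb2.DetectRequest(image=f.read()), timeout=1.0)
print(resp.detections)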
Observability (OTEL Spans/Metrics)
span.addEvent('preprocess', { ms: 5 })
span.addEvent('inference', { ms: 18, provider: 'triton' })
span.addEvent('postprocess', { ms: 7 })
const fps = new client.Gauge({ name: 'cv_fps', help: 'frames/sec', labelNames: ['model'] })
Cost Models (GPU Hours)
model,resolution,batch,throughput_fps,gpu,usd_per_hour,usd_per_million_frames
yolov8n,640,1,45,RTXA5000,2.30,51.1
SRE Runbooks (Expanded)
Latency Spike
- Verify input stream and decode
- Check GPU utilization and thermals
- Reduce resolution; switch to FP16 engine
Accuracy Drop
- Inspect data drift; re-run eval; rollback engine
Extended FAQ (261–420)
- Batch across cameras? Yes: micro-batching per tick; beware re-ordering.
- Memory fragmentation? Pre-allocate buffers; reuse tensors.
- JPEG decode cost? NVJPEG or hardware decoders.
- Camera sync? PTP/NTP and timestamp alignment.
- Tracker ID switches? Use reID embeddings and smoothing.
- PTZ cameras? Re-detect on big motions; re-calibrate.
- Fog/rain? Train on adverse conditions; dehaze filters.
- Thermal cameras? Different preprocessing; normalize ranges.
- On-device privacy? Blur before transmit; store hashes only.
- Long videos? Chunk processing; checkpoint.
- Multi-stream on Jetson? DeepStream pipelines.
- Quantization pitfalls? Calibrate with representative data.
- Segmentation holes? Morphology close; CRF postprocess.
- OCR multilingual? Load language packs; fallback models.
- Edge storage? Circular buffers; retention policies.
- Model drift alerts? IoU drop; false positive rates.
- Web streaming? WebRTC for low latency.
- CDN for models? Versioned artifacts; cache.
- Security hardening? Non-root, read-only FS, egress allowlists.
- Offline inference? Queue results; sync later.
- Class imbalance? Focal loss; weighted sampling.
- Latency budget? Split preprocess/infer/post; profile.
- CPU fallback? Yes, for small models; warn users.
- Mixed resolutions? Resize consistently; pad letterbox.
- DDP pitfalls? Sync BN; gradient accumulation.
- Heatmaps visualization? Color maps; overlays.
- Fine-tuning schedule? Lower LR; freeze backbone initially.
- Export sanity? Run parity tests ONNX vs native.
- Runtime choices? ORT/TensorRT/OpenVINO; benchmark.
- Camera dropouts? Timeouts and reconnection logic.
- GStreamer caps? Match formats to avoid conversions.
- Time to first frame? Warm engines; cache.
- Annotation drift? Periodic QA; annotator training.
- Bounding box encoding? Consistent formats; convert utilities.
- Model registry? Tags, owners, changelogs.
- Canary gates? mAP >= baseline-1%, latency p95 < 350ms.
- False positives? Hard negative mining.
- False negatives? Augment; adjust thresholds.
- Confusion classes? Merge or separate with more data.
- When done? Stable SLOs; incidents trending down.
Face Recognition (ArcFace/FaceNet) Embeddings
# Face embedding extraction (FaceNet-like)
import cv2
import numpy as np
import torch
from PIL import Image
from torchvision import transforms
# facenet: a pretrained face-embedding network assumed to be loaded elsewhere
pre = transforms.Compose([transforms.Resize((160,160)), transforms.ToTensor(), transforms.Normalize([0.5]*3, [0.5]*3)])
def embed_face(img_bgr):
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    ten = pre(Image.fromarray(img)).unsqueeze(0).cuda()
    with torch.no_grad():
        vec = facenet(ten)
    return torch.nn.functional.normalize(vec, dim=1).cpu().numpy()
# Verify identity by cosine similarity
sim = (e1 @ e2.T) / (np.linalg.norm(e1) * np.linalg.norm(e2))
if sim > 0.6: print('match')
Attribute Classification (Age/Gender/Helmet/PPE)
# multi-head classifier
class AttrNet(nn.Module):
    def __init__(self, backbone):
        super().__init__(); self.backbone = backbone
        self.head_gender = nn.Linear(512, 2)
        self.head_helmet = nn.Linear(512, 2)
    def forward(self, x):
        f = self.backbone(x)
        return {'gender': self.head_gender(f), 'helmet': self.head_helmet(f)}
Industrial Inspection Pipelines (Defect Detection)
# classical: background subtraction + morphology + contour area thresholds
# deep: segmentation of defects (U-Net), report percentages per ROI
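A classical sketch of that pipeline, differencing against a golden reference image (thresholds are illustrative and should be tuned per part/ROI):
import cv2
def find_defects(img_bgr, reference_bgr, min_area=50):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    ref = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, ref)
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]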
Anomaly Detection with Autoencoders
class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3,16,3,2,1), nn.ReLU(), nn.Conv2d(16,32,3,2,1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32,16,4,2,1), nn.ReLU(), nn.ConvTranspose2d(16,3,4,2,1), nn.Sigmoid())
    def forward(self, x):
        z = self.enc(x); return self.dec(z)
# score = reconstruction error
with torch.no_grad():
    y = ae(x)
    score = ((x - y)**2).mean(dim=[1,2,3])
Active Learning Loops
# uncertainty sampling: pick low-confidence detections for manual labeling
uncertain = [sample for sample in pool if max(model.predict_proba(sample)) < 0.6]
label_queue.extend(uncertain[:100])
Loop: infer → select uncertain → label → retrain → eval → deploy
Semi-Supervised / Self-Training
# pseudo-labels with confidence threshold
y_hat = model(x_unlabeled)
conf = y_hat.max(1).values
mask = conf > 0.9
train_ds.extend(list(zip(x_unlabeled[mask], y_hat[mask].argmax(1))))
Online Evaluation and Probes
// synthetic probes to verify pipeline health
cron.schedule('*/10 * * * *', async () => {
  const img = await fetchSample()
  const t0 = Date.now(); const out = await predict(img)
  metrics.observe('cv_latency_seconds', (Date.now() - t0) / 1000)
  metrics.inc('cv_success_total')
})
CI/CD for CV Models
name: cv-ci
on: [push]
jobs:
  train-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.10' }
      - run: pip install -r requirements.txt
      - run: python tools/export_onnx.py --weights runs/best.pt --out model.onnx
      - run: python tools/eval_map.py --ann val.json --pred results.json --gate map>=0.40
      - uses: actions/upload-artifact@v4
        with: { name: model, path: model.onnx }
  deploy:
    needs: train-eval
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with: { name: model, path: model }
      - run: helm upgrade --install vision charts/vision -f values/prod.yaml --wait
Edge Fleet Management (Jetson OTA)
# Mender/OTA example (concept)
mender-artifact write module-image -T docker -n vision-1.2.0 -t jetson-xavier -o vision.mender -f vision.tar
- Tag devices by site and model version
- Roll out in waves; auto-rollback on failure
Privacy and Compliance SOPs (CV)
- Blur faces and license plates by default in public contexts
- Store only derived features (hashes) when possible
- Retention: raw frames < 24h unless legally required
- Access controls and immutable audit trails
ROI and Cost Modeling
scenario,cameras,resolution,fps,gpu_nodes,cost_usd_month
warehouse,120,1280x720,15,4,8200
retail,40,1920x1080,30,3,6100
- Optimize by lowering resolution/fps where acceptable
- Batch inference and share GPUs across streams
Extended Runbooks
Tracker Instability
- Increase detection frequency; adjust IOU thresholds
- Enable reID embeddings; smooth trajectories
OCR Errors
- Improve binarization/deskew; language models; whitelist fonts
Extended FAQ (421–520)
- Calibration for detectors? Temperature scaling on logits; per class.
- Drastic lighting changes? Auto exposure and training on varied conditions.
- Rolling shutter artifacts? Use global shutter cameras if critical.
- Label noise? Consensus labeling; noise-robust losses.
- Cross-domain transfer? Fine-tune on small target set; domain adaptation.
- Multi-label classification? Sigmoid outputs; threshold per class.
- Coordinate systems? Normalize to image size; document.
- Serialization format? COCO JSON/TFRecord for scalability.
- Post-processing speed? Vectorize NMS; CUDA kernels if needed.
- GPU watchdog timeouts? Split batches; check long kernels.
- Gaps in video? Interpolate; flag missing for SRE.
- Snow/rain occlusion? Augment and use deweathering nets.
- Edge vs cloud trade-offs? Latency/privacy vs scale/flexibility.
- Night vision? IR lights; specialized models.
- Pose skeleton smoothing? Temporal filters; Kalman.
- Upscaling? ESRGAN; cost vs benefit.
- 3D reconstruction scale? Need known baseline or scale constraints.
- Camera vignetting? Calibrate and correct.
- Rolling code updates? Feature flags and staged rollouts.
- Dataset sprawl? Registry with hashes and owners.
- Target FPS KPI? Define per route; validate.
- Long-term storage? Compressed metadata and event clips.
- Blur performance? GPU filters and ROI-only blurs.
- Object sizes? Multi-scale anchors; FPN backbones.
- Drone footage? High-motion augmentations; stabilization.
- Water reflections? Polarizers; training data variety.
- Latency telemetry? TTFT and per-stage metrics.
- Hash-based tracking? Visual hashing; caution on collisions.
- License compliance? Track models/datasets licenses.
- Final check? Healthy SLOs, costs in bounds, privacy enforced.