Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #content-moderation 10
- #llm-safety 9
- #production 5
- #classifier 2
- #image-moderation 2
- #llama-guard 2
- #meta 2
- #safety-classifier 2
- #accuracy 1
- #ai-safety 1
- #api-review 1
- #architecture 1
- #benchmark 1
- #conversation-control 1
- #cost 1
- #ensemble 1
- #false-positives 1
- #google-jigsaw 1
- #guardrails 1
- #multimodal 1
- #nemo-guardrails 1
- #nvidia 1
- #openai-moderation 1
- #ops 1
- #perspective-api 1
- #prompt-injection 1
- #rag 1
- #retrieval-augmented-generation 1
- #text-moderation 1
- #toxicity-detection 1
- #trust-and-safety 1
- #user-experience 1
- #video-moderation 1
Categories
ops 4 posts
- Fine-Tuned Classifiers vs. Off-the-Shelf Moderation APIs: Cost & Tradeoffs
  Off-the-shelf moderation APIs are cheap to start and expensive to outgrow. Fine-tuned classifiers are the reverse.
- Content Moderation for RAG: The Retrieval Layer Is an Attack Path
  RAG pipelines have a moderation problem at the retrieval layer that input/output classifiers don't address. Injected content in retrieved documents can…
- Classifier Ensembles for Production Content Moderation
  Single classifiers have characteristic failure modes. Ensembles that combine models with different architectures and training distributions reduce…
- False Positive Costs in Content Moderation: How to Measure Them
  False positives in content moderation drive hidden costs: user abandonment, review-queue spend, appeal load. Learn how to quantify them and calibrate…
reviews 4 posts
- Perspective API: Good at Its Original Job, Wrong for LLM Safety
  Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong.
- OpenAI Moderation API Review: Strengths and Real Gaps
  An honest OpenAI Moderation API review: fast (~20ms) and free with credits, strong category breadth, but predictable gaps on obfuscated text, context, and…
- Llama Guard Benchmark Review: Real Performance vs. Vendor Claims
  Meta's Llama Guard series has become a default choice for open-source content moderation. Benchmarks on the standard test sets look strong.
- NeMo Guardrails in Production: What It Does Well; Where It Fails
  NVIDIA's NeMo Guardrails offers conversation-flow control that classifiers can't provide. The deployment complexity is real.
guides 2 posts
- Image & Video Content Moderation Tools (2026)
  Text moderation gets the attention, but image and video are where the hard moderation problems live. A practitioner's map of the major tools: cloud APIs…
- Llama Guard vs Llama Guard 2 vs Llama Guard 3: The Lineage, Clarified
  Meta's Llama Guard series gets cited loosely, often with the wrong base model or category count. Here's the verified lineage: base models, taxonomies…