Node.js Redaction SDK

ML-Powered PDF Redaction for Node.js — Remove PII from Any PDF

Permanent PII removal with audit trails. ML-powered detection across 20+ entity types with confidence scoring. HIPAA, GDPR, CCPA compliant.

Why PDF Redaction Is Harder Than It Looks

Finding PII is hard. Regex catches patterns like SSNs, but misses context-dependent data like names and addresses. You need ML to close that gap.

Removing it is harder. PDFs weren't built for editing — what looks like "John Smith" on screen might be scattered across multiple internal objects. Most tools just draw black boxes over text, but the original content stays in the file.

Get either side wrong and you have a compliance gap.
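The detection gap can be seen in a small sketch. The pattern and sample sentence below are made up for illustration and are not part of PDFDancer; they only show that a regex pass finds the SSN-shaped string while the name and street address go unnoticed.

```typescript
// Illustrative only: a plain regex pass over already-extracted text.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Returns every SSN-shaped substring, or an empty array when none match.
function findSsns(text: string): string[] {
  return text.match(SSN_PATTERN) ?? [];
}

const sample = "Patient John Smith (SSN 123-45-6789) lives at 12 Elm Street.";
const hits = findSsns(sample);

// The SSN is reported; "John Smith" and "12 Elm Street" are not,
// because nothing about their shape marks them as PII.
console.log(hits);
```

Names and addresses have no fixed shape, so no amount of extra patterns closes this gap reliably; that is the part left to context-aware detection.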

The Limitations

  • Pattern matching alone misses context-dependent PII like names and addresses
  • Overlay-based redaction hides text visually but doesn't remove it from the file
  • No confidence scoring — you can't tell good detections from false positives
  • No audit trail — you can't prove what was removed or when
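The difference between covering text and removing it can be sketched with a toy model. This is not the PDF file format and not PDFDancer's implementation: a page is reduced to a hypothetical list of drawing operations, and the name is deliberately split across two text operations to mimic content scattered over several internal objects.

```typescript
// Toy model, NOT the PDF format: a page is a list of drawing operations.
type Op =
  | { kind: "text"; value: string }
  | { kind: "rect"; note: string };

// What a naive text extractor would recover from the page.
function extractText(page: Op[]): string {
  let out = "";
  for (const op of page) {
    if (op.kind === "text") out += op.value;
  }
  return out;
}

// Overlay: append a black box. Every text operation is still in the page.
function overlayRedact(page: Op[]): Op[] {
  return [...page, { kind: "rect", note: "black box over the name" }];
}

// Removal: drop the text operations that carry the sensitive fragments.
function removeRedact(page: Op[], fragments: string[]): Op[] {
  return page.filter(
    (op) => op.kind !== "text" || fragments.indexOf(op.value) === -1,
  );
}

const page: Op[] = [
  { kind: "text", value: "Name: " },
  { kind: "text", value: "John " },
  { kind: "text", value: "Smith" },
];

console.log(extractText(overlayRedact(page))); // name is still extractable
console.log(extractText(removeRedact(page, ["John ", "Smith"]))); // name is gone
```

In the overlay case the extractor still returns the full name, which is the failure mode described in the list above.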

What PDFDancer Changes

  • ML-powered detection — context-aware entity recognition across 20+ PII types
  • True binary-level removal — content permanently deleted, not covered up
  • Confidence scores — filter detections by threshold to control precision vs. recall
  • Audit trails — verifiable proof of what was redacted and when

PII Redaction in Node.js

ML-powered entity detection across 20+ PII categories with confidence scoring. Filter by threshold to control precision vs. recall.
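Threshold filtering can be sketched as follows. The `Detection` shape, the entity labels, and the scores are assumptions made for illustration, not PDFDancer's actual types or output; the point is only how raising the threshold trades recall for precision.

```typescript
// Hypothetical detection shape for illustration; not PDFDancer's real types.
interface Detection {
  entityType: string;
  text: string;
  confidence: number; // assumed to be in the range 0..1
}

// Keep only detections at or above the given confidence threshold.
function filterByConfidence(
  detections: Detection[],
  threshold: number,
): Detection[] {
  return detections.filter((d) => d.confidence >= threshold);
}

const detections: Detection[] = [
  { entityType: "PERSON", text: "John Smith", confidence: 0.97 },
  { entityType: "ADDRESS", text: "12 Elm Street", confidence: 0.81 },
  { entityType: "PERSON", text: "Elm", confidence: 0.42 }, // likely false positive
];

const strict = filterByConfidence(detections, 0.9); // favors precision
const lenient = filterByConfidence(detections, 0.4); // favors recall

console.log(strict.length, lenient.length); // prints "1 3"
```

A strict threshold drops the doubtful "Elm" match but also misses the address; a lenient one catches everything at the cost of redacting a false positive.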

PDFDancer vs. Apryse, Adobe, pdf-lib

| Feature | PDFDancer | Apryse | Adobe PDF Services | pdf-lib |
| --- | --- | --- | --- | --- |
| ML-Powered PII Detection | ✓ Entity detection with confidence scores | Limited patterns | Cloud-only API | ✗ Text-only, no redaction |
| Permanent Removal | ✓ Binary-level deletion | Annotation-based | Requires separate sanitize | ✗ No redaction support |
| Audit Trail | ✓ Full logging with timestamps | Limited metadata | Per-API-call logging | ✗ No tracking |
| Express/Lambda Support | ✓ Async/await, stateless | Requires heap allocation | HTTP client required | ✓ Browser-focused |
| Self-Hosted | ✓ Yes, on-prem available | ✓ Yes (expensive) | ✗ Cloud-only | ✓ Yes (no redaction) |
| Pricing | Free tier + usage-based | $10K+/year per dev | $$$ per API call | Open source (MIT) |

ML-Powered Detection Benchmarks

PDFDancer's semantic redaction engine achieves industry-leading accuracy across all PII categories. Powered by purpose-built ML, not generic text search.

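| Category | Precision | Recall | F1 Score |
| --- | --- | --- | --- |
| Person | 97.43% | 96.28% | 0.969 |
| Dates of Birth | 100.00% | 92.57% | 0.961 |
| Account Number / SSN | 85.27% | 93.93% | 0.894 |
| Addresses | 99.43% | 91.22% | 0.951 |
| Phone / Fax Numbers | 94.12% | 96.30% | 0.952 |
| Email Addresses | 99.58% | 99.98% | 0.998 |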
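For readers who want to check the F1 column, it is the harmonic mean of precision and recall. The helper below is a minimal sketch written for this page (the function name is ours, not part of any SDK); it recomputes the F1 score for each row from the published precision and recall.

```typescript
// F1 is the harmonic mean of precision and recall (both given as fractions).
function f1Score(precision: number, recall: number): number {
  if (precision + recall === 0) return 0; // avoid division by zero
  return (2 * precision * recall) / (precision + recall);
}

// Precision and recall copied from the benchmark table, as fractions.
const rows: Array<[string, number, number]> = [
  ["Person", 0.9743, 0.9628],
  ["Dates of Birth", 1.0, 0.9257],
  ["Account Number / SSN", 0.8527, 0.9393],
  ["Addresses", 0.9943, 0.9122],
  ["Phone / Fax Numbers", 0.9412, 0.963],
  ["Email Addresses", 0.9958, 0.9998],
];

for (const [category, precision, recall] of rows) {
  // Rounded to three decimals, matching the table's F1 column.
  console.log(category, f1Score(precision, recall).toFixed(3));
}
```

Rounded to three decimals, the recomputed values agree with the table, for example 0.969 for Person and 0.894 for Account Number / SSN.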

Three Steps to Your First Redaction

1. Install the Package

npm install pdfdancer-client-typescript

Works with Node.js 14+. TypeScript and JavaScript both supported.

2. Get Your API Key

Sign up at pdfdancer.com to get a free-tier API key. Set it as an environment variable or pass it directly to PDFDancer.
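One way to support both options is a small lookup helper, sketched below. The environment variable name `PDFDANCER_API_KEY` and the helper itself are assumptions for illustration; check the PDFDancer documentation for the name the SDK actually reads.

```typescript
// Hypothetical variable name; the real SDK may expect a different one.
const API_KEY_VAR = "PDFDANCER_API_KEY";

// Prefer an explicitly passed key, then fall back to the environment.
function resolveApiKey(
  env: Record<string, string | undefined>,
  explicit?: string,
): string {
  const key = (explicit ?? env[API_KEY_VAR] ?? "").trim();
  if (key === "") {
    throw new Error(`No API key: pass one explicitly or set ${API_KEY_VAR}`);
  }
  return key;
}

// Example with a stand-in environment object.
const fromEnv = resolveApiKey({ PDFDANCER_API_KEY: "key-from-env" });
const fromArg = resolveApiKey({ PDFDANCER_API_KEY: "key-from-env" }, "key-from-arg");
console.log(fromEnv, fromArg);
```

In a Node.js service you would call `resolveApiKey(process.env)` once at startup, so a missing key fails fast instead of surfacing on the first redaction request.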

3. Run Your First Redaction

Import PDFDancer, open a PDF, configure entity detection, redact, and save. See the code examples above for ready-to-use templates.

Let’s Talk About Your Use Case

15-minute call — we’ll walk through your document pipeline and show how PDFDancer fits.