PDFDancer

Automated Redaction SDK

Accurate enough to automate.

A developer SDK for compliance-grade document redaction, powered by semantic understanding and our own local ML model.

Deploy on-prem or in the cloud.

Integrate PII detection and true binary-level redaction into your document pipeline. The engine understands context, not just patterns — so your reviewers spend their time on edge cases, not every single page.

An SDK for teams that need redaction at scale

This isn't an app your team logs into. It's infrastructure. Integrate our redaction engine into your existing document pipeline via API or on-prem deployment. You control the workflow, the thresholds, and the output. We handle the hard part: finding and removing sensitive data with precision.

Semantic Analysis

Understands document structure and context, not just keyword matching. Handles invisible text, vector text, and text in images.

Purpose-Built ML

Trained on a massive synthetic dataset. Runs on our infrastructure — no data sent to external AI providers. Returns labeled findings with confidence scores.

True Redaction

Binary-level removal from the PDF. The underlying data isn't covered up — it's permanently eliminated from the file.

From raw document to clean output

1. Ingest

Feed in PDFs: scanned, digital, or mixed.

2. OCR

Extract text from scanned pages, images, and non-standard text layers.

3. Analyze

Semantic engine parses document structure, context, and entity relationships.

4. Classify

ML engine labels every detected entity with a confidence score.

5. You Decide

Your logic sets the rules: which labels, what threshold, what action.

6. Redact

Binary-level removal. Clean at the file level, not cosmetically masked.
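
To make the "You Decide" step concrete, here is a minimal sketch of what caller-side decision logic could look like. It is not the SDK's API: the `Finding` shape, the label names, and the threshold values are all assumptions made for illustration. It only shows the idea of mapping labeled findings with confidence scores to an action.

```python
from dataclasses import dataclass


# Hypothetical shape of a labeled finding; the real SDK's types may differ.
@dataclass(frozen=True)
class Finding:
    label: str         # entity label, e.g. "PERSON" or "EMAIL" (assumed names)
    text: str          # the detected text
    page: int          # 1-based page number
    confidence: float  # score between 0.0 and 1.0


# Caller-owned policy (illustrative values): the minimum confidence per label
# at which a finding is redacted without human review.
AUTO_REDACT_THRESHOLDS = {"PERSON": 0.90, "EMAIL": 0.80, "SSN": 0.70}
REVIEW_FLOOR = 0.50  # findings below this score are ignored


def decide(finding: Finding) -> str:
    """Return 'redact', 'review', or 'ignore' for a single finding."""
    threshold = AUTO_REDACT_THRESHOLDS.get(finding.label)
    if threshold is None:
        return "ignore"  # label is not covered by this policy
    if finding.confidence >= threshold:
        return "redact"
    if finding.confidence >= REVIEW_FLOOR:
        return "review"
    return "ignore"


findings = [
    Finding("PERSON", "Jane Doe", 1, 0.97),
    Finding("EMAIL", "jane@example.com", 1, 0.62),
    Finding("SSN", "123-45-6789", 2, 0.41),
    Finding("LOGO", "ACME", 2, 0.99),
]
decisions = [(f.label, decide(f)) for f in findings]
```

Keeping the thresholds in a plain mapping means the policy can be tuned per label without touching the decision function. A stricter deployment might send unknown labels to review instead of ignoring them.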

Built for high-volume, high-stakes workflows

Teams use our SDK to automate redaction across regulated industries. Here's where it fits.

Legal Document Review (coming soon)

Stop paying associates to hold a black marker.

Automate PII redaction across contracts, discovery documents, and filings. Process thousands of pages with consistent accuracy, and an audit trail your compliance team will actually trust.

Learn more about legal redaction →

Healthcare & Life Sciences (coming soon)

De-identify at the speed your pipeline demands.

Strip patient data from clinical records, trial documentation, and regulatory submissions. Built to support HIPAA and EMA Policy 0070 compliance at scale.

Learn more about healthcare redaction →

Financial Services (coming soon)

Share documents without sharing customer data.

Redact PII from loan applications, KYC files, audit trails, and reports before they leave your system. Confidence scoring lets you tune precision to your risk tolerance.

Learn more about financial services redaction →

Government & Public Records (coming soon)

FOIA-ready in hours, not weeks.

Prepare public disclosure documents, inter-agency transfers, and records requests with automated PII detection and removal. Deployable on-prem for environments with strict data residency requirements.

Learn more about government redaction →

We publish our numbers. Here they are.

Redaction is high-stakes. You need to know exactly how well the engine performs before you trust it with real data. We agree — so we don't hide behind vague claims.

Detection Performance

By HIPAA entity category, measured on our English-language benchmark dataset:

Category               Recall    Precision   F1 Score
Person                 96.28%    97.43%      0.969
Dates of Birth         92.57%    100%        0.961
Account Number / SSN   93.93%    85.27%      0.894
Addresses              91.22%    99.43%      0.951
Phone / Fax Numbers    96.3%     94.12%      0.952
Email Addresses        99.98%    99.58%      0.998
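
For readers who want to check how the F1 column relates to the other two, the short sketch below recomputes it from the recall and precision figures above, using the standard definition of F1 as the harmonic mean of precision and recall. The function and variable names are illustrative and not part of the SDK.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall %, precision %, published F1), copied from the table above.
rows = {
    "Person": (96.28, 97.43, 0.969),
    "Dates of Birth": (92.57, 100.0, 0.961),
    "Account Number / SSN": (93.93, 85.27, 0.894),
    "Addresses": (91.22, 99.43, 0.951),
    "Phone / Fax Numbers": (96.3, 94.12, 0.952),
    "Email Addresses": (99.98, 99.58, 0.998),
}

# Recompute F1 per category, rounded to three decimals like the table.
computed = {
    category: round(f1_score(precision / 100, recall / 100), 3)
    for category, (recall, precision, _published) in rows.items()
}

for category, (_recall, _precision, published) in rows.items():
    print(f"{category}: computed {computed[category]:.3f}, published {published:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs. That is why the Account Number / SSN row, with its lower precision, has the lowest F1 in the table.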

In Production

In our current pilot with a legal services provider, the SDK processes thousands of pages per month with high accuracy on first pass. Manual review time dropped significantly compared to their previous workflow.

Compliance & Certifications

We are certified / compliant

ISO 27001 — certified
GDPR — compliant
Own Infrastructure — no third-party AI providers

These apply to us. We run our own ML models on ISO 27001-certified compute infrastructure. Your documents are never sent to external AI providers.

We help you build compliant workflows for

HIPAA
UK GDPR
CCPA
Safe Harbor
21 CFR Part 11
EMA Policy 0070

The SDK is designed to support your organization's compliance with these frameworks. How you integrate, configure, and deploy it determines your compliance posture — we give you the tools and guidance to get there.

What we don't do (yet)

We believe being upfront about scope builds more trust than a features page that over-promises.

Non-text content

The engine processes text. It does not detect or redact faces in photographs, visible handwritten signatures, logos, or other graphical elements.

Languages beyond our current set

We support English, German, Spanish, French, and Italian. Additional languages are on our roadmap — talk to us if your use case requires others.

Entities that span page boundaries

The engine analyzes each page independently. If an entity (such as a name or address) starts on one page and continues on the next, we may miss part of it. This is a known gap for documents with dense, flowing text across page breaks.
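
Until this gap is closed, a pipeline could route suspicious page breaks to a human reviewer. The sketch below is one possible caller-side heuristic, not an SDK feature. It assumes you already have the extracted text of each page as a string, and it flags pages whose text does not end with sentence-closing punctuation. The function name and the punctuation set are illustrative.

```python
# Characters treated as "the sentence is finished" (an illustrative choice).
SENTENCE_END = (".", "!", "?", ":", ";")


def risky_page_breaks(page_texts: list[str]) -> list[int]:
    """Return 1-based numbers of pages whose text seems to continue on the next page."""
    risky = []
    # The last page has no following page, so it is never flagged.
    for number, text in enumerate(page_texts[:-1], start=1):
        stripped = text.rstrip()
        if stripped and not stripped.endswith(SENTENCE_END):
            risky.append(number)
    return risky


pages = [
    "The tenant, Maria Gonzalez-",
    "Ruiz, agrees to the terms below.",
    "Signed in Madrid.",
]
flagged = risky_page_breaks(pages)
print(flagged)
```

This is deliberately crude. Footers or page numbers at the end of the extracted text will hide a real continuation, and tables or lists will trigger false flags, so treat the result as a review hint and not as a detection guarantee.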

More than an API call

You're not just licensing an engine. Depending on your plan, you get the support to deploy it properly and keep it running.

Consulting

We help you scope the integration — document types, entity categories, confidence thresholds, edge cases specific to your domain.

Implementation

Hands-on support for deployment, whether you're calling our cloud API or installing on-prem in a locked-down environment.

Ongoing Support

SLAs, model updates, new language and entity support as we ship it, and a dedicated account contact for enterprise customers.

Simple pricing. No per-seat licenses.

On-Prem / Enterprise
Custom

For teams that need data residency, high-volume pricing, custom SLAs, or dedicated support.

15 minutes. No pitch deck.

Book a short call and we'll figure out if we're a fit for your use case. We'll ask about your document types, volume, and compliance requirements — and tell you honestly whether our SDK is the right tool.