PDF Glossary
Clear definitions of technical terms you'll encounter in the PDF world — PDF/A, OCR, GDPR, signatures, compression.
PDF/A
Standardized PDF format for long-term archiving (ISO 19005).
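PDF/A is a PDF variant designed so that a document can still be rendered identically in 10, 20, or 50 years. To achieve this, the file must be self-contained: all fonts are embedded, and external content, JavaScript, and encryption are not allowed. The standard has several parts: PDF/A-1 (based on PDF 1.4, the strictest), PDF/A-2 (adds transparency, layers, and JPEG 2000 compression), PDF/A-3 (also allows embedding files of any format, such as XML invoice data), and PDF/A-4 (2020, based on PDF 2.0).
In several European contexts (public archives, public tenders, electronic invoices), PDF/A is required or strongly recommended. Each part also has conformance levels: b (basic, reliable visual reproduction), u (b plus Unicode text mapping, so text can be searched and extracted), and a (accessible, which adds a tagged logical structure). For most everyday archiving needs, PDF/A-2b is sufficient.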
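A PDF/A file declares the part and conformance level it claims in its XMP metadata, through the pdfaid:part and pdfaid:conformance properties. The minimal sketch below reads that claim from the raw file bytes, assuming the XMP packet is stored uncompressed, which is the usual case for PDF/A files. The function names are hypothetical. It only reports what the file declares; proving actual conformance requires a full validator such as veraPDF.

```python
import re
from typing import Optional, Tuple


def _xmp_property(xmp: bytes, name: str) -> Optional[str]:
    """Find a pdfaid:<name> value in either XMP form (attribute or element)."""
    key = re.escape(f"pdfaid:{name}".encode("ascii"))
    # Attribute form: pdfaid:part="2"
    attr = re.search(key + rb"""\s*=\s*["']([^"']*)["']""", xmp)
    if attr:
        return attr.group(1).decode("ascii", errors="replace")
    # Element form: <pdfaid:part>2</pdfaid:part>
    elem = re.search(rb"<" + key + rb">\s*([^<]*?)\s*</" + key + rb">", xmp)
    if elem:
        return elem.group(1).decode("ascii", errors="replace")
    return None


def read_pdfa_claim(pdf_bytes: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (part, conformance) declared in XMP, or None if no PDF/A claim.

    Heuristic sketch: assumes uncompressed XMP and does not validate anything.
    Conformance may be absent (PDF/A-4 files often have none).
    """
    part = _xmp_property(pdf_bytes, "part")
    if part is None:
        return None
    conformance = _xmp_property(pdf_bytes, "conformance")
    return (part, conformance.upper() if conformance else None)


if __name__ == "__main__":
    sample = (
        b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" '
        b'pdfaid:part="2" pdfaid:conformance="B"/>'
    )
    print(read_pdfa_claim(sample))  # ('2', 'B')
```

On a real file you would pass the bytes read from disk in binary mode. A None result means no PDF/A claim was found in uncompressed metadata, not that the file is definitely non-conforming.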
OCR (Optical Character Recognition)
Converting an image of text into searchable, copyable digital text.
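OCR converts the text contained in an image (a scan or a photo) into machine-readable characters that can be searched, copied, and indexed. In a searchable PDF, the recognized text is usually stored as an invisible layer behind the original page image, so the page still looks like the scan.
Modern engines rely on neural networks: Tesseract has used an LSTM-based recognizer since version 4, and cloud services such as Google Cloud Vision and AWS Textract use deep-learning models. On clean printed scans in French or English, character accuracy can exceed 99.5%. On free handwriting it drops sharply, to roughly 60-80%, which makes handwriting OCR's Achilles' heel.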
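Accuracy figures like these are usually derived from the character error rate (CER): the edit distance between the OCR output and a correct reference transcription, divided by the length of the reference. For scale, 99.5% character accuracy still means about 5 errors per 1,000 characters. The sketch below computes it with a plain Levenshtein distance; the function names are illustrative, and published benchmarks may normalize text (whitespace, case) differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """1 - CER, with CER = edit distance / number of reference characters.

    Clamped at 0 because CER can exceed 1 when the OCR output contains
    many spurious insertions.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    # One misread character ("l" read as "1") out of 13: accuracy ≈ 0.923
    print(character_accuracy("invoice total", "invoice tota1"))
```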
GDPR (General Data Protection Regulation)
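EU regulation on personal data protection, applicable since May 2018.
GDPR requires every organization that processes personal data of people in the EU to follow principles such as data minimization, purpose limitation, and transparency, and to honor individuals' rights, including the right of access and the right to erasure. It also applies to organizations based outside the EU when they offer goods or services to people in the EU or monitor their behavior.
GDPR compliance on paper isn't always enough for transfers to the United States. Since the Schrems II ruling (CJEU, 2020) and given the CLOUD Act (2018), a transfer to a US provider needs a valid legal basis and, where US law could undermine the protection, additional safeguards such as end-to-end encryption (with keys the provider cannot access) or prior anonymization. The EU-US Data Privacy Framework, adopted in July 2023, again covers transfers to certified US companies, but European hosting by a European provider remains the simplest way to stay out of reach of US access requests.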
Cloud Act
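US law (2018) allowing US authorities to compel US-based providers to hand over data under their control, wherever it is stored.
The CLOUD Act (Clarifying Lawful Overseas Use of Data Act) allows US authorities, through a warrant or other legal order, to require a provider under US jurisdiction to disclose data in its possession, custody, or control, even when that data is stored abroad. This can extend to data held through European subsidiaries, and the user is not necessarily informed: the provider can be prohibited from notifying them.
Concretely: a PDF processed by an online service such as iLovePDF or Adobe may be reachable by US authorities if the provider, or the cloud it runs on (AWS, Google Cloud, Azure), is subject to US jurisdiction, whatever the data center's location. For sensitive European data (contracts, HR, legal), this creates a GDPR compliance risk in light of Schrems II.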
eIDAS
EU regulation on electronic identification and trust services.
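The eIDAS regulation (electronic IDentification, Authentication and trust Services; Regulation (EU) No 910/2014), adopted in 2014 and applicable since July 2016, harmonizes electronic signatures across the EU. It defines three levels: simple (SES), advanced (AES), and qualified (QES).
QES is the only level that the regulation gives the same legal effect as a handwritten signature, and it is recognized in every EU member state. It requires a qualified certificate and a qualified signature creation device, such as a smart card, a national eID card, or a remote signing service run by a qualified trust service provider. AES sits in between: it must be uniquely linked to the signer, identify them, and make any later change to the signed document detectable. SES (for example, drawing your signature with a mouse) cannot be rejected as evidence merely because it is electronic, but its evidential weight is low, so it is best kept for low-stakes uses.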
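For advanced and qualified signatures, eIDAS requires that any later change to the signed data be detectable. In a PDF, this is usually done with a digest computed over the file bytes listed in the signature's /ByteRange entry, which covers the whole file except the /Contents value where the signature itself is stored. The minimal sketch below assumes SHA-256 (the actual algorithm is declared inside the signature object) and uses hypothetical function names. It only computes the digest: a real validator must also verify the CMS signature and the signer's certificate chain, which this code does not do.

```python
import hashlib
from typing import Sequence


def byte_range_digest(pdf: bytes, byte_range: Sequence[int]) -> str:
    """SHA-256 over the two segments named by a PDF /ByteRange [a b c d].

    Segment 1 is pdf[a : a+b], segment 2 is pdf[c : c+d]. The gap between
    them holds the /Contents value (the signature), which is excluded.
    """
    a, b, c, d = byte_range
    digest = hashlib.sha256()
    digest.update(pdf[a:a + b])
    digest.update(pdf[c:c + d])
    return digest.hexdigest()


def covers_whole_file(pdf: bytes, byte_range: Sequence[int]) -> bool:
    """True if only the gap between the two segments is left unsigned.

    False typically means bytes were appended after this signature
    (an incremental update), which a validator must then inspect.
    """
    a, b, c, d = byte_range
    return a == 0 and a + b <= c and c + d == len(pdf)


if __name__ == "__main__":
    # Toy "file": the signature placeholder sits between the two segments.
    signed = b"HEADER<SIGSIG>TRAILER"
    br = [0, 6, 14, 7]
    tampered = b"HEADER<SIGSIG>TRAILEX"  # one byte changed in signed data
    print(byte_range_digest(signed, br) != byte_range_digest(tampered, br))  # True
```

Changing any byte inside the signed segments changes the digest, so the stored signature no longer matches; changing bytes inside the /Contents gap does not, which is why that gap must hold nothing but the signature.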