Why Is My PDF Text Showing Weird Symbols? Fixing Mojibake

Published on March 15, 2026 • 5 min read

We've all experienced this frustrating scenario: you have a perfectly readable PDF document. You highlight a paragraph of text, right-click "Copy", and paste it into Microsoft Word. Instead of English sentences, your screen fills with bizarre symbols, stray dingbats, and unreadable characters (like æ, ø, å, or Ã©). Garbled text of this kind is commonly called mojibake, a term borrowed from Japanese for text that comes out wrong because it was decoded with the wrong character mapping.
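In its classic form, outside of PDFs, mojibake is easy to reproduce. The short sketch below shows where a sequence like Ã© comes from: the UTF-8 bytes for "é" are read back as Latin-1. The sample word is arbitrary and chosen only for illustration.

```python
# Classic mojibake: text is encoded with one character set and decoded with another.
original = "café"

utf8_bytes = original.encode("utf-8")   # "é" becomes the two bytes C3 A9
garbled = utf8_bytes.decode("latin-1")  # each byte is read as its own character

print(garbled)  # cafÃ©

# When the wrong decoding step is known, it can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # café
```

Garbled PDF text follows the same idea, but the faulty mapping sits inside the PDF's fonts rather than in a text file's encoding.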

Understanding the Root Cause of Garbled PDF Text

To fix the issue, you must first understand that a PDF (Portable Document Format) is fundamentally not a text document like a Word or Notepad file. A PDF is closer to a highly precise graphical coordinate system. When software generates a PDF, it places visual "glyphs" (shapes representing letters) onto an X/Y grid so that the page looks the same on any screen or printer.

In a properly created PDF, each font comes with a hidden lookup table called a ToUnicode map. The page itself only stores numeric character codes that select glyphs from the font. The table tells the computer: "The shape drawn at X:100, Y:50 uses character code 0x24 of this font; it visually looks like the letter 'A', and it logically maps to the Unicode code point U+0041."
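As a rough illustration of that lookup, the sketch below models a ToUnicode map as a plain Python dictionary. The character codes, the `TO_UNICODE` table, and the `decode_codes` helper are all made up for this example; a real PDF stores the table as a CMap stream attached to the font.

```python
# Toy model of a ToUnicode map: font-specific character codes -> Unicode text.
# The codes below are invented for illustration; real fonts assign their own.
TO_UNICODE = {
    0x01: "\u0048",  # H
    0x02: "\u0065",  # e
    0x03: "\u006C",  # l
    0x04: "\u006F",  # o
}

def decode_codes(codes, to_unicode):
    """Translate the character codes drawn on the page into copyable text."""
    return "".join(to_unicode[code] for code in codes)

# The page draws five glyphs; the map turns their codes back into letters.
page_codes = [0x01, 0x02, 0x03, 0x03, 0x04]
print(decode_codes(page_codes, TO_UNICODE))  # Hello
```

Copy and paste works only as well as this table: the glyph shapes are never consulted when text is extracted.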

Mojibake happens when the software that originally created the PDF was broken, outdated, or used custom proprietary font subsetting. The document looks fine because the graphics are correct, but the hidden `ToUnicode` map is missing or corrupted. When you copy the text, the PDF viewer has no reliable way to translate the glyphs back into characters, so your clipboard receives raw or wrongly mapped character codes instead of the letters you see on screen.
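To see why a missing map produces nonsense, here is a minimal sketch. It assumes a hypothetical font whose glyph numbers sit 29 below the ASCII values, and a hypothetical `extract_text` helper that falls back to treating each raw code as a Unicode code point when no map is available. Real viewers use more elaborate fallbacks, so treat this as an illustration only.

```python
GLYPH_OFFSET = 29  # assumption: glyph number = ASCII value - 29 in this made-up font

def glyph_codes(text):
    """Codes the PDF producer writes into the page for the given text."""
    return [ord(ch) - GLYPH_OFFSET for ch in text]

def extract_text(codes, to_unicode=None):
    """Copy text out of the page, with or without a ToUnicode map."""
    if to_unicode is not None:
        return "".join(to_unicode[c] for c in codes)
    # No map: naive fallback that reads each raw code as a Unicode code point.
    return "".join(chr(c) for c in codes)

codes = glyph_codes("Hello")
good_map = {c: chr(c + GLYPH_OFFSET) for c in codes}

print(extract_text(codes, good_map))  # Hello
print(extract_text(codes))            # +HOOR
```

In both cases the page would render identically, because rendering only needs the glyph codes. Only the extracted text differs.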

How to Extract Text from a Corrupted PDF

Method 1: Advanced OCR (Optical Character Recognition)

If the internal text map is destroyed, the easiest way to recover the data is to treat the PDF like a physical photograph. An OCR engine visually scans each page, recognizes the shapes as letters using pattern recognition, and types out a brand new, clean text file.
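Before running OCR, it can be worth checking whether the copied text is actually damaged. The following is a rough heuristic sketch, not part of any particular tool: the `garbled_ratio` and `needs_ocr` names and the 0.2 threshold are assumptions chosen for illustration. It counts control, private-use, and unassigned characters, which can show up when a font has no usable mapping.

```python
import unicodedata

# Unicode categories that rarely belong in ordinary prose:
# Cc = control, Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn"}

def garbled_ratio(text):
    """Fraction of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = sum(
        1
        for c in chars
        # U+FFFD is the replacement character emitted for undecodable input.
        if unicodedata.category(c) in SUSPICIOUS_CATEGORIES or c == "\ufffd"
    )
    return suspicious / len(chars)

def needs_ocr(text, threshold=0.2):
    """Suggest OCR when too much of the copied text is debris (threshold is arbitrary)."""
    return garbled_ratio(text) > threshold

print(needs_ocr("Hello world"))             # False
print(needs_ocr("\x01\x02\x03\x03\x04"))    # True
```

This check will not catch text that is wrong but made of ordinary letters and symbols, so a quick visual comparison with the page is still worthwhile.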

You can use QuickDoPDF's PDF to Excel or PDF to Word tools, which use heuristic extraction that can often work around minor encoding errors and rebuild the document structure.

Method 2: Print to PDF Flattening

Sometimes, simply forcing the browser to rebuild the document can fix broken font subsets. Open the corrupted PDF in Chrome, press "Print", and select "Save as PDF". This generates a brand new file, which occasionally resolves encoding conflicts. If copied text is still garbled afterwards, move on to one of the OCR-based methods.

Method 3: Image Conversion

If you absolutely must extract data and copy-pasting is entirely broken, you can convert the entire PDF into a series of images using the "PDF to JPG" tool. Once you have high-quality JPGs, you can upload them to Google Drive and open them with Google Docs, which automatically runs its cloud-based OCR engine and places the recognized text in a new document. Proofread the result, since OCR can misread characters, especially in low-resolution or stylized text.

Frequently Asked Questions

Why does this only happen with certain fonts?

Can I fix the original PDF file permanently?

💡 Pro Tip: Prevent Mojibake in the Future
