How to Convert PDF to Excel: Methods That Keep Formatting

By BankStatementReader Team

PDFs are built for reading, not editing. The moment you need to sort, filter, total, or chart the data inside one, you need it in a spreadsheet. This guide walks through the main ways to convert a PDF to Excel, explains when each one works, and shows where formatting tends to break so you know what to expect before you start.

First, know what kind of PDF you have

The right method depends on how the PDF was made.

  • Text-based PDF: created from a digital document (an export, an invoice, a generated report). It has a real text layer, so the characters are selectable and searchable.
  • Scanned or image-only PDF: created by scanning paper or photographing a page. There is no text layer — every page is essentially a picture, even if it looks like text on screen.

To check, try selecting a line of text in your PDF viewer. If the cursor highlights individual words, it is text-based. If nothing selects, it is a scan and you will need optical character recognition (OCR), covered below.

Knowing the type up front matters, because a method that suits a text-based file will return empty or garbled results on a scan. A single PDF can also mix both kinds (a digital report with a scanned page inserted, for example), so check the specific pages you care about rather than assuming the whole file behaves the same way.
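The per-page check can also be automated. The sketch below is a minimal illustration, not a definitive implementation: it assumes you have already extracted one text string per page with whatever PDF library you use, and the function names and the 20-character threshold are hypothetical choices made for this example.

```python
def classify_pages(page_texts, min_chars=20):
    """Label each page 'text' or 'scan' based on its extracted text.

    page_texts: one string per page, produced by your PDF text extractor
                (assumed input; not tied to any specific library).
    min_chars:  assumed threshold; pages with fewer visible characters
                are treated as image-only.
    """
    labels = []
    for text in page_texts:
        # Count only non-whitespace characters so blank padding is ignored.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        labels.append("text" if visible >= min_chars else "scan")
    return labels


def pages_needing_ocr(page_texts, min_chars=20):
    """Return the 1-based page numbers that look image-only."""
    labels = classify_pages(page_texts, min_chars)
    return [n for n, label in enumerate(labels, start=1) if label == "scan"]
```

A page that comes back as "scan" is a candidate for OCR, while the remaining pages can go through the text-based methods.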

Method 1: Copy and paste

The quickest route for a small, text-based PDF is selecting the content and pasting it into a worksheet.

  1. Open the PDF and select the table or block of text you want.
  2. Copy it.
  3. Click a cell in Excel and paste.

This often works for a short list or a single clean table. Where it breaks: columns frequently collapse into one cell, dates and amounts run together, and any row with a multi-line description knocks the alignment off. To recover the structure, paste into a single column, then split it using the Text to Columns feature under the Data tools, choosing a delimiter (a comma, space, or tab) or fixed widths. Expect some manual cleanup afterward.
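If you prefer to do the split outside Excel, the same idea as Text to Columns can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the function name is hypothetical, and the default rule (split on tabs or on runs of two or more spaces) is a guess about how columns tend to be separated in pasted text, so adjust it to your data.

```python
import re


def split_columns(lines, delimiter=None):
    """Split pasted lines into lists of cells.

    delimiter=None splits on tabs or on runs of two or more spaces, so a
    single space inside a description does not start a new column.
    Pass an explicit delimiter such as "," to split on that instead.
    """
    if delimiter is None:
        pattern = re.compile(r"\s*\t\s*| {2,}")
    else:
        pattern = re.compile(re.escape(delimiter))

    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the paste
        rows.append([cell.strip() for cell in pattern.split(line)])
    return rows
```

For example, `split_columns(["01/02  Coffee Shop  3.50"])` keeps "Coffee Shop" together as one cell because its words are separated by a single space.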

Copy-paste returns nothing on a scanned PDF, because there is no text to select.

Method 2: Excel's built-in PDF import

Excel includes a data import feature that reads tables directly from a PDF — look under the Get Data options on the Data tab for the From PDF source. After you pick a file, Excel scans it and lists the tables and pages it detected. You preview each one, choose what to load, and it lands in the worksheet as structured rows and columns.

This handles formatting better than copy-paste because it interprets the page as a set of tables rather than a stream of characters. It works well when:

  • the PDF is text-based, and
  • the data sits in clear, ruled tables.

Where it struggles: pages with mixed layouts, tables that span page breaks, or flowing paragraph text instead of a grid. It also does not read scanned PDFs, since those contain no table data to detect. When the preview shows a jumbled or partial table, that is the import telling you the source layout is ambiguous.

Method 3: A dedicated PDF-to-Excel converter

Standalone converters are built specifically for this job. You upload or open a PDF, the tool detects the tables, and it exports rows to an XLSX or CSV file. Because the conversion logic is the whole point of the product, these tools tend to handle awkward layouts — merged cells, repeating headers, multi-page tables — more gracefully than a general import.

This is the practical choice when:

  • you convert files regularly and want a repeatable workflow,
  • the source has complex or inconsistent tables, or
  • you need a clean export without hand-fixing columns each time.

The trade-off is that you are using an outside tool, so check how it handles your files and whether the output matches the structure you expect before relying on it for ongoing work.

Method 4: OCR for scanned PDFs

If your PDF is a scan or photo, copy-paste and Excel's import cannot read it, because there is no text underneath the image, and a dedicated converter can read it only if it includes OCR. OCR solves this by analyzing the picture and reconstructing the characters it recognizes.

The workflow is:

  1. Run the scanned PDF through an OCR step, which produces a searchable text layer or extracts the data directly.
  2. Move that recognized text or table into Excel using one of the methods above.

OCR quality depends on the input. A crisp, high-resolution scan with straight lines converts more reliably than a skewed phone photo or a faded fax. Numbers are especially sensitive — review totals and any column where a misread digit would matter.
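One way to review totals is to add up the recognized amounts and compare the sum with the total printed on the document. The sketch below assumes the amounts have already been cleaned into plain numeric strings; the function name is hypothetical, and `Decimal` is used so that cents are compared exactly rather than as floating-point values.

```python
from decimal import Decimal


def total_matches(amounts, stated_total):
    """Compare the sum of recognized amounts with the printed total.

    amounts:      numeric strings such as "3.50" or "-45.25" (assumed clean).
    stated_total: the total as printed on the document, as a string.
    Returns (ok, difference), where difference = computed - stated.
    """
    computed = sum((Decimal(a) for a in amounts), Decimal("0"))
    difference = computed - Decimal(stated_total)
    return difference == 0, difference
```

A nonzero difference does not say which digit was misread, but it tells you the column needs a closer look before you rely on it.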

Where formatting actually breaks

A few patterns cause most of the trouble, regardless of method.

  • Tables vs. flowing text: ruled tables convert cleanly; paragraphs and free-form layouts do not map neatly to rows and columns.
  • Merged or multi-line cells: a description that wraps onto two lines often splits into two rows, shifting everything below it.
  • Tables across page breaks: headers may repeat mid-data, or a single table may be read as several disconnected ones.
  • Currency, dates, and thousands separators: these can land as text instead of numbers, so formulas such as SUM silently ignore them until you convert the values to real numbers or dates.
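Two of these patterns, repeated headers and wrapped descriptions, can often be repaired after the fact. The sketch below is one possible approach, not a general solution: the function names are hypothetical, rows are assumed to be equal-length lists of strings, and a continuation row is assumed to be one whose first column (the date, in this example) is empty.

```python
def drop_repeated_headers(rows):
    """Keep the first row as the header and drop later rows identical to it."""
    if not rows:
        return []
    header = rows[0]
    return [header] + [row for row in rows[1:] if row != header]


def merge_continuation_rows(rows, key_col=0, text_col=1):
    """Fold rows with an empty key column into the previous row's text column.

    Assumes a wrapped description shows up as a row whose key column
    (for example the date) is blank.
    """
    merged = []
    for row in rows:
        if merged and not row[key_col].strip():
            previous = merged[-1]
            previous[text_col] = (previous[text_col] + " " + row[text_col].strip()).strip()
        else:
            merged.append(list(row))  # copy so the input is not modified
    return merged
```

Running `drop_repeated_headers` first and `merge_continuation_rows` second keeps a repeated header from being mistaken for data.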

After any conversion, spot-check a few rows against the original PDF and confirm that numeric columns are stored as numbers, not text.
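The numeric check can be scripted as well. The following sketch assumes US-style formatting (a "$" sign, commas as thousands separators, a period as the decimal point, and parentheses for negatives); other locales would need different rules, and both function names are hypothetical.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert strings like '$1,234.56' or '(45.00)' to Decimal.

    Returns None when the text is not a finite number, so callers can
    flag the cell for manual review instead of guessing.
    """
    s = text.strip()
    # Accounting style: parentheses mean a negative amount.
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    s = s.replace("$", "").replace(",", "").strip()
    try:
        value = Decimal(s)
    except InvalidOperation:
        return None
    if not value.is_finite():
        return None  # reject 'NaN' and 'Infinity', which Decimal accepts
    return -value if negative else value


def non_numeric_rows(values):
    """Return the 0-based positions of cells that failed to parse."""
    return [i for i, v in enumerate(values) if parse_amount(v) is None]
```

Cells reported by `non_numeric_rows` are the ones to compare against the original PDF first.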

Converting a bank statement specifically

Bank statements are a common reason people need this, and they are a hard case: transaction tables wrap across pages, descriptions run long, and statements are often scanned. A purpose-built tool that understands statement layouts reduces the cleanup that general methods leave behind. You can try our bank statement converter, and for a step-by-step walkthrough see how to convert a bank statement to Excel.

Choosing a method

Match the method to the file. For a short, clean, text-based PDF, copy-paste or the built-in import is enough. For complex or recurring tables, a dedicated converter reduces manual cleanup. For scans, start with OCR. Whatever you choose, verify the result against the source before you build anything on top of the numbers.
