Migration

Migrating Oracle LOB and XML Columns to PostgreSQL

Rakesh Mamidala·Founder & Lead Engineer·June 26, 2026·7 min read

Most of an Oracle→PostgreSQL migration is boring in a good way — numbers, strings, dates map cleanly. Then you hit the LOB columns: CLOB, NCLOB, BLOB, and XMLType. That’s where migrations quietly go wrong, first on correctness and then on speed.

The correct type mappings

Start here — get the target types right before anything else:

CLOB → TEXT — PostgreSQL text is unbounded; no need for a length.
NCLOB → TEXT — same; the national-charset distinction disappears once you’re in a UTF-8 database.
BLOB → BYTEA — binary stays binary.
XMLType → XML — PostgreSQL has a native xml type. Use it.
BFILE → (no real equivalent) — BFILE is a pointer to an OS file, not stored data; you migrate the bytes separately or store a path in text.

The XMLType → jsonb footgun

It is tempting to map XMLType to jsonb — both are “semi-structured,” and jsonb is the trendy choice. Don’t. XML is not JSON. The moment you try to load an XML value into a json column, every row fails:

text

ERROR: invalid input syntax for type json
DETAIL: Token "<" is invalid.
CONTEXT: COPY big_data_types, line 1, column xml_col: "<ROW><ID>3</ID></ROW>"

The fix is simply the right target type. PostgreSQL’s xmltype accepts XML content directly, validates it, and lets you query it with XPath later. If you genuinely want JSON, that’s a datatransformation (parse the XML, emit JSON) — a deliberate step, not a column type you can quietly swap in. Map the type faithfully first; transform data on purpose, never by accident.

The performance trap: LOBs are heavy per row

Correct types get the data in. Speed is the second, sneakier problem. A normal row is a few hundred bytes; a row with a couple of LOBs can be kilobytes to megabytes. Bulk loaders read in batches for throughput — but with LOB columns, every row in the batch carries its full LOB payload. A batch of 50,000 rows that’s fine for a skinny table becomes gigabytes of buffered memory on a LOB-heavy one. The loader thrashes, throughput collapses, and the table looks frozen even though it’s technically still running.

Two things make LOB reads fast and safe

Fetch LOBs as values, not locators.By default many drivers return a LOB locator — a handle you then read with a separate round-trip per LOB. Reading thousands of locators one at a time is brutally slow. Configure the driver to fetch CLOB/NCLOB as strings and BLOB as bytes inline (in python-oracledb, fetch_lobs=False) so the data streams with the row.
Use a smaller batch for LOB tables.The right batch size for a LOB-heavy table is much smaller than for the rest of the schema — hundreds of rows, not tens of thousands — to keep the per-fetch memory bounded. The migration should detect LOB columns and cap the batch automatically, not make you guess.

Validate LOBs by value, not by row count

Row counts won’t catch a truncated CLOB or a mis-encoded BLOB. After the load, validate LOB columns at the value level — a hash or checksum of the content, source vs target — so a silently shortened document or a botched binary is caught before cutover, not after a customer notices.

The short version

CLOB/NCLOB → text, BLOB → bytea, XMLType → xml (never jsonb).
Fetch LOBs inline as str/bytes, not as locators.
Give LOB-heavy tables a much smaller batch size than the rest.
Validate LOB content by value, not by row count.

None of this is exotic — but every one of these is a place a migration passes its row-count check and still ships corrupted or half-loaded data. Treat LOBs as their own little project inside the migration and they stop being the part that goes wrong.

← Back to Blog Try DBMigrateAIPro →