Files
mars/confluence/additions/Oracle_External_Tables_Column_Order_Issue.md
Grzegorz Michalski ffcb288afa Develop
2026-02-02 11:13:24 +01:00

9.4 KiB

Oracle External Tables - Column Order Mapping Issue

Document Information

  • Date: 2026-01-08
  • Issue: CSV External Tables map columns by position, not by name
  • Impact: HIGH - NULL values in A_WORKFLOW_HISTORY_KEY column for CSV exports
  • Status: IDENTIFIED - Solution required

Problem Summary

Oracle External Tables using CSV format map columns by position (ignoring header names), while Parquet format uses schema-based mapping (by column name). This causes critical data loss when exporting to CSV files that are read by external tables with different column ordering than the source table.

Affected Operations

  • Parquet exports to ARCHIVE - Working correctly (uses column names)
  • CSV exports to ODS/DATA - Returns NULL for A_WORKFLOW_HISTORY_KEY (positional mismatch)

Root Cause Analysis

Source Table Structure (OU_CSDB.LEGACY_DEBT_DAILY)

Column Position | Column Name           | Data Type
----------------|----------------------|----------
1               | A_KEY                | NUMBER
2               | NEWUPDATED           | DATE
3               | IDLOADDATE_DIM       | DATE
...             | ...                  | ...
72              | A_ETL_LOAD_SET_FK    | NUMBER  ← Key column
...             | ...                  | ...
129             | PLACEHOLDER50        | VARCHAR2

External Table Structure (ODS.CSDB_DEBT_DAILY_ODS)

Column Position | Column Name                | Data Type
----------------|----------------------------|----------
1               | A_KEY                      | NUMBER
2               | A_WORKFLOW_HISTORY_KEY     | NUMBER  ← Expected here!
3               | NEWUPDATED                 | DATE
4               | IDLOADDATE_DIM             | DATE
...             | ...                        | ...
129             | PLACEHOLDER50              | VARCHAR2

Export Query Generated

SELECT 
    T.A_KEY,                                    -- Position 1
    T.NEWUPDATED,                               -- Position 2
    T.IDLOADDATE_DIM,                           -- Position 3
    ...,
    T.A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY,  -- Position 72
    ...
FROM OU_CSDB.LEGACY_DEBT_DAILY T

CSV File Structure

A_KEY,NEWUPDATED,IDLOADDATE_DIM,...,A_WORKFLOW_HISTORY_KEY,...
1,2026-01-08,2025-12-01,...,999,...

What External Table Reads

Position 1 → A_KEY (NUMBER)           ✅ Reads: 1 (correct)
Position 2 → A_WORKFLOW_HISTORY_KEY   ❌ Reads: 2026-01-08 (NEWUPDATED as DATE)
                                         Converts DATE → NUMBER → NULL
Position 3 → NEWUPDATED (DATE)        ❌ Reads: 2025-12-01 (IDLOADDATE_DIM)
...

Experimental Verification

Test 1: CSV with Mismatched Column Order

CSV File Content:

A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999

External Table Definition (Wrong Order):

CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_WRONG (
    A_KEY NUMBER,
    A_WORKFLOW_HISTORY_KEY NUMBER,  -- Position 2
    NEWUPDATED DATE                 -- Position 3
)

Result:

❌ ORA-30653: reject limit reached

Reason: Oracle tried to convert NEWUPDATED (DATE) to A_WORKFLOW_HISTORY_KEY (NUMBER) → Conversion failed


Test 2: CSV with Matching Column Order

CSV File Content:

A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999

External Table Definition (Correct Order):

CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_MATCHING (
    A_KEY NUMBER,
    NEWUPDATED DATE,                -- Position 2
    A_WORKFLOW_HISTORY_KEY NUMBER   -- Position 3
)

Result:

✅ SUCCESS
A_KEY=1, NEWUPDATED=2026-01-08, A_WORKFLOW_HISTORY_KEY=999

Test 3: Parquet with Mismatched Column Order

Parquet File Columns:

A_KEY (NUMBER), NEWUPDATED (DATE), A_WORKFLOW_HISTORY_KEY (NUMBER)

External Table Definition (Different Order):

CREATE EXTERNAL TABLE TEST_PARQUET_ORDER_WRONG (
    A_KEY NUMBER,
    A_WORKFLOW_HISTORY_KEY NUMBER,  -- Different position than Parquet!
    NEWUPDATED DATE
) ... FORMAT parquet

Result:

✅ SUCCESS
A_KEY=1, A_WORKFLOW_HISTORY_KEY=999, NEWUPDATED=2026-01-08

Reason: Parquet uses schema-based mapping - matches by column name, not position


Format Comparison

Aspect CSV Parquet
Column Mapping By position (1, 2, 3, ...) By column name (schema)
Header Usage Ignored (cosmetic only) Used for mapping
Order Sensitivity HIGH - Must match exactly LOW - Order irrelevant
Type Mismatch Reject/NULL conversion Proper validation
Current Status BROKEN for ODS exports WORKING for ARCHIVE

Current Implementation Issue

processColumnList Function (DATA_EXPORTER.pkb)

FUNCTION processColumnList(...) RETURN VARCHAR2 IS
BEGIN
    IF pColumnList IS NULL THEN
        -- Build list of all columns from SOURCE table
        SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
        INTO vAllCols
        FROM all_tab_columns
        WHERE table_name = pTableName  -- Source: LEGACY_DEBT_DAILY
          AND owner = pSchemaName;
        
        -- Columns ordered by SOURCE table structure (1, 2, ..., 72, ...)
        vResult := 'T.' || REPLACE(vAllCols, ', ', ', T.');
        
        -- Add alias at original position (72)
        vResult := REPLACE(vResult, 'T.' || pKeyColumnName, 
                          'T.' || pKeyColumnName || ' AS A_WORKFLOW_HISTORY_KEY');
        
        RETURN vResult;
    END IF;
    -- ...
END;

Problem: Generates columns in LEGACY_DEBT_DAILY order, not CSDB_DEBT_DAILY_ODS order!


Real-World Impact Example

Export Command

CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
    pSchemaName => 'OU_CSDB',
    pTableName => 'LEGACY_DEBT_DAILY',
    pKeyColumnName => 'A_ETL_LOAD_SET_FK',
    pBucketArea => 'DATA',
    pFolderName => 'ODS/CSDB/CSDB_DEBT_DAILY',
    pMinDate => DATE '2025-01-01',
    pMaxDate => SYSDATE,
    pParallelDegree => 8
);

Result

  • Export completed successfully - 6 CSV files created
  • Parquet export to ARCHIVE - A_WORKFLOW_HISTORY_KEY populated correctly
  • CSV read via ODS.CSDB_DEBT_DAILY_ODS - Returns NULL for A_WORKFLOW_HISTORY_KEY

Verification Query

-- Returns 0 non-NULL values
SELECT COUNT(*) 
FROM ODS.CSDB_DEBT_DAILY_ODS 
WHERE A_WORKFLOW_HISTORY_KEY IS NOT NULL;

-- Returns: 0

Solution Options

Action: Update all ODS external table definitions to match source table column order

Pros:

  • No code changes in DATA_EXPORTER
  • Maintains backward compatibility
  • One-time migration effort

Cons:

  • Requires access to all external table definitions
  • Must coordinate with downstream consumers

Action: Update processColumnList to query external table structure and match column order

Implementation:

FUNCTION processColumnList(..., pExternalTableOwner VARCHAR2, pExternalTableName VARCHAR2) IS
BEGIN
    IF pExternalTableName IS NOT NULL THEN
        -- Get column order from EXTERNAL table, not source table
        SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
        INTO vAllCols
        FROM all_tab_columns
        WHERE table_name = pExternalTableName
          AND owner = pExternalTableOwner;
        
        -- Build SELECT mapping source columns to external table order
        -- ...
    ELSE
        -- Existing logic (source table order)
        -- ...
    END IF;
END;

Pros:

  • Future-proof for new tables
  • Explicit mapping control
  • Self-documenting exports

Cons:

  • Requires API change (new parameters)
  • More complex logic
  • Performance overhead (additional metadata query)

Option 3: Use Parquet Format for ODS

Action: Switch ODS exports from CSV to Parquet

Pros:

  • Eliminates column order issue
  • Better compression
  • Stronger typing

Cons:

  • May break downstream Airflow/DBT pipelines expecting CSV
  • Larger migration effort

Recommendations

Immediate Action (Hot Fix)

  1. Document issue (this document)
  2. Identify all affected CSV exports to ODS
  3. Choose between Option 1 (rebuild tables) or Option 2 (modify export)

Long-term Strategy

  1. Migrate ODS exports to Parquet format where possible
  2. Maintain CSV only for systems requiring it
  3. Add validation tests comparing source vs external table column order
  4. Update documentation to warn about CSV positional mapping

Testing Checklist

  • Verify CSV positional mapping behavior
  • Verify Parquet schema-based mapping behavior
  • Confirm NULL values in ODS.CSDB_DEBT_DAILY_ODS
  • Confirm correct values in CSDB_DEBT_DAILY_ARCHIVE (Parquet)
  • Test proposed solution
  • Validate all affected external tables
  • Update deployment procedures

  • MARS-835-PREHOOK: Parallel processing implementation (exposed this issue)
  • A_WORKFLOW_HISTORY_KEY aliasing fix (partial - doesn't solve column order)

References

  • Oracle Documentation: External Tables - Data Access Parameters
  • DBMS_CLOUD.EXPORT_DATA format options
  • Oracle External Tables Column Mapping Behavior (CSV vs Parquet)

Document Status: DRAFT - Awaiting solution decision Last Updated: 2026-01-08 Author: GitHub Copilot (Analysis Session)