9.4 KiB
Oracle External Tables - Column Order Mapping Issue
Document Information
- Date: 2026-01-08
- Issue: CSV External Tables map columns by position, not by name
- Impact: HIGH - NULL values in A_WORKFLOW_HISTORY_KEY column for CSV exports
- Status: IDENTIFIED - Solution required
Problem Summary
Oracle External Tables using CSV format map columns by position (ignoring header names), while Parquet format uses schema-based mapping (by column name). This causes critical data loss when exporting to CSV files that are read by external tables with different column ordering than the source table.
Affected Operations
- ✅ Parquet exports to ARCHIVE - Working correctly (uses column names)
- ❌ CSV exports to ODS/DATA - Returns NULL for A_WORKFLOW_HISTORY_KEY (positional mismatch)
Root Cause Analysis
Source Table Structure (OU_CSDB.LEGACY_DEBT_DAILY)
Column Position | Column Name | Data Type
----------------|----------------------|----------
1 | A_KEY | NUMBER
2 | NEWUPDATED | DATE
3 | IDLOADDATE_DIM | DATE
... | ... | ...
72 | A_ETL_LOAD_SET_FK | NUMBER ← Key column
... | ... | ...
129 | PLACEHOLDER50 | VARCHAR2
External Table Structure (ODS.CSDB_DEBT_DAILY_ODS)
Column Position | Column Name | Data Type
----------------|----------------------------|----------
1 | A_KEY | NUMBER
2 | A_WORKFLOW_HISTORY_KEY | NUMBER ← Expected here!
3 | NEWUPDATED | DATE
4 | IDLOADDATE_DIM | DATE
... | ... | ...
129 | PLACEHOLDER50 | VARCHAR2
Export Query Generated
SELECT
T.A_KEY, -- Position 1
T.NEWUPDATED, -- Position 2
T.IDLOADDATE_DIM, -- Position 3
...,
T.A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY, -- Position 72
...
FROM OU_CSDB.LEGACY_DEBT_DAILY T
CSV File Structure
A_KEY,NEWUPDATED,IDLOADDATE_DIM,...,A_WORKFLOW_HISTORY_KEY,...
1,2026-01-08,2025-12-01,...,999,...
What External Table Reads
Position 1 → A_KEY (NUMBER) ✅ Reads: 1 (correct)
Position 2 → A_WORKFLOW_HISTORY_KEY ❌ Reads: 2026-01-08 (NEWUPDATED as DATE)
Converts DATE → NUMBER → NULL
Position 3 → NEWUPDATED (DATE) ❌ Reads: 2025-12-01 (IDLOADDATE_DIM)
...
Experimental Verification
Test 1: CSV with Mismatched Column Order
CSV File Content:
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999
External Table Definition (Wrong Order):
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_WRONG (
A_KEY NUMBER,
A_WORKFLOW_HISTORY_KEY NUMBER, -- Position 2
NEWUPDATED DATE -- Position 3
)
Result:
❌ ORA-30653: reject limit reached
Reason: Oracle tried to convert NEWUPDATED (DATE) to A_WORKFLOW_HISTORY_KEY (NUMBER) → Conversion failed
Test 2: CSV with Matching Column Order
CSV File Content:
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999
External Table Definition (Correct Order):
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_MATCHING (
A_KEY NUMBER,
NEWUPDATED DATE, -- Position 2
A_WORKFLOW_HISTORY_KEY NUMBER -- Position 3
)
Result:
✅ SUCCESS
A_KEY=1, NEWUPDATED=2026-01-08, A_WORKFLOW_HISTORY_KEY=999
Test 3: Parquet with Mismatched Column Order
Parquet File Columns:
A_KEY (NUMBER), NEWUPDATED (DATE), A_WORKFLOW_HISTORY_KEY (NUMBER)
External Table Definition (Different Order):
CREATE EXTERNAL TABLE TEST_PARQUET_ORDER_WRONG (
A_KEY NUMBER,
A_WORKFLOW_HISTORY_KEY NUMBER, -- Different position than Parquet!
NEWUPDATED DATE
) ... FORMAT parquet
Result:
✅ SUCCESS
A_KEY=1, A_WORKFLOW_HISTORY_KEY=999, NEWUPDATED=2026-01-08
Reason: Parquet uses schema-based mapping - matches by column name, not position
Format Comparison
| Aspect | CSV | Parquet |
|---|---|---|
| Column Mapping | By position (1, 2, 3, ...) | By column name (schema) |
| Header Usage | Ignored (cosmetic only) | Used for mapping |
| Order Sensitivity | ❌ HIGH - Must match exactly | ✅ LOW - Order irrelevant |
| Type Mismatch | Reject/NULL conversion | Proper validation |
| Current Status | ❌ BROKEN for ODS exports | ✅ WORKING for ARCHIVE |
Current Implementation Issue
processColumnList Function (DATA_EXPORTER.pkb)
FUNCTION processColumnList(...) RETURN VARCHAR2 IS
BEGIN
IF pColumnList IS NULL THEN
-- Build list of all columns from SOURCE table
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
INTO vAllCols
FROM all_tab_columns
WHERE table_name = pTableName -- Source: LEGACY_DEBT_DAILY
AND owner = pSchemaName;
-- Columns ordered by SOURCE table structure (1, 2, ..., 72, ...)
vResult := 'T.' || REPLACE(vAllCols, ', ', ', T.');
-- Add alias at original position (72)
vResult := REPLACE(vResult, 'T.' || pKeyColumnName,
'T.' || pKeyColumnName || ' AS A_WORKFLOW_HISTORY_KEY');
RETURN vResult;
END IF;
-- ...
END;
Problem: Generates columns in LEGACY_DEBT_DAILY order, not CSDB_DEBT_DAILY_ODS order!
Real-World Impact Example
Export Command
CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
pSchemaName => 'OU_CSDB',
pTableName => 'LEGACY_DEBT_DAILY',
pKeyColumnName => 'A_ETL_LOAD_SET_FK',
pBucketArea => 'DATA',
pFolderName => 'ODS/CSDB/CSDB_DEBT_DAILY',
pMinDate => DATE '2025-01-01',
pMaxDate => SYSDATE,
pParallelDegree => 8
);
Result
- ✅ Export completed successfully - 6 CSV files created
- ✅ Parquet export to ARCHIVE - A_WORKFLOW_HISTORY_KEY populated correctly
- ❌ CSV read via ODS.CSDB_DEBT_DAILY_ODS - Returns NULL for A_WORKFLOW_HISTORY_KEY
Verification Query
-- Returns 0 non-NULL values
SELECT COUNT(*)
FROM ODS.CSDB_DEBT_DAILY_ODS
WHERE A_WORKFLOW_HISTORY_KEY IS NOT NULL;
-- Returns: 0
Solution Options
Option 1: Rebuild External Tables (RECOMMENDED for existing system)
Action: Update all ODS external table definitions to match source table column order
Pros:
- No code changes in DATA_EXPORTER
- Maintains backward compatibility
- One-time migration effort
Cons:
- Requires access to all external table definitions
- Must coordinate with downstream consumers
Option 2: Modify Export Logic (RECOMMENDED for new exports)
Action: Update processColumnList to query external table structure and match column order
Implementation:
FUNCTION processColumnList(..., pExternalTableOwner VARCHAR2, pExternalTableName VARCHAR2) IS
BEGIN
IF pExternalTableName IS NOT NULL THEN
-- Get column order from EXTERNAL table, not source table
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
INTO vAllCols
FROM all_tab_columns
WHERE table_name = pExternalTableName
AND owner = pExternalTableOwner;
-- Build SELECT mapping source columns to external table order
-- ...
ELSE
-- Existing logic (source table order)
-- ...
END IF;
END;
Pros:
- Future-proof for new tables
- Explicit mapping control
- Self-documenting exports
Cons:
- Requires API change (new parameters)
- More complex logic
- Performance overhead (additional metadata query)
Option 3: Use Parquet Format for ODS
Action: Switch ODS exports from CSV to Parquet
Pros:
- Eliminates column order issue
- Better compression
- Stronger typing
Cons:
- May break downstream Airflow/DBT pipelines expecting CSV
- Larger migration effort
Recommendations
Immediate Action (Hot Fix)
- ✅ Document issue (this document)
- Identify all affected CSV exports to ODS
- Choose between Option 1 (rebuild tables) or Option 2 (modify export)
Long-term Strategy
- Migrate ODS exports to Parquet format where possible
- Maintain CSV only for systems requiring it
- Add validation tests comparing source vs external table column order
- Update documentation to warn about CSV positional mapping
Testing Checklist
- Verify CSV positional mapping behavior
- Verify Parquet schema-based mapping behavior
- Confirm NULL values in ODS.CSDB_DEBT_DAILY_ODS
- Confirm correct values in CSDB_DEBT_DAILY_ARCHIVE (Parquet)
- Test proposed solution
- Validate all affected external tables
- Update deployment procedures
Related Issues
- MARS-835-PREHOOK: Parallel processing implementation (exposed this issue)
- A_WORKFLOW_HISTORY_KEY aliasing fix (partial - doesn't solve column order)
References
- Oracle Documentation: External Tables - Data Access Parameters
- DBMS_CLOUD.EXPORT_DATA format options
- Oracle External Tables Column Mapping Behavior (CSV vs Parquet)
Document Status: DRAFT - Awaiting solution decision Last Updated: 2026-01-08 Author: GitHub Copilot (Analysis Session)