# Oracle External Tables - Column Order Mapping Issue ## Document Information - **Date**: 2026-01-08 - **Issue**: CSV External Tables map columns by position, not by name - **Impact**: HIGH - NULL values in A_WORKFLOW_HISTORY_KEY column for CSV exports - **Status**: IDENTIFIED - Solution required --- ## Problem Summary Oracle External Tables using **CSV format** map columns **by position** (ignoring header names), while **Parquet format** uses **schema-based mapping** (by column name). This causes critical data loss when exporting to CSV files that are read by external tables with different column ordering than the source table. ### Affected Operations - ✅ **Parquet exports to ARCHIVE** - Working correctly (uses column names) - ❌ **CSV exports to ODS/DATA** - Returns NULL for A_WORKFLOW_HISTORY_KEY (positional mismatch) --- ## Root Cause Analysis ### Source Table Structure (OU_CSDB.LEGACY_DEBT_DAILY) ``` Column Position | Column Name | Data Type ----------------|----------------------|---------- 1 | A_KEY | NUMBER 2 | NEWUPDATED | DATE 3 | IDLOADDATE_DIM | DATE ... | ... | ... 72 | A_ETL_LOAD_SET_FK | NUMBER ← Key column ... | ... | ... 129 | PLACEHOLDER50 | VARCHAR2 ``` ### External Table Structure (ODS.CSDB_DEBT_DAILY_ODS) ``` Column Position | Column Name | Data Type ----------------|----------------------------|---------- 1 | A_KEY | NUMBER 2 | A_WORKFLOW_HISTORY_KEY | NUMBER ← Expected here! 3 | NEWUPDATED | DATE 4 | IDLOADDATE_DIM | DATE ... | ... | ... 129 | PLACEHOLDER50 | VARCHAR2 ``` ### Export Query Generated ```sql SELECT T.A_KEY, -- Position 1 T.NEWUPDATED, -- Position 2 T.IDLOADDATE_DIM, -- Position 3 ..., T.A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY, -- Position 72 ... FROM OU_CSDB.LEGACY_DEBT_DAILY T ``` ### CSV File Structure ```csv A_KEY,NEWUPDATED,IDLOADDATE_DIM,...,A_WORKFLOW_HISTORY_KEY,... 1,2026-01-08,2025-12-01,...,999,... ``` ### What External Table Reads ``` Position 1 → A_KEY (NUMBER) ✅ Reads: 1 (correct) Position 2 → A_WORKFLOW_HISTORY_KEY ❌ Reads: 2026-01-08 (NEWUPDATED as DATE) Converts DATE → NUMBER → NULL Position 3 → NEWUPDATED (DATE) ❌ Reads: 2025-12-01 (IDLOADDATE_DIM) ... ``` --- ## Experimental Verification ### Test 1: CSV with Mismatched Column Order **CSV File Content:** ```csv A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY 1,2026-01-08,999 ``` **External Table Definition (Wrong Order):** ```sql CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_WRONG ( A_KEY NUMBER, A_WORKFLOW_HISTORY_KEY NUMBER, -- Position 2 NEWUPDATED DATE -- Position 3 ) ``` **Result:** ``` ❌ ORA-30653: reject limit reached ``` **Reason**: Oracle tried to convert NEWUPDATED (DATE) to A_WORKFLOW_HISTORY_KEY (NUMBER) → Conversion failed --- ### Test 2: CSV with Matching Column Order **CSV File Content:** ```csv A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY 1,2026-01-08,999 ``` **External Table Definition (Correct Order):** ```sql CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_MATCHING ( A_KEY NUMBER, NEWUPDATED DATE, -- Position 2 A_WORKFLOW_HISTORY_KEY NUMBER -- Position 3 ) ``` **Result:** ``` ✅ SUCCESS A_KEY=1, NEWUPDATED=2026-01-08, A_WORKFLOW_HISTORY_KEY=999 ``` --- ### Test 3: Parquet with Mismatched Column Order **Parquet File Columns:** ``` A_KEY (NUMBER), NEWUPDATED (DATE), A_WORKFLOW_HISTORY_KEY (NUMBER) ``` **External Table Definition (Different Order):** ```sql CREATE EXTERNAL TABLE TEST_PARQUET_ORDER_WRONG ( A_KEY NUMBER, A_WORKFLOW_HISTORY_KEY NUMBER, -- Different position than Parquet! NEWUPDATED DATE ) ... FORMAT parquet ``` **Result:** ``` ✅ SUCCESS A_KEY=1, A_WORKFLOW_HISTORY_KEY=999, NEWUPDATED=2026-01-08 ``` **Reason**: Parquet uses **schema-based mapping** - matches by column name, not position --- ## Format Comparison | Aspect | CSV | Parquet | |--------|-----|---------| | **Column Mapping** | By position (1, 2, 3, ...) | By column name (schema) | | **Header Usage** | Ignored (cosmetic only) | Used for mapping | | **Order Sensitivity** | ❌ HIGH - Must match exactly | ✅ LOW - Order irrelevant | | **Type Mismatch** | Reject/NULL conversion | Proper validation | | **Current Status** | ❌ BROKEN for ODS exports | ✅ WORKING for ARCHIVE | --- ## Current Implementation Issue ### processColumnList Function (DATA_EXPORTER.pkb) ```sql FUNCTION processColumnList(...) RETURN VARCHAR2 IS BEGIN IF pColumnList IS NULL THEN -- Build list of all columns from SOURCE table SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id) INTO vAllCols FROM all_tab_columns WHERE table_name = pTableName -- Source: LEGACY_DEBT_DAILY AND owner = pSchemaName; -- Columns ordered by SOURCE table structure (1, 2, ..., 72, ...) vResult := 'T.' || REPLACE(vAllCols, ', ', ', T.'); -- Add alias at original position (72) vResult := REPLACE(vResult, 'T.' || pKeyColumnName, 'T.' || pKeyColumnName || ' AS A_WORKFLOW_HISTORY_KEY'); RETURN vResult; END IF; -- ... END; ``` **Problem**: Generates columns in **LEGACY_DEBT_DAILY** order, not **CSDB_DEBT_DAILY_ODS** order! --- ## Real-World Impact Example ### Export Command ```sql CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE( pSchemaName => 'OU_CSDB', pTableName => 'LEGACY_DEBT_DAILY', pKeyColumnName => 'A_ETL_LOAD_SET_FK', pBucketArea => 'DATA', pFolderName => 'ODS/CSDB/CSDB_DEBT_DAILY', pMinDate => DATE '2025-01-01', pMaxDate => SYSDATE, pParallelDegree => 8 ); ``` ### Result - ✅ **Export completed successfully** - 6 CSV files created - ✅ **Parquet export to ARCHIVE** - A_WORKFLOW_HISTORY_KEY populated correctly - ❌ **CSV read via ODS.CSDB_DEBT_DAILY_ODS** - Returns NULL for A_WORKFLOW_HISTORY_KEY ### Verification Query ```sql -- Returns 0 non-NULL values SELECT COUNT(*) FROM ODS.CSDB_DEBT_DAILY_ODS WHERE A_WORKFLOW_HISTORY_KEY IS NOT NULL; -- Returns: 0 ``` --- ## Solution Options ### Option 1: Rebuild External Tables (RECOMMENDED for existing system) **Action**: Update all ODS external table definitions to match source table column order **Pros:** - No code changes in DATA_EXPORTER - Maintains backward compatibility - One-time migration effort **Cons:** - Requires access to all external table definitions - Must coordinate with downstream consumers --- ### Option 2: Modify Export Logic (RECOMMENDED for new exports) **Action**: Update `processColumnList` to query external table structure and match column order **Implementation:** ```sql FUNCTION processColumnList(..., pExternalTableOwner VARCHAR2, pExternalTableName VARCHAR2) IS BEGIN IF pExternalTableName IS NOT NULL THEN -- Get column order from EXTERNAL table, not source table SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id) INTO vAllCols FROM all_tab_columns WHERE table_name = pExternalTableName AND owner = pExternalTableOwner; -- Build SELECT mapping source columns to external table order -- ... ELSE -- Existing logic (source table order) -- ... END IF; END; ``` **Pros:** - Future-proof for new tables - Explicit mapping control - Self-documenting exports **Cons:** - Requires API change (new parameters) - More complex logic - Performance overhead (additional metadata query) --- ### Option 3: Use Parquet Format for ODS **Action**: Switch ODS exports from CSV to Parquet **Pros:** - Eliminates column order issue - Better compression - Stronger typing **Cons:** - May break downstream Airflow/DBT pipelines expecting CSV - Larger migration effort --- ## Recommendations ### Immediate Action (Hot Fix) 1. ✅ Document issue (this document) 2. Identify all affected CSV exports to ODS 3. Choose between Option 1 (rebuild tables) or Option 2 (modify export) ### Long-term Strategy 1. Migrate ODS exports to **Parquet format** where possible 2. Maintain CSV only for systems requiring it 3. Add validation tests comparing source vs external table column order 4. Update documentation to warn about CSV positional mapping --- ## Testing Checklist - [x] Verify CSV positional mapping behavior - [x] Verify Parquet schema-based mapping behavior - [x] Confirm NULL values in ODS.CSDB_DEBT_DAILY_ODS - [x] Confirm correct values in CSDB_DEBT_DAILY_ARCHIVE (Parquet) - [ ] Test proposed solution - [ ] Validate all affected external tables - [ ] Update deployment procedures --- ## Related Issues - MARS-835-PREHOOK: Parallel processing implementation (exposed this issue) - A_WORKFLOW_HISTORY_KEY aliasing fix (partial - doesn't solve column order) --- ## References - Oracle Documentation: External Tables - Data Access Parameters - DBMS_CLOUD.EXPORT_DATA format options - Oracle External Tables Column Mapping Behavior (CSV vs Parquet) --- **Document Status**: DRAFT - Awaiting solution decision **Last Updated**: 2026-01-08 **Author**: GitHub Copilot (Analysis Session)