This commit is contained in:
Grzegorz Michalski
2026-02-02 11:13:24 +01:00
parent ecd833f682
commit ffcb288afa
62 changed files with 8463 additions and 2845 deletions

View File

@@ -0,0 +1,338 @@
# Oracle External Tables - Column Order Mapping Issue
## Document Information
- **Date**: 2026-01-08
- **Issue**: CSV External Tables map columns by position, not by name
- **Impact**: HIGH - NULL values in A_WORKFLOW_HISTORY_KEY column for CSV exports
- **Status**: IDENTIFIED - Solution required
---
## Problem Summary
Oracle External Tables using **CSV format** map columns **by position** (ignoring header names), while **Parquet format** uses **schema-based mapping** (by column name). This causes critical data loss when exporting to CSV files that are read by external tables with different column ordering than the source table.
### Affected Operations
-**Parquet exports to ARCHIVE** - Working correctly (uses column names)
-**CSV exports to ODS/DATA** - Returns NULL for A_WORKFLOW_HISTORY_KEY (positional mismatch)
---
## Root Cause Analysis
### Source Table Structure (OU_CSDB.LEGACY_DEBT_DAILY)
```
Column Position | Column Name | Data Type
----------------|----------------------|----------
1 | A_KEY | NUMBER
2 | NEWUPDATED | DATE
3 | IDLOADDATE_DIM | DATE
... | ... | ...
72 | A_ETL_LOAD_SET_FK | NUMBER ← Key column
... | ... | ...
129 | PLACEHOLDER50 | VARCHAR2
```
### External Table Structure (ODS.CSDB_DEBT_DAILY_ODS)
```
Column Position | Column Name | Data Type
----------------|----------------------------|----------
1 | A_KEY | NUMBER
2 | A_WORKFLOW_HISTORY_KEY | NUMBER ← Expected here!
3 | NEWUPDATED | DATE
4 | IDLOADDATE_DIM | DATE
... | ... | ...
129 | PLACEHOLDER50 | VARCHAR2
```
### Export Query Generated
```sql
SELECT
T.A_KEY, -- Position 1
T.NEWUPDATED, -- Position 2
T.IDLOADDATE_DIM, -- Position 3
...,
T.A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY, -- Position 72
...
FROM OU_CSDB.LEGACY_DEBT_DAILY T
```
### CSV File Structure
```csv
A_KEY,NEWUPDATED,IDLOADDATE_DIM,...,A_WORKFLOW_HISTORY_KEY,...
1,2026-01-08,2025-12-01,...,999,...
```
### What External Table Reads
```
Position 1 → A_KEY (NUMBER) ✅ Reads: 1 (correct)
Position 2 → A_WORKFLOW_HISTORY_KEY ❌ Reads: 2026-01-08 (NEWUPDATED as DATE)
Converts DATE → NUMBER → NULL
Position 3 → NEWUPDATED (DATE) ❌ Reads: 2025-12-01 (IDLOADDATE_DIM)
...
```
---
## Experimental Verification
### Test 1: CSV with Mismatched Column Order
**CSV File Content:**
```csv
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999
```
**External Table Definition (Wrong Order):**
```sql
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_WRONG (
A_KEY NUMBER,
A_WORKFLOW_HISTORY_KEY NUMBER, -- Position 2
NEWUPDATED DATE -- Position 3
)
```
**Result:**
```
❌ ORA-30653: reject limit reached
```
**Reason**: Oracle tried to convert NEWUPDATED (DATE) to A_WORKFLOW_HISTORY_KEY (NUMBER) → Conversion failed
---
### Test 2: CSV with Matching Column Order
**CSV File Content:**
```csv
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
1,2026-01-08,999
```
**External Table Definition (Correct Order):**
```sql
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_MATCHING (
A_KEY NUMBER,
NEWUPDATED DATE, -- Position 2
A_WORKFLOW_HISTORY_KEY NUMBER -- Position 3
)
```
**Result:**
```
✅ SUCCESS
A_KEY=1, NEWUPDATED=2026-01-08, A_WORKFLOW_HISTORY_KEY=999
```
---
### Test 3: Parquet with Mismatched Column Order
**Parquet File Columns:**
```
A_KEY (NUMBER), NEWUPDATED (DATE), A_WORKFLOW_HISTORY_KEY (NUMBER)
```
**External Table Definition (Different Order):**
```sql
CREATE EXTERNAL TABLE TEST_PARQUET_ORDER_WRONG (
A_KEY NUMBER,
A_WORKFLOW_HISTORY_KEY NUMBER, -- Different position than Parquet!
NEWUPDATED DATE
) ... FORMAT parquet
```
**Result:**
```
✅ SUCCESS
A_KEY=1, A_WORKFLOW_HISTORY_KEY=999, NEWUPDATED=2026-01-08
```
**Reason**: Parquet uses **schema-based mapping** - matches by column name, not position
---
## Format Comparison
| Aspect | CSV | Parquet |
|--------|-----|---------|
| **Column Mapping** | By position (1, 2, 3, ...) | By column name (schema) |
| **Header Usage** | Ignored (cosmetic only) | Used for mapping |
| **Order Sensitivity** | ❌ HIGH - Must match exactly | ✅ LOW - Order irrelevant |
| **Type Mismatch** | Reject/NULL conversion | Proper validation |
| **Current Status** | ❌ BROKEN for ODS exports | ✅ WORKING for ARCHIVE |
---
## Current Implementation Issue
### processColumnList Function (DATA_EXPORTER.pkb)
```sql
FUNCTION processColumnList(...) RETURN VARCHAR2 IS
BEGIN
IF pColumnList IS NULL THEN
-- Build list of all columns from SOURCE table
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
INTO vAllCols
FROM all_tab_columns
WHERE table_name = pTableName -- Source: LEGACY_DEBT_DAILY
AND owner = pSchemaName;
-- Columns ordered by SOURCE table structure (1, 2, ..., 72, ...)
vResult := 'T.' || REPLACE(vAllCols, ', ', ', T.');
-- Add alias at original position (72)
vResult := REPLACE(vResult, 'T.' || pKeyColumnName,
'T.' || pKeyColumnName || ' AS A_WORKFLOW_HISTORY_KEY');
RETURN vResult;
END IF;
-- ...
END;
```
**Problem**: Generates columns in **LEGACY_DEBT_DAILY** order, not **CSDB_DEBT_DAILY_ODS** order!
---
## Real-World Impact Example
### Export Command
```sql
CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
pSchemaName => 'OU_CSDB',
pTableName => 'LEGACY_DEBT_DAILY',
pKeyColumnName => 'A_ETL_LOAD_SET_FK',
pBucketArea => 'DATA',
pFolderName => 'ODS/CSDB/CSDB_DEBT_DAILY',
pMinDate => DATE '2025-01-01',
pMaxDate => SYSDATE,
pParallelDegree => 8
);
```
### Result
-**Export completed successfully** - 6 CSV files created
-**Parquet export to ARCHIVE** - A_WORKFLOW_HISTORY_KEY populated correctly
-**CSV read via ODS.CSDB_DEBT_DAILY_ODS** - Returns NULL for A_WORKFLOW_HISTORY_KEY
### Verification Query
```sql
-- Returns 0 non-NULL values
SELECT COUNT(*)
FROM ODS.CSDB_DEBT_DAILY_ODS
WHERE A_WORKFLOW_HISTORY_KEY IS NOT NULL;
-- Returns: 0
```
---
## Solution Options
### Option 1: Rebuild External Tables (RECOMMENDED for existing system)
**Action**: Update all ODS external table definitions to match source table column order
**Pros:**
- No code changes in DATA_EXPORTER
- Maintains backward compatibility
- One-time migration effort
**Cons:**
- Requires access to all external table definitions
- Must coordinate with downstream consumers
---
### Option 2: Modify Export Logic (RECOMMENDED for new exports)
**Action**: Update `processColumnList` to query external table structure and match column order
**Implementation:**
```sql
FUNCTION processColumnList(..., pExternalTableOwner VARCHAR2, pExternalTableName VARCHAR2) IS
BEGIN
IF pExternalTableName IS NOT NULL THEN
-- Get column order from EXTERNAL table, not source table
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
INTO vAllCols
FROM all_tab_columns
WHERE table_name = pExternalTableName
AND owner = pExternalTableOwner;
-- Build SELECT mapping source columns to external table order
-- ...
ELSE
-- Existing logic (source table order)
-- ...
END IF;
END;
```
**Pros:**
- Future-proof for new tables
- Explicit mapping control
- Self-documenting exports
**Cons:**
- Requires API change (new parameters)
- More complex logic
- Performance overhead (additional metadata query)
---
### Option 3: Use Parquet Format for ODS
**Action**: Switch ODS exports from CSV to Parquet
**Pros:**
- Eliminates column order issue
- Better compression
- Stronger typing
**Cons:**
- May break downstream Airflow/DBT pipelines expecting CSV
- Larger migration effort
---
## Recommendations
### Immediate Action (Hot Fix)
1. ✅ Document issue (this document)
2. Identify all affected CSV exports to ODS
3. Choose between Option 1 (rebuild tables) or Option 2 (modify export)
### Long-term Strategy
1. Migrate ODS exports to **Parquet format** where possible
2. Maintain CSV only for systems requiring it
3. Add validation tests comparing source vs external table column order
4. Update documentation to warn about CSV positional mapping
---
## Testing Checklist
- [x] Verify CSV positional mapping behavior
- [x] Verify Parquet schema-based mapping behavior
- [x] Confirm NULL values in ODS.CSDB_DEBT_DAILY_ODS
- [x] Confirm correct values in CSDB_DEBT_DAILY_ARCHIVE (Parquet)
- [ ] Test proposed solution
- [ ] Validate all affected external tables
- [ ] Update deployment procedures
---
## Related Issues
- MARS-835-PREHOOK: Parallel processing implementation (exposed this issue)
- A_WORKFLOW_HISTORY_KEY aliasing fix (partial - doesn't solve column order)
---
## References
- Oracle Documentation: External Tables - Data Access Parameters
- DBMS_CLOUD.EXPORT_DATA format options
- Oracle External Tables Column Mapping Behavior (CSV vs Parquet)
---
**Document Status**: DRAFT - Awaiting solution decision
**Last Updated**: 2026-01-08
**Author**: GitHub Copilot (Analysis Session)

Binary file not shown.