Develop
This commit is contained in:
@@ -0,0 +1,338 @@
|
||||
# Oracle External Tables - Column Order Mapping Issue
|
||||
|
||||
## Document Information
|
||||
- **Date**: 2026-01-08
|
||||
- **Issue**: CSV External Tables map columns by position, not by name
|
||||
- **Impact**: HIGH - NULL values in A_WORKFLOW_HISTORY_KEY column for CSV exports
|
||||
- **Status**: IDENTIFIED - Solution required
|
||||
|
||||
---
|
||||
|
||||
## Problem Summary
|
||||
|
||||
Oracle External Tables using **CSV format** map columns **by position** (ignoring header names), while **Parquet format** uses **schema-based mapping** (by column name). This causes critical data loss when exporting to CSV files that are read by external tables with different column ordering than the source table.
|
||||
|
||||
### Affected Operations
|
||||
- ✅ **Parquet exports to ARCHIVE** - Working correctly (uses column names)
|
||||
- ❌ **CSV exports to ODS/DATA** - Returns NULL for A_WORKFLOW_HISTORY_KEY (positional mismatch)
|
||||
|
||||
---
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
### Source Table Structure (OU_CSDB.LEGACY_DEBT_DAILY)
|
||||
```
|
||||
Column Position | Column Name | Data Type
|
||||
----------------|----------------------|----------
|
||||
1 | A_KEY | NUMBER
|
||||
2 | NEWUPDATED | DATE
|
||||
3 | IDLOADDATE_DIM | DATE
|
||||
... | ... | ...
|
||||
72 | A_ETL_LOAD_SET_FK | NUMBER ← Key column
|
||||
... | ... | ...
|
||||
129 | PLACEHOLDER50 | VARCHAR2
|
||||
```
|
||||
|
||||
### External Table Structure (ODS.CSDB_DEBT_DAILY_ODS)
|
||||
```
|
||||
Column Position | Column Name | Data Type
|
||||
----------------|----------------------------|----------
|
||||
1 | A_KEY | NUMBER
|
||||
2 | A_WORKFLOW_HISTORY_KEY | NUMBER ← Expected here!
|
||||
3 | NEWUPDATED | DATE
|
||||
4 | IDLOADDATE_DIM | DATE
|
||||
... | ... | ...
|
||||
129 | PLACEHOLDER50 | VARCHAR2
|
||||
```
|
||||
|
||||
### Export Query Generated
|
||||
```sql
|
||||
SELECT
|
||||
T.A_KEY, -- Position 1
|
||||
T.NEWUPDATED, -- Position 2
|
||||
T.IDLOADDATE_DIM, -- Position 3
|
||||
...,
|
||||
T.A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY, -- Position 72
|
||||
...
|
||||
FROM OU_CSDB.LEGACY_DEBT_DAILY T
|
||||
```
|
||||
|
||||
### CSV File Structure
|
||||
```csv
|
||||
A_KEY,NEWUPDATED,IDLOADDATE_DIM,...,A_WORKFLOW_HISTORY_KEY,...
|
||||
1,2026-01-08,2025-12-01,...,999,...
|
||||
```
|
||||
|
||||
### What External Table Reads
|
||||
```
|
||||
Position 1 → A_KEY (NUMBER) ✅ Reads: 1 (correct)
|
||||
Position 2 → A_WORKFLOW_HISTORY_KEY ❌ Reads: 2026-01-08 (NEWUPDATED as DATE)
|
||||
Converts DATE → NUMBER → NULL
|
||||
Position 3 → NEWUPDATED (DATE) ❌ Reads: 2025-12-01 (IDLOADDATE_DIM)
|
||||
...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Experimental Verification
|
||||
|
||||
### Test 1: CSV with Mismatched Column Order
|
||||
|
||||
**CSV File Content:**
|
||||
```csv
|
||||
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
|
||||
1,2026-01-08,999
|
||||
```
|
||||
|
||||
**External Table Definition (Wrong Order):**
|
||||
```sql
|
||||
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_WRONG (
|
||||
A_KEY NUMBER,
|
||||
A_WORKFLOW_HISTORY_KEY NUMBER, -- Position 2
|
||||
NEWUPDATED DATE -- Position 3
|
||||
)
|
||||
```
|
||||
|
||||
**Result:**
|
||||
```
|
||||
❌ ORA-30653: reject limit reached
|
||||
```
|
||||
**Reason**: Oracle tried to convert NEWUPDATED (DATE) to A_WORKFLOW_HISTORY_KEY (NUMBER) → Conversion failed
|
||||
|
||||
---
|
||||
|
||||
### Test 2: CSV with Matching Column Order
|
||||
|
||||
**CSV File Content:**
|
||||
```csv
|
||||
A_KEY,NEWUPDATED,A_WORKFLOW_HISTORY_KEY
|
||||
1,2026-01-08,999
|
||||
```
|
||||
|
||||
**External Table Definition (Correct Order):**
|
||||
```sql
|
||||
CREATE EXTERNAL TABLE TEST_COLUMN_ORDER_MATCHING (
|
||||
A_KEY NUMBER,
|
||||
NEWUPDATED DATE, -- Position 2
|
||||
A_WORKFLOW_HISTORY_KEY NUMBER -- Position 3
|
||||
)
|
||||
```
|
||||
|
||||
**Result:**
|
||||
```
|
||||
✅ SUCCESS
|
||||
A_KEY=1, NEWUPDATED=2026-01-08, A_WORKFLOW_HISTORY_KEY=999
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Test 3: Parquet with Mismatched Column Order
|
||||
|
||||
**Parquet File Columns:**
|
||||
```
|
||||
A_KEY (NUMBER), NEWUPDATED (DATE), A_WORKFLOW_HISTORY_KEY (NUMBER)
|
||||
```
|
||||
|
||||
**External Table Definition (Different Order):**
|
||||
```sql
|
||||
CREATE EXTERNAL TABLE TEST_PARQUET_ORDER_WRONG (
|
||||
A_KEY NUMBER,
|
||||
A_WORKFLOW_HISTORY_KEY NUMBER, -- Different position than Parquet!
|
||||
NEWUPDATED DATE
|
||||
) ... FORMAT parquet
|
||||
```
|
||||
|
||||
**Result:**
|
||||
```
|
||||
✅ SUCCESS
|
||||
A_KEY=1, A_WORKFLOW_HISTORY_KEY=999, NEWUPDATED=2026-01-08
|
||||
```
|
||||
**Reason**: Parquet uses **schema-based mapping** - matches by column name, not position
|
||||
|
||||
---
|
||||
|
||||
## Format Comparison
|
||||
|
||||
| Aspect | CSV | Parquet |
|
||||
|--------|-----|---------|
|
||||
| **Column Mapping** | By position (1, 2, 3, ...) | By column name (schema) |
|
||||
| **Header Usage** | Ignored (cosmetic only) | Used for mapping |
|
||||
| **Order Sensitivity** | ❌ HIGH - Must match exactly | ✅ LOW - Order irrelevant |
|
||||
| **Type Mismatch** | Reject/NULL conversion | Proper validation |
|
||||
| **Current Status** | ❌ BROKEN for ODS exports | ✅ WORKING for ARCHIVE |
|
||||
|
||||
---
|
||||
|
||||
## Current Implementation Issue
|
||||
|
||||
### processColumnList Function (DATA_EXPORTER.pkb)
|
||||
```sql
|
||||
FUNCTION processColumnList(...) RETURN VARCHAR2 IS
|
||||
BEGIN
|
||||
IF pColumnList IS NULL THEN
|
||||
-- Build list of all columns from SOURCE table
|
||||
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
|
||||
INTO vAllCols
|
||||
FROM all_tab_columns
|
||||
WHERE table_name = pTableName -- Source: LEGACY_DEBT_DAILY
|
||||
AND owner = pSchemaName;
|
||||
|
||||
-- Columns ordered by SOURCE table structure (1, 2, ..., 72, ...)
|
||||
vResult := 'T.' || REPLACE(vAllCols, ', ', ', T.');
|
||||
|
||||
-- Add alias at original position (72)
|
||||
vResult := REPLACE(vResult, 'T.' || pKeyColumnName,
|
||||
'T.' || pKeyColumnName || ' AS A_WORKFLOW_HISTORY_KEY');
|
||||
|
||||
RETURN vResult;
|
||||
END IF;
|
||||
-- ...
|
||||
END;
|
||||
```
|
||||
|
||||
**Problem**: Generates columns in **LEGACY_DEBT_DAILY** order, not **CSDB_DEBT_DAILY_ODS** order!
|
||||
|
||||
---
|
||||
|
||||
## Real-World Impact Example
|
||||
|
||||
### Export Command
|
||||
```sql
|
||||
CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
|
||||
pSchemaName => 'OU_CSDB',
|
||||
pTableName => 'LEGACY_DEBT_DAILY',
|
||||
pKeyColumnName => 'A_ETL_LOAD_SET_FK',
|
||||
pBucketArea => 'DATA',
|
||||
pFolderName => 'ODS/CSDB/CSDB_DEBT_DAILY',
|
||||
pMinDate => DATE '2025-01-01',
|
||||
pMaxDate => SYSDATE,
|
||||
pParallelDegree => 8
|
||||
);
|
||||
```
|
||||
|
||||
### Result
|
||||
- ✅ **Export completed successfully** - 6 CSV files created
|
||||
- ✅ **Parquet export to ARCHIVE** - A_WORKFLOW_HISTORY_KEY populated correctly
|
||||
- ❌ **CSV read via ODS.CSDB_DEBT_DAILY_ODS** - Returns NULL for A_WORKFLOW_HISTORY_KEY
|
||||
|
||||
### Verification Query
|
||||
```sql
|
||||
-- Returns 0 non-NULL values
|
||||
SELECT COUNT(*)
|
||||
FROM ODS.CSDB_DEBT_DAILY_ODS
|
||||
WHERE A_WORKFLOW_HISTORY_KEY IS NOT NULL;
|
||||
|
||||
-- Returns: 0
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Solution Options
|
||||
|
||||
### Option 1: Rebuild External Tables (RECOMMENDED for existing system)
|
||||
**Action**: Update all ODS external table definitions to match source table column order
|
||||
|
||||
**Pros:**
|
||||
- No code changes in DATA_EXPORTER
|
||||
- Maintains backward compatibility
|
||||
- One-time migration effort
|
||||
|
||||
**Cons:**
|
||||
- Requires access to all external table definitions
|
||||
- Must coordinate with downstream consumers
|
||||
|
||||
---
|
||||
|
||||
### Option 2: Modify Export Logic (RECOMMENDED for new exports)
|
||||
**Action**: Update `processColumnList` to query external table structure and match column order
|
||||
|
||||
**Implementation:**
|
||||
```sql
|
||||
FUNCTION processColumnList(..., pExternalTableOwner VARCHAR2, pExternalTableName VARCHAR2) IS
|
||||
BEGIN
|
||||
IF pExternalTableName IS NOT NULL THEN
|
||||
-- Get column order from EXTERNAL table, not source table
|
||||
SELECT LISTAGG(column_name, ', ') WITHIN GROUP (ORDER BY column_id)
|
||||
INTO vAllCols
|
||||
FROM all_tab_columns
|
||||
WHERE table_name = pExternalTableName
|
||||
AND owner = pExternalTableOwner;
|
||||
|
||||
-- Build SELECT mapping source columns to external table order
|
||||
-- ...
|
||||
ELSE
|
||||
-- Existing logic (source table order)
|
||||
-- ...
|
||||
END IF;
|
||||
END;
|
||||
```
|
||||
|
||||
**Pros:**
|
||||
- Future-proof for new tables
|
||||
- Explicit mapping control
|
||||
- Self-documenting exports
|
||||
|
||||
**Cons:**
|
||||
- Requires API change (new parameters)
|
||||
- More complex logic
|
||||
- Performance overhead (additional metadata query)
|
||||
|
||||
---
|
||||
|
||||
### Option 3: Use Parquet Format for ODS
|
||||
**Action**: Switch ODS exports from CSV to Parquet
|
||||
|
||||
**Pros:**
|
||||
- Eliminates column order issue
|
||||
- Better compression
|
||||
- Stronger typing
|
||||
|
||||
**Cons:**
|
||||
- May break downstream Airflow/DBT pipelines expecting CSV
|
||||
- Larger migration effort
|
||||
|
||||
---
|
||||
|
||||
## Recommendations
|
||||
|
||||
### Immediate Action (Hot Fix)
|
||||
1. ✅ Document issue (this document)
|
||||
2. Identify all affected CSV exports to ODS
|
||||
3. Choose between Option 1 (rebuild tables) or Option 2 (modify export)
|
||||
|
||||
### Long-term Strategy
|
||||
1. Migrate ODS exports to **Parquet format** where possible
|
||||
2. Maintain CSV only for systems requiring it
|
||||
3. Add validation tests comparing source vs external table column order
|
||||
4. Update documentation to warn about CSV positional mapping
|
||||
|
||||
---
|
||||
|
||||
## Testing Checklist
|
||||
|
||||
- [x] Verify CSV positional mapping behavior
|
||||
- [x] Verify Parquet schema-based mapping behavior
|
||||
- [x] Confirm NULL values in ODS.CSDB_DEBT_DAILY_ODS
|
||||
- [x] Confirm correct values in CSDB_DEBT_DAILY_ARCHIVE (Parquet)
|
||||
- [ ] Test proposed solution
|
||||
- [ ] Validate all affected external tables
|
||||
- [ ] Update deployment procedures
|
||||
|
||||
---
|
||||
|
||||
## Related Issues
|
||||
- MARS-835-PREHOOK: Parallel processing implementation (exposed this issue)
|
||||
- A_WORKFLOW_HISTORY_KEY aliasing fix (partial - doesn't solve column order)
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
- Oracle Documentation: External Tables - Data Access Parameters
|
||||
- DBMS_CLOUD.EXPORT_DATA format options
|
||||
- Oracle External Tables Column Mapping Behavior (CSV vs Parquet)
|
||||
|
||||
---
|
||||
|
||||
**Document Status**: DRAFT - Awaiting solution decision
|
||||
**Last Updated**: 2026-01-08
|
||||
**Author**: GitHub Copilot (Analysis Session)
|
||||
BIN
confluence/additions/package_version_history.xlsx
Normal file
BIN
confluence/additions/package_version_history.xlsx
Normal file
Binary file not shown.
Reference in New Issue
Block a user