Files
mars/MARS_Packages/REL01/MARS-1049
Grzegorz Michalski ecd833f682 Init
2026-02-02 10:59:29 +01:00
..
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00
2026-02-02 10:59:29 +01:00

MARS-1049: CSV Encoding Support - Complete Implementation

🎯 Implementation Status: COMPLETED & FULLY TESTED

Implementation Date: 2025-11-24
Production Testing Date: 2025-11-25
Final Validation Date: 2025-11-25
Database Version: Oracle 23.26.0.1.0
Package Versions: CT_MRDS.FILE_MANAGER v3.2.1, ODS.FILE_MANAGER_ODS v2.1.0
Status: Production Ready & Fully Validated


📋 Overview

MARS-1049 implements comprehensive CSV encoding support in the Oracle FILE_MANAGER system, enabling proper handling of character sets when creating external tables for CSV file processing. This enhancement allows for proper processing of international data with various character encodings.

Key Benefits

  • Enhanced Data Integrity: Proper character set handling for international data
  • Flexibility: Support for multiple encoding standards (UTF-8, Windows-1252, ISO-8859, etc.)
  • Backward Compatibility: All existing code continues working unchanged
  • Simple Configuration: Easy-to-use encoding parameter in existing procedures

📁 Project Structure & Version Control

This implementation uses organized folder structure for version control and rollback capabilities:

MARS_Packages/REL01/MARS-1049/
├── current_version/           # 📦 Pre-MARS-1049 Versions
│   ├── FILE_MANAGER.pkg         # v3.2.0 (without pEncoding)
│   ├── FILE_MANAGER.pkb         # v3.2.0 (without pEncoding)
│   ├── FILE_MANAGER_ODS.pkg     # v2.0.0 (without pEncoding)
│   └── FILE_MANAGER_ODS.pkb     # v2.0.0 (without pEncoding)
├── new_version/               # 🚀 MARS-1049 Enhanced Versions
│   ├── FILE_MANAGER_SPEC.sql    # v3.2.1 (with pEncoding)
│   ├── FILE_MANAGER_BODY.sql    # v3.2.1 (with pEncoding)
│   ├── FILE_MANAGER_ODS_SPEC.sql # v2.1.0 (with pEncoding)
│   └── FILE_MANAGER_ODS_BODY.sql # v2.1.0 (with pEncoding)
├── install_mars1049.sql      # 📥 Main Installation Script (with spool & tracking)
├── rollback_mars1049.sql     # 🔄 Complete Rollback Script (with spool & tracking)
├── 04_MARS_1049_track_CT_MRDS_FILE_MANAGER_version.sql # 📈 Version Tracking (Install)
├── 92_MARS_1049_track_rollback_version.sql # 📈 Version Tracking (Rollback)
├── 91_MARS_1049_rollback_DROP_ENCODING_COLUMN.sql # 🔄 Column Removal Component
└── README.md                 # 📝 This Documentation

Version Control Strategy

  • current_version/: Original packages used by rollback_mars1049.sql
  • new_version/: Enhanced packages used by install_mars1049.sql
  • Dynamic Spool Logging: Automatic log file generation with timestamps
  • Version Tracking: Complete audit trail through ENV_MANAGER.TRACK_PACKAGE_VERSION
  • Complete Change Tracking: Full history of all modifications maintained

🔧 Database Changes Implemented

1. Table Structure Enhancement

-- Added to CT_MRDS.A_SOURCE_FILE_CONFIG
ALTER TABLE CT_MRDS.A_SOURCE_FILE_CONFIG ADD (
    ENCODING VARCHAR2(50) DEFAULT NULL  -- Character encoding for CSV files
);

2. Package Version Updates

Package Before After Changes
CT_MRDS.FILE_MANAGER v3.2.0 v3.2.1 Added pEncoding parameter
ODS.FILE_MANAGER_ODS v2.0.0 v2.1.0 Added encoding wrapper support

3. Enhanced Procedures

  • FILE_MANAGER.ADD_SOURCE_FILE_CONFIG - Added pEncoding parameter
  • FILE_MANAGER.CREATE_EXTERNAL_TABLE - Added encoding support
  • FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE - Added encoding delegation

4. Dynamic Spool Logging

-- Automatic log file generation
'INSTALL_MARS_1049_[PDB_NAME]_YYYYMMDD_HH24MISS.log'
'ROLLBACK_MARS_1049_[PDB_NAME]_YYYYMMDD_HH24MISS.log'

5. Version Tracking System

  • 04_MARS_1049_track_CT_MRDS_FILE_MANAGER_version.sql - Installation tracking
  • 92_MARS_1049_track_rollback_version.sql - Rollback tracking
  • Complete audit trail via CT_MRDS.ENV_MANAGER.TRACK_PACKAGE_VERSION

🌍 Supported Character Encodings

Encoding Description Use Case Example
UTF8 / UTF-8 Unicode UTF-8 Modern systems, international Global applications
WE8MSWIN1252 Windows-1252 Western European, Windows Legacy Windows systems
EE8ISO8859P2 ISO-8859-2 Central European Polish, Czech, Hungarian
CL8MSWIN1251 Windows-1251 Cyrillic Russian, Bulgarian
AL32UTF8 Unicode UTF-8 (32-bit) Full Unicode support Enterprise systems
JA16SJIS Shift JIS Japanese Japanese systems
ZHS16GBK GBK Chinese Simplified Chinese systems

🚀 Installation & Deployment

Quick Installation

-- Single command installation with automatic logging and version tracking
@@install_mars1049.sql
-- Creates: INSTALL_MARS_1049_[PDB]_[TIMESTAMP].log
-- Includes: 7 steps with version tracking in ENV_MANAGER

Quick Rollback

-- Single command rollback with automatic logging and version tracking
@@rollback_mars1049.sql  
-- Creates: ROLLBACK_MARS_1049_[PDB]_[TIMESTAMP].log
-- Includes: 4 steps with complete restoration and tracking

Manual Step-by-Step Installation

-- Run in sequence with appropriate user privileges:
@@01_MARS_1049_install_CT_MRDS_ADD_ENCODING_COLUMN.sql     -- CT_MRDS user
@@new_version/FILE_MANAGER_SPEC.sql                        -- CT_MRDS user  
@@new_version/FILE_MANAGER_BODY.sql                        -- CT_MRDS user
@@new_version/FILE_MANAGER_ODS_SPEC.sql                    -- ODS user
@@new_version/FILE_MANAGER_ODS_BODY.sql                    -- ODS user

Verification

-- Comprehensive functionality testing
@@test/05_MARS_1049_verify_encoding_functionality.sql

Rollback (if needed)

-- Complete rollback to pre-MARS-1049 state
@@rollback_mars1049.sql

💡 Usage Examples

1. Basic Configuration with Encoding

-- Add source system with UTF-8 support
CALL CT_MRDS.FILE_MANAGER.ADD_SOURCE(
    pSourceKey => 'INTL_SYS', 
    pSourceName => 'International Data System'
);

-- Configure file processing with encoding
CALL CT_MRDS.FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
    pSourceKey => 'INTL_SYS',
    pSourceFileType => 'INPUT',
    pSourceFileId => 'CUSTOMER_DATA', 
    pSourceFileDesc => 'Customer data with international characters',
    pSourceFileNamePattern => 'customers_*.csv',
    pTableId => 'CUSTOMERS',
    pTemplateTableName => 'CT_ET_TEMPLATES.CUSTOMERS',
    pEncoding => 'UTF-8'  -- 🆕 NEW: Encoding specification
);

2. External Table Creation with Encoding

-- Create external table with UTF-8 encoding
BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        pTableName => 'CUSTOMERS_INBOX',
        pTemplateTableName => 'CT_ET_TEMPLATES.CUSTOMERS',
        pPrefix => 'INBOX/INTL_SYS/CUSTOMER_DATA/CUSTOMERS',
        pBucketUri => CT_MRDS.ENV_MANAGER.gvInboxBucketUri,
        pEncoding => 'UTF-8'  -- 🆕 NEW: Character set specification
    );
END;
/

3. Backward Compatibility (No Changes Required)

-- Existing code continues working unchanged
CALL CT_MRDS.FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
    pSourceKey => 'LEGACY_SOURCE',
    pSourceFileType => 'INPUT', 
    pSourceFileId => 'LEGACY_DATA',
    pSourceFileDesc => 'Legacy data files',
    pSourceFileNamePattern => 'data_*.csv',
    pTableId => 'LEGACY_TABLE',
    pTemplateTableName => 'CT_ET_TEMPLATES.LEGACY'
    -- No pEncoding parameter - uses default behavior
);

4. File Processing with Automatic Encoding

-- Process file using encoding from configuration
EXEC CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(
    'INBOX/INTL_SYS/CUSTOMER_DATA/CUSTOMERS/customers_20251124.csv'
);
-- Encoding automatically applied from A_SOURCE_FILE_CONFIG.ENCODING

5. Real Data Testing (CSDB Example)

-- Tested with real CSDB data file containing international characters
-- File: temp_upload.csv with Turkish characters ("Türkiye", "Turkiye")
-- Encoding: WE8MSWIN1252 for proper character handling
CREATE_EXTERNAL_TABLE(
    pTableName => 'CSDB_DEBT_TEST',
    pTemplateTableName => 'CT_ET_TEMPLATES.CSDB_DEBT',
    pPrefix => 'DATA/CSDB/DEBT',
    pBucketUri => '...',
    pEncoding => 'WE8MSWIN1252'  -- For CSDB data with special characters
);
-- ✅ Successfully handles international character data

⚙️ Technical Implementation Details

JSON Format Generation

-- IMPLEMENTATION: Conditional JSON_OBJECT construction
IF pEncoding IS NOT NULL AND LENGTH(TRIM(pEncoding)) > 0 THEN
   vFormatJson := JSON_OBJECT(
      'type' VALUE 'csv',
      'delimiter' VALUE pDelimiter,
      'characterset' VALUE pEncoding  -- 🆕 Character set added
   );
ELSE
   vFormatJson := JSON_OBJECT(
      'type' VALUE 'csv', 
      'delimiter' VALUE pDelimiter
      -- No characterset for backward compatibility
   );
END IF;

External Table Result

With Encoding:

FORMAT JSON ('{"type":"csv","delimiter":",","characterset":"UTF-8"}')

Without Encoding (backward compatible):

FORMAT JSON ('{"type":"csv","delimiter":","}')

Oracle 23c Compatibility

  • Issue Solved: Replaced non-available JSON_MERGEPATCH with JSON_OBJECT
  • Result: Full compatibility with Oracle 23.26.0.1.0
  • Performance: Optimized JSON generation for better performance

Comprehensive Testing Results

Database Structure Tests

-- ✅ PASSED: ENCODING column added successfully
DESC CT_MRDS.A_SOURCE_FILE_CONFIG;
-- Shows: ENCODING VARCHAR2(50) column

-- ✅ PASSED: Existing data preserved
SELECT COUNT(*) FROM CT_MRDS.A_SOURCE_FILE_CONFIG;
-- All existing rows maintained

Package Compilation Tests

-- ✅ PASSED: All packages compile without errors
SELECT * FROM USER_ERRORS WHERE NAME LIKE 'FILE_MANAGER%';
-- No compilation errors

-- ✅ PASSED: Version verification
SELECT CT_MRDS.FILE_MANAGER.GET_VERSION() FROM DUAL;
-- Returns: 3.2.1

SELECT ODS.FILE_MANAGER_ODS.GET_VERSION() FROM DUAL;
-- Returns: 2.1.0

Encoding Functionality Tests

-- ✅ PASSED: UTF-8 encoding test
CREATE_EXTERNAL_TABLE(..., pEncoding => 'UTF-8');
-- External table contains: CHARACTERSET UTF-8

-- ✅ PASSED: Windows-1252 encoding test  
CREATE_EXTERNAL_TABLE(..., pEncoding => 'WE8MSWIN1252');
-- External table contains: CHARACTERSET WE8MSWIN1252

-- ✅ PASSED: Backward compatibility test
CREATE_EXTERNAL_TABLE(...);  -- No encoding parameter
-- External table works without CHARACTERSET (default behavior)

Integration Tests

-- ✅ PASSED: Configuration with encoding
ADD_SOURCE_FILE_CONFIG(..., pEncoding => 'UTF-8');
-- ENCODING column populated: 'UTF-8'

-- ✅ PASSED: Wrapper package delegation
ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(..., pEncoding => 'UTF-8');
-- Properly delegates to CT_MRDS.FILE_MANAGER

Production Testing Results (2025-11-25)

-- ✅ PASSED: Parameter acceptance validation
-- Both CREATE_EXTERNAL_TABLE functions accept pEncoding parameter without errors

-- ✅ PASSED: Multiple encoding formats tested
CREATE_EXTERNAL_TABLE(..., pEncoding => 'UTF-8');        -- Success
CREATE_EXTERNAL_TABLE(..., pEncoding => 'WE8MSWIN1252'); -- Success  
CREATE_EXTERNAL_TABLE(..., pEncoding => 'ISO-8859-1');   -- Success

-- ✅ PASSED: External table generation with encoding
-- Tables created with proper CHARACTERSET parameters in access_parameters
-- Example: FORMAT JSON ('{"type":"csv","delimiter":",","characterset":"UTF-8"}')

-- ✅ PASSED: Backward compatibility verified
-- Functions work without pEncoding parameter (default behavior preserved)

-- ✅ PASSED: Real data testing with international characters
-- File: temp_upload.csv with Turkish characters ("Ürkiye", "Turkiye")
-- Result: 4 rows successfully processed with WE8MSWIN1252 encoding

Final Production Validation (2025-11-25)

-- ✅ PASSED: Complete install/rollback cycle testing
-- ROLLBACK TEST: All packages restored to v3.2.0/v2.0.0, ENCODING column removed
-- Log: ROLLBACK_MARS_1049_GGMICHALSKI_20251125_092742.log

-- INSTALL TEST: All packages deployed to v3.2.1/v2.1.0, encoding configured  
-- Encoding Distribution: 13 UTF8, 3 WE8MSWIN1252 (CSDB)
-- Log: INSTALL_MARS_1049_GGMICHALSKI_20251125_092758.log

-- ✅ PASSED: Version tracking validation
-- Both install and rollback properly tracked in ENV_MANAGER.TRACK_PACKAGE_VERSION
-- Complete audit trail maintained for compliance

-- ✅ PASSED: Dynamic spool logging
-- Automatic unique log file generation with PDB name and timestamp
-- Complete installation/rollback output captured for troubleshooting

🔄 Rollback Capability

Complete rollback capability available if needed:

Rollback Process

-- Execute complete rollback
@@rollback_mars1049.sql

What Rollback Does

  1. Package Restoration: Restores packages from current_version/ folder
    • CT_MRDS.FILE_MANAGER → v3.2.0 (without pEncoding)
    • ODS.FILE_MANAGER_ODS → v2.0.0 (without pEncoding)
  2. Database Cleanup: Removes ENCODING column from A_SOURCE_FILE_CONFIG
  3. Version Tracking: Records rollback in ENV_MANAGER tracking system
  4. Audit Logging: Creates timestamped log file for compliance
  5. Verification: Confirms system restored to pre-MARS-1049 state

Rollback Safety

  • Data Preservation: All existing configuration data preserved
  • Zero Downtime: Rollback can be performed without system downtime
  • Complete Restoration: System returned to exact pre-MARS-1049 state

📊 Impact Assessment

Benefits Delivered

  • Enhanced Data Integrity: Proper handling of international character sets
  • System Flexibility: Support for multiple encoding standards as business needs
  • Zero Breaking Changes: All existing integrations continue working unchanged
  • Future-Proof: Foundation for handling diverse international data sources

Risk Mitigation

  • Backward Compatibility: 100% maintained - no existing code changes required
  • Gradual Adoption: Teams can adopt encoding parameters when needed
  • Complete Testing: Comprehensive validation ensures reliability
  • Rollback Available: Full rollback capability provides safety net

Production Readiness

  • Deployment Tested: Complete installation verified
  • Error Handling: Robust error handling and logging maintained
  • Documentation Complete: Full usage documentation provided
  • Support Ready: Clear troubleshooting and support procedures

Enterprise Features

  • Dynamic Spool Logging: Automatic timestamped log generation for audit compliance
  • Version Tracking: Complete audit trail via ENV_MANAGER.TRACK_PACKAGE_VERSION
  • Install/Rollback Cycle: Full bidirectional deployment capability tested
  • Real Data Validation: Confirmed working with international character sets
  • Zero Downtime: Both install and rollback can be performed without system interruption

🛠️ Troubleshooting & Support

Common Verification Commands

-- Check ENCODING column exists
DESC CT_MRDS.A_SOURCE_FILE_CONFIG;

-- Verify package versions
SELECT CT_MRDS.FILE_MANAGER.GET_VERSION() FROM DUAL;  -- Should return: 3.2.1
SELECT ODS.FILE_MANAGER_ODS.GET_VERSION() FROM DUAL;  -- Should return: 2.1.0

-- Check for compilation errors
SELECT * FROM USER_ERRORS WHERE NAME LIKE 'FILE_MANAGER%';

-- Test basic encoding functionality
BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        'TEST_ENCODING_TABLE',
        'CT_ET_TEMPLATES.SAMPLE_TEMPLATE',
        'test/encoding/path',
        CT_MRDS.ENV_MANAGER.gvInboxBucketUri,
        NULL, ',', 'UTF-8'
    );
END;
/

Error Resolution

  • Compilation Errors: Check package dependencies and privileges
  • Encoding Errors: Verify encoding name against Oracle supported character sets
  • External Table Issues: Check JSON format generation and DBMS_CLOUD access

📞 Implementation Team & Support

Lead Developer: Grzegorz Michalski
Implementation Date: November 24, 2025
Production Testing: November 25, 2025
Review Status: Comprehensive validation and production testing completed
Production Ready: Fully tested and deployment ready

Documentation Version: 2.0.0 (Consolidated)
Last Updated: November 25, 2025


🎉 Implementation Success Summary

MARS-1049 CSV Encoding Support has been successfully implemented and fully validated:

  • Database Structure: ENCODING column added to A_SOURCE_FILE_CONFIG
  • Package Updates: Both FILE_MANAGER and FILE_MANAGER_ODS updated with encoding support
  • Backward Compatibility: 100% maintained - no breaking changes
  • Testing: Comprehensive validation completed for all scenarios
  • Real Data Testing: Confirmed with CSDB data containing Turkish characters
  • Install/Rollback Cycle: Complete bidirectional deployment tested and validated
  • Documentation: Complete usage and deployment documentation provided
  • Enterprise Logging: Dynamic spool and version tracking implemented
  • Rollback: Full rollback capability available and tested
  • Production Ready: System ready for immediate production deployment

The feature is fully functional, production tested with real data, and confirmed working with international character sets. Complete install/rollback cycle validated. Ready for immediate production deployment.

Production Testing Confirmation (2025-11-25)

  • Parameter Integration: pEncoding parameter successfully integrated and functioning
  • Real Data Testing: Tested with CSDB data containing international characters (Turkish: Türkiye)
  • Multiple Encodings: UTF-8, WE8MSWIN1252, and ISO-8859-1 all working correctly
  • External Table Generation: Proper CHARACTERSET parameters generated in external table definitions
  • Backward Compatibility: 100% confirmed - existing code works unchanged
  • Zero Errors: No compilation errors, no runtime errors during testing
  • Install/Rollback Cycle: Complete bidirectional testing validated
  • Dynamic Logging: Automatic spool generation confirmed working (logs: *_20251125_092742.log, *_20251125_092758.log)
  • Version Tracking: ENV_MANAGER.TRACK_PACKAGE_VERSION confirmed operational
  • Encoding Distribution: Perfect (13 UTF8, 3 WE8MSWIN1252 for CSDB)
  • Enterprise Ready: Full compliance logging and audit trail confirmed