mars/confluence/PROCESS_SOURCE_FILE_Guide.md
Grzegorz Michalski ecd833f682 Init
2026-02-02 10:59:29 +01:00

PROCESS_SOURCE_FILE Procedure Guide

This document provides comprehensive documentation for the FILE_MANAGER.PROCESS_SOURCE_FILE procedure, which validates incoming files and prepares them for loading on the Airflow+DBT side through Oracle Cloud Infrastructure (OCI) file management operations.

Overview

PROCESS_SOURCE_FILE is an umbrella procedure that orchestrates the complete intake workflow: it registers and validates incoming files, then stages them in OCI object storage so they are ready for consumption by the Airflow+DBT data processing stack.

Key Characteristics:

  • File Validation Focus: Comprehensive validation of incoming CSV files against template structures
  • Airflow+DBT Preparation: Prepares validated files for loading and processing by Airflow+DBT pipelines
  • OCI File Management: Handles file operations and movements within Oracle Cloud Infrastructure
  • Umbrella Procedure: Coordinates multiple validation and file preparation sub-procedures in sequence
  • Automated Workflow: Requires minimal manual intervention once configured
  • Error Resilient: Comprehensive error handling and logging for validation and file operations
  • Status Tracking: Updates file processing status throughout validation and preparation workflow

Procedure Signatures

The procedure is available in two variants:

Procedure Version

PROCEDURE PROCESS_SOURCE_FILE(pSourceFileReceivedName IN VARCHAR2);

Purpose: Execute processing workflow without return value
Use Case: Standard automated processing, fire-and-forget scenarios

Function Version

FUNCTION PROCESS_SOURCE_FILE(pSourceFileReceivedName IN VARCHAR2) RETURN PLS_INTEGER;

Purpose: Execute processing workflow and return status code
Use Case: When you need to check processing result programmatically
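
For example, the function version can be used to branch on the outcome (a minimal sketch; the status codes are those listed under Return Values below):

```sql
DECLARE
    vStatus PLS_INTEGER;
BEGIN
    vStatus := CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(
        pSourceFileReceivedName => 'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/data_file.csv'
    );

    IF vStatus = 0 THEN
        DBMS_OUTPUT.PUT_LINE('File processed successfully');
    ELSE
        DBMS_OUTPUT.PUT_LINE('Processing failed with status: ' || vStatus);
    END IF;
END;
/
```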

Parameters

pSourceFileReceivedName

  • Type: VARCHAR2
  • Required: YES
  • Description: Relative path to the file within the cloud storage bucket
  • Format: INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_ID}/filename.csv

Examples:

'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/UC_NMA_DISSEM-277740.csv'
'INBOX/TOP/ALLOTMENT/AGGREGATED_ALLOTMENT/allotment_data_20241006.csv'
'INBOX/LM/RATES/INTEREST_RATES/rates_monthly_202410.csv'
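
As an illustration of the expected format, the path segments ({SOURCE}, {SOURCE_FILE_ID}, {TABLE_ID}) can be pulled apart with standard string functions (a sketch for clarity, not part of the package):

```sql
-- Split an example path into its segments (positions per the format above)
SELECT
    REGEXP_SUBSTR(p, '[^/]+', 1, 2) AS source_id,        -- C2D
    REGEXP_SUBSTR(p, '[^/]+', 1, 3) AS source_file_id,   -- UC_DISSEM
    REGEXP_SUBSTR(p, '[^/]+', 1, 4) AS table_id,         -- A_UC_DISSEM_METADATA_LOADS
    REGEXP_SUBSTR(p, '[^/]+$')      AS file_name         -- UC_NMA_DISSEM-277740.csv
FROM (SELECT 'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/UC_NMA_DISSEM-277740.csv' AS p
      FROM DUAL);
```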

Processing Workflow

The procedure executes six main steps in sequence:

Step 1: REGISTER_SOURCE_FILE_RECEIVED

Purpose: Register file in the system and extract metadata

Actions:

  • Creates record in CT_MRDS.A_SOURCE_FILE_RECEIVED table
  • Determines source configuration based on file path pattern
  • Extracts file metadata (size, checksum, creation date)
  • Assigns unique A_SOURCE_FILE_RECEIVED_KEY
  • Sets initial status to 'RECEIVED'
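
After this step, the new record can be inspected directly (column names as used in the monitoring queries later in this guide):

```sql
-- Inspect the record created by REGISTER_SOURCE_FILE_RECEIVED
SELECT A_SOURCE_FILE_RECEIVED_KEY, SOURCE_FILE_NAME, PROCESSING_STATUS, RECEPTION_DATE
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE SOURCE_FILE_NAME = 'your_file.csv';
```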

Step 2: CREATE_EXTERNAL_TABLE

Purpose: Create temporary external table for data access

Actions:

  • Generates unique external table name
  • Creates external table pointing to the CSV file
  • Uses template table structure from CT_ET_TEMPLATES
  • Configures appropriate column mappings and data types

Step 3: VALIDATE_SOURCE_FILE_RECEIVED

Purpose: Perform comprehensive data validation

Actions:

  • Validates CSV column count against template
  • Checks data type compatibility
  • Verifies required fields are populated
  • Performs business rule validations
  • Updates status to 'VALIDATED' on success

Step 4: DROP_EXTERNAL_TABLE

Purpose: Clean up temporary external table

Actions:

  • Drops the temporary external table created in Step 2
  • Releases database resources
  • Maintains clean schema state

Step 5: MOVE_FILE

Purpose: Relocate file from INBOX to ODS location

Actions:

  • Copies file from INBOX bucket to ODS bucket
  • Preserves file metadata
  • Deletes original file from INBOX after successful copy

Step 6: SET_SOURCE_FILE_RECEIVED_STATUS

Purpose: Update final processing status

Actions:

  • Sets PROCESSING_STATUS to 'READY_FOR_INGESTION'
  • Records completion timestamp
  • Indicates file is validated and ready for Airflow+DBT processing
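
Conceptually, the umbrella procedure chains the six steps roughly as follows (a schematic sketch only; the internal parameter names shown here are assumptions, not the actual signatures):

```sql
-- Schematic only: vReceivedKey and the parameter lists are illustrative
BEGIN
    REGISTER_SOURCE_FILE_RECEIVED(pSourceFileReceivedName);               -- Step 1
    CREATE_EXTERNAL_TABLE(vReceivedKey);                                  -- Step 2
    VALIDATE_SOURCE_FILE_RECEIVED(vReceivedKey);                          -- Step 3
    DROP_EXTERNAL_TABLE(vReceivedKey);                                    -- Step 4
    MOVE_FILE(vReceivedKey);                                              -- Step 5
    SET_SOURCE_FILE_RECEIVED_STATUS(vReceivedKey, 'READY_FOR_INGESTION'); -- Step 6
END;
```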

Return Values (Function Version)

| Value | Meaning | Description |
|-------|---------|-------------|
| 0 | Success | File processed successfully through all steps |
| -20001 | Empty Parameters | Both fileUri and receivedKey parameters are NULL |
| -20002 | No Config Match | No configuration matches the file pattern |
| -20011 | Column Mismatch | CSV has a different column count than the template |
| -20021 | Processing Error | General processing failure |
| Other negative | Various Errors | Specific error codes for different failure scenarios |

Usage Examples

Basic Processing

-- Simple processing (procedure version)
BEGIN
    CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(
        pSourceFileReceivedName => 'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/data_file.csv'
    );
END;
/
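
Several files can be processed in a single block by looping over their paths (a sketch; the list of paths is illustrative):

```sql
-- Process a batch of files with the procedure version
BEGIN
    FOR r IN (
        SELECT COLUMN_VALUE AS file_path
        FROM TABLE(SYS.ODCIVARCHAR2LIST(
            'INBOX/TOP/ALLOTMENT/AGGREGATED_ALLOTMENT/allotment_data_20241006.csv',
            'INBOX/LM/RATES/INTEREST_RATES/rates_monthly_202410.csv'
        ))
    ) LOOP
        CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(pSourceFileReceivedName => r.file_path);
    END LOOP;
END;
/
```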

Prerequisites

Before using PROCESS_SOURCE_FILE, ensure proper system configuration is in place. For detailed setup instructions including source system registration, file type configuration, template table creation, and date format configuration, see the FILE_MANAGER Configuration Guide.

Monitoring and Troubleshooting

Monitoring File Processing Status

-- Check recent file processing activity
SELECT 
    SOURCE_FILE_NAME,
    PROCESSING_STATUS,
    RECEPTION_DATE,
    EXTERNAL_TABLE_NAME
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE RECEPTION_DATE >= SYSDATE - 1  -- Last 24 hours
ORDER BY RECEPTION_DATE DESC;

Processing Status Values

| Status | Description | Workflow Stage |
|--------|-------------|----------------|
| RECEIVED | File registered, processing starting | Initial registration |
| VALIDATED | File validation completed successfully | After successful validation |
| READY_FOR_INGESTION | File validated and prepared for Airflow+DBT processing | After successful validation and preparation |
| INGESTED | Data has been consumed/ingested by target system | After data consumption |
| ARCHIVED | Data exported to PARQUET format and file moved to archival storage | Final archival state (using FILE_ARCHIVER) |
| VALIDATION_FAILED | File validation failed | After failed validation |
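
A quick way to see how recent files are distributed across these statuses:

```sql
-- Count files per processing status over the last week
SELECT PROCESSING_STATUS, COUNT(*) AS file_count
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE RECEPTION_DATE >= SYSDATE - 7
GROUP BY PROCESSING_STATUS
ORDER BY file_count DESC;
```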

Detailed Processing Logs

-- View detailed processing logs
SELECT 
    LOG_TIMESTAMP,
    PROCEDURE_NAME,
    LOG_LEVEL,
    LOG_MESSAGE,
    PROCEDURE_PARAMETERS
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCEDURE_NAME IN ('PROCESS_SOURCE_FILE', 'REGISTER_SOURCE_FILE_RECEIVED', 
                         'CREATE_EXTERNAL_TABLE', 'VALIDATE_SOURCE_FILE_RECEIVED')
AND LOG_TIMESTAMP >= SYSDATE - 1
ORDER BY LOG_TIMESTAMP DESC;

Common Error Scenarios and Solutions

Error -20002: No Configuration Match

Problem: File path doesn't match any configured pattern

-- Check configured patterns
SELECT 
    s.A_SOURCE_KEY,
    sfc.SOURCE_FILE_ID,
    sfc.SOURCE_FILE_NAME_PATTERN,
    sfc.TABLE_ID
FROM CT_MRDS.A_SOURCE_FILE_CONFIG sfc
JOIN CT_MRDS.A_SOURCE s ON s.A_SOURCE_KEY = sfc.A_SOURCE_KEY
ORDER BY s.A_SOURCE_KEY, sfc.SOURCE_FILE_ID;

Solution: Add missing configuration or correct file naming

Error -20011: Column Count Mismatch

Problem: CSV file has different number of columns than template table

-- Check template table structure
SELECT column_name, data_type, column_id
FROM user_tab_columns 
WHERE table_name = 'YOUR_TEMPLATE_TABLE'
ORDER BY column_id;

-- Analyze validation errors (file_key is the A_SOURCE_FILE_RECEIVED_KEY of the file)
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(file_key) FROM DUAL;

Solutions:

  1. Fix CSV file column count
  2. Add missing columns to template table
  3. Remove excess columns from CSV

File Not Found Errors

Problem: File doesn't exist in expected cloud storage location

-- List files in bucket location
-- (DBMS_CLOUD.LIST_OBJECTS takes only credential_name and location_uri;
--  filter by prefix in the WHERE clause)
SELECT object_name 
FROM DBMS_CLOUD.LIST_OBJECTS(
    credential_name => 'DEF_CRED_ARN',
    location_uri => 'https://your-bucket-uri/'
) 
WHERE object_name LIKE 'INBOX/C2D/UC_DISSEM/%'
AND ROWNUM <= 20;

Solutions:

  1. Verify file upload to correct location
  2. Check file naming matches expected pattern
  3. Verify cloud storage credentials and permissions

Enhanced Error Monitoring and Logging

Error Log Monitoring

The FILE_MANAGER system provides comprehensive error logging for troubleshooting:

-- View recent processing errors
SELECT LOG_TIMESTAMP, LOG_LEVEL, LOG_MESSAGE, PROCEDURE_NAME
FROM CT_MRDS.A_PROCESS_LOG 
WHERE LOG_LEVEL = 'ERROR'
AND LOG_TIMESTAMP >= SYSDATE - 1  -- Last 24 hours
ORDER BY LOG_TIMESTAMP DESC;

-- View validation-specific errors
SELECT LOG_TIMESTAMP, LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG 
WHERE LOG_MESSAGE LIKE '%EXCESS COLUMNS%'
OR LOG_MESSAGE LIKE '%VALIDATION%'
ORDER BY LOG_TIMESTAMP DESC;

-- Analyze errors for specific file
SELECT sfl.SOURCE_FILE_NAME, pl.LOG_MESSAGE, pl.LOG_TIMESTAMP
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED sfl
JOIN CT_MRDS.A_PROCESS_LOG pl ON pl.LOG_MESSAGE LIKE '%' || sfl.SOURCE_FILE_NAME || '%'
WHERE sfl.SOURCE_FILE_NAME = 'your_file.csv'
AND pl.LOG_LEVEL = 'ERROR';

File Validation and Error Handling

The FILE_MANAGER system includes comprehensive validation features for CSV files during processing:

Pre-Processing Validation

  • Column Count Verification: Automatically checks if CSV files match template table structure
  • Error Prevention: Validates files before creating external tables to prevent processing failures
  • Detailed Error Messages: Provides specific guidance when validation fails

Common Validation Scenarios

Scenario 1: Excess Columns (Error -20011)

EXCESS COLUMNS DETECTED!
CSV file has 8 columns but template expects only 5
Excess columns: 3

Solutions:

  1. Remove excess columns from CSV file
  2. Add missing columns to template table:
    ALTER TABLE CT_ET_TEMPLATES.{SOURCE}_{TABLE_NAME} 
    ADD (NEW_COLUMN1 VARCHAR2(100), NEW_COLUMN2 NUMBER);
    

Error Analysis for File Validation

-- Find file key for analysis
SELECT A_SOURCE_FILE_RECEIVED_KEY 
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED 
WHERE SOURCE_FILE_NAME = 'your_file.csv';

-- Analyze validation errors using wrapper function
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(file_key) FROM DUAL;

-- Example with specific key:
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(63) FROM DUAL;

Validation Error Monitoring

-- View recent validation errors
SELECT LOG_TIMESTAMP, LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG 
WHERE LOG_LEVEL = 'ERROR'
AND (LOG_MESSAGE LIKE '%EXCESS COLUMNS%' OR LOG_MESSAGE LIKE '%VALIDATION%')
ORDER BY LOG_TIMESTAMP DESC;

Common Error Patterns and Solutions

| Error Code | Pattern | Solution |
|------------|---------|----------|
| ORA-20011 | EXCESS COLUMNS DETECTED | Remove excess columns from CSV or add missing columns to template table |
| ORA-20002 | No match for source file | Configure file pattern in A_SOURCE_FILE_CONFIG |
| ORA-29913 | External table open error | Check bucket paths and file existence |
| ORA-01821 | Date format not recognized | Update date format in ADD_COLUMN_DATE_FORMAT |

Proactive Monitoring Setup

Set up monitoring for critical error patterns:

-- Create monitoring view for critical errors
CREATE OR REPLACE VIEW V_CRITICAL_ERRORS AS
SELECT 
  LOG_TIMESTAMP,
  PROCEDURE_NAME,
  CASE 
    WHEN LOG_MESSAGE LIKE '%ORA-20011%' THEN 'COLUMN_MISMATCH'
    WHEN LOG_MESSAGE LIKE '%ORA-20002%' THEN 'CONFIG_MISSING'
    WHEN LOG_MESSAGE LIKE '%ORA-29913%' THEN 'FILE_ACCESS'
    ELSE 'OTHER_ERROR'
  END as ERROR_CATEGORY,
  LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG
WHERE LOG_LEVEL = 'ERROR'
AND LOG_TIMESTAMP >= SYSDATE - 7;  -- Last week
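
The view can then be summarized per category, for example:

```sql
-- Daily error counts per category from the monitoring view
SELECT TRUNC(LOG_TIMESTAMP) AS error_day, ERROR_CATEGORY, COUNT(*) AS error_count
FROM V_CRITICAL_ERRORS
GROUP BY TRUNC(LOG_TIMESTAMP), ERROR_CATEGORY
ORDER BY error_day DESC, error_count DESC;
```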

This enhanced monitoring helps identify and resolve issues quickly, ensuring smooth file processing operations.

Best Practices

File Naming Conventions

  • Use consistent naming patterns that match SOURCE_FILE_NAME_PATTERN
  • Avoid special characters that might cause parsing issues
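
Candidate file names can be checked against the configured patterns before upload (a sketch; it assumes SOURCE_FILE_NAME_PATTERN holds an Oracle regular expression, which may differ from the actual pattern semantics used by the package):

```sql
-- Check whether a candidate file name matches any configured pattern
SELECT sfc.SOURCE_FILE_ID, sfc.SOURCE_FILE_NAME_PATTERN
FROM CT_MRDS.A_SOURCE_FILE_CONFIG sfc
WHERE REGEXP_LIKE('UC_NMA_DISSEM-277740.csv', sfc.SOURCE_FILE_NAME_PATTERN);
```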

Related Procedures

The following procedures are called internally by PROCESS_SOURCE_FILE:

  • REGISTER_SOURCE_FILE_RECEIVED: File registration and metadata extraction
  • CREATE_EXTERNAL_TABLE: External table creation for data access
  • VALIDATE_SOURCE_FILE_RECEIVED: Data validation and structure checking
  • DROP_EXTERNAL_TABLE: Cleanup of temporary external tables
  • MOVE_FILE: File relocation between buckets
  • SET_SOURCE_FILE_RECEIVED_STATUS: Status management

For detailed information about individual procedures, refer to the package documentation.

Summary

PROCESS_SOURCE_FILE is the cornerstone of the FILE_MANAGER system: it provides a complete automated workflow for validating incoming files and preparing them for Airflow+DBT processing pipelines. Its umbrella architecture keeps validation and preparation consistent, while comprehensive error handling and logging provide the visibility and reliability needed for enterprise file processing that feeds downstream Airflow+DBT data workflows.