Transform Configuration Guide

pg-translicator requires a transforms.yml file mounted at /app/config/transforms.yml. This file defines table and column transformations.

Key Features

  1. Deterministic Transformations: The same input always produces the same output, ensuring data consistency
  2. Type-Safe: Transforms are validated against column data types
  3. Selective Processing: Only specified tables/columns are transformed
  4. Referential Integrity: Consistent transformations preserve relationships
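One way to picture the deterministic behaviour listed above is a generator seeded from a hash of the original value. The sketch below is an illustration only, not pg-translicator's implementation: the function fakeName and the tiny name pool are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// sampleNames is a tiny illustrative pool (hypothetical data).
var sampleNames = []string{"Alice Moreno", "Ben Okafor", "Chloe Tanaka", "Dmitri Novak"}

// fakeName seeds a PRNG with a hash of the original value, so equal
// inputs always select the same replacement.
func fakeName(original string) string {
	h := fnv.New64a()
	h.Write([]byte(original))
	rng := rand.New(rand.NewSource(int64(h.Sum64())))
	return sampleNames[rng.Intn(len(sampleNames))]
}

func main() {
	// The same input yields the same fake value on every call.
	fmt.Println(fakeName("Jane Doe") == fakeName("Jane Doe")) // true
}
```

Because the output depends only on the input, the same email or name appearing in two tables maps to the same fake value, which is what keeps joins between them intact.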

transforms.yml File Format

The transforms.yml file controls how data is transformed during replication. It uses YAML format with the following structure:

Basic Structure:

    version: v1
    tables:
      schema.table_name:
        column_name: TransformationType
        another_column: AnotherTransformationType

Simple vs Object Notation:

You can use either simple string format or object notation:

    version: v1
    tables:
      public.users:
        # Simple string transforms (shorthand format)
        name: FakeName
        email: FakeEmail

        # Object notation (equivalent to above)
        company:
          type: FakeCompany

        # Regex transforms require object notation
        phone:
          type: Regex
          pattern: '\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
          replacement: '(XXX) XXX-XXXX'

Available Transform Types

Personal Information (Gofakeit-based):

  • FakeName - Full name generation
  • FakeFirstName, FakeLastName - Individual name components
  • FakeEmail - Email address generation
  • FakePhone - Phone number generation
  • FakeSSN - Social Security Number (XXX-XX-XXXX format)
  • FakeDateOfBirth - Date of birth (YYYY-MM-DD format)
  • FakeUsername, FakePassword - Account credentials

Address Information (Gofakeit-based):

  • FakeStreetAddress - Full street address
  • FakeCity, FakeState, FakeStateAbbr - Location components
  • FakeZip - ZIP codes (XXXXX or XXXXX-XXXX format)
  • FakeCountry - Country names
  • FakeLatitude, FakeLongitude - Geographic coordinates

Business Information (Gofakeit-based):

  • FakeCompany - Company names
  • FakeJobTitle - Job/position titles
  • FakeIndustry - Industry names
  • FakeProduct, FakeProductName - Product information

Text and Content (Gofakeit-based):

  • FakeParagraph, FakeSentence, FakeWord - Text generation
  • FakeCharacters, FakeDigits - String generation

Financial Information (Gofakeit-based):

  • FakeCreditCardType, FakeCreditCardNum - Financial data
  • FakeCurrency - Currency codes

Date and Time (Gofakeit-based):

  • FakeMonth, FakeMonthNum, FakeWeekDay, FakeYear - Date/time components

Custom Transforms:

  • Bool - Boolean values (deterministic custom implementation)

Pattern-Based Transforms:

  • Regex - Apply custom regular expression patterns and replacements
  • Template - Generate values using Go templates with full row context

Password Transforms:

  • PasswordBcrypt - Bcrypt password hashing with configurable cost
  • PasswordScrypt - Scrypt password hashing with configurable parameters
  • PasswordPBKDF2 - PBKDF2 password hashing with configurable iterations
  • PasswordArgon2id - Argon2id password hashing with configurable parameters

Regex Transform Details

The Regex transform allows custom pattern-based data transformation:

    column_name:
      type: Regex
      pattern: 'regex_pattern'
      replacement: 'replacement_string'

Features:

  • Uses Go’s RE2 regex syntax (safe subset of Perl regex)
  • Supports capture groups with $1, $2, etc. in replacements
  • No lookahead/lookbehind assertions
  • Linear time complexity guaranteed
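The properties above match Go's standard regexp package. The sketch below assumes the pattern and replacement are applied with something like ReplaceAllString; that is an assumption about the tool, and the helper names maskSSN and supportsPattern are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// maskSSN keeps the last four digits, mirroring a pattern/replacement
// pair like the SSN example on this page.
func maskSSN(s string) string {
	re := regexp.MustCompile(`(\d{3})-(\d{2})-(\d{4})`)
	return re.ReplaceAllString(s, "XXX-XX-$3")
}

// supportsPattern reports whether Go's regexp package accepts the pattern.
func supportsPattern(p string) bool {
	_, err := regexp.Compile(p)
	return err == nil
}

func main() {
	fmt.Println(maskSSN("SSN: 123-45-6789")) // SSN: XXX-XX-6789
	fmt.Println(supportsPattern(`\d+(?=-)`)) // false: lookahead is rejected

	// "$1x" is read as a group named "1x", which does not exist, so it
	// expands to nothing. "${1}x" means group 1 followed by a literal "x".
	re := regexp.MustCompile(`(\d{4})`)
	fmt.Println(re.ReplaceAllString("1234", "${1}x")) // 1234x
	fmt.Println(re.ReplaceAllString("1234", "$1x") == "") // true
}
```

If a replacement places letters, digits, or underscores directly after a group reference, the ${1} form avoids the ambiguity shown in the last two lines.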

Examples:

    # Phone number standardization
    phone:
      type: Regex
      pattern: '\+?1?\s*\(?(\d{3})\)?[-.\s]*(\d{3})[-.\s]*(\d{4})'
      replacement: '+1 (XXX) XXX-XXXX'

    # SSN partial masking (keep last 4 digits)
    ssn:
      type: Regex
      pattern: '(\d{3})-(\d{2})-(\d{4})'
      replacement: 'XXX-XX-$3'

    # IP address masking
    ip_address:
      type: Regex
      pattern: '\d+\.\d+\.\d+\.\d+'
      replacement: 'XXX.XXX.XXX.XXX'

    # Credit card partial masking (keep last 4 digits)
    card_number:
      type: Regex
      pattern: '(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})'
      replacement: 'XXXX-XXXX-XXXX-$4'

Template Transform Details

The Template transform allows generating values using Go’s text/template syntax with access to the full row context:

    column_name:
      type: Template
      template: 'template_string'

Features:

  • Access all columns in the row using {{.column_name}}
  • Built-in helper functions for common transformations
  • Support for conditional logic and complex business rules
  • Schema-agnostic - works with any table structure

Helper Functions:

  • lower - Convert to lowercase: {{.name | lower}}
  • upper - Convert to uppercase: {{.name | upper}}
  • slugify - Create URL-friendly slugs: {{.title | slugify}}
  • before - Extract text before separator: {{.email | before "@"}}
  • after - Extract text after separator: {{.email | after "@"}}
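To show how helpers like these plug into Go's text/template, here is a minimal sketch. The implementations of lower, upper, before, and after below are hypothetical stand-ins, not the tool's own code; in particular, what happens when the separator is missing is an assumption. slugify is left out because its exact rules are not described here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// helpers are hypothetical stand-ins for the documented functions.
// A piped value arrives as the LAST argument, so
// {{.email | before "@"}} calls before("@", email).
var helpers = template.FuncMap{
	"lower": strings.ToLower,
	"upper": strings.ToUpper,
	"before": func(sep, s string) string {
		head, _, _ := strings.Cut(s, sep) // whole string if sep is absent (assumption)
		return head
	},
	"after": func(sep, s string) string {
		_, tail, _ := strings.Cut(s, sep) // empty string if sep is absent (assumption)
		return tail
	},
}

// render executes one column template against a row.
func render(tmpl string, row map[string]any) (string, error) {
	t, err := template.New("col").Funcs(helpers).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, row); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	row := map[string]any{"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.org"}

	s, _ := render(`{{.first_name | lower}}.{{.last_name | lower}}@company.com`, row)
	fmt.Println(s) // ada.lovelace@company.com

	s, _ = render(`{{.email | after "@"}}`, row)
	fmt.Println(s) // example.org
}
```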

Examples:

    # Cross-column email generation
    email:
      type: Template
      template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'

    # Username from full name
    username:
      type: Template
      template: '{{.first_name | lower}}_{{.last_name | lower}}'

    # URL-friendly slug from title
    slug:
      type: Template
      template: '{{.title | lower | slugify}}'

    # Domain extraction
    domain:
      type: Template
      template: '{{.email | after "@"}}'

    # Templates using transformed values from other columns
    name: FakeName  # Generates fake name first
    email:
      type: Template
      template: '{{.name | lower | slugify}}@company.com'  # Uses fake name, not original

    # Conditional logic
    status_label:
      type: Template
      template: '{{if .active}}ACTIVE{{else}}INACTIVE{{end}}: {{.name}}'

    # Complex business logic
    display_name:
      type: Template
      template: '{{.first_name}} {{.last_name}} ({{.department | upper}})'

Template Processing Order: Template transforms are processed after all other transforms, allowing them to access the fake/transformed values instead of original data. This enables powerful cross-column transformations using already-processed data.
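The ordering described above can be pictured as a two-pass loop over each row. This is a sketch under assumptions rather than the tool's actual code: applyRow is a hypothetical function, and rendering every template against a snapshot of the first pass (so templates do not see each other's output) is a choice made for this example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// applyRow is a hypothetical two-pass row processor.
func applyRow(
	row map[string]string,
	plain map[string]func(string) string,
	templates map[string]string,
) (map[string]string, error) {
	// Pass 1: copy the row, then overwrite columns that have a
	// non-template transform.
	ctx := make(map[string]string, len(row))
	for col, val := range row {
		ctx[col] = val
	}
	for col, fn := range plain {
		ctx[col] = fn(row[col])
	}

	// Pass 2: render every template against the pass-1 result.
	out := make(map[string]string, len(ctx))
	for col, val := range ctx {
		out[col] = val
	}
	funcs := template.FuncMap{"lower": strings.ToLower}
	for col, src := range templates {
		t, err := template.New(col).Funcs(funcs).Parse(src)
		if err != nil {
			return nil, err
		}
		var b strings.Builder
		if err := t.Execute(&b, ctx); err != nil {
			return nil, err
		}
		out[col] = b.String()
	}
	return out, nil
}

func main() {
	row := map[string]string{"name": "Real Person", "email": "real.person@corp.example"}
	plain := map[string]func(string) string{
		"name": func(string) string { return "Sam" }, // stand-in for FakeName
	}
	templates := map[string]string{"email": `{{.name | lower}}@company.com`}

	out, err := applyRow(row, plain, templates)
	if err != nil {
		panic(err)
	}
	fmt.Println(out["name"], out["email"]) // Sam sam@company.com
}
```

The email is built from the fake name "Sam", not from "Real Person", which is the practical effect of running templates last.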

Password Transform Details

Password transforms generate cryptographically secure password hashes using industry-standard algorithms. All password transforms support:

  • Template Processing: The cleartext field supports Go template syntax with full row context
  • Deterministic Hashing: Same input produces same hash for referential integrity (except bcrypt)
  • Configurable Parameters: Algorithm-specific settings with secure defaults
  • Salt Control: Optional use_salt parameter (defaults to true, not applicable to bcrypt)

Common Configuration:

    column_name:
      type: PasswordAlgorithm
      cleartext: 'template_or_hardcoded_value'
      use_salt: true  # optional, defaults to true
      # algorithm-specific parameters...

PasswordBcrypt

Uses bcrypt with configurable work factor. Recommended for most applications.

    password:
      type: PasswordBcrypt
      cleartext: '{{.username}}-changeme'
      cost: 12  # optional, default: 10

Parameters:

  • cost: Work factor (4-31), higher = more secure but slower. Default: 10
  • cleartext: Template or hardcoded password value

Features:

  • Built-in random salt generation (non-deterministic)
  • 72-byte password limit (longer passwords truncated)
  • Industry standard, widely supported
  • Good balance of security and performance

Note: Unlike other password transforms, bcrypt is non-deterministic by design. It generates a random salt for each hash, making every output unique even with identical inputs. The salt is embedded in the hash output, allowing password verification to work correctly.

PasswordScrypt

Uses scrypt with configurable memory and CPU costs. Good for high-security applications.

    password:
      type: PasswordScrypt
      cleartext: 'secure-{{.id}}'
      n: 262144  # optional, default: 131072 (2^17)
      r: 8       # optional, default: 8
      p: 1       # optional, default: 1

Parameters:

  • n: CPU/memory cost (power of 2), higher = more secure. Default: 131072
  • r: Block size. Default: 8
  • p: Parallelization. Default: 1
  • cleartext: Template or hardcoded password value
  • use_salt: Enable deterministic salting (default: true)

Features:

  • Memory-hard algorithm
  • Resistant to GPU/ASIC attacks
  • Configurable memory usage
  • Salt$hash hex format output
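When choosing n, it helps to estimate memory use: scrypt's main lookup table takes roughly 128 * N * r bytes per hash. The sketch below works that out for the values of n used on this page; scryptMemoryBytes is a hypothetical helper, and the figure ignores the smaller buffers that scale with p.

```go
package main

import "fmt"

// scryptMemoryBytes estimates the size of scrypt's main lookup table,
// which is about 128 * N * r bytes.
func scryptMemoryBytes(n, r int64) int64 {
	return 128 * n * r
}

func main() {
	for _, n := range []int64{131072, 262144, 1048576} {
		mib := scryptMemoryBytes(n, 8) / (1 << 20)
		fmt.Printf("n=%d r=8 -> %d MiB\n", n, mib)
	}
	// n=131072 r=8 -> 128 MiB
	// n=262144 r=8 -> 256 MiB
	// n=1048576 r=8 -> 1024 MiB
}
```

With r: 8, the default n of 131072 needs about 128 MiB per hash, and n: 1048576 needs about 1 GiB, so large values are worth checking against the memory available to the service.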

PasswordPBKDF2

Uses PBKDF2-HMAC-SHA256. Choose this when FIPS-140 compliance is required.

    password:
      type: PasswordPBKDF2
      cleartext: 'legacy-password'
      iterations: 1000000  # optional, default: 600000
      hash: 'SHA256'       # optional, default: 'SHA256'

Parameters:

  • iterations: Number of iterations, higher = more secure. Default: 600000
  • hash: Hash function, currently only “SHA256” supported. Default: “SHA256”
  • cleartext: Template or hardcoded password value
  • use_salt: Enable deterministic salting (default: true)

Features:

  • FIPS-140 compliant
  • Widely supported standard
  • Configurable iteration count
  • Salt$hash hex format output
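To make "deterministic salting" and the salt$hash hex output concrete, here is a stdlib-only sketch. How pg-translicator actually derives its salt is not stated on this page, so the derivation below (a SHA-256 digest of the cleartext, truncated to 16 bytes) is purely an assumption, as are the names pbkdf2SHA256 and hashDeterministic. The iteration count is kept low so the demo runs quickly.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// pbkdf2SHA256 computes the first 32-byte block of PBKDF2-HMAC-SHA256.
func pbkdf2SHA256(password, salt []byte, iterations int) []byte {
	mac := hmac.New(sha256.New, password)
	mac.Write(salt)
	var blockIndex [4]byte
	binary.BigEndian.PutUint32(blockIndex[:], 1)
	mac.Write(blockIndex[:])
	u := mac.Sum(nil)
	out := append([]byte(nil), u...)
	for i := 1; i < iterations; i++ {
		mac.Reset()
		mac.Write(u)
		u = mac.Sum(nil)
		for j := range out {
			out[j] ^= u[j]
		}
	}
	return out
}

// hashDeterministic derives the salt from the cleartext itself
// (hypothetical scheme), so equal inputs give equal salt$hash strings.
func hashDeterministic(cleartext string, iterations int) string {
	sum := sha256.Sum256([]byte("salt:" + cleartext))
	salt := sum[:16]
	key := pbkdf2SHA256([]byte(cleartext), salt, iterations)
	return hex.EncodeToString(salt) + "$" + hex.EncodeToString(key)
}

func main() {
	a := hashDeterministic("legacy-1001", 1000)
	b := hashDeterministic("legacy-1001", 1000)
	fmt.Println(a == b) // true: same cleartext, same salt, same hash
	fmt.Println(a)
}
```

A salt derived this way still differs between distinct cleartexts, but unlike a random salt it lets the same input reproduce the same stored value across runs.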

PasswordArgon2id

Uses Argon2id, the recommended choice for new systems. Argon2 won the Password Hashing Competition in 2015.

    password:
      type: PasswordArgon2id
      cleartext: '{{.email | before "@"}}-2024'
      time: 3        # optional, default: 3
      memory: 65536  # optional, default: 65536 (64MB)
      threads: 4     # optional, default: 4

Parameters:

  • time: Time cost (iterations). Default: 3
  • memory: Memory cost in KB. Default: 65536 (64MB)
  • threads: Parallelism degree. Default: 4
  • cleartext: Template or hardcoded password value
  • use_salt: Enable deterministic salting (default: true)

Features:

  • Strongest of the four password algorithms listed here
  • Designed to resist GPU cracking and side-channel attacks
  • Configurable memory/time/parallelism
  • Recommended for new systems
  • Salt$hash hex format output

Password Transform Examples:

    # Basic hardcoded password
    user_password:
      type: PasswordBcrypt
      cleartext: 'changeme123'

    # Template-based password using other fields
    admin_password:
      type: PasswordArgon2id
      cleartext: '{{.username | lower}}-admin-{{.id}}'
      memory: 131072  # 128MB for higher security

    # Legacy system compatibility
    old_password:
      type: PasswordPBKDF2
      cleartext: 'legacy-{{.account_id}}'
      iterations: 1000000

    # High-security password
    secure_password:
      type: PasswordScrypt
      cleartext: '{{.email | before "@"}}-{{.created_date | year}}'
      n: 1048576  # Higher memory cost

    # Testing/development (lower cost for speed)
    test_password:
      type: PasswordBcrypt
      cleartext: 'test123'
      cost: 4  # Lower cost for faster testing

Configuration Guidelines

Creating Your transforms.yml:

  1. Start Simple: Begin with a minimal configuration and add tables gradually

    version: v1
    tables:
      public.users:
        email: FakeEmail

  2. Identify Sensitive Data: Focus on columns containing:

    • Personal identifiers (names, emails, phone numbers)
    • Addresses and location data
    • Financial information
    • Any data subject to privacy regulations
  3. Test Transformations: Verify transforms work with your data types:

    • String columns → String transforms (FakeName, FakeEmail, etc.)
    • Integer columns → Integer transforms (FakeYear, FakeMonthNum, etc.)
    • Boolean columns → Bool transform
  4. Consider Relationships: Use consistent transforms for related data:

    public.users:
      email: FakeEmail
    public.user_profiles:
      user_email: FakeEmail  # Same transform maintains relationship

Example Configurations

Email Transform Options:

    version: v1
    tables:
      public.users:
        # Generates random email addresses
        email: FakeEmail

        # Template-based email using transformed values
        work_email:
          type: Template
          template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'

E-commerce Example:

    version: v1
    tables:
      public.customers:
        first_name: FakeFirstName
        last_name: FakeLastName
        email: FakeEmail
        phone: FakePhone
        street_address: FakeStreetAddress
        city: FakeCity
        state: FakeStateAbbr
        zip_code: FakeZip

      public.orders:
        customer_email: FakeEmail
        billing_address: FakeStreetAddress

      public.payments:
        cardholder_name: FakeName
        card_number: FakeCreditCardNum

Comprehensive Example with Regex and Templates:

    version: v1
    tables:
      public.users:
        # Simple string transforms (shorthand format)
        first_name: FakeFirstName
        last_name: FakeLastName

        # Object notation (equivalent to above)
        company:
          type: FakeCompany

        # Template transforms - use fake names generated above
        email:
          type: Template
          template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'
        username:
          type: Template
          template: '{{.first_name | lower}}_{{.last_name | lower}}'
        display_name:
          type: Template
          template: '{{.first_name}} {{.last_name}} ({{.company | upper}})'

        # Regex transforms require object notation
        phone:
          type: Regex
          pattern: '\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
          replacement: '(XXX) XXX-XXXX'
        ssn:
          type: Regex
          pattern: '(\d{3})-(\d{2})-(\d{4})'
          replacement: 'XXX-XX-$3'  # Preserves last 4 digits
        ip_address:
          type: Regex
          pattern: '\d+\.\d+\.\d+\.\d+'
          replacement: 'XXX.XXX.XXX.XXX'

      public.posts:
        title: FakeProductName
        # Generate URL-friendly slug from title
        slug:
          type: Template
          template: '{{.title | lower | slugify}}'
        # Extract domain from author email
        author_domain:
          type: Template
          template: '{{.author_email | after "@"}}'

      public.credit_cards:
        cardholder_name: FakeName
        # Partial masking - keeps last 4 digits visible
        card_number:
          type: Regex
          pattern: '(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})'
          replacement: 'XXXX-XXXX-XXXX-$4'
        cvv:
          type: Regex
          pattern: '\d+'
          replacement: 'XXX'

      public.user_accounts:
        username: FakeUsername
        # Password with template using fake username
        password:
          type: PasswordArgon2id
          cleartext: '{{.username}}-secure-2024'
          memory: 131072  # 128MB for high security
        # Admin accounts get special passwords
        admin_password:
          type: PasswordBcrypt
          cleartext: 'admin-{{.id}}-changeme'
          cost: 12
        # Legacy system compatibility
        legacy_hash:
          type: PasswordPBKDF2
          cleartext: 'legacy-{{.account_number}}'
          iterations: 1000000

Validation

  • Missing /app/config/transforms.yml → Service fails to start
  • Invalid YAML syntax → Parsing error at startup
  • Unknown transform types → Runtime error during processing
  • Type mismatches → Processing error for affected columns

Troubleshooting

“Required config file /app/config/transforms.yml not found”

  • Mount the directory containing transforms.yml at /app/config

Transform errors during processing

  • Verify transform types match column data types
  • Check YAML syntax is valid
  • Ensure all referenced tables exist in your database