Transform Configuration Guide
pg-translicator requires a transforms.yml file mounted at /app/config/transforms.yml. This file defines table and column transformations.
Key Features
- Deterministic Transformations: The same input always produces the same output, ensuring data consistency
- Type-Safe: Transforms are validated against column data types
- Selective Processing: Only specified tables/columns are transformed
- Referential Integrity: Consistent transformations preserve relationships
transforms.yml File Format
The transforms.yml file controls how data is transformed during replication. It uses YAML format with the following structure:
Basic Structure:
version: v1
tables:
  schema.table_name:
    column_name: TransformationType
    another_column: AnotherTransformationType
Simple vs Object Notation:
You can use either simple string format or object notation:
version: v1
tables:
  public.users:
    # Simple string transforms (shorthand format)
    name: FakeName
    email: FakeEmail
    # Object notation (equivalent to company: FakeCompany)
    company:
      type: FakeCompany
    # Regex transforms require object notation
    phone:
      type: Regex
      pattern: '\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
      replacement: '(XXX) XXX-XXXX'
Available Transform Types
Personal Information (Gofakeit-based):
- FakeName: Full name generation
- FakeFirstName, FakeLastName: Individual name components
- FakeEmail: Email address generation
- FakePhone: Phone number generation
- FakeSSN: Social Security Number (XXX-XX-XXXX format)
- FakeDateOfBirth: Date of birth (YYYY-MM-DD format)
- FakeUsername, FakePassword: Account credentials
Address Information (Gofakeit-based):
- FakeStreetAddress: Full street address
- FakeCity, FakeState, FakeStateAbbr: Location components
- FakeZip: ZIP codes (XXXXX or XXXXX-XXXX format)
- FakeCountry: Country names
- FakeLatitude, FakeLongitude: Geographic coordinates
Business Information (Gofakeit-based):
- FakeCompany: Company names
- FakeJobTitle: Job/position titles
- FakeIndustry: Industry names
- FakeProduct, FakeProductName: Product information
Text and Content (Gofakeit-based):
- FakeParagraph, FakeSentence, FakeWord: Text generation
- FakeCharacters, FakeDigits: String generation
Financial Information (Gofakeit-based):
- FakeCreditCardType, FakeCreditCardNum: Financial data
- FakeCurrency: Currency codes
Date and Time (Gofakeit-based):
- FakeMonth, FakeMonthNum, FakeWeekDay, FakeYear: Date/time components
Custom Transforms:
- Bool: Boolean values (deterministic custom implementation)
Pattern-Based Transforms:
- Regex: Apply custom regular expression patterns and replacements
- Template: Generate values using Go templates with full row context
Password Transforms:
- PasswordBcrypt: Bcrypt password hashing with configurable cost
- PasswordScrypt: Scrypt password hashing with configurable parameters
- PasswordPBKDF2: PBKDF2 password hashing with configurable iterations
- PasswordArgon2id: Argon2id password hashing with configurable parameters
Regex Transform Details
The Regex transform allows custom pattern-based data transformation:
column_name:
  type: Regex
  pattern: 'regex_pattern'
  replacement: 'replacement_string'
Features:
- Uses Go’s RE2 regex syntax (a safe subset of Perl regex)
- Supports capture groups with $1, $2, etc. in replacements
- No lookahead/lookbehind assertions
- Linear time complexity guaranteed
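Since lookahead is unavailable, partial masking is usually written by capturing the part to keep and re-emitting it in the replacement. The fragment below is a minimal sketch of that idea. The column name `contact_email` is hypothetical, and the sample result in the comment assumes the tool hands the replacement string to Go's regexp package unchanged.

```yaml
# Keep the domain, mask the local part.
# Under the assumption above, "jane.doe@example.org" would become "user@example.org".
contact_email:
  type: Regex
  pattern: '^[^@]+@(.+)$'
  replacement: 'user@$1'
```

If that assumption holds, a group reference followed directly by a letter, digit, or underscore needs braces: Go reads `$1x` as a group named `1x`, so `${1}x` is the form to use there.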
Examples:
# Phone number standardization
phone:
  type: Regex
  pattern: '\+?1?\s*\(?(\d{3})\)?[-.\s]*(\d{3})[-.\s]*(\d{4})'
  replacement: '+1 (XXX) XXX-XXXX'
# SSN partial masking (keep last 4 digits)
ssn:
  type: Regex
  pattern: '(\d{3})-(\d{2})-(\d{4})'
  replacement: 'XXX-XX-$3'
# IP address masking
ip_address:
  type: Regex
  pattern: '\d+\.\d+\.\d+\.\d+'
  replacement: 'XXX.XXX.XXX.XXX'
# Credit card partial masking (keep last 4 digits)
card_number:
  type: Regex
  pattern: '(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})'
  replacement: 'XXXX-XXXX-XXXX-$4'
Template Transform Details
The Template transform allows generating values using Go’s text/template syntax with access to the full row context:
column_name:
  type: Template
  template: 'template_string'
Features:
- Access all columns in the row using {{.column_name}}
- Built-in helper functions for common transformations
- Support for conditional logic and complex business rules
- Schema-agnostic - works with any table structure
Helper Functions:
- lower: Convert to lowercase: {{.name | lower}}
- upper: Convert to uppercase: {{.name | upper}}
- slugify: Create URL-friendly slugs: {{.title | slugify}}
- before: Extract text before separator: {{.email | before "@"}}
- after: Extract text after separator: {{.email | after "@"}}
Examples:
# Cross-column email generation
email:
  type: Template
  template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'
# Username from full name
username:
  type: Template
  template: '{{.first_name | lower}}_{{.last_name | lower}}'
# URL-friendly slug from title
slug:
  type: Template
  template: '{{.title | lower | slugify}}'
# Domain extraction
domain:
  type: Template
  template: '{{.email | after "@"}}'
# Templates using transformed values from other columns
name: FakeName # Generates fake name first
email:
  type: Template
  template: '{{.name | lower | slugify}}@company.com' # Uses fake name, not original
# Conditional logic
status_label:
  type: Template
  template: '{{if .active}}ACTIVE{{else}}INACTIVE{{end}}: {{.name}}'
# Complex business logic
display_name:
  type: Template
  template: '{{.first_name}} {{.last_name}} ({{.department | upper}})'
Template Processing Order: Template transforms are processed after all other transforms, allowing them to access the fake/transformed values instead of original data. This enables powerful cross-column transformations using already-processed data.
Password Transform Details
Password transforms generate cryptographically secure password hashes using industry-standard algorithms. All password transforms support:
- Template Processing: The cleartext field supports Go template syntax with full row context
- Deterministic Hashing: Same input produces same hash for referential integrity (except bcrypt)
- Configurable Parameters: Algorithm-specific settings with secure defaults
- Salt Control: Optional use_salt parameter (defaults to true, not applicable to bcrypt)
Common Configuration:
column_name:
  type: PasswordAlgorithm
  cleartext: 'template_or_hardcoded_value'
  use_salt: true # optional, defaults to true
  # algorithm-specific parameters...
PasswordBcrypt
Uses bcrypt with configurable work factor. Recommended for most applications.
password:
  type: PasswordBcrypt
  cleartext: '{{.username}}-changeme'
  cost: 12 # optional, default: 10
Parameters:
- cost: Work factor (4-31), higher = more secure but slower. Default: 10
- cleartext: Template or hardcoded password value
Features:
- Built-in random salt generation (non-deterministic)
- 72-byte password limit (longer passwords truncated)
- Industry standard, widely supported
- Good balance of security and performance
Note: Unlike other password transforms, bcrypt is non-deterministic by design. It generates a random salt for each hash, making every output unique even with identical inputs. The salt is embedded in the hash output, allowing password verification to work correctly.
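One practical consequence: if a hash column needs to stay stable across repeated runs, a deterministic algorithm is the safer pick. The sketch below contrasts the two behaviours described above. The table and column names are hypothetical.

```yaml
version: v1
tables:
  public.accounts:            # hypothetical table
    # Non-deterministic: bcrypt draws a fresh salt, so the hash differs on every run.
    password:
      type: PasswordBcrypt
      cleartext: 'changeme-{{.id}}'
    # Deterministic: the same cleartext produces the same hash on every run.
    api_secret_hash:          # hypothetical column
      type: PasswordPBKDF2
      cleartext: 'changeme-{{.id}}'
      use_salt: true
```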
PasswordScrypt
Uses scrypt with configurable memory and CPU costs. Good for high-security applications.
password:
  type: PasswordScrypt
  cleartext: 'secure-{{.id}}'
  n: 262144 # optional, default: 131072 (2^17)
  r: 8 # optional, default: 8
  p: 1 # optional, default: 1
Parameters:
- n: CPU/memory cost (power of 2), higher = more secure. Default: 131072
- r: Block size. Default: 8
- p: Parallelization. Default: 1
- cleartext: Template or hardcoded password value
- use_salt: Enable deterministic salting (default: true)
Features:
- Memory-hard algorithm
- Resistant to GPU/ASIC attacks
- Configurable memory usage
- Salt$hash hex format output
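To put the configurable memory usage in numbers: scrypt's working memory is roughly 128 × n × r bytes per hash. This is a property of the algorithm itself, not something confirmed about this tool's implementation. The fragment below annotates the documented defaults with that estimate.

```yaml
password:
  type: PasswordScrypt
  cleartext: 'secure-{{.id}}'
  n: 131072   # default: 128 * 131072 * 8 bytes = 128 MiB per hash
  r: 8        # default block size
  p: 1        # default parallelization
```

By the same estimate, each doubling of n doubles the memory, so n: 1048576 with r: 8 comes to about 1 GiB per hash. That may be worth weighing when many rows are hashed during replication.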
PasswordPBKDF2
Uses PBKDF2-HMAC-SHA256. A suitable choice when FIPS-140 compliance is required.
password:
  type: PasswordPBKDF2
  cleartext: 'legacy-password'
  iterations: 1000000 # optional, default: 600000
  hash: 'SHA256' # optional, default: 'SHA256'
Parameters:
- iterations: Number of iterations, higher = more secure. Default: 600000
- hash: Hash function, currently only “SHA256” supported. Default: “SHA256”
- cleartext: Template or hardcoded password value
- use_salt: Enable deterministic salting (default: true)
Features:
- FIPS-140 compliant
- Widely supported standard
- Configurable iteration count
- Salt$hash hex format output
PasswordArgon2id
Uses Argon2id (recommended). Argon2 won the Password Hashing Competition in 2015.
password:
  type: PasswordArgon2id
  cleartext: '{{.email | before "@"}}-2024'
  time: 3 # optional, default: 3
  memory: 65536 # optional, default: 65536 (64MB)
  threads: 4 # optional, default: 4
Parameters:
- time: Time cost (iterations). Default: 3
- memory: Memory cost in KB. Default: 65536 (64MB)
- threads: Parallelism degree. Default: 4
- cleartext: Template or hardcoded password value
- use_salt: Enable deterministic salting (default: true)
Features:
- Strongest of the four supported algorithms
- Designed to resist both GPU cracking and side-channel attacks
- Configurable memory/time/parallelism
- Recommended for new systems
- Salt$hash hex format output
Password Transform Examples:
# Basic hardcoded password
user_password:
  type: PasswordBcrypt
  cleartext: 'changeme123'
# Template-based password using other fields
admin_password:
  type: PasswordArgon2id
  cleartext: '{{.username | lower}}-admin-{{.id}}'
  memory: 131072 # 128MB for higher security
# Legacy system compatibility
old_password:
  type: PasswordPBKDF2
  cleartext: 'legacy-{{.account_id}}'
  iterations: 1000000
# High-security password
secure_password:
  type: PasswordScrypt
  cleartext: '{{.email | before "@"}}-{{.created_date | year}}'
  n: 1048576 # Higher memory cost
# Testing/development (lower cost for speed)
test_password:
  type: PasswordBcrypt
  cleartext: 'test123'
  cost: 4 # Lower cost for faster testing
Configuration Guidelines
Creating Your transforms.yml:
- Start Simple: Begin with a minimal configuration and add tables gradually
  version: v1
  tables:
    public.users:
      email: FakeEmail
- Identify Sensitive Data: Focus on columns containing:
  - Personal identifiers (names, emails, phone numbers)
  - Addresses and location data
  - Financial information
  - Any data subject to privacy regulations
- Test Transformations: Verify transforms work with your data types:
  - String columns → String transforms (FakeName, FakeEmail, etc.)
  - Integer columns → Integer transforms (FakeYear, FakeMonthNum, etc.)
  - Boolean columns → Bool transform
- Consider Relationships: Use consistent transforms for related data:
  public.users:
    email: FakeEmail
  public.user_profiles:
    user_email: FakeEmail # Same transform maintains relationship
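The type pairing described under Test Transformations might look like the fragment below. It is only a sketch: the table and column names are hypothetical, and the column types in the comments are assumptions about such a schema.

```yaml
version: v1
tables:
  public.subscriptions:          # hypothetical table
    plan_name: FakeProductName   # assumed text column, string transform
    start_year: FakeYear         # assumed integer column, integer transform
    auto_renew: Bool             # assumed boolean column, Bool transform
```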
Example Configurations
Email Transform Options:
version: v1
tables:
  public.users:
    # Generates random email addresses
    email: FakeEmail
    # Template-based email using transformed values
    work_email:
      type: Template
      template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'
E-commerce Example:
version: v1
tables:
  public.customers:
    first_name: FakeFirstName
    last_name: FakeLastName
    email: FakeEmail
    phone: FakePhone
    street_address: FakeStreetAddress
    city: FakeCity
    state: FakeStateAbbr
    zip_code: FakeZip
  public.orders:
    customer_email: FakeEmail
    billing_address: FakeStreetAddress
  public.payments:
    cardholder_name: FakeName
    card_number: FakeCreditCardNum
Comprehensive Example with Regex and Templates:
version: v1
tables:
  public.users:
    # Simple string transforms (shorthand format)
    first_name: FakeFirstName
    last_name: FakeLastName
    # Object notation (equivalent to company: FakeCompany)
    company:
      type: FakeCompany
    # Template transforms - use fake names generated above
    email:
      type: Template
      template: '{{.first_name | lower}}.{{.last_name | lower}}@company.com'
    username:
      type: Template
      template: '{{.first_name | lower}}_{{.last_name | lower}}'
    display_name:
      type: Template
      template: '{{.first_name}} {{.last_name}} ({{.company | upper}})'
    # Regex transforms require object notation
    phone:
      type: Regex
      pattern: '\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})'
      replacement: '(XXX) XXX-XXXX'
    ssn:
      type: Regex
      pattern: '(\d{3})-(\d{2})-(\d{4})'
      replacement: 'XXX-XX-$3' # Preserves last 4 digits
    ip_address:
      type: Regex
      pattern: '\d+\.\d+\.\d+\.\d+'
      replacement: 'XXX.XXX.XXX.XXX'
  public.posts:
    title: FakeProductName
    # Generate URL-friendly slug from title
    slug:
      type: Template
      template: '{{.title | lower | slugify}}'
    # Extract domain from author email
    author_domain:
      type: Template
      template: '{{.author_email | after "@"}}'
  public.credit_cards:
    cardholder_name: FakeName
    # Partial masking - keeps last 4 digits visible
    card_number:
      type: Regex
      pattern: '(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})'
      replacement: 'XXXX-XXXX-XXXX-$4'
    cvv:
      type: Regex
      pattern: '\d+'
      replacement: 'XXX'
  public.user_accounts:
    username: FakeUsername
    # Password with template using fake username
    password:
      type: PasswordArgon2id
      cleartext: '{{.username}}-secure-2024'
      memory: 131072 # 128MB for high security
    # Admin accounts get special passwords
    admin_password:
      type: PasswordBcrypt
      cleartext: 'admin-{{.id}}-changeme'
      cost: 12
    # Legacy system compatibility
    legacy_hash:
      type: PasswordPBKDF2
      cleartext: 'legacy-{{.account_number}}'
      iterations: 1000000
Validation
- Missing /app/config/transforms.yml → Service fails to start
- Invalid YAML syntax → Parsing error at startup
- Unknown transform types → Runtime error during processing
- Type mismatches → Processing error for affected columns
Troubleshooting
“Required config file /app/config/transforms.yml not found”
- Mount the config directory with transforms.yml to /app/config
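If the service runs under Docker Compose, the mount could be declared roughly as follows. This is a sketch only: the service name and image reference are placeholders rather than values from this guide, and it assumes a local ./config directory that contains transforms.yml.

```yaml
# docker-compose.yml (sketch; service and image names are placeholders)
services:
  pg-translicator:
    image: pg-translicator:latest   # assumption: substitute your actual image reference
    volumes:
      # The host directory ./config must contain transforms.yml
      - ./config:/app/config
```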
Transform errors during processing
- Verify transform types match column data types
- Check YAML syntax is valid
- Ensure all referenced tables exist in your database