Bootstrap Process
The bootstrap process loads existing data from your source database into Kasho while ensuring zero data loss during the transition.
Understanding Bootstrap States
The change-stream service (pg-change-stream or mysql-change-stream) operates in three states during its lifecycle:
WAITING
No replication position captured. Service is ready to begin bootstrap.
ACCUMULATING
Replication position captured. Accumulating changes during initial data load.
STREAMING
Normal operation. Streaming all changes to translicator.
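The three states and the calls that move between them can be modeled as a small transition table. This is a minimal Python sketch, not Kasho's implementation. The `BootstrapState` enum and the `next_state` helper are hypothetical. Only the state names and the StartBootstrap and CompleteBootstrap call names come from this guide.

```python
from enum import Enum


class BootstrapState(Enum):
    WAITING = "WAITING"
    ACCUMULATING = "ACCUMULATING"
    STREAMING = "STREAMING"


# Allowed transitions, keyed by (current state, call that triggers the move).
# Illustrative model only; call names are taken from the manual bootstrap steps.
TRANSITIONS = {
    (BootstrapState.WAITING, "StartBootstrap"): BootstrapState.ACCUMULATING,
    (BootstrapState.ACCUMULATING, "CompleteBootstrap"): BootstrapState.STREAMING,
}


def next_state(current: BootstrapState, call: str) -> BootstrapState:
    """Return the state after `call`, or raise if the call is not valid now."""
    try:
        return TRANSITIONS[(current, call)]
    except KeyError:
        raise ValueError(f"{call} is not valid in state {current.value}") from None


state = BootstrapState.WAITING
state = next_state(state, "StartBootstrap")     # WAITING -> ACCUMULATING
state = next_state(state, "CompleteBootstrap")  # ACCUMULATING -> STREAMING
print(state.value)  # STREAMING
```

Calling CompleteBootstrap while still in WAITING raises an error in this model, mirroring the requirement that accumulation must start before the transition to streaming.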
How Bootstrap Works
The bootstrap process ensures data consistency by:
- Creating a consistent snapshot of the source database at a specific point in time
- Starting change accumulation from that exact point
- Loading the snapshot data into the target system
- Transitioning to streaming with all accumulated changes
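Because change accumulation starts at the exact position where the snapshot was taken, every change is covered by one side or the other. Changes committed before that point are already in the snapshot, and changes committed after it are held in the accumulation buffer until streaming begins. This is what guarantees that no changes are lost during the initial data migration.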
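A toy model makes the argument concrete. In this hypothetical Python sketch, changes are `(lsn, key, value)` tuples, the snapshot contains everything at or before an assumed `SNAPSHOT_LSN`, and accumulation buffers everything after it. Replaying the buffer on top of the snapshot reproduces the source exactly, while starting accumulation any later drops a change.

```python
# Toy model: each change is (lsn, key, value); LSNs increase monotonically.
changes = [
    (100, "a", 1),
    (110, "b", 2),
    (120, "a", 3),  # committed after the snapshot point
    (130, "c", 4),
]
SNAPSHOT_LSN = 110  # hypothetical position where the snapshot is taken


def apply(state: dict, events) -> dict:
    """Apply change events in order, overwriting earlier values per key."""
    for _lsn, key, value in events:
        state[key] = value
    return state


# Snapshot: the source contents as of SNAPSHOT_LSN.
snapshot = apply({}, (c for c in changes if c[0] <= SNAPSHOT_LSN))
# Accumulation starts at the same position, so it buffers everything after it.
buffer = [c for c in changes if c[0] > SNAPSHOT_LSN]

# Load the snapshot into the target, then replay the buffer in LSN order.
target = apply(dict(snapshot), buffer)
source = apply({}, changes)
print(target == source)  # True

# If accumulation started later (at 120), the change at LSN 120 is missed.
late = apply(dict(snapshot), [c for c in changes if c[0] > 120])
print(late == source)  # False: "a" is still 1 in the target
```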
Running Bootstrap
Prerequisites
Before starting bootstrap:
- Ensure the change-stream service is running and in WAITING state
- Have sufficient disk space for the database dump
- Ensure network connectivity between all components
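Disk space is the easiest prerequisite to check up front. The sketch below is a hypothetical helper, not part of Kasho. It assumes you already have an estimate of the dump size; the source's `pg_database_size(current_database())` is one rough proxy. It also applies an arbitrary 1.5× safety margin.

```python
import shutil

GIB = 1024 ** 3


def has_room_for_dump(free_bytes: int, estimated_dump_bytes: int,
                      headroom: float = 1.5) -> bool:
    """True if free space covers the estimated dump size times a safety margin.

    The 1.5 default is an arbitrary assumption, not a Kasho recommendation.
    """
    return free_bytes >= estimated_dump_bytes * headroom


def check_dump_dir(path: str, estimated_dump_bytes: int) -> bool:
    # shutil.disk_usage reports total, used and free bytes for the filesystem holding `path`.
    free = shutil.disk_usage(path).free
    return has_room_for_dump(free, estimated_dump_bytes)


print(has_room_for_dump(200 * GIB, 100 * GIB))  # True
print(has_room_for_dump(120 * GIB, 100 * GIB))  # False
```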
PostgreSQL
Option 1: Automated Bootstrap Script (Recommended)
The easiest way to bootstrap is to use the provided script. First, confirm that pg-change-stream is in the WAITING state:
# Check pg-change-stream status
grpcurl -plaintext pg-change-stream:50051 change_stream.ChangeStream/GetStatus
# Should show state: WAITING
Run the bootstrap script inside the pg-change-stream container:
# Interactive mode - prompts before transitioning to streaming
docker exec -it <pg-change-stream-container> ./bootstrap-kasho-pg.sh
# Automatic mode - transitions without prompting
docker exec -it <pg-change-stream-container> \
env WAIT_FOR_BOOTSTRAP=true ./bootstrap-kasho-pg.sh
Finding Your Container Name
Use docker ps to find your pg-change-stream container name. It is typically something like kasho-pg-change-stream-1, depending on how you deployed Kasho.
The script will:
- Create a temporary replication slot for consistent snapshot
- Signal pg-change-stream to start accumulating changes
- Take a database dump using the snapshot
- Convert the dump to change events
- Clean up temporary resources
- Transition to streaming mode
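The important property of that sequence is that cleanup happens even if a step fails, and the transition to streaming happens only if every step succeeded. Below is a minimal Python sketch of that control flow. The step callables are hypothetical stand-ins for the real commands, and this is not the contents of bootstrap-kasho-pg.sh.

```python
from typing import Callable, List


def run_bootstrap(steps: List[Callable[[], None]],
                  cleanup: Callable[[], None],
                  transition: Callable[[], None]) -> None:
    """Run bootstrap steps in order, always clean up, then transition.

    If any step raises, cleanup still runs (so no temporary slot is left
    behind), the exception propagates, and the transition is skipped.
    """
    try:
        for step in steps:
            step()
    finally:
        cleanup()
    transition()


log: List[str] = []
run_bootstrap(
    steps=[
        lambda: log.append("create_temp_slot"),
        lambda: log.append("start_accumulating"),
        lambda: log.append("dump"),
        lambda: log.append("convert_dump"),
    ],
    cleanup=lambda: log.append("cleanup"),
    transition=lambda: log.append("complete_bootstrap"),
)
print(log)
```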
Option 2: Manual Bootstrap Process
For more control, you can run the bootstrap steps manually:
Step 1: Create Temporary Snapshot
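-- Open a replication connection to the source database as the kasho user,
-- for example: psql "dbname=<database> user=kasho replication=database"
-- Keep this session open and idle until the dump in Step 3 has finished:
-- the exported snapshot is only valid until the session runs another command or disconnects.
-- Create a temporary slot that exports a consistent snapshot
CREATE_REPLICATION_SLOT kasho_temp_slot TEMPORARY LOGICAL pgoutput EXPORT_SNAPSHOT;
-- Note the consistent_point (LSN) and snapshot_name values
Step 2: Start Accumulation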
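If you script this step, you can validate the LSN and snapshot name from Step 1 before sending them. The following hypothetical Python helpers do two things. First, they parse a PostgreSQL LSN, which is printed as two hexadecimal 32-bit halves separated by `/`. Second, they build the same JSON body as the grpcurl command below. The field names are copied from that command, and the snapshot name is only an example.

```python
import json
import re

# A PostgreSQL LSN such as "0/16B3748": high 32 bits, "/", low 32 bits, in hex.
LSN_PATTERN = re.compile(r"^([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})$")


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN string to its 64-bit integer position."""
    match = LSN_PATTERN.match(lsn)
    if match is None:
        raise ValueError(f"not a PostgreSQL LSN: {lsn!r}")
    high, low = (int(part, 16) for part in match.groups())
    return (high << 32) | low


def start_bootstrap_request(lsn: str, snapshot_name: str) -> str:
    """Build the JSON body passed to grpcurl -d for StartBootstrap."""
    lsn_to_int(lsn)  # raises ValueError on a malformed LSN
    if not snapshot_name:
        raise ValueError("snapshot_name must not be empty")
    return json.dumps({"start_position": lsn, "snapshot_name": snapshot_name})


print(lsn_to_int("0/16B3748"))  # 23803720
print(start_bootstrap_request("0/16B3748", "00000003-00000002-1"))
```

Comparing LSNs as integers is also a simple way to check that one position is ahead of another.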
# Tell pg-change-stream to create permanent slot and start accumulating
grpcurl -plaintext \
-d '{"start_position": "<lsn-from-step-1>", "snapshot_name": "<snapshot-from-step-1>"}' \
pg-change-stream:50051 \
change_stream.ChangeStream/StartBootstrap
Step 3: Take Database Dump
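The dump from this step is what Step 4 converts into change events. With `--data-only` and pg_dump's default plain format, each table's rows appear in a `COPY ... FROM stdin;` block, one tab-separated line per row. NULL is written as `\N`, and a `\.` line ends the block. The hypothetical Python parser below illustrates what "converting a dump to change events" can mean. It is not pg-bootstrap-sync's implementation or event format, and it does not decode COPY's other escape sequences.

```python
import io
import re
from typing import Dict, Iterator, List, Optional, TextIO

# Matches e.g. "COPY public.users (id, name) FROM stdin;"
COPY_HEADER = re.compile(r"^COPY (\S+) \((.*)\) FROM stdin;$")


def copy_rows_to_events(dump: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one insert-style event per row found in COPY ... FROM stdin blocks.

    Simplified: splits rows on tabs and maps \\N to None, but does not decode
    COPY's other escape sequences (such as \\t or \\n inside values).
    """
    table: Optional[str] = None
    columns: List[str] = []
    for raw in dump:
        line = raw.rstrip("\n")
        if table is None:
            # Outside a data block: look for the next COPY header, skip everything else.
            header = COPY_HEADER.match(line)
            if header:
                table = header.group(1)
                columns = [c.strip() for c in header.group(2).split(",")]
            continue
        if line == "\\.":
            # End of the current table's data block.
            table = None
            continue
        values = [None if v == "\\N" else v for v in line.split("\t")]
        yield {"op": "insert", "table": table, "row": dict(zip(columns, values))}


sample = io.StringIO(
    "SET statement_timeout = 0;\n"
    "COPY public.users (id, name) FROM stdin;\n"
    "1\talice\n"
    "2\t\\N\n"
    "\\.\n"
)
events = list(copy_rows_to_events(sample))
for event in events:
    print(event)
```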
# Use the snapshot for consistency
pg_dump \
--snapshot=<snapshot-from-step-1> \
--no-owner \
--no-privileges \
--data-only \
"$PRIMARY_DATABASE_URL" > dump.sql
Step 4: Process Dump
# Convert dump to change events
docker run --rm \
-v "$(pwd)":/data \
--network <your-network> \
kashoio/kasho:latest \
/app/bin/pg-bootstrap-sync \
--dump-file=/data/dump.sql \
--kv-url=redis://redis:6379
Step 5: Clean Up and Transition
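Around the transition, the state reported by GetStatus is the simplest check. It should read ACCUMULATING before you call CompleteBootstrap and STREAMING afterwards. A minimal Python polling sketch follows. `wait_for_state` accepts any status function. The grpcurl-based `grpcurl_state` assumes the JSON response carries the state in a top-level `state` field, which this guide does not confirm.

```python
import json
import subprocess
import time
from typing import Callable, Optional


def grpcurl_state(address: str = "pg-change-stream:50051") -> Optional[str]:
    """Return the state from GetStatus via grpcurl.

    Assumption: the JSON response has a top-level "state" field. -emit-defaults
    makes grpcurl print fields that hold their default (zero) value too.
    """
    result = subprocess.run(
        ["grpcurl", "-plaintext", "-emit-defaults", address,
         "change_stream.ChangeStream/GetStatus"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("state")


def wait_for_state(get_state: Callable[[], Optional[str]], expected: str,
                   timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll get_state() until it returns `expected`; False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage against a running service (requires grpcurl on PATH):
# wait_for_state(grpcurl_state, "STREAMING", timeout_s=300)
```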
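-- Close the replication session from Step 1 once the dump has finished.
-- A TEMPORARY slot is dropped automatically when its session ends; if
-- kasho_temp_slot still appears in pg_replication_slots, drop it explicitly:
SELECT pg_drop_replication_slot('kasho_temp_slot');
# Transition to streaming mode
grpcurl -plaintext \
pg-change-stream:50051 \
change_stream.ChangeStream/CompleteBootstrap
Monitoring Progress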
During bootstrap, monitor the progress:
PostgreSQL
# Check current state
grpcurl -plaintext pg-change-stream:50051 change_stream.ChangeStream/GetStatus
# Watch logs
docker logs -f <pg-change-stream-container>
# Monitor accumulated changes
# The AccumulatedChanges count in the GetStatus output shows how many changes are buffered during bootstrap
Large Database Considerations
For databases larger than 100GB:
PostgreSQL
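- Use compression: pg_dump ... | gzip > dump.sql.gz
- Consider a parallel dump: pg_dump --format=directory --jobs=4 ... (--jobs requires the directory output format; pg_restore --file=dump.sql <dump-directory> turns the result back into a plain SQL file)
- Monitor disk space during the dump process
- Run during low-traffic periods to minimize the number of accumulated changes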
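A back-of-envelope estimate shows why timing matters. The accumulation buffer grows roughly with the source's write rate multiplied by the time the dump and conversion take. The rates and duration below are hypothetical.

```python
def estimate_accumulated_changes(writes_per_second: float, bootstrap_seconds: float) -> int:
    """Rough count of changes buffered while the dump is taken and processed."""
    return round(writes_per_second * bootstrap_seconds)


# Hypothetical: a 3-hour bootstrap at a peak vs. an off-peak write rate.
three_hours = 3 * 60 * 60
peak = estimate_accumulated_changes(500, three_hours)
off_peak = estimate_accumulated_changes(50, three_hours)
print(peak, off_peak)  # 5400000 540000
```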
Troubleshooting
PostgreSQL
“replication slot already exists”
This usually means a previous bootstrap wasn’t cleaned up properly:
-- Check existing slots
SELECT * FROM pg_replication_slots;
-- If kasho_slot exists and you want to start over
SELECT pg_drop_replication_slot('kasho_slot');
Then restart pg-change-stream to return to the WAITING state.
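Before dropping anything, restrict the cleanup to Kasho's own slots and to slots that are not in use. `pg_drop_replication_slot` fails on a slot that a running process is still using, so stop pg-change-stream first. This hypothetical Python helper takes `(slot_name, active)` rows from `pg_replication_slots` and returns the statements to run. The slot names are the two used in this guide.

```python
from typing import Iterable, List, Tuple

KASHO_SLOTS = {"kasho_slot", "kasho_temp_slot"}  # slot names used in this guide


def drop_statements(slots: Iterable[Tuple[str, bool]]) -> List[str]:
    """Build DROP statements for inactive Kasho slots.

    `slots` holds (slot_name, active) pairs, e.g. from
    SELECT slot_name, active FROM pg_replication_slots;
    Active slots and slots belonging to other applications are skipped.
    """
    return [
        f"SELECT pg_drop_replication_slot('{name}');"
        for name, active in slots
        if name in KASHO_SLOTS and not active
    ]


rows = [("kasho_slot", False), ("kasho_temp_slot", True), ("other_app_slot", False)]
for stmt in drop_statements(rows):
    print(stmt)  # SELECT pg_drop_replication_slot('kasho_slot');
```

Because only names from a fixed set are interpolated, the generated SQL cannot pick up arbitrary slot names.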
Bootstrap seems stuck
Check for:
- Network connectivity issues
- Disk space for dump file
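- Long-running transactions on the source: creating the replication slot waits for transactions already in progress to finish before it can provide a consistent snapshot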
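To find transactions that may be holding up the snapshot, check `pg_stat_activity`, which records when each backend's current transaction started. The Python sketch below is hypothetical. It shows the query to run and filters its rows against an assumed five-minute threshold.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Run this on the source database; its rows feed long_running() below.
LONG_TX_QUERY = """
SELECT pid, xact_start, state, left(query, 80)
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
"""

Row = Tuple[int, Optional[datetime], str, str]


def long_running(rows: Iterable[Row], now: datetime,
                 threshold: timedelta = timedelta(minutes=5)) -> List[int]:
    """Return PIDs whose current transaction has been open longer than `threshold`."""
    return [pid for pid, xact_start, _state, _query in rows
            if xact_start is not None and now - xact_start > threshold]


# Hypothetical rows, shaped like the query's output.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (101, now - timedelta(minutes=30), "idle in transaction", "UPDATE orders SET ..."),
    (102, now - timedelta(seconds=10), "active", "SELECT 1"),
]
print(long_running(rows, now))  # [101]
```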
High memory usage during bootstrap
pg-bootstrap-sync processes the dump in batches. For very large tables, you may need to increase container memory limits.
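Batching helps because only one batch needs to be held in memory at a time. Memory therefore mostly follows batch size and row width rather than total row count, though very wide rows or large values still make each batch bigger. Below is a generic Python illustration of the pattern. The batch size shown is hypothetical, and this guide does not describe pg-bootstrap-sync's actual batch size or mechanism.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, holding only one batch in memory."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Hypothetical stream of 2,500 change events processed 1,000 at a time.
sizes = [len(b) for b in batched(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```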
After Bootstrap
Once bootstrap completes and the change-stream service is in STREAMING state:
- Verify data integrity - Compare row counts between source and target
- Monitor replication lag - Check that changes are flowing normally
- Remove dump files - Clean up temporary dump files to free disk space
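Compare row counts table by table once both sides have caught up. A live source keeps changing, so small differences right after the transition may just be lag, and any transform that filters rows will also change counts. Below is a hypothetical Python helper that reports mismatches from counts gathered with `SELECT count(*)` on each side.

```python
from typing import Dict, List, Tuple


def count_mismatches(source: Dict[str, int],
                     target: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """List (table, source_count, target_count) for tables whose counts differ.

    A table missing from the target is reported with a target count of -1.
    """
    return [
        (table, count, target.get(table, -1))
        for table, count in sorted(source.items())
        if target.get(table, -1) != count
    ]


# Hypothetical counts gathered on each side.
source_counts = {"public.orders": 1200, "public.users": 50}
target_counts = {"public.orders": 1198, "public.users": 50}
print(count_mismatches(source_counts, target_counts))  # [('public.orders', 1200, 1198)]
```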
Next Steps
- Learn about Transform Configuration
- Return to the Quick Start guide
- Review Configuration Options