Salesforce ETL is the disciplined process of taking data from multiple systems, shaping it to match Salesforce standards, and loading it into your production environment. Modern APIs handle massive volumes, yet volume alone does not guarantee accuracy or compliance.
This guide explains the three critical phases of ETL operations—extraction, transformation, and loading—along with the governance, security, and compliance requirements that must be embedded throughout each phase. You'll learn when to select specific APIs, how to automate cleansing and deduplication, and how to load data without breaking referential integrity while maintaining security and regulatory compliance at every step.
API Selection Guide
Before diving into ETL phases, understanding your API options is crucial for success. Each Salesforce API serves specific use cases, from high-volume batch operations to real-time synchronization needs. Selecting the wrong API can lead to unnecessary complexity, performance bottlenecks, and even failed integrations that require complete rework.
- Bulk API v2 handles high-volume operations asynchronously, processing up to 1 TB or 100 million records every 24 hours. It streamlines operations by accepting a single CSV file and handling internal chunking automatically, eliminating the batch management overhead of v1.
- REST API works for near real-time integrations or lightweight mobile interactions, ideal for synchronous operations requiring immediate responses.
- SOAP API, with its XML envelopes and rich metadata support, remains essential for complex relationship queries in legacy ecosystems.
Choose Bulk API v2 for large data migrations, REST for real-time synchronization, and SOAP when working with established enterprise systems requiring extensive metadata operations.
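For teams that script their pipelines, this routing decision can be captured in a small helper. The sketch below is illustrative only: the 2,000-record cutoff is a common rule of thumb rather than a platform limit, and the flags are assumptions you would replace with your own integration profile.

```python
def choose_api(record_count: int, needs_realtime: bool, legacy_soap_stack: bool) -> str:
    """Illustrative API routing rule; thresholds and flags are assumptions, not platform limits."""
    if legacy_soap_stack:
        return "SOAP API"      # metadata-heavy, established enterprise integrations
    if needs_realtime:
        return "REST API"      # synchronous calls that need an immediate response
    if record_count > 2_000:
        return "Bulk API v2"   # large, asynchronous batch loads
    return "REST API"          # small volumes stay simpler over REST
```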
1. Extraction
Precise extraction prevents downstream rework and sets the foundation for your entire ETL pipeline. When source pulls are clean, incremental, and secure, transformation and loading steps stay predictable and efficient. This phase determines not just what data enters your pipeline, but also establishes the security posture and compliance boundaries that protect your organization throughout the process.
Extraction Techniques
After selecting your API based on volume and latency requirements, minimize payloads through incremental extraction. Filters on LastModifiedDate or SystemModstamp pull only changed records, staying within governor limits and reducing network costs. For high-frequency REST calls, build retry logic that backs off when responses return HTTP 429. Poll Bulk jobs at measured intervals instead of flooding status endpoints.
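A minimal sketch of that incremental pattern in Python, assuming a `requests`-based client: the instance URL, token, and v58.0 API version are placeholders, and the watermark is expected to come from your own job metadata.

```python
import time
import requests

INSTANCE_URL = "https://yourorg.my.salesforce.com"         # placeholder
HEADERS = {"Authorization": "Bearer <scoped OAuth token>"}  # placeholder

def extract_changed_accounts(last_run_iso: str):
    """Pull only Accounts changed since the previous run, backing off on HTTP 429."""
    soql = ("SELECT Id, Name, LastModifiedDate FROM Account "
            f"WHERE SystemModstamp > {last_run_iso}")
    url = f"{INSTANCE_URL}/services/data/v58.0/query"
    params = {"q": soql}
    backoff = 1
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        if resp.status_code == 429:        # throttled: wait, then retry the same call
            time.sleep(backoff)
            backoff = min(backoff * 2, 60)
            continue
        resp.raise_for_status()
        backoff, params = 1, None          # query params only apply to the first page
        body = resp.json()
        yield from body["records"]
        # Follow pagination until the incremental result set is exhausted.
        url = INSTANCE_URL + body["nextRecordsUrl"] if not body["done"] else None
```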
Timing prevents user contention. Schedule nightly or early-morning runs to avoid peak hours, and monitor the Salesforce API usage dashboard to keep combined integrations below daily limits. Where datasets permit, run workers in parallel but cap concurrency below the organization's simultaneous-bulk-job limit to prevent queued jobs that slow the pipeline.
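One way to enforce that budget programmatically is to check the org's REST limits resource before kicking off a run. This sketch reuses the INSTANCE_URL and HEADERS placeholders from the extraction example; the headroom figure is an arbitrary example, not a recommendation.

```python
def has_api_headroom(min_remaining: int = 10_000) -> bool:
    """Defer the run when the org's shared daily API budget is nearly spent."""
    resp = requests.get(f"{INSTANCE_URL}/services/data/v58.0/limits", headers=HEADERS)
    resp.raise_for_status()
    daily = resp.json()["DailyApiRequests"]
    return daily["Remaining"] >= min_remaining
```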
Security During Extraction
Extraction security starts with authentication and access control. Use scoped OAuth tokens with minimum required permissions and implement service accounts with role-based access control. Restrict extractor IP addresses through allowlists and never store passwords in scripts or configuration files.
Monitor extraction patterns for anomalies that could indicate data exfiltration attempts. Capture comprehensive audit trails with timestamps and record counts for each API call. Push these logs into security information and event management systems for real-time alerting.
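As a sketch, each extraction call can emit one structured audit entry that a log shipper of your choice forwards to the SIEM. The field names and transport here are assumptions; adapt them to whatever schema your security team already ingests.

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("etl.audit")

def audit_api_call(endpoint: str, operation: str, record_count: int, http_status: int) -> None:
    """Emit a structured audit entry per extraction call for SIEM forwarding."""
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "operation": operation,
        "record_count": record_count,
        "http_status": http_status,
    }))
```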
Extraction Compliance Requirements
Modern privacy regulations affect what data you can extract. GDPR and HIPAA mandate data minimization, so extract only the attributes your documented business purpose requires. Document which fields are extracted and why, and keep that documentation in version control for audit purposes.
For global operations, respect data residency requirements by using regional extraction endpoints where mandated. Implement consent verification before extracting personal data, especially for cross-border transfers.
2. Transformation
Transformation acts as the quality gate that stops flawed data from infiltrating Salesforce, serving as your last line of defense before data reaches production. This phase combines automated quality controls with data standardization while maintaining security and compliance requirements. Without proper transformation, even perfectly extracted data can corrupt your Salesforce environment through duplicate records, broken relationships, or compliance violations.
Data Standardization and Quality
Transform operations must address both structure and content. Each source system likely follows different conventions for formatting, field naming, and data representation. Your transformation layer must reconcile these differences while enforcing Salesforce's strict data requirements. The following standardization practices ensure consistency across all incoming data; a minimal sketch of the first few follows the list:
- Standardize dates, phone numbers, and addresses across source systems
- Handle nulls consistently based on Salesforce field requirements
- Apply field-level validation rules before attempting loads
- Use automation scripts to eliminate manual errors and accelerate throughput
- Apply identity resolution rules in Salesforce Data Cloud to merge duplicates proactively
- Validate data types against Salesforce object definitions to avoid load failures
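A minimal Python sketch of the first three practices. The accepted date formats, the US-centric phone rule, and the field names are assumptions for illustration; real pipelines would drive these rules from configuration.

```python
import re
from datetime import datetime

def standardize_date(value: str):
    """Normalize common source formats (assumed list) to ISO 8601 dates."""
    for fmt in ("%m/%d/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # flag for review instead of loading an unparseable date

def standardize_phone(value: str):
    """Reduce to a bare 10-digit number; the US-centric rule is illustrative."""
    digits = re.sub(r"\D", "", value or "")
    return digits[-10:] if len(digits) >= 10 else None

def standardize_record(raw: dict) -> dict:
    """Apply field-level rules and map empty strings to None so nillable
    Salesforce fields are cleared explicitly rather than loaded as ''."""
    return {
        "Birthdate": standardize_date(raw.get("dob") or ""),
        "Phone": standardize_phone(raw.get("phone")),
        "LastName": (raw.get("last_name") or "").strip() or None,
    }
```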
Referential Integrity Management
Maintaining relationship chains across objects requires careful sequencing. Load parent records first, then children, and reference parents by indexed External IDs instead of fragile Salesforce record IDs. This practice eliminates orphaned records and keeps cross-system links intact even when source keys differ.
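In Bulk load files, the parent link can be expressed through a relationship-plus-External-ID column header, so children resolve to their parents during the load itself. The object and field names below are illustrative.

```python
# Load order: parents first, then children. Object and field names are illustrative.
PARENT_ACCOUNTS_CSV = (
    "Account_Ext_Id__c,Name\n"
    "ACME-001,Acme Corporation\n"
)

# The Account.Account_Ext_Id__c header asks Salesforce to resolve the lookup by
# the parent's External ID during the load, so no record-ID lookup step is needed.
CHILD_CONTACTS_CSV = (
    "Contact_Ext_Id__c,LastName,Account.Account_Ext_Id__c\n"
    "ACME-001-C1,Wright,ACME-001\n"
)
```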
Security During Transformation
Data protection during transformation requires encryption at rest for any staging stores. Mask or tokenize personally identifiable information before it reaches sandbox environments. This prevents sensitive production data from leaking through development and testing environments.
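A tokenization sketch for one field type: the salt is assumed to come from a secrets manager rather than code, and the token format is arbitrary. The point is that the same source value always maps to the same token, so referential joins in lower environments still work without exposing real addresses.

```python
import hashlib

def tokenize_email(email: str, salt: str) -> str:
    """Replace a real address with a deterministic, non-reversible token
    before data reaches a sandbox."""
    digest = hashlib.sha256((salt + email.lower()).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.invalid"
```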
Store transformation rules and field mappings in version control with proper access controls. Document transformation lineage from source through Salesforce for audit purposes. Implement approval workflows for changes to transformation logic that affects regulated data types.
Transformation Compliance Requirements
Apply transformation rules to limit data scope as required by regulations. Implement retention policies that flag records for purging based on regulatory timelines. For GDPR compliance, maintain transformation logic that can support data subject requests for information about how their data was processed.
Document all transformation rules and maintain this documentation alongside your code. Auditors will need evidence of how sensitive fields were handled throughout the transformation process.
3. Loading
Loading operations directly modify live Salesforce data and require careful planning to balance speed with safety. Unlike the extraction and transformation phases, which can be repeated without affecting production, loading demands precision with strong governance controls because mistakes here immediately impact business operations. This phase must handle everything from managing relationships between objects to implementing rollback procedures when loads fail, all while maintaining performance at scale.
Processing Approach
Bulk API v2 removes the overhead of v1's batch management. Teams post a single CSV, Salesforce handles the internal chunking, and the job returns one consolidated result set. This streamlining matters at scale, maximizing throughput while minimizing client-side complexity.
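The end-to-end flow is small. This sketch uses the same placeholder credentials as the extraction example and trims error handling for brevity; treat it as an outline of the documented job lifecycle, not a production loader.

```python
import time
import requests

INSTANCE_URL = "https://yourorg.my.salesforce.com"          # placeholder
HEADERS = {"Authorization": "Bearer <scoped OAuth token>"}   # placeholder
JSON_HEADERS = {**HEADERS, "Content-Type": "application/json"}

def bulk_v2_load(object_name: str, csv_payload: str) -> str:
    """Create a Bulk API 2.0 ingest job, upload one CSV, close the job,
    and poll at measured intervals until processing finishes."""
    base = f"{INSTANCE_URL}/services/data/v58.0/jobs/ingest"
    job = requests.post(base, headers=JSON_HEADERS,
                        json={"object": object_name, "operation": "insert",
                              "contentType": "CSV", "lineEnding": "LF"}).json()
    job_id = job["id"]

    # One CSV per job; Salesforce performs the internal chunking.
    requests.put(f"{base}/{job_id}/batches",
                 headers={**HEADERS, "Content-Type": "text/csv"},
                 data=csv_payload.encode("utf-8"))

    # Mark the upload complete so the platform starts processing.
    requests.patch(f"{base}/{job_id}", headers=JSON_HEADERS,
                   json={"state": "UploadComplete"})

    while True:
        state = requests.get(f"{base}/{job_id}", headers=HEADERS).json()["state"]
        if state in ("JobComplete", "Failed", "Aborted"):
            return state
        time.sleep(30)  # poll politely instead of flooding the status endpoint
```

On completion, the same job status record reports processed and failed record counts, which feeds the verification step described later in this section.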
Record Management Strategies
Upserts with External IDs prevent duplicates while preserving relationships. Designate a stable, unique identifier in each major object, index it, then reference that field in load files. When child objects include parent External IDs, Salesforce links hierarchies automatically without requiring preliminary ID lookups.
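With Bulk API v2, switching from insert to upsert only changes the job definition: the operation becomes upsert and the External ID field is named explicitly. The field name below is illustrative, and the snippet reuses the placeholders from the loading sketch above.

```python
# Same flow as bulk_v2_load above, but the job is created as an upsert keyed
# on an indexed External ID field (field name is illustrative).
upsert_job = requests.post(
    f"{INSTANCE_URL}/services/data/v58.0/jobs/ingest",
    headers=JSON_HEADERS,
    json={
        "object": "Account",
        "operation": "upsert",
        "externalIdFieldName": "Account_Ext_Id__c",
        "contentType": "CSV",
        "lineEnding": "LF",
    },
).json()
```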
Batch sizing still impacts performance in Bulk API v2. For simple objects, target 10,000 rows to align with platform chunking. When loading records with multiple relationships, reduce batch size to prevent locking conflicts. For circular references, process those loads serially.
Verification and Recovery
After job completion, verify success before downstream systems consume the data. Post-load verification serves as your final quality checkpoint, catching issues before they affect business operations. A comprehensive verification strategy combines automated checks with recovery procedures that minimize the impact of any failures. Implement these verification steps as part of every load operation; a sketch of the count-and-failure check follows the list:
- Compare source and target record counts through automated scripts
- Retrieve row-level failure details from the Bulk job's failed-record results (or AsyncApexJob for Apex-driven loads)
- Trigger alerts for error rates exceeding thresholds
- Maintain point-in-time snapshots of impacted objects for rollback capability
- Automate rollback triggers that re-import snapshots when necessary
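A sketch of the count-and-failure check against the Bulk job status. It assumes the source count comes from your extraction step and reuses the placeholders defined in the loading sketch; rollback handling would hang off the raised exception.

```python
def verify_load(job_id: str, source_count: int) -> None:
    """Compare processed counts with the source extract and surface any
    row-level failures Salesforce reported for the job."""
    base = f"{INSTANCE_URL}/services/data/v58.0/jobs/ingest/{job_id}"
    info = requests.get(base, headers=HEADERS).json()
    processed = info["numberRecordsProcessed"]
    failed = info["numberRecordsFailed"]

    if failed or processed != source_count:
        # Row-level error details come back as CSV from the failedResults resource.
        failures = requests.get(f"{base}/failedResults/", headers=HEADERS).text
        raise RuntimeError(
            f"Load verification failed: {processed}/{source_count} processed, "
            f"{failed} rejected.\n{failures[:2000]}"
        )
```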
Security and Compliance During Loading
Loading requires the strictest security controls since it directly affects production data. Implement comprehensive logging for every load operation, maintaining immutable audit trails that capture who loaded what data and when. These logs must be detailed enough for forensic analysis if issues arise.
Enforce approval workflows before production loads, especially for regulated data types. Establish data stewardship roles with clear ownership for different data domains. Create quality agreements between departments that set measurable standards for acceptable error rates and data quality thresholds.
Platform Selection and Governance Considerations
ETL success depends on selecting tools that integrate seamlessly with Salesforce while providing appropriate governance capabilities throughout all three phases. The right platform choice affects both immediate operational efficiency and long-term scalability.
Essential Platform Capabilities
Evaluate solutions based on their ability to maintain governance across the entire pipeline. Look for platforms that provide comprehensive audit trails spanning extraction, transformation, and loading. Verify support for your required compliance certifications, such as HIPAA, SOX, or FedRAMP.
Consider deployment flexibility across cloud, hybrid, or on-premises architectures based on your data residency requirements. Test compatibility with existing data sources and target systems. Native Salesforce solutions like Flosum eliminate authentication complexity between systems while keeping governance artifacts inside the platform.
Building Your ETL Strategy
Effective Salesforce data integration requires coordinated execution across all three phases with governance embedded throughout. Select APIs based on volume and latency requirements, extract incrementally, transform data through systematic cleansing and validation, and load with External IDs and post-load verification to preserve referential integrity.
Start by auditing current pipelines to identify governance gaps. Document transformation logic in version control. Stress-test load jobs in sandbox environments. Evaluate native solutions that keep data inside Salesforce while maintaining comprehensive security and compliance controls.
Salesforce ETL requires more than just moving data between systems. Each phase—extraction, transformation, and loading—must incorporate security controls, compliance requirements, and governance practices from the start. This integrated approach reduces audit preparation time while eliminating gaps that create regulatory risk.
Native toolchains improve reliability and streamline compliance through consolidated logging and controls. By embedding governance throughout your ETL pipeline rather than treating it as an afterthought, you create sustainable processes that scale with your organization.
Request a demo with Flosum to see how our Salesforce-native platform automates extraction, transformation, and loading while maintaining comprehensive audit trails and security controls throughout the entire data pipeline.