Prepare Your Data and AWS S3 Environment for Archive Import

Successfully import offloaded Salesforce data into the Archive app.

Example To import data from an S3 bucket, first create an S3 bucket in one of the AWS regions. For more information, see Creating a General Purpose Bucket in the AWS S3 User Guide.

Configure S3 Bucket Permissions

To establish a secure connection, the service needs specific permissions to access your S3 bucket. Follow these steps to create a dedicated IAM policy for the IAM user that grants the necessary permissions.

Summary of Required Permissions

List Bucket Contents
- Action: s3:ListBucket
- Resource: arn:aws:s3:::my-bucket
- Condition: Only for objects with the prefix path/to/my/data/*

Read Objects
- Action: s3:GetObject
- Resource: arn:aws:s3:::my-bucket/path/to/my/data/*

Use this JSON policy template to grant the required permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListFolder",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket",
      "Condition": {
        "StringLike": {
          "s3:prefix": "path/to/my/data/*"
        }
      }
    },
    {
      "Sid": "AllowReadObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/path/to/my/data/*"
    }
  ]
}
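
If you maintain more than one bucket or data prefix, generating the policy from the template can reduce copy-and-paste mistakes. The following is a minimal Python sketch that fills in the template above. The function name build_import_policy and its parameters are illustrative assumptions and aren't part of any Salesforce or AWS tooling.

```python
import json


def build_import_policy(bucket: str, prefix: str) -> dict:
    """Fill in the policy template for one bucket and one data prefix.

    `bucket` and `prefix` are values that you supply. The function name is
    hypothetical and only mirrors the JSON template in this article.
    """
    prefix = prefix.strip("/")  # avoid accidental double slashes in the ARN
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}/*"}},
            },
            {
                "Sid": "AllowReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Prints the same policy as the template, ready to paste into IAM.
    print(json.dumps(build_import_policy("my-bucket", "path/to/my/data"), indent=2))
```

Review the generated JSON before you attach it to the IAM user, because the sketch doesn't validate bucket names or prefixes.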

Meet Data Import Prerequisites

Your data files must meet these requirements.

Data Validation
- Make sure that each CSV file's header includes at least 80% of the fields defined for that object in your current schema.
- The total size of all data in the AWS S3 bucket can't exceed 3 TB.

File Format
- Make sure that each file is in CSV format.
- Make sure that each CSV file contains records for only one Salesforce object type, such as all Case records in Case.csv.
- Make sure that each CSV file name exactly matches the object's API name in your current schema.

File Headers
- The first row of each CSV is the header row.
- Headers contain the exact API names of the fields for that object, such as FirstName.

Location
- Place all imported data files in a single S3 bucket.
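
You can check the file name and header requirements locally before you upload anything. The following Python sketch is one possible pre-upload check, not an official validator. The function names, the schema_fields input (a set of field API names that you export from your org), and the decision to flag header names that aren't in the schema are all assumptions made for illustration. It also doesn't handle indexed file names such as Attachment_1.csv.

```python
def header_coverage(header: list[str], schema_fields: set[str]) -> float:
    """Return the fraction of schema fields that appear in the CSV header."""
    if not schema_fields:
        raise ValueError("schema_fields must not be empty")
    return len(schema_fields & set(header)) / len(schema_fields)


def check_csv(file_name: str, header: list[str], object_api_name: str,
              schema_fields: set[str], min_coverage: float = 0.8) -> list[str]:
    """Return a list of problems. An empty list means the checks passed."""
    problems = []

    # The file name must exactly match the object's API name.
    if file_name != f"{object_api_name}.csv":
        problems.append(
            f"file name {file_name!r} doesn't match {object_api_name}.csv")

    # Headers must be exact field API names (comparison is case-sensitive).
    unknown = [name for name in header if name not in schema_fields]
    if unknown:
        problems.append(f"header fields not in schema: {unknown}")

    # The header must include at least 80% of the object's fields.
    coverage = header_coverage(header, schema_fields)
    if coverage < min_coverage:
        problems.append(
            f"header covers {coverage:.0%} of schema fields, "
            f"below {min_coverage:.0%}")

    return problems
```

For example, calling check_csv("Case.csv", header, "Case", case_fields) with the first row of Case.csv as header returns an empty list when the file passes these three checks.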

When your S3 environment is configured, and your data is formatted correctly, initiate the data load from AWS S3 to Archive.

Structure Files and Attachments for Archive Import

To import ContentVersion files and Attachment records into the Archive app, prepare your physical files and the corresponding metadata, stored in CSV files, so that Archive can correctly process and link your data.

Make sure that you meet the general data and AWS S3 environment requirements for Archive Import.

General CSV Requirements

Every CSV file in the import process must include an Id column that contains each record's 18-character Salesforce ID.
Each CSV header must include Id and all other relevant standard and custom fields defined for the object in your Salesforce schema.
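
The Id requirement is easy to check before upload. The following Python sketch reads CSV text and reports rows whose Id value isn't 18 alphanumeric characters. The function name check_id_column is hypothetical, and the alphanumeric test is a loose assumption about the ID format rather than a full Salesforce ID validation.

```python
import csv
import io


def check_id_column(csv_text: str) -> list[str]:
    """Return a list of problems found in the Id column of one CSV file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "Id" not in (reader.fieldnames or []):
        return ["header has no Id column"]

    problems = []
    # Row 1 is the header, so data rows start at row 2.
    for row_number, row in enumerate(reader, start=2):
        record_id = (row["Id"] or "").strip()
        if len(record_id) != 18 or not record_id.isalnum():
            problems.append(
                f"row {row_number}: Id {record_id!r} isn't an 18-character ID")
    return problems
```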

Import ContentVersion Files

To import ContentVersion files:

In your S3 bucket, create a directory named ContentDocuments.
Place all your physical files, such as PDF and PNG, inside the ContentDocuments directory.

Note It's critical that each file name matches the original file name first uploaded into Salesforce. If you have two different files with the same name, this process fails.

Note If the same file is associated with multiple records in Salesforce, insert only one copy of that file in the ContentDocuments directory. The Archive app attaches that single file to all records that reference it.
Provide these CSV files. You import these files together, so there's no required import order.
- ContentDocument.csv: This file creates the parent ContentDocument records, which act as containers for file versions. These CSV fields are mandatory.
  - Id: The file's unique 18-character ContentDocument ID.
  - All standard fields like Description and all custom fields like My_Custom__c from your ContentDocument schema.
- ContentVersion.csv: This file links the file version to its root ContentDocument record. These CSV fields are mandatory.
  - Id: The file's unique 18-character ContentVersion ID.
    Note Make sure to match the ID with the file name in your S3 bucket.
  - ContentDocumentId: The ID from your ContentDocument.csv. This ID links the version to its container.
  - PathOnClient: The exact original file name of the corresponding physical file in the ContentDocuments directory.
  - VersionData: Leave this field empty.
  - All standard and custom fields from your ContentVersion schema.
- ContentDocumentLink.csv: This file links the ContentDocument record and all its versions to its associated root Salesforce records, such as Accounts and Cases. These CSV fields are mandatory.
  - Id: The file's unique 18-character ContentDocumentLink ID.
  - ContentDocumentId: The ID from your ContentDocument.csv. This ID links the file version to its container.
  - LinkedEntityId: The Salesforce ID of the record associated with the file.
  - All standard and custom fields from your ContentDocumentLink schema.
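
Because the three CSV files reference each other and the physical files, a cross-reference check can catch broken links before the data load. The following Python sketch checks ContentVersion rows against the ContentDocument IDs and the file names in the ContentDocuments directory. The function name and the input shapes (rows as dictionaries, IDs and file names as sets) are assumptions made for illustration.

```python
def check_content_versions(version_rows: list[dict],
                           document_ids: set[str],
                           file_names: set[str]) -> list[str]:
    """Return problems found in ContentVersion.csv rows.

    version_rows: rows from ContentVersion.csv, for example from csv.DictReader
    document_ids: Id values taken from ContentDocument.csv
    file_names:   names of the files in the ContentDocuments directory
    """
    problems = []
    for row in version_rows:
        version_id = row.get("Id", "")

        # ContentDocumentId must point at a row in ContentDocument.csv.
        if row.get("ContentDocumentId") not in document_ids:
            problems.append(
                f"{version_id}: ContentDocumentId not found in ContentDocument.csv")

        # PathOnClient must name a physical file in ContentDocuments.
        if row.get("PathOnClient") not in file_names:
            problems.append(
                f"{version_id}: PathOnClient has no matching file in ContentDocuments")

        # VersionData must be present in the header but left empty.
        if (row.get("VersionData") or "").strip():
            problems.append(f"{version_id}: VersionData must be empty")
    return problems
```

A similar loop over ContentDocumentLink.csv rows can confirm that each ContentDocumentId also appears in ContentDocument.csv.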

Import Attachment Records

Importing standard attachments is simpler than importing ContentVersion files.

In your S3 bucket, create a directory named Attachments.
Place all your attachment files inside the Attachments directory.

Note Each file name must match the original file name uploaded to Salesforce. If you have two different files with the same name, this process fails.
Create a CSV file named Attachment.csv. This file contains the metadata and the root record link for every attachment file in the directory. These CSV fields are mandatory.
- Id: The attachment's unique 18-character Attachment ID.
- ParentId: The Salesforce ID of the record associated with this attachment.
- Name: The exact original file name of the corresponding physical file in the Attachments directory.
- All standard and custom fields from your Attachment schema.
Note If you have a large volume of attachments, you can split the metadata into multiple indexed files, such as Attachment_1.csv and Attachment_2.csv.
Note All CSV files must remain in the root folder. Don't place them inside the ContentDocuments or Attachments subdirectories.
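
If you split a large Attachment.csv into indexed files, each part needs its own header row. The following Python sketch shows one way to do the split in memory. The function name split_attachment_csv and the rows_per_file parameter are hypothetical, and the sketch assumes that the input text starts with a header row.

```python
import csv
import io


def split_attachment_csv(csv_text: str, rows_per_file: int) -> dict[str, str]:
    """Split Attachment.csv text into Attachment_1.csv, Attachment_2.csv, ...

    Returns a mapping of file name to CSV text. Every part repeats the header.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # assumes the first row is the header
    rows = list(reader)

    parts = {}
    for index, start in enumerate(range(0, len(rows), rows_per_file), start=1):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_file])
        parts[f"Attachment_{index}.csv"] = buffer.getvalue()
    return parts
```

Write each returned part to the root folder of the bucket, not to the Attachments directory.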

Solve File Name Collisions

A file name collision happens when two different physical files share the same original name. You can't place both files in the same S3 directory. To solve this conflict:

Rename the physical file in your S3 bucket from its original name to its unique 18-character Salesforce ID followed by the file suffix. For example:
- Attachment record invoice.pdf becomes 00P5g000002aBCDEFGA.pdf
- ContentVersion file report.pdf becomes 0685g000001ABCDEFE.pdf
Add this new ID-based file name in the appropriate CSV field so that the Archive app can find it.

For ContentVersion files, update the PathOnClient field.

For Attachment records, update the Name field.
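
The renaming rule above can be applied to the CSV rows with a short script. The following Python sketch finds names that occur more than once and replaces them with the record ID followed by the original file suffix. The function name resolve_collisions is hypothetical, and renaming every colliding file (rather than all but one) is a design choice of this sketch. You still need to rename the physical files in S3 to match.

```python
from collections import Counter
from pathlib import PurePosixPath


def resolve_collisions(rows: list[dict], name_field: str) -> list[dict]:
    """Return copies of the rows with colliding names replaced by <Id><suffix>.

    name_field is "PathOnClient" for ContentVersion rows and "Name" for
    Attachment rows. The input rows aren't modified.
    """
    counts = Counter(row[name_field] for row in rows)
    resolved = []
    for row in rows:
        updated = dict(row)
        if counts[row[name_field]] > 1:
            suffix = PurePosixPath(row[name_field]).suffix  # for example ".pdf"
            updated[name_field] = f"{row['Id']}{suffix}"
        resolved.append(updated)
    return resolved
```

Comparing the returned rows with the originals gives you the list of old and new names to use when you rename the files in the bucket.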

Validate and Initiate the Data Load

After you structure your S3 directories and place your corresponding CSVs in the root of the S3 bucket, you can initiate the data load.

We recommend that you validate the import process first.

Go to the Activities tab in Archive.
Before you start the archive, you can download the activity log to validate that all CSVs and files loaded successfully.
This log shows the number of records, such as Cases and Attachments, and a list of the physical files that are ready for import. You can use this log to identify failures, such as missing files, before you commit to the archive.

Initiate the Data Load from AWS S3 to the Archive App

Start the process of loading your prepared data from an AWS S3 bucket into the secure Archive app staging area. This action is the first active step in the import process and prepares your data for the final archival step.

Configure your AWS S3 bucket and prepare your data files for Archive Import.
Verify that the files to be imported are structured correctly.
Verify that no other data import or policy process is running. A new process can only be started after the previous one has fully completed.

Initiate the Data Load

Warning To prevent errors, make sure that your S3 source folder contains only the CSV files you plan to import. Remove any test files, backups, or unrelated reports before starting.

From the Archive home page, click the Import Data icon.
Enter these connection details for your S3 bucket.
- AWS Access Key ID
- AWS Secret Access Key
- S3 Bucket Path
  Example: //my-example-bucket/path/to/folder
- S3 Region
  Example: us-east-2
To begin the process, click Import.

Note Import is available only from an S3 bucket.
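
Before you click Import, it can help to scan a listing of the source folder for stray files, as the warning in this section recommends. The following Python sketch flags object keys that are neither a CSV file in the root folder nor a file under the ContentDocuments or Attachments directory. The function name unexpected_keys is hypothetical, the keys are assumed to be relative to the import folder, and treating the two directories as always allowed is an assumption based on the file structure described for files and attachments.

```python
ALLOWED_DIRECTORIES = ("ContentDocuments/", "Attachments/")


def unexpected_keys(keys: list[str]) -> list[str]:
    """Return the keys that look out of place in the import folder.

    keys: object keys relative to the import folder, for example
    "Case.csv" or "Attachments/invoice.pdf".
    """
    flagged = []
    for key in keys:
        # Physical files belong in one of the two known directories.
        if key.startswith(ALLOWED_DIRECTORIES):
            continue
        # CSV files belong in the root folder, not in a subfolder.
        if "/" not in key and key.endswith(".csv"):
            continue
        flagged.append(key)
    return flagged
```

The sketch can't tell a test CSV from a real one, so review the CSV files in the root folder by hand as well.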

Monitor the Data Load Process

The system first validates the S3 connection and file formats. If an error occurs, the process stops, and an error message is shown. Upon successful validation, the data is loaded into the secure Archive staging area.

To view the status of the Import-data-load activity, go to the Activities tab. Progress is measured by the number of files processed.

Note The Import-data-load action is only visible on the Activities tab and doesn't appear in the Recent Activities section on the home page.

When the Import-data-load activity is completed, create and run an import policy.

Create an Import Policy in the Archive App

After your offloaded data has been loaded successfully into the Archive app staging area, create and run a special import policy to move your staged data permanently into your archive.

Import policies are available only after data is loaded successfully into the staging area. Verify that the Import-data-load activity has completed successfully before you create an import policy.

Go to the Policies tab.
Select New.
Select Archive from Imported Data.
Configure your import policy settings.

Important
To archive 1–2 million records daily, archive the entire object tree as a single process. In your Import Policy, select Archive All Related Records as the related object selection method. This option reduces processing overhead and optimizes S3 file creation.

Key Characteristics of an Import Policy

The scheduler is turned off. Use the Run Now option.
The policy automatically processes all staged records for the selected root object. Filters aren't available.
The policy processes data from the staging area, not your live org, so there's no performance impact.

Relationship Archiving Settings

Setting	Description
Archive All Related Records	This option automatically archives all child records connected to the root record via a lookup relationship.
Don't Archive Related Records	This option performs a targeted archive. The targeted archive includes only the primary records that match your query and their directly dependent child records in a cascade-delete relationship. All other related records, such as those in standard lookup relationships like Contacts on an Account, are excluded from Archive.
Manually Select Relationships	This option provides control to select exactly which direct child relationships are included during archiving. After selecting this option, a list of available child relationships appears, and you select the specific objects to be archived along with the root object.

Run and Monitor the Import Policy

To start the final archiving process, click Run Now on the import policy.
Track the status in the Activities tab.
The process is logged with the action type Import-data-archive.

Note Each archive process is allotted up to 1 million root records. If you plan to import more than 1 million root records, the archive process splits into as many separate processes, each with its own activity, as required until the import policy is complete.
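
To estimate how many Import-data-archive activities to expect on the Activities tab, divide the number of root records by the per-process limit and round up. The following Python sketch does that arithmetic. The function name is hypothetical, and the sketch assumes that every process except the last one is filled to the limit.

```python
ROOT_RECORDS_PER_PROCESS = 1_000_000  # limit stated in the note above


def archive_process_count(root_records: int) -> int:
    """Estimate how many archive processes an import policy run needs."""
    if root_records < 0:
        raise ValueError("root_records can't be negative")
    # Integer ceiling division: round up without floating-point arithmetic.
    return (root_records + ROOT_RECORDS_PER_PROCESS - 1) // ROOT_RECORDS_PER_PROCESS
```

For example, 2,500,000 root records would be handled by three processes under this assumption.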

The system must complete the full Import-data-load and Import-data-archive cycle for the current dataset before a new import can begin. When the Import-data-archive activity is complete, your data is stored securely, and you can start a new import cycle.

Retention Policy

Select the criteria from these sources.

Archived Date
Created Date
Last Modified Date

Select the amount of time that you want to retain the archived data. The minimum retention period is one month, and the maximum is 99 years and 11 months.

Creating a retention period of zero years and zero months means data is purged automatically within 24 hours.

Archive Import FAQs

Frequently asked questions about Archive Import in the Archive app.

Q: What do I do if I can't connect to my AWS S3 bucket?

A: If you can't connect to your S3 bucket, verify your credentials and permissions. Make sure that you have the correct access key, secret key, and path.

Q: What happens if the file names or data are incorrect during the Import Data loading stage?

A: If the file names or data are incorrect, identify them from a list in the audit file in the Activities tab. Fix the errors and try the loading process again.

Q: If an import is in progress, can I start a new one?

A: No, if an import is running, you can't start a new one. An error message appears in the user interface (UI) that says that another process is in progress. Wait for the current import to finish, or contact Salesforce Support. After you archive all the imported records, you can start a new import. Make sure to import all the required data in the same process.

Q: Should I separate my import into multiple jobs by file type, such as loading tasks first and then images?

A: No. For optimal performance, include all CSVs and content files in a single import job. Running separate jobs for different file types, such as loading tasks in one job and images in another job, requires Archive to transfer the database to and from S3 repeatedly. This repetition significantly slows down the loading process.

Q: Can I run multiple archive policies at the same time?

A: No, you can only run one archive policy at a time. If you try to run another policy, you get an error message in the UI. Because you're working in the same database, you can't have one archive process delete data while another process is reading it.

Q: How do I select relations when archiving imported data?

A: When you define your policy, specify what lookup relationships to include. Master-detail relationships are included automatically.

If you aren't sure which lookup relationships to include, select All Relations. This option makes sure that the app archives the root record and all its related child records. If you unarchive a related child record later, the app restores the entire record hierarchy, starting from the root record.

Q: How can I monitor the progress of my archive process?

A: You can monitor the progress of your archive process by reviewing the progress bar in the Activities tab. The progress bar shows the percentage of archived records out of the total number of records.

Q: What happens if the archive process fails to delete records from the secure staging area?

A: If the archive process fails to delete records from the staging area, an error message appears in the audit file, but the archive continues to complete. To prevent duplicates, the system doesn't archive the same record more than one time, even if a cleanup error occurs after the archive process.

Q: If the archive process gets stuck, can I restart it manually?

A: Yes, if the archive process gets stuck, and a new process doesn't start automatically, you can restart it manually. However, if you experience this issue, contact Salesforce Support to verify that everything is working correctly.

Q: What do I do if I don't see the audit file in the Activities tab?

A: If you don't see the audit file or activity logs in the Activities tab, check your filters. Verify that you're not filtering by a start or end date that excludes the current activity. If the issue persists, contact Salesforce Support for assistance.

Q: How many records can I archive in a single archive process?

A: A single archive process can handle up to one million root records. If you have more than one million root records to archive, the process splits automatically into multiple processes, each handling up to one million root records.

Q: If there are leftover records from a previous archive, can I start a new import?

A: No, you must archive the leftover records first before starting a new import. If you get stuck, Salesforce Support can provide information on what records are missing and require archiving.

Q: How long does the archive process take?

A: The time it takes to archive records from the staging area depends on the number of records in your database and how many you're archiving. The progress bar in the Activities tab can help you estimate the time.

Q: What if I see a message that says an object is archived, but it wasn't?

A: This message isn't expected to appear for an object that wasn't archived. If you see it, the cause is a previous failed delete operation. Contact Salesforce Support to resolve the issue and verify that the records are archived properly.
