          Prepare Your Data and AWS S3 Environment for Archive Import

          To successfully import offloaded Salesforce data into the Archive managed package, you must prepare your AWS S3 environment and format your data files correctly.

          Note: This content relates to Archive. For Salesforce Archive, see Store Data Externally with Salesforce Archive.

          Configure AWS S3 Bucket Permissions

          To establish a secure connection, the service needs specific permissions to access your AWS S3 bucket. Create a dedicated IAM policy that grants the necessary permissions, and attach it to the IAM user that the connection uses.

          Summary of Required Permissions

          • List Bucket Contents

            • Action: s3:ListBucket
            • Resource: arn:aws:s3:::my-bucket
            • Condition: Only for objects with the prefix path/to/my/data/*
          • Read Objects

            • Action: s3:GetObject
            • Resource: arn:aws:s3:::my-bucket/path/to/my/data/*

            Use this JSON policy template to grant the required permissions.

            {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Sid": "AllowListFolder",
                  "Effect": "Allow",
                  "Action": "s3:ListBucket",
                  "Resource": "arn:aws:s3:::my-bucket",
                  "Condition": {
                    "StringLike": {
                      "s3:prefix": "path/to/my/data/*"
                    }
                  }
                },
                {
                  "Sid": "AllowReadObjects",
                  "Effect": "Allow",
                  "Action": "s3:GetObject",
                  "Resource": "arn:aws:s3:::my-bucket/path/to/my/data/*"
                }
              ]
            }
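
          To sanity-check the policy before you connect, you can verify that the IAM user can list and read objects under the allowed prefix. This is a minimal sketch using the boto3 SDK; the bucket name and prefix match the policy template above, so replace them with your own values.

            # Pre-flight permission check (pip install boto3). Assumes the IAM
            # user's access key is configured in your environment.
            import boto3

            s3 = boto3.client("s3")

            # s3:ListBucket -- list a few objects under the allowed prefix
            resp = s3.list_objects_v2(
                Bucket="my-bucket", Prefix="path/to/my/data/", MaxKeys=5
            )
            for obj in resp.get("Contents", []):
                print(obj["Key"])

            # s3:GetObject -- HeadObject requires read permission on the object
            if resp.get("Contents"):
                s3.head_object(Bucket="my-bucket", Key=resp["Contents"][0]["Key"])
                print("List and read permissions look correct.")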

          Meet Data Import Prerequisites

          Your imported data files must meet these requirements.

          • Data Validation Requirements

            • Make sure each CSV file's header includes at least 80% of the fields defined for that object in your current schema, as shown in the sketch after this list.
            • The total size of all data in the AWS S3 bucket can't exceed 3 TB.
          • File Format

            • Make sure each file is in CSV format.
            • Make sure each CSV file contains records for only one Salesforce object type, such as all Case records in Case.csv.
            • Make sure each CSV file name exactly matches the object's API name in your current schema.
          • File Headers

            • The first row of each CSV must be a header row.
            • The headers must contain the exact API names of the fields for that object, such as FirstName.
            • For objects with physical files, you must include the field that contains the file name. Specifically:
              • For Attachment, CSV header must include Name.
              • For ContentVersion, CSV header must include PathOnClient.
          • Location
            • Place all imported data files in a single AWS S3 bucket.
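
          You can check the 80% header-coverage requirement locally before upload. This sketch is illustrative only: the schema field list is a hypothetical placeholder, so substitute the real field lists from your org's schema.

            # Sketch: verify that a CSV header covers at least 80% of an
            # object's schema fields. The field list below is hypothetical.
            import csv

            schema_fields = {
                "Case": ["Id", "Subject", "Status", "Priority", "Origin"],
            }

            def header_coverage(csv_path, object_name):
                with open(csv_path, newline="", encoding="utf-8") as f:
                    header = next(csv.reader(f))
                fields = schema_fields[object_name]
                return sum(1 for field in fields if field in header) / len(fields)

            coverage = header_coverage("Case.csv", "Case")
            print(f"Case.csv covers {coverage:.0%} of the Case schema fields")
            assert coverage >= 0.8, "Header must cover at least 80% of fields"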

          When your AWS S3 environment is configured and your data is correctly formatted, initiate the Data Load from AWS S3 to Archive.

          • Structure Files and Attachments for Archive Import
            To import ContentVersion files and Attachment records into the Archive managed package, prepare your physical files and the corresponding metadata, stored in CSV files, so that Archive can correctly process and link your data.
          • Initiate the Data Load from AWS S3 to Archive
            Start the process of loading your prepared data from an AWS S3 bucket into the secure staging area. This action is the first active step in the import process and prepares your data for the final archival step in the Archive managed package.
          • Create and Run an Archive Import Policy
            After your offloaded Salesforce data has been successfully loaded into the Archive staging area, you must create and run a special import policy to complete the process. Configure, execute, and monitor this policy to move your staged data permanently into the Archive managed package.
          • Archive Import FAQs
            Frequently asked questions about Archive Import.

          Structure Files and Attachments for Archive Import

          To import ContentVersion files and Attachment records into the Archive managed package, prepare your physical files and the corresponding metadata, stored in CSV files, so that Archive can correctly process and link your data.

          Note: This content relates to Archive. For Salesforce Archive, see Store Data Externally with Salesforce Archive.

          Make sure that you meet the general data and AWS S3 environment requirements for Archive Import.

          General CSV Requirements

          • Each CSV file in the import process must include an Id column that contains the record's 18-character Salesforce ID.
          • Each CSV header must include the Id field and all other relevant standard and custom fields defined for the object in your Salesforce schema.

          Import ContentVersion Files

          To import ContentVersion files:

          1. In your S3 bucket, create a directory named ContentDocuments.
          2. Place all your physical files, such as PDF and PNG, inside the ContentDocuments directory.
            Note: It's critical that each file name matches the original file name first uploaded into Salesforce. If you have two different files with the same name, this process fails.
            Note: If the same file is associated with multiple records in Salesforce, insert only one copy of that file in the ContentDocuments directory. The Archive managed package attaches that single file to all records that reference it.
          3. Provide these CSV files. You import these files together, so there's no required import order.
            • ContentDocument.csv: This file creates the parent ContentDocument records, which act as containers for file versions. These CSV fields are mandatory.
              • Id: The file's unique 18-character ContentDocument ID.
              • All standard fields like Description and all custom fields like My_Custom__c from your ContentDocument schema.
            • ContentVersion.csv: This file links the file version to its root ContentDocument record. These CSV fields are mandatory.
              • Id: The file's unique 18-character ContentVersion ID.
                Note: This ID must match the file name in your S3 bucket.
              • ContentDocumentId: The ID from your ContentDocument.csv. This ID links the version to its container.
              • PathOnClient: The exact original file name of the corresponding physical file in the ContentDocuments directory.
              • VersionData: This field must be empty.
              • All standard and custom fields from your ContentVersion schema.
            • ContentDocumentLink.csv: This file links the ContentDocument record and all its versions to its associated root Salesforce records, such as Accounts and Cases. These CSV fields are mandatory.
              • Id: The file's unique 18-character ContentDocumentLink ID.
              • ContentDocumentId: The ID from your ContentDocument.csv. This ID links the file version to its container.
              • LinkedEntityId: The Salesforce ID of the record associated with the file.
              • All standard and custom fields from your ContentDocumentLink schema.
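
          To make the linkage concrete, here's a hypothetical excerpt of the three CSVs; all IDs and file names are invented for illustration. The ContentDocumentId column ties each version and each link back to its ContentDocument container, and the trailing comma in ContentVersion.csv leaves VersionData empty.

            ContentDocument.csv:
              Id,Description
              0695g000001DocAAAQ,Quarterly report container

            ContentVersion.csv:
              Id,ContentDocumentId,PathOnClient,VersionData
              0685g000001VerAAAQ,0695g000001DocAAAQ,report.pdf,

            ContentDocumentLink.csv:
              Id,ContentDocumentId,LinkedEntityId
              06A5g000001LnkAAAQ,0695g000001DocAAAQ,0015g000001AcctAAQ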

          Import Attachment Records

          The process for importing standard attachments follows a simpler logic than the ContentVersion process.

          1. In your S3 bucket, create a directory named Attachments.
          2. Place all your attachment files inside the Attachments directory.
            Note: Each file name must match the original file name uploaded to Salesforce. If you have two different files with the same name, this process fails.
          3. Create a CSV file named Attachment.csv. This file contains the metadata and the root record link for every attachment file in the directory. These CSV fields are mandatory.
            • Id: The attachment's unique 18-character Attachment ID.
            • ParentId: The Salesforce ID of the record associated with this attachment.
            • Name: This field must contain the exact original file name of the corresponding physical file in the Attachments directory.
            • All standard and custom fields from your Attachment schema.
            Note: If you have a large volume of attachments, you can split the metadata into multiple indexed files, such as Attachment_1.csv and Attachment_2.csv.
            Note: All CSV files must remain in the root folder. Don't place them inside the ContentDocuments or Attachments subdirectories.
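
          Put together, a correctly structured bucket looks like this. The bucket name, path, and file names are examples only.

            s3://my-bucket/path/to/my/data/
                Case.csv
                Attachment.csv
                ContentDocument.csv
                ContentVersion.csv
                ContentDocumentLink.csv
                Attachments/
                    invoice.pdf
                ContentDocuments/
                    report.pdf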

          Solve File Name Collisions

          A file name collision happens when two different physical files share the same original name. You can't place both files in the same S3 directory. To resolve the conflict:

          1. Rename the physical file in your S3 bucket from its original name to its unique 18-character Salesforce ID followed by the file suffix. For example:
            • Attachment record invoice.pdf becomes 00P5g000002aBCDEFGA.pdf
            • ContentVersion file report.pdf becomes 0685g000001ABCDEFE.pdf
          2. Add this new ID-based file name in the appropriate CSV field so that the Archive managed package can find the file.
            • For ContentVersion files, update the PathOnClient field.
            • For Attachment records, update the Name field.
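
          You can detect collisions and compute the ID-based replacement names before you upload. This sketch uses hypothetical record IDs and file names; adapt the input to however you export your Attachment metadata.

            # Sketch: find file-name collisions among attachments and print the
            # ID-based names to use instead. IDs and names are hypothetical.
            from collections import defaultdict
            from pathlib import PurePosixPath

            # (Attachment Id, original file Name) pairs, e.g. from Attachment.csv
            records = [
                ("00P5g000002aBCDEFG", "invoice.pdf"),
                ("00P5g000002aBCDXYZ", "invoice.pdf"),  # collision
                ("00P5g000002aBCDLMN", "receipt.pdf"),
            ]

            by_name = defaultdict(list)
            for rec_id, name in records:
                by_name[name].append(rec_id)

            for name, ids in by_name.items():
                if len(ids) > 1:  # two different files share one name
                    suffix = PurePosixPath(name).suffix
                    for rec_id in ids:
                        print(f"rename {name} -> {rec_id}{suffix}; "
                              f"update Name in Attachment.csv")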

          Validate and Initiate the Data Load

          After you structure your S3 directories and place your corresponding CSVs in the root of the S3 bucket, you can initiate the data load.

          We recommend that you validate the import process first.

          1. Go to the Activities tab in Archive.
          2. Before you start the archive, you can download the activity log to validate that all CSVs and files loaded successfully.
            This log shows the number of records, such as Cases and Attachments, and a list of the physical files that are ready for import. You can use this log to identify failures, such as missing files, before you commit to the archive.

          Initiate the Data Load from AWS S3 to Archive

          Start the process of loading your prepared data from an AWS S3 bucket into the secure staging area. This action is the first active step in the import process and prepares your data for the final archival step in the Archive managed package.

          Note: This content relates to Archive. For Salesforce Archive, see Store Data Externally with Salesforce Archive.

          Initiate the Data Load

          Warning: To prevent errors, ensure your S3 source folder contains only the CSV files you intend to import. Remove any test files, backups, or unrelated reports before starting.
          1. From the Archive Home Page, click the Import Data icon.
          2. Enter the connection details for your S3 bucket:
            • AWS Access Key ID
            • AWS Secret Access Key
            • S3 bucket Path

              Example: //my-example-bucket/path/to/folder

            • S3 Region

              Example: us-east-2

          3. Click Import to begin the process.
            Note: Import is currently available only from an S3 bucket.
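
          If the connection fails at this step, you can test the same details outside the UI. Here's a minimal sketch with boto3; the keys, bucket, and region are placeholders that mirror the form fields above.

            # Sketch: pre-flight check of the S3 connection details.
            # All values are placeholders; substitute your own.
            import boto3

            s3 = boto3.client(
                "s3",
                aws_access_key_id="AKIA...",      # AWS Access Key ID
                aws_secret_access_key="...",      # AWS Secret Access Key
                region_name="us-east-2",          # S3 Region
            )
            # Succeeds only if the keys, bucket, and region all line up.
            s3.list_objects_v2(
                Bucket="my-example-bucket", Prefix="path/to/folder/", MaxKeys=1
            )
            print("Connection details are valid.")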

          Monitor the Data Load Process

          1. Validation: The system first validates the Amazon S3 connection and file formats. If an error occurs, the process stops, and a message is displayed.
          2. Staging: Upon successful validation, the data is loaded into the secure Archive staging area.
          3. Track Progress: Go to the Activities tab to view the status of the Import-data-load activity. Progress is measured by the number of files processed.
            Note: The Import-data-load action is only visible on the Activities tab and doesn't appear in the Recent Activities section on the homepage.

          When the Import-data-load activity is completed, you must Create and Run an Archive Import Policy.

          Create and Run an Archive Import Policy

          After your offloaded Salesforce data has been successfully loaded into the Archive staging area, you must create and run a special import policy to complete the process. Configure, execute, and monitor this policy to move your staged data permanently into the Archive managed package.

          Note: This content relates to Archive. For Salesforce Archive, see Store Data Externally with Salesforce Archive.

          This feature is only available after data is successfully loaded into the staging area. Ensure the Import-data-load activity has completed successfully before proceeding.

          Create an Archive Import Policy

          The option to create an import policy appears only after the system detects that data exists in the staging area.

          1. Go to the Policies tab.
          2. Select New Archive Policy.
          3. Click the Archive from Imported Data tab.
          4. Configure the policy settings as needed.
            Important: To archive 1–2 million records daily, archive the entire object tree as a single process. In your Import Policy, select Archive All Related Records. This reduces processing overhead and optimizes S3 file creation.

          Key Characteristics of an Import Policy

          • On-Demand Only: The scheduler is disabled. You must use the Run Now option.
          • Runs on Staged Data: The policy processes data from the staging area, not your live Salesforce org, so there's no performance impact.
          • No Query Customization: The policy automatically processes all staged records for the selected root object. Filters aren't available.

          Configure Relationship Archiving

          • Don't Archive Related Records: This setting performs a targeted archive. It includes only the primary records that match your query and their directly dependent child records (those in a cascade-delete relationship). All other related records, such as those in standard lookup relationships (for example, Contacts on an Account), are excluded from Archive.
          • Archive All Related Records: Select this option to automatically archive all child records connected to the root record via a lookup relationship. This is a comprehensive way to ensure that all related data is archived together.
          • Manually Select Relationships: This option provides specific control, allowing you to choose exactly which direct child relationships to include in the archive (first level only). After you select this option, a list of available child relationships appears, and you can select the specific objects to archive along with the root sObject.

          Run and Monitor the Archive Import Policy

          1. Execute the Policy: Use the Run Now option on the policy to start the final archiving process.
          2. Monitor the Archive Job
            • Track the status in the Activities tab. The job is logged with the action type Import-data-archive.
            • If a job is aborted, the state of the staged data is saved so the job can be resumed later.

          The system must complete the full Import-data-load and Import-data-archive cycle for the current dataset before a new import can begin. Once the Import-data-archive activity is complete, your data is securely stored, and you're free to start a new import cycle.

          Retention Policy

          Select the retention criteria from one of these date fields.

          • Archived Date
          • Created Date
          • Last Modified Date

          Select the amount of time you want to retain the archived data. The minimum retention period is one month, and the maximum is 99 years and 11 months.

          A retention period of zero years and zero months means that data is automatically purged within 24 hours.

          Archive Import FAQs

          Frequently asked questions about Archive Import.

          Note: This content relates to Archive. For Salesforce Archive, see Store Data Externally with Salesforce Archive.

          Q: What do I do if I can't connect to my Amazon S3 bucket?

          A: If you can't connect to your S3 bucket, check your credentials and permissions. Make sure that you have the correct access key, secret key, and path.

          Q: What happens if the file names or data are incorrect during the Import Data loading stage?

          A: If the file names or data are incorrect, you can identify them from a list in the audit file in the Activities tab. You can fix these errors and try the loading process again.

          Q: If an import is in progress, can I start a new one?

          A: No, if an import is running, you can't start a new one. An error message appears in the UI that says another process is already in progress. Wait for the current import to finish, or contact support to release the lock. After you archive all the imported records, you can start a new import. Make sure to import all the required data in the same process.

          Q: Should I separate my import into multiple jobs by file type, such as loading tasks first and then images?

          A: No. For optimal performance, include all CSVs and content files in a single import job. Running separate jobs for different file types, such as loading tasks in one job and images in another job, requires Archive to transfer the database to and from S3 repeatedly. This repetition significantly slows down the loading process.

          Q: Can I run multiple archive policies at the same time?

          A: No, you can only run one archive policy at a time. If you try to run another policy, you get an error message in the UI. Because you're working in the same database, you can't have one archive process delete data while another process is reading it.

          Q: How should I select relations when archiving imported data?

          A: When you define your policy, specify what lookup relationships to include. Master-detail relationships are included automatically.

          If you aren't sure which lookup relationships to include, select All Relations. This option makes sure that the app archives the root record and all its related child records. If you unarchive a related child record later, the app restores the entire record hierarchy, starting from the root record.

          Q: How can I monitor the progress of my archive process?

          A: You can monitor the progress of your archive process by using a progress bar in the Activities tab. It shows the percentage of archived records out of the total number of records.

          Q: What happens if the archive process fails to delete records from the secure staging area?

          A: If the archive process fails to delete records from the staging area, an error message appears in the audit file, but the archive still runs to completion. To prevent duplicates, the system doesn't archive the same record more than once, even if a cleanup error occurs afterward.

          Q: If the archive process gets stuck, can I restart it manually?

          A: Yes, if the archive process gets stuck, and a new process doesn't start automatically, you can restart it manually. However, if you experience this issue, contact support to make sure that everything is working correctly.

          Q: What do I do if I don't see the audit file in the Activities tab?

          A: If you don't see the audit file or activity logs in the Activities tab, check your filters. Make sure you're not filtering by a start or end date that excludes the current activity. If the issue persists, contact support for assistance.

          Q: How many records can be archived in a single job?

          A: A single archive job can handle up to one million records. If you have more than one million records to archive, the process automatically splits into multiple jobs, each handling up to one million records.

          Q: If there are leftover records from a previous archive, can I start a new import?

          A: No, you must archive the leftover records before starting a new import. If you get stuck, support can provide information about which records are missing and require archiving.

          Q: How long does the archive process take?

          A: The time it takes to archive records from the staging area depends on the number of records in your database and how many you're archiving. The progress bar on the Activities tab helps you estimate the remaining time.

          Q: What if I see a message that says an object is archived, but it wasn't?

          A: Under normal conditions, this message appears only for objects that were actually archived. If you see it for an object that wasn't archived, the message is left over from a previous failed delete operation. Contact support to resolve the issue and make sure that the records are properly archived.
