Unstructured Data File Formats and Connectors
Data 360 supports multiple file formats and connector types for unstructured data.
Supported File Formats for Unstructured Data
Data 360 supports these file formats for unstructured data and search index configurations.
| File Format | File Extension | Search Indexing Notes | Harmonization Notes |
|---|---|---|---|
| | | Limited support for content formatted as tables. | Some elements are supported; others are not. |
| Text | .txt | | |
| HTML | .html | Limited support for content formatted as tables. | Supported |
| CSV | .csv | Each row is a chunk. | |
| Log | .log | Each row is a chunk. | |
| Excel | .xlsx | Limited support for content formatted as tables. | |
| PowerPoint | .pptx | Chunked by heading. Limited support for content formatted as tables. | |
| Rich text format | .rtf | Chunked by heading or paragraph. Limited support for content formatted as tables. | |
| Word | .docx | Chunked by heading or paragraph. Limited support for content formatted as tables. | Supported |
| Messages | .msg | Chunked by body and attachments. | |
| Messages | .eml | Chunked by body, headers, and attachments. | |
| XML | .xml | Chunked by XML element tag. | |
| Markdown | .md | | Supported |
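
To make the "Each row is a chunk" note for CSV and log files concrete, here is a minimal Python sketch of row-level chunking. It is an illustration only, not Data 360's implementation: the function name `chunk_csv_rows`, the assumption that the first row is a header, and the choice to pair each value with its column name are all hypothetical.

```python
import csv
import io


def chunk_csv_rows(text: str) -> list[str]:
    """Return one text chunk per data row of a CSV document.

    Assumptions (not from the Data 360 documentation): the first row is a
    header, and each chunk pairs every value with its column name.
    """
    reader = csv.DictReader(io.StringIO(text))
    chunks = []
    for row in reader:
        # One chunk per row, so a search hit maps back to a single record.
        chunks.append("; ".join(f"{column}: {value}" for column, value in row.items()))
    return chunks


sample = "id,title\n1,Reset password\n2,Update billing\n"
chunks = chunk_csv_rows(sample)
for chunk in chunks:
    print(chunk)
```

With row-level chunking, a wide row yields a long chunk and a narrow row a short one, so chunk size follows the shape of the file and not a fixed length.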

| File Format | File Extension | Search Indexing Notes |
|---|---|---|
| MPEG | .mpg | Audio only |
| MPEG-3 | .mp3 | Audio only |
| MPEG-4 | .mp4 | Audio only |
| MPGA | .mpga | Audio only |
| Ogg | .ogg | Audio only |
| Wav | .wav | Audio only |
| FLAC | .flac | Audio only |
| WebM | .webm | Audio only |
| JPEG (Beta) | .jpg | Image-based files. Embeddings include extracted text and captions. The maximum size of an image that can be processed is 20 MB. |
| PNG (Beta) | .png | Image-based files. Embeddings include extracted text and captions. The maximum size of an image that can be processed is 20 MB. |
| PDF (Beta) | .pdf | PDFs with images. The maximum size of a PDF that can be processed is 100 MB. |
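
The extensions and size limits in the tables above can be checked before files are sent to a connector. The following Python sketch shows one way to do that on the client side. It is not part of any Data 360 API: the names `check_file`, `SUPPORTED_EXTENSIONS`, and `SIZE_LIMITS` are hypothetical, the `.pdf` extension is assumed for the PDF (Beta) row, and treating 1 MB as 1024 × 1024 bytes is an assumption because the source says only "MB".

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes; the source says only "MB"

# Extensions listed in the tables above (".pdf" is assumed for the PDF row).
SUPPORTED_EXTENSIONS = {
    ".txt", ".html", ".csv", ".log", ".xlsx", ".pptx", ".rtf", ".docx",
    ".msg", ".eml", ".xml", ".md",
    ".mpg", ".mp3", ".mp4", ".mpga", ".ogg", ".wav", ".flac", ".webm",
    ".jpg", ".png", ".pdf",
}

# Processing limits stated in the tables: 20 MB for images, 100 MB for PDFs.
SIZE_LIMITS = {".jpg": 20 * MB, ".png": 20 * MB, ".pdf": 100 * MB}


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems found for a file; an empty list means none."""
    problems = []
    extension = Path(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"{name}: extension {extension!r} is not in the supported list")
        return problems
    limit = SIZE_LIMITS.get(extension)
    if limit is not None and size_bytes > limit:
        problems.append(f"{name}: {size_bytes} bytes exceeds the {limit // MB} MB limit")
    return problems


print(check_file("photo.JPG", 5 * MB))
print(check_file("scan.pdf", 101 * MB))
print(check_file("archive.zip", 1))
```

A check like this only mirrors the documented limits; the limits and guidelines page remains the authority on what is accepted.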
Unstructured Data Connectors
Data 360 has several connectors built for ingesting unstructured data from third-party applications or sources. You can also create a data lake object to ingest unstructured data through an existing connector.
- Amazon S3
- Azure Storage
- Box
- GitHub
- Google Cloud Storage
- Google Drive
- Guru
- Microsoft SharePoint
- MuleSoft Direct
- Web Content (Crawler)
- Web Content (Sitemap)
- YouTube
- Zendesk
Additionally, you can ingest unstructured data from file attachments on Salesforce objects or upload files directly into Agentforce Data Libraries.
Unstructured Data Limits and Guidelines
For limits and guidelines related to supported file types, see Data 360 Limits and Guidelines.
For billing info, see Billing Considerations for Unstructured Data and Search Index.
Monitor Unstructured Data Connectors
For methods of monitoring your unstructured data streams and ingestion progress, see Monitor the Status of Unstructured Connectors Data Ingestion.

