Considerations and FAQs When Using Marketing Cloud APIs

This article explains key considerations for using the Marketing Cloud Engagement (MCE) web APIs reliably. It covers automatic retries for transient errors, timeout tuning, logging practices, credential handling, and rate limiting, all of which contribute to stable API operation.

Key Consideration Points

Point 1: Implement Automatic Retries (Important)

API requests occasionally return temporary errors (primarily HTTP status codes such as 500 or 429). The causes range from temporary load on the API service to problems on the internet route, and in real-world usage these errors are unavoidable and occur irregularly. Anticipate them in advance and implement a retry mechanism.

Specifically, add error-handling logic to the application that issues the API requests, so that it automatically retries at appropriate intervals and up to an appropriate number of attempts when a 500-range error occurs. Before resending a request, check whether the operation (including any pre- and post-processing) could be executed twice or could produce a result different from the one intended.

On the other hand, errors such as 400 Bad Request or 401 Unauthorized indicate a problem with the client-side request, so retrying the same request will not succeed. In these cases, stop automatic retries, correct the request based on the error message, and then resend.
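
The distinction between retryable and non-retryable responses can be sketched as follows. This is a minimal illustration, not an official client: the function names, the injected send callable, the attempt limit, and the delay values are all assumptions made for the example, and the loop is only appropriate for requests that are safe to send more than once.

```python
import time
from typing import Callable, Tuple


def is_retryable(status: int) -> bool:
    """Assumption for this sketch: 429 and 5xx are transient, other 4xx are not."""
    return status == 429 or 500 <= status <= 599


def call_with_retries(
    send: Callable[[], Tuple[int, str]],        # hypothetical: returns (status, body)
    max_attempts: int = 4,                      # illustrative limit
    delays: Tuple[float, ...] = (3.0, 10.0, 60.0),  # illustrative waits in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until the status is not retryable or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_retryable(status):
            break  # success, or a client-side error that a retry cannot fix
        # Wait before resending; reuse the last delay if attempts exceed the list.
        sleep(delays[min(attempt - 1, len(delays) - 1)])
        status, body = send()
    return status, body
```

A 400 or 401 response leaves the loop after the first attempt, so the caller can inspect the error message and correct the request instead of resending it.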

Point 2: Consider Adjusting API Client Timeout Settings (Including Intermediary Devices)

Network latency and other factors can make API responses take longer than usual, and an inappropriate timeout on the system calling the API can cause a range of problems. There is no universal recommended value, because it depends on your application and system requirements. Weigh the following points together to choose a value that suits your system.

Frequent Errors and Duplicate Processing from Excessively Short Timeouts
If the timeout is too short, the client may treat a request as failed even when MCE has already processed it successfully. If automatic retries (Point 1) are triggered in this state, it can cause unintended side effects such as duplicate data registration.
Settings That Account for MCE Limits
To reduce this risk, configure the client-side timeout so that it does not expire before the maximum processing time specified in the API Limits and Guidelines.
Considerations for Intermediary Network Devices
API requests typically pass through proxy servers, firewalls, and other intermediary devices within your network, not just the application layer. If connections are terminated before a response is received even though the application-side timeout is sufficient, check the timeout settings of those devices.
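
One way to review these settings together is a small consistency check such as the sketch below. The function name, the device names, and every number used with it are hypothetical; the actual maximum processing time must be taken from the API Limits and Guidelines for the API you call.

```python
from typing import Dict, List


def check_timeouts(
    client_timeout_s: float,
    api_max_processing_s: float,
    intermediary_timeouts_s: Dict[str, float],
) -> List[str]:
    """Return warnings for timeouts that could expire before the API responds."""
    warnings = []
    if client_timeout_s < api_max_processing_s:
        warnings.append(
            f"client timeout {client_timeout_s}s is shorter than the API's "
            f"maximum processing time {api_max_processing_s}s"
        )
    for device, timeout_s in intermediary_timeouts_s.items():
        # A proxy or firewall that gives up early looks like an API failure to the client.
        if timeout_s < api_max_processing_s:
            warnings.append(
                f"{device} timeout {timeout_s}s is shorter than the API's "
                f"maximum processing time {api_max_processing_s}s"
            )
    return warnings
```

For example, with placeholder values, check_timeouts(30, 120, {"proxy": 60, "firewall": 300}) reports the client and the proxy but not the firewall.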

Point 3: Maintain Detailed Logs

To enable quick and accurate investigation when an API request errors, design your system to record detailed communication logs on the API client side. However, from the perspective of security and privacy protection, you must clearly distinguish between information to keep and information to exclude.

Information to Log (Retain as Completely as Possible)

Timestamp, HTTP method, request URL, HTTP status code
Request and response headers and bodies (excluding personal information)

Information NOT to Log (Masking Required)

If the following information appears in plaintext in logs, it creates risks of unauthorized access and data leakage. Mask (e.g., replace with ***) or hash this information before logging. Do not include this information in support inquiries.

Authentication credentials such as Access Tokens and Client Secrets
Personal information (e.g., email addresses, phone numbers, names)
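
Masking before logging might look like the sketch below. It is a simplified illustration under stated assumptions: the header and JSON key names are the ones this example chooses to treat as sensitive, the helper names are hypothetical, and only email addresses are covered as personal information. Phone numbers, names, and any other sensitive fields in your payloads need rules of their own.

```python
import re
from typing import Dict

# Assumptions for this sketch: which headers and JSON keys carry credentials.
SENSITIVE_HEADERS = {"authorization"}
SENSITIVE_BODY_KEYS = ("access_token", "client_secret")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Replace the values of sensitive headers with ***."""
    return {
        name: ("***" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


def mask_body(body: str) -> str:
    """Mask credential values and email addresses in a JSON text body."""
    for key in SENSITIVE_BODY_KEYS:
        # "client_secret": "abc"  ->  "client_secret": "***"
        body = re.sub(rf'("{key}"\s*:\s*")[^"]*(")', r"\1***\2", body)
    return EMAIL_RE.sub("***", body)
```

Applying both helpers at the point where the log line is written, rather than afterwards, keeps plaintext credentials out of log files altogether.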

Responding to 4xx Errors and Opening Support Cases

4xx errors such as 400 Bad Request indicate a problem with the request itself. Start by reviewing the error details in the response body and cross-referencing the API Errors documentation.

If you find no problem in the request and would like our support team to investigate, please mask confidential information and provide the complete request and response data.

Note: If you provide only application-specific logs, or information that has been heavily processed or abridged, we may be unable to determine the exact behavior on the MCE side and therefore unable to offer appropriate insights or solutions.

Point 4: Minimize Token Acquisition as Much as Possible

Acquiring a token (POST /v2/token) for every single process you execute will simply double your request count. A high number of requests increases the likelihood of encountering temporary errors or hitting request limits. Since tokens have a 20-minute expiration period, you should consider reusing an acquired token as much as possible within its active lifespan.
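
Token reuse can be sketched with a small cache like the one below. The class name, the injected fetch_token callable (which would wrap the POST /v2/token request and return the token with its lifetime in seconds), and the 60-second safety margin are assumptions for illustration.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse one access token until shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],  # hypothetical: (token, lifetime_s)
        safety_margin_s: float = 60.0,                 # refresh this long before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._safety_margin_s = safety_margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, requesting a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._safety_margin_s:
            token, lifetime_s = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime_s
        return self._token
```

With a 20-minute lifetime and this margin, a process that calls get() before every request issues one token request roughly every 19 minutes instead of one per API call.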

Point 5: Manage Client Secrets and Plan for Regular Rotation

Client Secrets expire 180 days after generation. To ensure systematic rotation, please track and manage the locations where they are used (Servers, SSJS, etc.), associated installed package names, and expiration dates, and plan for regular updates.

Additionally, hardcoding Client Secrets directly within your code carries risks, such as exposure to MCE administrators or API application developers/managers, and the potential for overlooking necessary updates during rotation. Consider management methods such as using configuration files or secret management services for applications, or storing them as encrypted values in Data Extensions (DEs) for reference when using SSJS.

Reference: Rotate OAuth 2.0 Client Secret
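
For an application running on a server, keeping the secret out of the code and tracking its expiration could look like the following sketch. The environment variable names and function names are hypothetical, and the 180-day lifetime is the figure stated in this article.

```python
import os
from datetime import date, timedelta
from typing import Mapping, Tuple

SECRET_LIFETIME_DAYS = 180  # expiration period stated in this article


def load_client_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read credentials from the environment instead of hardcoding them."""
    try:
        # Variable names are assumptions made for this example.
        return env["MCE_CLIENT_ID"], env["MCE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None


def days_until_expiry(generated_on: date, today: date) -> int:
    """Days left before a secret generated on `generated_on` expires."""
    return (generated_on + timedelta(days=SECRET_LIFETIME_DAYS) - today).days
```

A scheduled job that runs days_until_expiry for each tracked installed package and alerts when the result falls below a chosen threshold gives advance notice of upcoming rotations.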

Point 6: Avoid Direct API Calls from End-User Devices (Browsers or Mobile Apps)

Architectures that execute APIs directly from browsers or mobile app devices involve the following problems. As a general rule, we strongly recommend a "Server-Mediated (Proxy) Architecture."

Security: Storing authentication credentials (Client ID and Secret) on the app side to execute APIs carries a high risk of "spoofing" or "unauthorized use" by malicious third parties. Once embedded in an app, this information is easily extracted, and it is difficult for the customer to completely control or conceal it.
Reduced Maintainability: When API changes or version upgrades are required, you must force all users to update their apps, which incurs significant time and cost for the migration. By using a proxy server, API changes can be absorbed on the server side, enabling flexible system modifications independent of the client. Furthermore, because Client Secrets expire every 180 days, an architecture where the client directly holds the secret forces mobile users to constantly update to the latest version containing a valid secret. This not only complicates release management for the provider but also imposes a continuous update burden on end users.
Negative Impact on User Experience (UX): When requests are made directly from a device, momentary network interruptions or API service maintenance can directly cause screen freezes or inconsistent data displays. By utilizing caching or controlling alternative displays on the proxy server side, the impact on end users can be minimized.

Point 7: Optimize Request Frequency and Handle 429 Errors (Rate Limiting)

To maintain the stable operation of the overall system, the API returns an HTTP 429 Too Many Requests error and temporarily restricts (throttles) requests when a large volume of requests is made from a specific account in a short period. In such cases, consider the following for stable utilization:

Respect the Retry-After Header: If a 429 error is returned, check the response header for the Retry-After value (in seconds). If you ignore this wait time and retry immediately, you may repeatedly receive 429 errors.
Traffic Leveling (Load Distribution): API rate limit thresholds are dynamic. Even if the total traffic volume is within limits, "spikes" caused by concentrated requests at a specific moment (e.g., exactly at XX:00:00) can trigger limits. We recommend leveling the load by staggering processing start times.
Reducing Request Count (Batching): In addition to avoiding redundant token acquisitions (Point 4), consider whether requests can be consolidated into batch operations. For example, use "Send email message to multiple recipients" to reach several recipients in a single request.

For additional guidance, refer to: Prevent Rate-Limiting
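
Reading the Retry-After value and staggering start times can be sketched as below. Both helper names are hypothetical, the header is assumed to carry a whole number of seconds as described above, and the width of the staggering window is a value you would choose for your own workload.

```python
import random
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Retry-After wait in seconds, or None if absent or not a whole number."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return max(0, int(value.strip()))
            except ValueError:
                return None
    return None


def staggered_start_offset(window_s: float, rng: Optional[random.Random] = None) -> float:
    """Pick a random delay within window_s so jobs do not all start at XX:00:00."""
    rng = rng or random.Random()
    return rng.uniform(0.0, window_s)
```

A job scheduled for the top of the hour could sleep for staggered_start_offset(300.0) seconds before its first request, which spreads the load across five minutes.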

Recommended Information to Provide When Opening a Support Case

When requesting an investigation regarding API errors, please provide the following to enable prompt and accurate root cause analysis. If the error has occurred multiple times, 2–3 representative instances of the same error are sufficient.

Basic Request Information (Timestamp, Method, URL)
Required to determine when, where, and what operation was performed. Include the request timestamp (with time zone), URL, and HTTP method.
Detailed HTTP Communication Logs (Headers and Body)
Provide all headers and the body for both the request and response. This is the primary information needed to identify the specific cause of an error (such as configuration issues or limit violations).

[NOTE]

*For security, be sure to mask (e.g., replace with ***), hash, or remove Access Tokens, Client Secrets, and personal information.

*For sporadic 500 errors, as described below, temporary issues are a common cause. A detailed investigation is most productive when a clear increase in error frequency is observed. If that is the case, please report the approximate trend — such as when it began and by how much the frequency has increased.

Common Situation 1: Discrepancies in Error Counts Between API Client and Server

In some cases, the error details or occurrence counts recorded on the API client side may differ from the records on our (MCE) side.

Common Example

500 errors logged on the API client side: 10 occurrences
500 errors recorded on the MCE side: 0 occurrences

In such cases, the most likely explanation is that a proxy server or intermediary device (WAF, API Gateway, etc.) within your network is independently generating errors due to timeouts or other reasons, and terminating the connection before the request reaches MCE.

The detailed communication logs recommended in Point 3 are useful for determining whether the error source is MCE or an intermediary device. For example, if the response body contains information that is atypical of MCE API errors and appears to originate from a network device, we recommend having your network administrator review the logs of route devices before contacting support.
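
A first-pass check of a logged error response might be sketched as follows. It rests on an assumption that should be verified against real responses from your environment: that errors returned by the API itself carry a JSON object body, whereas errors generated by intermediary devices often carry HTML or plain text. The function names and result labels are hypothetical, and a JSON body does not rule out an API gateway as the source.

```python
import json


def looks_like_json_object(body: str) -> bool:
    """True if the body parses as a JSON object."""
    try:
        return isinstance(json.loads(body), dict)
    except (ValueError, TypeError):
        return False


def likely_error_source(content_type: str, body: str) -> str:
    """Rough hint only: label a logged error response for further review."""
    if "json" in content_type.lower() and looks_like_json_object(body):
        return "consistent-with-api"
    # HTML error pages or empty bodies suggest a proxy, WAF, or gateway.
    return "possibly-intermediary"
```

Responses labeled "possibly-intermediary" are the ones worth passing to your network administrator first.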

Common Situation 2: Detailed Investigation Requests Regarding HTTP 500 Errors

HTTP 500 errors can be triggered by a wide range of external factors—including unstable internet routes, momentary interruptions of intermediary devices, or client-side timeout settings—in addition to temporary internal issues within our service.

Many of these cases are difficult to pinpoint, or even if identified, have no solution other than "retrying." To ensure we can prioritize resources for more critical errors, please categorize the event based on the following criteria before requesting an investigation.

1. Transient Events (Cases where identifying the root cause is difficult)

These are errors that are "statistically inevitable" due to factors like temporary network congestion. Even with a post-mortem investigation, these events often leave no clear evidence in system logs, making it extremely likely that the investigation will conclude as "Reason Unknown."

Patterns of Occurrence:
- Low Frequency / Non-consecutive: Only a few requests fail within several minutes, or occur irregularly a few times over several days.
- Non-reproducible: The request succeeds normally when retried under the same conditions.
Our Policy on These Cases: Tracking low-frequency, sporadic events requires significant manual effort to isolate specific communications from vast system logs. Furthermore, most of these cases do not stem from persistent issues like system failures within MCE. Consequently, individual investigations are likely to conclude as "Reason Unknown," "Unavoidable Transient External Factor," or "Localized Overload." In many cases, providing a response will take time, and the ultimate guidance will be a recommendation to retry.
Recommended Action: Consider implementing automatic retry logic using algorithms such as Exponential Backoff within your application. This allows for automatic recovery and ensures stable operation with minimal business impact. Rather than tracking individual successes or failures in isolation, we recommend monitoring the trend of error counts—focusing on whether there is a clear upward trend compared to before or if the error content being returned is something never seen before. (This also helps determine if the situation has escalated to Category 2 below).

2. Persistent or Reproducible Events (Cases where investigation is effective)

These are cases where a clear abnormal trend is observed, and there is a high probability of identifying the cause through server-side log analysis or process scrutiny.

Patterns of Occurrence:
- High Frequency / Persistent: The 500 error rate has clearly risen above normal levels and continues without self-healing.
- Full Reproducibility: The error occurs 100% of the time (or with very high probability) with specific operations or parameter combinations.
Our Policy on These Cases: These cases suggest persistent issues such as system faults, resource overload, or logical errors in the API. To identify the cause promptly, please verify the situation and submit a support case along with the "Recommended Information to Provide" mentioned above.
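
The two categories can be told apart by comparing the recent error rate with a baseline, as in the sketch below. The function name and both thresholds (a threefold rise and a minimum of five errors) are hypothetical values chosen for illustration, not criteria used by support.

```python
def categorize_500_errors(
    recent_errors: int,
    recent_requests: int,
    baseline_error_rate: float,
    rise_factor: float = 3.0,  # hypothetical: how far above baseline counts as a clear rise
    min_errors: int = 5,       # hypothetical: ignore a handful of isolated failures
) -> str:
    """Return "persistent" when errors clearly exceed the baseline, else "transient"."""
    if recent_requests <= 0:
        raise ValueError("recent_requests must be positive")
    recent_rate = recent_errors / recent_requests
    if recent_errors >= min_errors and recent_rate > baseline_error_rate * rise_factor:
        return "persistent"
    return "transient"
```

With a baseline rate of 0.05%, 2 errors in 10,000 requests is classified as transient, while 200 errors in 10,000 requests is classified as persistent and is worth reporting together with when the increase began.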

Additional Notes:

The transient and sporadic network errors described in this section are common to all network-based communications, including SFTP as well as APIs. (Refer to the article on SFTP.)
Our investigations primarily focus on verifying the status and functionality of the API-related services provided by our company.

FAQ

Q) Are there any recommended retry strategies?

A) While it depends on the use case, we recommend exponential backoff — gradually increasing the interval between retry attempts (e.g., retry after 3 seconds, then 10 seconds, then 60 seconds) — optionally combined with random jitter. If a Retry-After header (indicating the wait time before retrying) is returned with a 429 error, prioritize that value over your own backoff calculation and wait for the specified duration.
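
As a rough sketch of that strategy, the delay before each retry could be computed as follows. The function name, the base of 3 seconds, the multiplier of 3, and the 60-second cap are illustrative assumptions, and placing the jitter in the upper half of each interval is only one of several common choices.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,                        # 1 for the first retry, 2 for the second, ...
    base_s: float = 3.0,                 # illustrative starting interval
    factor: float = 3.0,                 # illustrative growth per attempt
    cap_s: float = 60.0,                 # illustrative upper bound
    retry_after_s: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before the given retry attempt."""
    if retry_after_s is not None:
        return retry_after_s  # a server-provided wait takes priority
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * factor ** (attempt - 1))
    # Jitter: pick a random point in the upper half of the interval.
    return rng.uniform(ceiling / 2, ceiling)
```

With these defaults the upper bounds grow as 3, 9, 27, and then 60 seconds, and the random component keeps many clients from retrying at the same instant.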

Q) Where can I find information about limits such as request caps and timeouts?

A) Basic limitations for our API are documented in the "API Limits and Guidelines" below. However, even within these limits, temporary restrictions (HTTP 429 Too Many Requests) may be applied if a large number of high-load requests are executed in a short period.

Reference 1: API Limits and Guidelines
Reference 2: Rate Limiting Errors

Q) What is the specific "limit" or threshold that triggers a 429 error?

A) There is no fixed, published limit. The throttling threshold varies dynamically based on the overall system load at any given moment and the specific processing requirements of the request (e.g., database load). Therefore, you may receive a 429 response even if your API traffic volume is at its usual levels.
If 429 errors persist despite implementing Point 4 (Token reuse) and Point 7 (Peak distribution), please contact Support with details regarding the occurrence and detailed logs.

Q) If an error occurs during data registration, updates, automation, or email sends, is it okay to retry as-is?

A) No. Performing a simple retry upon an error can lead to risks such as duplicate sends (sending the same message multiple times) or data inconsistency (unintended updates or sends occurring because the data changed by the time of the retry).

Key Considerations for Retries: If the API client treats a request as an error (e.g., due to a timeout) without waiting for a response from MCE, there is a possibility that the process succeeded on the MCE side. When implementing automatic retries, you must ensure that they do not lead to duplicate sends or executions.

Specific Example:

Transactional Messaging API (Email): In this API, if the client does not specify a messageKey, the server generates one automatically. If the client retries after the server has already accepted the original request, there is a risk of duplicate sends. (If the client does specify the key, retrying with the same key results in an error, because that key cannot be reused within a certain period.)

Recommended Measures:

1. Obtain the messageKey from the send request response.
2. Use Retrieve status of an email message to confirm the send result.
3. Execute a retry only if it is confirmed as unsent.
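
The check-before-retry flow can be sketched as below. The function and both injected callables are hypothetical: is_sent would wrap the "Retrieve status of an email message" call and interpret its result, and resend would issue the send request again. No endpoint paths or status values are assumed here.

```python
from typing import Callable


def retry_if_unsent(
    message_key: str,
    is_sent: Callable[[str], bool],  # hypothetical wrapper around the status lookup
    resend: Callable[[], None],      # hypothetical wrapper around the send request
) -> bool:
    """Resend only when the status lookup confirms the message was not sent.

    Returns True if a resend was issued, False if the original send had succeeded.
    """
    if is_sent(message_key):
        return False  # already sent: resending would create a duplicate
    resend()
    return True
```

Keeping the status lookup and the resend behind separate callables makes it straightforward to log which branch was taken for each messageKey.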

Q) Can you assist with implementing the API client, for example, how to write or configure it in Python or cURL?

A) We are unable to provide official guidance on implementation details specific to programming languages or external tools in support cases. However, if you suspect that an error related to a specific implementation is caused by MCE behavior, please contact us with your isolation findings and the raw HTTP request/response logs captured during the issue.