You are here:
Python Data Stream Retrievals
Python scripts are an effective, fast growing and easy to use technology, and they are now supported in Marketing Cloud Intelligence as a Retrieval Method for TotalConnect Data Streams. Although in some cases the same results can be achieved using HTTP requests, Python retrievals are more robust while still being flexible, powerful, and relatively easy to set up.
Setup
Being a data source connection method, the same procedure is followed as when connecting any new data source, which means you’ll first have to make sure you’re in the Connect & Mix tab. Under the Data Stream List section, click Add New.
Scroll down to the Technical Vendors section and start the Data Stream setup process from there.
Alternatively, you can also start the setup process by choosing to create a TotalConnect Data Stream and then setting Python as the Retrieval Method.
What Can It Be Used For?
Python can be useful for retrieving data from different sources in different formats. The fact that Python is a script makes it flexible in dealing with varying formats, requirements, and scenarios. Below are some examples of use cases that are best addressed by using a Python-based connector:
- Retrieving data via an API connection where multiple HTTP calls are required in order to
handle any of the following cases:
- Obtaining an OAuth 2.0 token.
- Retrieving Entities.
- Looping Through Calls.
- Paginating through the response.
- Parsing the data during the retrieval process itself (prior to having it available in the mapping screen) so that it’s easier to work with within Marketing Cloud Intelligence.
Technical Implementation
Execution Requirements
- Version: Python 3.9
- Processing Limit: Maximum 10 minutes.
- Memory Limit: Maximum 1.5 GB
- Output: The script must generate a dataset compatible with a Data Stream.
Blocked Libraries
These libraries can’t be used. Attempting to import these results in script failure:
ossubprocessshutilptysysimportlibctypessocketpicklemarshalcoderunpymultiprocessingsignalcommandspdbwebbrowserantigravityturtletempfilepathlibglobfcntlresourceposix- _
thread threading
Blocked Functions & Patterns
These built-in functions and access patterns are restricted:
Functions: exec(), eval(),
and compile()
Patterns: importlib.import_module()
__import__() (direct use) and __builtins__
We recommend that you develop your script and test it externally (in your own Python environment) so that it’s easier for you to validate and debug. You can use the publicly available Marketing Cloud Intelligence Package to build your script and test it. Use the “pip install” command to install the Marketing Cloud Intelligence Package on your local environment. Note that using ‘save_csv’ and ‘save’ in your local environment will print the results instead of saving them to Marketing Cloud Intelligence. Do the following:
- Start by Importing the relevant Packages and include the Marketing Cloud Intelligence package.
- Build the logic as per your requirements using the Supported Packages.
- Use the Marketing Cloud Intelligence Package to programmatically write rows to a csv file, which will be uploaded to the Data Stream (see below a table with a summary of the functions supported in our Package).
| Function | Input | Description |
|---|---|---|
add_row()
|
list(array) of the values in a specific row | Use this function to write a single row of data to a csv file. |
add_rows()
|
list of several rows' values | Use this function to write multiple rows to a csv file. |
save()
|
Use this to end the script and save the final csv file to Marketing Cloud Intelligence. File is uploaded to your data stream. | |
save_csv()
|
a csv formatted string of the full input file | Use instead of the above functions in case you have a csv-formatted string that consists of all of the data you want to write to Marketing Cloud Intelligence. |
log()
|
Text | For debugging purposes you can use this function to print text to the Data Stream log. |
There are two different ways to write Python scripts:
- Internal - The Python module is read via Marketing Cloud Intelligence data streams.
- External - The script is run via a local IDE.
The examples below show how to write internal scripts. If you are writing external scripts, you must write the scripts as follows:
- Call the datorama module via “from datorama.datorama import *” instead of “import datorama”
- Remove “dataorama.” from the beginning of each function.
Here you can see a live example of how this looks:
from datorama.datorama import *
#Generating a list of values for our headers' row
row_values_list = ['date','campaign','impression','clicks']
# Writing a new row to the csv input file from values list
add_row(row_values_list)
Example 1 - using add_row() and save():
import datorama
#Generating a list of values for our headers' row
row_values_list = ['date','campaign','impression','clicks']
# Writing a new row to the csv input file from values list
datorama.add_row(row_values_list)
#Generating two more lists containing row values
#Note that the values of a specific column are in the same position in their row's list as the position of their header in the headers' row
row2 = ['01-12-2017','Campaign 1',15,6] row3 = ['02-12-2017','Campaign 2',20,10]
# Creating a list of the two rows we want to write at once
multiple_rows = [row2,row3]
# Writing several rows at once to the csv by sending a list of multiple values' lists
datorama.add_rows(multiple_rows)
# Saving the csv file to upload it to our data stream
datorama.save()
Example 2 - using save_csv():
import datorama
#Generate the exact same file by providing a csv formatted string of the file's content
csv_string = '''"date","campaign","impression","clicks"\n\ "01-12-2017","Campaign 1", 15, 6\n\ "02-12-2017","Campaign 2", 20, 10'''
#Saving the csv file, based on the input string
datorama.save_csv(csv_string)
Both scripts will produce the following csv file:
| Date | Campaign | Impressions | Clicks |
|---|---|---|---|
| 01-12-2017 | Campaign 1 | 15 | 6 |
| 02-12-2017 | Campaign 2 | 20 | 10 |
Supported Python Packages
The Python Retrieval Method supports a limited list of Python Packages, which are commonly used for handling data retrieval. Importing and using a non-supported Package will result in in script failure. If your needs cannot be met by using one of the existing Supported Packages, and can only be met by using a Package that is not currently supported, please log a Feature Request with your CSM so that the requested package can be considered to be added to the Supported Packages list in the future. The Supported Packages are those supported by ‘AWS Lambda’ as well as the ones shown below:
- Requests
- re
- ftplib
- Pandas
- oauthlib
- cloudstorage
- boto3
- requests_oauthlib
- webapp2
- boto
- Numpy
- logging
- json
- tweepy
- datetime
- oauth2client
- googleapiclient
- apiclient
- httplib2
- Googleads
- pyasn1
- Facebookads
- tabula-py (the syntax to import this package is: 'import tabula')
- xlrd
- mlxtend
- Selenium
- simple_salesforce
- scrapy
- time
- base64
- crypt
Once you’ve validated your script, click ‘Next’ to retrieve a file, and to proceed to the Mapping screen where you map the file the same way you would map any other TotalConnect Data Stream.
Examples
This script retrieves data about Demographic Statistics, By Zip Code, published by data.cityofnewyork.us
import datorama import urllib2
import jsonurl = "https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.json?accessType=DOWNLOAD"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
response_str = response.read().decode('utf-8')
response_json = json.loads(response_str)
columns_js = response_json["meta"]["view"]["columns"]
rows_list = response_json["data"]
headers = []
for column in columns_js: headers.append(column["name"])
datorama.add_row(headers)
datorama.add_rows(rows_list)
datorama.save()This script retrieves publicly available data from Twitter for different Screen Names, with Metrics such as Follows, Comments, etc. Using it requires having a valid App registered with Twitter:
import datoramaimport tweepyimport datetimeimport jsonimport
pandas as pdscreen_names = ["NASCAR", "F1", "PGATOUR", "MotoGP", "IndyCar", "ufc", "WWE"]
report_fields = ["screen_name", "statuses_count","followers_count","friends_count","listed_count","favourites_count" ]
#Twitter API
credentialsconsumer_key = ""consumer_secret = ""access_key = ""access_secret = ""
#Calculating the date of the run
now = datetime.datetime.now()
date = now.strftime("%d-%m-%Y")
#authorize twitter, initialize
tweepyauth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
full_report = pd.DataFrame()
#Pulling the report from Twitter per screen namefor screen_name in screen_names:
get_user = api.get_user(screen_name = screen_name)
#Using pandas DataFrame to manipulate the report
df = pd.DataFrame(get_user._json)
#Filtering the report only on requested fields
requested_df = df[report_fields]
#Adding the screen name to the data set
requested_df["account"] = screen_name
#Appending the current screen name's data set to the full report of all screen names
full_report =full_report.append(requested_df, ignore_index=True)
#Adding the run date in the "date" column of the data
setfull_report["date"] = date
#Creating a csv string of the data
setcsv_string = full_report.to_csv()
#Saving the csv
datorama.save_csv(csv_string)
