Python Data Stream Retrievals

You are here:

Python Data Stream Retrievals

Python scripts are an effective, fast growing and easy to use technology, and they are now supported in Marketing Cloud Intelligence as a Retrieval Method for TotalConnect Data Streams. Although in some cases the same results can be achieved using HTTP requests, Python retrievals are more robust while still being flexible, powerful, and relatively easy to set up.

Setup

Being a data source connection method, the same procedure is followed as when connecting any new data source, which means you’ll first have to make sure you’re in the Connect & Mix tab. Under the Data Stream List section, click Add New.

Scroll down to the Technical Vendors section and start the Data Stream setup process from there.

Alternatively, you can also start the setup process by choosing to create a TotalConnect Data Stream and then setting Python as the Retrieval Method.

What Can It Be Used For?

Python can be useful for retrieving data from different sources in different formats. The fact that Python is a script makes it flexible in dealing with varying formats, requirements, and scenarios. Below are some examples of use cases that are best addressed by using a Python-based connector:

Retrieving data via an API connection where multiple HTTP calls are required in order to handle any of the following cases:
- Obtaining an OAuth 2.0 token.
- Retrieving Entities.
- Looping Through Calls.
- Paginating through the response.
Parsing the data during the retrieval process itself (prior to having it available in the mapping screen) so that it’s easier to work with within Marketing Cloud Intelligence.

Technical Implementation

Execution Requirements

Version: Python 3.9
Processing Limit: Maximum 10 minutes.
Memory Limit: Maximum 1.5 GB
Output: The script must generate a dataset compatible with a Data Stream.

Blocked Libraries

These libraries can’t be used. Attempting to import these results in script failure:

os
subprocess
shutil
pty
sys
importlib
ctypes
socket
pickle
marshal
code
runpy
multiprocessing
signal
commands
pdb
webbrowser
antigravity
turtle
tempfile
pathlib
glob
fcntl
resource
posix
_thread
threading

Blocked Functions & Patterns

These built-in functions and access patterns are restricted:

Functions: exec(), eval(), and compile()

Patterns: importlib.import_module() __import__() (direct use) and __builtins__

We recommend that you develop your script and test it externally (in your own Python environment) so that it’s easier for you to validate and debug. You can use the publicly available Marketing Cloud Intelligence Package to build your script and test it. Use the “pip install” command to install the Marketing Cloud Intelligence Package on your local environment. Note that using ‘save_csv’ and ‘save’ in your local environment will print the results instead of saving them to Marketing Cloud Intelligence. Do the following:

Start by Importing the relevant Packages and include the Marketing Cloud Intelligence package.
Build the logic as per your requirements using the Supported Packages.
Use the Marketing Cloud Intelligence Package to programmatically write rows to a csv file, which will be uploaded to the Data Stream (see below a table with a summary of the functions supported in our Package).

Supported Package Functions
Function	Input	Description
`add_row()`	list(array) of the values in a specific row	Use this function to write a single row of data to a csv file.
`add_rows()`	list of several rows' values	Use this function to write multiple rows to a csv file.
`save()`		Use this to end the script and save the final csv file to Marketing Cloud Intelligence. File is uploaded to your data stream.
`save_csv()`	a csv formatted string of the full input file	Use instead of the above functions in case you have a csv-formatted string that consists of all of the data you want to write to Marketing Cloud Intelligence.
`log()`	Text	For debugging purposes you can use this function to print text to the Data Stream log.

There are two different ways to write Python scripts:

Internal - The Python module is read via Marketing Cloud Intelligence data streams.
External - The script is run via a local IDE.

The examples below show how to write internal scripts. If you are writing external scripts, you must write the scripts as follows:

Call the datorama module via “from datorama.datorama import *” instead of “import datorama”
Remove “dataorama.” from the beginning of each function.

Here you can see a live example of how this looks:

from datorama.datorama import *
 
#Generating a list of values for our headers' row
row_values_list = ['date','campaign','impression','clicks']
 
# Writing a new row to the csv input file from values list
add_row(row_values_list)

Example 1 - using add_row() and save():

import datorama 
#Generating a list of values for our headers' row 
row_values_list = ['date','campaign','impression','clicks'] 
# Writing a new row to the csv input file from values list 
datorama.add_row(row_values_list) 
#Generating two more lists containing row values 
#Note that the values of a specific column are in the same position in their row's list as the position of their header in the headers' row 
row2 = ['01-12-2017','Campaign 1',15,6] row3 = ['02-12-2017','Campaign 2',20,10] 
# Creating a list of the two rows we want to write at once 
multiple_rows = [row2,row3] 
# Writing several rows at once to the csv by sending a list of multiple values' lists 
datorama.add_rows(multiple_rows) 
# Saving the csv file to upload it to our data stream 
datorama.save()

Example 2 - using save_csv():

import datorama 
#Generate the exact same file by providing a csv formatted string of the file's content 
csv_string = '''"date","campaign","impression","clicks"\n\ "01-12-2017","Campaign 1", 15, 6\n\ "02-12-2017","Campaign 2", 20, 10'''
#Saving the csv file, based on the input string 
datorama.save_csv(csv_string)

Both scripts will produce the following csv file:

Date	Campaign	Impressions	Clicks
01-12-2017	Campaign 1	15	6
02-12-2017	Campaign 2	20	10

Supported Python Packages

The Python Retrieval Method supports a limited list of Python Packages, which are commonly used for handling data retrieval. Importing and using a non-supported Package will result in in script failure. If your needs cannot be met by using one of the existing Supported Packages, and can only be met by using a Package that is not currently supported, please log a Feature Request with your CSM so that the requested package can be considered to be added to the Supported Packages list in the future. The Supported Packages are those supported by ‘AWS Lambda’ as well as the ones shown below:

Requests
re
ftplib
Pandas
oauthlib
cloudstorage
boto3
requests_oauthlib
webapp2
boto
Numpy
logging
json
tweepy
datetime
oauth2client
googleapiclient
apiclient
httplib2
Googleads
pyasn1
Facebookads
tabula-py (the syntax to import this package is: 'import tabula')
xlrd
mlxtend
Selenium
simple_salesforce
scrapy
time
base64
crypt

Once you’ve validated your script, click ‘Next’ to retrieve a file, and to proceed to the Mapping screen where you map the file the same way you would map any other TotalConnect Data Stream.

Examples

This script retrieves data about Demographic Statistics, By Zip Code, published by data.cityofnewyork.us

import datorama import urllib2
     import jsonurl = "https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.json?accessType=DOWNLOAD"
     req = urllib2.Request(url)
     response = urllib2.urlopen(req)
     response_str = response.read().decode('utf-8')
     response_json = json.loads(response_str)
     columns_js = response_json["meta"]["view"]["columns"]
     rows_list = response_json["data"]
     headers = []
     for column in columns_js: headers.append(column["name"])
     datorama.add_row(headers)
     datorama.add_rows(rows_list)
     datorama.save()

This script retrieves publicly available data from Twitter for different Screen Names, with Metrics such as Follows, Comments, etc. Using it requires having a valid App registered with Twitter:

import datoramaimport tweepyimport datetimeimport jsonimport 
     
     pandas as pdscreen_names = ["NASCAR", "F1", "PGATOUR", "MotoGP", "IndyCar", "ufc", "WWE"]
     report_fields = ["screen_name", "statuses_count","followers_count","friends_count","listed_count","favourites_count" ]
     #Twitter API 
     credentialsconsumer_key = ""consumer_secret = ""access_key = ""access_secret = "" 
     #Calculating the date of the run
     now = datetime.datetime.now()
     date = now.strftime("%d-%m-%Y")
     #authorize twitter, initialize 
     tweepyauth = tweepy.OAuthHandler(consumer_key, consumer_secret)
     auth.set_access_token(access_key, access_secret)
     api = tweepy.API(auth)
     full_report = pd.DataFrame()
     #Pulling the report from Twitter per screen namefor screen_name in screen_names: 
     get_user = api.get_user(screen_name = screen_name) 
     #Using pandas DataFrame to manipulate the report 
     df = pd.DataFrame(get_user._json) 
     #Filtering the report only on requested fields 
     requested_df = df[report_fields] 
     #Adding the screen name to the data set 
     requested_df["account"] = screen_name 
     #Appending the current screen name's data set to the full report of all screen names 
     full_report =full_report.append(requested_df, ignore_index=True)
     #Adding the run date in the "date" column of the data 
     setfull_report["date"] = date
     #Creating a csv string of the data 
     setcsv_string = full_report.to_csv()
     #Saving the csv 
     datorama.save_csv(csv_string)

Did this article solve your issue?

Let us know so we can improve!

Python Data Stream Retrievals

Setup

What Can It Be Used For?

Technical Implementation

Blocked Functions & Patterns

Example 1 - using add_row() and save():

Example 2 - using save_csv():

Supported Python Packages

Examples

General Information

Required Cookies

Functional Cookies

Advertising Cookies

General Information

Required Cookies

Functional Cookies

Advertising Cookies

Cookie List

Product Area

Feature Impact

Edition

Experience

Python Data Stream Retrievals

Setup

What Can It Be Used For?

Technical Implementation

Blocked Functions & Patterns

Example 1 - using add_row() and save():

Example 2 - using save_csv():

Supported Python Packages

Examples