This post will walk you through how to use the Marketo merge leads REST API endpoint to merge leads in bulk with Python code that will allow you to set custom rules for how you want to prioritize conflicting values for the same fields on different leads.
If it is your first time using the Marketo REST API or you need a quick refresher then check out the Marketo API Quick-Start Guide to see how to make your first Marketo REST API requests in Postman before transitioning to making requests in code or in the Zapier automation tool.
You can find the main Python script and all the supporting functions linked in the paragraphs below in the Bulk Merge Github repository.
Getting Access Token
Marketo Merge Leads
The mergeLead function makes a post request to the Marketo merge leads endpoint and takes in the winning lead id, an array of losing lead ids, and a boolean value to indicate whether or not the leads should be merged in your CRM tool e.g. Salesforce.
When the “mergeInCRM” parameter is set to false the winning lead and all losing lead ids can be merged in a single call. However, when this parameter is set to true Marketo is restricted to merging only 2 leads at a time so if there is a triple duplicate i.e. three leads with the same email address, then the winner must first be merged with one of the losers and once this is complete the new resultant merged lead must then be merged with the other loser. Therefore, to achieve this functionality a for loop is used to iterate through each of the loser ids and successively merge them with the winning id.
Rules for Winning Field Values
When manually merging leads in Marketo with discrepancies between the lead field values, the interface allows you to choose which values you want to win and therefore keep on the resultant merged lead. The Priority.py script does the exact same thing, using custom rules specified by you to determine which values are prioritized for lead fields in a merge.
To do this it uses the following functions to compare the disparate values of the leads for the same field.
Get First Created Date
createdAt function returns the earliest created at date and the list index at which it occurs
Get Maximum Lead Score
leadScore function returns the maximum score and the list index at which it occurs
Get First Non-Null Value
notNull function returns the first non-null value to appear in the list, excluding null-like values such as “empty”, “unknown”, “n/a”, and ‘none’. If no such values exist this function returns the value and index of the first list element.
Define Field Value Rankings
priority function makes use of a dictionary, which contains a list for each field where the list is sorted from values of highest priority to lowest priority e.g.
'leadSource': ["Advertising", "Paid Search", "Organic"].
The list of prioritized values is then iterated through starting from the highest priority value e.g. “Advertising”, and each value from the prioritized list is compared to each lead value. If either of the lead values match the prioritized value then the value and index of this lead value are returned.
Else the for loop continues with the next highest priority value to see if this is present in the lead values. In this way, values for a lead field can be prioritized according to a custom user-specified order so that these values will be retained on the merged lead.
Get First False Value
boolTest function was built for the “Unsubscribed” and “Account Blocked” boolean fields where the priority was to retain a TRUE value for either of these fields if they existed e.g. you do not want to email the merged lead if one of the constituent leads was unsubscribed. Therefore, this function returns the value and index of the first TRUE value in the list of lead values.
Call Appropriate Prioritization Function
ruler function gets called from the main program within the Priority.py script and then takes the list of lead values given for the field being evaluated and passes them into the rules dictionary. The rules dictionary maps a field to the function (one of the functions described above) that should be called to get the highest priority value for that field from the lead values that were passed in.
The appendDict function does what it says on the tin. It takes an input dictionary and for each value this dictionary has for a key it will append each value to the value(s) already in the master dictionary for the same key.
Updating Winning Lead Values
The createUpdateLead function makes a post request to the create/update lead endpoint to update a lead with the winning field values after the merge is complete. The winning lead id and winning field values are passed in JSON format to the payload of the request to make the update.
Although this endpoint allows for updating 300 leads at once in a single call, it was found to be safer to update each winning lead with the winning field values straight after the merge was complete in case the script were to fail before 300 leads were merged meaning that none of the winning leads would have been updated.
Marketo Merge Leads Main Script
Getting Lead Data & Access Token
Before running the main merging script, go to the “Possible Duplicates” system smart list in Marketo, sort by email address (the script assumes that the input data is sorted by email), and then export all the duplicates from a view that contains all the fields you are interested in setting up priority rules for when merging.
N.B. This script was originally configured to pull lead data using the API, which returns createdAt date in ISO 8601 (2016-11-03T19:17:57Z) standard and boolean values as either “TRUE” or “FALSE”. When exporting from Marketo as a CSV the date needs to be converted to ISO 8601 standard and the 1’s or 0’s representing boolean values need to be converted to “TRUE” or “FALSE”. Importing the CSV into a sheet tool will get this done quickly or you can modify the
boolTest functions in the priority script to work with the default exported values. To save on API calls and make the script faster to run it is recommended to pull in the lead information via CSV rather than relying on the API to pull in lead information from lead ids.
Once the data has been extracted from Marketo it can then be imported into the main script as a dictionary using the Pandas library in Python. Before looping through this dictionary of duplicates, it is necessary to get an access token from Marketo and ensure that there are more than 60secs left on the token life so that there is enough time to complete the merge and update the winning lead in a single iteration of the loop. This is achieved using nested while loops with the inner while loop checking that token life is greater than 60 secs before every run and if this is not true the outer while loop will wait until the access token has expired and then get a new access token with 3600secs of life.
Getting, Storing, & Prioritizing Lead Values
Since the input dictionary is sorted by email address the field values from successive rows with the same email are pulled into the
field_dict dictionary which stores the lead values using the field names as the keys. Once this is done the comparison of the lead values for each field is achieved using a for loop to iterate through each one of the field keys in
If the values for a particular field match then this value is put in the
final_dict dictionary for that field, otherwise the field values are evaluated using the custom rules declared in the Priority.py script using the
ruler function (see the “Rules for Winning Field Values” section above)
The first non-null
sfdcLeadId value is stored in
final_dict and the corresponding index can be used to pull in the
'mcUserId__c" values from the same lead who has this
sfdcLeadId value and who will be the winner of the merge. If none of the leads exist in Salesforce then the
createdAt values of the lead who was created first are stored in
"Lead_Source_Detail__c" and the three
"utm_xxx__c" values are obtained from the lead who has the highest priority
Marketo Merge Leads & Update Winner
Once all the prioritized values have been stored in
final_dict, they are logged in a timestamped text file along with the
field_dict values before the script proceeds to merge the two leads by making a request to the Marketo merge leads endpoint.
The response of the merge is logged and then if the merge is successful the newly merged lead will be updated with the prioritized values for the fields stored in
final_dict. Sometimes even though the script logic selects a winning id (see section above) Marketo has an overriding rule that if a lead and contact are being merged in SFDC then the contact id will always win.
Therefore in some cases, the supposed “winning” lead id specified in
final_dict will be the losing id if this “winning” id was a lead and the other id corresponded to a contact. If this is the case
"status":"skipped" will be returned in the response meaning it is necessary to update one of the other lead ids stored in
field_dict with the prioritized values using a while loop to iterate over the remaining ids until the actual winner is found and updated.
You can avoid the need for this while loop and ensure that the winner selected in your Python logic always matches that of Marketo’s inbuilt logic. This can be done by modifying the main merging script to pull in the
sfdcType field to ensure that an SFDC contact will always be selected as the winner if the other people to be merged are SFDC leads.
N.B. Out of the 13,000 leads I merged with this script there were a stubborn 1000 leads who would not be merged because of a vague 611 error returned from SFDC and a lot of these just had to be merged manually. If you are encountering this error and have too many leads to merge manually I can give some tips to try that showed some success in merging leads with a previous 611 error. Just leave a comment below 🙂
Rinse & Repeat
After merging, a 0.2-second delay is implemented so that Marketo’s REST API rate limit of 100 calls in 20 seconds is not exceeded. Finally, the remaining time on the token is calculated at the end of the inner while loop and if this time is greater than 60secs and there are still more leads to merge then the next iteration of this while loop will proceed.
If the remaining time is below 60secs, the script breaks out into the outer while loop and then waits for this time to elapse before requesting a new access token and diving back into the inner while loop. This process will be repeated until all the leads in the CSV file have been merged.
Take a look at the merging script in action by viewing a sample output file from the merging script.
Marketo Merge Leads Automatically Upon Creation
Great! You are now able to clean up that HUGE pile of duplicates that has been eyeballing you for the past year by taking this GitHub code and making it your own.
In the scenario, where you are not able to stop the cause of Marketo duplicates or you are in the process or working on a fix and you do not want these duplicates to build up then you can merge leads automatically upon creation using Zapier.
Take a look at the Marketo Merge Automatically with Zapier post to see how you can use the Marketo merge leads endpoint that you know and love so well to merge leads as soon as they are created in Marketo.