Sitecore “Batch” Connector, part 2: Import Content into Sitecore XM/XP

Introduction

This is the second post in my “Sitecore Batch Import” series. It describes the “Batch Importer” part of the solution, which imports previously downloaded JSON content into Sitecore XM/XP.

For an overview of the challenge and the overall solution, please refer to Sitecore “Batch” Connector, part 1: Download Content from Content Hub.


Import Approach

I’m using Sitecore PowerShell Extensions (SPE) to read, parse, and import JSON content into Sitecore XM/XP. A scheduled PowerShell task acts as the trigger: it wakes up every XX minutes and runs the script described below.

The script performs the following steps:

  • Configuration Mapping: The script reads the mapping configuration from items under /sitecore/system/Modules/CMP/Config in the Sitecore content tree. These items hold all required connection strings and mapping details. For a more comprehensive understanding, please refer to this Sitecore documentation article.
  • Entity Mapping and Processing: For every mapped entity, the script locates new (unprocessed) JSON files and processes each one by invoking a PowerShell function, passing the path of the entity’s mapping configuration item and the file path as parameters.
  • Item Creation and Update: The processing function reads the file contents as an array of objects and creates or updates the corresponding Sitecore items.
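A minimal sketch of how one batch file could be read into records, assuming each file contains a top-level JSON array of objects. Read-CmpBatchFile is a hypothetical name, not a function from the repository. Reading with -Raw and wrapping the result in @() makes a file with a single object and a file with many objects behave the same way in the processing loop.

```powershell
# Hypothetical helper: read one batch file and emit its records.
# Assumption: the file holds a top-level JSON array of objects (or a single object).
function Read-CmpBatchFile {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )
    # -Raw returns the whole file as one string, so ConvertFrom-Json gets complete JSON.
    $json = Get-Content -Path $Path -Raw -Encoding UTF8
    $data = ConvertFrom-Json -InputObject $json
    # @() normalizes a single object and an array to the same shape before output.
    return @($data)
}

# Usage: wrap the call in @() as well, because PowerShell unrolls arrays on output.
# $records = @(Read-CmpBatchFile -Path $incomingFilePath)
# foreach ($record in $records) { ...create or update the Sitecore item... }
```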

Ensuring Sitecore ID Idempotency

When creating new items in the Sitecore content tree, the script uses the item IDs provided in the source file instead of relying on Sitecore’s default behavior of generating a new GUID for each new item. As a result, the same source entity always ends up with the same Sitecore ID, no matter how many times it is created.
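To illustrate why this makes imports idempotent, here is a small in-memory simulation: a hashtable keyed by ID stands in for the content tree. This is not Sitecore code; Invoke-CmpUpsert and the 'id' and 'title' property names are hypothetical. Re-importing the same record updates the existing entry instead of creating a duplicate, and the ID never changes.

```powershell
# Illustration only: an in-memory hashtable stands in for the Sitecore content tree.
# Assumption: each source record has a stable GUID in an 'id' property (hypothetical name).
function Invoke-CmpUpsert {
    param(
        [hashtable]$Store,
        [psobject]$Record
    )
    # Normalize to Sitecore's ID string format: braces, upper case.
    $id = ([guid]($Record.id)).ToString('B').ToUpperInvariant()
    if ($Store.ContainsKey($id)) {
        # Existing item: update its fields; the ID stays the same.
        $Store[$id].Title = $Record.title
        return 'updated'
    }
    # New item: reuse the source ID instead of generating a fresh GUID.
    $Store[$id] = [pscustomobject]@{ Id = $id; Title = $Record.title }
    return 'created'
}

$store = @{}
$record = [pscustomobject]@{ id = '3f2504e0-4f89-11d3-9a0c-0305e82c3301'; title = 'First' }
Invoke-CmpUpsert -Store $store -Record $record   # created
$record.title = 'Renamed'
Invoke-CmpUpsert -Store $store -Record $record   # updated
$store.Count                                     # 1
```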

This approach also simplifies populating relational fields in Sitecore (such as lists, droplists, and multilists) that represent related entities in Content Hub: because the IDs of related items are already known, no Sitecore index lookups are needed. This not only improves synchronization performance but also eliminates the index-related errors I observed during high-volume item imports through Sitecore Connect for Content Hub.
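Because the related IDs come straight from the source data, the raw value of a multilist-style field can be assembled without any lookup. Sitecore stores such values as pipe-separated IDs in braced, upper-case GUID format. The sketch below assumes the related IDs arrive as GUID strings; ConvertTo-CmpMultilistValue is a hypothetical helper.

```powershell
# Hypothetical helper: build the raw value of a multilist-style field from related item IDs.
# Sitecore stores these fields as pipe-separated IDs in braced, upper-case GUID format.
function ConvertTo-CmpMultilistValue {
    param([string[]]$RelatedIds)
    $ids = foreach ($rawId in $RelatedIds) {
        # Skip empty references instead of failing the whole record.
        if ([string]::IsNullOrWhiteSpace($rawId)) { continue }
        ([guid]$rawId).ToString('B').ToUpperInvariant()
    }
    return (@($ids) -join '|')
}

ConvertTo-CmpMultilistValue -RelatedIds '3f2504e0-4f89-11d3-9a0c-0305e82c3301', '6ba7b810-9dad-11d1-80b4-00c04fd430c8'
# {3F2504E0-4F89-11D3-9A0C-0305E82C3301}|{6BA7B810-9DAD-11D1-80B4-00C04FD430C8}
```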

Configuring the Scheduled PowerShell Job

The scheduled Sitecore PowerShell task runs the CH-ImportFileBatch.ps1 script every XX minutes. You can find the script in the accompanying GitHub repository.

Detailed Breakdown of PowerShell Scripts

CH-ImportFileBatch.ps1 Script

I’ve added some extra comments to explain what this script does.

# /sitecore/system/Modules/PowerShell/Script Library/CMP/Web API/CmpRun

Import-Function -Name CH-ImportFileRun
$startTimestamp = Get-Date
Write-Output ("Started run at {0}" -f $startTimestamp)
$configItems = New-Object 'System.Collections.Generic.Dictionary[[string],[string]]'
# CMP Mapping configuration. Refer to Sitecore documentation for more details: https://doc.sitecore.com/xp/en/developers/connect-for-ch/51/connect-for-content-hub/cmp-items-in-the-content-editor.html
$mappingConfigRoot = Get-Item -Path "master:/sitecore/system/Modules/CMP/Config"
# Map each "Entity Mapping" item name to its Content Hub entity type schema
Get-ChildItem -Path "master:/sitecore/system/Modules/CMP/Config" -Recurse | Where-Object { $_.TemplateName -eq "Entity Mapping" } | ForEach-Object {
    $entityDefinition = $_["EntityTypeSchema"]
    $configItems.Add( $_.Name , $entityDefinition)
}
# Incoming folder holds all new (unprocessed) JSON files. The download process places new files here
$incomingFolderPath = [Sitecore.MainUtil]::MapPath("/App_Data/ContentHubData/Incoming")
# Processed folder is where data files are moved to after processing
$processedFolderPath = [Sitecore.MainUtil]::MapPath("/App_Data/ContentHubData/Processed")

$totalFileCounter = 0
$batchStart = Get-Date
$message = "CMP: Starting batch. Started at: {0}" -f $batchStart
Write-Output $message
Write-Log $message -Log Info

# Skip processing between 01:00 and 04:00 UTC
$utcTime = (Get-Date).ToUniversalTime()
if ($utcTime.Hour -ge 1 -and $utcTime.Hour -lt 4) {
    $message = "CMP: Skipping CMP processing run at {0} UTC" -f $utcTime
    Write-Output $message
    Write-Log $message -Log Info
} else {
    try {
        foreach ($key in $configItems.Keys) {
            $runStart = Get-Date
            $message = "CMP: Started processing run: {0}. Started at {1}" -f $key, $runStart
            Write-Output $message
            Write-Log $message -Log Info

            # Look up all incoming files whose names start with the entity type schema followed by "_", sorted by name
            $configItemsPath = "master:/sitecore/system/Modules/CMP/Config/{0}" -f $key
            $filePattern = "{0}_*" -f $configItems[$key]
            $files = Get-ChildItem -Path $incomingFolderPath -Filter $filePattern | Sort-Object -Property Name

            # process all found files
            foreach ($file in $files) {
                $fileRunStart = Get-Date
                $incomingFilePath = $file.FullName
                $message = "CMP: Started processing of incoming file. Mapping config item: {0}, File Path: {1}. Started at {2}" -f $configItemsPath, $incomingFilePath, $fileRunStart
                Write-Output $message
                Write-Log $message -Log Info
                
                CH-ImportFileRun $configItemsPath $incomingFilePath $false
                
                $totalFileCounter++
                $processedFilePath = Join-Path -Path $processedFolderPath -ChildPath $file.Name
                Write-Host "file paths: ", $incomingFilePath, $processedFilePath
                Move-Item -Path $incomingFilePath -Destination $processedFilePath -Force
                
                # Delete all previously processed files of this kind, keeping only the file just moved
                $excludeFromDeleteFilter = $file.Name
                Write-Host "excludeFromDeleteFilter: ", $excludeFromDeleteFilter
                Get-ChildItem -Path $processedFolderPath -Filter $filePattern | Where-Object { $_.Name -ne $excludeFromDeleteFilter } | Remove-Item
                
                $fileRunEnd = Get-Date
                $fileRunRuntime = $fileRunEnd - $fileRunStart
                $message = "CMP: Finished processing of incoming file. Mapping config item: {0}, File Path: {1}. Started at: {2}. Finished at: {3}. Total seconds: {4}" -f $configItemsPath, $incomingFilePath, $fileRunStart, $fileRunEnd, $fileRunRuntime.TotalSeconds
                Write-Output $message
                Write-Log $message -Log Info
            }
            
            $runEnd = Get-Date
            $runRuntime = $runEnd - $runStart
            $message = "CMP: Finished processing run: {0}. Started at: {1}. Finished at: {2}. Total seconds: {3}" -f $key, $runStart, $runEnd, $runRuntime.TotalSeconds
            Write-Output $message
            Write-Log $message -Log Info
            
        }
        $batchEnd = Get-Date
        $batchRuntime = $batchEnd - $batchStart
        $message = "CMP: Finished batch. Started at: {0}. Finished at: {1}. Total seconds: {2}. Files processed: {3}." -f $batchStart, $batchEnd, $batchRuntime.TotalSeconds, $totalFileCounter
        Write-Output $message
        Write-Log $message -Log Info
    }
    catch {
        $message = "CMP: Error running the batch. {0}" -f $_
        Write-Log $message -Log Error
    }
}
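The script skips all processing between 01:00 and 04:00 UTC. If that window needs to be unit-tested or made configurable, the check can be pulled into a small function. Test-CmpPauseWindow and its parameters are hypothetical names, not part of the original scripts; the defaults reproduce the same hour range.

```powershell
# Hypothetical helper: returns $true while the nightly pause window is active.
# StartHour is inclusive and EndHour exclusive, both in UTC, matching the script's 1..4 check.
function Test-CmpPauseWindow {
    param(
        [datetime]$Time = [datetime]::UtcNow,
        [int]$StartHour = 1,
        [int]$EndHour = 4
    )
    # ToUniversalTime() leaves UTC values unchanged and converts local (or unspecified) ones.
    $utc = $Time.ToUniversalTime()
    return ($utc.Hour -ge $StartHour -and $utc.Hour -lt $EndHour)
}

# Usage in the batch script:
# if (Test-CmpPauseWindow) { ...log and skip... } else { ...process files... }
```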

CH-ImportFileRun.ps1

This script contains the CH-ImportFileRun function, which the script above calls for each JSON content file. The file is too large to include in this post, so please refer to this public GitHub repo for the full source. I added comments there for clarity.

In a nutshell, here’s what the function does:

  1. The function accepts three parameters:
    • $mappingConfigItemPath: A string specifying the path to a configuration item in Sitecore.
    • $jsonFilePath: A string specifying the path to a JSON file containing content data to be imported.
    • $skipExisting: A boolean flag indicating whether to skip importing content if it already exists in Sitecore.
  2. Read the mapping configuration
    1. The function defines two custom classes: MappedField and EntityMappingConfig. These classes are used to structure and hold information related to mapping fields between the JSON data and Sitecore.
    2. Retrieve information from the configuration item, such as the entity type schema, bucket, template, and item name property.
    3. Retrieve the template item from Sitecore based on the specified template ID and collect information about its fields.
    4. Initialize a dictionary called $mappedFieldsList to store information about mapped fields.
  3. Read the content of the JSON file specified by $jsonFilePath and parse it as JSON data.
  4. Wrap the import in a Sitecore bulk update context, which improves performance by disabling certain events.
  5. Iterate through the JSON data, processing each object and creating or updating a Sitecore item from it:
    • Check whether the item already exists and whether its field values match the JSON data.
    • If there are changes, update the Sitecore item with the new data.
    • If the item doesn't exist, create a new one from the specified template and set its properties from the JSON data.
  6. Log any errors or exceptions that occur during the import process.
  7. After processing all JSON objects, record the end timestamp, calculate the total runtime, and log a message indicating the completion of the import process.
  8. Finally, handle any exceptions that may occur during the entire import process and log error messages accordingly.
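Step 5 compares the existing item with the incoming record before writing anything. Below is a minimal sketch of that comparison, assuming the mapping has been reduced to a hashtable of JSON property name to Sitecore field name and that current field values are available as raw strings. Get-CmpChangedFields and these shapes are hypothetical; the repository uses its own MappedField and EntityMappingConfig classes. Only when the returned hashtable is non-empty would the item actually be edited, for example between $item.Editing.BeginEdit() and $item.Editing.EndEdit().

```powershell
# Hypothetical helper: return only the Sitecore fields whose values would change.
function Get-CmpChangedFields {
    param(
        [hashtable]$CurrentValues,   # Sitecore field name -> current raw value
        [psobject]$Record,           # one object parsed from the JSON file
        [hashtable]$MappedFields     # JSON property name -> Sitecore field name
    )
    $changed = @{}
    foreach ($jsonProperty in $MappedFields.Keys) {
        $fieldName = $MappedFields[$jsonProperty]
        # Missing values become empty strings, matching how empty Sitecore fields read.
        $newValue = [string]($Record.$jsonProperty)
        $oldValue = [string]($CurrentValues[$fieldName])
        # -cne is case-sensitive, so a change in casing still counts as an update.
        if ($newValue -cne $oldValue) {
            $changed[$fieldName] = $newValue
        }
    }
    return $changed
}

$current = @{ Title = 'Old title'; Summary = 'Same' }
$record  = [pscustomobject]@{ title = 'New title'; summary = 'Same' }
$map     = @{ title = 'Title'; summary = 'Summary' }
$changes = Get-CmpChangedFields -CurrentValues $current -Record $record -MappedFields $map
$changes['Title']   # New title
$changes.Count      # 1
```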