How to find duplicate items in Sitecore CMS

Overview

An example PowerShell script to find duplicate items by name under a given root path.

Getting Started

Sometimes, for one reason or another, we end up with duplicate items in the Sitecore content tree. For example, we had a sync process importing a massive number of items from an external system into item buckets in Sitecore. Things went wrong on the sync side and we ended up with thousands of duplicate items on the production Sitecore instance. Not great… Finding and deleting duplicates manually is an option when there are only a few of them, but what if there are thousands?

I wrote a PowerShell script to deal with the duplicates, and we ended up using it a number of times. I'll share it here with comments in the code; I hope it will be useful.

Disclaimer

This script is just an example and needs to be used with extra caution, especially on PROD environments: don't hold me responsible for accidental deletion of data in YOUR instance :)

PowerShell script to find and delete or report duplicate items

# Define selections for the following dialog
$options = @{
    "Delete Duplicates"="delete"
    "Show Report"="report"
}

# Prompt user to select a root path and an action to perform on the found duplicate items
$props = @{
    Title = "Find Duplicate Items in Bucket by Item Name"
    Description = "Find duplicate items by name under a given path."
    OkButtonName = "Run"
    CancelButtonName = "Cancel"
    Parameters = @(
        @{ Name = "rootPath"; Title = "Root Path"; Editor = "droptree"; Source = "/sitecore/content" },
        @{ Name = "selectedAction"; Title="What to do with the duplicates"; Options=$options; Tooltip="Choose delete or just show duplicates."}
    )
}

# Read user selections
$result = Read-Variable @props
if($result -ne "ok") {
    Close-Window
    Exit
}

$rootPath = "master:" + $rootPath.FullPath

$criteria = @(
    @{ Filter = "Equals"; Field = "_templatename"; Value = "Bucket"; Invert=$true}, 
    @{ Filter = "DescendantOf"; Value = (Get-Item $rootPath) }
)

# Find duplicate items using master index. YOUR index name may be different, so tweak it here if needed
$props = @{
    Index = "sitecore_master_index"
    Criteria = $criteria
}

# Create a collection to store duplicates found
$duplicateItems = [System.Collections.Generic.List[psobject]]::new()
# ...and another collection to keep all items matching the search criteria 
$allItems = Find-Item @props | Sort-Object -Property Name | Select-Object -Property ItemId, Name, Path, Updated
$counter = 0
# Start outside a duplicate group so the first group found gets Counter = 1
$newDupe = $False

# Iterate through the name-sorted items and find duplicates by comparing each item with the next one
for ($i = 0; $i -lt $allItems.Count - 1; $i++)
{
    if ($allItems[$i].Name -eq $allItems[$i+1].Name) 
    {
        if($newDupe -eq $False)
        {
            $newDupe = $True
            $counter++
        }
        $duplicateItems.Add([pscustomobject]@{Counter = $counter; ItemId = $allItems[$i].ItemId; Path=$allItems[$i].Path; Updated=$allItems[$i].Updated})
        $duplicateItems.Add([pscustomobject]@{Counter = $counter; ItemId = $allItems[$i+1].ItemId; Path=$allItems[$i+1].Path; Updated=$allItems[$i+1].Updated})
    }
    else
    {
        $newDupe = $False
    }
}

$newDupe = $True;
$duplicateItemsGrouped = [System.Collections.Generic.List[psobject]]::new()
$dupesList = '';
$counter = 0;

# Finally, depending on selected action in the above dialog: delete duplicate items or just show the report
# This is where YOU need to be extra careful – don't hold me responsible for accidental deletion of data in YOUR instance :)
if($selectedAction -eq 'delete') {
    foreach ($item in $duplicateItems | Sort-Object -Property @{Expression = "Counter"; Descending = $False}, @{Expression = "Updated"; Descending = $True} -Unique )
    {
        if($counter -eq $item.counter)
        {
            $dupesList = $dupesList + $item.ItemId.ToString() + ' ';
            $itemToDelete = Get-Item $item.ItemId;
            Write-Host 'deleting item ' $itemToDelete.ID;
            $itemToDelete | Remove-Item;
        }
        else
        {
            if($dupesList -ne '')
            {
                Write-Host ($item.counter-1) $firstItem $dupesList
                $dupesList = '';
            }
            $counter = $item.counter;
            $firstItem = $item.ItemId.ToString();
            $dupesList = '';
        }
    }
}
elseif($selectedAction -eq 'report') {
     $duplicateItems | Sort-Object -Property @{Expression = "Counter"; Descending = $False}, @{Expression = "Updated"; Descending = $True} -Unique | Show-ListView -Property `
        @{ Name = "Counter"; Expression = { $_.Counter } },
        @{ Name = "ID"; Expression = { $_.ItemId } },
        @{ Name = "Path"; Expression = { $_.Path } },
        @{ Name = "Updated"; Expression = { $_.Updated } }
}
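
Grouping by name as an alternative

The script above finds duplicates by sorting on Name and comparing neighbours. The same idea can also be expressed with Group-Object, which may be easier to read. What follows is only a minimal sketch under stated assumptions: $sampleItems and its IDs, names and dates are made-up stand-ins for the Find-Item results, nothing here touches Sitecore, and it is not the script we ran on our instance. Like the -eq comparison above, Group-Object compares names case-insensitively by default.

```powershell
# Hypothetical sample data standing in for the Find-Item results
$sampleItems = @(
    [pscustomobject]@{ ItemId = 'A1'; Name = 'product-1'; Updated = [datetime]'2023-01-10' }
    [pscustomobject]@{ ItemId = 'A2'; Name = 'product-1'; Updated = [datetime]'2023-03-05' }
    [pscustomobject]@{ ItemId = 'A3'; Name = 'product-1'; Updated = [datetime]'2023-02-01' }
    [pscustomobject]@{ ItemId = 'B1'; Name = 'product-2'; Updated = [datetime]'2023-01-15' }
    [pscustomobject]@{ ItemId = 'C1'; Name = 'product-3'; Updated = [datetime]'2023-01-20' }
    [pscustomobject]@{ ItemId = 'C2'; Name = 'product-3'; Updated = [datetime]'2023-01-02' }
)

# Group by name and keep only the groups with more than one member
$dupeGroups = @($sampleItems | Group-Object -Property Name | Where-Object { $_.Count -gt 1 })

# In each group, keep the most recently updated item and list the rest for deletion
$plan = @(foreach ($group in $dupeGroups) {
    $sorted = @($group.Group | Sort-Object -Property Updated -Descending)
    [pscustomobject]@{
        Name     = $group.Name
        Keep     = $sorted[0].ItemId
        ToDelete = @($sorted | Select-Object -Skip 1 | ForEach-Object { $_.ItemId })
    }
})

# Report only; nothing is deleted in this sketch
$plan | ForEach-Object { "{0}: keep {1}, delete {2}" -f $_.Name, $_.Keep, ($_.ToDelete -join ', ') }
```

Building the plan first and printing it before deleting anything gives you a chance to review which item survives in each group, which fits the "report before delete" approach of the dialog above.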