Extending Sitecore Search API Crawlers for Multi-Language Support

Creating a Sitecore Search API Crawler for Multi-Locale Content

Today, I’ll walk through how I created a Sitecore Search API crawler that handles multi-locale content, along with a request extractor that passes the locale to the query dynamically.

The example in the Sitecore Search documentation is great for setting up an API crawler with a trigger, request extractor, and document extractor. However, it doesn’t cover how to set up a localized API crawler that can handle multiple locales.

To achieve multi-locale support, I followed the documentation with a few additional steps. Here’s the breakdown:

Step 1: Create Triggers for Each Locale

I created multiple triggers, one per locale, by passing the language in the GraphQL query used to fetch data from Sitecore CMS.

Trigger for English (en):

{
  "query": "query getRegions($language: String!, $path: String) { item(language: $language, path: $path) { id name children(first: 1000, includeTemplateIDs: \"{0DDC216D-7E20-46FE-929F-76B42444A239}\") { results { id name } } } }",
  "variables": {
    "language": "en",
    "path": "/sitecore/content/<Headless Tenant>/<Headless Site>/Home/test"
  }
}

Trigger for Arabic (ar):

{
  "query": "query getRegions($language: String!, $path: String) { item(language: $language, path: $path) { id name children(first: 1000, includeTemplateIDs: \"{0DDC216D-7E20-46FE-929F-76B42444A239}\") { results { id name } } } }",
  "variables": {
    "language": "ar",
    "path": "/sitecore/content/<Headless Tenant>/<Headless Site>/Home/test"
  }
}

Each trigger simply changes the "language" parameter (e.g., "en" or "ar").
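Since the trigger bodies differ only in the language variable, they can be generated instead of copied by hand. The following is a minimal sketch, not something Sitecore Search provides: buildTriggerBody, QUERY, and CONTENT_PATH are hypothetical names, and the printed JSON would still need to be pasted into each trigger's body.

```javascript
// Hypothetical helper (not part of Sitecore Search): generates one trigger
// body per language so the query text is written only once.
const QUERY = `query getRegions($language: String!, $path: String) {
  item(language: $language, path: $path) {
    id
    name
    children(first: 1000, includeTemplateIDs: "{0DDC216D-7E20-46FE-929F-76B42444A239}") {
      results { id name }
    }
  }
}`;

// Placeholder path copied from the triggers above; replace the <...> parts.
const CONTENT_PATH = "/sitecore/content/<Headless Tenant>/<Headless Site>/Home/test";

function buildTriggerBody(language) {
  // Only the "language" variable changes between triggers.
  return JSON.stringify(
    { query: QUERY, variables: { language: language, path: CONTENT_PATH } },
    null,
    2
  );
}

// One body per locale-specific trigger.
const triggerBodies = ["en", "ar"].map(buildTriggerBody);
triggerBodies.forEach((body) => console.log(body));
```

Adding another locale then only means adding its language code to the array.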

Step 2: Create a Request Extractor (Optional)

In my case, I needed a request extractor to dynamically pass the locale to the GraphQL query. I extracted the language value from the request body like this:

"variables": {
    "language": JSON.parse(request.body).variables.language,
    "path": path
}

This allows the crawler to reuse the logic regardless of the locale used in the trigger.
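For context, here is a rough sketch of how that fragment might sit inside a complete request extractor. Several parts are assumptions rather than details from my setup: the ENDPOINT placeholder, the DETAIL_QUERY follow-up query, the header format, the shape of response.body, and the use of each child's id as the path. Check the returned object shape against the Sitecore Search documentation before relying on it.

```javascript
// Sketch of a request extractor that forwards the trigger's language.
// Assumptions (hypothetical, for illustration): the endpoint URL, the header
// format, the follow-up query, and the shape of response.body.
const ENDPOINT = "https://<your-graphql-endpoint>"; // placeholder, fill in your own
const DETAIL_QUERY =
  "query getItem($language: String!, $path: String) { item(language: $language, path: $path) { id name } }";

function extract(request, response) {
  // The trigger body is a JSON string, so parse it to read the language.
  const language = JSON.parse(request.body).variables.language;

  // Assumed response shape: the children returned by the trigger query.
  const children = response.body.data.item.children.results;

  // One follow-up request per child, all in the trigger's language.
  return children.map((child) => ({
    url: ENDPOINT,
    method: "POST",
    headers: { "content-type": ["application/json"] }, // assumed header format
    body: JSON.stringify({
      query: DETAIL_QUERY,
      // Assumption: the child's id is passed where the fragment above uses path.
      variables: { language: language, path: child.id },
    }),
  }));
}
```

Because the language is read from the incoming request, the same extractor serves the "en" trigger, the "ar" trigger, and any locale added later.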

Step 3: Configure Available Locales

In Sitecore Search, ensure that all the supported locales are added under Available Locales in the source configuration. This step enables locale-specific indexing.

Step 4: Configure a JavaScript Locale Extractor

To properly map the language values to locale codes, I used a JavaScript locale extractor.

The extractor reads the language value from the request body and maps it to a locale code:

function extract(request, response) {
    // The trigger body is a JSON string; read the language it was sent with.
    const language = JSON.parse(request.body).variables.language;

    // Map the CMS language code to the locale configured in Sitecore Search.
    switch (language) {
        case 'ko': return "ko-kr";
        case 'ms': return "ms-my";
        case 'ar': return "ar-sa";
        case 'de': return "de-de";
        case 'pt': return "pt-br";
        case 'ru': return "ru-ru";
        case 'es': return "es-es";
        case 'fr': return "fr-fr";
        case 'he': return "he-il";
        case 'id': return "id-id";
        case 'it': return "it-it";
        case 'nl': return "nl-nl";
        case 'zh': return "zh-cn";
        // Anything else, including 'en', falls back to US English.
        default: return "en-us";
    }
}

Note: Adjust mappings according to your localization standards.
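If the list of languages grows, a lookup table can be easier to maintain than a long switch. The sketch below is an alternative, not the extractor I deployed: LOCALE_BY_LANGUAGE and DEFAULT_LOCALE are hypothetical names, only a few mappings are shown, and the fallback for a missing or malformed request body is my own assumption about what is sensible.

```javascript
// Table-driven variant of the locale extractor (a sketch, not the original code).
// Only a few example mappings are listed; extend the table to match the
// locales configured under Available Locales.
const LOCALE_BY_LANGUAGE = {
  ar: "ar-sa",
  fr: "fr-fr",
  zh: "zh-cn",
};
const DEFAULT_LOCALE = "en-us";

function extract(request, response) {
  let language;
  try {
    language = JSON.parse(request.body).variables.language;
  } catch (e) {
    // Missing or malformed body: fall back to the default locale.
    return DEFAULT_LOCALE;
  }
  // Unknown or absent languages also fall back to the default.
  return Object.prototype.hasOwnProperty.call(LOCALE_BY_LANGUAGE, language)
    ? LOCALE_BY_LANGUAGE[language]
    : DEFAULT_LOCALE;
}
```

Calling extract with a body whose language is "ar" returns "ar-sa", while an unmapped language such as "en" returns the default.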

Step 5: Create a Localized Document Extractor

In the document extractor configuration, I enabled the “Localized” toggle. This ensures that documents are indexed based on the extracted locale.

With these steps, I successfully created a Sitecore Search API crawler capable of indexing multi-locale content efficiently.

Hope this helps someone working on a similar requirement!
