Search for Hugo & Docusaurus – Analysis, Recommendation & Implementation Guide

Status: 16.09.2025 (Europe/Berlin)

Short Summary (for the impatient)

Recommendation: Use Typesense (self‑hosted or Typesense Cloud with EU region, e.g. Frankfurt) + DocSearch Scraper (Typesense Fork) as crawler.

Unified tooling for Hugo and Docusaurus.
GDPR/EU residency possible (self‑hosting or Cloud region EU).
Frontends:
- Docusaurus: docusaurus-theme-search-typesense
- Hugo: typesense-docsearch.js or typesense-instantsearch-adapter

Why not Algolia DocSearch (free program)?

Data residency: data may be replicated globally; EU‑only storage is not guaranteed.
Free only for public open‑source docs; the Hugo site may not be eligible.
No self‑hosting; the tooling would not be fully identical for both sites.

Architecture at a Glance

Crawler: DocSearch Scraper (Typesense Fork) crawls publicly available pages and generates records.
Index: Records are written to Typesense (self‑hosting or Cloud in EU).
Frontend:
- Docusaurus: Theme plugin uses the Typesense index.
- Hugo: Easy integration via typesense-docsearch.js (modal like Algolia) or via InstantSearch widgets with typesense-instantsearch-adapter.

Advantage: Same backend and same indexer for both sites, separate collections possible (e.g. site_hugo, site_docs).

Setup Variants

A) Self‑Hosting (recommended, full control / GDPR)

Docker Compose (1 node to start, later HA possible)
Own domain, TLS (reverse proxy like Caddy/Traefik/NGINX)
Search‑only API key for frontend

Example: docker-compose.yml

version: '3.8'
services:
  typesense:
    image: typesense/typesense:latest
    command: ["--data-dir=/data", "--api-key=${TYPESENSE_API_KEY}", "--enable-cors"]
    ports: ["8108:8108"]
    volumes:
      - ./data:/data
    environment:
      - TZ=Europe/Berlin

B) Typesense Cloud (easy, EU region)

Select an EU region when creating the cluster (Frankfurt, Zurich, Paris, …).
Generate API keys in the dashboard (Admin, Search‑only).

In both cases: use only the Search‑Only Key in the frontend.
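For self‑hosting, a Search‑Only Key can be created through the Typesense keys endpoint (POST /keys) with the single action documents:search. The following is a minimal sketch, assuming Node 18+ (global fetch); the function names, the description text, and the base URL are illustrative and not part of the original setup.

```javascript
// Builds the HTTP request for a key that may only search the given collections.
// Kept separate from the network call so it can be inspected or logged safely.
function buildSearchOnlyKeyRequest(baseUrl, adminKey, collections) {
  return {
    url: `${baseUrl}/keys`,
    options: {
      method: 'POST',
      headers: {
        'X-TYPESENSE-API-KEY': adminKey, // admin key: never ship this to a browser
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        description: 'Search-only key for the website frontends', // illustrative text
        actions: ['documents:search'],
        collections,
      }),
    },
  };
}

// Sends the request. Run this on a trusted machine or in CI, not in the frontend.
async function createSearchOnlyKey(baseUrl, adminKey, collections) {
  const { url, options } = buildSearchOnlyKeyRequest(baseUrl, adminKey, collections);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Key creation failed with HTTP ${response.status}`);
  }
  // Typesense returns the full key value only in this creation response.
  return response.json();
}
```

Usage could look like createSearchOnlyKey('https://search.example.com', process.env.TYPESENSE_API_KEY, ['site_hugo', 'site_docs']); the returned value is what goes into the frontend configuration.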

Indexing with DocSearch Scraper (Typesense Fork)

1) Requirements

The site must be reachable at a live URL, or locally via host.docker.internal
A sitemap improves coverage & speed
- Hugo: /sitemap.xml is generated by default (set baseURL in the site config)
- Docusaurus: enable @docusaurus/plugin-sitemap

2) Two separate configurations (one per site)

Example: docsearch.hugo.json

{
  "index_name": "site_hugo",
  "start_urls": ["https://www.example.com/"],
  "sitemap_urls": ["https://www.example.com/sitemap.xml"],
  "stop_urls": ["/tags/", "/categories/"],
  "selectors": {
    "lvl0": {"selector": "nav[role='navigation'] a.active", "global": true, "default_value": "Hugo"},
    "lvl1": "article h1, header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "text": "article p, article li, article td"
  }
}

Example: docsearch.docusaurus.json

{
  "index_name": "site_docs",
  "start_urls": ["https://docs.example.com/"],
  "sitemap_urls": ["https://docs.example.com/sitemap.xml"],
  "stop_urls": ["/blog/tags/"],
  "selectors": {
    "lvl0": {"selector": ".navbar__title", "global": true, "default_value": "Docs"},
    "lvl1": "article h1, header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "text": "article p, article li, table td"
  }
}

Docusaurus‑specific: use article h1, header h1 for lvl1 (instead of only header h1).

3) Run Scraper (Docker)

.env (adjust for Cloud, specify host/IP for self‑hosting):

TYPESENSE_API_KEY=xyz_admin_or_write_key
TYPESENSE_HOST=search.example.com
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https

Run (Hugo):

docker run -it --rm \
  --env-file .env \
  -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)" \
  typesense/docsearch-scraper:0.11.0

Run (Docusaurus):

docker run -it --rm \
  --env-file .env \
  -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)" \
  typesense/docsearch-scraper:0.11.0

Tip: Run in CI/CD after deploy so that new content is indexed immediately.
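Where jq is not available, the CONFIG value can also be produced with Node, since the scraper only needs the config file as single‑line JSON. This is a rough sketch under that assumption; buildScraperArgs and runScraper are hypothetical helper names, and the image tag is the one used in this guide.

```javascript
// Builds the argument list for `docker run`. JSON.stringify yields compact,
// single-line JSON, which is the same result as `jq -r tostring` on an object.
function buildScraperArgs(config, envFile = '.env', image = 'typesense/docsearch-scraper:0.11.0') {
  return [
    'run', '--rm',
    '--env-file', envFile,
    '-e', `CONFIG=${JSON.stringify(config)}`,
    image,
  ];
}

// Reads a config file and starts the scraper container; returns the exit code.
function runScraper(configPath) {
  const fs = require('node:fs');
  const { spawnSync } = require('node:child_process');
  const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
  // Passing an argument array avoids shell quoting problems with the JSON value.
  const result = spawnSync('docker', buildScraperArgs(config), { stdio: 'inherit' });
  return result.status ?? 1;
}
```

Calling runScraper('docsearch.hugo.json') and then runScraper('docsearch.docusaurus.json') would index both sites in sequence.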

Docusaurus – Frontend Integration

1) Enable Sitemap

npm i @docusaurus/plugin-sitemap --save

docusaurus.config.js:

// Not needed with @docusaurus/preset-classic, which already includes the sitemap plugin.
plugins: [
  ['@docusaurus/plugin-sitemap', {/* optional */}],
],

2) Install Typesense Theme

npm i docusaurus-theme-search-typesense@next --save
# or: yarn add docusaurus-theme-search-typesense@next

3) Add Configuration

docusaurus.config.js (excerpt):

module.exports = {
  themes: ['docusaurus-theme-search-typesense'],
  themeConfig: {
    typesense: {
      typesenseCollectionName: 'site_docs',
      typesenseServerConfig: {
        nodes: [
          { host: 'search.example.com', port: 443, protocol: 'https' }
        ],
        apiKey: 'SEARCH_ONLY_KEY'
      },
      typesenseSearchParameters: {
        per_page: 8
      },
      contextualSearch: true
    }
  }
};

Security: commit only the Search‑Only Key to the repository, never the Admin Key.

Hugo – Frontend Integration (two ways)

Way A: `typesense-docsearch.js` (quick, modal like Algolia)

Insert into Hugo layout (e.g. layouts/partials/head.html):

<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/typesense-docsearch-css@0.3.0" />

Before </body> (e.g. layouts/_default/baseof.html):

<div id="searchbar"></div>
<script src="https://cdn.jsdelivr.net/npm/typesense-docsearch.js@3.4"></script>
<script>
  docsearch({
    container: '#searchbar',
    // Must match index_name in docsearch.hugo.json
    typesenseCollectionName: 'site_hugo',
    typesenseServerConfig: {
      nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
      apiKey: 'SEARCH_ONLY_KEY'
    },
    typesenseSearchParameters: { per_page: 8 }
  });
</script>

Advantage: Minimal effort, UI as usual, no additional bundler.

Way B: InstantSearch Widgets (flexible UI, facets etc.)

Include scripts (without build tool):

<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4"></script>
<script src="https://cdn.jsdelivr.net/npm/typesense-instantsearch-adapter@2"></script>

Container & init (e.g. layouts/partials/footer.html):

<div id="searchbox"></div>
<div id="hits"></div>
<div id="pagination"></div>

<script>
  const adapter = new TypesenseInstantSearchAdapter({
    server: {
      nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
      apiKey: 'SEARCH_ONLY_KEY'
    },
    // query_by may only name fields that exist in the collection. The DocSearch scraper
    // writes flat hierarchy.lvlN fields and content; it does not write a title field.
    additionalSearchParameters: { query_by: 'hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,content' }
  });
  const searchClient = adapter.searchClient;

  const search = instantsearch({
    indexName: 'site_hugo',
    searchClient
  });

  search.addWidgets([
    instantsearch.widgets.searchBox({ container: '#searchbox' }),
    instantsearch.widgets.hits({
      container: '#hits',
      templates: {
        item: (hit) => `<article>
          <a href="${hit.url}">${hit['hierarchy.lvl1'] || 'Result'}</a>
          <p>${(hit._snippetResult && hit._snippetResult.content && hit._snippetResult.content.value) || ''}</p>
        </article>`
      }
    }),
    instantsearch.widgets.pagination({ container: '#pagination' })
  ]);

  search.start();
</script>

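The fields in query_by depend on which fields the scraper generates; the Typesense DocSearch scraper typically writes hierarchy.lvl0 to hierarchy.lvl6, content and url. Check the collection schema if in doubt.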
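Because not every record carries every heading level, a result template usually needs a fallback when picking a title. The helper below is a small sketch, assuming hits expose flat keys named hierarchy.lvl0 to hierarchy.lvl6 (as the DocSearch scraper schema defines them); hitTitle is a hypothetical name.

```javascript
// Returns the deepest non-empty heading level of a hit as its display title.
// Assumes flat keys such as hit['hierarchy.lvl2']; adjust if your schema differs.
function hitTitle(hit, fallback = 'Result') {
  for (let level = 6; level >= 0; level -= 1) {
    const value = hit[`hierarchy.lvl${level}`];
    if (typeof value === 'string' && value.trim() !== '') {
      return value;
    }
  }
  return fallback;
}
```

In the hits widget, the item template could then call hitTitle(hit) instead of reading a single field.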

Relevance Tuning & Content Coverage

Stop URLs: possibly exclude changelog, archived areas.
Meta Tags (optional): make pages filterable by tags, e.g.
```
<meta name="docsearch:language_tag" content="en" />
<meta name="docsearch:version_tag" content="1.2.4" />
```
Then e.g. filter_by: "language_tag:=en" as search parameter.
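Weighting: weight fields in Typesense via query_by_weights, a comma‑separated list of integers in the same order as query_by (e.g. query_by: "hierarchy.lvl1,content" with query_by_weights: "4,1").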
Synonyms: Add terms (plural/singular, German/English).
Snippets/Highlighting: better UX in result template.
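The weighting and filter settings above can be combined into one search parameter object. The sketch below assumes the scraper's field names (hierarchy.lvl1, content) and a language_tag meta tag on the pages; buildSearchParameters is a hypothetical helper, and the weights are illustrative only.

```javascript
// Builds Typesense search parameters from [fieldName, weight] pairs.
// query_by and query_by_weights must list their entries in the same order.
function buildSearchParameters({ weightedFields, language, perPage = 8 }) {
  const params = {
    query_by: weightedFields.map(([name]) => name).join(','),
    query_by_weights: weightedFields.map(([, weight]) => weight).join(','),
    per_page: perPage,
  };
  if (language) {
    // Only works if the pages carry <meta name="docsearch:language_tag" ...>
    params.filter_by = `language_tag:=${language}`;
  }
  return params;
}

const exampleParams = buildSearchParameters({
  weightedFields: [['hierarchy.lvl1', 4], ['hierarchy.lvl2', 2], ['content', 1]],
  language: 'en',
});
```

The resulting object can be passed as typesenseSearchParameters (Docusaurus theme, typesense-docsearch.js) or as additionalSearchParameters (InstantSearch adapter).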

CI/CD – Automatic Re‑Indexing

GitHub Actions (example job):

name: index-docs
on:
  workflow_dispatch:
  push:
    branches: [ main ]
    paths:
      - 'hugo/**'
      - 'docs/**'
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install jq
        run: sudo apt-get update && sudo apt-get install -y jq
      - name: Run scraper (Hugo)
        run: |
          docker run --rm \
            -e TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} \
            -e TYPESENSE_HOST=${{ secrets.TS_HOST }} \
            -e TYPESENSE_PORT=443 \
            -e TYPESENSE_PROTOCOL=https \
            -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)" \
            typesense/docsearch-scraper:0.11.0
      - name: Run scraper (Docs)
        run: |
          docker run --rm \
            -e TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} \
            -e TYPESENSE_HOST=${{ secrets.TS_HOST }} \
            -e TYPESENSE_PORT=443 \
            -e TYPESENSE_PROTOCOL=https \
            -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)" \
            typesense/docsearch-scraper:0.11.0

Recommendation: Start job after successful deploy so the crawler fetches the fresh pages.

Security & Operations

Search‑Only Key in frontend, Admin/Write Key only in CI/servers.
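CORS: restrict to your own domains when self‑hosting (e.g. with the --cors-domains server option alongside --enable-cors).
Enforce TLS (HTTPS).
Backups: back up the Typesense data directory regularly.
Monitoring: watch memory and CPU usage; Typesense holds its index in RAM.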
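As a starting point for monitoring, the Typesense health endpoint (GET /health) can be polled from a cron job or an uptime checker. The following is a minimal sketch assuming Node 18+ (global fetch); the function names are hypothetical and the base URL is the placeholder used throughout this guide.

```javascript
// Interprets a /health response: a healthy node answers 200 with {"ok": true}.
function isHealthy(status, body) {
  return status === 200 && body != null && body.ok === true;
}

// Polls the health endpoint once; network or parse errors count as unhealthy.
async function checkTypesenseHealth(baseUrl) {
  try {
    const response = await fetch(`${baseUrl}/health`);
    const body = await response.json();
    return isHealthy(response.status, body);
  } catch (error) {
    return false;
  }
}
```

A scheduled job could call checkTypesenseHealth('https://search.example.com') and raise an alert when it resolves to false.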

Alternatives (brief)

Pagefind (purely static, client‑side): no server needed, but no shared backend index for both sites.
Meilisearch: similar to Typesense; also suitable for your EU/self‑hosting use case. Since you want a unified setup for Hugo & Docusaurus and the strong DocSearch toolchain (scraper + UI), Typesense is usually the shorter path.

Checklist

Typesense (self‑hosted or Cloud, EU region) ready
Search‑Only Key created
Two scraper configs (Hugo + Docs) created
Sitemap(s) available
Scraper tested locally
Docusaurus theme integrated
Hugo frontend (DocSearch.js or InstantSearch) integrated
CI job for re‑indexing after deploy

Common Pitfalls & Tips

Local tests: the scraper runs inside Docker and cannot reach the host's localhost; use host.docker.internal instead.
Ports: some scraper setups expect port 80; for local tests, put a proxy in front of the dev server if needed.
Docusaurus lvl1 selector: article h1, header h1 (not just header h1).
Large sites: maintain stop_urls and sitemap_urls cleanly to reduce crawl time.
Multilingual: separate via meta tags (docsearch:*_tag) and filter_by.
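For the local‑test pitfall, a scraper config can be rewritten on the fly so that localhost URLs point to host.docker.internal. This is a sketch only: forDocker and rewriteUrl are hypothetical helpers, and it assumes start_urls and sitemap_urls contain plain URL strings.

```javascript
const LOCAL_HOSTS = new Set(['localhost', '127.0.0.1']);

// Replaces a local hostname with host.docker.internal; other URLs pass through.
function rewriteUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (LOCAL_HOSTS.has(url.hostname)) {
    url.hostname = 'host.docker.internal';
  }
  return url.toString();
}

// Returns a copy of the scraper config with rewritten URLs; the input is not mutated.
function forDocker(config) {
  const copy = { ...config };
  for (const key of ['start_urls', 'sitemap_urls']) {
    if (Array.isArray(config[key])) {
      copy[key] = config[key].map(rewriteUrl);
    }
  }
  return copy;
}
```

On Linux, host.docker.internal is not defined by default; adding --add-host=host.docker.internal:host-gateway to docker run makes it resolvable.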
