Skip to main content

Search for Hugo & Docusaurus – Analysis, Recommendation & Implementation Guide

Status: 16.09.2025 (Europe/Berlin)


Short Summary (for the impatient)

Recommendation: Use Typesense (self‑hosted or Typesense Cloud with EU region, e.g. Frankfurt) + DocSearch Scraper (Typesense Fork) as crawler.

  • Unified tooling for Hugo and Docusaurus.
  • GDPR/EU residency possible (self‑hosting or Cloud region EU).
  • Frontends:
    • Docusaurus: docusaurus-theme-search-typesense
    • Hugo: typesense-docsearch.js or typesense-instantsearch-adapter

Why not Algolia DocSearch (free program)?

  • Data residency: global replication; no guarantee for EU‑only.
  • Free only for public open‑source docs; Hugo site may not be eligible.
  • No self‑hosting; tooling would not be fully identical.

Architecture at a Glance

  1. Crawler: DocSearch Scraper (Typesense Fork) crawls publicly available pages and generates records.
  2. Index: Records are written to Typesense (self‑hosting or Cloud in EU).
  3. Frontend:
    • Docusaurus: Theme plugin uses the Typesense index.
    • Hugo: Easy integration via typesense-docsearch.js (modal like Algolia) or via InstantSearch widgets with typesense-instantsearch-adapter.

Advantage: Same backend and same indexer for both sites, separate collections possible (e.g. site_hugo, site_docs).


Setup Variants

  • Docker Compose (1 node to start, later HA possible)
  • Own domain, TLS (reverse proxy like Caddy/Traefik/NGINX)
  • Search‑only API key for frontend

Example: docker-compose.yml

version: '3.8'
services:
typesense:
image: typesense/typesense:latest
command: ["--data-dir=/data", "--api-key=${TYPESENSE_API_KEY}", "--enable-cors"]
ports: ["8108:8108"]
volumes:
- ./data:/data
environment:
- TZ=Europe/Berlin

B) Typesense Cloud (easy, EU region)

  • Select region at cluster creation EU (Frankfurt, Zurich, Paris, …).
  • Generate API keys in the dashboard (Admin, Search‑only).

In both cases: In the frontend use only the Search‑Only Key.


Indexing with DocSearch Scraper (Typesense Fork)

1) Requirements

  • Live URL or locally reachable via host.docker.internal
  • Sitemap improves coverage & speed
    • Hugo: /sitemap.xml is generated by default (set BaseURL)
    • Docusaurus: enable @docusaurus/plugin-sitemap

2) Two separate configurations (one per site)

Example: docsearch.hugo.json

{
"index_name": "site_hugo",
"start_urls": ["https://www.example.com/"],
"sitemap_urls": ["https://www.example.com/sitemap.xml"],
"stop_urls": ["/tags/", "/categories/"],
"selectors": {
"lvl0": {"selector": "nav[role='navigation'] a.active", "global": true, "default_value": "Hugo"},
"lvl1": "article h1, header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"text": "article p, article li, article td"
}
}

Example: docsearch.docusaurus.json

{
"index_name": "site_docs",
"start_urls": ["https://docs.example.com/"],
"sitemap_urls": ["https://docs.example.com/sitemap.xml"],
"stop_urls": ["/blog/tags/"],
"selectors": {
"lvl0": {"selector": ".navbar__title", "global": true, "default_value": "Docs"},
"lvl1": "article h1, header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"text": "article p, article li, table td"
}
}

Docusaurus specificity: lvl1 => article h1, header h1 (instead of only header h1).

3) Run Scraper (Docker)

.env (adjust for Cloud, specify host/IP for self‑hosting):

TYPESENSE_API_KEY=xyz_admin_or_write_key
TYPESENSE_HOST=search.example.com
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https

Run (Hugo):

docker run -it --rm   --env-file .env   -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)"   typesense/docsearch-scraper:0.11.0

Run (Docusaurus):

docker run -it --rm   --env-file .env   -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)"   typesense/docsearch-scraper:0.11.0

Tip: Run in CI/CD after deploy so that new content is indexed immediately.


Docusaurus – Frontend Integration

1) Enable Sitemap

npm i @docusaurus/plugin-sitemap --save

docusaurus.config.js:

plugins: [
['@docusaurus/plugin-sitemap', {/* optional */}],
];

2) Install Typesense Theme

npm i docusaurus-theme-search-typesense@next --save
# or: yarn add docusaurus-theme-search-typesense@next

3) Add Configuration

docusaurus.config.js (excerpt):

module.exports = {
themes: ['docusaurus-theme-search-typesense'],
themeConfig: {
typesense: {
typesenseCollectionName: 'site_docs',
typesenseServerConfig: {
nodes: [
{ host: 'search.example.com', port: 443, protocol: 'https' }
],
apiKey: 'SEARCH_ONLY_KEY'
},
typesenseSearchParameters: {
per_page: 8
},
contextualSearch: true
}
}
};

Security: In the repo use only the Search‑Only Key.


Hugo – Frontend Integration (two ways)

Way A: typesense-docsearch.js (quick, modal like Algolia)

Insert into Hugo layout (e.g. layouts/partials/head.html):

<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/typesense-docsearch-css@0.3.0" />

Before </body> (e.g. layouts/_default/baseof.html):

<div id="searchbar"></div>
<script src="https://cdn.jsdelivr.net/npm/typesense-docsearch.js@3.4"></script>
<script>
docsearch({
container: '#searchbar',
typesense: {
server: {
nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
apiKey: 'SEARCH_ONLY_KEY'
},
collectionName: 'site_hugo'
},
searchParameters: { per_page: 8 }
});
</script>

Advantage: Minimal effort, UI as usual, no additional bundler.

Way B: InstantSearch Widgets (flexible UI, facets etc.)

Include scripts (without build tool):

<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4"></script>
<script src="https://cdn.jsdelivr.net/npm/typesense-instantsearch-adapter@2"></script>

Container & init (e.g. layouts/partials/footer.html):

<div id="searchbox"></div>
<div id="hits"></div>

<script>
const adapter = new TypesenseInstantSearchAdapter({
server: {
nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
apiKey: 'SEARCH_ONLY_KEY'
},
additionalSearchParameters: { query_by: 'title,content' }
});
const searchClient = adapter.searchClient;

const search = instantsearch({
indexName: 'site_hugo',
searchClient
});

search.addWidgets([
instantsearch.widgets.searchBox({ container: '#searchbox' }),
instantsearch.widgets.hits({
container: '#hits',
templates: {
item: (hit) => `<article>
<a href="${hit.url}">${hit.title || hit.hierarchy_lvl1 || 'Result'}</a>
<p>${(hit._snippetResult && hit._snippetResult.text && hit._snippetResult.text.value) || ''}</p>
</article>`
}
}),
instantsearch.widgets.pagination({ container: '#pagination' })
]);

search.start();
</script>

Content fields (query_by) depend on which fields the scraper generates (e.g. title, text, hierarchy).


Relevance Tuning & Content Coverage

  • Stop URLs: possibly exclude changelog, archived areas.
  • Meta Tags (optional): make pages filterable by tags, e.g.
    <meta name="docsearch:language_tag" content="en" />
    <meta name="docsearch:version_tag" content="1.2.4" />
    Then e.g. filter_by: "language_tag:=en" as search parameter.
  • Weighting: In Typesense weight fields via query_by_weights (e.g. title:4,text:1).
  • Synonyms: Add terms (plural/singular, German/English).
  • Snippets/Highlighting: better UX in result template.

CI/CD – Automatic Re‑Indexing

GitHub Actions (example job):

name: index-docs
on:
workflow_dispatch:
push:
branches: [ main ]
paths:
- 'hugo/**'
- 'docs/**'
jobs:
scrape:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install jq
run: sudo apt-get update && sudo apt-get install -y jq
- name: Run scraper (Hugo)
run: |
docker run --rm --env TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} -e TYPESENSE_HOST=${{ secrets.TS_HOST }} -e TYPESENSE_PORT=443 -e TYPESENSE_PROTOCOL=https -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
- name: Run scraper (Docs)
run: |
docker run --rm --env TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} -e TYPESENSE_HOST=${{ secrets.TS_HOST }} -e TYPESENSE_PORT=443 -e TYPESENSE_PROTOCOL=https -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0

Recommendation: Start job after successful deploy so the crawler fetches the fresh pages.


Security & Operations

  • Search‑Only Key in frontend, Admin/Write Key only in CI/servers.
  • CORS: restrict to your domains (for self‑hosting).
  • Enforce TLS (HTTPS).
  • Backups: regularly backup Typesense data directory.
  • Monitoring: watch memory/CPU (RAM‑based).

Alternatives (brief)

  • Pagefind (purely static, client‑side): no server needed, but no shared backend index for both sites.
  • Meilisearch: similar to Typesense; also suitable for your EU/self‑hosting use case. Since you want a unified setup for Hugo & Docusaurus and the strong DocSearch toolchain (scraper + UI), Typesense is usually the shorter path.

Checklist

  • Typesense (self‑hosted or Cloud, EU region) ready
  • Search‑Only Key created
  • Two scraper configs (Hugo + Docs) created
  • Sitemap(s) available
  • Scraper tested locally
  • Docusaurus theme integrated
  • Hugo frontend (DocSearch.js or InstantSearch) integrated
  • CI job for re‑indexing after deploy

Common Pitfalls & Tips

  • Local tests: Scraper in Docker cannot see localhost – use host.docker.internal.
  • Ports: Some scraper setups expect port 80. For local tests possibly use proxy.
  • Docusaurus lvl1 selector: article h1, header h1 (not just header h1).
  • Large sites: maintain stop_urls and sitemap_urls cleanly to reduce crawl time.
  • Multilingual: separate via meta tags (docsearch:*_tag) and filter_by.

Further Reading

  • Typesense docs: DocSearch Scraper (Typesense Fork), Theme guide, InstantSearch adapter.
  • Docusaurus: Sitemap plugin, Theme styling.
  • Hugo: Sitemap usually active automatically; set baseURL correctly.

If you like, I can adapt the two scraper configs specifically for your domains/structure and provide ready‑to‑use PRs for Hugo & Docusaurus.