Search for Hugo & Docusaurus – Analysis, Recommendation & Implementation Guide
Status: 16.09.2025 (Europe/Berlin)
Short Summary (for the impatient)
Recommendation: Use Typesense (self‑hosted or Typesense Cloud with EU region, e.g. Frankfurt) + DocSearch Scraper (Typesense Fork) as crawler.
- Unified tooling for Hugo and Docusaurus.
- GDPR/EU residency possible (self‑hosting or Cloud region EU).
- Frontends:
- Docusaurus:
docusaurus-theme-search-typesense - Hugo:
typesense-docsearch.jsortypesense-instantsearch-adapter
- Docusaurus:
Why not Algolia DocSearch (free program)?
- Data residency: global replication; no guarantee for EU‑only.
- Free only for public open‑source docs; Hugo site may not be eligible.
- No self‑hosting; tooling would not be fully identical.
Architecture at a Glance
- Crawler: DocSearch Scraper (Typesense Fork) crawls publicly available pages and generates records.
- Index: Records are written to Typesense (self‑hosting or Cloud in EU).
- Frontend:
- Docusaurus: Theme plugin uses the Typesense index.
- Hugo: Easy integration via
typesense-docsearch.js(modal like Algolia) or via InstantSearch widgets withtypesense-instantsearch-adapter.
Advantage: Same backend and same indexer for both sites, separate collections possible (e.g.
site_hugo,site_docs).
Setup Variants
A) Self‑Hosting (recommended, full control / GDPR)
- Docker Compose (1 node to start, later HA possible)
- Own domain, TLS (reverse proxy like Caddy/Traefik/NGINX)
- Search‑only API key for frontend
Example: docker-compose.yml
version: '3.8'
services:
typesense:
image: typesense/typesense:latest
command: ["--data-dir=/data", "--api-key=${TYPESENSE_API_KEY}", "--enable-cors"]
ports: ["8108:8108"]
volumes:
- ./data:/data
environment:
- TZ=Europe/Berlin
B) Typesense Cloud (easy, EU region)
- Select region at cluster creation EU (Frankfurt, Zurich, Paris, …).
- Generate API keys in the dashboard (Admin, Search‑only).
In both cases: In the frontend use only the Search‑Only Key.
Indexing with DocSearch Scraper (Typesense Fork)
1) Requirements
- Live URL or locally reachable via
host.docker.internal - Sitemap improves coverage & speed
- Hugo:
/sitemap.xmlis generated by default (set BaseURL) - Docusaurus: enable
@docusaurus/plugin-sitemap
- Hugo:
2) Two separate configurations (one per site)
Example: docsearch.hugo.json
{
"index_name": "site_hugo",
"start_urls": ["https://www.example.com/"],
"sitemap_urls": ["https://www.example.com/sitemap.xml"],
"stop_urls": ["/tags/", "/categories/"],
"selectors": {
"lvl0": {"selector": "nav[role='navigation'] a.active", "global": true, "default_value": "Hugo"},
"lvl1": "article h1, header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"text": "article p, article li, article td"
}
}
Example: docsearch.docusaurus.json
{
"index_name": "site_docs",
"start_urls": ["https://docs.example.com/"],
"sitemap_urls": ["https://docs.example.com/sitemap.xml"],
"stop_urls": ["/blog/tags/"],
"selectors": {
"lvl0": {"selector": ".navbar__title", "global": true, "default_value": "Docs"},
"lvl1": "article h1, header h1",
"lvl2": "article h2",
"lvl3": "article h3",
"text": "article p, article li, table td"
}
}
Docusaurus specificity:
lvl1=>article h1, header h1(instead of onlyheader h1).
3) Run Scraper (Docker)
.env (adjust for Cloud, specify host/IP for self‑hosting):
TYPESENSE_API_KEY=xyz_admin_or_write_key
TYPESENSE_HOST=search.example.com
TYPESENSE_PORT=443
TYPESENSE_PROTOCOL=https
Run (Hugo):
docker run -it --rm --env-file .env -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
Run (Docusaurus):
docker run -it --rm --env-file .env -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
Tip: Run in CI/CD after deploy so that new content is indexed immediately.
Docusaurus – Frontend Integration
1) Enable Sitemap
npm i @docusaurus/plugin-sitemap --save
docusaurus.config.js:
plugins: [
['@docusaurus/plugin-sitemap', {/* optional */}],
];
2) Install Typesense Theme
npm i docusaurus-theme-search-typesense@next --save
# or: yarn add docusaurus-theme-search-typesense@next
3) Add Configuration
docusaurus.config.js (excerpt):
module.exports = {
themes: ['docusaurus-theme-search-typesense'],
themeConfig: {
typesense: {
typesenseCollectionName: 'site_docs',
typesenseServerConfig: {
nodes: [
{ host: 'search.example.com', port: 443, protocol: 'https' }
],
apiKey: 'SEARCH_ONLY_KEY'
},
typesenseSearchParameters: {
per_page: 8
},
contextualSearch: true
}
}
};
Security: In the repo use only the Search‑Only Key.
Hugo – Frontend Integration (two ways)
Way A: typesense-docsearch.js (quick, modal like Algolia)
Insert into Hugo layout (e.g. layouts/partials/head.html):
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/typesense-docsearch-css@0.3.0" />
Before </body> (e.g. layouts/_default/baseof.html):
<div id="searchbar"></div>
<script src="https://cdn.jsdelivr.net/npm/typesense-docsearch.js@3.4"></script>
<script>
docsearch({
container: '#searchbar',
typesense: {
server: {
nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
apiKey: 'SEARCH_ONLY_KEY'
},
collectionName: 'site_hugo'
},
searchParameters: { per_page: 8 }
});
</script>
Advantage: Minimal effort, UI as usual, no additional bundler.
Way B: InstantSearch Widgets (flexible UI, facets etc.)
Include scripts (without build tool):
<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4"></script>
<script src="https://cdn.jsdelivr.net/npm/typesense-instantsearch-adapter@2"></script>
Container & init (e.g. layouts/partials/footer.html):
<div id="searchbox"></div>
<div id="hits"></div>
<script>
const adapter = new TypesenseInstantSearchAdapter({
server: {
nodes: [{ host: 'search.example.com', port: 443, protocol: 'https' }],
apiKey: 'SEARCH_ONLY_KEY'
},
additionalSearchParameters: { query_by: 'title,content' }
});
const searchClient = adapter.searchClient;
const search = instantsearch({
indexName: 'site_hugo',
searchClient
});
search.addWidgets([
instantsearch.widgets.searchBox({ container: '#searchbox' }),
instantsearch.widgets.hits({
container: '#hits',
templates: {
item: (hit) => `<article>
<a href="${hit.url}">${hit.title || hit.hierarchy_lvl1 || 'Result'}</a>
<p>${(hit._snippetResult && hit._snippetResult.text && hit._snippetResult.text.value) || ''}</p>
</article>`
}
}),
instantsearch.widgets.pagination({ container: '#pagination' })
]);
search.start();
</script>
Content fields (
query_by) depend on which fields the scraper generates (e.g.title,text,hierarchy).
Relevance Tuning & Content Coverage
- Stop URLs: possibly exclude changelog, archived areas.
- Meta Tags (optional): make pages filterable by tags, e.g.
Then e.g.
<meta name="docsearch:language_tag" content="en" />
<meta name="docsearch:version_tag" content="1.2.4" />filter_by: "language_tag:=en"as search parameter. - Weighting: In Typesense weight fields via
query_by_weights(e.g.title:4,text:1). - Synonyms: Add terms (plural/singular, German/English).
- Snippets/Highlighting: better UX in result template.
CI/CD – Automatic Re‑Indexing
GitHub Actions (example job):
name: index-docs
on:
workflow_dispatch:
push:
branches: [ main ]
paths:
- 'hugo/**'
- 'docs/**'
jobs:
scrape:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install jq
run: sudo apt-get update && sudo apt-get install -y jq
- name: Run scraper (Hugo)
run: |
docker run --rm --env TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} -e TYPESENSE_HOST=${{ secrets.TS_HOST }} -e TYPESENSE_PORT=443 -e TYPESENSE_PROTOCOL=https -e "CONFIG=$(cat docsearch.hugo.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
- name: Run scraper (Docs)
run: |
docker run --rm --env TYPESENSE_API_KEY=${{ secrets.TS_API_KEY }} -e TYPESENSE_HOST=${{ secrets.TS_HOST }} -e TYPESENSE_PORT=443 -e TYPESENSE_PROTOCOL=https -e "CONFIG=$(cat docsearch.docusaurus.json | jq -r tostring)" typesense/docsearch-scraper:0.11.0
Recommendation: Start job after successful deploy so the crawler fetches the fresh pages.
Security & Operations
- Search‑Only Key in frontend, Admin/Write Key only in CI/servers.
- CORS: restrict to your domains (for self‑hosting).
- Enforce TLS (HTTPS).
- Backups: regularly backup Typesense data directory.
- Monitoring: watch memory/CPU (RAM‑based).
Alternatives (brief)
- Pagefind (purely static, client‑side): no server needed, but no shared backend index for both sites.
- Meilisearch: similar to Typesense; also suitable for your EU/self‑hosting use case. Since you want a unified setup for Hugo & Docusaurus and the strong DocSearch toolchain (scraper + UI), Typesense is usually the shorter path.
Checklist
- Typesense (self‑hosted or Cloud, EU region) ready
- Search‑Only Key created
- Two scraper configs (Hugo + Docs) created
- Sitemap(s) available
- Scraper tested locally
- Docusaurus theme integrated
- Hugo frontend (DocSearch.js or InstantSearch) integrated
- CI job for re‑indexing after deploy
Common Pitfalls & Tips
- Local tests: Scraper in Docker cannot see
localhost– usehost.docker.internal. - Ports: Some scraper setups expect port
80. For local tests possibly use proxy. - Docusaurus lvl1 selector:
article h1, header h1(not justheader h1). - Large sites: maintain
stop_urlsandsitemap_urlscleanly to reduce crawl time. - Multilingual: separate via meta tags (
docsearch:*_tag) andfilter_by.
Further Reading
- Typesense docs: DocSearch Scraper (Typesense Fork), Theme guide, InstantSearch adapter.
- Docusaurus: Sitemap plugin, Theme styling.
- Hugo: Sitemap usually active automatically; set
baseURLcorrectly.
If you like, I can adapt the two scraper configs specifically for your domains/structure and provide ready‑to‑use PRs for Hugo & Docusaurus.