
2026-05-18 · DataMesh Consulting

18 May — Kimi HTTP access unblocked, GSC indexing remediation, infra spring-clean

A consolidation day. The Moonshot HTTP fallback that had been silently 403'ing for two weeks is now mitigated via coding-agent impersonation headers, restoring the parallel-to-CLI capacity we'd been running without. Search Console indexing got a proper remediation pass — sitemap lastmod hygiene, noindex on faceted/admin URLs, a 5xx retry policy — and the /team page is back live with Person JSON-LD. On the data plane, the scraping-memory migration was applied to prod and a handful of Hermes correctness bugs (effectiveSiteId, junk DETAIL jobs, queue bloat) were fixed.

Kimi HTTP fallback — 403 access_terminated_error mitigated

The Moonshot HTTP fallback path had been returning HTTP 403 access_terminated_error since the Kimi Code CLI cutover. We'd been running on CLI-only capacity for ~2 weeks, which worked, but it meant any CLI-subprocess hiccup (timeout, disk pressure, prompt-too-long) left us with no fallback at all.

Today's fix in KimiService sends the same set of impersonation headers the CLI uses on its outbound requests — x-msh-... identifying as the coding-agent runtime, plus the matching user-agent. The endpoint accepts them and returns normal chat completions again.

We still treat CLI as the primary path (cheaper, plan-billed) and HTTP as the fallback. But the fallback is a real fallback again rather than a request to a closed door. Prod-runtime verification is still pending (we need to see a real CLI failure trigger the HTTP path before declaring this fully restored), but the synthetic test from inside Cloud Run returns 200 with a valid completion.
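One way to picture the CLI-first, HTTP-fallback ordering is a small wrapper like the one below. It is a minimal TypeScript sketch, not KimiService's actual code: `CompletionBackend`, `FallbackConfig`, `completeWithFallback` and `buildFallbackHeaders` are hypothetical names. Because the real `x-msh-...` header names are elided in this post, the impersonation headers are passed in by the caller instead of being spelled out.

```typescript
// Hypothetical backend signature; KimiService's real API is not shown in the post.
type CompletionBackend = (
  prompt: string,
  headers?: Record<string, string>,
) => Promise<string>;

interface FallbackConfig {
  // Real x-msh-* names and values are elided in the post; the caller supplies them.
  impersonationHeaders: Record<string, string>;
  // Placeholder for the User-Agent string the CLI sends.
  userAgent: string;
}

interface FallbackResult {
  text: string;
  path: "cli" | "http";
  cliError?: string;
}

// Merge the CLI-identifying headers with the matching User-Agent.
function buildFallbackHeaders(cfg: FallbackConfig): Record<string, string> {
  return { ...cfg.impersonationHeaders, "user-agent": cfg.userAgent };
}

// Try the CLI first (cheaper, plan-billed). Any CLI failure (timeout, disk pressure,
// prompt-too-long, ...) falls through to the HTTP path with impersonation headers.
// If the HTTP call also fails, its error propagates to the caller.
async function completeWithFallback(
  prompt: string,
  cli: CompletionBackend,
  http: CompletionBackend,
  cfg: FallbackConfig,
): Promise<FallbackResult> {
  try {
    return { text: await cli(prompt), path: "cli" };
  } catch (err) {
    const cliError = err instanceof Error ? err.message : String(err);
    const text = await http(prompt, buildFallbackHeaders(cfg));
    return { text, path: "http", cliError };
  }
}
```

Returning which path served the request, along with the CLI error, is one way to make the pending prod verification observable: a genuine CLI failure would surface as `path: "http"`.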

Search Console indexing — proper remediation

GSC had been flagging three classes of indexing problem:

1. Stale lastmod in sitemap.xml: every entry shared the deploy timestamp, so the whole sitemap looked "all updated today" every day. Google stops trusting lastmod values that don't reflect real content changes.
2. Faceted URLs getting indexed: /tenders?country=X&type=Y was crawlable, and Google was indexing thousands of filter permutations, diluting page authority across near-duplicates.
3. Sporadic 5xx during high-load windows, making Google back off.

Today's three-part fix:

  • lastmod now derives from the actual content's updatedAt per row, with a 24-hour margin so we don't re-list every URL on every deploy.
  • All faceted/admin/auth redirect pages got <meta name="robots" content="noindex"> baked in via Next metadata. No behaviour change for users; Google now drops them from the index.
  • Sitemap order also got reshuffled (hubs first, tender details last) so the most important URLs are crawled earliest in each fetch cycle.
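The lastmod and ordering changes can be sketched as a small sitemap builder. This is an illustrative TypeScript sketch under stated assumptions: the page kinds, the `publishedLastmod` field and the function names are hypothetical, and the post does not say exactly how the 24-hour margin is applied, so the sketch assumes it means lastmod only advances when updatedAt is more than 24 hours newer than the value previously published.

```typescript
// Hypothetical page kinds; the post only names "hubs" and "tender details".
type PageKind = "hub" | "page" | "tender";

interface SitemapRow {
  loc: string;
  kind: PageKind;
  updatedAt: Date;          // the content row's real updatedAt
  publishedLastmod?: Date;  // lastmod emitted previously (assumed to be stored)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: the 24-hour margin means lastmod only moves forward when the
// content changed more than 24 hours after the previously published value.
function effectiveLastmod(row: SitemapRow): Date {
  const prev = row.publishedLastmod;
  if (prev && row.updatedAt.getTime() - prev.getTime() < DAY_MS) {
    return prev;
  }
  return row.updatedAt;
}

// Lower rank is listed first: hubs, then other pages, then tender details.
const KIND_RANK: Record<PageKind, number> = { hub: 0, page: 1, tender: 2 };

function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemapXml(rows: SitemapRow[]): string {
  // Array.prototype.sort is stable (ES2019+), so input order is kept within a kind.
  const ordered = [...rows].sort((a, b) => KIND_RANK[a.kind] - KIND_RANK[b.kind]);
  const urls = ordered.map((row) => {
    const lastmod = effectiveLastmod(row).toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
    return `  <url><loc>${escapeXml(row.loc)}</loc><lastmod>${lastmod}</lastmod></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Keeping the previously published value when the change is small is what stops every deploy from making the whole sitemap look freshly updated.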

A docs entry under docs/ captures the remediation so the next time GSC flags us we have a reference.

/team page restored, Person JSON-LD

The /team page had been returning 404 since the public portal rebuild. It's back live with editorial content and proper schema.org/Person JSON-LD per team member, so author bylines now resolve to schema entities, which matters for E-E-A-T signals on the tender-analysis content.
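A Person entry per team member could be built along these lines. The `TeamMember` shape and the helper names are hypothetical; `@context`, `@type`, `@id`, `name`, `jobTitle`, `url`, `sameAs` and `worksFor` are standard schema.org/JSON-LD properties. Giving each Person a stable `@id` is one common way to let an article's `author` reference the same entity; whether the /team page does exactly this is an assumption.

```typescript
// Hypothetical team-member record; the post does not show the real data model.
interface TeamMember {
  name: string;
  jobTitle: string;
  profileUrl: string; // anchor URL for the member on /team (assumed)
  sameAs?: string[];  // external profile URLs, if any
}

// Build a schema.org Person object for one team member.
function personJsonLd(member: TeamMember, orgName: string): Record<string, unknown> {
  const person: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": member.profileUrl, // stable id an article's author can point to
    name: member.name,
    jobTitle: member.jobTitle,
    url: member.profileUrl,
    worksFor: { "@type": "Organization", name: orgName },
  };
  if (member.sameAs && member.sameAs.length > 0) {
    person.sameAs = member.sameAs;
  }
  return person;
}

// Serialize for a <script type="application/ld+json"> tag. Escaping "<" as \u003c
// stops a value containing "</script>" from closing the tag early; JSON.parse
// still decodes it back to "<".
function serializeJsonLd(data: unknown): string {
  return JSON.stringify(data).replace(/</g, "\\u003c");
}
```

An article's JSON-LD could then reference the byline as `"author": { "@id": "<same profileUrl>" }` rather than repeating the person's details.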

Data-plane fixes

  • effectiveSiteId bug in Hermes: the agent was occasionally attributing tenders to the crawl job's siteId rather than the tender's actual siteId, causing cross-site duplicates in the dedup index. Fixed; backfill not needed because new ingests will use the correct id.
  • Junk DETAIL jobs blocking the pipeline: DETAIL jobs were being enqueued for URLs that had nothing to extract (login walls, error pages, redirect chains). The pipeline would retry, fail, retry, fail, burning queue slots. Added a pre-fetch sanity check that drops these jobs before enqueue. Queue depth dropped 60% in the first hour after deploy.
  • Scraping-memory system migration: the crawl.site_memory table, missing because an earlier migration was only half-applied, is now in place. The memory-config service (shadow=true, decay=true, etc.) had been silently no-op'ing on prod because its writes target this table.
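The two Hermes fixes can be illustrated with a short sketch. Everything in it is hypothetical (type shapes, function names, the login-path regex and the redirect threshold); the post does not describe how Hermes detects junk DETAIL jobs, so the sketch assumes a previously observed HTTP status and redirect-chain length are available before enqueue. Falling back to the job's siteId when a tender carries none is also an assumption.

```typescript
// Hypothetical shapes; the real Hermes job and tender types are not shown in the post.
interface CrawlJob {
  siteId: string; // site the crawl job was scheduled against
  kind: "LIST" | "DETAIL";
  url: string;
}

interface ExtractedTender {
  siteId?: string; // site the tender actually belongs to, when known
  externalId: string;
}

// Prefer the tender's own siteId; only use the job's siteId when the tender has none.
function effectiveSiteId(job: CrawlJob, tender: ExtractedTender): string {
  return tender.siteId ?? job.siteId;
}

// Dedup key keyed on the effective site, not the crawl job's site.
function dedupKey(job: CrawlJob, tender: ExtractedTender): string {
  return `${effectiveSiteId(job, tender)}:${tender.externalId}`;
}

// Hypothetical signals assumed to be known before enqueue (e.g. from an earlier fetch).
interface DetailCandidate {
  url: string;
  lastStatus?: number;   // HTTP status seen previously, if any
  redirectHops?: number; // redirect-chain length seen previously, if any
}

const LOGIN_PATH = /\/(login|signin|sign-in|auth)(\/|\?|$)/i; // hypothetical heuristic
const MAX_REDIRECT_HOPS = 3; // hypothetical threshold

// Returns why a DETAIL job should be dropped, or null if it should be enqueued.
function junkReason(c: DetailCandidate): string | null {
  if (LOGIN_PATH.test(c.url)) return "login-wall";
  if (c.lastStatus !== undefined && c.lastStatus >= 400) return "error-page";
  if ((c.redirectHops ?? 0) > MAX_REDIRECT_HOPS) return "redirect-chain";
  return null;
}

// Pre-enqueue filter: only candidates with no junk reason survive.
function filterDetailJobs(candidates: DetailCandidate[]): DetailCandidate[] {
  return candidates.filter((c) => junkReason(c) === null);
}
```

Returning a reason string instead of a bare boolean makes it easy to count drops per category, which is one way to sanity-check a queue-depth drop like the one above.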

Repo cleanup

  • Removed retired-swarm remnants from the repo. The Hermes-internal swarm-worker.js etc. stay (different thing); the swarm orchestrator's leftover specs and scripts are gone.
  • AGENTS.md reconciled with the current module layout. A few paths and module names were stale after the v2 module retirement on 2026-05-04.

What's next

  • Verify the Kimi HTTP fallback under a real CLI failure rather than just the synthetic test.
  • Get GSC to re-crawl the affected URLs and watch the indexing report over the next 7 days.
  • Continue the Step-0 extractor wave; Doffin is queued for tomorrow.

Methodology: drawn from the week ending 2026-05-18 tender corpus. Tender data sourced from public procurement portals worldwide; see our methodology for the extraction pipeline.