Harvest Pipeline Speed: Node vs Python vs azd Template Ingest
By FrootAI Engine Team · CC0-1.0 · Data: GitHub Actions runner (4 vCPU, 16 GB RAM, Ubuntu 22.04), Azure OpenAI PTU (East US)
Fastest: Node, 4.2s
Results by Category
Milliseconds (median, Node implementation, n=50 repos)
| Category | Node pipeline (median, ms) |
|---|---|
| S1 Discover | 320 |
| S2 Fetch | 480 |
| S3 Extract | 1,200 |
| S4 Retrieve | 650 |
| S5 Scaffold | 980 |
| S6 Compose | 380 |
| S7 Customize | 190 |
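
As a quick sanity check on the table, the sketch below sums the per-stage medians and prints each stage's share of the total. The dictionary simply copies the numbers from the table; the names `STAGE_MS` and `stage_shares` are illustrative and are not taken from the `frootai-core` repository.

```python
# Median stage times in milliseconds, copied from the Node pipeline table.
STAGE_MS = {
    "S1 Discover": 320,
    "S2 Fetch": 480,
    "S3 Extract": 1200,
    "S4 Retrieve": 650,
    "S5 Scaffold": 980,
    "S6 Compose": 380,
    "S7 Customize": 190,
}


def stage_shares(stage_ms):
    """Return each stage's fraction of the summed stage time."""
    total = sum(stage_ms.values())
    return {name: ms / total for name, ms in stage_ms.items()}


total_ms = sum(STAGE_MS.values())
shares = stage_shares(STAGE_MS)
# Combined share of the two stages the report identifies as LLM-bound.
llm_share = shares["S3 Extract"] + shares["S5 Scaffold"]

print(f"total: {total_ms} ms")
for name, share in shares.items():
    print(f"{name:<14}{share:6.1%}")
print(f"S3 + S5: {llm_share:.1%}")
```

The stage medians sum to 4,200 ms, which matches the 4.2s headline. Summed medians are not in general equal to the median of end-to-end totals, so treat the match as a consistency check rather than a derivation. On these numbers, S3 and S5 together come to about 52% of the Node total, which sits below the 60-70% range quoted under Key Findings; the table alone does not account for the difference.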
Key Findings
1. The Node implementation completes S1-S7 in a median of 4.2s, 38% faster than Python (6.8s) and 65% faster than the azd baseline (12.1s).
2. The speed advantage comes mainly from S3 Extract (Ajv2020 validation is 3x faster than Python jsonschema) and S6 Compose (V8 template literal performance).
3. LLM call latency (S3 + S5) accounts for 60-70% of total wall-clock time across all implementations; Node's advantage concentrates in the non-LLM stages.
4. The Python implementation is competitive on S4 Retrieve (numpy-backed similarity search): only 5% slower than Node's WASM-based alternative.
5. The azd baseline includes interactive prompts and Azure ARM deployment validation that add 4-5s of overhead not present in the FrootAI pipeline, which defers deployment to a separate step.
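
The percentages in the first finding can be reproduced from the three median totals. The sketch below assumes that "X% faster" means the relative reduction in wall-clock time against the slower implementation; the report does not define the term, so this reading is an assumption, and the helper names are illustrative.

```python
def percent_faster(candidate_s, reference_s):
    """Reduction in wall-clock time relative to the reference, in percent."""
    return (reference_s - candidate_s) / reference_s * 100


def speed_ratio(candidate_s, reference_s):
    """How many times longer the reference takes than the candidate."""
    return reference_s / candidate_s


# Median end-to-end totals in seconds, as reported in the findings.
NODE_S, PYTHON_S, AZD_S = 4.2, 6.8, 12.1

vs_python = percent_faster(NODE_S, PYTHON_S)
vs_azd = percent_faster(NODE_S, AZD_S)

print(f"Node vs Python: {vs_python:.1f}% less time, {speed_ratio(NODE_S, PYTHON_S):.2f}x")
print(f"Node vs azd:    {vs_azd:.1f}% less time, {speed_ratio(NODE_S, AZD_S):.2f}x")
```

Under this definition the figures round to 38% and 65%, consistent with the findings. The ratio form (about 1.6x against Python and about 2.9x against azd) expresses the same comparison and is less easily confused with a throughput increase.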
Methodology
We benchmarked the end-to-end harvest pipeline (S1 Discover → S7 Customize) across 50 GitHub repos spanning 8 programming languages and 3 cloud providers. Each repo was processed 5 times and the median wall-clock time recorded. Three implementations were compared:

1. **Node (FrootAI)**: The production `frootai-core` pipeline running on Node.js 22 with the Ajv2020 validator, text-embedding-3-large for retrieval, and GPT-4o for extraction + scaffolding.
2. **Python (FrootAI)**: A feature-equivalent Python implementation using the same LLM calls, jsonschema validator, and identical prompt templates.
3. **azd baseline**: Microsoft's `azd init` + `azd provision` workflow for the same repos, measuring from repo URL to deployable template. Note: azd does not perform fact extraction or composition, so this comparison measures time-to-deployable-output, not feature parity.

All runs used the same Azure OpenAI deployment (GPT-4o, East US, PTU-reserved) to eliminate API latency variance. Runs were on a GitHub Actions runner (4 vCPU, 16 GB RAM, Ubuntu 22.04). The 7 pipeline stages were also measured individually to identify bottlenecks.
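
The repeat-and-take-the-median procedure described above can be sketched as follows. This is not the repository's benchmark script: `run_pipeline` is a hypothetical callable standing in for one end-to-end S1-S7 run on a repo, and aggregating the per-repo medians with a second median is an assumption, since the methodology does not spell out how the 50 per-repo values are combined.

```python
import statistics
import time


def median_wall_clock(task, repeats=5, clock=time.perf_counter):
    """Run `task` `repeats` times and return the median elapsed seconds."""
    durations = []
    for _ in range(repeats):
        start = clock()
        task()
        durations.append(clock() - start)
    return statistics.median(durations)


def benchmark(repos, run_pipeline, repeats=5):
    """Return per-repo median times and the median across repos.

    `run_pipeline` is a hypothetical function taking a repo identifier.
    The median-of-medians aggregation is an assumption, not a documented step.
    """
    per_repo = {
        # Bind `repo` as a default argument so each lambda keeps its own repo.
        repo: median_wall_clock(lambda repo=repo: run_pipeline(repo), repeats)
        for repo in repos
    }
    overall = statistics.median(per_repo.values())
    return per_repo, overall
```

A call such as `benchmark(repo_urls, run_pipeline)` would return one median per repo plus an overall figure. The median is used rather than the mean because a single slow run (for example, a cold cache or a throttled API call) shifts the mean noticeably but leaves the median of 5 runs largely unaffected. The `clock` parameter is injectable so the timing logic can be checked with a deterministic clock.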
Reproduce This Study
Run the benchmark yourself. The script, sample data, and expected output are in the repository:
    git clone https://github.com/frootai/frootai-core
    cd frootai-core/scripts/benchmarks/pipeline-speed
    bash run.sh
Cite This Study
@misc{frootai2026speed,
title={Harvest Pipeline Speed: Node vs Python vs azd Template Ingest},
author={FrootAI Engine Team},
year={2026},
url={https://frootai.dev/lab/pipeline-speed}
}

FrootAI Engine Team. (2026). Harvest Pipeline Speed: Node vs Python vs azd Template Ingest. FrootAI Lab. https://frootai.dev/lab/pipeline-speed
FrootAI Engine Team, "Harvest Pipeline Speed: Node vs Python vs azd Template Ingest," FrootAI Lab, 2026. [Online]. Available: https://frootai.dev/lab/pipeline-speed