Performance

How fast is Pyvorin?

We benchmark honestly because we have to - our customers are engineers, not mugs. No cherry-picking, no hidden configs, no synthetic micro-benchmarks rigged to make us look good. Here is what you actually get.

Numerical workloads (loops, arithmetic, reductions): 5–50×
String processing (parsing, tokenisation, ETL): 2–10×
Data structure ops (list/dict/set manipulation): 3–15×
I/O-bound code (file reads, network waits): 1.2–2×

Why native code beats the interpreter

CPython is solid engineering, but it is still an interpreter. Every operation hits the same dispatch loop. Pyvorin skips that entirely.

1. Type-specialised machine instructions

In CPython, a + b calls PyNumber_Add, which checks the types of both operands, dispatches to the correct C function, updates reference counts, and handles arbitrary-precision results. That is roughly 50–100 CPU instructions.

When Pyvorin knows at compile time that a and b are integers, it emits a single addq instruction. On x86_64, that is 1 CPU instruction. The speedup on tight loops is dramatic - often 20–50× for simple arithmetic.
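To see the generic dispatch described above, the standard-library dis module can list the bytecode CPython interprets for a + b. This is a minimal sketch: the function name add is only for illustration, and the exact opcode name depends on the CPython version (BINARY_ADD before 3.11, BINARY_OP from 3.11 onwards).

```python
import dis


def add(a, b):
    return a + b


# Collect the names of the opcodes CPython will interpret for this function.
opnames = [instr.opname for instr in dis.get_instructions(add)]

# The addition is a single generic "binary" opcode. The bytecode carries no
# type information, so the operand types are only inspected at run time.
binary_ops = [name for name in opnames if name.startswith("BINARY_")]
print(binary_ops)
```

The bytecode is identical whether a and b are ints, floats, or strings, which is why the interpreter has to do the type checks on every execution.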

2. No Global Interpreter Lock

CPython’s GIL ensures only one thread executes Python bytecode at a time. Even on a 64-core machine, CPU-bound Python is effectively single-threaded unless you use multiprocessing.

Compiled functions release the GIL before entering native code. You can call the same compiled function from multiple Python threads and they will execute in parallel on all cores. This alone can provide a 4–16× throughput improvement on multi-core servers.
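The calling pattern this describes can be sketched with the standard library alone. The function sum_of_squares and the helper run_in_threads are hypothetical stand-ins; how a function is actually marked for compilation with Pyvorin is not shown here, and under stock CPython this code still runs one thread at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(n: int) -> int:
    """CPU-bound stand-in for a function you would compile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(inputs, workers=4):
    # Each input is handed to a worker thread. Under stock CPython the
    # threads take turns holding the GIL while running this pure-Python loop.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum_of_squares, inputs))


results = run_in_threads([1_000, 2_000, 3_000, 4_000])
print(results)
```

The point of the sketch is that the calling code would not need to change: the same thread pool is what would spread GIL-free native calls across cores.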

3. SIMD vectorisation

Modern CPUs have vector registers that can process 4, 8, or 16 values simultaneously (AVX2, AVX-512, NEON). CPython cannot use these because every operation is individually dispatched.

LLVM’s auto-vectoriser transforms loops like sum(x[i] for i in range(N)) into SIMD instructions. On AVX2, this processes 4 doubles or 8 floats per instruction - a theoretical 4–8× speedup on top of the interpreter removal.
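The lane counts quoted above follow from dividing the register width by the element width. The short sketch below works that out and also writes the reduction as an explicit indexed loop. The names lanes and total are illustrative only, and the register widths are the standard ones for each instruction set (128-bit NEON, 256-bit AVX2, 512-bit AVX-512).

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


def total(x):
    """The reduction from the text, written as an explicit indexed loop."""
    acc = 0.0
    for i in range(len(x)):
        acc += x[i]
    return acc


widths = {"NEON": 128, "AVX2": 256, "AVX-512": 512}
for name, bits in widths.items():
    # 64-bit doubles and 32-bit floats per register.
    print(name, lanes(bits, 64), "doubles,", lanes(bits, 32), "floats")
```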

4. Direct memory access

Python lists are arrays of pointers to PyObject. Accessing my_list[i] requires pointer indirection, a type check, and a reference count increment.

Pyvorin’s list runtime uses contiguous C arrays. When types are known, my_list[i] compiles to a single memory access with a bounds check. For typed numeric arrays, the bounds check can even be hoisted out of the loop by LLVM.
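The difference between the two layouts can be seen with the standard library. This is an analogy rather than Pyvorin's runtime: array.array is used here only because it also stores raw C values contiguously, while a list stores pointers to separate float objects.

```python
from array import array

boxed = [1.0, 2.0, 3.0, 4.0]   # list: an array of pointers to float objects
packed = array("d", boxed)     # array('d'): raw C doubles stored back to back

# A memoryview exposes the underlying buffer without copying it.
view = memoryview(packed)
print(view.contiguous, view.itemsize, view.nbytes)

# Indexing the packed array reads a double straight out of that buffer.
third = packed[2]
```

With a layout like this, element i lives at a fixed offset of i * itemsize from the start of the buffer, which is what makes a single memory access possible.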

5. Loop unrolling and inlining

Function calls in Python are expensive: build a frame, push locals, execute the call, pop the frame, return. In tight loops, this dominates runtime.

LLVM inlines small functions and unrolls short loops, eliminating call overhead and enabling cross-loop optimisations. A loop that calls math.sqrt on each iteration becomes a series of inlined vsqrtpd SIMD instructions.
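What inlining does can be shown by hand in plain Python. In this sketch, hypot2 is a hypothetical small helper, lengths_with_calls pays for a Python-level call on every iteration, and lengths_inlined is the same loop with the helper body pasted in. A compiler performs this rewrite automatically; the code below only illustrates the transformation, not Pyvorin's output.

```python
import math


def hypot2(x: float, y: float) -> float:
    return math.sqrt(x * x + y * y)


def lengths_with_calls(points):
    # One Python function call (new frame, argument passing) per point.
    out = []
    for x, y in points:
        out.append(hypot2(x, y))
    return out


def lengths_inlined(points):
    # Same arithmetic with the helper body substituted into the loop.
    sqrt = math.sqrt
    out = []
    for x, y in points:
        out.append(sqrt(x * x + y * y))
    return out


points = [(3.0, 4.0), (5.0, 12.0), (8.0, 15.0)]
print(lengths_with_calls(points) == lengths_inlined(points))
```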

Infrastructure cost impact

Speed is not just about user experience - it is about not burning money on CPU time you do not need. A 10× speedup means 1/10th the compute. On AWS or GCP, that translates directly to your monthly bill.

Before Pyvorin: £2,400 / month
After Pyvorin (10× speedup): £240 / month
Annual savings: £25,920

Based on 4× c6i.2xlarge instances at roughly £0.13/hr running CPU-bound Python 24/7. Your mileage will vary depending on workload, cloud provider, and how good your finance team is at spotting waste.
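The savings figure can be reproduced from the monthly numbers above. This sketch uses only those quoted figures and assumes, as the text does, that compute cost falls in proportion to the speedup.

```python
monthly_before = 2400   # GBP per month, as quoted above
speedup = 10            # the 10x speedup used in the example

# Assumption: cost scales linearly with CPU time.
monthly_after = monthly_before / speedup
annual_savings = (monthly_before - monthly_after) * 12

print(monthly_after, annual_savings)
```

Swap in your own monthly bill and measured speedup to get a first estimate for your workload.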

Example: ETL pipeline

CPython runtime: 4.2 hours
Pyvorin runtime: 18 minutes
Speedup: 14×
Cloud cost (before): £142 / run
Cloud cost (after): £10 / run

Daily ETL on 50M rows: JSON parsing, validation, aggregation, CSV output. Running on AWS c6i.4xlarge spot instances.
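The ETL figures are consistent with each other, as a quick check shows. The sketch assumes the per-run cost scales linearly with runtime, and the variable names are illustrative.

```python
cpython_minutes = 252   # 4.2 hours expressed in minutes
pyvorin_minutes = 18

etl_speedup = cpython_minutes / pyvorin_minutes

cost_before = 142       # GBP per run, as quoted above
# Assumption: the same instances are simply billed for less time.
cost_after = cost_before / etl_speedup

print(etl_speedup, round(cost_after, 2))
```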

Our benchmarking methodology

We publish every detail of how we measure so you can reproduce our results.

Correctness first

Every benchmark result is validated against CPython ground truth. If we produce a different answer, the benchmark is marked as failed - regardless of how fast it ran.
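A correctness gate of this kind can be sketched in a few lines. The names validate, reference_sum, and broken_sum are hypothetical and are not taken from Pyvorin's benchmark harness. The idea is only that the CPython result is treated as ground truth and any mismatch fails the benchmark before timing is considered.

```python
def validate(reference_fn, candidate_fn, inputs):
    """Return 'passed' only if the candidate matches the reference everywhere."""
    for args in inputs:
        expected = reference_fn(*args)   # CPython ground truth
        actual = candidate_fn(*args)     # result from the build under test
        if actual != expected:
            return "failed"
    return "passed"


def reference_sum(values):
    return sum(values)


def broken_sum(values):
    # Deliberately wrong: skips the last element.
    return sum(values[:-1])


cases = [([1, 2, 3],), ([10, 20],), ([],)]
print(validate(reference_sum, reference_sum, cases))
print(validate(reference_sum, broken_sum, cases))
```

Floating-point workloads would normally need a tolerance-based comparison instead of !=, which this sketch leaves out.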

Warm runs only

We report the minimum of multiple warm runs, not the first run. First-run times include compilation overhead, which is tracked separately. Compiled code that is already cached loads in under 5 ms.
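The timing rule above can be sketched with time.perf_counter. The helpers time_call and benchmark and the default of five warm runs are assumptions for illustration, not the actual harness.

```python
import time


def time_call(fn, *args):
    """Wall-clock duration of a single call, in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def benchmark(fn, *args, warm_runs=5):
    # The first call is timed on its own, since it may include one-off costs
    # such as compilation, and is kept out of the headline number.
    first = time_call(fn, *args)
    warm = [time_call(fn, *args) for _ in range(warm_runs)]
    return {"first_run_s": first, "best_warm_s": min(warm)}


report = benchmark(sum, range(100_000))
print(report)
```

Taking the minimum of the warm runs rather than the mean reduces the influence of noise from other processes on the machine.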

Full disclosure

We publish hardware specs, Python version, Pyvorin version, CPU features, and the exact benchmark source code. You can run the same benchmarks on your own hardware.
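If you want to record the same kind of details alongside your own runs, the standard library covers most of them. This is a minimal sketch with a hypothetical helper name, environment_report. CPU feature flags such as AVX2 are not exposed by the standard library and would need a platform-specific source.

```python
import os
import platform


def environment_report():
    """Collect basic facts about the machine and interpreter running a benchmark."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
        "system": platform.system(),
        "cpu_count": os.cpu_count(),
    }


env = environment_report()
for key, value in env.items():
    print(f"{key}: {value}")
```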

See the numbers for yourself

Browse our public benchmark database with 48 real-world workloads.