## Summary

Three related changes to `turnutils_uclient` that together stop the loadgen from being the bottleneck when benchmarking the relay:

1. **Sender thread pool** (`--sender-threads <N>`, max 4, auto-bumped to 2 at `-m >= 4`). Mirrors the listener pool that landed in #1911. Each sender thread owns its own libevent base, a session shard (round-robin assigned at allocation time via `elem->sender_id`), and a 100 µs timer that runs the burst loop just as the legacy main-thread `timer_handler` did. Send-side counters (`tot_send_messages`, `tot_send_bytes`, `tot_send_dropped`, `load_sent_packets`) and the completion accumulators in `client_timer_handler` (`total_loss` / `total_latency` / `total_jitter`) are written into per-thread cache-line-aligned slabs and reduced into the globals after `pthread_join`. This avoids the cross-core atomic-counter contention that the listener-pool work already documented.
2. **UDP-GSO send batching** in `send_buffer` for the plain-UDP path. The sender pool opens a thread-local batch window around its per-tick iteration; within the window, `send_buffer` copies the payload into a per-thread slot and appends it to a scatter-gather `iov[]`. On flush:
   - If `count > 1` and all segments share the same size → one `sendmsg(2)` with a `UDP_SEGMENT` cmsg.
   - If GSO is unavailable (kernel returns `EINVAL`/`ENOPROTOOPT`/`EOPNOTSUPP`) → sticky-disable per thread and fall back to `sendmmsg(2)` over the same iov array.
   - Per-entry `send(2)` as the final fallback for whatever `sendmmsg` refused (the `EAGAIN` tail, etc.).

   Auto-flush triggers: a different fd (next session in the iteration), a different segment size, batch capacity (64), or end of iteration.
3. **`recv_pps` in `print_load_generator_rate`**, alongside the existing `send_pps`. Once the sender pool and GSO let uclient push well over 1 Mpps of UDP, the meaningful end-to-end metric is the round-trip count, not the send-side count: the relay/peer pipeline drops 95+% of packets when uclient outpaces it.
The progress line now reads:

```
send_pps=6012928.00, recv_pps=101486.00, total_sent=112975924, total_recv=1853369
```

## Why

When benchmarking `--multiplex-client` / `--multiplex-peer` on a c-4 DigitalOcean droplet, the loadgen's single-threaded `timer_handler` saturated one CPU at around 300 kpps regardless of `-m`. The relay was never put under real pressure, so the value of the multiplex paths couldn't be measured. With this patch the loadgen can produce over 6 Mpps from a single c-4 droplet, far above the relay's per-thread saturation point, so the bottleneck moves to the server where it belongs.

## Benchmark

Setup: multiplex-client turnserver, c-4 loadgen, m=4, 20 s.

| Round | OLD (master) | NEW (this PR) | Lift |
|-------|--------------|---------------|------|
| 1 | 246k send_pps | 7.48M | 30.4× |
| 2 | 459k | 6.06M | 13.2× |
| 3 | 360k | 5.07M | 14.1× |
| **avg** | **355k** | **6.20M** | **17.5×** |

The throughput cap shifts from the loadgen to the relay. End-to-end recv_pps (now first-class in the progress line) is ~100 kpps in this configuration, limited by the relay, not uclient.

## Design notes

- **Cache-line alignment** on `uclient_sender` mirrors the listener pool's slab pattern. Same false-sharing trap, same fix.
- **Main-thread timer slows to 10 ms** when the sender pool is engaged. The main timer still fires for lifecycle work and the `__turn_getMSTime` refresh, but `timer_handler` early-returns when `num_sender_threads > 0` so we don't burn a core on no-op 100 µs ticks.
- **Stop ordering**: `stop_sender_threads()` runs before `stop_listener_threads()`. The senders own session mutation (wmsgnum, to_send_timems, shutdown), so joining them first prevents a race where a listener accumulates a stat into a session whose owning sender is still iterating it.
- **UDP-GSO copy**: the per-slot memcpy is intentional. The caller (`client_write`) reuses `elem->out_buffer` across burst iterations, so pointing `iov[i]` at the session buffer would alias all entries to the most recent payload.
  A rotating per-session output ring would eliminate the copy; it is left out of this PR because the kernel-side savings from collapsing N `sendmsg` calls into one GSO `sendmsg` dominate the per-packet copy cost at the rates we measured.
- **Linux-only**: the send-side batching machinery is gated by `#if defined(__linux__)`. Non-Linux builds get no-op `uclient_send_batch_begin`/`_end`, and `uclient_tx_enqueue` returns false, falling through to the legacy `send(2)` loop.

## Test plan

- [x] macOS local build (Apple Silicon, AppleClang). Sender-pool code paths compile under both the Linux and non-Linux gates.
- [x] `clang-format-15 --dry-run --Werror` clean.
- [x] Linux build on a c-4 Ubuntu 24.04 droplet (`cmake -DCMAKE_BUILD_TYPE=Release`).
- [x] `--help` includes the new `--sender-threads` option with a valid-range hint; out-of-range values are rejected.
- [x] Benchmark on two c-4 droplets in nyc1 against `turnserver --multiplex-client`: 3 alternating rounds OLD vs NEW, 17.5× average send-side lift (data table above).
- [x] `print_load_generator_rate` output verified: `send_pps`, `recv_pps`, `total_sent`, `total_recv` are all populated and consistent across listener slab reductions.

## Limitations

- `--multiplex-peer` is not driven by this PR. uclient's pattern (each `-m N` opens two internal sessions per client that share the same peer port) hits the multiplex-peer "one allocation per peer endpoint" rule; benchmarking that flag at high concurrency requires a separate small change (a per-session secondary peer port) that is not in scope here.
- The wider per-round variance under the sender pool (rounds in our bench ranged from 13× to 30× lift) is timing/scheduler noise at small per-thread shards. It smooths out as `-m` and per-thread session counts grow.
# Coturn TURN server
coturn is a free, open-source implementation of a TURN and STUN server. A TURN server is a VoIP media traffic NAT traversal server and gateway.
## Installing / Getting started
Linux distros may ship a version of coturn, which you can install and run with:

```sh
apt install coturn
turnserver --log-file stdout
```
Or run coturn in a Docker container:

```sh
docker run -d -p 3478:3478 -p 3478:3478/udp -p 5349:5349 -p 5349:5349/udp -p 49152-65535:49152-65535/udp coturn/coturn
```

See the Docker Readme for more details about running the container.
## Developing
### Dependencies

coturn requires the following dependency to be installed first:

- libevent2

Optional:

- openssl (to support TLS and DTLS, authorized STUN and TURN)
- libmicrohttpd and prometheus-client-c (Prometheus interface)
- MariaDB/MySQL (user database)
- Hiredis (user database, monitoring)
- SQLite (user database)
- PostgreSQL (user database)
### Building

```sh
git clone git@github.com:coturn/coturn.git
cd coturn
./configure
make
```
## Features
STUN specs:
- RFC 3489 - "classic" STUN
- RFC 5389 - base "new" STUN specs
- RFC 5769 - test vectors for STUN protocol testing
- RFC 5780 - NAT behavior discovery support
- RFC 7443 - ALPN support for STUN & TURN
- RFC 7635 - oAuth third-party TURN/STUN authorization
TURN specs:
- RFC 5766 - base TURN specs
- RFC 6062 - TCP relaying TURN extension
- RFC 6156 - IPv6 extension for TURN
- RFC 7443 - ALPN support for STUN & TURN
- RFC 7635 - oAuth third-party TURN/STUN authorization
- RFC 8016 - Mobility with Traversal Using Relays around NAT (TURN)
- DTLS support (http://tools.ietf.org/html/draft-petithuguenin-tram-turn-dtls-00)
- TURN REST API (http://tools.ietf.org/html/draft-uberti-behave-turn-rest-00)
- Origin field in TURN (Multi-tenant TURN Server) (https://tools.ietf.org/html/draft-ietf-tram-stun-origin-06)
- TURN Bandwidth draft specs (http://tools.ietf.org/html/draft-thomson-tram-turn-bandwidth-01)
- TURN-bis (with dual allocation) draft specs (http://tools.ietf.org/html/draft-ietf-tram-turnbis-04)
ICE and related specs:
- RFC 5245 - ICE
- RFC 5768 - ICE-SIP
- RFC 6336 - ICE-IANA Registry
- RFC 6544 - ICE-TCP
- RFC 5928 - TURN Resolution Mechanism
The implementation fully supports the following client-to-TURN-server protocols:
- UDP (per RFC 5766)
- TCP (per RFC 5766 and RFC 6062)
- TLS (per RFC 5766 and RFC 6062): including TLS1.3; ECDHE is supported.
- DTLS1.0 and DTLS1.2 (http://tools.ietf.org/html/draft-petithuguenin-tram-turn-dtls-00)
- SCTP (experimental implementation).
Relay protocols:
- UDP (per RFC 5766)
- TCP (per RFC 6062)
User databases (for user repository, with passwords or keys, if authentication is required):
- SQLite
- MariaDB/MySQL
- PostgreSQL
- Redis
- MongoDB
Management interfaces:
- telnet CLI
- HTTPS interface
Monitoring:
- Redis can be used for status and statistics storage and notification
- Prometheus interface (not available in the apt package)
Message integrity digest algorithms:
- HMAC-SHA1, with MD5-hashed keys (as required by STUN and TURN standards)
TURN authentication mechanisms:
- 'classic' long-term credentials mechanism;
- TURN REST API (a modification of the long-term mechanism, for time-limited secret-based authentication, for WebRTC applications: http://tools.ietf.org/html/draft-uberti-behave-turn-rest-00);
- experimental third-party oAuth-based client authorization option;
Performance and Load Balancing:
When used as part of an ICE solution for VoIP connectivity, this TURN server can handle thousands of simultaneous calls per CPU when the TURN protocol is used, or tens of thousands of calls when only the STUN protocol is used. For virtually unlimited scalability, a load-balancing scheme can be used. Load balancing can be implemented with the following tools (either one of them or a combination):
- DNS SRV based load balancing;
- built-in 300 ALTERNATE-SERVER mechanism (requires 300 response support by the TURN client);
- network load-balancer server.
Traffic bandwidth limitation and congestion-avoidance algorithms are implemented.
Target platforms:
- Linux (Debian, Ubuntu, Mint, CentOS, Fedora, Redhat, Amazon Linux, Arch Linux, OpenSUSE)
- BSD (FreeBSD, NetBSD, OpenBSD, DragonFlyBSD)
- Solaris 11
- Mac OS X
- Cygwin (for non-production R&D purposes)
- Windows (native with, e.g., MSVC toolchain)
This project can be successfully used on other *NIX platforms, too, but that is not officially supported.
The implementation is designed to be simple and easy to install and configure. The project focuses on performance, scalability and simplicity. The aim is to provide an enterprise-grade TURN solution.
To achieve high performance and scalability, the TURN server is implemented with the following features:
- High-performance industrial-strength Network IO engine libevent2 is used
- Configurable multi-threading model implemented to allow full usage of available CPU resources (if OS allows multi-threading)
- Multiple listening and relay addresses can be configured
- Efficient memory model used
- The TURN project code can be used in a custom proprietary networking environment. The TURN server code uses an abstract networking API; only a couple of files in the project have to be rewritten to plug the TURN server into a proprietary environment. This project provides an implementation only for the standard UNIX networking/IO API, but users can implement any other environment. The TURN server code was originally developed for a high-performance proprietary corporate environment and later adapted to the UNIX networking API
- The TURN server works as a user space process, without imposing any special requirements on the system
## Links
- Project homepage: https://coturn.github.io/
- Repository: https://github.com/coturn/coturn/
- Issue tracker: https://github.com/coturn/coturn/issues
- Google group: https://groups.google.com/forum/#!forum/turn-server-project-rfc5766-turn-server