include/linux/if_caqm.h
Merge Pull Request from: @chengjunjia
# Introduction of CAQM
Add a new congestion control algorithm, caqm.
CAQM (Confined Active Queue Management) can be viewed as an enhanced ECN: each packet carries not only a `congestion or not` signal but also a `cwnd increment value` from the end host, which lets the switches better coordinate the end hosts.
## CAQM header location
The **CAQM header** is placed in the Ethernet header, so that modifying it does not require recalculating the checksum.
```
┌──────┬──────┬──────┬────────┬─────────────┬──────────────┐
│dstMac│srcMac│ tpid │CAQM_Hdr│0x0800/0x86DD│IP/TCP/Payload│
└──────┴──────┴───┬──┴────────┴───┬─────────┴──────────────┘
                  │               │
              ┌───▼───────────────▼───────────────┐
              │Ether_type: indicates next hdr type│
              └───────────────────────────────────┘
```
The protocol number of CAQM, `tpid`, is currently a **variable** in the kernel, `sysctl_caqm_tpid`. By default, it is set to 0x8200.
It is kept as a `variable` because 1) we have not yet obtained an 802 number from IANA; 2) we need it to be mutable so that the protocol can evolve.
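To make the layout concrete, here is a minimal user-space sketch of locating the CAQM header in a raw frame. It is not the code from this pull request: the names `caqm_parse_frame`, `caqm_parse_result`, `rd16` and `CAQM_TPID_DFLT` are hypothetical, and the sketch assumes that a 16-bit CAQM word follows the 16-bit `tpid`, followed by the real EtherType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ADDR_LEN   6       /* length of one MAC address */
#define CAQM_TPID_DFLT 0x8200  /* default value of sysctl_caqm_tpid (from the text) */

/* Hypothetical result type used only by this sketch. */
struct caqm_parse_result {
	int      has_caqm;    /* 1 if the frame carries a CAQM header */
	uint16_t caqm_hdr;    /* 16-bit CAQM word, 0 if absent */
	uint16_t ether_type;  /* type of the next header, e.g. 0x0800 */
	size_t   l3_offset;   /* offset of the IP header inside the frame */
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the frame is too short. */
static int caqm_parse_frame(const uint8_t *frame, size_t len, uint16_t tpid,
			    struct caqm_parse_result *res)
{
	size_t off = 2 * ETH_ADDR_LEN;  /* skip dstMac and srcMac */

	assert(frame != NULL && res != NULL);
	if (len < off + 2)
		return -1;

	res->has_caqm = 0;
	res->caqm_hdr = 0;
	res->ether_type = rd16(frame + off);

	if (res->ether_type == tpid) {
		/* tpid (2 bytes) + CAQM word (2 bytes) + real EtherType (2 bytes) */
		if (len < off + 6)
			return -1;
		res->has_caqm = 1;
		res->caqm_hdr = rd16(frame + off + 2);
		res->ether_type = rd16(frame + off + 4);
		off += 4;
	}
	res->l3_offset = off + 2;
	return 0;
}
```

Passing `tpid` as a parameter mirrors the fact that the value is configurable rather than a compile-time constant. The IP and TCP bytes are never touched, which is why no checksum has to be recomputed when the CAQM word changes.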
## CAQM header format
| Field Name | Location | Description |
|------------|------------|---------------------------------------------|
| TPID | Bit[31:16] | Configurable TPID; indicates that this is a 4-byte OPTAG |
| CC_TYPE | Bit[15:13] | Congestion Control type: 3'b000=CAQM |
| CC_INFO | Bit[12:0] | Congestion Information of CAQM |
`CC_INFO` has 4 fields:
1. Bit[10] = Enable (alias EN). 0: Disable; 1: Enable.
2. Bit[9] = Congestion status (alias C). 0: Non-congestion; 1: Congestion.
3. Bit[8] = Hint Valid (alias I). 0: Hint is 0; 1: Hint is valid.
4. Bit[7:0] = Hint (alias HINT). Carries the CAQM Hint value.
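The bit positions listed above can be expressed as masks. The following is a sketch only: the macro names, `struct caqm_info`, `caqm_pack` and `caqm_unpack` are hypothetical and not taken from `if_caqm.h`. Bit[12:11] of `CC_INFO` are not described above, so the sketch leaves them at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the tables above; all names are hypothetical. */
#define CAQM_CC_TYPE_SHIFT 13
#define CAQM_CC_TYPE_MASK  0x7u        /* Bit[15:13], 3'b000 = CAQM */
#define CAQM_EN_BIT        (1u << 10)  /* Bit[10]: Enable */
#define CAQM_C_BIT         (1u << 9)   /* Bit[9]: Congestion status */
#define CAQM_I_BIT         (1u << 8)   /* Bit[8]: Hint Valid */
#define CAQM_HINT_MASK     0xffu       /* Bit[7:0]: Hint */

struct caqm_info {
	uint8_t cc_type;     /* congestion control type, 0..7 */
	uint8_t en;          /* EN */
	uint8_t congested;   /* C */
	uint8_t hint_valid;  /* I */
	uint8_t hint;        /* HINT */
};

/* Build the 16-bit word that follows the TPID. */
static uint16_t caqm_pack(const struct caqm_info *ci)
{
	uint16_t w;

	assert(ci->cc_type <= CAQM_CC_TYPE_MASK);
	w = (uint16_t)(ci->cc_type << CAQM_CC_TYPE_SHIFT);
	if (ci->en)
		w |= CAQM_EN_BIT;
	if (ci->congested)
		w |= CAQM_C_BIT;
	if (ci->hint_valid)
		w |= CAQM_I_BIT;
	w |= ci->hint;
	return w;
}

/* Split the 16-bit word back into its fields. */
static struct caqm_info caqm_unpack(uint16_t w)
{
	struct caqm_info ci;

	ci.cc_type    = (uint8_t)((w >> CAQM_CC_TYPE_SHIFT) & CAQM_CC_TYPE_MASK);
	ci.en         = (w & CAQM_EN_BIT) ? 1 : 0;
	ci.congested  = (w & CAQM_C_BIT) ? 1 : 0;
	ci.hint_valid = (w & CAQM_I_BIT) ? 1 : 0;
	ci.hint       = (uint8_t)(w & CAQM_HINT_MASK);
	return ci;
}
```

For example, a CAQM word with EN, C and I set and a hint of 0x2a packs to 0x072a. The word is assumed to be converted to network byte order before it is written to the wire.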
# Improvement by CAQM
We compare CAQM with Cubic, BBR, and DCTCP on a 10 Gbps network (3 clients, 1 server, incast traffic generated by iperf) and measure the switch queue occupancy:
| Cubic | BBR | DCTCP | CAQM |
|-------|-------|-------|-------|
| 4MB | 700KB | 500KB | 200KB |
CAQM reduces the queue by more than 50% relative to every baseline (60% relative to DCTCP, the closest one).
# Modification to the kernel
## Avoid affecting the original process of kernel
We add a kernel parameter, `sysctl_caqm_enable`, which is a `struct static_key_false`.
When it is false, the kernel follows its original processing path and runs none of the code of the new algorithm, caqm.
We measure netperf performance after adding the code. `after, cli` means that the client runs the new code with caqm while the server runs the unmodified master; `after, srv` is the reverse. **The results show that we do not cause performance degradation for the kernel network stack.**
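The gating pattern can be illustrated in user space as follows. This is an analogy under stated assumptions, not the kernel code: a plain `bool` stands in for the static key, and `caqm_enabled`, `pkt_stats` and `rx_one_packet` are hypothetical names. In the kernel, the check would be a static branch (for example `static_branch_unlikely()`), which costs close to nothing while the key is false.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for sysctl_caqm_enable (default: false). */
static bool caqm_enabled;

/* Counters that record which parts of the path were executed. */
struct pkt_stats {
	unsigned long original_steps;
	unsigned long caqm_steps;
};

static void rx_one_packet(struct pkt_stats *st)
{
	if (caqm_enabled)      /* kernel: static branch, false by default */
		st->caqm_steps++;  /* CAQM-only processing */

	st->original_steps++;  /* original path, always executed */
}
```

With the flag left at its default, only the original path runs, which is the property the measurements in this section are meant to confirm.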
| Testcase | Configuration | after, cli | before, cli | Gap/% | before, srv | after, srv | Gap/% |
|-----------------|-------|----------|----------|-------|----------|----------|--------|
| TCP_STREAM | 1 | 7.78 | 7.40 | 5.16 | 7.91 | 7.89 | 0.22 |
| TCP_STREAM | 64 | 484.92 | 470.68 | 3.02 | 462.29 | 454.75 | 1.66 |
| TCP_STREAM | 512 | 1499.30 | 1386.03 | 8.17 | 1299.91 | 1400.66 | \-7.19 |
| TCP_STREAM | 65536 | 21841.53 | 21937.98 | \-0.44 | 22226.57 | 22685.24 | \-2.02 |
| UDP_STREAM | 1 | 2.38 | 2.30 | 3.32 | 1.99 | 1.99 | 0.00 |
| UDP_STREAM | 64 | 154.38 | 146.23 | 5.57 | 126.65 | 126.94 | \-0.23 |
| UDP_STREAM | 128 | 309.06 | 290.59 | 6.36 | 253.16 | 253.97 | \-0.32 |
| UDP_STREAM | 256 | 617.50 | 578.10 | 6.82 | 503.26 | 505.47 | \-0.44 |
| UDP_STREAM | 512 | 1212.11 | 1131.96 | 7.08 | 992.84 | 1002.01 | \-0.92 |
| UDP_STREAM | 32768 | 6644.95 | 6082.07 | 9.25 | 5507.71 | 5515.40 | \-0.14 |
| TCP_RR | \\ | 37068.73 | 34033.92 | 8.92 | 32469.74 | 38276.70 | \-15.17 |
| TCP_CRR | \\ | 9904.74 | 10341.43 | \-4.22 | 9631.93 | 10198.41 | \-5.55 |
| UDP_RR | \\ | 39505.25 | 41119.12 | \-3.92 | 37712.64 | 40246.41 | \-6.30 |
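The Gap columns can be reproduced from the neighbouring columns. Checking the rows above, both Gap columns appear to be computed as (left column − right column) / right column: the client half puts `after` on the left, while the server half puts `before` on the left, so a positive client gap and a negative server gap both mean that the new code scored higher. The helper name `gap_percent` is hypothetical.

```c
#include <assert.h>

/* Relative difference of `left` against `right`, in percent. */
static double gap_percent(double left, double right)
{
	assert(right != 0.0);
	return (left - right) / right * 100.0;
}
```

For instance, `gap_percent(1499.30, 1386.03)` gives about 8.17 for the client side of the TCP_STREAM 512 row, and `gap_percent(32469.74, 38276.70)` gives about -15.17 for the server side of the TCP_RR row.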
We also tested the code with `packetDrill` and `ltp`; both confirm that the original processing is not affected.
We also generated some *corrupt* packets to test availability.
## Modified code
1. The CAQM header is at the Ethernet level. To parse and deparse it, we modify `skbuff.h` to claim `KABI_RESERVE(1)`, `dev.c` for the receiver, and `eth.c` for the sender.
2. To keep the speedup features working, we modify `gro.c` for **GRO**, `flow_dissector.c` for **RPS**, and `dev.c` for **GSO**.
3. Because CAQM is at the Ethernet layer, the IPv4 `hh_cache` speedup in `ip_output.c`, which adds the whole Ethernet header at once instead of adding the dst mac and src mac separately, cannot be used. The same holds for the IPv6 design. We disable the cache when the packet to send carries caqm.
4. We modify `tcp_input.c` and `tcp_output.c` to add the congestion control algorithm, `tcp_caqm`, because the current `cc_ops` is not sufficient for an algorithm that relies on a new signal.
Issue:https://gitee.com/openeuler/kernel/issues/IAZBRK
Link:https://gitee.com/openeuler/kernel/pulls/12343
Reviewed-by: Liu Chao <liuchao173@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zhang Peng <zhangpeng362@huawei.com>