一个常数时间双并行Montgomery算法的简单说明

考虑使用 P+Q \leftarrow xADD(P, Q, P-Q)[2]P \leftarrow xDBL(P) 实现 [m]P + [n]Q 的常数时间双并行计算。

我们可以将 mmnn 分解成二进制形式:

[m]P+[n]Q=[i=0k1b0,i2i]P+[i=0k1b1,i2i]Q[m]P + [n]Q = \left[ \sum\limits_{i=0}^{k-1} b_{0,i} 2^i \right] P + \left[ \sum\limits_{i=0}^{k-1} b_{1,i} 2^i \right] Q

根据二分法的经典思路,我们可以写出递归式:

R0(i)=2R0(i+1)+b0,iP+b1,iQR_0^{(i)} = 2 R_0^{(i+1)} + b_{0,i} P + b_{1,i} Q

其中,初始状态 R0(k)=OR_0^{(k)} = \mathcal{O}R0(i)=[j=ik1b0,j2ji]P+[j=ik1b1,j2ji]QR_0^{(i)} = \left[\sum\limits_{j=i}^{k-1} b_{0, j} 2^{j-i} \right] P + \left[ \sum\limits_{j=i}^{k-1} b_{1, j} 2^{j-i} \right] Q

不过值得注意的是:这种基础算法不仅容易受到侧信道攻击(执行时间取决于比特位是 0 还是 1),而且无法使用高效的蒙哥马利 xADD,因为 xADD 要求在计算 A+BA+B 时,必须已知 ABA-B 的值。

这时我们将偏差记成一个预存的常量 D(i)=b0,iP+b1,iQD^{(i)} = b_{0,i} P + b_{1,i} Q,这样就有:

R0(i)=2R0(i+1)+D(i)xADD(R0(i+1),R0(i+1)+D(i),D(i))R_0^{(i)} = 2 R_0^{(i+1)} + D^{(i)} \leftarrow \texttt{xADD}(R_0^{(i+1)}, R_0^{(i+1)} + D^{(i)}, D^{(i)})

R(i)=R0(i)+D(i)R_*^{(i)} = R_0^{(i)} + D^{(i)}

所有 R(i)R_*^{(i)} 可能的值就恰好构成了一个差分矩形:

+P +P +Q +Q +(P+Q)
$R_0^{(i)}$
$R_0^{(i)} + P$
$R_0^{(i)} + Q$
$R_0^{(i)} + P + Q$

但是问题来了:我们需要预先计算出 D(i)D^{(i)} 的值,而 D(i)D^{(i)} 的值又依赖于 b0,ib_{0,i}b1,ib_{1,i} 的值,这是变化的。不过我们注意到,他们的变化范围是有限的,于是我们可以利用打表的方式,即预先计算出 D(i)D^{(i)} 的所有可能取值。

PPQQ 的值我们总是清楚的,D(i)D^{(i)} 的值也只有 4 种可能:{0,P,Q,P+Q}\{0, P, Q, P+Q\}。且根据 xADD\texttt{xADD} 的方法,当我们知道 {P,Q,P+Q,PQ}\{P, Q, P+Q, P-Q\} 中任意三个点的值时,我们总能构造出第四个值。

如果我们允许用四个寄存器来存储上述差分矩形的基点,那么算法会变得十分清楚。

设:

{R0(i)=R0(i)R1(i)=R0(i)+PR2(i)=R0(i)+QR3(i)=R0(i)+P+Q\begin{cases} R_0^{(i)} = R_0^{(i)} \\ R_1^{(i)} = R_0^{(i)} + P \\ R_2^{(i)} = R_0^{(i)} + Q \\ R_3^{(i)} = R_0^{(i)} + P + Q \end{cases}

与此同时,将上述构造出来的四个点打包成一个整体的状态:

S(i)=(R0(i)R1(i)R2(i)R3(i))S^{(i)} = \begin{pmatrix} R_0^{(i)} \\ R_1^{(i)} \\ R_2^{(i)} \\ R_3^{(i)} \end{pmatrix}

根据递归式,我们可以得到如下运算表格:

b0,ib_{0, i} b1,ib_{1, i} R0(i1)R_0^{(i-1)} R1(i1)R_1^{(i-1)} R2(i1)R_2^{(i-1)} R3(i1)R_3^{(i-1)}
0 0 2R0(i)2 R_0^{(i)} R0(i)+R1(i)R_0^{(i)} + R_1^{(i)} R0(i)+R2(i)R_0^{(i)} + R_2^{(i)} R0(i)+R3(i)R_0^{(i)} + R_3^{(i)}
1 0 R1(i)+R0(i)R_1^{(i)} + R_0^{(i)} 2R1(i)2R_1^{(i)} R1(i)+R2(i)R_1^{(i)} + R_2^{(i)} R1(i)+R3(i)R_1^{(i)} + R_3^{(i)}
0 1 R2(i)+R0(i)R_2^{(i)} + R_0^{(i)} R2(i)+R1(i)R_2^{(i)} + R_1^{(i)} 2R2(i)2R_2^{(i)} R2(i)+R3(i)R_2^{(i)} + R_3^{(i)}
1 1 R3(i)+R0(i)R_3^{(i)} + R_0^{(i)} R3(i)+R1(i)R_3^{(i)} + R_1^{(i)} R3(i)+R2(i)R_3^{(i)} + R_2^{(i)} 2R3(i)2R_3^{(i)}

下面用函数式的方式重述算法。

注意到上述表格告诉我们,每一行都会恰好有一个 xDBL 和 三个 xADD,为了尽量做到常数时间,我们固定主要运算函数:

f(T0,T1,T2,T3,D1,D2,D3)={T0xDBL(T0)T1xADD(T0,T1,D1)T2xADD(T0,T2,D2)T3xADD(T0,T3,D3)=(2T0,T0+T1,T0+T2,T0+T3,)\begin{aligned} f(T_0, T_1, T_2, T_3, D_1, D_2, D_3) &= \begin{cases} T_0 \leftarrow \texttt{xDBL}(T_0)\\ T_1 \leftarrow \texttt{xADD}(T_0, T_1, D_1)\\ T_2 \leftarrow \texttt{xADD}(T_0, T_2, D_2)\\ T_3 \leftarrow \texttt{xADD}(T_0, T_3, D_3)\\ \end{cases}\\ &= (2T_0, T_0 + T_1, T_0 + T_2, T_0 + T_3, *) \end{aligned}

为了完整描述算法过程,我们定义置换函数:

{π0=idπ1:(R0,R1,R2,R3,)(R1,R0,R2,R3,)π2:(R0,R1,R2,R3,)(R2,R1,R0,R3,)π3:(R0,R1,R2,R3,)(R3,R1,R2,R0,)\begin{cases} \pi_0 = \text{id}\\ \pi_1: (R_0, R_1, R_2, R_3, *) \mapsto (R_1, R_0, R_2, R_3, *)\\ \pi_2: (R_0, R_1, R_2, R_3, *) \mapsto (R_2, R_1, R_0, R_3, *)\\ \pi_3: (R_0, R_1, R_2, R_3, *) \mapsto (R_3, R_1, R_2, R_0, *)\\ \end{cases}

不难发现上述置换事实上均为对换,即具有对合性

设初始状态(未开始迭代的状态)为 S(k)=(O,P,Q,P+Q)tS^{(k)} = (\mathcal{O}, P, Q, P + Q)^{t},标量位组合索引序列为 βi=b0,i+2b1,i\beta_i = b_{0, i} + 2b_{1, i}。下面考虑将 S(k)S(0)S^{(k)}\rightarrow S^{(0)} 进行全路径展开。

根据单步状态转移方程:

S(i)=πβi(f(πβi(S(i+1)),Δβi))S^{(i)} = \pi_{\beta_i} \left( f\left( \pi_{\beta_i}\left(S^{(i+1)}\right), \Delta_{\beta_i} \right) \right)

其中,Δβi=(D1,D2,D3)\Delta_{\beta_i} = (D_1, D_2, D_3) 是传给 xADD 的差值常量组。因为 Montgomery 曲线上的 xADD 仅依赖于差值的仿射 x 坐标(即 x(AB)=x(BA)x(A-B) = x(B-A),符号免疫),我们可以通过预计算得出所有可能的 Δβ\Delta_\beta

具体来说,Δβ\Delta_\beta 恰好是点集 {0,P,Q,P+Q}\{0, P, Q, P+Q\} 在经过 πβ\pi_\beta 置换后,第 1、2、3 项与第 0 项的差值:

  • β=0\beta=0Δ0=(P,Q,P+Q)\Delta_0 = (P, Q, P+Q)
  • β=1\beta=1Δ1=(P,PQ,Q)\Delta_1 = (P, P-Q, Q)
  • β=2\beta=2Δ2=(PQ,Q,P)\Delta_2 = (P-Q, Q, P)
  • β=3\beta=3Δ3=(Q,P,P+Q)\Delta_3 = (Q, P, P+Q)

这意味着我们还需要四个寄存器预存储预计算的四个差点PPQQP+QP+QPQP - Q

将全过程从 S(k)S^{(k)}S(0)S^{(0)} 逐层嵌套展开

S(k1)=πβk1fπβk1(S(k))S(k2)=πβk2fπβk2πβk1fπβk1(S(k))S(0)=πβ0fπβ0πβ1fπβ1πβk2πβk1fπβk1(S(k))\begin{aligned} S^{(k-1)} &= \pi_{\beta_{k-1}} \circ f \circ \pi_{\beta_{k-1}} \left( S^{(k)} \right) \\ S^{(k-2)} &= \pi_{\beta_{k-2}} \circ f \circ \pi_{\beta_{k-2}} \circ \pi_{\beta_{k-1}} \circ f \circ \pi_{\beta_{k-1}} \left( S^{(k)} \right) \\ &\vdots \\ S^{(0)} &= \pi_{\beta_{0}} \circ f \circ \pi_{\beta_{0}} \circ \pi_{\beta_{1}} \circ f \circ \pi_{\beta_{1}} \circ \dots \circ \pi_{\beta_{k-2}} \circ \pi_{\beta_{k-1}} \circ f \circ \pi_{\beta_{k-1}} \left( S^{(k)} \right) \end{aligned}

仔细观察 S(0)S^{(0)} 的表达式,发现除了最外层和最内层的置换,中间所有的置换全部两两相邻,形成了 πβiπβi+1\pi_{\beta_i} \circ \pi_{\beta_{i+1}} 的形式

之前定义的 π\pi 函数(例如 π1\pi_1 仅仅对换了 R0R_0R1R_1,而保持 R2,R3R_2, R_3 不变)在数学上只是一个局部对换,它们不满足封闭性。 比如 π1π2\pi_1 \circ \pi_2 会产生一个三阶循环轮换((0,1,2,3)(1,2,0,3)(0,1,2,3) \mapsto (1,2,0,3)),它既不是对合的,也无法用单一的 πx\pi_x 来表示。如果沿用这个定义,我们举步维艰,接下来必须将 π\pi 置换升级为全局异或置换

将基点状态 SS 的四个位置索引看作两位二进制数 x{00,01,10,11}2x \in \{00, 01, 10, 11\}_2重新定义置换群 Π={π0,π1,π2,π3}\Pi = \{\pi_0, \pi_1, \pi_2, \pi_3\},使其作用在状态 SS 上的规则为索引的按位异或

πj(Rx)=Rxj\pi_j(R_x) = R_{x \oplus j}

展开来看,四个置换的完整映射为:

{π0:(R0,R1,R2,R3)(R0,R1,R2,R3)π1:(R0,R1,R2,R3)(R1,R0,R3,R2)(不仅换 0,1,还换 2,3)π2:(R0,R1,R2,R3)(R2,R3,R0,R1)π3:(R0,R1,R2,R3)(R3,R2,R1,R0)\begin{cases} \pi_0: (R_0, R_1, R_2, R_3) \mapsto (R_0, R_1, R_2, R_3) \\ \pi_1: (R_0, R_1, R_2, R_3) \mapsto (R_1, R_0, R_3, R_2) \quad \text{(不仅换 0,1,还换 2,3)} \\ \pi_2: (R_0, R_1, R_2, R_3) \mapsto (R_2, R_3, R_0, R_1) \\ \pi_3: (R_0, R_1, R_2, R_3) \mapsto (R_3, R_2, R_1, R_0) \end{cases}

即使 π1\pi_1 同时交换了 R2R_2R3R_3,经过固定的 ff 函数计算后,再用 π1\pi_1 换回来,结果依然完美符合上述运算表格(因为 ff 中的加法操作是关于 T0T_0 的,且差值 Δ\Delta 也经过了对应的预排列)。值得一提的是,升级为异或置换后,群 Π\Pi 同构于 Klein 四元群 V4V_4,满足复合律:

πaπb=πab\pi_a \circ \pi_b = \pi_{a \oplus b}

定义索引序列 β\beta布尔差分导数序列 rr

ri=βiβiβi+1i{0,1,,k2}r_i = \frac{\partial \beta}{\partial i} \equiv \beta_i \oplus \beta_{i+1} \quad i \in \{0, 1, \dots, k-2\}

:为了形式统一,我们可以假定一个虚拟的初始前置状态 βk=0\beta_k = 0,使得 rk1=βk10=βk1r_{k-1} = \beta_{k-1} \oplus 0 = \beta_{k-1}

利用 Klein 四元群的复合律 πaπb=πab\pi_a \circ \pi_b = \pi_{a \oplus b},我们将所有相邻的置换合并:

πβiπβi+1=πβiβi+1=πri\pi_{\beta_i} \circ \pi_{\beta_{i+1}} = \pi_{\beta_i \oplus \beta_{i+1}} = \pi_{r_i}

rir_i 代入展开式中,就有:

S(0)=πβ0fπr0fπr1fπrk2fπrk1(S(k))S^{(0)} = \pi_{\beta_0} \circ f \circ \pi_{r_0} \circ f \circ \pi_{r_1} \circ f \circ \dots \circ \pi_{r_{k-2}} \circ f \circ \pi_{r_{k-1}} \left( S^{(k)} \right)

这样构造出来的常数时间算法,实际上是在执行这样一条流水线:

状态装载πdk1计算fπdk2计算fπd0计算fπβ0还原目标结果\textbf{状态装载} \xrightarrow{\pi_{d_{k-1}}} \textbf{计算} f \xrightarrow{\pi_{d_{k-2}}} \textbf{计算} f \cdots \xrightarrow{\pi_{d_0}} \textbf{计算} f \xrightarrow{\pi_{\beta_0}} \textbf{还原目标结果}

这样的方法只关注当前步和上一步的状态差异,将置换次数减半,同时差分导数只会最多泄露标量位之间的相对消息,具有天然的抗测信道攻击的能力。

接下来就需要我们就按照全局异或置换 πj(Rx)=Rxj\pi_j(R_x) = R_{x \oplus j} 继续对差分选取进行推导。

在每一步迭代中,ff 函数执行的核心操作是:

TxxADD(T0,Tx,Dx)for x{1,2,3}T_x \leftarrow \texttt{xADD}(T_0, T_x, D_x) \quad \text{for } x \in \{1, 2, 3\}

由于状态 SS 经过了 πβ\pi_\beta 的置换,我们有 Tx=SβxT_x = S_{\beta \oplus x}T0=SβT_0 = S_\beta。因此,ff 函数所需要的差值必然是:

Dx=TxT0=SβxSβD_x = T_x - T_0 = S_{\beta \oplus x} - S_\beta

我们将四个基点用二进制索引 y=y1y0{00,01,10,11}2y = y_1 y_0 \in \{00, 01, 10, 11\}_2 统一表示:

Sy=y0P+y1QS_y = y_0 P + y_1 Q

代入差值公式:

Dx=((βx)0P+(βx)1Q)(β0P+β1Q)D_x = ((\beta \oplus x)_0 P + (\beta \oplus x)_1 Q) - (\beta_0 P + \beta_1 Q)

Dx=((β0x0)β0)P+((β1x1)β1)QD_x = ((\beta_0 \oplus x_0) - \beta_0) P + ((\beta_1 \oplus x_1) - \beta_1) Q

利用布尔代数的一个小技巧 (ba)b=(1)ba(b \oplus a) - b = (-1)^b a,上式可以直接化简为:

Dx=(1)β0x0P+(1)β1x1QD_x = (-1)^{\beta_0} x_0 P + (-1)^{\beta_1} x_1 Q

由于 Montgomery 曲线上的 xADD\texttt{xADD} 具有符号免疫性(仅依赖仿射 xx 坐标,即 ABBAA-B \equiv B-A),我们在传入差点时可以取绝对值。

下面分别来看 x=1,2,3x=1, 2, 3 时的 DxD_x

  • 对于 T1T_1 (x=1    x0=1,x1=0x=1 \implies x_0=1, x_1=0):D1=(1)β0PPD_1 = (-1)^{\beta_0} P \equiv P (永远是 PP,与 β\beta 无关)
  • 对于 T2T_2 (x=2    x0=0,x1=1x=2 \implies x_0=0, x_1=1):D2=(1)β1QQD_2 = (-1)^{\beta_1} Q \equiv Q (永远是 QQ,与 β\beta 无关)
  • 对于 T3T_3 (x=3    x0=1,x1=1x=3 \implies x_0=1, x_1=1):D3=(1)β0P+(1)β1QD_3 = (-1)^{\beta_0} P + (-1)^{\beta_1} Q
    • 如果 β0=β1\beta_0 = \beta_1(即 β{0,3}\beta \in \{0, 3\}),D3=±(P+Q)P+QD_3 = \pm(P+Q) \equiv P+Q
    • 如果 β0β1\beta_0 \neq \beta_1(即 β{1,2}\beta \in \{1, 2\}),D3=±(PQ)PQD_3 = \pm(P-Q) \equiv P-Q

综上所述:在全局异或置换下,真正的差分表 Δβ\Delta_\beta 应该是:

  • Δ0=(P,Q,P+Q)\Delta_0 = (P, Q, P+Q)
  • Δ1=(P,Q,PQ)\Delta_1 = (P, Q, P-Q)
  • Δ2=(P,Q,PQ)\Delta_2 = (P, Q, P-Q)
  • Δ3=(P,Q,P+Q)\Delta_3 = (P, Q, P+Q)

通过上述推导可以发现,我们只需要让 D1D_1D2D_2 是绝对的静态常量 PPQQ,然后动态传递 T3T_3 的差值(在 P+QP+QPQP-Q 之间横跳)。下面用 F1,F2F_1, F_2 分别存储 P+QP+QPQP-Q

定义状态变量 ci=b0,ib1,ic_i = b_{0,i} \oplus b_{1,i},于是不难讨论得到:

  • ci=0c_i = 0 时,我们需要 T3T_3 的差值为 P+QP+Q
  • ci=1c_i = 1 时,我们需要 T3T_3 的差值为 PQP-Q

据此,给出伪代码如下:

\begin{algorithm}
\caption{Optimized Constant-Time 2-D Montgomery Ladder (4 Registers)}
\begin{algorithmic}
\Require $k$-bit scalars $m = \sum b_{0,i}2^i, n = \sum b_{1,i}2^i$, Base points $P, Q, P+Q, P-Q$
\Ensure Return point $R = [m]P + [n]Q$

\State \Comment{Phase 1: Dynamic Difference Initialization}
\State $D_1 \gets P$
\State $D_2 \gets Q$
\State $F_1 \gets P+Q$
\State $F_2 \gets P-Q$
\State $c_{prev} \gets 0$ \Comment{Tracks the XOR state of scalar bits}

\State \Comment{Phase 2: Point Initialization}
\State $(R_0, R_1, R_2, R_3) \gets (\mathcal{O}, P, Q, P+Q)$
\State $\beta_{prev} \gets 0$

\State \Comment{Phase 3: Main Evaluation Loop}
\For{$i = k-1$ \textbf{downto} $0$}
    \State $\beta_i \gets b_{0,i} + 2 \cdot b_{1,i}$
    \State $r_i \gets \beta_i \oplus \beta_{prev}$ \Comment{Boolean differential derivative}

    \State \Comment{Apply $\pi_{r_i}$ using orthogonal Constant-Time Swaps}
    \State $v_0 \gets r_i \text{ AND } 1$
    \State $v_1 \gets (r_i \text{ AND } 2) \gg 1$
    \State $\texttt{CSWAP}(v_0, R_0, R_1)$
    \State $\texttt{CSWAP}(v_0, R_2, R_3)$
    \State $\texttt{CSWAP}(v_1, R_0, R_2)$
    \State $\texttt{CSWAP}(v_1, R_1, R_3)$

    \State \Comment{Dynamic Difference Loading for $T_3$}
    \State $c_i \gets b_{0,i} \oplus b_{1,i}$
    \State $swap\_diff \gets c_i \oplus c_{prev}$
    \State $\texttt{CSWAP}(swap\_diff, F_1, F_2)$

    \State \Comment{Execute fixed function $f$}
    \State $T_0 \gets \texttt{xDBL}(R_0)$
    \State $T_1 \gets \texttt{xADD}(R_0, R_1, D_1)$
    \State $T_2 \gets \texttt{xADD}(R_0, R_2, D_2)$
    \State $T_3 \gets \texttt{xADD}(R_0, R_3, F_1)$

    \State $(R_0, R_1, R_2, R_3) \gets (T_0, T_1, T_2, T_3)$
    \State $\beta_{prev} \gets \beta_i$
    \State $c_{prev} \gets c_i$
\EndFor

\State \Comment{Phase 4: Final Restore Permutation $\pi_{\beta_0}$}
\State $v_0 \gets \beta_{prev} \text{ AND } 1$
\State $v_1 \gets (\beta_{prev} \text{ AND } 2) \gg 1$
\State $\texttt{CSWAP}(v_0, R_0, R_1)$
\State $\texttt{CSWAP}(v_0, R_2, R_3)$
\State $\texttt{CSWAP}(v_1, R_0, R_2)$
\State $\texttt{CSWAP}(v_1, R_1, R_3)$
\State \Return $R_0$
\end{algorithmic}
\end{algorithm}

更进一步地,是否可以考虑用三个寄存器存储基点(R0(i)R_0^{(i)}R1(i)R_1^{(i)}R2(i)R_2^{(i)})达到目的呢?


设:

{R1(i)=R0(i)+D(i)R2(i)=R0(i)+F(i)\begin{cases} R_1^{(i)} = R_0^{(i)} + D^{(i)} \\ R_2^{(i)} = R_0^{(i)} + F^{(i)} \end{cases}

记状态:

S(i)=(R0(i)R1(i)R2(i))S^{(i)} = \begin{pmatrix} R_0^{(i)} \\ R_1^{(i)} \\ R_2^{(i)} \end{pmatrix}

设:

K(i)=[j=ik1b0,j2ji]P+[j=ik1b1,j2ji]QK^{(i)} = \left[\sum_{j=i}^{k-1} b_{0, j} 2^{j-i}\right]P + \left[\sum_{j=i}^{k-1} b_{1, j} 2^{j-i}\right]Q

根据我们的想法,它严格满足以下不变量:

{R0(i)=K(i)+xiXi+yiYiR1(i)=K(i)+(1xi)Xi+yiYiR2(i)=K(i)+(1xi)Xi+(1yi)Yi\begin{cases} R_0^{(i)} = K^{(i)}+ x_i X_i + y_i Y_i \\ R_1^{(i)} = K^{(i)}+ (1 - x_i) X_i + y_i Y_i \\ R_2^{(i)} = K^{(i)}+ (1 - x_i) X_i + (1 - y_i) Y_i \end{cases}

其中 (Xi,Yi)(X_i, Y_i) 为当前的标架,即此处的 D(i)D^{(i)}F(i)F^{(i)}(xi,yi)(x_i, y_i) 为对应的标量位。

要从 S(i)S^{(i)} 推导到 S(i1)S^{(i-1)},事实上需要计算:

Ki1=2Ki+xi1Xi1+yi1Yi1K_{i-1} = 2K_i + x_{i-1} X_{i-1} + y_{i-1} Y_{i-1}

此时对应有:

{R0(i1)=K(i1)+xi1Xi+yi1Yi=2Ki+2xi1Xi1+2yi1Yi1R1(i1)=K(i1)+(1xi1)Xi+yi1Yi=2Ki+Xi1+2yi1Yi1R2(i1)=K(i1)+(1xi1)Xi+(1yi1)Yi=2Ki+Xi1+Yi1\begin{cases} R_0^{(i-1)} = K^{(i-1)}+ x_{i-1} X_i + y_{i-1} Y_i = 2K_i + 2x_{i-1} X_{i-1} + 2y_{i-1} Y_{i-1} \\ R_1^{(i-1)} = K^{(i-1)}+ (1 - x_{i-1}) X_i + y_{i-1} Y_i = 2K_i + X_{i-1} + 2y_{i-1} Y_{i-1} \\ R_2^{(i-1)} = K^{(i-1)}+ (1 - x_{i-1}) X_i + (1 - y_{i-1}) Y_i = 2K_i + X_{i-1} + Y_{i-1} \end{cases}

直接推导不容易看出一些关系,我们根据前面的思路,同样引入布尔差分倒数 r2ir_{2i}r2i+1r_{2i + 1}

  • r2i=0r_{2i}=0r2i+1=0r_{2i+1}=0:此时标架不发生翻转,即 (Xi1,Yi1)=(Xi,Yi)\left(X_{i-1},Y_{i-1}\right)=\left(X_{i},Y_{i}\right) ,且对应位也不发生翻转,即 (xi1,yi1)=(xi,yi)\left(x_{i-1},y_{i-1}\right)=\left(x_{i},y_{i}\right) 。代入目标公式有:
    • R0(i1)=2Ki+2xiXi+2yiYi=2(Ki+xiXi+yiYi)2R0(i)R_{0}^{(i-1)}=2K_{i}+2x_{i}X_{i}+2y_{i}Y_{i}=2\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)\leftarrow 2R_{0}^{(i)}
    • R1(i1)=2Ki+Xi+2yiYi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+yiYi)R0(i)+R1(i)R_{1}^{(i-1)}=2K_{i}+X_{i}+2y_{i}Y_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{1}^{(i)}
    • R2(i1)=2Ki+Xi+Yi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R0(i)+R2(i)R_{2}^{(i-1)}=2K_{i}+X_{i}+Y_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{2}^{(i)}
  • r2i=1r_{2i}=1r2i+1=0r_{2i+1}=0:此时标架不发生翻转,即 (Xi1,Yi1)=(Xi,Yi)\left(X_{i-1},Y_{i-1}\right)=\left(X_{i},Y_{i}\right) ,但主轴位发生翻转,即 (xi1,yi1)=(1xi,yi)\left(x_{i-1},y_{i-1}\right)=\left(1-x_{i},y_{i}\right) 。代入目标公式有:
    • R0(i1)=2Ki+2(1xi)Xi+2yiYi=2(Ki+(1xi)Xi+yiYi)2R1(i)R_{0}^{(i-1)}=2K_{i}+2\left(1-x_{i}\right)X_{i}+2y_{i}Y_{i}=2\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)\leftarrow 2R_{1}^{(i)}
    • R1(i1)=2Ki+Xi+2yiYi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+yiYi)R0(i)+R1(i)R_{1}^{(i-1)}=2K_{i}+X_{i}+2y_{i}Y_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{1}^{(i)}
    • R2(i1)=2Ki+Xi+Yi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R0(i)+R2(i)R_{2}^{(i-1)}=2K_{i}+X_{i}+Y_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{2}^{(i)}
  • r2i=0r_{2i}=0r2i+1=1r_{2i+1}=1:此时标架发生翻转,即 (Xi1,Yi1)=(Yi,Xi)\left(X_{i-1},Y_{i-1}\right)=\left(Y_{i},X_{i}\right) 。空间扭转导致新标架下的坐标系映射变为 (xi1,yi1)=(yi,1xi)\left(x_{i-1},y_{i-1}\right)=\left(y_{i},1-x_{i}\right) 。代入目标公式有:
    • R0(i1)=2Ki+2yiYi+2(1xi)Xi=2(Ki+(1xi)Xi+yiYi)2R1(i)R_{0}^{(i-1)}=2K_{i}+2y_{i}Y_{i}+2\left(1-x_{i}\right)X_{i}=2\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)\leftarrow 2R_{1}^{(i)}
    • R1(i1)=2Ki+Yi+2(1xi)Xi=(Ki+(1xi)Xi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R1(i)+R2(i)R_{1}^{(i-1)}=2K_{i}+Y_{i}+2\left(1-x_{i}\right)X_{i}=\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{1}^{(i)}+R_{2}^{(i)}
    • R2(i1)=2Ki+Yi+Xi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R0(i)+R2(i)R_{2}^{(i-1)}=2K_{i}+Y_{i}+X_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{2}^{(i)}
  • r2i=1r_{2i}=1r2i+1=1r_{2i+1}=1:此时标架发生翻转,即 (Xi1,Yi1)=(Yi,Xi)\left(X_{i-1},Y_{i-1}\right)=\left(Y_{i},X_{i}\right) 。标架扭转再加上沿轴步进,使得新标架下的坐标系映射变为 (xi1,yi1)=(1yi,1xi)\left(x_{i-1},y_{i-1}\right)=\left(1-y_{i},1-x_{i}\right) 。代入目标公式有:
    • R0(i1)=2Ki+2(1yi)Yi+2(1xi)Xi=2(Ki+(1xi)Xi+(1yi)Yi)2R2(i)R_{0}^{(i-1)}=2K_{i}+2\left(1-y_{i}\right)Y_{i}+2\left(1-x_{i}\right)X_{i}=2\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow 2R_{2}^{(i)}
    • R1(i1)=2Ki+Yi+2(1xi)Xi=(Ki+(1xi)Xi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R1(i)+R2(i)R_{1}^{(i-1)}=2K_{i}+Y_{i}+2\left(1-x_{i}\right)X_{i}=\left(K_{i}+\left(1-x_{i}\right)X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{1}^{(i)}+R_{2}^{(i)}
    • R2(i1)=2Ki+Yi+Xi=(Ki+xiXi+yiYi)+(Ki+(1xi)Xi+(1yi)Yi)R0(i)+R2(i)R_{2}^{(i-1)}=2K_{i}+Y_{i}+X_{i}=\left(K_{i}+x_{i}X_{i}+y_{i}Y_{i}\right)+\left(K_{i}+\left(1-x_{i}\right)X_{i}+\left(1-y_{i}\right)Y_{i}\right)\leftarrow R_{0}^{(i)}+R_{2}^{(i)}
xDBL->xADD->xADD 直观步进关系 R₀ R₁ R₂ xDBL xADD xADD 2R₀ R₀+R₁ 2R₁ R₀+R₂ R₁+R₂ 2R₂ 2R₀R₀+R₁R₀+R₂ 2R₁R₀+R₁R₀+R₂ 2R₁R₁+R₂R₀+R₂ 2R₂R₁+R₂R₀+R₂

虽然我们解决了主要状态寄存器的递进关系(递推关系),我们还需要进一步时刻保证底层 xADD 抓取到的常数点值绝对正确,我们必须在内存中维护一个完整的四差分常量池主轴差分 (D1,D2)(D_1, D_2) 以及对角线差分 (F1,F2)(F_1, F_2)。在初始状态下,它们分别对应 P,QP, QP+Q,PQP+Q, P-Q。(这是按照我们心情不妨设定的)

主轴差分 DD 的交换规律非常直观(r2i+1=1r_{2i+1} = 1X,YX, Y 互换)。
对于对角线差分 FF ,下面计算新一代目标状态的主对角线差值:

F1(i1)R2(i1)R0(i1)F_1^{(i-1)} \equiv R_2^{(i-1)} - R_0^{(i-1)}

代入展开式:

R2(i1)R0(i1)=(2Ki+Xi1+Yi1)(2Ki+2xi1Xi1+2yi1Yi1)R_2^{(i-1)} - R_0^{(i-1)} = (2K_i + X_{i-1} + Y_{i-1}) - (2K_i + 2x_{i-1}X_{i-1} + 2y_{i-1}Y_{i-1})

即得:

F1(i1)(12xi1)Xi1+(12yi1)Yi1F_1^{(i-1)} \equiv (1 - 2x_{i-1})X_{i-1} + (1 - 2y_{i-1})Y_{i-1}

注意到,因为 x,y{0,1}x, y \in \{0, 1\},所以系数 (12xi1)(1 - 2x_{i-1})(12yi1)(1 - 2y_{i-1}) 的取值严格限制在 {+1,1}\{+1, -1\}注意到对于单个点, AAA \equiv -A ,我们发现:

  • xi1=yi1x_{i-1} = y_{i-1}F1(i1)±(Xi1+Yi1)Xi1+Yi1=F1(i)F_1^{(i-1)} \equiv \pm(X_{i-1} + Y_{i-1}) \equiv X_{i-1} + Y_{i-1} = F_1^{(i)}
  • xi1yi1x_{i-1} \neq y_{i-1}F1(i1)±(Xi1Yi1)Xi1Yi1=F2(i)F_1^{(i-1)} \equiv \pm(X_{i-1} - Y_{i-1}) \equiv X_{i-1} - Y_{i-1} = F_2^{(i)}
    当且仅当 r2ir2i+1=1r_{2i} \oplus r_{2i+1} = 1 时,F1(i1)F_1^{(i-1)} 必须装载上一轮的 F2(i)F_2^{(i)}

至此我们就可以写出伪代码了:

\begin{algorithm}
\caption{Optimized Constant-Time 2-D Montgomery Ladder (3 Registers, 1D+2A)}
\begin{algorithmic}
\Require $k$-bit scalars $m, n$, Base points $P, Q$, and Difference $P-Q$
\Ensure Return point $R = [m]P + [n]Q$
\State \Comment{Phase 1: Scalar Recoding to Force Odd Parity}
\State $\epsilon_m \gets (m + 1) \pmod 2$
\State $\epsilon_n \gets (n + 1) \pmod 2$
\State $m' \gets m - \epsilon_m = \sum b'_{0,i}2^i$
\State $n' \gets n - \epsilon_n = \sum b'_{1,i}2^i$
\State $\sigma_0 \gets 0$
\State $\sigma_1 \gets 1$
\State \Comment{Phase 2: Point and Difference Initialization}
\State $(R_0, R_1) \gets (\mathcal{O}, P)$
\State $R_2 \gets \texttt{xADD}(P, Q, P-Q)$
\State $(D_1, D_2) \gets (P, Q)$
\State $(F_1, F_2) \gets (R_2, P-Q)$
\State \Comment{Phase 3: Main Evaluation Loop}
\For{$i = k-1$ \textbf{downto} $0$}
    \State $r_{2i} \gets b'_{\sigma_0, i} \oplus b'_{\sigma_0, i+1}$
    \State $r_{2i+1} \gets b'_{\sigma_1, i} \oplus b'_{\sigma_1, i+1}$
    \State $h \gets r_{2i} + r_{2i+1}$
    \State \Comment{Constant-Time Memory Splicing}
    \State $\texttt{CSWAP}(r_{2i+1}, D_1, D_2)$
    \State $\texttt{CSWAP}(h \pmod 2, F_1, F_2)$
    \State \Comment{Dynamic Hardware Routing}
    \State $idx\_dbl \gets h$
    \State $idx\_add \gets r_{2i+1}$
    \State \Comment{Execute 1D+2A Engine}
    \State $T_0 \gets \texttt{xDBL}(R_{idx\_dbl})$
    \State $T_1 \gets \texttt{xADD}(R_{idx\_add}, R_{idx\_add+1}, D_1)$
    \State $T_2 \gets \texttt{xADD}(R_0, R_2, F_1)$
    \State $(R_0, R_1, R_2) \gets (T_0, T_1, T_2)$
    \State \Comment{Update logical base pointers}
    \State $\texttt{CSWAP}(r_{2i+1}, \sigma_0, \sigma_1)$
\EndFor
\State \Comment{Phase 4: Zero-Cost Final Extraction}
\State $\omega \gets \epsilon_m + \epsilon_n$
\State \Return $R_{\omega}$
\end{algorithmic}
\end{algorithm}

而在 SQIsign 文档中的 LadderBiscalar 算法(Algorithm 8.8)与前述算法逻辑是完全一致的,但该代码笔者认为存在一定问题,修正后的伪代码如下:

\begin{algorithm}
\caption{LadderBiscalar($P, Q, P - Q, (A_{24} : C_{24}), m, n$)}
\begin{algorithmic}
\Require Projective points $P = (X_P : Z_P)$, $Q = (X_Q : Z_Q)$, $P - Q = (X_{P-Q} : Z_{P-Q})$, the Montgomery coefficient $(A_{24} : C_{24})$ of the curve $E$, and positive scalars $m, n$ with binary representations $m = (m_{k-1}, \ldots, m_0)_2$ and $n = (n_{k-1}, \ldots, n_0)_2$, resp.
\Ensure The projective point $[m]P + [n]Q = (X_{[m]P+[n]Q} : Z_{[m]P+[n]Q})$.
\State \textbf{Recoding stage:}
\If{$m$ is even \textbf{and} $n$ is odd}
    \State $\sigma = (\sigma_0, \sigma_1) \gets (1, 0)$
\Else
    \State $\sigma = (\sigma_0, \sigma_1) \gets (0, 1)$
\EndIf
\State $m' \gets m$, $n' \gets n$
\If{$m$ is even}
    \State $m' \gets m' - 1$
\EndIf
\If{$n$ is even}
    \State $n' \gets n' - 1$
\EndIf
\State Set $b = (b_0, b_1)$, where $b_0 \gets (0, m'_{k-1}, \ldots, m'_0)$ and $b_1 \gets (0, n'_{k-1}, \ldots, n'_0)$
\For{$i$ \textbf{from} $0$ \textbf{up to} $k - 1$}
    \State $r_{2i} \gets b_{\sigma_0, i} \oplus b_{\sigma_0, i+1}$
    \State $r_{2i+1} \gets b_{\sigma_1, i} \oplus b_{\sigma_1, i+1}$
    \If{$r_{2i+1} = 1$}
        \State $t \gets \sigma_0$
        \State $\sigma_0 \gets \sigma_1$
        \State $\sigma_1 \gets t$
    \EndIf
\EndFor
\State \textbf{Evaluation stage:}
\State $R_0 \gets (1 : 0)$
\State $T = (T_0, T_1) \gets (P, Q)$
\State $R_1 \gets T_{\sigma_0}$
\State $R_2 \gets T_{\sigma_0+1 \pmod 2}$
\State $D_1 \gets R_1$
\State $D_2 \gets R_2$
\State $R_2 \gets \textcolor{blue}{\mathsf{xADD}}(R_1, R_2, P - Q)$
\State $F_1 \gets R_2$
\State $F_2 \gets P - Q$
\For{$i$ \textbf{from} $k - 1$ \textbf{down to} $0$}
    \State $h \gets r_{2i} + r_{2i+1}$
    \State $T_0 \gets R_{h \pmod 2}$
    \State $T = (T_0, T_1) \gets (T_0, R_2)$
    \State $T_0 \gets \textcolor{blue}{\mathsf{xDBL}}(T_{\lfloor h/2 \rfloor}, (A_{24} : C_{24}))$
    \State $T_1 \gets R_{r_{2i+1}}$
    \State $T_2 \gets R_{r_{2i+1}+1}$
    \If{$r_{2i+1} = 1$}
        \State $\mathit{TMP} \gets D_1$
        \State $D_1 \gets D_2$
        \State $D_2 \gets \mathit{TMP}$
    \EndIf
    \State $T_1 \gets \textcolor{blue}{\mathsf{xADD}}(T_1, T_2, D_1)$
    \State $T_2 \gets \textcolor{blue}{\mathsf{xADD}}(R_0, R_2, F_1)$
    \If{$h \pmod 2 = 1$}
        \State $\mathit{TMP} \gets F_1$
        \State $F_1 \gets F_2$
        \State $F_2 \gets \mathit{TMP}$
    \EndIf
    \State $R_0 \gets T_0$
    \State $R_1 \gets T_1$
    \State $R_2 \gets T_2$
\EndFor
\State $(X_{[m]P+[n]Q} : Z_{[m]P+[n]Q}) \gets R_{\textcolor{red}{(m \pmod 2) + (n \pmod 2)}}$
\State \Return $[m]P + [n]Q = (X_{[m]P+[n]Q} : Z_{[m]P+[n]Q})$
\end{algorithmic}
\end{algorithm}

LadderBiscalar Algo Monitor

🖱️ Scroll to Zoom | Drag to Pan
$m$: $n$:
Trace Log

上一篇  下一篇