P5435 基于值域预处理的快速 GCD 题解

KobeBeanBryantCox · 2025-03-05 15:32:23 · 题解

P5435 基于值域预处理的快速 GCD 题解

题目传送门。

现有题解在讲什么啊？大部分没有把底层逻辑讲清楚，更有甚者还是卡常 AC。~~（当然，并不是指全部题解。）~~

本题解讲述所有常用的求 gcd 的方法（不常用的不讲），且有效率对比。

题意

求多次 gcd。

1. 朴素 gcd

前置知识：辗转相除法。

int gcd(int a,int b){return !b?a:gcd(b,a%b);}

复杂度 O(\log\min\{a,b\})，能拿 50 分，记录。

事实上，algorithm 库中的 __gcd() 也是采用的辗转相除法，其在 VSCode 中的 stl_algo.h 文件下的源码如下：

template<typename _EuclideanRingElement>
  _GLIBCXX20_CONSTEXPR
  _EuclideanRingElement
  __gcd(_EuclideanRingElement __m, _EuclideanRingElement __n)
  {
    while (__n != 0)
{
  _EuclideanRingElement __t = __m % __n;
  __m = __n;
  __n = __t;
}
    return __m;
  }

区别是其采用循环处理。也能拿 50 分，记录。

2. Binary GCD / Stein 算法

这两个名字是指同一个东西。

前置知识：基本位运算，辗转相减法。

定理：

\gcd(\dfrac a 2,b) & [a\bmod 2=0][b\bmod2=1]\\ \gcd(a,\dfrac b 2) & [a\bmod 2=1][b\bmod2=0]\\ 2\times\gcd(\dfrac a 2,\dfrac b 2) & [a\bmod 2=0][b\bmod2=0]\\ \gcd(|a-b|,\min(a,b)) & [a\bmod 2=1][b\bmod2=1] \end{matrix}\right.

考虑证明：

先证明第三个。事实上 \gcd(ka,kb)=k\times\gcd(a,b) 也是对的，考虑证明。
- 令 d=\gcd(a,b)，则 a=x_1d，b=x_2d，则有 \gcd(x_1,x_2)=1（否则 d 就不是最大的公因数了）。
- 则 \gcd(ka,kb)=\gcd(kdx_1,kdx_2)=kd=k\times \gcd(a,b)。第三个证毕。
证明前两个，这里讨论第一个，第二个同理。
- 令 d=\gcd(a,b) ，则 a=2k_1d，b=k_2d，所以 \gcd(2k_1,k_2)=1，则有 \gcd(k_1,k_2)=1。
- 所以 \gcd(dk_1,dk_2)=d，即 \gcd(\dfrac a 2,b)=d=\gcd(a,b)。
证明最后一个。事实上 a,b 都是奇数的限制是不需要的。
- 根据辗转相减法，若 a>b 有 \gcd(a,b)=\gcd(a-b,b)，否则 \gcd(a,b)=\gcd(b,a)=\gcd(b-a,a)。所以证毕。

然后注意到第三个肯定只能在刚开始的时候操作，证明显然：根据表达式的奇偶性，易得，但凡出现了“a,b 都是偶数”不成立，那么之后永远不可能成立。

所以我们在刚开始的时候除以足够多的 2，后面就不需要管第三个了。

于是我们得到了这样的代码：

int gcd(int a,int b)
{
    int x=__builtin_ctz(a),y=__builtin_ctz(b);
    b>>=y;int z=min(x,y);
    while(a)
    {
        a>>=x;int tmp=b-a;
        b=min(a,b),a=abs(tmp),x=__builtin_ctz(tmp);
    }
    return b<<z;
}

其中 __builtin_ctz 功能是找出数字二进制位末尾 0 的个数，理论时间复杂度是 O(\log n)，总的时间复杂度是 O(\log\min\{a,b\})。

值得关注的是，__builtin_ctz 的底层逻辑主要依赖于处理器的硬件指令，而不是用代码实现，相当于处理器在配置的时候已经内置了此功能，因此找不到其源码，但使用它可以充分地利用硬件特性提高计算效率。

这样子是能卡过的，由于数据太水，甚至跑得比正解还快，但是是可以卡的，具体看文章末尾。记录。

事实上，c++17 中新增的 gcd() 函数也是用 Binary GCD 实现的，以下是 VSCode 中 numeric 库中的 gcd() 源码：

template<typename _Mn, typename _Nn>
  constexpr common_type_t<_Mn, _Nn>
  gcd(_Mn __m, _Nn __n) noexcept
  {
    static_assert(is_integral_v<_Mn> && is_integral_v<_Nn>,
      "std::gcd arguments must be integers");
    static_assert(_Mn(2) == 2 && _Nn(2) == 2,
      "std::gcd arguments must not be bool");
    using _Ct = common_type_t<_Mn, _Nn>;
    const _Ct __m2 = __detail::__abs_r<_Ct>(__m);
    const _Ct __n2 = __detail::__abs_r<_Ct>(__n);
    return __detail::__gcd<make_unsigned_t<_Ct>>(__m2, __n2);
  }

可以发现它取绝对值后调用了 numeric 中的 __gcd()，再找到其源码：

template<typename _Tp>
  constexpr _Tp
  __gcd(_Tp __m, _Tp __n)
  {
    static_assert(is_unsigned<_Tp>::value, "type must be unsigned");

    if (__m == 0)
return __n;
    if (__n == 0)
return __m;

    const int __i = std::__countr_zero(__m);
    __m >>= __i;
    const int __j = std::__countr_zero(__n);
    __n >>= __j;
    const int __k = __i < __j ? __i : __j; // min(i, j)

    while (true)
{
  if (__m > __n)
    {
      _Tp __tmp = __m;
      __m = __n;
      __n = __tmp;
    }

  __n -= __m;

  if (__n == 0)
    return __m << __k;

  __n >>= std::__countr_zero(__n);
}
  }

可以发现它用的就是 Binary GCD。注意到有一个 std::__countr_zero() 函数，我们翻到源码，在文件 bit 下：

template<typename _Tp>
  constexpr int
  __countr_zero(_Tp __x) noexcept
  {
    using __gnu_cxx::__int_traits;
    constexpr auto _Nd = __int_traits<_Tp>::__digits;

    if (__x == 0)
      return _Nd;

    constexpr auto _Nd_ull = __int_traits<unsigned long long>::__digits;
    constexpr auto _Nd_ul = __int_traits<unsigned long>::__digits;
    constexpr auto _Nd_u = __int_traits<unsigned>::__digits;

    if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_u)
return __builtin_ctz(__x);
    else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ul)
return __builtin_ctzl(__x);
    else if _GLIBCXX17_CONSTEXPR (_Nd <= _Nd_ull)
return __builtin_ctzll(__x);
    else // (_Nd > _Nd_ull)
{
  static_assert(_Nd <= (2 * _Nd_ull),
    "Maximum supported integer size is 128-bit");

  constexpr auto __max_ull = __int_traits<unsigned long long>::__max;
  unsigned long long __low = __x & __max_ull;
  if (__low != 0)
    return __builtin_ctzll(__low);
  unsigned long long __high = __x >> _Nd_ull;
  return __builtin_ctzll(__high) + _Nd_ull;
}
  }

注意到它调用的竟然也是 __builtin_ctz 和其 long long 形式。

用这个只能拿 70 分，估计是因为多做了一些判断导致的。记录。

3. 基于值域 O(V) 预处理的快速 O(1) GCD

这才是正解！！！

前置知识：集合，指数函数，线性筛。

参考了这一篇题解，并对其补充说明，若表述相同请见谅，如有侵权请联系删除相关表述。

一些定义

定义所有质数集合为 \mathbb P。
定义集合 \{a,b,c\} 是 n 的分解，当且仅当 a\times b\times c=n（注意，这个集合只有三个元素，下同）。
定义集合 \{a,b,c\} 是 n 的合法分解，当且仅当集合是其分解且满足 a,b,c\le\sqrt n 或 a,b,c\in\mathbb P 二者之一（这里我们强行定义 1\in\mathbb P）。

原理

定理 1：

值域中的所有 x 均有其合法分解。

考虑证明：不妨设 a\le b\le c\le x。若 c\not\in\mathbb P 且 \sqrt x<c\le x，则 c 可以分解为 \{d,e\}（d\le e），此时显然有 d\le\sqrt x。而 a\times b=\dfrac x c\le\sqrt x，此时存在 x 的分解 \{d,a\times b,e\}，此时继续上述分解 e 直到 e\not\in\mathbb P 且 \sqrt x<e\le x 不成立，可以发现一定存在不成立的时刻。

定理 2：

将 x 分解为 \{a,b,c\}，令 a_0=\gcd(a,y),a_1=\gcd(b,\dfrac y{a_0}),a_2=\gcd(c,\dfrac y{a_0a_1})，则有 \gcd(x,y)=a_0\times a_1\times a_2。

考虑证明：前面证明过了 \gcd(ka,kb)=k\times\gcd(a,b)，即 \gcd(p,q)=r\times gcd(\dfrac p r,\dfrac q r)[r|p][r|q]，分别代入 \left\{\begin{matrix} p_1=a\times b\times c,q_1=y,r_1=\gcd(a,q1)\\ p_2=b\times c,q_2=\dfrac{q_1}{r_1},r_2=\gcd(b,q_2)\\ p_3=c,q_3=\dfrac{q_2}{r_2},r_3=\gcd(c,q_3) \end{matrix}\right. 即可证明。

实现

1. 分解

上结论：对于 x\ge 2，找到 x 的最小质因子 p 以及 \dfrac x p 的合法分解 \{a,b,c\}，不妨设 a\le b\le c，则 x 的合法分解为 \{a\times p,b,c\}。

考虑证明 a\times p\le \sqrt x 或 a\times p\in\mathbb P：

当 p\le x^{\frac 1 4}，由于 a\le b\le c，则有 a\le(\frac x p)^{\frac 1 3}，现在要求出 a\times p 的最大值。
- 等价于求函数 f(p)=p^{\frac 2 3}x^{\frac 1 3} 最大值。
- 所以取 p=x^{\frac 1 4} 能取到最大，最大为 \sqrt x。
考虑 x^{\frac 1 4}<p\le \sqrt x 的情况：

所以可以用线性筛递推出来。

代码：

// fac 表示一个分解，isnp 表示是不是合数，p 是质数集，pre 是预处理的 gcd，V 和 v 都是值域
void init(const int v=V)
{
  fac[1]={1,1,1};
  for(int i=2;i<=v;i++)
  {
    if(!isnp[i])fac[i]={i,1,1},p[tot++]=i;
    for(int j=0;j<tot&&i*p[j]<=v;j++)
    {
      int t=i*p[j];isnp[t]=true;
      fac[t]=fac[i],*min_element(fac[t].begin(),fac[t].end())*=p[j];
      if(i%p[j]==0)break;
    }
  }
  TT=sqrt(v);
  for(int i=1;i<=TT;i++)pre[0][i]=pre[i][0]=i;
  for(int i=1;i<=TT;i++)
    for(int j=1;j<=i;j++)pre[i][j]=pre[j][i]=pre[j][i%j];
}

2. 查询

不讲，直接看代码：

int g(int x,int y){int t=x%y;if(t==0)return y;return y>TT?1:pre[y][t];}
int gcd(int a,int b)
{
  if(a==1||b==1)return 1;
  int a0=g(a,fac[b][0]),a1=g(a/=a0,fac[b][1]),a2=g(a/=a1,fac[b][2]);
  return a0*a1*a2;
}

所以就做完了，总代码：

namespace GCD
{
    const int V=1e6,T=1e3;
    int pre[T+5][T+5];array<int,3>fac[V+5];
    bool isnp[V+5];int p[V+5],tot=0;
    int TT;
    void init(const int v=V)
    {
        fac[1]={1,1,1};
        for(int i=2;i<=v;i++)
        {
            if(!isnp[i])fac[i]={i,1,1},p[tot++]=i;
            for(int j=0;j<tot&&i*p[j]<=v;j++)
            {
                int t=i*p[j];isnp[t]=true;
                fac[t]=fac[i],*min_element(fac[t].begin(),fac[t].end())*=p[j];
                if(i%p[j]==0)break;
            }
        }
        TT=sqrt(v);
        for(int i=1;i<=TT;i++)pre[0][i]=pre[i][0]=i;
        for(int i=1;i<=TT;i++)
            for(int j=1;j<=i;j++)pre[i][j]=pre[j][i]=pre[j][i%j];
    }
    int g(int x,int y){int t=x%y;if(t==0)return y;return y>TT?1:pre[y][t];}
    int gcd(int a,int b)
    {
        if(a==1||b==1)return 1;
        int a0=g(a,fac[b][0]),a1=g(a/=a0,fac[b][1]),a2=g(a/=a1,fac[b][2]);
        return a0*a1*a2;
    }
}

注意不要全开 long long，本题中会 MLE。记录。

以下的代码均在 c++17 加入快读快输，且除了 gcd 的求法之外的其余部分相同的情况下测试。

方法	理论时间复杂度	实际效率
基于值域预处理的快速 GCD	O(V+n^2)	5.05s
手写 binary gcd（也叫 Stein 算法）	O(n^2\log V)	4.26s
c++17 中 numeric 的 `gcd()`	O(n^2\log V)	6.75s TLE 70pts
algorithm 中的 `__gcd()`	O(n^2\log V)	9.07s TLE 50pts
手写 gcd	O(n^2\log V)	9.14s TLE 50pts

关于 Binary GCD 比正解快，纯粹是因为数据太水了。

我来给个数据，实测能把 Binary GCD 卡到 1.4s+，而正解还是稳定的 600ms。

生成器：

#include<bits/stdc++.h>
#define Code using
#define by namespace
#define wjb std
Code by wjb;
int main()
{
    freopen("in.txt","w",stdout);
    srand(time(0));
    int n=5000;
    cout<<n<<"\n";
    for(int i=1;i<=n;i++)cout<<(1<<(rand()%10+10))+rand()%2-1<<" ";
    cout<<"\n";
    for(int i=1;i<=n;i++)cout<<(1<<(rand()%10+10))+rand()%2-1<<" ";
    cout<<"\n";
    return 0;
}

后记

~~可恶的 \sout{\LaTeX} 好难打啊啊啊！~~

若讲述有误，请私信或评论指出。