# Certificate Auditing Tools

As part of the Mercury network metadata capture and analysis project
(https://github.com/cisco/mercury), we include standalone tools for auditing
X.509 certificates.

* `batch_gcd` detects whether a set of RSA moduli contains any pairs that share
  a common factor. This could happen, for example, if some of the moduli were
  generated by the same weak entropy source.  The input to this tool is either
  a list of hex integers, or a file of PEM certificates from which the tool will
  extract RSA public key moduli.
* `cert_analyze` takes a file of certificates (either base64 lines
  or PEM) and converts them to a JSON encoding for easy processing with tools
  like Python or `jq`.
* `tls_scanner` scans HTTPS server(s) for certificates.  It can write a
  PEM-formatted file of certificates, and a CSV-formatted index file that
  contains the hostnames and the SHA1 hash of the corresponding certificates.
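A shared factor is fatal to RSA. If two moduli n1 = p·q1 and n2 = p·q2 share a prime p, then a single GCD computation recovers p, and dividing gives the cofactors q1 and q2. The following minimal Python illustration uses the toy moduli from the Quick Start example below. Real RSA primes are 1024 bits or larger, and `batch_gcd` scales this idea to very large sets of moduli.

```python
from math import gcd

# Toy moduli taken from the Quick Start example; both contain the prime 0x22d.
n1 = 0x29bf7  # 0x133 * 0x22d
n2 = 0x12a15  # 0x89  * 0x22d

p = gcd(n1, n2)             # the shared prime factor
q1, q2 = n1 // p, n2 // p   # the remaining cofactors
print(hex(p), hex(q1), hex(q2))
```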

Authors: Brandon Enright, Andrew Chi, David McGrew

## Contents

* [Installation](#installation)
* [Quick Start](#quick-start)
* [Capabilities and Limitations](#capabilities-and-limitations)
* [Example Workflow](#example-workflow)
* [Ethics](#ethics)
* [References](#references)

## Installation

The easiest way to try these tools is by pulling or building a Docker image of
the `mercury` package.  We also provide instructions for building the `mercury`
package from source.

### Docker

The Docker image for `mercury` also includes the tools `batch_gcd`,
`cert_analyze`, and `tls_scanner`.  Note that depending on your Docker
configuration, the docker command may require sudo.

**Option 1.** Pull a pre-built Docker image from the GitHub Container Registry (`ghcr.io`).
```
$ docker pull ghcr.io/cisco/mercury:latest
$ alias batch_gcd='docker run --rm -i --entrypoint /usr/local/bin/batch_gcd --volume .:/root ghcr.io/cisco/mercury:latest'
$ alias cert_analyze='docker run --rm -i --entrypoint /usr/local/bin/cert_analyze --volume .:/root ghcr.io/cisco/mercury:latest'
$ alias tls_scanner='docker run --rm -i --entrypoint /usr/local/bin/tls_scanner --volume .:/root ghcr.io/cisco/mercury:latest'
```

**Option 2.** Build the Docker image using the source code.
```
$ git clone https://github.com/cisco/mercury.git
$ cd mercury
$ docker build -t mercury:latest .
$ alias batch_gcd='docker run --rm -i --entrypoint /usr/local/bin/batch_gcd --volume .:/root mercury:latest'
$ alias cert_analyze='docker run --rm -i --entrypoint /usr/local/bin/cert_analyze --volume .:/root mercury:latest'
$ alias tls_scanner='docker run --rm -i --entrypoint /usr/local/bin/tls_scanner --volume .:/root mercury:latest'
```

For usage information, run each command with `--help`.  For example: `batch_gcd
--help`.

### Build from source

A modern Linux distribution is required in order to build `mercury` from source.
```
$ git clone https://github.com/cisco/mercury.git
```
Not all certificate tools (e.g., `batch_gcd`) are built by default, because they
have extra dependencies such as the GNU Multiple Precision Arithmetic Library
(GMP). To build them manually, run the following commands from the top-level
directory of the `mercury` source repository.
```
$ ./configure
$ make
$ make tls_scanner --dir=src
$ make batch_gcd --dir=src
$ make batch_gcd_test --dir=test   # run tests (optional)
```
The executables will be built within the `src` directory, e.g., `src/batch_gcd`.

The exact set of dependencies will vary according to Linux distribution. On
Debian 12, the following dependencies are sufficient for building the `mercury`
package.
```
$ sudo apt-get update
$ sudo apt-get install -y git build-essential \
      pkg-config wget jq tcpreplay iproute2 sudo debconf \
      zlib1g-dev libssl-dev libgmp-dev
```

## Quick Start

The `batch_gcd` command line tool takes a set of integers as input,
efficiently computes the greatest common divisor (GCD) of all pairs of
integers, and reports the non-trivial common factors.  By default, integers
are read from standard input, one hex integer per line.  Alternatively, the
`--cert-file` option reads RSA moduli from a single file of certificates in
PEM (RFC 7468) format.  Non-trivial factorizations are written to stdout,
followed by the groups of line numbers whose moduli share a common factor.
If the `--write-keys` option is provided, the RSA private key of each factored
modulus is written to a file in the "openssl" RSA PRIVATE KEY format (a PEM
encoding of the RSAPrivateKey structure defined in RFC 3447).  Each private
key is written to a separate file whose name has the same base as the input
file, contains the line number of the public key in the input, and ends with
the suffix `.rsapriv.pem` (e.g., `combined-line-1.rsapriv.pem`).  The
corresponding certificate is also written to a file with the same base name
and the suffix `.cert.pem`.  Informational messages are written to stderr.
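Once p and q are known, every integer field of the RSAPrivateKey structure (RFC 3447, Appendix A.1.2) follows by modular arithmetic, which is why a factored modulus is equivalent to a leaked private key. The Python sketch below is illustrative only and is not the code `batch_gcd` uses. It assumes the public exponent is known (65537 is a hypothetical default) and computes d modulo the Carmichael function lcm(p-1, q-1). RFC 3447 also permits d modulo (p-1)(q-1), and this README does not say which one the tool uses.

```python
from math import gcd

def rsa_private_fields(p: int, q: int, e: int = 65537) -> dict:
    """Compute the integer fields of an RFC 3447 RSAPrivateKey from primes p, q.
    Sketch only: assumes p != q are prime and gcd(e, lcm(p-1, q-1)) == 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # Carmichael lambda(n)
    d = pow(e, -1, lam)                           # modular inverse (Python 3.8+)
    return {
        "modulus": n,
        "publicExponent": e,
        "privateExponent": d,
        "prime1": p,
        "prime2": q,
        "exponent1": d % (p - 1),      # d mod (p-1), used for CRT decryption
        "exponent2": d % (q - 1),      # d mod (q-1)
        "coefficient": pow(q, -1, p),  # q^-1 mod p
    }

# Toy factors of 0x29bf7 from the example below; a round trip must recover m.
key = rsa_private_fields(0x22d, 0x133)
m = 42
c = pow(m, key["publicExponent"], key["modulus"])
assert pow(c, key["privateExponent"], key["modulus"]) == m
```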

Using the standard input to factor integers looks like this:
```
$ echo -e "29bf7\n12a15\n17b6d" | batch_gcd
Running batch GCD on 3 moduli.
Parallelization: 12 threads
Found 3 weak moduli out of 3.
Computing pairwise GCD for 1 moduli, estimated time: O(3 * 1)
Further found co-factors for 1 weak moduli.
Vulnerable modulus on line 1: 29bf7 has factors 133 and 22d
Vulnerable modulus on line 2: 12a15 has factors 89 and 22d
Vulnerable modulus on line 3: 17b6d has factors 89 and 2c5
Reporting which lines, if any, share a common factor.
2,3;1,2
$ echo -e "29bf7\n12a15\n17b6d" | batch_gcd 2> /dev/null
Vulnerable modulus on line 1: 29bf7 has factors 133 and 22d
Vulnerable modulus on line 2: 12a15 has factors 89 and 22d
Vulnerable modulus on line 3: 17b6d has factors 89 and 2c5
2,3;1,2
```
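The final stdout line lists groups of input line numbers. Groups are separated by `;`, and line numbers within a group are separated by `,`. In the examples in this README, each group appears to be the set of lines that share one particular prime, so a line can appear in several groups. The following hypothetical Python helper parses that line. The format is inferred from the examples shown here, and an empty report is assumed to mean "no groups".

```python
def parse_groups(report: str) -> list[list[int]]:
    """Parse a report such as "2,3;1,2" into groups of 1-based line numbers."""
    report = report.strip()
    if not report:
        return []
    return [[int(x) for x in group.split(",")] for group in report.split(";")]

groups = parse_groups("2,3;1,2")
# Every input line that shares at least one factor with another line.
affected = sorted({line for group in groups for line in group})
```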
Using the `--cert-file` and `--write-keys` options looks like this:
```
$ batch_gcd --cert-file combined.pem --write-keys
certs: 1623   duplicates: 1225   RAM needed: 101888
running batch GCD on 398 moduli, ignoring 1228 duplicate lines.
Parallelization: 6 threads
Found 385 weak moduli out of 398.
Computing pairwise GCD for 0 moduli, estimated time: O(385 * 0)
Further found co-factors for 0 weak moduli.
wrote out 385 RSA PRIVATE KEY (.rsapriv.pem) and CERTIFICATE (.cert.pem) files
Reporting which lines, if any, share a common factor.
757,1260,1430;342,527,622,832,982;122,621;323,671,1029,1452;137,631,682,1466;1258,1356;237,1141;178,1024;308,851;20,351;667,772;117,1479;375,1533;64,321,329;700,830;185,298;547,699,1257,1323;486,882;493,630;396,997;187,1301;541,665,1002,1357;674,811,833,1252,1342,1453;32,1127;42,109,250,377,432,544,628,691,925,995,1034,1076,1087,1172,1284,1527,1620;5,22;215,1035;211,529,1401;12,101,128,160,184,216,313,341,385,404,406,429,495,496,504,528,552,731,765,814,835,996,1005,1041,1078,1122,1133,1266,1281,1302,1355,1573,1622,1623;333,677;47,258,679,689;390,462;172,698;306,813;834,1592;247,1570;21,732;34,392,394,756,764;168,680,962,1299,1473;376,666;43,629,695,1616;742,743;373,1283;455,865;1305,1470,1556;108,257,411,915;862,1023;196,730,981,1130;7,166,303,1313;301,1052;98,124,212,292,370,412,464,467,693,734,823,864,923,1030,1471,1588,1611;176,1511;571,806,839;302,532,625,1613;686,1439;602,1597;928,1055;569,924;11,23,36,41,44,65,92,123,136,157,213,218,219,220,221,249,251,255,256,259,309,336,337,344,354,380,395,420,424,454,540,609,733,740,963,967,972,1007,1057,1060,1062,1121,1138,1145,1274,1276,1287,1297,1304,1314,1315,1343,1367,1477,1504,1509,1510,1568,1571,1586,1618;884,1557;926,1335;1,601;436,1312;933,1478;186,847;164,1363;741,970;335,352;1108,1364;175,502;305,355,419,927,975;886,1390;378,934,1558;1285,1289;1263,1286;503,1566;48,91,147,181,287,320,701,767,775,917,1051;195,1282;1554,1591;3,1400;159,173;453,521;460,604;1468,1480,1593;452,692,1140;410,976,1251,1614;397,437;288,837,868,1167;403,458,567,607,626,922,998,1175,1612;86,820;174,632,1450;763,777;356,463,739,773,805,1077,1288;687,761,1044,1454;530,929;345,1273

$ ls -v combined*pem
combined-line-1.cert.pem
combined-line-1.rsapriv.pem
combined-line-3.cert.pem
combined-line-3.rsapriv.pem
combined-line-5.cert.pem
combined-line-5.rsapriv.pem
combined-line-7.cert.pem
combined-line-7.rsapriv.pem
combined-line-11.cert.pem
...

```

## Capabilities and Limitations

This tool is designed for computing batch GCD on a very large set of integers,
using the product tree approach developed by [Daniel J. Bernstein
[1]](https://cr.yp.to/papers.html#smoothparts) and notably implemented by [Heninger
et al.
[2]](https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/heninger)
to analyze Internet scans of TLS certificates. Currently, `batch_gcd` can handle
up to roughly 67 million 2048-bit RSA moduli on a machine with about 1 TB of
RAM, and it automatically parallelizes the computation
across all CPU cores.  The current input size limit is due to the fact that the
batch GCD algorithm multiplies all of the input together into one large integer,
and the largest single integer that GMP can represent is 2^37 bits, or about 16
GB (details
[here](https://gmplib.org/list-archives/gmp-bugs/2009-July/001538.html) and
[here](https://gmplib.org/gmp6.0)).  If the input size exceeds this limit, the
tool aborts with an error message.
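The product-tree approach can be sketched in a few lines of Python. This is an illustrative sketch, not the C++/GMP code in `batch_gcd`. It multiplies all moduli into a product tree and pushes the total product P back down a remainder tree to obtain P mod n_i² for each modulus. It then computes gcd((P mod n_i²) / n_i, n_i), which equals gcd(n_i, P / n_i). Because P is a single integer, the sketch also shows where the size limit comes from: 2^37 bits divided by 2^11 bits per 2048-bit modulus gives 2^26 = 67,108,864 moduli.

```python
from math import gcd, prod

def product_tree(nums):
    """Level 0 holds the inputs; each higher level multiplies adjacent pairs."""
    tree = [list(nums)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree

def batch_gcd(moduli):
    """Return gcd(n_i, product of all other moduli) for each n_i.
    Assumes distinct moduli: a duplicate would make its result n_i itself."""
    tree = product_tree(moduli)
    rems = tree[-1]  # the root, i.e. [P]
    # Walk down: since a child divides its parent, reducing the parent's
    # remainder modulo child^2 yields P mod child^2.
    for level in reversed(tree[:-1]):
        rems = [rems[i // 2] % (n * n) for i, n in enumerate(level)]
    # P mod n^2 == n * ((P / n) mod n), so r // n is exactly (P / n) mod n.
    return [gcd(r // n, n) for r, n in zip(rems, moduli)]
```

On the Quick Start moduli, `batch_gcd([0x29bf7, 0x12a15, 0x17b6d])` returns values equal to `[0x22d, 0x12a15, 0x89]`. The middle result is the whole modulus because both of its primes appear in other moduli. Separating those two primes needs an extra pairwise step, which appears to be what the `Computing pairwise GCD` message in the tool's output refers to.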

The tool will detect and report duplicate RSA moduli, but for the purposes of
batch GCD computation and reporting of common factors, it will only use the
first modulus in the set of duplicates.  It will also detect and report
pathological cases where one modulus divides another.
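Deduplication matters for a product-tree batch GCD. A modulus that appears twice divides the product of the others, so its GCD is the whole modulus and reveals no factor. The following hypothetical Python helper mirrors the behavior described above: it keeps the first occurrence of each modulus and records which later lines were duplicates. Line numbers are preserved because reports refer to input lines. This is not `batch_gcd`'s actual code.

```python
def dedupe_first(moduli):
    """Return (unique, duplicates): unique is a list of (line, modulus) for the
    first occurrence of each modulus; duplicates maps each repeated line
    number to the line of its first occurrence (1-based)."""
    first_line = {}
    unique, duplicates = [], {}
    for line, n in enumerate(moduli, start=1):
        if n in first_line:
            duplicates[line] = first_line[n]
        else:
            first_line[n] = line
            unique.append((line, n))
    return unique, duplicates
```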

Usage:
```
$ batch_gcd --help
/usr/local/bin/batch_gcd usage:

    batch_gcd [OPTIONS]

This tool efficiently runs all-to-all GCD on RSA moduli.  If no arguments
are provided, read moduli from stdin.  The input must be exactly one hex
integer per line.  No blank lines or other characters are permitted.

Alternatively, the options below can be used to read a file of PEM-encoded
certificates instead of the raw moduli.

Informational progress messages are written to stderr.

OPTIONS:
   --cert-file <arg>   read certificates from file <arg>
   --write-keys        write out private keys to PEM file
   --help              print this help message and exit
```
When the tools are run through the Docker aliases described in the
Installation section, the host's current working directory (`.`) is mounted as
the container's current working directory (`/root`).  This gives the tools
read and write access to files in that directory on the host, such as a
certificate file (`*.pem`).

## Example Workflow

To check for weak keys in a set of certificates, use a pipeline of the tools
`cert_analyze`, [`jq`](https://stedolan.github.io/jq/), and `batch_gcd`.

1. Input: a file containing base64-encoded X.509 certificates, one per line.
2. `cert_analyze` converts the X.509 certificates to a JSON encoding for easy
   processing.
3. `jq` extracts the RSA modulus from the JSON.
4. `batch_gcd` computes all-pairs GCD and reports any factorizations (by line
   number and modulus).

Here is an example pipeline.

```
$ cat certs.b64 |
    cert_analyze |
    jq -rc '.subject_public_key_info.subject_public_key.modulus' |
    batch_gcd
```
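If `jq` is unavailable, the extraction step can be done in Python instead. The following sketch assumes that `cert_analyze` emits one JSON object per line, which is consistent with the `jq -c` usage above but not confirmed here. It reads the same key path as the `jq` filter. One design difference: `jq -r` prints `null` for certificates without an RSA modulus, and such a line is not a hex integer, so the sketch skips them. Skipping shifts line numbers, so the sketch also returns an index that maps each output line back to its input record.

```python
import json

def extract_moduli(json_lines):
    """Return (moduli, index): hex modulus strings for batch_gcd, and
    index[k] = 1-based input line that produced output line k + 1."""
    moduli, index = [], []
    for record_no, line in enumerate(json_lines, start=1):
        if not line.strip():
            continue
        cert = json.loads(line)
        spki = cert.get("subject_public_key_info") or {}
        key = spki.get("subject_public_key") or {}
        modulus = key.get("modulus") if isinstance(key, dict) else None
        if modulus:  # non-RSA keys have no modulus; skip instead of "null"
            moduli.append(modulus)
            index.append(record_no)
    return moduli, index
```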

Here is example output in which 12 of the 13 input moduli share a common factor
with at least one other modulus.

```
Running batch GCD on 13 moduli.
Parallelization: 12 threads
Found 12 weak moduli out of 13.
Computing pairwise GCD for 10 moduli, estimated time: O(12 * 10)
Further found co-factors for 10 weak moduli.
Vulnerable modulus on line 1: 1e7e8ad4e9a9e9220606abe8c000eda0f107ceee2589670d92cee58f36b243092a651736cbff727dbb0ffdc15fb93726ed322db315666aaec8d718d75ff3bf6ebcad3bd79b023dac2005a8f959ccdb889c13b89661e6e2f2a92d3114748bebbb98b9b9d620ebd3686d3a152ca656631891cad253276d961a5f78401a35791fa9 has factors 38326f703d6d55184940485e16aeee14778b4cf36ebe05863863c4423e10a0f30d517b4b082cb3651e1cee7ff12c1f985d94e89ef3fba74a9314e05b5d153419 and 8ae9f0c710ed2a2c8885cad9f5757b8fb27cc95b7b89bf33ddce184822c1376cf99527e2862042dbb66313f44c4c47b6c0259e16f63f000194c4d5bbe3bb3a11
Vulnerable modulus on line 2: 1e3d8b322db4c6e0ad74b09278746658fffed7b6e2bf925ed07ac29e8bcd9be7addcdba1f0f8f744658c30abc53c07fdc29a1ab62ec95e619a9acca7d74e78f7fe7ef6dd51332c5df3415476a3a7bb1612094c81bc59c176e60a1807b66ab217061ec8e9b03da639e58f466d3b6e975ff6ff03387f6af46390a430fb486d921d has factors 38326f703d6d55184940485e16aeee14778b4cf36ebe05863863c4423e10a0f30d517b4b082cb3651e1cee7ff12c1f985d94e89ef3fba74a9314e05b5d153419 and 89c1d88b238c8a21eed66b977a8605cb72e6b69ac0cd007d63c41000e090f5c0b93aff6398269ea8fb3beb90fe785fef279d80f1fb04ad2d07cf22a87e6aaea5
Vulnerable modulus on line 4: 465b8136e852ce4f7b9b0ac7288a28e48d8c550168f1f7f2fefe50ba9e06a48b716a7db5e3226091d4d30fc437048ac881e9af0cc49e2246bbc5c903117adee00b670705004d69f1b674729d9241c79216a391c9790c1638369ce4686b01d2680baa6f9feb56ed47c7eed0f237f3997d674d7a97a193bc9a07a26b649c36d605 has factors 82bf88d173860b5084516f6f9be4c7e88920956d08b678b373190760b3bada34959d7bc631d332110c8b593022fb714dbfe64a8bcf599ebe57743c1a30d54be1 and 89c1d88b238c8a21eed66b977a8605cb72e6b69ac0cd007d63c41000e090f5c0b93aff6398269ea8fb3beb90fe785fef279d80f1fb04ad2d07cf22a87e6aaea5
Vulnerable modulus on line 5: 46f2bb0daab537a117a2e660f2f6febbf4040a15a23442f95c2f3ab69d586ca24802cb441308b8349dd4a82a018e0e61eb0d58f71050969ada441fb9825ccceb046bced8ad06e641fb802c315cbec4b58905e7fb5562bda93418bd62521dd01dc26c78bfd384e52aade80ff43fb06830cf82a4dfa2e74e7794f95d3f81b603f1 has factors 82bf88d173860b5084516f6f9be4c7e88920956d08b678b373190760b3bada34959d7bc631d332110c8b593022fb714dbfe64a8bcf599ebe57743c1a30d54be1 and 8ae9f0c710ed2a2c8885cad9f5757b8fb27cc95b7b89bf33ddce184822c1376cf99527e2862042dbb66313f44c4c47b6c0259e16f63f000194c4d5bbe3bb3a11
Vulnerable modulus on line 6: 811a9746b424a1c0ef0ceee0a31c3f3ddda5e5e5c770cd9a4bc18afc6951dc4513cc495fffbd0d5f37af5893f632960dd399183f6424ca396137d0b6475fcbd9bc12539af78f7275f77e4876fb049625abf9be0acad137a7b61ae42d0e872d8f6d2711702dbbb2b4c03982b971d6d05ee35d5353554d330cc3534929a624d8ef has factors 924f2f80458a840d2506995213e3b35a920334c68ee54936027cf03babadfe1e5ad0d6e1c78b8be50801e96f984d3224de1aa1cd318a9fe304f03e3e6ded3c73 and e1e533f46ed52aa7891d4cb8c543c7df5612841101cb3ef4fdf3501c98a64023968dc93c3fcbe8a5038bc3ffd9358dbf44b5a4e77e57d3e31d3824f513fc2e95
Vulnerable modulus on line 7: 265efca5d4be3a3a32e40d1849859ae399d7e50dcefc5ed45ac1b0afe9bde70ae998080854b602dd64688d41463b7f367b2ef07ae299ba376973ee558edd16a4e2dd2eeed798d910a13b974e6312b72807ee8e499db0ddbae0d52fd20a8669d6ba4fef474683b1f42f99dae0e9523de25d4d29f3139fddfa871c5246495a5441 has factors 43237429ca9deb90d17025d613995dfbf77f00ba2bd8b23fef9d97a535cb3ae60436fd5209170aa1ef8c8a4eb6b9c7405899494ac825f94fb4fd873776e7537b and 924f2f80458a840d2506995213e3b35a920334c68ee54936027cf03babadfe1e5ad0d6e1c78b8be50801e96f984d3224de1aa1cd318a9fe304f03e3e6ded3c73
Vulnerable modulus on line 8: 1244c3e99d517d452502fe41281e4217433f61c8aae94e02d1c92a437d6aaafce28054cabbb94108dbd19cff196fbd593b27b0c1161219b5d12e87a09fe1e2fb61ec36ae682236b871bb3f3ea193de205d3ad60b876fa0cb367ce91fabcbfbff4cd88691ac870c9d915dd731def87f37735de81155183a604e3ae38fb877294f has factors 41510eb2b4cffbc472aeca534df8f8e8c5aa0f553cde1e025a75610f10ae9db595b78f830ec239f9bfd2c2d00ae6508b4ddff8b02ae1ed7d896ec60805ed446b and 4799f8749293e39ee6af86764a84dae2e09eeb0ce218e21ad9638eeb9d2e2600211411a6509ec47fe8900adcfef49a037d560c8e2c2ed4227d4a2cda870307ad
Vulnerable modulus on line 9: 13790c6a303529cdf722e7b11a3ce8492d8d374edd0cdf9b1459617a6523c88a7d4caed55ea00d3e3afb530583a090ed5a5ba73de9ad9f6dbf9bfb09cff7fac4cddef7aa1af30dcb2d4345c3c81211e9dde9e97735f9c81f2979bff19ceac8a5f33d5f127ad47ecae4f49d911c121e04364d984663df3d276f797532a635b4d1 has factors 459f46db5baf96f8404aaf1a831f0db8aad7d93a4256e8156b2b70757a011d800c39962486ce3ba88788716e9ac581c37c140116d12e2e9abd56262a1a255635 and 4799f8749293e39ee6af86764a84dae2e09eeb0ce218e21ad9638eeb9d2e2600211411a6509ec47fe8900adcfef49a037d560c8e2c2ed4227d4a2cda870307ad
Vulnerable modulus on line 10: 3433499cb03fc8d47173942631c509ee48ac8f66c1470b2f44dce418d1a1dfba12c51f6494c707a2f65ab781fd4745752242bb276484fe90130990004fe241215e7bd0429d7f153bd706296e0672dd22ef622bdd01aacb7bd4a9d3844dbd2226214fcc3084514e943de95767ece05fae635cfae43ddb5314d12c1058e4cf8fdd has factors 4799f8749293e39ee6af86764a84dae2e09eeb0ce218e21ad9638eeb9d2e2600211411a6509ec47fe8900adcfef49a037d560c8e2c2ed4227d4a2cda870307ad and baa262e43120e54665c0cbcb3ead51b0a00819b4e91d4872d3fbe7724e03ab71bcaf8bb84c74dec96aadfcb18653a8f05393d666069923bc7b9b31363d6d6ef1
Vulnerable modulus on line 11: 11c37c626d7b698246aa3d0715433fdf70e3605bf9ba9527c75e7b49d2b0e1442d8a3ecd7861fceb055e5a7e335a308996c5a01f5bc30408c6ef11a63c808eb362b6d092299ed8d5df7adc8931f11c2312b88773398877a592503c3bd4c7f779f9f4e86f1e19c3b2fbfe1b574af6a7d08e33ffa4238cae10463fc172b0921c27 has factors 41510eb2b4cffbc472aeca534df8f8e8c5aa0f553cde1e025a75610f10ae9db595b78f830ec239f9bfd2c2d00ae6508b4ddff8b02ae1ed7d896ec60805ed446b and 459f46db5baf96f8404aaf1a831f0db8aad7d93a4256e8156b2b70757a011d800c39962486ce3ba88788716e9ac581c37c140116d12e2e9abd56262a1a255635
Vulnerable modulus on line 12: 2f9e533464cff15c387975dbd55f90e7e5fa0289b11eaebaac8c2b34862750c0d63f96153c121b3920348d5f333d32392c62e770f22644ec243e164149f324d8f471641349aed208e80ec4375197c97b6f439cf062768b24f712bc926fc5cefdaff8fe74e4fa51bc7a3c6eda9c4781351b9dcccd9e753285b521d8ff285262bb has factors 41510eb2b4cffbc472aeca534df8f8e8c5aa0f553cde1e025a75610f10ae9db595b78f830ec239f9bfd2c2d00ae6508b4ddff8b02ae1ed7d896ec60805ed446b and baa262e43120e54665c0cbcb3ead51b0a00819b4e91d4872d3fbe7724e03ab71bcaf8bb84c74dec96aadfcb18653a8f05393d666069923bc7b9b31363d6d6ef1
Vulnerable modulus on line 13: 32c1e32b3fc51c183f9060dfb63733966cc2649211fda1a567c38b42fad5b8bbc169c0134de8fde881b12ffdf89a35c4741e89749dceee1a446897353d932677466eaf0123550abd16418a5e1d6a7e33443a7eedbc100afa70edb736f1353a3d9a26bd9cbfa4119a384098952915def0c3d0507450e9f0a2dd1f607cfdc1ede5 has factors 459f46db5baf96f8404aaf1a831f0db8aad7d93a4256e8156b2b70757a011d800c39962486ce3ba88788716e9ac581c37c140116d12e2e9abd56262a1a255635 and baa262e43120e54665c0cbcb3ead51b0a00819b4e91d4872d3fbe7724e03ab71bcaf8bb84c74dec96aadfcb18653a8f05393d666069923bc7b9b31363d6d6ef1
Reporting which lines, if any, share a common factor.
1,2;8,11,12;9,11,13;8,9,10;4,5;2,4;1,5;6,7;10,12,13
```
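The stdout report gives n, p, and q in hex, so each factorization can be checked independently before acting on it. The following small Python sketch assumes the message format stays exactly as shown in this README. It marks a line as verified only if p·q = n and both factors are non-trivial.

```python
import re

LINE_RE = re.compile(
    r"Vulnerable modulus on line (\d+): ([0-9a-fA-F]+) "
    r"has factors ([0-9a-fA-F]+) and ([0-9a-fA-F]+)"
)

def check_factorizations(output: str) -> dict:
    """Map each reported line number to True if its factors multiply to n."""
    results = {}
    for m in LINE_RE.finditer(output):
        line = int(m.group(1))
        n, p, q = (int(g, 16) for g in m.group(2, 3, 4))
        results[line] = p * q == n and 1 < p < n and 1 < q < n
    return results
```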

More test data can be found in the `test/batch_gcd` directory of the `mercury`
[git repository](https://github.com/cisco/mercury/tree/main/test/batch_gcd).
For example, analyzing the 100 certificates in `test_cert_100.pem.bgcd-in`
factors 4 of the RSA keys.  The `batch_gcd` tool can write out the vulnerable
certificates, along with their corresponding RSA private key files.
```
$ cp test/batch_gcd/test_cert_100.pem.bgcd-in certs100.pem
$ batch_gcd --cert-file certs100.pem --write-keys
certs: 100   duplicates: 0   RAM needed: 12800
running batch GCD on 100 moduli.
Parallelization: 6 threads
Found 4 weak moduli out of 100.
Computing pairwise GCD for 0 moduli, estimated time: O(4 * 0)
Further found co-factors for 0 weak moduli.
wrote out 4 RSA PRIVATE KEY (.rsapriv.pem) and CERTIFICATE (.cert.pem) files
Reporting which lines, if any, share a common factor.
77,99;17,26
$ ls *.pem
certs100-line-17.cert.pem    certs100-line-26.cert.pem    certs100-line-77.cert.pem    certs100-line-99.cert.pem    certs100.pem
certs100-line-17.rsapriv.pem certs100-line-26.rsapriv.pem certs100-line-77.rsapriv.pem certs100-line-99.rsapriv.pem
```

## Ethics

This tool is intended for defensive security research and forensics. Researchers, administrators, penetration testers, and security operations teams can use it to protect networks, detect vulnerabilities, and benefit the broader community through improved awareness and defensive posture. As with any security tool, it could potentially be misused. **Do not use this tool for any malicious purpose.**

## References

1. Bernstein, D.J.  "How to find the smooth parts of integers." 2004.
   https://cr.yp.to/papers.html#smoothparts

2. Heninger, Nadia, Zakir Durumeric, Eric Wustrow, and J. Alex Halderman.
   "Mining your Ps and Qs: Detection of widespread weak keys in network
   devices." 21st USENIX Security Symposium (USENIX Security 12), pp. 205-220.
   2012.
   https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/heninger
