EHAX CTF 2026 - Quantum Message (Forensics)

Challenge Overview

Category: Forensics
Files provided: challenge.wav
Hint: “he types very fast like 20ms”

Objective: Recover the hidden message or flag stored inside the provided audio file.

TL;DR: The 82-second audio file carries no plaintext or payload in its metadata. A spectrogram shows tonal blocks of about 1 second each, separated by ~20 ms gaps. A periodogram of each block reveals two dominant frequencies, one “low” and one “high”, which together act like a custom DTMF keypad. Decoding the frequency pairs yields an 80-digit string, which parses as concatenated decimal ASCII codes that spell the flag.

1. Initial Triage

The first step is to check the file's audio format and metadata:

file challenge.wav
exiftool challenge.wav

Observations:

WAV format, mono stream, 44.1 kHz, float32 encoding.
Duration is approximately 82 seconds.
No obvious metadata flag inside the headers.
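
The fields that file reports come from the WAV header, and they can also be read straight from the RIFF chunks. The sketch below is an illustration rather than part of the original solve: describe_wav and make_demo_wav are hypothetical helper names, and the demo parses a silent float32 WAV built in memory instead of challenge.wav. It relies on the WAV format tags 1 (integer PCM) and 3 (IEEE float).

```python
import struct

# Format tags from the WAV format: 1 = integer PCM, 3 = IEEE float.
FORMAT_NAMES = {1: "PCM", 3: "IEEE float"}


def describe_wav(blob: bytes) -> dict:
    """Walk the RIFF chunks of an in-memory WAV file and summarise fmt/data."""
    if blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info = {}
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        body = pos + 8
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, block_align, bits = struct.unpack_from(
                "<HHIIHH", blob, body
            )
            info.update(
                format=FORMAT_NAMES.get(tag, f"tag {tag}"),
                channels=channels,
                sample_rate=rate,
                bits=bits,
                block_align=block_align,
            )
        elif chunk_id == b"data":
            info["data_bytes"] = size
        pos = body + size + (size & 1)  # chunks are padded to an even length
    if "block_align" in info and "data_bytes" in info:
        frames = info["data_bytes"] // info["block_align"]
        info["duration_s"] = frames / info["sample_rate"]
    return info


def make_demo_wav(seconds: float = 0.5, rate: int = 44100) -> bytes:
    """Build a silent mono float32 WAV in memory (hypothetical stand-in file)."""
    data = bytes(4 * int(seconds * rate))  # 4 bytes per float32 sample
    fmt = struct.pack("<HHIIHH", 3, 1, rate, rate * 4, 4, 32)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", len(data)) + data
    return b"RIFF" + struct.pack("<I", len(body)) + body


print(describe_wav(make_demo_wav()))
```

To inspect the real file, pass its bytes instead, for example describe_wav(open("challenge.wav", "rb").read()).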

Next, a quick check for common “easy wins” embedded in the file data:

strings challenge.wav
binwalk challenge.wav

Neither tool turns up anything useful: no embedded plaintext and no appended or embedded files.

2. Signal-Level Observation

Opening the file in a spectrogram view (or inspecting its amplitude envelope) shows a distinct, repetitive pattern:

A succession of long tone blocks, each lasting about 1 second.
Very short silent gaps between the blocks.
Each gap measures roughly 17-20 ms, consistent with the hint: “he types very fast like 20ms”.

Envelope and run-length analysis gives:

There are 80 tone segments.
Each tone segment lasts ~1002 ms.
The silent gaps act as symbol delimiters.

Conclusion: The audio is not recorded speech or ambient noise; it is a machine-generated transmission of discrete symbols.
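
The envelope and run-length measurement can be sketched on a synthetic signal. This is a minimal illustration under stated assumptions, not the measurement performed on challenge.wav: the test signal (three 900 Hz bursts with 20 ms gaps at an 8 kHz sample rate), the 5 ms smoothing window, the 0.1 threshold, and the helper names run_lengths and synth are all chosen here for the example.

```python
import numpy as np


def run_lengths(mask: np.ndarray) -> list:
    """Collapse a boolean mask into (value, length) runs."""
    edges = np.flatnonzero(np.diff(mask.astype(np.int8))) + 1
    bounds = np.concatenate(([0], edges, [len(mask)]))
    return [(bool(mask[a]), int(b - a)) for a, b in zip(bounds[:-1], bounds[1:])]


def synth(sr: int, n_tones: int, tone_s: float, gap_s: float) -> np.ndarray:
    """Hypothetical test signal: sine bursts separated by silent gaps."""
    t = np.arange(int(sr * tone_s)) / sr
    tone = 0.8 * np.sin(2 * np.pi * 900 * t)
    gap = np.zeros(int(sr * gap_s))
    parts = []
    for _ in range(n_tones):
        parts += [tone, gap]
    return np.concatenate(parts[:-1])  # drop the trailing gap


sr = 8000
x = synth(sr, n_tones=3, tone_s=1.0, gap_s=0.020)

# Smooth |x| over 5 ms so zero crossings inside a tone do not look like silence.
win = int(sr * 0.005)
env = np.convolve(np.abs(x), np.ones(win) / win, mode="same")
runs = run_lengths(env > 0.1)

tones_ms = [1000 * n / sr for on, n in runs if on]
gaps_ms = [1000 * n / sr for on, n in runs if not on]
print("tones (ms):", tones_ms)
print("gaps  (ms):", gaps_ms)
```

In this synthetic case the measured gaps come out a few milliseconds shorter than the true 20 ms, and the tones slightly longer than 1000 ms, because the smoothing window smears every edge. A similar effect might account for the ~17 ms gaps and ~1002 ms tones measured above, though that is only a guess.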

3. Frequency Structure & Demodulation

For each ~1-second tone block, we identify the dominant frequencies.

Each block contains:

One dominant frequency from a low group: {300, 900, 1500, 2100} Hz
One dominant frequency from a high group: {2700, 3300, 3900} Hz

This layout is analogous to a Dual-Tone Multi-Frequency (DTMF) keypad matrix, although the frequencies are custom: standard DTMF uses 697, 770, 852, and 941 Hz for rows and 1209, 1336, 1477, and 1633 Hz for columns.
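
Picking one frequency from each group can be sketched as follows. This is an illustration on a synthetic block, and it swaps in a plain numpy FFT with a Hann window as a stand-in for a periodogram; the function names strongest and classify_block, the ±20 Hz search band, and the test block (900 Hz plus 3300 Hz with light noise) are assumptions made for the example.

```python
import numpy as np

LOW_GROUP = [300, 900, 1500, 2100]  # Hz, as listed above
HIGH_GROUP = [2700, 3300, 3900]     # Hz, as listed above


def strongest(freqs, power, candidates, tol_hz=20.0):
    """Return the candidate whose +/- tol_hz band holds the largest peak."""
    scores = []
    for c in candidates:
        band = (freqs > c - tol_hz) & (freqs < c + tol_hz)
        scores.append(power[band].max() if band.any() else 0.0)
    return candidates[int(np.argmax(scores))]


def classify_block(block, sr):
    """Identify the (low, high) frequency pair present in one tone block."""
    window = np.hanning(len(block))  # taper the block to reduce spectral leakage
    power = np.abs(np.fft.rfft(block * window)) ** 2
    freqs = np.fft.rfftfreq(len(block), d=1.0 / sr)
    return strongest(freqs, power, LOW_GROUP), strongest(freqs, power, HIGH_GROUP)


# Hypothetical block: 1 s of 900 Hz + 3300 Hz with a little noise.
sr = 44100
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
block = 0.4 * np.sin(2 * np.pi * 900 * t) + 0.4 * np.sin(2 * np.pi * 3300 * t)
block += 0.01 * rng.standard_normal(sr)

print(classify_block(block, sr))
```

Scoring only the known candidate bands, rather than searching the whole spectrum for peaks, keeps the classifier from being distracted by harmonics or noise outside the two groups.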

Each (low_index, high_index) pair maps to one cell of a keypad grid, with the low frequency selecting the row and the high frequency selecting the column:

Low \ High (Hz)	2700	3300	3900
300	1	2	3
900	4	5	6
1500	7	8	9
2100	-	0	-
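
The grid lookup can be written as a small function. This is a sketch only: snapping a measured frequency to the nearest nominal one is an assumption made here for robustness, and the names nearest_index and pair_to_digit are hypothetical.

```python
LOW_GROUP = [300, 900, 1500, 2100]  # Hz, selects the row
HIGH_GROUP = [2700, 3300, 3900]     # Hz, selects the column

# None marks the two cells of the bottom row that carry no digit.
KEYPAD = [
    ["1", "2", "3"],
    ["4", "5", "6"],
    ["7", "8", "9"],
    [None, "0", None],
]


def nearest_index(value, candidates):
    """Index of the nominal frequency closest to a measured one."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - value))


def pair_to_digit(low_hz, high_hz):
    """Map a measured (low, high) frequency pair to its keypad digit."""
    row = nearest_index(low_hz, LOW_GROUP)
    col = nearest_index(high_hz, HIGH_GROUP)
    digit = KEYPAD[row][col]
    if digit is None:
        raise ValueError(f"no digit assigned to ({LOW_GROUP[row]}, {HIGH_GROUP[col]}) Hz")
    return digit


print(pair_to_digit(901.3, 3298.8))
```

Raising on the two unassigned cells makes a misdetected block fail loudly instead of silently corrupting the digit stream.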

Applying this mapping to all 80 tone segments produces the following 80-digit stream:

69725288123113117521101161171099511210412111549995395495395534895539952114121125

4. Final Decode

The 80-digit stream is a concatenation of decimal ASCII codes of variable length. Every three-digit printable code (100..126) begins with 10, 11, or 12, and none of those values lies in 32..99, so a simple greedy rule parses the stream unambiguously:

Read 2 digits if their value falls in the two-digit printable ASCII range 32..99.
Otherwise, read 3 digits (for codes of 100 and above, such as 104 (‘h’) and 121 (‘y’)).

Applying this rule decodes the stream to:

EH4X{qu4ntum_phys1c5_15_50_5c4ry}
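
One way to check the parsing rule is to pair it with its inverse and confirm that the two round-trip. The sketch below assumes printable ASCII input only; the function names encode and decode are chosen for this example.

```python
def encode(text: str) -> str:
    """Concatenate the decimal code of every character (printable ASCII only)."""
    for ch in text:
        if not 32 <= ord(ch) <= 126:
            raise ValueError(f"not printable ASCII: {ch!r}")
    return "".join(str(ord(ch)) for ch in text)


def decode(digits: str) -> str:
    """Greedy parse: 2 digits when the value is 32..99, otherwise 3 digits."""
    out = []
    i = 0
    while i < len(digits):
        width = 2 if 32 <= int(digits[i:i + 2]) <= 99 else 3
        chunk = digits[i:i + width]
        if len(chunk) != width:
            raise ValueError("digit stream ends in the middle of a code")
        out.append(chr(int(chunk)))
        i += width
    return "".join(out)


sample = "Hello, world! ~"
assert decode(encode(sample)) == sample  # the rule inverts the encoding
print(encode(sample))
```

The round trip works because a two-digit code always starts with 3 to 9, while a three-digit code always starts with 1, so the first digit alone decides the width.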

5. Reproducible Solve Script

This standalone Python script, which uses numpy and scipy, automates the whole process starting from the raw WAV file:

#!/usr/bin/env python3
"""Decode the dual-tone digit stream in challenge.wav and print the flag."""
import numpy as np
from scipy.io import wavfile
from scipy.ndimage import uniform_filter1d
from scipy.signal import periodogram

sr, x = wavfile.read("challenge.wav")
if x.ndim > 1:
    x = x[:, 0]  # keep only the first channel if the file is not mono
x = x.astype(float)

# 1) Detect tone-present regions (separated by ~20 ms of silence).
#    Smooth |x| over 5 ms, then threshold; 0.1 assumes float samples in [-1, 1].
env = uniform_filter1d(np.abs(x), size=int(sr * 0.005))
mask = env > 0.1

# Run-length encode the mask into (is_tone, start, end) runs.
runs = []
cur = mask[0]
start = 0
for i, v in enumerate(mask[1:], 1):
    if v != cur:
        runs.append((cur, start, i))
        cur = v
        start = i
runs.append((cur, start, len(mask)))

# Keep only long "on" segments (the ~1 second symbols).
segments = [(s, e) for on, s, e in runs if on and (e - s) > sr * 0.5]

low_group = [300, 900, 1500, 2100]
high_group = [2700, 3300, 3900]

# (low_index, high_index) -> digit
keypad = {
    (0, 0): "1", (0, 1): "2", (0, 2): "3",
    (1, 0): "4", (1, 1): "5", (1, 2): "6",
    (2, 0): "7", (2, 1): "8", (2, 2): "9",
    (3, 1): "0",
}


def strongest(f, p, group, tol=20):
    """Return the index of the group frequency with the largest peak within +/- tol Hz."""
    amps = []
    for centre in group:
        m = (f > centre - tol) & (f < centre + tol)
        amps.append(p[m].max() if m.any() else 0.0)
    return int(np.argmax(amps))


# 2) Demodulate each segment into one keypad digit.
digits = []
for s, e in segments:
    f, p = periodogram(x[s:e], fs=sr, scaling="spectrum")
    row = strongest(f, p, low_group)
    col = strongest(f, p, high_group)
    digits.append(keypad[(row, col)])

digit_stream = "".join(digits)
print("[+] Digit stream:", digit_stream)

# 3) Parse concatenated decimal ASCII: 2 digits if 32..99, otherwise 3 digits.
i = 0
out = []
while i < len(digit_stream):
    v2 = int(digit_stream[i:i+2]) if i + 2 <= len(digit_stream) else -1
    if 32 <= v2 <= 99:
        out.append(chr(v2))
        i += 2
    else:
        out.append(chr(int(digit_stream[i:i+3])))
        i += 3

flag = "".join(out)
print("[+] Flag:", flag)

Console Execution Output:

python3 solve.py
[+] Digit stream: 69725288123113117521101161171099511210412111549995395495395534895539952114121125
[+] Flag: EH4X{qu4ntum_phys1c5_15_50_5c4ry}

Final Flag

EH4X{qu4ntum_phys1c5_15_50_5c4ry}