AI-Powered Asset Labeling - ProjectDiscovery Documentation

Asset labeling is currently in early beta and runs asynchronously, so the initial labeling pass may take some time to complete. We’re actively working on performance improvements to make this process faster and more efficient.

Asset labeling is the automated process of categorizing and contextualizing the assets discovered by ProjectDiscovery. Instead of presenting you with a raw list of domains or IPs, the platform classifies assets by attaching descriptive labels, or tags, to each one. These labels provide immediate context about what an asset is: for example, distinguishing a marketing website from an API endpoint, or a development server from a production system. By automatically organizing assets into meaningful categories, asset labeling helps security teams understand their attack surface at a glance and focus on what matters most.

In practical terms, once ProjectDiscovery discovers an asset, it evaluates that asset’s characteristics and assigns labels that describe its role or nature. For instance, a web application login page might be labeled “Login Portal,” and a host named staging.example.com might be tagged “Staging Environment” to indicate that it is not a production system. Asset labeling bridges the gap between raw asset data and the business context behind those assets, making your asset inventory more informative and easier to navigate.

How It Works

ProjectDiscovery’s asset labeling engine classifies assets by analyzing various pieces of information collected during discovery. It uses a combination of asset metadata, DNS information, HTTP responses, and even screenshots to determine how to label each asset:

Asset Metadata: Basic details about the asset (such as IP addresses, open ports, SSL certificate data, and hosting information) are examined for clues. For example, an SSL certificate’s Common Name might reveal the application’s name, or an IP’s ASN could indicate the cloud provider or organization owning the asset. This metadata helps identify what the asset might be (e.g., a cloud storage bucket, a VPN gateway, etc.) and adds context for labeling.
DNS Records: DNS information is used to infer the asset’s purpose or ownership, and domain or subdomain names can be very telling. For instance, a hostname under a dev. or staging. subdomain suggests a non-production environment, whereas something like mail.example.com could indicate an email server. CNAME records might point to a known service (for example, a CNAME to a SaaS provider’s domain), which the platform can recognize and label accordingly. In short, ProjectDiscovery looks at hostnames and DNS details to glean context (like environment, service type, or associated product) that informs the asset’s label.
HTTP Responses: For web assets, the content and behavior of the HTTP(S) service are analyzed. The platform uses its HTTP probing capabilities to gather response headers, status codes, and page content. This includes looking at the HTML title, body text, and other fingerprints. Certain keywords or patterns can identify the application type – for example, a page title containing “Login” or a form with password fields likely indicates a login portal, while a default page saying “Welcome to nginx” indicates a generic web server instance. The system also detects technologies and frameworks running on the asset (e.g., identifying a WordPress site or an Apache server from response signatures) via deep technology fingerprinting. All this HTTP-derived information feeds into the labeling decision.
Screenshots: ProjectDiscovery can capture screenshots of discovered web services. These screenshots provide a visual snapshot of the asset’s interface and serve as an additional data point in the labeling process. For example, a screenshot that shows a login screen or an admin panel UI is a strong indicator of the asset’s function, even when the page text is inconclusive. While labeling at this beta stage is driven mostly by metadata and textual analysis, having a screenshot means that when the automated logic miscategorizes an asset, an analyst can glance at the image and quickly see what the asset actually is.
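The DNS and HTTP heuristics in the list above can be sketched as a small signal-extraction step. This is a minimal illustration under stated assumptions, not ProjectDiscovery’s implementation: the `DiscoveredAsset` fields, the prefix tables, the `.github.io` CNAME mapping, and the hint names are all hypothetical, chosen only to mirror the examples in the text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class DiscoveredAsset:
    """Hypothetical record of what discovery collected for one asset."""
    hostname: str
    cname: str | None = None
    http_title: str = ""
    http_body: str = ""


# Hypothetical lookup tables keyed on the first (leftmost) DNS label.
ENV_PREFIXES = {"dev": "Development Environment", "staging": "Staging Environment"}
SERVICE_PREFIXES = {"mail": "Email Server"}

# Hypothetical mapping of CNAME target suffixes to known hosted services.
KNOWN_CNAME_SUFFIXES = {".github.io": "GitHub Pages"}

# Rough pattern for an HTML password input, used as a login-form indicator.
PASSWORD_FIELD = re.compile(r"<input[^>]+type=[\"']?password", re.IGNORECASE)


def extract_signals(asset: DiscoveredAsset) -> set[str]:
    """Return context hints derived from DNS names and HTTP content."""
    hints: set[str] = set()

    # DNS: the leftmost hostname label often reveals environment or service.
    first_label = asset.hostname.lower().split(".")[0]
    if first_label in ENV_PREFIXES:
        hints.add(ENV_PREFIXES[first_label])
    if first_label in SERVICE_PREFIXES:
        hints.add(SERVICE_PREFIXES[first_label])

    # DNS: a CNAME into a known provider's domain suggests a hosted service.
    if asset.cname:
        target = asset.cname.lower().rstrip(".")
        for suffix, service in KNOWN_CNAME_SUFFIXES.items():
            if target.endswith(suffix):
                hints.add(service)

    # HTTP: a "Login" title or a password field suggests a login portal.
    if "login" in asset.http_title.lower() or PASSWORD_FIELD.search(asset.http_body):
        hints.add("Login Portal")

    # HTTP: the stock nginx welcome page indicates a generic web server.
    if "welcome to nginx" in asset.http_body.lower():
        hints.add("Default Web Server Page")

    return hints
```

Keeping extraction separate from labeling means each hint can be tested in isolation. A production system would also fold in the asset metadata and technology fingerprints mentioned above, which this sketch omits.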

Behind the scenes, all these inputs are combined to assign one or more labels to each asset. The system uses a rules-based approach (which will continue to get smarter over time) to match patterns or signatures to label categories. For example, if an asset’s DNS name contains “api” and its HTTP response returns JSON, a rule might label it as an “API Endpoint.” Similarly, a host identified as running Jenkins (via technology fingerprinting of its HTTP response) might get a label like “Jenkins CI” to denote a CI/CD service. Each label is a quick descriptor that summarizes one aspect of the asset, so you can understand its nature without deep manual investigation.
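The rule matching described above can be illustrated with a minimal rules table in which each rule pairs a label with a predicate over the collected facts. The `AssetFacts` fields, the `Rule` type, and the exact predicates are hypothetical; they encode the two examples from the text plus a staging rule, and say nothing about how ProjectDiscovery structures its rules internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AssetFacts:
    """Hypothetical facts gathered for one asset during discovery."""
    hostname: str
    content_type: str = ""  # Content-Type header of the main HTTP response
    technologies: set[str] = field(default_factory=set)  # fingerprinted tech names


@dataclass(frozen=True)
class Rule:
    """A label plus the predicate that must hold for it to apply."""
    label: str
    matches: Callable[[AssetFacts], bool]


def _is_json(facts: AssetFacts) -> bool:
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    return facts.content_type.lower().split(";")[0].strip() == "application/json"


RULES = (
    # DNS name contains "api" and the response is JSON -> API Endpoint.
    Rule("API Endpoint", lambda f: "api" in f.hostname.lower() and _is_json(f)),
    # Technology fingerprinting found Jenkins -> Jenkins CI.
    Rule("Jenkins CI", lambda f: "jenkins" in {t.lower() for t in f.technologies}),
    # A hostname label equal to "staging" -> Staging Environment.
    Rule("Staging Environment", lambda f: "staging" in f.hostname.lower().split(".")),
)


def assign_labels(facts: AssetFacts, rules: tuple[Rule, ...] = RULES) -> list[str]:
    """Apply every rule; each match contributes one label, in rule order."""
    return [rule.label for rule in rules if rule.matches(facts)]
```

Because every matching rule contributes its label, a single asset can collect several: a host named api.staging.example.com that returns `application/json` would receive both “API Endpoint” and “Staging Environment.” The substring check for “api” follows the wording above literally; a stricter rule would match whole hostname labels to avoid false positives such as rapid.example.com.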

Benefits of Automated Labeling

Automated asset labeling brings several advantages to security professionals and engineers managing a large number of assets:

Reduces Manual Effort: One of the biggest benefits is cutting down the tedious work of labeling assets by hand. In the past, teams might maintain spreadsheets or use tagging systems to mark which assets are production, which are internal, which belong to a certain team, etc. ProjectDiscovery’s automated approach does this heavy lifting for you. As soon as assets are discovered, the platform annotates them with relevant labels, sparing you from examining each asset individually and typing out tags. This automation frees up your time to focus on higher-value tasks like analyzing findings or improving security controls.
Speeds Up Security Triage: With assets automatically categorized, you can prioritize and triage security issues faster. When a new vulnerability or incident is reported, labeled assets give you the context instantly. For example, if an alert comes in for api.test.example.com, an “API” label and perhaps a “Staging” label on that asset tell you it’s a staging API server. You can then decide the urgency (perhaps lower than for a production issue) and the appropriate team to notify. Because no one has to dig for this information, response times improve. In short, labels act as immediate context clues that help you quickly determine the criticality of an asset and the impact of any associated vulnerabilities.
Better Asset Management & Organization: Asset labels make it much easier to organize and filter your asset inventory. You can group assets by their labels to get different views of your attack surface. For instance, you might filter to see all assets labeled “Production” to ensure you’re focusing scans and monitoring on live customer-facing systems, or you might pull up all assets labeled “Login Portal” to review authentication points in your infrastructure. This capability turns a flat list of assets into a richly organized dataset that can be sliced and diced for various purposes. It enhances visibility across your environment – you can quickly answer questions like “How many external login pages do we have?” or “Which assets are running database services?” if such labels are applied. Ultimately, this leads to more structured and efficient asset management.
Consistency and Scale: Automated labeling applies the same criteria uniformly across all assets, ensuring consistent classification. Human tagging can be subjective – different team members might label similar assets differently or overlook some assets entirely. With ProjectDiscovery doing it automatically, every asset is evaluated with the same logic, and nothing gets skipped due to oversight. This consistency is especially important when you have hundreds or thousands of assets in dynamic cloud environments. The feature scales effortlessly – no matter how many assets you discover overnight, each will get labeled without adding to anyone’s workload. As your attack surface grows, automated labeling keeps the context up-to-date continuously, which is crucial for maintaining an accurate asset inventory in fast-changing environments.
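The filtering, grouping, and triage workflows described above reduce to simple set operations once each asset carries labels. The sketch below uses a hypothetical in-memory inventory (hostname to label set) and hypothetical priority weights; it does not reflect how the platform stores labels or exposes them for filtering.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical inventory mapping each hostname to its assigned labels.
INVENTORY = {
    "www.example.com": {"Production", "Marketing Website"},
    "api.test.example.com": {"API Endpoint", "Staging Environment"},
    "login.example.com": {"Production", "Login Portal"},
    "sso.example.com": {"Production", "Login Portal"},
}

# Hypothetical triage weights: production and auth surfaces rank higher,
# staging ranks lower, and unlisted labels contribute nothing.
PRIORITY = {"Production": 3, "Login Portal": 2, "API Endpoint": 2, "Staging Environment": -1}


def with_label(inventory: dict[str, set[str]], label: str) -> list[str]:
    """Return hostnames carrying the given label, sorted for stable output."""
    return sorted(host for host, labels in inventory.items() if label in labels)


def group_by_label(inventory: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert the inventory into label -> sorted list of hostnames."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host in sorted(inventory):
        for label in inventory[host]:
            groups[label].append(host)
    return dict(groups)


def triage_order(hosts: list[str], inventory: dict[str, set[str]]) -> list[str]:
    """Order alerting hosts by the summed weight of their labels, highest first."""
    def score(host: str) -> int:
        return sum(PRIORITY.get(label, 0) for label in inventory.get(host, set()))
    return sorted(hosts, key=score, reverse=True)
```

With this sample data, answering “How many external login pages do we have?” becomes `len(with_label(INVENTORY, "Login Portal"))`, which returns 2. Summing label weights is only one possible triage heuristic; teams would tune or replace the weights to match their own risk model.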

In summary, automated asset labeling streamlines asset management by eliminating manual tagging drudgery, accelerating the interpretation of asset data, and bringing order and clarity to your inventory. It’s an efficiency boost that also improves the quality of your security posture by ensuring you always know what each asset is and why it’s there.
