Running Katana
Learn about running Katana with examples including commands and output
For all of the flags and options available for Katana, be sure to check out the Usage page.
On this page, we share examples of running Katana with specific flags and goals, and the output you can expect from each.
Running Katana
Katana requires a URL or endpoint to crawl and accepts single or multiple inputs.
A URL can be provided using the -u option, and multiple values can be provided as comma-separated input. Similarly, file input is supported using the -list option, and piped input (stdin) is also supported.
Input for katana
- URL Input
- Multiple URL Input (comma-separated)
- List Input
- STDIN (piped) Input
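Each of these input modes maps to a simple invocation; for example (the target URLs and the url_list.txt filename are placeholders):

```bash
# single URL
katana -u https://tesla.com
# multiple URLs (comma-separated)
katana -u https://tesla.com,https://google.com
# file input
katana -list url_list.txt
# STDIN (piped) input
echo https://tesla.com | katana
```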
Example running katana -
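For instance, a basic run against a single target (the URL is a placeholder) prints each discovered endpoint on its own line:

```bash
katana -u https://youtube.com
```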
Crawling Mode
Standard Mode
The standard crawling mode uses the standard Go http library under the hood to handle HTTP requests/responses. This mode is much faster because it doesn't have the browser overhead. However, it analyzes the HTTP response body as is, without any JavaScript or DOM rendering, so it can miss post-DOM-rendered endpoints or asynchronous endpoint calls that happen in complex web applications depending, for example, on browser-specific events.
Headless Mode
Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:
- The HTTP fingerprint (TLS and user agent) fully identifies the client as a legitimate browser
- Better coverage, since endpoints are discovered by analyzing both the standard raw response, as in the previous mode, and the browser-rendered response with JavaScript enabled.
Headless crawling is optional and can be enabled using the -headless option.
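For example (placeholder target):

```bash
katana -u https://tesla.com -headless
```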
Here are other headless CLI options -
-no-sandbox
Runs the headless Chrome browser with the no-sandbox option; useful when running as the root user.
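For example:

```bash
katana -u https://tesla.com -headless -no-sandbox
```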
-no-incognito
Runs the headless Chrome browser without incognito mode; useful when using the local browser.
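For example:

```bash
katana -u https://tesla.com -headless -no-incognito
```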
-headless-options
When crawling in headless mode, additional Chrome options can be specified using -headless-options, for example -
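A sketch; the --disable-gpu Chrome flag is just an illustrative value:

```bash
katana -u https://tesla.com -headless -headless-options --disable-gpu
```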
Scope Control
Crawling can be endless if not scoped, so katana comes with multiple options to define the crawl scope.
-field-scope
The handiest option to define scope with a predefined field name; rdn is the default option for field scope.
- rdn - crawling scoped to root domain name and all subdomains (e.g. *example.com) (default)
- fqdn - crawling scoped to the given (sub)domain (e.g. www.example.com or api.example.com)
- dn - crawling scoped to the domain name keyword (e.g. example)
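For example, assuming the -fs short form of -field-scope (placeholder target):

```bash
katana -u https://tesla.com -fs dn
```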
-crawl-scope
For advanced scope control, the -cs option can be used, which comes with regex support.
For multiple in-scope rules, file input with a multiline string / regex can be passed.
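For example, both inline regex and file input might look like this (login and in_scope.txt are placeholders):

```bash
# single in-scope regex
katana -u https://tesla.com -cs login
# multiple in-scope rules loaded from a file
katana -u https://tesla.com -cs in_scope.txt
```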
-crawl-out-scope
For defining what not to crawl, the -cos option can be used, which also supports regex input.
For multiple out-of-scope rules, file input with a multiline string / regex can be passed.
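For example (logout and out_of_scope.txt are placeholders):

```bash
# single out-of-scope regex
katana -u https://tesla.com -cos logout
# multiple out-of-scope rules loaded from a file
katana -u https://tesla.com -cos out_of_scope.txt
```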
-no-scope
Katana is scoped to *.domain by default; to disable this and crawl the internet, the -ns option can be used.
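For example:

```bash
katana -u https://tesla.com -ns
```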
-display-out-scope
By default, when a scope option is used, it also applies to the links displayed as output, so external URLs are excluded. To override this behavior, the -do option can be used to display all external URLs found on the scoped target URLs / endpoints.
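For example:

```bash
katana -u https://tesla.com -do
```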
Here are all the CLI options for scope control -
Crawler Configuration
Katana comes with multiple options to configure and control the crawl the way we want.
-depth
Option to define the depth to follow URLs while crawling; the greater the depth, the more endpoints crawled and the longer the crawl takes.
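For example, assuming the -d short form of -depth (placeholder target):

```bash
katana -u https://tesla.com -d 5
```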
-js-crawl
Option to enable JavaScript file parsing and crawling of the endpoints discovered in JavaScript files; disabled by default.
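For example, assuming the -jc short form:

```bash
katana -u https://tesla.com -jc
```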
-crawl-duration
Option to predefine the crawl duration; disabled by default.
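A sketch, assuming the -ct short form (the duration value is illustrative):

```bash
katana -u https://tesla.com -ct 2
```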
-known-files
Option to enable crawling of the robots.txt and sitemap.xml files; disabled by default.
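A sketch, assuming the -kf short form and its robotstxt / sitemapxml values:

```bash
katana -u https://tesla.com -kf robotstxt,sitemapxml
```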
-automatic-form-fill
Option to enable automatic form filling for known / unknown fields. Known field values can be customized as needed by updating the form config file at $HOME/.config/katana/form-config.yaml.
Automatic form filling is an experimental feature.
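For example, assuming the -aff short form:

```bash
katana -u https://tesla.com -aff
```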
Authenticated Crawling
Authenticated crawling involves including custom headers or cookies in HTTP requests to access protected resources. These headers provide authentication or authorization information, allowing you to crawl authenticated content / endpoints. You can specify headers directly on the command line or provide them as a file with katana to perform authenticated crawling.
Note: The user needs to manually perform the authentication and export the session cookie / header to a file to use with katana.
-headers
Option to add a custom header or cookie to the request, following the syntax of headers in the HTTP specification.
Here is an example of adding a cookie to the request:
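A sketch, assuming the -H short form of -headers (the cookie name and value are placeholders):

```bash
katana -u https://tesla.com -H 'Cookie: usrsess=AmljNrESo'
```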
It is also possible to supply headers or cookies as a file. For example:
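A sketch, assuming a cookie.txt file that contains one header:value pair per line:

```bash
# cookie.txt contains: Cookie: usrsess=AmljNrESo
katana -u https://tesla.com -H cookie.txt
```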
There are more options to configure when needed; here are all the config-related CLI options -
Connecting to Active Browser Session
Katana can also connect to an active browser session where the user is already logged in and authenticated, and use it for crawling. The only requirement is to start the browser with remote debugging enabled.
Here is an example of starting the Chrome browser with remote debugging enabled and using it with katana -
Step 1) First, locate the path of the Chrome executable
Operating System | Chromium Executable Location | Google Chrome Executable Location |
---|---|---|
Windows (64-bit) | C:\Program Files (x86)\Google\Chromium\Application\chrome.exe | C:\Program Files (x86)\Google\Chrome\Application\chrome.exe |
Windows (32-bit) | C:\Program Files\Google\Chromium\Application\chrome.exe | C:\Program Files\Google\Chrome\Application\chrome.exe |
macOS | /Applications/Chromium.app/Contents/MacOS/Chromium | /Applications/Google Chrome.app/Contents/MacOS/Google Chrome |
Linux | /usr/bin/chromium | /usr/bin/google-chrome |
Step 2) Start Chrome with remote debugging enabled; it will return a websocket URL. For example, on macOS, you can start Chrome with remote debugging enabled using the following command -
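A sketch of the macOS invocation (the debugging port 9222 is a common choice, not a requirement):

```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```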
Now log in to the website you want to crawl and keep the browser open.
Step 3) Now use the websocket URL with katana to connect to the active browser session and crawl the website.
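A sketch, assuming the -cwu (chrome websocket URL) option and a placeholder session ID:

```bash
katana -headless -u https://tesla.com -cwu ws://127.0.0.1:9222/devtools/browser/<session-id> -no-incognito
```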
Note: You can use the -cdd option to specify a custom Chrome data directory to store browser data and cookies, but that does not save session data if the cookie is set to Session only or expires after a certain time.
Filters
-field
Katana comes with built-in fields that can be used to filter the output for the desired information; the -f option can be used to specify any of the available fields.
Here is a table with examples of each field and expected output when used -
FIELD | DESCRIPTION | EXAMPLE |
---|---|---|
url | URL Endpoint | https://admin.projectdiscovery.io/admin/login?user=admin&password=admin |
qurl | URL including query param | https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin |
qpath | Path including query param | /login?user=admin&password=admin |
path | URL Path | https://admin.projectdiscovery.io/admin/login |
fqdn | Fully Qualified Domain name | admin.projectdiscovery.io |
rdn | Root Domain name | projectdiscovery.io |
rurl | Root URL | https://admin.projectdiscovery.io |
ufile | URL with File | https://admin.projectdiscovery.io/login.js |
file | Filename in URL | login.php |
key | Parameter keys in URL | user,password |
value | Parameter values in URL | admin,admin |
kv | Keys=Values in URL | user=admin&password=admin |
dir | URL Directory name | /admin/ |
udir | URL with Directory | https://admin.projectdiscovery.io/admin/ |
Here is an example of using the field option to display only the URLs with a query parameter in them -
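A sketch (placeholder target):

```bash
katana -u https://tesla.com -f qurl -silent
```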
Custom Fields
You can create custom fields to extract and store specific information from page responses using regex rules. These custom fields are defined using a YAML config file and are loaded from the default location at $HOME/.config/katana/field-config.yaml. Alternatively, you can use the -flc option to load a custom field config file from a different location.
Here is an example custom field.
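A minimal sketch of a field-config.yaml entry; the field name (email) and the regex are illustrative:

```yaml
- name: email
  type: regex
  regex:
  - '([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'
```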
When defining custom fields, the following attributes are supported:
- name (required) - The value of the name attribute is used as the -field CLI option value.
- type (required) - The type of the custom attribute; the currently supported option is regex.
- part (optional) - The part of the response to extract the information from. The default value is response, which includes both the header and body. Other possible values are header and body.
- group (optional) - You can use this attribute to select a specific matched group in the regex, for example: group: 1
Running katana using custom field:
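A sketch using the email field defined in the example above:

```bash
katana -u https://tesla.com -f email
```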
-store-field
To complement the field option, which is useful for filtering output at run time, there is the -sf, -store-fields option, which works exactly like the field option except that instead of filtering, it stores all the information on disk under the katana_field directory, sorted by target URL.
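For example, storing several fields at once (the field list is illustrative):

```bash
katana -u https://tesla.com -sf key,fqdn,qurl -silent
```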
The -store-field option can be useful for collecting information to build a targeted wordlist for various purposes, including but not limited to:
- Identifying the most commonly used parameters
- Discovering frequently used paths
- Finding commonly used files
- Identifying related or unknown subdomains
Katana Filters
-extension-match
Crawl output can be easily matched for specific extensions using the -em option, which ensures only output containing the given extensions is displayed.
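For example:

```bash
katana -u https://tesla.com -silent -em js,jsp,json
```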
-extension-filter
Crawl output can be easily filtered for specific extensions using the -ef option, which removes all URLs containing the given extensions.
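For example:

```bash
katana -u https://tesla.com -silent -ef css,txt,md
```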
-match-regex
The -match-regex or -mr flag allows you to filter output URLs using regular expressions. When using this flag, only URLs that match the specified regular expression will be printed in the output.
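A sketch (the regex is illustrative):

```bash
katana -u https://tesla.com -mr 'https://shop\..*'
```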
-filter-regex
The -filter-regex or -fr flag allows you to filter output URLs using regular expressions. When using this flag, URLs that match the specified regular expression will be skipped.
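A sketch (the regex is illustrative):

```bash
katana -u https://tesla.com -fr 'https://static\..*'
```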
Advanced Filtering
Katana supports DSL-based expressions for advanced matching and filtering capabilities:
- To match endpoints with a 200 status code:
- To match endpoints that contain “default” and have a status code other than 403:
- To match endpoints with PHP technologies:
- To filter out endpoints running on Cloudflare:
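The commands below sketch one possible invocation for each of these cases, assuming the -mdc (match condition) and -fdc (filter condition) DSL flags; the target URL is a placeholder:

```bash
# match endpoints with a 200 status code
katana -u https://www.hackerone.com -mdc 'status_code == 200'
# match endpoints containing "default" with a status code other than 403
katana -u https://www.hackerone.com -mdc 'contains(endpoint, "default") && status_code != 403'
# match endpoints with PHP technologies
katana -u https://www.hackerone.com -mdc 'contains(to_lower(technologies), "php")'
# filter out endpoints running on Cloudflare
katana -u https://www.hackerone.com -fdc 'contains(to_lower(technologies), "cloudflare")'
```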
DSL functions can be applied to any keys in the jsonl output. For more information on available DSL functions, please visit the dsl project.
Here are additional filter options -
Rate Limit
It’s easy to get blocked / banned while crawling if you don’t respect the target website’s limits, so katana comes with multiple options to tune the crawl to go as fast or slow as we want.
-delay
Option to introduce a delay in seconds between each new request katana makes while crawling; disabled by default.
-concurrency
Option to control the number of URLs per target to fetch at the same time.
-parallelism
Option to define the number of targets to process at the same time from list input.
-rate-limit
Option to define the maximum number of requests that can go out per second.
-rate-limit-minute
Option to define the maximum number of requests that can go out per minute.
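The sketch below shows one possible invocation per option; the numeric values and the short flag forms (-c, -p, -rl, -rlm) are illustrative assumptions:

```bash
# 20-second delay between requests
katana -u https://tesla.com -delay 20
# fetch up to 20 URLs per target concurrently
katana -u https://tesla.com -c 20
# process up to 20 targets from the list in parallel
katana -list target_list.txt -p 20
# at most 100 requests per second
katana -u https://tesla.com -rl 100
# at most 500 requests per minute
katana -u https://tesla.com -rlm 500
```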
Here are all the long / short CLI options for rate limit control -
Output
Katana supports file output in both plain text and JSON format; the JSON output includes additional information such as the source, tag, and attribute name to correlate the discovered endpoint.
-output
By default, katana outputs the crawled endpoints in plain text format. The results can be written to a file by using the -output option.
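For example (output.txt is a placeholder filename):

```bash
katana -u https://example.com -output output.txt
```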
-jsonl
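A sketch of writing JSONL output, using the flag from this section's heading and a placeholder filename:

```bash
katana -u https://example.com -jsonl -output output.jsonl
```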
-store-response
The -store-response option allows writing all crawled endpoint requests and responses to text files. When this option is used, text files including the request and response will be written to the katana_response directory. If you would like to specify a custom directory, you can use the -store-response-dir option.
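For example (the custom directory name is a placeholder):

```bash
katana -u https://example.com -store-response
katana -u https://example.com -store-response -store-response-dir custom_dir
```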
Note: The -store-response option is not supported in -headless mode.
Here are additional CLI options related to output -
Katana as a library
katana can be used as a library by creating an instance of the Options struct and populating it with the same options that would be specified via the CLI. Using the options, you can create crawlerOptions and then a standard or hybrid crawler.
The crawler.Crawl method should be called to crawl the input.
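A minimal sketch of library usage, closely following the upstream example; the package paths and option field names are assumed to match the installed katana release and may need adjusting:

```go
package main

import (
	"math"

	"github.com/projectdiscovery/gologger"
	"github.com/projectdiscovery/katana/pkg/engine/standard"
	"github.com/projectdiscovery/katana/pkg/output"
	"github.com/projectdiscovery/katana/pkg/types"
)

func main() {
	// Populate the same options that would otherwise be passed via CLI flags.
	options := &types.Options{
		MaxDepth:     3,             // -depth
		FieldScope:   "rdn",         // -field-scope
		BodyReadSize: math.MaxInt,   // maximum response body size to read
		Timeout:      10,            // request timeout in seconds
		Concurrency:  10,            // -concurrency
		Parallelism:  10,            // -parallelism
		Delay:        0,             // -delay
		RateLimit:    150,           // -rate-limit
		Strategy:     "depth-first", // crawl strategy
		OnResult: func(result output.Result) { // callback for each discovered endpoint
			gologger.Info().Msg(result.Request.URL)
		},
	}
	crawlerOptions, err := types.NewCrawlerOptions(options)
	if err != nil {
		gologger.Fatal().Msg(err.Error())
	}
	defer crawlerOptions.Close()

	// Standard engine shown here; the hybrid (headless) engine is created similarly.
	crawler, err := standard.New(crawlerOptions)
	if err != nil {
		gologger.Fatal().Msg(err.Error())
	}
	defer crawler.Close()

	input := "https://www.hackerone.com" // placeholder target
	if err := crawler.Crawl(input); err != nil {
		gologger.Warning().Msgf("Could not crawl %s: %s", input, err.Error())
	}
}
```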