# Extractors

> Review details on extractors for Nuclei

Extractors pull a match out of the response returned by a module and display it in the results.

### Types

Multiple extractors can be specified in a request. Five types of extractors are currently supported.

1. **regex** - Extract data from the response based on a regular expression.
2. **kval** - Extract `key: value`/`key=value` formatted data from the response header or cookie.
3. **json** - Extract data from a JSON-based response using JQ-like syntax.
4. **xpath** - Extract XPath-based data from an HTML response.
5. **dsl** - Extract data from the response based on DSL expressions.
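As a minimal sketch of several extractors attached to one request, the fragment below combines a **regex** and a **kval** extractor. The version-number pattern and the choice of the `server` header are illustrative assumptions, not taken from a real template.

```yaml
extractors:
  - type: regex   # first extractor: look for a version-like string in the body
    part: body
    regex:
      - '[0-9]+\.[0-9]+\.[0-9]+'   # assumed pattern, e.g. 1.2.3

  - type: kval    # second extractor: read a response header
    kval:
      - server    # assumes the target sends a Server header
```

Each entry in the list is evaluated against the same response, so the two extractors report their results independently.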

### Regex Extractor

Example extractor for HTTP Response body using **regex** -

```yaml theme={null}
extractors:
  - type: regex # type of the extractor
    part: body  # part of the response (header,body,all)
    regex:
      - "(A3T[A-Z0-9]|AKIA|AGPA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}"  # regex to use for extraction.
```

### Kval Extractor

A **kval** extractor example to extract `content-type` header from HTTP Response.

```yaml theme={null}
extractors:
  - type: kval # type of the extractor
    kval:
      - content_type # header/cookie value to extract from response
```

Note that `content-type` has been replaced with `content_type` because the **kval** extractor does not accept a dash (`-`) as input; every dash must be replaced with an underscore (`_`).
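The same substitution applies to header names containing more than one dash. The sketch below assumes the target returns `X-Powered-By` and `Strict-Transport-Security` headers; both header choices are only for illustration.

```yaml
extractors:
  - type: kval
    kval:
      - x_powered_by               # X-Powered-By, dashes replaced with underscores
      - strict_transport_security  # Strict-Transport-Security
```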

### JSON Extractor

A **json** extractor example to extract the value of the `id` field from a JSON block.

```yaml theme={null}
extractors:
  - type: json # type of the extractor
    part: body
    name: user
    json:
      - '.[] | .id'  # JQ like syntax for extraction
```

For more details about JQ - [https://github.com/itchyny/gojq](https://github.com/itchyny/gojq)
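To make the expression concrete, here is a sketch against a hypothetical response body (shown in the comments). The body shape and the second expression are assumptions chosen for illustration.

```yaml
# Hypothetical response body:
#   [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
extractors:
  - type: json
    part: body
    json:
      - '.[] | .id'   # iterates over the array and selects each id: 1 and 2
      - '.[0].name'   # selects the name of the first element: alice
```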

### Xpath Extractor

An **xpath** extractor example to extract the value of the `href` attribute from an HTML response.

```yaml theme={null}
extractors:
  - type: xpath # type of the extractor
    attribute: href # attribute value to extract (optional)
    xpath:
      - '/html/body/div/p[2]/a' # xpath value for extraction
```

With a simple [copy paste in browser](https://www.scientecheasy.com/2020/07/find-xpath-chrome.html/), we can get the **xpath** value from any web page content.
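Because `attribute` is optional, it can also be left out. The sketch below assumes that, in this case, the extractor returns the text content of the matched node; the `/html/head/title` path is just an example.

```yaml
extractors:
  - type: xpath
    xpath:
      - '/html/head/title'  # no attribute set: assumed to yield the node's text
```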

### DSL Extractor

A **dsl** extractor example to extract the effective `body` length through the `len` helper function from HTTP Response.

```yaml theme={null}
extractors:
  - type: dsl  # type of the extractor
    dsl:
      - len(body) # dsl expression value to extract from response
```
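Several expressions can be listed under `dsl`, in the same way as the other extractor types accept a list. The sketch below assumes the `md5` helper and the `status_code` variable are available for HTTP responses.

```yaml
extractors:
  - type: dsl
    dsl:
      - md5(body)     # hash of the response body (assumes the md5 helper)
      - status_code   # HTTP status code of the response (assumed variable name)
```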

### Dynamic Extractor

Extractors can be used to capture dynamic values at runtime while writing multi-request templates. CSRF tokens, session headers, etc. can be extracted and used in requests. This feature is only available in the RAW request format.

Example of defining a dynamic extractor named `api`, which captures a regex-based pattern from the response.

```yaml theme={null}
    extractors:
      - type: regex
        name: api
        part: body
        internal: true # Required for using dynamic variables
        regex:
          - "(?m)[0-9]{3,10}\\.[0-9]+"
```

The extracted value is stored in the variable **api**, which can be utilised in any section of the subsequent requests.

If you want to use an extractor as a dynamic variable, you must set `internal: true` to avoid printing the extracted values in the terminal.
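A minimal sketch of how the `api` variable might be consumed by a later request is shown below. The template id, the `/config` and `/status` paths, and the `key` query parameter are hypothetical; only the extractor itself comes from the example above.

```yaml
id: dynamic-extractor-sketch  # hypothetical template id

info:
  name: Dynamic extractor sketch
  author: example
  severity: info

http:
  - raw:
      - |
        GET /config HTTP/1.1
        Host: {{Hostname}}

      - |
        GET /status?key={{api}} HTTP/1.1
        Host: {{Hostname}}

    extractors:
      - type: regex
        name: api        # referenced as {{api}} in the second request
        part: body
        internal: true   # keep the value out of the terminal output
        regex:
          - "(?m)[0-9]{3,10}\\.[0-9]+"
```

The first response populates `api`, and the second raw request substitutes it wherever `{{api}}` appears.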

An optional regex **match-group** can also be specified with the `group` option for more complex matches.

```yaml theme={null}
extractors:
  - type: regex  # type of extractor
    name: csrf_token # defining the variable name
    part: body # part of response to look for
    # group selects which regex match group is returned.
    # In Go, the "match" is the full array of all matches and submatches:
    # match[0] is the full match
    # match[n] is the nth submatch. Most often we'd want match[1], as depicted below
    group: 1
    regex:
      - '<input\sname="csrf_token"\stype="hidden"\svalue="([[:alnum:]]{16})"\s/>'
```

The above extractor with name `csrf_token` will hold the value extracted by `([[:alnum:]]{16})` as `abcdefgh12345678`.

If no `group` option is provided with this regex, the above extractor with name `csrf_token` will hold the full match (by `<input\sname="csrf_token"\stype="hidden"\svalue="([[:alnum:]]{16})"\s/>`), that is, `<input name="csrf_token" type="hidden" value="abcdefgh12345678" />`.

### Reusable Dynamic Extractors

As of Nuclei v3.1.4, a dynamically extracted value (for example, `csrf_token` in the example above) can be reused immediately by the extractors that follow it, and it is available by default in subsequent requests.

Example:

```yaml theme={null}
id: basic-raw-example

info:
  name: Test RAW Template
  author: pdteam
  severity: info


http:
  - raw:
      - |
        GET / HTTP/1.1
        Host: {{Hostname}}

    extractors:
      - type: regex
        name: title
        group: 1
        regex:
          - '<title>(.*)<\/title>'
        internal: true

      - type: dsl
        dsl:
          - '"Title is " + title'
```
