diff --git a/docs/configuration.md b/docs/configuration.md index 27e269c3b2..0f4f97ca69 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -293,7 +293,7 @@ User can set the source field value to 'sc4s' by using the `SC4S_SET_SOURCE_AS_S ## Drop all data by IP or subnet (deprecated) -Using `vendor_product_by_source` to null queue is now a deprecated task. See the supported method for dropping data in [Filtering events from output](https://splunk.github.io/splunk-connect-for-syslog/main/sources/#filtering-events-from-output). +Using `vendor_product_by_source` to null queue is now a deprecated task. See the supported method for dropping data in [Filtering events from output](sources/index.md#filtering-events-from-output). ## Splunk Connect for Syslog output templates (syslog-ng templates) diff --git a/docs/create-parser.md b/docs/create-parser.md deleted file mode 100644 index b5aef31f64..0000000000 --- a/docs/create-parser.md +++ /dev/null @@ -1,111 +0,0 @@ - -# Create a parser - -SC4S parsers perform operations that would normally be performed during index time, including linebreaking, source and sourcetype setting, and timestamping. You can write your own parser if the parsers available in the SC4S package do not meet your needs. - -## Before you start -* Make sure you have read our [contribution standards](CONTRIBUTING.md). -* For more background information on how filters and parsers work, read the [sources](sources/index.md) documentation in this manual. -* Prepare your testing environment. With Python>=3.9: -``` -pip3 install poetry -poetry install -``` - -* Prepare your testing command: -``` -poetry run pytest -v --tb=long \ ---splunk_type=external \ ---splunk_hec_token= \ ---splunk_host= \ ---sc4s_host= \ ---splunk_user= \ ---splunk_password= \ ---junitxml=test-results/test.xml \ --n \ - -``` - -* Create a new branch in the repository where you will apply your changes. 
- -## Procure a raw log message -If you already have a raw log message, you can skip this step. Otherwise, you need to extract one to have something to work with. You can do this in multiple ways, this section describes three methods. - -### Procure a raw log message using `tcpdump` -You can use the `tcpdump` command to get incoming raw messages on a given port of your server: - -``` bash -tcpdump -n -s 0 -S -i any -v port 8088 - -tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes -09:54:26.051644 IP (tos 0x0, ttl 64, id 29465, offset 0, flags [DF], proto UDP (17), length 466) -10.202.22.239.41151 > 10.202.33.242.syslog: SYSLOG, length: 438 -Facility local0 (16), Severity info (6) -Msg: 2022-04-28T16:16:15.466731-04:00 NTNX-21SM6M510425-B-CVM audispd[32075]: node=ntnx-21sm6m510425-b-cvm type=SYSCALL msg=audit(1651176975.464:2828209): arch=c000003e syscall=2 success=yes exit=6 a0=7f2955ac932e a1=2 a2=3e8 a3=3 items=1 ppid=29680 pid=4684 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=964698 comm=“sshd” exe=“/usr/sbin/sshd” subj=system_u:system_r:sshd_t:s0-s0:c0.c1023 key=“logins”\0x0a - -``` - -### Procure a raw log message using Wireshark -Once you get your stream of messages, copy one of them. Note that in UDP there are not usually any message separators. -You can also read the logs using Wireshark from the .pcap file. From Wireshark go to Statistics > Conversations, then click on `Follow Stream`: - -![ws_conversation](resources/images/ws_conv.png) - -### Procure a raw log message by saving it in Splunk -See [Obtaining "On-the-wire" Raw Events](https://splunk.github.io/splunk-connect-for-syslog/main/troubleshooting/troubleshoot_resources/#obtain-raw-message-events). - -## Create a unit test -To create a unit test, use the existing test case that is most similar to your use case. The naming convention is `test_vendor_product.py`. - -1. 
Make sure that your log is being parsed correctly by creating a test case. -Assuming you have a raw message like this: - -`<14>1 2022-03-30T11:17:11.900862-04:00 host - - - - Carbon Black App Control event: text="File 'c:\program files\azure advanced threat protection sensor\2.175.15073.51407\winpcap\x86\packet.dll' [c4e671bf409076a6bf0897e8a11e6f1366d4b21bf742c5e5e116059c9b571363] would have blocked if the rule was not in Report Only mode." type="Policy Enforcement" subtype="Execution block (unapproved file)" hostname="CORP\USER" username="NT AUTHORITY\SYSTEM" date="3/30/2022 3:16:40 PM" ip_address="10.0.0.3" process="c:\program files\azure advanced threat protection sensor\2.175.15073.51407\microsoft.tri.sensor.updater.exe" file_path="c:\program files\azure advanced threat protection sensor\2.175.15073.51407\winpcap\x86\packet.dll" file_name="packet.dll" file_hash="c4e671bf409076a6bf0897e8a11e6f1366d4b21bf742c5e5e116059c9b571363" policy="High Enforcement - Domain Controllers" rule_name="Report read-only memory map operations on unapproved executables by .NET applications" process_key="00000433-0000-23d8-01d8-44491b26f203" server_version="8.5.4.3" file_trust="-2" file_threat="-2" process_trust="-2" process_threat="-2" prevalence="50"` - -* Make sure that the message is a valid python string, where escape characters are placed correctly. -* Anonymize the data. -* Rename functions. -* Update index and sourcetype fields. -* Extract and replace values with field names in the test string. - -2. Now run the test, for example: -``` -poetry run pytest -v --tb=long \ ---splunk_type=external \ ---splunk_hec_token= \ ---splunk_host= \ ---sc4s_host= \ ---splunk_user= \ ---splunk_password= \ ---junitxml=test-results/test.xml \ --n \ -test/test_vendor_product.py -``` - -3. The parsed log should appear in Splunk: -![parsed_log](resources/images/parser_dev_splunk_first_run.png) - -In this example the message is being parsed as a generic `nix:syslog` sourcetype. 
This means that the message format complied with RFC standards, and SC4S could correctly identify the format fields in the message. - -## Create a parser -To assign your messages to the proper index and sourcetype you will need to create a parser. Your parser must be declared in `package/etc/conf.d/conflib`. The naming convention is `app-type-vendor_product.conf`. - -1. If you find a similar parser in SC4S, you can use it as a reference. In the parser, make sure you assign the proper sourcetype, index, vendor, product, and template. The template shows how your message should be parsed before sending them to Splunk. - -The most basic configuration will forward raw log data with correct metadata, for example: -```bash ---8<---- "docs/resources/parser_development/app-syslog-vmware_cb-protect_example_basic.conf" -``` -All messages that start with the string `Carbon Black App Control event:` will now be routed to the proper index and assigned the given sourcetype: -![](resources/images/parser_dev_basic_output.png) -For more info about using message filtering go to [sources documentation.](sources/index.md#standard-syslog-using-message-parsing) - -2. To apply more transformations, add the parser: -```bash ---8<---- "docs/resources/parser_development/app-syslog-vmware_cb-protect_example.conf" -``` -This example extracts all fields that are nested in the raw log message first by using `csv-parser` to split `Carbon Black App Control event` and the rest of message as a two separate fields named `header` and `message`. `kv-parser` will extract all key-value pairs in the `message` field. - -3. To test your parser, run a previously created test case. If you need more debugging, use `docker ps` to see your running containers and `docker logs` to see what's happening to the parsed message. - -4. Commit your changes and open a pull request. 
diff --git a/docs/creating_parsers/filter_message.md b/docs/creating_parsers/filter_message.md new file mode 100644 index 0000000000..63802924b0 --- /dev/null +++ b/docs/creating_parsers/filter_message.md @@ -0,0 +1,285 @@ +# Filter Messages + +!!! note "Prerequisites" + Before reading this section, make sure you are familiar with [Sources](../sources/index.md) and [Read First](index.md). + +This section covers how to create `application` filters. Filters in user-made parsers are responsible for matching incoming log messages based on a set of filter statements and routing them to the appropriate parsers for further processing. + +Most filters have the following structure: + +``` +application <app_name>[<topic>] { + filter { + <filter statements> + }; + parser { <parser_name>(); }; +}; +``` + +Filters are grouped into topics. Each topic represents a stage or strategy for identifying log sources. + +For example: + +- **sc4s-syslog-sdata** - matches by structured data (often a Private Enterprise Number, PEN). Example: +``` +application app-syslog-f5_bigip_structured[sc4s-syslog-sdata] { + filter { + match('^\[F5@12276' value("SDATA")); + }; + parser { app-syslog-f5_bigip_structured(); }; +}; +``` + +- **sc4s-syslog-pgm** - matches by program value (`PROGRAM` in RFC3164, `APP-NAME` in RFC5424). Example: + +``` +application app-syslog-alcatel_switch[sc4s-syslog-pgm] { + filter { + program('swlogd' type(string) flags(prefix)); + }; + parser { app-syslog-alcatel_switch(); }; +}; +``` + +- **sc4s-syslog** - usually used for identification by message body content. Example: + +``` +application app-syslog-arista_eos[sc4s-syslog] { + filter { + program('^[A-Z]\S+$') + and message('%' type(string) flags(prefix)); + }; + parser { app-syslog-arista_eos(); }; +}; +``` + +- **sc4s-network-source** - matches by destination port.
Example: + +``` +application app-netsource-brocade_syslog[sc4s-network-source] { + filter { + not filter(f_is_source_identified) + and ( + ( + match("brocade", value('.netsource.sc4s_vendor'), type(string)) + and match("syslog", value('.netsource.sc4s_product'), type(string)) + ) + or (tags("ns_vendor:brocade") and tags("ns_product:syslog")) + or tags(".source.s_BROCADE") + or "${.netsource.sc4s_vendor_product}" eq "brocade_syslog" + ) + }; + parser { app-netsource-brocade_syslog(); }; +}; +``` + +- **cef** - for CEF-formatted messages. Example: + +``` +application app-cef-a10_vthunder[cef] { + filter{ + match("A10" value(".metadata.cef.device_vendor")) + and match("vThunder" value(".metadata.cef.device_product")); + }; + parser { app-cef-a10_vthunder(); }; +}; +``` + +- **sc4s-almost-syslog** - for sources sending legacy non-conformant RFC 3164 logs. + +Filters can use different functions to match parts of the log message. These functions can be combined in conditional patterns. Below is a list of the most common functions with their arguments, as well as common patterns for matching logs: + +**program()** + +Used for matching messages by the program field (`PROGRAM` in RFC3164, `APP-NAME` in RFC5424). You can pass the following options: + +- `type()`: + - `pcre` - Perl Compatible Regular Expressions (default). + - `string` - Literal string match (faster than regex). + - `glob` - Shell-style glob pattern. + +- `flags()`: + - `prefix` - Match if the value starts with the pattern. + - `substring` - Match if pattern appears anywhere in the value. + - `ignore-case` - Disable case-sensitive matching. + +Example: + +``` +# using type(string) and flags(substring, ignore-case) +filter { + program('avx-gw-state-sync' type(string) flags(substring, ignore-case)) +}; + +# using default options (pcre regex) +filter { + program('^[A-Z]\S+$') +}; +``` + +**match()** + +Used for matching messages against a pattern. 
Unlike `program()`, `match()` can target any field using the `value()` or `template()` parameters. + +Syntax variants: + +- `match(pattern)` - matches against the message header + message (`MSGHDR` + `MSG`). +- `match(pattern value("MACRO"))` - matches against a specific field. + +Options: + +- `type()`: + - `pcre` - Perl Compatible Regular Expressions (default). + - `string` - Literal string match (faster than regex). + - `glob` - Shell-style glob pattern. + +- `flags()`: + - `prefix` - Match if the value starts with the pattern. + - `substring` - Match if pattern appears anywhere in the value. + - `ignore-case` - Disable case-sensitive matching. + - `store-matches` - Store captured groups into `$0`–`$255` macros. + +Examples: + +``` +# match by message body using regex (default type) +filter { + message('^time=\d{10}\|hostname=') +}; + +# match a specific field using a literal string +# common use case for CEF-formatted logs +filter { + match("A10" value(".metadata.cef.device_vendor")) + and match("vThunder" value(".metadata.cef.device_product")); +}; + +# match against MSGHDR with regex +filter { + match('(SYS|WF|TR|AUDIT|NF) ?$', value("MSGHDR")) +}; + +# match against structured data (SDATA) +filter { + match('^\[F5@12276' value("SDATA")) +}; +``` + +**message()** + +A shorthand for `match(pattern value("MESSAGE"))` — matches only against the message body (`MSG`), excluding headers. Supports the same `type()` and `flags()` options as `match()`. + +Example: + +``` +# literal substring match +filter { + message(': Avi-Controller: ' type(string) flags(substring)) +}; +``` + +**host()** + +Matches against the hostname field (`HOST`). Supports the same `type()` and `flags()` options as `program()`. + +Example: + +``` +filter { + host('myserver' type(string)) +}; +``` + +**String comparison operators (`eq`, `ne`)** + +An alternative to `match()`, used for equality checks on macro values or environment variables. 
Two syntaxes exist depending on what is being resolved: + +- `"${MACRO}"` — Resolves a macro value. Use for message fields like `${PROGRAM}`, `${HOST}`, or parser-set fields like `${.SDATA.sc4s@2620.product}`. +- `` "`ENV_VAR`" `` — Resolves an environment variable. Use for SC4S configuration options like `SC4S_DEST_*` or `SC4S_SOURCE_*`. + +Available operators: + +- `eq` — equals +- `ne` — not equals + +Examples: + +``` +# check vendor_product from netsource enrichment +filter { + "${.netsource.sc4s_vendor_product}" eq "aruba_clearpass" +}; + +# check if an env variable is set +filter { + "`SC4S_DEST_BEYONDTRUST_SRA_SYSLOG_FMT`" eq "SDATA" +}; +``` + +**tags()** + +Used for filtering messages by tags. Tags are labels attached to messages and are fast to filter on compared to string matching. Custom tags can be added using `set-tag()` in rewrite rules. + +Tags are set with: + +``` +rewrite r_set_my_tag { + set-tag("my_tag"); +}; +``` + +Syntax: + +- `tags("tag_name")` — matches if the message has the specified tag + +Examples: + +``` +# match by source tag +filter { + tags(".source.s_VMWARE_VCENTER") +}; + +# match by vendor/product tags +filter { + tags("ns_vendor:vmware") and tags("ns_product:vsphere") +}; + +# match by wire format +filter { + tags("wireformat:rfc3164_isodate") +}; +``` + +**Conditional statements** + +Filters can be combined using logical operators such as `and`, `or`, and `not`, and enclosed in parentheses for grouping complex conditions. 
+ +Examples: + +``` +# Match messages from the VMware vCenter source +# or by a combination of vendor and product tags +filter { + tags(".source.s_VMWARE_VCENTER") + or ( + tags("ns_vendor:vmware") + and tags("ns_product:vsphere") + ) +}; +``` + +## Send messages to a parser + +After creating a filter, specify which parser should be used for further processing: + +``` +application app-syslog-vmware_cb-protect[sc4s-syslog] { + filter { + message('Carbon Black App Control event: ' type(string) flags(prefix)); + }; + parser { app-syslog-vmware_cb-protect(); }; # If filtering succeeds send data to this parser +}; +``` + +For more information about creating parsers, see [Parse Messages](parse_message.md). \ No newline at end of file diff --git a/docs/creating_parsers/index.md b/docs/creating_parsers/index.md new file mode 100644 index 0000000000..e50c87551f --- /dev/null +++ b/docs/creating_parsers/index.md @@ -0,0 +1,54 @@ +# SC4S parsers + +!!! note "Prerequisites" + Before reading this section, make sure you are familiar with [Sources](../sources/index.md). + +This and subsequent sections describe how to create new parsers. SC4S parsers perform operations that would normally be performed during index time, including line breaking and source and sourcetype setting. You can write your own parser if the parsers available in the SC4S package do not meet your needs or if you want to add support for a new sourcetype. + +## Before you start + +* Make sure you have read our [contribution standards](../CONTRIBUTING.md). +* Obtain a raw log message that you want to parse. If you do not know how to do this, refer to [Obtain raw message events](../troubleshooting/troubleshoot_resources.md#obtain-raw-message-events). +* Prepare your testing environment. With Python>=3.11.0: + +``` +pip3 install poetry +poetry install +``` + +## Parsers + +### Naming conventions and project structure + +Parsers are .conf files with the naming convention: `app-type-vendor_product.conf`.
Parsers that are part of the repository can be found in `package/etc/conf.d/conflib` or, for the Lite package, in `package/lite/etc/addons`. + +Remember that adding your parser to the main or Lite package in the repo requires building a new image for it to become available to your SC4S instance. If you want to add a new parser locally, you can place it in the `/opt/sc4s/local` directory on your existing SC4S installation. + +### Parser structure + +An SC4S parser consists of `application` and `block parser` blocks. The `application` part uses a filter clause to specify which logs will be parsed by the `block parser` block. An example of such a parser is shown below: + +``` +--8<---- "docs/resources/parser_development/app-syslog-vmware_cb-protect_example_basic.conf" +``` + +!!! note "Note" + If you find a similar parser in SC4S, you can use it as a reference. In the parser, make sure you assign the proper sourcetype, index, vendor, product, and template. The template defines how your message is formatted before it is sent to Splunk. + + +The application filter will match all messages that start with the string `Carbon Black App Control event:`, and those events will be parsed by `block parser app-syslog-vmware_cb-protect()`. This parser will then route the message to the `epintel` index; set the sourcetype, source, vendor, and product fields; and use the specified template. + +![](../resources/images/parser_dev_basic_output.png) + +To learn more about creating filter and parser blocks, see [Filter Messages](filter_message.md) and [Parse Messages](parse_message.md). + +### Adding a parser to the SC4S Lite package + +For SC4S Lite, parsers are grouped into `addons`. Create a folder (if it does not already exist) in `package/lite/etc/addons` named after the vendor. In this folder, also create an `addon_metadata.yaml` file containing the vendor name: + +``` +--- +name: "<vendor>" +``` + +Lastly, add this addon to `package/lite/etc/config.yaml`.
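The Lite addon layout described above can be sketched from a shell. This is only a sketch: the vendor name `examplevendor` and the parser filename are hypothetical placeholders, and only the paths come from the docs:

```shell
# Create the addon folder for the (hypothetical) vendor "examplevendor"
mkdir -p package/lite/etc/addons/examplevendor

# Metadata file naming the vendor
printf -- '---\nname: "examplevendor"\n' \
  > package/lite/etc/addons/examplevendor/addon_metadata.yaml

# The parser .conf file sits alongside its metadata
touch package/lite/etc/addons/examplevendor/app-syslog-examplevendor_product.conf
```

Remember that the addon must also be referenced in `package/lite/etc/config.yaml`, and that a new image build is required before the addon becomes available to your SC4S Lite instance.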
\ No newline at end of file diff --git a/docs/creating_parsers/parse_message.md b/docs/creating_parsers/parse_message.md new file mode 100644 index 0000000000..c5e760b69d --- /dev/null +++ b/docs/creating_parsers/parse_message.md @@ -0,0 +1,321 @@ +# Parse Messages + +!!! note "Prerequisites" + Before reading this section, make sure you are familiar with [Sources](../sources/index.md) and [Read First](index.md). + +This section covers how to create a `block parser`. Every `application` block references a `block parser` that defines how to process the matched message. Within the parser, you extract fields and set Splunk metadata. + +## Structure + +``` +block parser <parser_name>() { + channel { + parser { <parsers> }; + rewrite { <rewrites> }; + }; +}; +``` + +A block parser contains a `channel { }` with two stages: + +1. **Parsing** (optional) — extract fields from the message using `kv-parser`, `csv-parser`, `regexp-parser`, `json-parser`, `date-parser`, or `syslog-parser`. +2. **Rewrite** — set Splunk destination metadata (index, sourcetype, vendor, product, template). + +## Rewrite functions + +There are two rewrite functions for setting Splunk metadata: + +**`r_set_splunk_dest_default`** — sets all base Splunk metadata. Includes `index`, `sourcetype`, `vendor`, `product`, and optionally `source` and `template`. + +``` +rewrite { + r_set_splunk_dest_default( + index("netops") + sourcetype("alcatel:switch") + vendor("alcatel") + product("switch") + template("t_hdr_msg") + ); +}; +``` + +**`r_set_splunk_dest_update_v2`** — overrides specific fields already set by `r_set_splunk_dest_default`. It accepts `index`, `source`, `sourcetype`, `class`, and `template` options. You can also use the `condition` option for a conditional expression.
+ +``` +rewrite { + r_set_splunk_dest_update_v2( + sourcetype('citrix:netscaler:appfw') + condition(message(':(\s+\S+)?\s+APPFW(\s+\S+){3}\s+:')) + ); +}; +``` + +## Templates + +The `template` parameter in `r_set_splunk_dest_default` controls what part of the message is forwarded to Splunk. Templates are defined in [`package/etc/conf.d/conflib/_common/t_templates.conf`](https://github.com/splunk/splunk-connect-for-syslog/blob/main/package/etc/conf.d/conflib/_common/t_templates.conf). The most common ones: + +| Template | Content | Use case | +|---|---|---| +| `t_hdr_msg` | `${MSGHDR}${MESSAGE}` | Default for most parsers | +| `t_msg_only` | `${MSGONLY}` | When header is not needed (e.g. Palo Alto) | +| `t_program_msg` | `${PROGRAM}[${PID}]: ${MESSAGE}` | Program with PID and message | +| `t_hdr_sdata_msg` | `${MSGHDR}${MSGID} ${SDATA} ${MESSAGE}` | RFC5424 with structured data | +| `t_json_values_msg` | Same as `t_json_values` + `message=$MSG` | Parsed fields as JSON with original message | + +## Parsing methods + +Parsers in syslog-ng control how incoming log messages are broken down and structured for further processing and field extraction. They analyze the message content, extracting meaningful data into named fields that can later be used in Splunk. Different log sources and formats require different parsing strategies: + +**`kv-parser`** — use when logs contain key/value pairs (`key=value`). + +Options: + +- `prefix()` — string prepended to every extracted key name. Prevents collisions with built-in syslog-ng macros and keeps parsed fields in their own namespace. For example, with `prefix(".values.")`, a key `src=10.0.0.1` becomes the field `.values.src`. +- `template()` — specifies which part of the message to parse. By default, `kv-parser` operates on `${MESSAGE}`. +- `pair-separator()` — custom delimiter between key=value pairs (default is whitespace). +- `value-separator()` — custom separator between key and value (default is `=`). 
+ +``` +# parsing HDR + MSG of the log and prepending parsed values with `.values.` +parser { + kv-parser(prefix(".values.") template("$(template t_hdr_msg)")); +}; +``` + +**`csv-parser`** — use when logs are consistently delimited with stable column order. + +Options: + +- `columns()` — comma-separated list of column names. Each column maps positionally to a delimited field in the message. Names become field keys (prefixed if `prefix()` is set). +- `prefix()` — string prepended to each column name (e.g., `prefix(".values.")` turns column `"src"` into `.values.src`). +- `delimiters()` — character(s) used to split the message into columns (e.g., `','` for CSV, `'\t'` for TSV). By default, space is used. +- `quote-pairs()` — characters used for quoting values that contain the delimiter (e.g., `'""'` for double-quote pairs). +- `template()` — specifies which part of the message to parse. By default, `csv-parser` operates on `${MESSAGE}`. +- `flags()` — parsing behavior flags: + - `escape-double-char` — treat doubled characters as a literal char inside a value, e.g., use `,,` to escape a single comma `,`. + - `greedy` — assign all remaining text to the last column instead of discarding it. + - `drop-invalid` — drop the message if the number of fields does not match the column count. + +``` +parser { + csv-parser( + columns("col1","col2","col3","col4") + prefix(".values.") + delimiters(',') + quote-pairs('""') + flags(escape-double-char) + ); +}; +``` + +**`regexp-parser`** — use to parse message content with regular expressions. + +Options: + +- `patterns()` — one or more regular expressions with named capture groups (`(?<name>...)`). Each named group becomes a field. +- `prefix()` — string prepended to each captured group name. +- `template()` — which part of the message to match against (default is `${MESSAGE}`).
+ +``` +parser { + regexp-parser( + template("${MESSAGE}") + patterns('^(?<num>\d+) (?<host>[^ ]+) (?<msg>.*)') + prefix(".parsed.") + ); +}; +``` + +**`json-parser`** — use when logs are JSON-formatted. + +Options: + +- `prefix()` — string prepended to each JSON key. Nested JSON objects are flattened with dots (e.g., `{"event":{"src":"10.0.0.1"}}` with `prefix(".values.")` becomes `.values.event.src`). +- `template()` — which field to parse as JSON (default is `${MESSAGE}`). + +``` +parser { + json-parser(prefix('.values.')); +}; +``` + +**`date-parser`** — use when the timestamp needs to be explicitly parsed from a field, e.g., from a non-syslog message. + +Options: + +- `format()` — one or more strptime format strings to try in order (e.g., `'%s.%f'` for epoch with fractional seconds, `'%Y-%m-%dT%H:%M:%S%z'` for ISO 8601). Multiple formats can be passed as a comma-separated list; the first one that matches wins. +- `template()` — which field contains the timestamp string to parse. + +``` +parser { + date-parser( + format('%s.%f', '%s') + template("${.tmp.timestamp}") + ); +}; +``` + +**`syslog-parser`** — re-parses a reconstructed syslog line. Used in almost-syslog parsers when the original message has a non-standard header that needs to be normalized first with `regexp-parser` and then re-parsed. + +Options: + +- `template()` — the string to parse as a syslog message. Typically composed from `.tmp.*` fields extracted by a prior `regexp-parser`. +- `flags()` — parsing behavior flags: + - `assume-utf8` — assume the message is UTF-8 encoded without verification. + - `guess-timezone` — attempt to guess the timezone if not explicitly present in the timestamp. + - `no-header` — parse only the PRI field; put the rest into `${MSG}`.
+ +``` +# after regexp-parser extracted parts into .tmp.* fields +parser { + syslog-parser( + flags(assume-utf8, guess-timezone) + template("${.tmp.pri} $S_ISODATE ${.tmp.message}") + ); +}; +``` + +## Rewrite operations + +Beyond `r_set_splunk_dest_default` and `r_set_splunk_dest_update_v2`, block parsers commonly use these rewrite operations: + +**`set()`** — sets a field to a value. Supports macro expansion and conditions. + +``` +rewrite { + # copy a parsed field into HOST + set("${.values.hostname}", value("HOST")); + + # conditional set + set("new_value", value("PROGRAM") condition(program('old_value' type(string)))); +}; +``` + +**`subst()`** — performs string substitution on a field value. Supports regex. + +``` +rewrite { + # strip leading ": " from MESSAGE + subst('^: ', "", value("MESSAGE")); + + # strip path prefix from PROGRAM (e.g., "/usr/bin/app" -> "app") + subst('^\/(?:[^\/]+\/)+', "", value("PROGRAM")); + + # global flag to replace all occurrences + subst('\t', " ", value("MESSAGE") flags(global)); +}; +``` + +**`unset()`** — removes a field entirely. + +``` +rewrite { + unset(value("PROGRAM")); + unset(value("PID")); +}; +``` + +**`r_set_dest_splunk_null_queue`** — tags the message for dropping (null queue). Used in post-filters to discard noise or incomplete events. + +``` +rewrite(r_set_dest_splunk_null_queue); +``` + +**`map-value-pairs`** — remaps existing name-value pairs to a different set of names. 
+ +``` +# map all .values.* to .SDATA.sc4sfields@27389.*, e.g., .values.src -> .SDATA.sc4sfields@27389.src +rewrite { + map-value-pairs( + key('.values.*' rekey(shift-levels(2) add-prefix(".SDATA.sc4sfields@27389."))) + ); +}; +``` + +## Examples + +**Simple parser** — no field extraction, just sets Splunk metadata: + +``` +block parser app-syslog-alcatel_switch() { + channel { + rewrite { + r_set_splunk_dest_default( + index('netops') + sourcetype('alcatel:switch') + vendor("alcatel") + product("switch") + template('t_hdr_msg') + ); + }; + }; +}; +``` + +**Parser with field extraction** — extracts key/value pairs, validates a required field: + +``` +block parser app-syslog-vendor_product() { + channel { + parser { + kv-parser(prefix(".values.") template("$(template t_hdr_msg)")); + }; + filter { + "${.values.some_required_field}" ne "" + }; + rewrite { + r_set_splunk_dest_default( + index('netfw') + sourcetype('vendor:product') + vendor("vendor") + product("product") + template('t_kv_values') + ); + }; + }; +}; +``` + +**Parser with conditional branches** — routes different message types to different sourcetypes: + +``` +block parser app-syslog-vendor_product() { + channel { + rewrite { + r_set_splunk_dest_default( + index("netops") + sourcetype('vendor:log') + vendor("vendor") + product('product') + template('t_msg_only') + ); + }; + + if (message(',TRAFFIC,' type(string) flags(substring))) { + parser { + csv-parser( + delimiters(chars('') strings('|')) + columns('version', 'device_vendor', 'device_product', 'device_version', 'device_event_class', 'name', 'severity', 'ext') + prefix('.metadata.cef.') + flags(greedy)); + }; + rewrite { + r_set_splunk_dest_update_v2( + index('netfw') + class('traffic') + sourcetype('vendor:traffic') + ); + }; + } elif (message(',SYSTEM,' type(string) flags(substring))) { + parser { csv-parser(columns(...) 
prefix(".values.") delimiters(',')); }; + rewrite { + r_set_splunk_dest_update_v2( + index('netops') + class('system') + sourcetype('vendor:system') + ); + }; + } else { }; + }; +}; +``` diff --git a/docs/creating_parsers/unit_tests.md b/docs/creating_parsers/unit_tests.md new file mode 100644 index 0000000000..b4a423f01f --- /dev/null +++ b/docs/creating_parsers/unit_tests.md @@ -0,0 +1,101 @@ +# Prerequisites +You can run tests either against existing SC4S and Splunk instances, or you can use Docker Compose to create them. You can also mix these methods by providing, e.g., only Splunk and using Docker Compose to spin up an SC4S instance. + +If you want to use an already existing SC4S and Splunk instance, you need to pass these arguments: `--sc4s_host=<sc4s_host>`, `--sc4s_type=external` for SC4S and `--splunk_host=<splunk_host>`, `--splunk_type=external` for Splunk. If you want to use Docker Compose, you need to have it installed. You can review the Compose configuration at `tests/docker-compose.yml`. + +Testing command for external setup: +``` +poetry run pytest -v --tb=long \ +--splunk_type=external \ +--splunk_hec_token=<hec_token> \ +--splunk_host=<splunk_host> \ +--sc4s_type=external \ +--sc4s_host=<sc4s_host> \ +--splunk_user=<splunk_user> \ +--splunk_password=<splunk_password> \ +--junitxml=test-results/test.xml \ +-n <number_of_workers> \ +<path_to_test_file> +``` + +# Creating a unit test + +To create a unit test, use the existing test case that is most similar to your use case. The naming convention for test files is `test_vendor_product.py`. + +Below is a template for the unit test file: + +``` +# Copyright Splunk, Inc.
+# +# Use of this source code is governed by a BSD-2-clause-style +# license that can be found in the LICENSE-BSD2 file or at +# https://opensource.org/licenses/BSD-2-Clause +import datetime +import pytest + +from jinja2 import Environment, select_autoescape + +from .sendmessage import sendsingle +from .splunkutils import splunk_single +from .timeutils import time_operations + +env = Environment(autoescape=select_autoescape(default_for_string=False)) + + +@pytest.mark.addons("<vendor>") +def test_<vendor>_<product>( + record_property, setup_splunk, setup_sc4s, get_host_key +): + host = get_host_key + mt = env.from_string( + "{{ mark }}{{ bsd }} {{ host }} <raw message body>" + ) + + dt = datetime.datetime.now(datetime.timezone.utc) + _, bsd, _, _, _, _, epoch = time_operations(dt) + message = mt.render(mark="<134>", bsd=bsd, host=host) + + # Tune time functions + epoch = epoch[:-7] + sendsingle(message, setup_sc4s[0], setup_sc4s[1][514]) + st = env.from_string( + f'search _time={epoch} index=netfw host="{host}" sourcetype="<sourcetype>"' + ) + search = st.render(epoch=epoch) + + result_count, _ = splunk_single(setup_splunk, search) + + record_property("resultCount", result_count) + record_property("message", message) + + assert result_count == 1 +``` + +Before you put your raw log into the unit test, make sure that: + +- The message is a valid Python string with escape characters placed correctly. +- Sensitive data has been redacted. + +**Timestamps in tests** + +When creating a unit test, pay close attention to time handling. You can use the `.timeutils` module to generate timestamps. The timestamp format you generate should match the original event format.
In most cases, start by getting the current UTC time: + +`dt = datetime.datetime.now(datetime.timezone.utc)` + +Then use `time_operations`, which returns: + +- `iso` — ISO 8601 / RFC 5424-style timestamp, e.g., `2026-03-09T15:04:05.123456+01:00` +- `bsd` — BSD syslog / RFC 3164-style timestamp (`%b %d %H:%M:%S`), e.g., `Mar 09 15:04:05` +- `time` — time of day with microseconds only (`%H:%M:%S.%f`), e.g., `15:04:05.123456` +- `date` — calendar date (`%Y-%m-%d`), e.g., `2026-03-09` +- `tzoffset` — timezone offset from local tz, e.g., `+0100` +- `tzname` — timezone name, e.g., `UTC`, `CET`, `PDT` +- `epoch` — epoch seconds plus microseconds as a string (`%s.%f`), e.g., `1741532645.123456`. It is usually trimmed with `[:-7]` for seconds only and `[:-3]` for milliseconds. + +When creating a message template, make sure the format matches the original message. In some cases, the timestamp is not part of the header. For example, in this CEF message the timestamp is part of the `rt` field: + +``` +mt = env.from_string( + "{{ mark }} CEF:0|A10|vThunder|4.1.4-GR1-P12|WAF|session-id|2|rt={{ bsd }} src=1.1.1.1 spt=34860 dst=1.1.1.1 dpt=80 dhost=test.host.local cs1=uiext_sec_waf cs2=1 act=learn cs3=learn app=HTTP requestMethod=GET cn1=0 request=/sales/ msg=New session created: Id\=1\n" +) +``` diff --git a/docs/experiments.md b/docs/experiments.md index 0d8458de27..2f47d2d67c 100644 --- a/docs/experiments.md +++ b/docs/experiments.md @@ -10,7 +10,7 @@ To use the eBPF feature, you must have a host machine with and OS that supports To learn more visit this [blog post.](https://www.syslog-ng.com/community/b/blog/posts/syslog-ng-4-2-extra-udp-performance) ### Parallelize (TCP) -SC4S processes incoming messages from a TCP connection in a single thread. While this is adequate for many connections, it doesn't work efficiently when using a single or few high-traffic connections. 
This feature allows SC4S to process log messages from a single high-traffic TCP connection in multiple threads, which increases processing performance on multi-core machines. +SC4S processes incoming messages from a TCP connection in a single thread. While this is adequate for many connections, it does not work efficiently when using a single or few high-traffic connections. This feature allows SC4S to process log messages from a single high-traffic TCP connection in multiple threads, which increases processing performance on multi-core machines. To learn more, see the [Configuration documentation](./configuration.md#parallelize), as well as this [blog post](https://www.syslog-ng.com/community/b/blog/posts/accelerating-single-tcp-connections-in-syslog-ng-parallelize). diff --git a/docs/faq.md b/docs/faq.md index f5012a851c..97f6e04f81 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -52,7 +52,7 @@ The syslog protocol limits the extent to which you can make any syslog collectio **Q: I’m worried about data loss if SC4S goes down. Could I feed syslog to redundant SC4S servers to provide HA, without creating duplicate events in Splunk?** -A: In many system design decisions there is some level of compromise. Any network protocol that doesn't have an application level ACK will lose data because speed is selected over reliability in the design. This is the case with syslog. Use a clustered IP with an active/passive node for a level of resilience while keeping complexity to a minimum. +A: In many system design decisions there is some level of compromise. Any network protocol that does not have an application level ACK will lose data because speed is selected over reliability in the design. This is the case with syslog. Use a clustered IP with an active/passive node for a level of resilience while keeping complexity to a minimum. 
It could be possible to implement a far more complex solution utilizing an additional intermediary technology like Kafka, however the costs may outweigh the real world benefits.

**Q: If the XL reference HW can handle just under 1 terabyte per day, how can SC4S be scaled to handle large deployments of many terabytes per day?**

diff --git a/docs/gettingstarted/byoe-rhel8.md b/docs/gettingstarted/byoe-rhel8.md
index 830cb69861..8049d2cf4b 100644
--- a/docs/gettingstarted/byoe-rhel8.md
+++ b/docs/gettingstarted/byoe-rhel8.md
@@ -4,7 +4,7 @@

Support for all non-containerized deployments using the "bring your own environment" method, as detailed in this documentation, will end in **April 2026**. After that, **no further updates, security patches, or technical support will be provided for this deployment method**. Continued use may expose your environment to security risks and compatibility issues.

-    We recommend planning your migration to one of the other supported platform or deployment method before April 2026 to ensure continued support and security. See [Select a Container Runtime and SC4S Configuration](https://splunk.github.io/splunk-connect-for-syslog/main/gettingstarted/getting-started-runtime-configuration/#step-3-select-a-container-runtime-and-sc4s-configuration) for more details.
+    We recommend planning your migration to one of the other supported platforms or deployment methods before April 2026 to ensure continued support and security. See [Select a Container Runtime and SC4S Configuration](getting-started-runtime-configuration.md#step-3-select-a-container-runtime-and-sc4s-configuration) for more details.

Configuring SC4S in a non-containerized SC4S deployment requires a custom configuration. Note that since Splunk does not control your unique environment, we cannot help with setting up environments, debugging networking, etc.
Consider this configuration only if: @@ -162,4 +162,4 @@ SC4S_LISTEN_DEFAULT_TLS_PORT=6514 ``` ### Create unique dedicated listening ports -For some source technologies, categorization by message content is not possible. To collect these sources, dedicate a unique listening port to a specific source. See [Sources](https://splunk.github.io/splunk-connect-for-syslog/main/sources/) for more information. +For some source technologies, categorization by message content is not possible. To collect these sources, dedicate a unique listening port to a specific source. See [Sources](../sources/index.md) for more information. diff --git a/docs/gettingstarted/docker-compose-MacOS.md b/docs/gettingstarted/docker-compose-MacOS.md index b32e8281ab..af820d425a 100644 --- a/docs/gettingstarted/docker-compose-MacOS.md +++ b/docs/gettingstarted/docker-compose-MacOS.md @@ -47,7 +47,7 @@ Each listening port on the container must be mapped to a listening port on the h To configure unique ports: -1. Modify the `/opt/sc4s/env_file` file to include the port-specific environment variables. See the [Sources](https://splunk.github.io/splunk-connect-for-syslog/main/sources/) +1. Modify the `/opt/sc4s/env_file` file to include the port-specific environment variables. See the [Sources](../sources/index.md) documentation to identify the specific environment variables that are mapped to each data source vendor and technology. 2. Modify the Docker Compose file that starts the SC4S container so that it reflects the additional listening ports you have created. You can amend the Docker Compose file with additional `target` stanzas in the `ports` section of the file (after the default ports). 
For example, the following additional `target` and `published` lines provide for 21 additional technology-specific UDP and TCP ports: diff --git a/docs/gettingstarted/getting-started-runtime-configuration.md b/docs/gettingstarted/getting-started-runtime-configuration.md index 6f41a37ee8..fe7933238f 100644 --- a/docs/gettingstarted/getting-started-runtime-configuration.md +++ b/docs/gettingstarted/getting-started-runtime-configuration.md @@ -49,7 +49,7 @@ and a filter `example.conf` in the `log_paths` and `filters` subdirectories. Co * In the `local/context` directory, change the "non-example" version of a file (e.g. `splunk_metadata.csv`) to preserve the changes upon restart. * `/opt/sc4s/archive` is a mount point for local storage of syslog events -if the optional mount is uncommented. The events are written in the syslog-ng EWMM format. See the [Configuration](https://splunk.github.io/splunk-connect-for-syslog/main/configuration/) +if the optional mount is uncommented. The events are written in the syslog-ng EWMM format. See the [Configuration](../configuration.md) topic for information about the directory structure that the archive uses. * `/opt/sc4s/tls` is a mount point for custom TLS certificates if the optional mount is uncommented. diff --git a/docs/gettingstarted/k8s-microk8s.md b/docs/gettingstarted/k8s-microk8s.md index ac4db1e447..0893da8c30 100644 --- a/docs/gettingstarted/k8s-microk8s.md +++ b/docs/gettingstarted/k8s-microk8s.md @@ -9,7 +9,7 @@ SC4S with MicroK8s leverages features of MicroK8s: * Uses MetalLB to preserve the source IP. * Works with any of the following operating systems: Windows, CentOS, RHEL, Ubuntu, Debian. -Splunk maintains container images, but it doesn't directly support or otherwise provide resolutions for issues within the runtime environment. +Splunk maintains container images, but it does not directly support or otherwise provide resolutions for issues within the runtime environment. 
## Step 1: Allocate IP addresses This configuration requires as least two IP addresses: one for the host and one for the internal load balancer. We suggest allocating three IP addresses for the host and 5-10 IP addresses for later use. diff --git a/docs/gettingstarted/podman-systemd-general.md b/docs/gettingstarted/podman-systemd-general.md index de66c48ed6..061fbd82d8 100644 --- a/docs/gettingstarted/podman-systemd-general.md +++ b/docs/gettingstarted/podman-systemd-general.md @@ -210,4 +210,4 @@ systemctl --user enable sc4s systemctl --user start sc4s ``` -The remainder of the setup can be found in the [main setup instructions](https://splunk.github.io/splunk-connect-for-syslog/main/gettingstarted/quickstart_guide/). +The remainder of the setup can be found in the [main setup instructions](quickstart_guide.md). diff --git a/docs/sources/base/cef.md b/docs/sources/base/cef.md index 846b165e96..bade9f4fcd 100644 --- a/docs/sources/base/cef.md +++ b/docs/sources/base/cef.md @@ -10,7 +10,7 @@ Imperva, and Cyberark. Therefore, the CEF environment variables for unique port should be set only _once_. If your deployment has multiple CEF devices that send to more than one port, -set the CEF unique port variable(s) as a comma-separated list. See [Unique Listening Ports](https://splunk-connect-for-syslog.readthedocs.io/en/develop/sources/#unique-listening-ports) +set the CEF unique port variable(s) as a comma-separated list. See [Unique Listening Ports](../index.md#unique-listening-ports) for details. The source documentation included below is a reference baseline for any product that sends data diff --git a/docs/sources/base/leef.md b/docs/sources/base/leef.md index 3fabc4be82..08c5c95585 100644 --- a/docs/sources/base/leef.md +++ b/docs/sources/base/leef.md @@ -10,7 +10,7 @@ as well as other legacy systems. Therefore, the LEEF environment variables for should be set only _once_. 
If your deployment has multiple LEEF devices that send to more than one port,
-set the LEEF unique port variable(s) as a comma-separated list. See [Unique Listening Ports](https://splunk-connect-for-syslog.readthedocs.io/en/develop/sources/#unique-listening-ports)
+set the LEEF unique port variable(s) as a comma-separated list. See [Unique Listening Ports](../index.md#unique-listening-ports)
for details.

The source documentation included below is a reference baseline for any product that sends data
diff --git a/docs/sources/index.md b/docs/sources/index.md
index e5e63416f6..9e32d70ce0 100644
--- a/docs/sources/index.md
+++ b/docs/sources/index.md
@@ -16,19 +16,26 @@ definition of a specific port which will be used as a property of the event or b

Many log sources can be supported using one of the flexible options available without specific code known as app-parsers.

-New supported sources are added regularly. Please submit an [issue](https://github.com/splunk/splunk-connect-for-syslog/issues) with a description of the vend/product. Configuration information an a compressed pcap (.zip) from a non-production environment to request support for a new source.
+New supported sources are added regularly. To request support for a new source, please submit an [issue](https://github.com/splunk/splunk-connect-for-syslog/issues) with a description of the vendor/product, configuration information, and a compressed pcap (.zip) from a non-production environment.

Many sources can be self supported. While we encourage sharing new sources via the github project to promote consistency and develop best-practices there is no requirement to engage in the community.
-* Sources that are *compliant* with RFC 5424,RFC 5425, RFC 5426, or RFC 6587 can be onboarded as [simple sources](https://splunk.github.io/splunk-connect-for-syslog/main/sources/base/simple/) -* Sources "compatible" with RFC3164 Note incorrect use of the syslog version, or "creative" formats in the time stamp or other fields may prevent use as [simple sources](https://splunk.github.io/splunk-connect-for-syslog/main/sources/base/simple/) -* Common Event Format [CEF](https://splunk.github.io/splunk-connect-for-syslog/main/sources/base/cef/) Also known as ArcSight format -* Log Extended Format [LEEF](https://splunk.github.io/splunk-connect-for-syslog/main/sources/base/leef/) +* Sources that are compliant with RFC 5424, RFC 5425, RFC 5426, or RFC 6587 can be onboarded as [simple sources](simple.md). +* Sources compatible with RFC 3164 can also be onboarded as [simple sources](simple.md). Note that incorrect use of the syslog version, or “creative” formats in the timestamp or other fields may prevent this method of onboarding and will require writing custom parsers. + +Other popular log formats that are supported by SC4S are: + +* Common Event Format [CEF](base/cef.md), also known as ArcSight format. +* Log Extended Format [LEEF](base/leef.md). + +## Common Patterns + +This section covers the most basic and common patterns for onboarding data with user-made parsers. If you want to read more, see [Create a Parser](../creating_parsers/index.md). ### Almost Syslog -Sources sending legacy non conformant 3164 like streams can be assisted by the creation of an "Almost Syslog" Parser. In an such a parser the goal is to process the syslog header allowing other parsers -to correctly parse and handle the event. The following example is take from a currently supported format where the source product used epoch in the time stamp field. +Sources sending legacy non conformant 3164 like streams can be assisted by the creation of an "Almost Syslog" Parser. 
In such a parser the goal is to process the syslog header allowing other parsers +to correctly parse and handle the event. The following example is taken from a currently supported format where the source product used epoch in the timestamp field. ```c #Example event @@ -64,7 +71,7 @@ to correctly parse and handle the event. The following example is take from a cu }; ``` -## Standard Syslog using message parsing +### Standard Syslog using message parsing Syslog data conforming to RFC3164 or complying with RFC standards mentioned above can be processed with an app-parser allowing the use of the default port rather than requiring custom ports the following example take from a currently supported source uses the value of "program" to identify the source as this program value is @@ -94,7 +101,7 @@ application alcatel_switch[sc4s-syslog] { }; ``` -## Standard Syslog vendor product by source +### Standard Syslog vendor product by source In some cases standard syslog is also generic and can not be disambiguated from other sources by message content alone. When this happens and only a single source type is desired the "simple" option above is valid but requires managing a port. @@ -154,44 +161,21 @@ application cisco_ios_debug-postfilter[sc4s-postfilter] { }; ``` -## Another example to drop events based on "src" and "action" values in message -```c -#filename: /opt/sc4s/local/config/app_parsers/rewriters/app-dest-rewrite-checkpoint_drop - -block parser app-dest-rewrite-checkpoint_drop-d_fmt_hec_default() { - channel { - rewrite(r_set_dest_splunk_null_queue); - }; -}; - -application app-dest-rewrite-checkpoint_drop-d_fmt_hec_default[sc4s-lp-dest-format-d_hec_fmt] { - filter { - match('checkpoint' value('fields.sc4s_vendor') type(string)) - and match('syslog' value('fields.sc4s_product') type(string)) - - and match('Drop' value('.SDATA.sc4s@2620.action') type(string)) - and match('12.' 
value('.SDATA.sc4s@2620.src') type(string) flags(prefix) ); - - }; - parser { app-dest-rewrite-checkpoint_drop-d_fmt_hec_default(); }; -}; -``` - -## The SC4S "fallback" sourcetype +### The SC4S "fallback" sourcetype -If SC4S receives an event on port 514 which has no soup filter, that event will be given a "fallback" sourcetype. If you see events in Splunk with the fallback sourcetype, then you should figure out what source the events are from and determine why these events are not being sourcetyped correctly. The most common reason for events categorized as "fallback" is the lack of a SC4S filter for that source, and in some cases a misconfigured relay which alters the integrity of the message format. In most cases this means a new SC4S filter must be developed. In this situation you can either build a filter or file an issue with the community to request help. +If SC4S receives an event on port 514 which has no matching parser, that event will be given a "fallback" sourcetype. If you see events in Splunk with the fallback sourcetype, then you should figure out what source the events are from and determine why these events are not being sourcetyped correctly. The most common reason for events categorized as "fallback" is the lack of a SC4S parser for that source, and in some cases a misconfigured relay which alters the integrity of the message format. In most cases this means a new SC4S parser must be developed. In this situation you can either build a parser or file an issue with the community to request help. 
-The "fallback" sourcetype is formatted in JSON to allow the administrator to see the constituent syslog-ng "macros" (fields) that have been automatically parsed by the syslog-ng server An RFC3164 (legacy BSD syslog) "on the wire" raw message is usually (but unfortunately not always) comprised of the following syslog-ng macros, in this order and spacing: +The "fallback" sourcetype is formatted in JSON to allow the administrator to see the constituent syslog-ng "macros" (fields) that have been automatically parsed by the syslog-ng server. An RFC3164 (legacy BSD syslog) "on the wire" raw message is usually (but unfortunately not always) comprised of the following syslog-ng macros, in this order and spacing: ``` <$PRI> $HOST $LEGACY_MSGHDR$MESSAGE ``` -These fields can be very useful in building a new filter for that sourcetype. In addition, the indexed field `sc4s_syslog_format` is helpful in determining if the incoming message is standard RFC3164. A value of anything other than `rfc3164` or `rfc5424_strict` indicates a vendor perturbation of standard syslog, which will warrant more careful examination when building a filter. +These fields can be very useful in building a new parser for that sourcetype. In addition, the indexed field `sc4s_syslog_format` is helpful in determining if the incoming message is standard RFC3164. A value of anything other than `rfc3164` or `rfc5424_strict` indicates a vendor perturbation of standard syslog, which will warrant more careful examination when building a parser. ## Splunk Connect for Syslog and Splunk metadata -A key aspect of SC4S is to properly set Splunk metadata prior to the data arriving in Splunk (and before any TA processing takes place. The filters will apply the proper index, source, sourcetype, host, and timestamp metadata automatically by individual data source. 
Proper values for this metadata (including a recommended index) are included with all "out-of-the-box" log paths included with SC4S and are chosen to properly interface with the corresponding TA in Splunk. The administrator will need to ensure all recommended indexes be created to accept this data if the defaults are not changed.
+A key aspect of SC4S is to properly set Splunk metadata prior to the data arriving in Splunk (and before any TA processing takes place). The parsers will apply the proper index, source, sourcetype, host, and timestamp metadata automatically by individual data source. Proper values for this metadata (including a recommended index) are included with all "out-of-the-box" log paths included with SC4S and are chosen to properly interface with the corresponding TA in Splunk. The administrator will need to ensure that all recommended indexes are created to accept this data if the defaults are not changed.

It is understood that default values will need to be changed in many installations. Each source documented in this section has a table entitled "Sourcetype and Index Configuration", which highlights the default index and sourcetype for each source. See the section "SC4S metadata configuration" in the "Configuration" page for more information on how to override the default values in this table.
diff --git a/docs/sources/base/simple.md b/docs/sources/simple.md
similarity index 100%
rename from docs/sources/base/simple.md
rename to docs/sources/simple.md
diff --git a/docs/sources/vendor/Cisco/cisco_meraki.md b/docs/sources/vendor/Cisco/cisco_meraki.md
index 5968fb1924..e03eec895c 100644
--- a/docs/sources/vendor/Cisco/cisco_meraki.md
+++ b/docs/sources/vendor/Cisco/cisco_meraki.md
@@ -1,9 +1,9 @@
## Meraki (MR, MS, MX)

## Key facts
-* Cisco Meraki messages are not distinctive, which means that it's impossible to parse the sourcetype based on the log message.
-* Because of the above you should either configure known Cisco Meraki hosts in SC4S, or open unique ports for Cisco Meraki devices. -* [Splunk Add-on for Cisco Meraki 2.1.0](https://splunkbase.splunk.com/app/5580) doesn't support syslog. Use [TA-meraki](https://splunkbase.splunk.com/app/3018) instead. `TA-meraki 1.1.5` requires sourcetype `meraki`. +* Cisco Meraki messages are not distinctive, which means that it is impossible to parse the sourcetype based on the log message. +* Because of the above, you should either configure known Cisco Meraki hosts in SC4S, or open unique ports for Cisco Meraki devices. +* [Splunk Add-on for Cisco Meraki 2.1.0](https://splunkbase.splunk.com/app/5580) does not support syslog. Use [TA-meraki](https://splunkbase.splunk.com/app/3018) instead. `TA-meraki 1.1.5` requires sourcetype `meraki`. ## Links diff --git a/docs/troubleshooting/syslog_pcap_sender.md b/docs/troubleshooting/syslog_pcap_sender.md index b66f07d398..47a1e99edf 100644 --- a/docs/troubleshooting/syslog_pcap_sender.md +++ b/docs/troubleshooting/syslog_pcap_sender.md @@ -91,7 +91,7 @@ Send to Syslog Server ### Q: Why use this instead of tcpreplay? 
-**A:** Traditional packet replay doesn't work for TCP syslog because: +**A:** Traditional packet replay does not work for TCP syslog because: - TCP requires valid connection state (sequence numbers) - Replayed packets have old sequence numbers - Destination rejects packets (no matching connection) diff --git a/docs/troubleshooting/troubleshoot_SC4S_server.md b/docs/troubleshooting/troubleshoot_SC4S_server.md index 7cf367a6c2..672fc94e9c 100644 --- a/docs/troubleshooting/troubleshoot_SC4S_server.md +++ b/docs/troubleshooting/troubleshoot_SC4S_server.md @@ -88,7 +88,7 @@ This is an indication that the standard `d_hec` destination in syslog-ng, which ## Issue: Invalid SC4S listening ports -[SC4S exclusively grants a port to a device when `SC4S_LISTEN_{vendor}_{product}_{TCP/UDP/TLS}_PORT={port}`](https://splunk.github.io/splunk-connect-for-syslog/main/sources/#unique-listening-ports). +[SC4S exclusively grants a port to a device when `SC4S_LISTEN_{vendor}_{product}_{TCP/UDP/TLS}_PORT={port}`](../sources/index.md#unique-listening-ports). During startup, SC4S validates that listening ports are configured correctly, and shows any issues in container logs. diff --git a/docs/troubleshooting/troubleshoot_resources.md b/docs/troubleshooting/troubleshoot_resources.md index e9c9c5af2f..e603ca258e 100644 --- a/docs/troubleshooting/troubleshoot_resources.md +++ b/docs/troubleshooting/troubleshoot_resources.md @@ -48,35 +48,43 @@ Here are some options for obtaining raw logs for one or more sourcetypes: ``` <165>1 2007-02-15T09:17:15.719Z router1 mgd 3046 UI_DBASE_LOGOUT_EVENT [junos@2636.1.1.1.2.18 username="user"] User 'user' exiting configuration mode ``` + +* Obtain a raw log message using Wireshark. +Once you get your stream of messages, copy one of them. Note that in UDP there are not usually any message separators. +You can also read the logs using Wireshark from the .pcap file. 
From Wireshark go to Statistics > Conversations, then click on `Follow Stream`: +![ws_conversation](../resources/images/ws_conv.png) + * Edit `env_file` to set the variable `SC4S_SOURCE_STORE_RAWMSG=yes` and restart SC4S. This stores the raw message in a syslog-ng macro called `RAWMSG` and is displayed in Splunk for all `fallback` messages. -* For most other sourcetypes, the `RAWMSG` is not displayed, but can be -viewed by changing the output template to one of the JSON variants, including t_JSON_3164 or t_JSON_5424, depending on RFC message type. See -[SC4S metadata configuration](https://splunk-connect-for-syslog.readthedocs.io/en/develop/configuration/#sc4s-metadata-configuration) for -more details. -* In order to send `RAWMSG` to Splunk regardless the sourcetype you can also temporarily place the following final filter in the local parser directory: -```conf -block parser app-finalfilter-fetch-rawmsg() { - channel { - rewrite { - r_set_splunk_dest_default( - template('t_fallback_kv') - ); + + * For most other sourcetypes, the `RAWMSG` is not displayed, but can be + viewed by changing the output template to one of the JSON variants, including t_JSON_3164 or t_JSON_5424, depending on RFC message type. See + [SC4S metadata configuration](../configuration.md#sc4s-metadata-configuration) for + more details. + + * In order to send `RAWMSG` to Splunk regardless of the sourcetype you can also temporarily place the following final filter in the local parser directory: + ```conf + block parser app-finalfilter-fetch-rawmsg() { + channel { + rewrite { + r_set_splunk_dest_default( + template('t_fallback_kv') + ); + }; }; }; -}; -application app-finalfilter-fetch-rawmsg[sc4s-finalfilter] { - parser { app-finalfilter-fetch-rawmsg(); }; -}; -``` -Once you have edited `SC4S_SOURCE_STORE_RAWMSG=yes` in `/opt/sc4s/env_file` and the `finalfilter` placed in `/opt/sc4s/local/config/app_parsers`, restart the SC4S instance to add raw messages to all the messages sent to Splunk. 
+    application app-finalfilter-fetch-rawmsg[sc4s-finalfilter] {
+        parser { app-finalfilter-fetch-rawmsg(); };
+    };
+    ```
+    Once you have edited `SC4S_SOURCE_STORE_RAWMSG=yes` in `/opt/sc4s/env_file` and the `finalfilter` placed in `/opt/sc4s/local/config/app_parsers`, restart the SC4S instance to add raw messages to all the messages sent to Splunk.

-**NOTE:** Be sure to turn off the `RAWMSG` variable when you are finished, because it doubles the memory and disk requirements of SC4S. Do not
-use `RAWMSG` in production.
+    **NOTE:** Be sure to turn off the `RAWMSG` variable when you are finished, because it doubles the memory and disk requirements of SC4S. Do not
+    use `RAWMSG` in production.

-* You can enable the alternate destination `d_rawmsg` for one or more sourcetypes. This destination will write the raw messages to the
-container directory `/var/syslog-ng/archive/rawmsg/`, which is typically mapped locally to `/opt/sc4s/archive`. Within this directory, the logs are organized by host and time.
+    * You can enable the alternate destination `d_rawmsg` for one or more sourcetypes. This destination will write the raw messages to the
+    container directory `/var/syslog-ng/archive/rawmsg/`, which is typically mapped locally to `/opt/sc4s/archive`. Within this directory, the logs are organized by host and time.

## Run `exec` into the container (advanced task)

@@ -148,7 +156,7 @@ application app-dest-rewrite-device-d_fmt_hec_default[sc4s-postfilter] {
```

Note that filter match statement should be aligned to your data
-The parser accepts time zone in formats: "America/New York" or "EST5EDT", but not short in form such as "EST".
+The parser accepts time zone formats such as "America/New_York" or "EST5EDT", but not short forms such as "EST".

## Issue: CyberArk log problems
When data is received on the indexers, all events are merged together into one event.
Check the following link for CyberArk configuration information: diff --git a/docs/upgrade.md b/docs/upgrade.md index 57a3765f4e..2beb3f1bb1 100644 --- a/docs/upgrade.md +++ b/docs/upgrade.md @@ -42,7 +42,7 @@ In NetApp ONTAP, the ontap:ems sourcetype has been updated to netapp:ontap:audit * New images will no longer be published to Docker Hub. Review the current Getting Started docs and update the `sc4s.service` file accordingly. * Internal metrics will now use the multi format by default. If your system uses unsupported versions of Splunk 8.1 or earlier, see the Configuration Documentation for information on how to revert to event or single format. * Internal metrics will now use the `_metrics` index by default. Update `vendor_product` key 'sc4s_metrics' to change the index. -* `vendor_product_by_source` is deprecated for null queue or dropping events. This use will be removed in version 3. See [Filtering events from output](https://splunk.github.io/splunk-connect-for-syslog/main/sources/). +* `vendor_product_by_source` is deprecated for null queue or dropping events. This use will be removed in version 3. See [Filtering events from output](sources/index.md#filtering-events-from-output). * `SPLUNK_HEC_ALT_DESTS` is deprecated and will be ignored. * `SC4S_DEST_GLOBAL_ALTERNATES` is deprecated and will be removed in future major versions. * Corrected Vendor/Product keys. 
See the following source documentation pages and revised configuration as part of your upgrade: diff --git a/mkdocs.yml b/mkdocs.yml index 0118e6dba9..d303d84e6b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -66,13 +66,18 @@ nav: - mk8s: "gettingstarted/ansible-mk8s.md" - Cloud (Experimental): - EKS (Experimental): "gettingstarted/eks.md" - - Create a parser: "create-parser.md" - - Configuration: "configuration.md" - - Destinations: "destinations.md" - Sources: - Read First: "sources/index.md" - - Basic Onboarding: "sources/base" + - Message Formats: "sources/base" + - Simple Source: "sources/simple.md" - Known Vendors: "sources/vendor" + - Create a parser: + - Read First: "creating_parsers/index.md" + - Filter Messages: "creating_parsers/filter_message.md" + - Parse Messages: "creating_parsers/parse_message.md" + - Unit Tests: "creating_parsers/unit_tests.md" + - Configuration: "configuration.md" + - Destinations: "destinations.md" - SC4S Lite: - Intro: "lite.md" - Pluggable modules: "pluggable_modules.md"