Commit 7d26272
committed
feat: skill parser-creator
1 parent c8a27d4 commit 7d26272

12 files changed, +490 -0 lines changed

Lines changed: 223 additions & 0 deletions
---
name: parser-creator
description: Creates SC4S syslog-ng parsers. Use when the user wants to create a new parser, add support for a new log source or vendor, or says "create parser", "add parser", "new log source", or "new vendor support".
---

# Parser Creator

Use only this command to run the unit tests: `poetry run pytest test-name -v -s --tb=short -n=0`

## Goal

Create a new SC4S parser and test coverage for a vendor/product pair in both:

- main package (`package/etc/conf.d/conflib`)
- lite package (`package/lite/etc/addons`)

## Prerequisites

Collect this information from the user before you start:

1. `vendor`: vendor name (lowercase, for example `acme`)
2. `product`: product name (lowercase, for example `firewall`)
3. `sourcetype`: target Splunk sourcetype using `vendor:product` (for example `acme:firewall`)
4. `index`: target Splunk index (for example `netfw`)
5. `sample logs`: one or more raw syslog messages

If any item is missing, ask for it before proceeding.

## Workflow

### Step 1 - Identify message format

Examine the sample logs and identify the syslog format:

1. RFC3164: `<PRI>TIMESTAMP HOSTNAME PROGRAM: MESSAGE`
2. RFC5424: `<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SDATA MESSAGE`
3. CEF: `<PRI>TIMESTAMP HOSTNAME CEF:0|<Device Vendor>|<Device Product>|<Device Version>|<Signature ID>|<Name>|<Severity>|<Extension fields>`

If the logs do not match one of these formats, tell the user the format is currently unsupported and stop.
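The three formats can usually be told apart mechanically. As an illustrative sketch only (this is not part of SC4S, and the regexes are simplified assumptions that real-world headers may violate):

```python
import re

# Simplified patterns for illustration only; production detection is stricter.
RFC5424 = re.compile(r"^<\d{1,3}>\d ")  # <PRI> followed by a VERSION digit
CEF = re.compile(r"CEF:\d+\|")          # CEF:0|vendor|product|...
RFC3164 = re.compile(r"^<\d{1,3}>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} ")

def classify(line: str) -> str:
    """Best-effort guess at the syslog flavor of a raw message."""
    if CEF.search(line):
        return "cef"
    if RFC5424.match(line):
        return "rfc5424"
    if RFC3164.match(line):
        return "rfc3164"
    return "unsupported"

print(classify("<134>Mar  9 15:04:05 host prog: hello"))            # rfc3164
print(classify("<34>1 2026-03-09T15:04:05Z host app 1 ID47 - hi"))  # rfc5424
print(classify("<6>Mar 9 host CEF:0|Acme|FW|1.0|100|name|5|src=1.1.1.1"))  # cef
```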
### Step 2 - Create a parser

Start by creating a filter for the log message. A filter has the following structure:

```
application <filter-name>[<topic>] {
    filter {
        <filter_block>
    };
    parser { <parser-name>(); };
};
```

Filters are grouped into topics. Use one of the following topics:

1. `cef` - for CEF-formatted messages.
2. `sc4s-syslog-pgm` - matches by program value (`PROGRAM` in RFC3164, `APP-NAME` in RFC5424).
3. `sc4s-syslog-sdata` - matches by structured data (often a Private Enterprise Number, PEN). If a PEN is present, prefer this topic.
4. `sc4s-syslog` - general filter for RFC3164/RFC5424, usually based on message content.
5. `sc4s-network-source` - matches by destination port. Use this only when other topics are not viable. Because this requires sending logs to a new port, ask the user for permission first. If the user refuses, stop and explain why parser creation cannot continue.

Next, create a `block parser`:

```
block parser <parser-name>() {
    <parser and filter blocks>
    <rewrite block>
};
```

If structured data or repeated key/value data exists, include a parser stage (`kv-parser`, `csv-parser`, or `regexp-parser`) before the rewrite. Only skip parsing when the message is truly unstructured; if so, explicitly state this in the final response.

There are two rewrite functions. Choose the correct one:

1. `r_set_splunk_dest_default`: sets **all** base Splunk metadata. Every parser MUST call this exactly once as its first rewrite. Always include `index`, `sourcetype`, `vendor`, and `product`. Optionally include `source` and `template`.
2. `r_set_splunk_dest_update_v2`: **conditionally overrides** specific fields that were already set by `r_set_splunk_dest_default`. Use this ONLY in `if/elif` branches to change a subset of fields (for example `sourcetype` or `index`) based on message content. Never use it as the first or only rewrite.

`r_set_splunk_dest_default` example (required in every parser):

```
rewrite {
    r_set_splunk_dest_default(
        index('netops')
        sourcetype('alcatel:switch')
        vendor("alcatel")
        product("switch")
        template('t_hdr_msg')
    );
};
```

`r_set_splunk_dest_update_v2` example (optional, only after the default is set):

```
rewrite {
    r_set_splunk_dest_update_v2(
        sourcetype('citrix:netscaler:appfw') condition(message(':(\s+\S+)?\s+APPFW(\s+\S+){3}\s+:'))
    );
};
```

To choose the correct template, refer to the definitions in `t_templates.conf`.

Parser method selection:

Use `kv-parser` when logs contain key/value pairs (`key=value`, quoted values, RFC5424 SDATA blocks).

- For RFC5424 SDATA, prefer `template("${SDATA}")`.
- Use a scoped prefix like `.values.sdata.`.

Example:

```
block parser app-syslog-vendor_product() {
    channel {
        parser {
            kv-parser(prefix(".values.") template("$(template t_hdr_msg)"));
        };
        # Optional: validate parsing succeeded
        filter {
            "${.values.some_required_field}" ne ""
        };
        rewrite {
            r_set_splunk_dest_default(
                index('netfw')
                sourcetype('vendor:product')
                vendor("vendor")
                product("product")
                template('t_kv_values')
            );
        };
    };
};
```
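As a rough mental model only (plain Python, not syslog-ng), `kv-parser(prefix(".values."))` turns `key=value` tokens into prefixed name/value pairs along these lines. This toy version ignores quoted values and SDATA, which the real parser handles:

```python
def kv_parse(message: str, prefix: str = ".values.") -> dict:
    """Toy model of kv-parser: split key=value tokens, apply a name prefix."""
    values = {}
    for token in message.split():
        if "=" in token:
            key, _, value = token.partition("=")
            values[prefix + key] = value
    return values

print(kv_parse("act=allow src=10.0.0.1 dpt=443"))
# {'.values.act': 'allow', '.values.src': '10.0.0.1', '.values.dpt': '443'}
```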
Use `csv-parser` when logs are consistently delimited and have a stable column order.

Example:

```
parser {
    csv-parser(
        columns("col1","col2","col3","col4")
        prefix(".values.")
        delimiters(',')
        quote-pairs('""')
        flags(escape-double-char)
    );
};
```

Use `regexp-parser` when logs are structured but not key/value or delimited. Combine methods when logs have multiple variants.

Example:

```
parser {
    regexp-parser(
        template("${MESSAGE}")
        patterns("^(?<field1>\\d+) (?<field2>[^ ]+) (?<field3>.*)")
        prefix(".parsed.")
    );
};
```
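The same pattern can be checked against a sample log in Python before committing to the parser. Note the syntax difference: syslog-ng uses PCRE named groups `(?<name>...)`, while Python spells them `(?P<name>...)`. The sample message below is invented for illustration:

```python
import re

# Python equivalent of the regexp-parser pattern above
# (syslog-ng's PCRE `(?<name>...)` becomes `(?P<name>...)` in Python).
pattern = re.compile(r"^(?P<field1>\d+) (?P<field2>[^ ]+) (?P<field3>.*)")

m = pattern.match("42 login-ok user=alice from 10.0.0.5")
parsed = {".parsed." + k: v for k, v in m.groupdict().items()}
print(parsed)
# {'.parsed.field1': '42', '.parsed.field2': 'login-ok', '.parsed.field3': 'user=alice from 10.0.0.5'}
```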
You can combine all methods and use conditional branches to parse different message variants:

```
block parser app-syslog-vendor_product() {
    channel {
        rewrite {
            r_set_splunk_dest_default(
                index("netops")
                sourcetype('vendor:log')
                vendor("vendor")
                product('product')
                template('t_msg_only')
            );
        };

        if (message(',TRAFFIC,' type(string) flags(substring))) {
            parser { csv-parser(columns(...) prefix(".values.") delimiters(',')); };
            rewrite {
                r_set_splunk_dest_update_v2(
                    index('netfw')
                    class('traffic')
                    sourcetype('vendor:traffic')
                );
            };
        } elif (message(',SYSTEM,' type(string) flags(substring))) {
            parser { csv-parser(columns(...) prefix(".values.") delimiters(',')); };
            rewrite {
                r_set_splunk_dest_update_v2(
                    index('netops')
                    class('system')
                    sourcetype('vendor:system')
                );
            };
        } else { };
    };
};
```

### Step 3 - Create unit test

Create a unit test for the new parser. Testing instructions: [testing-parsers](./references/testing-parsers.md).

### Step 4 - Run parser tests

Run the new test with the `poetry run pytest test-name -v -s --tb=short -n=0` command and verify that the parser works correctly.

## Completion Checklist

Before finishing, confirm all items:

- Parser/filter created for the main package.
- Parser/filter created for the lite package.
- Parser includes field extraction (`kv-parser`, `csv-parser`, and/or `regexp-parser`) when sample logs are parseable.
- Lite vendor metadata exists (`addon_metadata.yaml`) when required.
- `package/lite/etc/config.yaml` updated for the new lite vendor addon.
- Unit tests created and passing for the new parser.
- User informed about any constraints (for example, unsupported format or required network-source port changes).
- The parser files have only one `block parser` definition and only one `application` definition.
Lines changed: 28 additions & 0 deletions
Here is an example of a CEF filter (replace values inside `<>` with your own):

```
application app-cef-<device_vendor>_<device_product>[cef] {
    filter {
        match("<device_vendor>" value(".metadata.cef.device_vendor"))
        and match("<device_product>" value(".metadata.cef.device_product"));
    };
    parser { app-cef-<device_vendor>_<device_product>(); };
};
```

<!-- DEPRECATED:

block parser app-cef-<device_vendor>_<device_product>() {
    channel {
        rewrite {
            r_set_splunk_dest_default(
                index('<index>'),
                source('<device_vendor>:<device_product>'),
                sourcetype('<device_vendor>:<device_product>:cef')
                vendor('<device_vendor>')
                product('<device_product>')
            );
        };
    };
}; -->
Lines changed: 19 additions & 0 deletions
Here is an example of a `netsource` filter:

```
application app-netsource-barracuda_syslog[sc4s-network-source] {
    filter {
        not filter(f_is_source_identified)
        and (
            (
                match("barracuda", value('.netsource.sc4s_vendor'), type(string))
                and match("syslog", value('.netsource.sc4s_product'), type(string))
            )
            or (tags("ns_vendor:barracuda") and tags("ns_product:syslog"))
            or tags(".source.s_BARRACUDA_SYSLOG")
        );
    };
    parser { app-netsource-barracuda_syslog(); };
};
```
Lines changed: 11 additions & 0 deletions
Here is an example of a `pgm` filter (replace values inside `<>` with your own):

```
application app-syslog-<vendor_name>_<product_name>[sc4s-syslog-pgm] {
    filter {
        program('<program>' type(string) flags(prefix));
    };
    parser { <parser-name>(); };
};
```
Lines changed: 10 additions & 0 deletions
Example of an `sdata` filter:

```
application app-syslog-<vendor_name>_<product_name>[sc4s-syslog-sdata] {
    filter {
        match('@<PEN>' value("SDATA"));
    };
    parser { <parser-name>(); };
};
```
Lines changed: 10 additions & 0 deletions
Example of an `sc4s-syslog` topic filter:

```
application app-syslog-<vendor_name>_<product_name>[sc4s-syslog] {
    filter {
        message('Carbon Black App Control event: ' type(string) flags(prefix));
    };
    parser { <parser-name>(); };
};
```
Lines changed: 73 additions & 0 deletions
Below is a template for the unit test file:

```
# Copyright <current-year> Splunk, Inc.
#
# Use of this source code is governed by a BSD-2-clause-style
# license that can be found in the LICENSE-BSD2 file or at
# https://opensource.org/licenses/BSD-2-Clause
import datetime
import pytest

from jinja2 import Environment, select_autoescape

from .sendmessage import sendsingle
from .splunkutils import splunk_single
from .timeutils import time_operations

env = Environment(autoescape=select_autoescape(default_for_string=False))


@pytest.mark.addons("<addon-name>")
def test_palo_alto_test_os_cef(
    record_property, setup_splunk, setup_sc4s, get_host_key
):
    host = get_host_key
    mt = env.from_string(
        "{{ mark }}{{ bsd }} {{ host }} <test-message>"
    )

    dt = datetime.datetime.now(datetime.timezone.utc)
    _, bsd, _, _, _, _, epoch = time_operations(dt)
    message = mt.render(mark="<134>", bsd=bsd, host=host)

    # Tune time functions
    epoch = epoch[:-7]
    sendsingle(message, setup_sc4s[0], setup_sc4s[1][514])
    st = env.from_string(
        f'search _time={epoch} index=netfw host="{host}" sourcetype="<sourcetype>"'
    )
    search = st.render(epoch=epoch)

    result_count, _ = splunk_single(setup_splunk, search)

    record_property("resultCount", result_count)
    record_property("message", message)

    assert result_count == 1
```

When creating a unit test, pay close attention to time handling. You can use the `.timeutils` module to generate timestamps. The timestamp format you generate should match the original event format. In most cases, start by getting the current UTC time:

`dt = datetime.datetime.now(datetime.timezone.utc)`

Then use `time_operations`, which returns:

- iso - ISO 8601 / RFC5424-style timestamp, e.g. 2026-03-09T15:04:05.123456+01:00
- bsd - BSD syslog / RFC3164-style timestamp ("%b %d %H:%M:%S"), e.g. Mar 09 15:04:05
- time - time of day with microseconds only ("%H:%M:%S.%f"), e.g. 15:04:05.123456
- date - calendar date ("%Y-%m-%d"), e.g. 2026-03-09
- tzoffset - timezone offset from the local tz, e.g. +0100
- tzname - timezone name, e.g. UTC, CET, PDT
- epoch - epoch seconds plus microseconds as a string (`%s.%f`), for example `1741532645.123456`. It is usually trimmed with `[:-7]` for seconds only and `[:-3]` for milliseconds.
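The `[:-7]` and `[:-3]` trimming can be shown on a fixed epoch string (the value itself is made up for the example):

```python
epoch = "1741532645.123456"  # shape of the `epoch` value from time_operations (%s.%f)

print(epoch[:-7])  # '1741532645'     - drops '.123456', leaving whole seconds
print(epoch[:-3])  # '1741532645.123' - drops '456', leaving milliseconds
```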
When creating a message template, make sure the format matches the original message itself. In some cases, the timestamp is not part of the header. For example, in this CEF message:

```
mt = env.from_string(
    "{{ mark }} CEF:0|A10|vThunder|4.1.4-GR1-P12|WAF|session-id|2|rt={{ bsd }} src=1.1.1.1 spt=34860 dst=1.1.1.1 dpt=80 dhost=test.host.local cs1=uiext_sec_waf cs2=1 act=learn cs3=learn app=HTTP requestMethod=GET cn1=0 request=/sales/ msg=New session created: Id\=1\n"
)
```

the timestamp is part of the `rt` field.

Always use the full event in the test; do not truncate it. If the user provides multiple events (fewer than 10), use all of them in the tests (parameterize the test).

.gitignore

Lines changed: 4 additions & 0 deletions

.addon/
Pipfile
event.txt

# vscode

.vscode/launch.json