Ideone.com

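import re

# Sample syslog line wrapping an nginx access-log entry, stored as the string form of a one-item list.
sentences = ['[\'Jan 31 19:28:14 nginx: 10.0.0.0 - - [31/Jan/2019:19:28:14 +0100] "POST /test/itf/ HTTP/x.x" 404 146 "-" "Mozilla/5.2 [en] (X11, U; OpenVAS-XX 9.2.7)"\']']

# Groups, in order: month, day, time, program, client IP, request timestamp, UTC offset,
# method, path, protocol, status, bytes sent, user agent. The referer field is matched but not captured.
rx = re.compile(r'\b(\w{3})\s+(\d{1,2})\s+(\d{1,2}:\d{1,2}:\d{2})\s+(\w+)\W+(\d{1,3}(?:\.\d{1,3}){3})(?:\s+\S+){2}\s+\[([^][\s]+)\s+([+\d]+)]\s+"([A-Z]+)\s+(\S+)\s+(\S+)"\s+(\d+)\s+(\d+)\s+\S+\s+"([^"]*)"')

words = []

for sent in sentences:
    m = rx.search(sent)
    if m:
        # Keep the captured fields as a flat list of strings.
        words.append(list(m.groups()))
    else:
        pass  # words.append(nltk.word_tokenize(sent))  # uncomment in your code

print(words)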

Success #stdin #stdout 0.02s 9664KB

stdin

Standard input is empty

stdout

[['Jan', '31', '19:28:14', 'nginx', '10.0.0.0', '31/Jan/2019:19:28:14', '+0100', 'POST', '/test/itf/', 'HTTP/x.x', '404', '146', 'Mozilla/5.2 [en] (X11, U; OpenVAS-XX 9.2.7)']]
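The row above is positional, so each value has to be identified by its index. A minimal sketch of one way to label it, assuming field names (FIELDS and label are hypothetical and not part of the original snippet) that follow the order of the capture groups in rx:

```python
# Hypothetical field names, one per capture group; the original snippet does not name them.
FIELDS = [
    "month", "day", "time", "program", "client_ip", "timestamp", "utc_offset",
    "method", "path", "protocol", "status", "bytes_sent", "user_agent",
]

# The row printed in stdout, copied verbatim.
row = ['Jan', '31', '19:28:14', 'nginx', '10.0.0.0', '31/Jan/2019:19:28:14', '+0100',
       'POST', '/test/itf/', 'HTTP/x.x', '404', '146',
       'Mozilla/5.2 [en] (X11, U; OpenVAS-XX 9.2.7)']


def label(groups):
    """Pair each captured value with its assumed field name."""
    if len(groups) != len(FIELDS):
        raise ValueError("expected %d groups, got %d" % (len(FIELDS), len(groups)))
    return dict(zip(FIELDS, groups))


record = label(row)
print(record["method"], record["path"], record["status"])
```

Named groups in the pattern itself, such as (?P<status>\d+), together with m.groupdict() would give the same mapping without a separate list of names.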

https://ideone.com/zXDlb3

language: Python 3 (python 3.12)

visibility: secret
