Ideone.com

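from nltk.tokenize import RegexpTokenizer

def regex_tokenize(text="The cost of a youth pass for Caltrain costs $4.50"):
    """Split text into tokens using a verbose regular expression."""

    # Groups are non-capturing (?:...): with capturing groups, findall-style
    # matching returns the group contents instead of the whole token.
    pattern = r'''(?x)                    # set flag to allow verbose regexps
                  (?:[A-Z]\.)+            # abbreviations, e.g. U.S.A.
                  | \$?\d+(?:\.\d+)?%?    # numbers, incl. currency and percentages
                  | \w+(?:[-']\w+)*       # words w/ optional internal hyphens/apostrophes
                  | @\w+(?:[-']\w+)*      # @-prefixed names, e.g. @user (assumed intent)
                  | [+/\-@&*]             # special characters with meanings
                '''

    # Simpler alternative pattern, kept for reference:
    # pattern = r'[+/\-@&*#](\w+)|(\w+)'

    tokenizer = RegexpTokenizer(pattern)
    token_list = tokenizer.tokenize(text)

    # print(token_list)

    return token_list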
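
Grouping matters in a pattern like the one above. To my understanding, recent NLTK versions collect tokens with `findall`-style matching, which returns the contents of capturing groups rather than the full match. The sketch below uses only the standard `re` module to show the difference on the number sub-pattern. The variable names are illustrative and not part of the original snippet.

```python
import re

text = "Caltrain costs $4.50"

# Capturing group: findall returns the group's contents, not the whole match.
capturing = re.findall(r"\$?\d+(\.\d+)?%?", text)

# Non-capturing group: findall returns the full matched token.
non_capturing = re.findall(r"\$?\d+(?:\.\d+)?%?", text)

print(capturing)      # ['.50']
print(non_capturing)  # ['$4.50']
```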

Runtime error: 0.01s 7732KB

stdin

Standard input is empty

stdout

Standard output is empty

stderr

Traceback (most recent call last):
  File "prog.py", line 1, in <module>
ImportError: No module named nltk.tokenize
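
The traceback indicates that the `nltk` package is not installed in this sandbox. Where NLTK is unavailable, a rough stand-in can be sketched with the standard `re` module alone. The function name `regex_tokenize_stdlib` and the constant `TOKEN_PATTERN` are hypothetical, the `@` alternative is left out for brevity, and this is not a drop-in replacement for `RegexpTokenizer`.

```python
import re

# Same main alternatives as the NLTK pattern, written with non-capturing
# groups so that findall returns whole tokens.
TOKEN_PATTERN = re.compile(r"""(?x)
      (?:[A-Z]\.)+            # abbreviations, e.g. U.S.A.
    | \$?\d+(?:\.\d+)?%?      # numbers, incl. currency and percentages
    | \w+(?:[-']\w+)*         # words with optional internal hyphens/apostrophes
    | [+/\-@&*]               # special characters with meanings
""")

def regex_tokenize_stdlib(text):
    """Hypothetical stand-in for regex_tokenize that needs no NLTK."""
    return TOKEN_PATTERN.findall(text)

tokens = regex_tokenize_stdlib("The cost of a youth pass for Caltrain costs $4.50")
print(tokens)
# ['The', 'cost', 'of', 'a', 'youth', 'pass', 'for', 'Caltrain', 'costs', '$4.50']
```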

https://ideone.com/wJpAtg

language: Python (cpython 2.7.16)

created:

visibility: public
