import re

# Alternative 1: a parenthesised page reference, e.g. "(pp. 81)" or "(pp. 81-95)".
# Alternative 2: a bare page number or range, e.g. "40" or "483-497", that is
# followed by zero or more ", number" / ", range" items and then a period.
pattern = r"\(pp\.\s+\d+(?:-\d+)?\)|\b\d+(?:-\d+)?(?=(?:\s*,\s*\d+(?:-\d+)?)*\.)"

s = ("- Mitchell, J.A. (2017). Citation: Why is it so important. Mendeley Journal, 67(2), (pp. 81-95). \n\n"
     "- Denhart, H. (2008). Deconstructing barriers: Perceptions of students labeled with learning disabilities in higher education. Journal of Learning Disabilities, 40,41, 483-497.\n\n"
     "(pp. 81). \n"
     "12-12\n"
     "http://t...content-available-to-author-only...t.com/12-23\n\n"
     "Usually the page numbers follow a comma and then there is a dot (like this: , 1-2. ) How can I change the code according to this? Same goes for when there is only one page listed , number. and the ` (pp. 12)` format.")

# The pattern has no capturing groups, so findall returns the whole text of each match.
print(re.findall(pattern, s))
stdout
['(pp. 81-95)', '40', '41', '483-497', '(pp. 81)', '1-2', '(pp. 12)']
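The one-line pattern packs two alternatives and a lookahead together, which makes it hard to adjust. Below is a minimal sketch of the same regular expression written with `re.VERBOSE`, so each piece can carry its own comment. The names `PAGE_PATTERN`, `COMPACT` and `sample` are illustrative only, and `sample` is a shortened excerpt of the input above, not the full string.

```python
import re

# The compact pattern from above, kept for comparison.
COMPACT = r"\(pp\.\s+\d+(?:-\d+)?\)|\b\d+(?:-\d+)?(?=(?:\s*,\s*\d+(?:-\d+)?)*\.)"

# Same pattern in verbose mode: unescaped whitespace and "#" comments are ignored,
# so any literal whitespace has to be written as \s (the pattern already does this).
PAGE_PATTERN = re.compile(
    r"""
    \(pp\.\s+\d+(?:-\d+)?\)        # "(pp. 81)" or "(pp. 81-95)"
    |                              # or
    \b\d+(?:-\d+)?                 # a bare number or range: "40", "483-497"
    (?=                            # only if it is followed by...
        (?:\s*,\s*\d+(?:-\d+)?)*   #   zero or more ", 41" / ", 483-497" items
        \.                         #   and then a period
    )
    """,
    re.VERBOSE,
)

# Shortened excerpt of the test input above (hypothetical sample).
sample = ("Journal of Learning Disabilities, 40,41, 483-497. "
          "Mendeley Journal, 67(2), (pp. 81-95).")

# The verbose form should match exactly what the compact form matches.
assert PAGE_PATTERN.findall(sample) == re.findall(COMPACT, sample)

print(PAGE_PATTERN.findall(sample))  # ['40', '41', '483-497', '(pp. 81-95)']
```

Note that "40" and "41" are returned as well as "483-497": they also sit in a comma-separated list of numbers that ends in a period, so the lookahead accepts them. "67" and "2" are skipped because they are followed by "(" and ")" rather than by such a list and a period.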