MEHMET BALIOGLU

Python How To Remove Everything Except Letters & Numbers

Letters and numbers are technically called alphanumeric characters. In this article, I am going to show you how to remove everything except letters and numbers from a string in Python. So you will keep only alphanumeric characters (letters and numbers) and delete everything else.

For this, we will use Python's regular expression module, re. It is part of the standard library, so there is no need to install anything with pip. We just import it in our code.

Regular expressions are powerful tools for searching and manipulating strings by pattern rather than by exact text. They are available in almost all major programming languages, either built in or through a library, including Python, Java, C, JavaScript, PHP and SQL.

import re

mystring = "2005-05/AA228. Supreme Court’s Advocate General Opinion"

mystring = re.sub(r'[^a-zA-Z0-9]', '', mystring)
print(mystring)

# Output: 200505AA228SupremeCourtsAdvocateGeneralOpinion

The code above removed everything and left only letters and numbers.

re.sub() replaces every occurrence of a pattern with a string or the result of a function. 

re.sub(r'[^a-zA-Z0-9]', '', mystring)

This is where the magic happens.

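[^a-zA-Z0-9] means any character except letters (lowercase and uppercase) and digits. The ^ at the start of the brackets is a negation operator. It means "not one of the following". Note that a-z and A-Z cover only the unaccented ASCII letters, so characters such as é or the curly apostrophe ’ are removed as well.

'' (the empty string) means do not replace the removed characters with anything. Just remove them.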
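
To see exactly what the negated character class matches, here is a minimal sketch that uses re.findall() to list the characters re.sub() would delete. The variable names removed and cleaned are only illustrative, and the "Café 24" string is a made-up sample used to show the ASCII-only behaviour.

```python
import re

mystring = "2005-05/AA228. Supreme Court’s Advocate General Opinion"

# Every character matched by the negated class is one that re.sub() deletes.
removed = re.findall(r'[^a-zA-Z0-9]', mystring)
print(removed)
# ['-', '/', '.', ' ', ' ', '’', ' ', ' ', ' ']

cleaned = re.sub(r'[^a-zA-Z0-9]', '', mystring)
print(cleaned)
# 200505AA228SupremeCourtsAdvocateGeneralOpinion

# a-z and A-Z are ASCII ranges, so an accented letter is removed too.
print(re.sub(r'[^a-zA-Z0-9]', '', "Café 24"))
# Caf24
```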

Remove everything except letters, numbers and a few exceptions

Delete everything except letters, numbers and spaces

We may want to remove everything except letters and numbers, but if we remove the spaces we lose the sentence structure. So we may want to keep spaces:

import re

mystring = "2005-05/AA228. Supreme Court’s Advocate General Opinion"

# This keeps letters, numbers and spaces:
mystring = re.sub(r'[^a-zA-Z0-9 ]', '', mystring)
print(mystring)

# Output: 200505AA228 Supreme Courts Advocate General Opinion

So, we add a space in the code:

# Note the space after "0-9", inside the brackets
re.sub(r'[^a-zA-Z0-9 ]', '', mystring)
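
One thing to watch for when keeping spaces: if a removed character had a space on each side, two spaces are left behind. A possible follow-up step, which is an addition to the recipe above and not part of it, is to collapse runs of spaces afterwards. The sample string and the names text, kept and tidy are hypothetical.

```python
import re

text = "Court - Advocate General / Opinion"  # hypothetical sample string

# Keep letters, digits and spaces; the "-" and "/" disappear.
kept = re.sub(r'[^a-zA-Z0-9 ]', '', text)
print(repr(kept))
# 'Court  Advocate General  Opinion'

# Collapse each run of spaces into one and trim the ends.
tidy = re.sub(r' +', ' ', kept).strip()
print(tidy)
# Court Advocate General Opinion
```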

Delete everything except letters, numbers, spaces and some other characters

We removed all non-alphanumeric characters except spaces. But what if we want to keep some other characters, such as the hyphen (-)?

import re

mystring = "2005-05/AA228. Supreme Court’s Advocate General Opinion"

# This keeps letters, numbers, hyphens and spaces:
mystring = re.sub(r'[^a-zA-Z0-9- ]', '', mystring)
print(mystring)

# Output: 2005-05AA228 Supreme Courts Advocate General Opinion
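
The three variations above differ only in which extra characters sit inside the brackets, so they can be folded into one helper. This is a sketch and not part of the original recipe: the function name keep_only and its extra parameter are hypothetical. re.escape() is used so that characters with a special meaning inside brackets, such as - or ], are treated literally wherever they appear in extra.

```python
import re

def keep_only(text, extra=""):
    """Remove every character that is not an ASCII letter, a digit,
    or one of the characters listed in `extra` (hypothetical helper)."""
    # re.escape() makes characters like "-" or "]" safe inside the brackets.
    pattern = "[^a-zA-Z0-9" + re.escape(extra) + "]"
    return re.sub(pattern, "", text)

mystring = "2005-05/AA228. Supreme Court’s Advocate General Opinion"

print(keep_only(mystring))
# 200505AA228SupremeCourtsAdvocateGeneralOpinion

print(keep_only(mystring, extra=" "))
# 200505AA228 Supreme Courts Advocate General Opinion

print(keep_only(mystring, extra="- "))
# 2005-05AA228 Supreme Courts Advocate General Opinion
```

When writing the pattern by hand instead, placing the hyphen last inside the brackets, as in [^a-zA-Z0-9 -], or escaping it as \- makes it clear that it is a literal character and not part of a range.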

That’s all. In this post, we first showed how to remove everything except letters and numbers from a string in Python. Second, we showed how to remove everything except letters, numbers and spaces. And finally, we saw how to remove everything except letters, numbers, spaces and certain other characters that we want to keep.