In Python-based data analysis projects, it is quite common to combine dataframe columns, and it is easy to do. In this blog post, I am going to show you how to combine pandas dataframe columns.
There is more than one way to combine multiple pandas dataframe columns. Luckily, the easiest method is also the fastest. However, there are some details you need to take into account; if you don’t pay attention to them, you will get errors.
First, let’s create a dataframe:
import pandas as pd
df = pd.DataFrame({
    'Year': ['2019', '2020', '2021', '2022', '2023'],
    'Month': ['October', 'December', 'April', 'May', 'August']
})
print(df)
   Year     Month
0  2019   October
1  2020  December
2  2021     April
3  2022       May
4  2023    August
Combine Pandas Dataframe Columns:
In order to combine two or more pandas dataframe columns, just use the + operator:
df['Year-Month'] = df['Year']+'-'+df['Month']
print(df)
   Year     Month     Year-Month
0  2019   October   2019-October
1  2020  December  2020-December
2  2021     April     2021-April
3  2022       May       2022-May
4  2023    August    2023-August
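Since there is more than one way to combine columns, here is a minimal sketch of one alternative, Series.str.cat. It is not the method used in the rest of this post, and the three-row dataframe is just a shortened copy of the example for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': ['2019', '2020', '2021'],
    'Month': ['October', 'December', 'April']
})

# Series.str.cat joins two string columns element-wise, placing sep between them
df['Year-Month'] = df['Year'].str.cat(df['Month'], sep='-')
print(df['Year-Month'].tolist())
```

Like the + operator, str.cat expects both columns to hold strings.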
THINGS TO TAKE INTO ACCOUNT:
There are two things.
First thing to take into account:
When combining two or more pandas dataframe columns, you need to take the data types of the columns into account. Combining with + only works if all of the columns involved hold strings (string and/or object data type).
In order to learn the types of all columns of a dataframe, use the dtypes attribute:
print(df.dtypes)
Year     object
Month    object
dtype: object
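If you only want to check a single column rather than print every dtype, the predicates in pandas.api.types can help. The following is a minimal sketch and not part of the original example; the small dataframe and the variable names are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

df = pd.DataFrame({
    'Year': ['2019', '2020', '2021'],
    'Month': ['October', 'December', 'April']
})

# Check whether the column holds strings and can be joined with + directly
year_is_string = is_string_dtype(df['Year'])

# After converting to int, the same data is no longer a string column
year_as_int = df['Year'].astype(int)
int_is_string = is_string_dtype(year_as_int)
int_is_integer = is_integer_dtype(year_as_int)

print(year_is_string, int_is_string, int_is_integer)
```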
If, for example, one of the columns is string and the other one is integer, then you’ll get an error, because you can’t add an object array (containing strings) to a number array.
Let’s assume that one of the columns is of int data type:
df['Year'] = df['Year'].astype(int)
# Now the Year column is of int type
df['Year-Month'] = df['Year']+'-'+df['Month']
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')
The code above raises an error because we tried to combine a numeric column with string values.
So, we just convert the numeric column to string with astype(str):
df['Year-Month'] = df['Year'].astype(str)+'-'+df['Month']
print(df)
   Year     Month     Year-Month
0  2019   October   2019-October
1  2020  December  2020-December
2  2021     April     2021-April
3  2022       May       2022-May
4  2023    August    2023-August
Problem solved!
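If you combine columns often, the conversion can be wrapped in a small function. The sketch below is only an illustration: combine_columns is a hypothetical helper name, not a pandas function, and the Day column is invented for the example.

```python
import pandas as pd

def combine_columns(df, columns, sep='-'):
    """Hypothetical helper: join the given columns row by row as strings."""
    # Convert every column to str first so that int columns do not raise a TypeError
    result = df[columns[0]].astype(str)
    for col in columns[1:]:
        result = result + sep + df[col].astype(str)
    return result

df = pd.DataFrame({
    'Year': [2019, 2020, 2021],            # int column
    'Month': ['October', 'December', 'April'],
    'Day': [1, 15, 30]                      # made-up int column for illustration
})

df['Date'] = combine_columns(df, ['Year', 'Month', 'Day'])
print(df['Date'].tolist())
```

Calling astype(str) on a column that already holds strings leaves its values unchanged, so the helper can be applied without checking the data types first.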
Second thing to take into account:
The