MEHMET BALIOGLU

🏅 How to Combine Pandas Dataframe Columns Easily


In Python-based data analysis projects, it is quite common to combine dataframe columns, and it is actually very easy to do. In this blog post, I am going to show you how to combine pandas dataframe columns easily.

There is more than one way to combine multiple pandas dataframe columns. Luckily, the easiest method, the + operator, is also one of the fastest. However, there are some details you need to take into account; if you don't pay attention to them, you will get errors.

First, let’s create a dataframe:

import pandas as pd

df = pd.DataFrame({
    'Year': ['2019', '2020', '2021', '2022', '2023'],
    'Month': ['October', 'December', 'April', 'May', 'August'],
})

print(df)

   Year     Month
0  2019   October
1  2020  December
2  2021     April
3  2022       May
4  2023    August

Combine Pandas Dataframe Columns:

To combine two or more string columns, just use the + operator. It concatenates the values row by row, and you can put a separator such as '-' in between.

df['Year-Month'] = df['Year'] + '-' + df['Month']
print(df)

   Year     Month     Year-Month
0  2019   October   2019-October
1  2020  December  2020-December
2  2021     April     2021-April
3  2022       May       2022-May
4  2023    August    2023-August
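The + operator is not the only option. As a sketch, assuming the same two string columns, pandas' Series.str.cat (with its sep parameter) and a row-wise join through DataFrame.apply produce the same values. The row-wise join extends naturally to many columns, but it calls a Python function once per row, so it is typically slower than + on large dataframes.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': ['2019', '2020', '2021', '2022', '2023'],
    'Month': ['October', 'December', 'April', 'May', 'August'],
})

# 1) The + operator
with_plus = df['Year'] + '-' + df['Month']

# 2) Series.str.cat with an explicit separator
with_cat = df['Year'].str.cat(df['Month'], sep='-')

# 3) Join the values of each row; easy to extend to many columns,
#    but it runs a Python function per row, so it is typically slower
with_join = df[['Year', 'Month']].apply('-'.join, axis=1)

print(with_plus.tolist() == with_cat.tolist() == with_join.tolist())  # True
print(with_join.tolist())
```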

THINGS TO TAKE INTO ACCOUNT:

There are two things.

First thing to take into account:

When combining two or more pandas dataframe columns with +, you need to take the data types of the columns into account. Concatenation only works if all the columns hold strings (object or string dtype).

To see the data types of all columns of a dataframe, use the dtypes attribute.

print(df.dtypes)

Year     object
Month    object
dtype: object
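Instead of reading the dtypes output by eye, you can check programmatically which columns hold numbers before concatenating. This is a minimal sketch with an integer Year column; numeric_columns is a hypothetical helper name, and it relies on pandas.api.types.is_numeric_dtype, so it only flags numeric columns (datetime and other non-string types are not covered).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns(frame, columns):
    """Hypothetical helper: return the columns that hold numbers and
    would need .astype(str) before string concatenation with +."""
    return [col for col in columns if is_numeric_dtype(frame[col])]


df = pd.DataFrame({
    'Year': [2019, 2020, 2021, 2022, 2023],  # integers this time
    'Month': ['October', 'December', 'April', 'May', 'August'],
})

print(df.dtypes)
print(numeric_columns(df, ['Year', 'Month']))  # ['Year']
```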

If, for example, one column holds strings and the other holds integers, you'll get an error, because you can't add an object array (containing strings) to a numeric array.

Let’s assume that one of the columns is of int data type:

df['Year'] = df['Year'].astype(int)
# Now the Year column is of int type

df['Year-Month'] = df['Year'] + '-' + df['Month']

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')

The code above raises an error because it tries to combine a numeric array with string arrays.
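If you want to reproduce the failure without stopping your script, you can catch the exception. This is a minimal sketch: the exact error message and the concrete exception subclass differ between pandas and NumPy versions, so it only checks for TypeError.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2019, 2020, 2021, 2022, 2023],  # int column
    'Month': ['October', 'December', 'April', 'May', 'August'],
})

try:
    df['Year-Month'] = df['Year'] + '-' + df['Month']
    failed = False
except TypeError as exc:
    # The wording of the message depends on your pandas/NumPy version
    failed = True
    print(type(exc).__name__, exc)

print(failed)                      # True
print('Year-Month' in df.columns)  # False: the assignment never happened
```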

So we just convert the int column to strings with astype(str) before combining:

df['Year-Month'] = df['Year'].astype(str) + '-' + df['Month']

print(df)

   Year     Month     Year-Month
0  2019   October   2019-October
1  2020  December  2020-December
2  2021     April     2021-April
3  2022       May       2022-May
4  2023    August    2023-August

Problem solved!

Second thing to take into account:

The + operator on numeric columns performs arithmetic addition instead of concatenation. Therefore, if you want to concatenate two or more numeric columns, you need to convert all of them to strings first.
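To see the difference, here is a small sketch where the month is stored as a number (an illustrative variation on the dataframe above). Adding the integer columns directly sums them, while converting each one with astype(str) first concatenates them.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2019, 2020, 2021],
    'Month': [10, 12, 4],  # month numbers this time, not names
})

# Numeric + numeric: arithmetic addition, not concatenation
summed = df['Year'] + df['Month']

# Convert every numeric column to str first, then + concatenates
combined = df['Year'].astype(str) + '-' + df['Month'].astype(str)

print(summed.tolist())    # [2029, 2032, 2025]
print(combined.tolist())  # ['2019-10', '2020-12', '2021-4']
```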