Python Implementations of Encoding Methods

Below are the six encoding methods I covered earlier, implemented in both pandas and scikit-learn, with sample output for each.

Encoding Method | pandas Implementation | scikit-learn Implementation
Label Encoding | pd.factorize() | LabelEncoder()
Ordinal Encoding | map() with custom order | OrdinalEncoder()
One-Hot Encoding | pd.get_dummies() | OneHotEncoder()
Target Encoding | groupby() + map() | TargetEncoder() (scikit-learn 1.3+), or TargetEncoder from category_encoders
Frequency Encoding | value_counts() + map() | Custom implementation
Binary Encoding | Custom with cat.codes and apply() | BinaryEncoder() from category_encoders

Let’s quickly demonstrate each.

Label Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# Label Encoding using pandas (factorize)
data['LabelEncoded'] = pd.factorize(data['Category'])[0]
print(data)

# Label Encoding using scikit-learn
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
data['LabelEncoded'] = label_encoder.fit_transform(data['Category'])
print(data)

Output:

  Category  LabelEncoded
0        A             0
1        B             1
2        A             0
3        C             2
4        B             1
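The two versions agree here only because the categories first appear in alphabetical order. pd.factorize() numbers categories by order of first appearance, while LabelEncoder sorts them. A minimal sketch of the difference, using a hypothetical sample (the variable names are illustrative) and the sort=True option of pd.factorize() to reproduce the sorted numbering:

```python
import pandas as pd

# Hypothetical sample in which the categories do not first appear in sorted order
values = pd.Series(['B', 'A', 'C', 'A'])

# Default: codes follow order of first appearance (B=0, A=1, C=2)
codes_appearance, uniques_appearance = pd.factorize(values)

# sort=True: codes follow sorted order (A=0, B=1, C=2)
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)

print(codes_appearance.tolist())  # [0, 1, 2, 1]
print(codes_sorted.tolist())      # [1, 0, 2, 0]

# The returned uniques map codes back to the original labels
decoded = uniques_sorted.take(codes_sorted)
print(list(decoded))              # ['B', 'A', 'C', 'A']
```

If the integer codes need to match between the two libraries, sorting is the safer default.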

Ordinal Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['low', 'medium', 'high', 'medium', 'low']})

# Ordinal Encoding using pandas
order = ['low', 'medium', 'high']
data['OrdinalEncoded'] = data['Category'].map({value: idx for idx, value in enumerate(order)})
print(data)

# Ordinal Encoding using scikit-learn
from sklearn.preprocessing import OrdinalEncoder

ordinal_encoder = OrdinalEncoder(categories=[['low', 'medium', 'high']])
# fit_transform returns a 2D float array; flatten it to a single column
data['OrdinalEncoded'] = ordinal_encoder.fit_transform(data[['Category']]).ravel()
print(data)

Output (scikit-learn version; the pandas version yields the integers 0, 1, 2 instead of floats):

  Category  OrdinalEncoded
0      low             0.0
1   medium             1.0
2     high             2.0
3   medium             1.0
4      low             0.0
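Another pandas route, shown here as a sketch rather than as a replacement for map(), is an ordered Categorical. The codes follow the position of each value in the declared order, and the ordering also enables comparisons. The sample values and variable names below are illustrative assumptions:

```python
import pandas as pd

order = ['low', 'medium', 'high']
values = pd.Series(['low', 'high', 'medium', 'low'])

# An ordered Categorical assigns codes by position in `order`
categorical = pd.Categorical(values, categories=order, ordered=True)
codes = categorical.codes.tolist()
print(codes)  # [0, 2, 1, 0]

# Because the categories are ordered, comparisons against a category work
at_least_medium = (categorical >= 'medium').tolist()
print(at_least_medium)  # [False, True, True, False]
```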

One-Hot Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# One-Hot Encoding using pandas
one_hot = pd.get_dummies(data['Category'], prefix='Category')
print(one_hot)

# One-Hot Encoding using scikit-learn
from sklearn.preprocessing import OneHotEncoder

# sparse_output replaced the older `sparse` argument in scikit-learn 1.2
encoder = OneHotEncoder(sparse_output=False)
one_hot_sklearn = encoder.fit_transform(data[['Category']])
print(pd.DataFrame(one_hot_sklearn, columns=encoder.get_feature_names_out(['Category'])))

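Output (scikit-learn version; pd.get_dummies() produces the same columns as booleans or integers, depending on the pandas version):

   Category_A  Category_B  Category_C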
0         1.0         0.0         0.0
1         0.0         1.0         0.0
2         1.0         0.0         0.0
3         0.0         0.0         1.0
4         0.0         1.0         0.0
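One practical difference: pd.get_dummies() only creates columns for the categories present in the data it is given, so new data can end up with different columns than the training data. A minimal sketch of aligning the two with reindex(), assuming a hypothetical new DataFrame that lacks 'C' and contains an unseen 'D':

```python
import pandas as pd

train = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})
# Hypothetical new data: 'C' is absent and 'D' was never seen in training
new = pd.DataFrame({'Category': ['B', 'D', 'A']})

train_dummies = pd.get_dummies(train['Category'], prefix='Category', dtype=float)
new_dummies = pd.get_dummies(new['Category'], prefix='Category', dtype=float)

# Keep exactly the training columns: missing ones are added as 0, unseen ones dropped
new_aligned = new_dummies.reindex(columns=train_dummies.columns, fill_value=0.0)
print(new_aligned)
```

The row for 'D' comes out as all zeros, which is similar in spirit to what OneHotEncoder(handle_unknown='ignore') does for unknown categories.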

Target Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B'], 'Target': [1, 0, 1, 0, 1]})

# Target Encoding using pandas
target_means = data.groupby('Category')['Target'].mean()
data['TargetEncoded'] = data['Category'].map(target_means)
print(data)

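Note: scikit-learn has included a TargetEncoder in sklearn.preprocessing since version 1.3. It applies smoothing and cross-fitting, so its values will not match the plain per-category means shown here. On older versions, implement target encoding manually (as above) or use the category_encoders library.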

Output:

  Category  Target  TargetEncoded
0        A       1            1.0
1        B       0            0.5
2        A       1            1.0
3        C       0            0.0
4        B       1            0.5
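Because the means are learned from the target, they are normally computed on training data and then applied to new data. A minimal sketch under that assumption, with a hypothetical new DataFrame and a fallback to the global training mean for categories that were never seen (the fallback is one common choice, not the only one):

```python
import pandas as pd

train = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B'], 'Target': [1, 0, 1, 0, 1]})
# Hypothetical new data; 'D' does not occur in the training set
new = pd.DataFrame({'Category': ['A', 'C', 'D']})

# Learn the per-category means and the overall mean from the training data only
target_means = train.groupby('Category')['Target'].mean()
global_mean = train['Target'].mean()

# Unseen categories map to NaN, which is replaced by the global mean
new['TargetEncoded'] = new['Category'].map(target_means).fillna(global_mean)
print(new)
```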

Frequency Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# Frequency Encoding using pandas
frequency = data['Category'].value_counts()
data['FrequencyEncoded'] = data['Category'].map(frequency)
print(data)

scikit-learn has no built-in frequency encoder, so the manual approach above is the usual route.

Output:

  Category  FrequencyEncoded
0        A                 2
1        B                 2
2        A                 2
3        C                 1
4        B                 2
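The example above encodes raw counts. A small variation, sketched here on the same data, uses value_counts(normalize=True) to encode relative frequencies instead, which keeps the values in the range 0 to 1 regardless of dataset size:

```python
import pandas as pd

data = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B']})

# normalize=True returns each category's share of the rows instead of its count
relative_frequency = data['Category'].value_counts(normalize=True)
data['FrequencyEncoded'] = data['Category'].map(relative_frequency)
print(data)
```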

Binary Encoding

import pandas as pd
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'D']})

# Convert to categorical and get the codes
data['Category_Code'] = data['Category'].astype('category').cat.codes

# Convert the numeric codes into binary
binary_encoded = data['Category_Code'].apply(lambda x: bin(x)[2:].zfill(2))  # Ensure 2-bit binary representation

# Split the binary string into separate columns
binary_encoded_df = binary_encoded.apply(lambda x: pd.Series(list(x))).astype(int)

# Rename columns
binary_encoded_df.columns = [f'Category_{i}' for i in range(binary_encoded_df.shape[1])]
print(binary_encoded_df)

# Binary Encoding using category_encoders
import category_encoders as ce

# Binary Encoding using category_encoders
binary_encoder = ce.BinaryEncoder(cols=['Category'])
binary_encoded = binary_encoder.fit_transform(data)
print(binary_encoded)

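Output (pandas version; category_encoders numbers the categories starting from 1, so its columns and bit patterns may differ):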

   Category_0  Category_1
0           0           0
1           0           1
2           1           0
3           1           1
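The pandas version above hard-codes a 2-bit width, which only fits up to four categories. A minimal sketch that derives the width from the data instead, using a hypothetical five-category sample and integer arithmetic in place of string formatting:

```python
import pandas as pd

# Hypothetical sample with five categories, which needs three bits
data = pd.DataFrame({'Category': ['A', 'B', 'C', 'D', 'E']})
codes = data['Category'].astype('category').cat.codes.astype(int)

# Number of bits needed to represent the largest code (at least 1)
n_bits = max(int(codes.max()).bit_length(), 1)

# Extract each bit with integer division and modulo, most significant bit first
binary_encoded_df = pd.DataFrame(
    {f'Category_{i}': (codes // 2 ** (n_bits - 1 - i)) % 2 for i in range(n_bits)}
)
print(binary_encoded_df)
```

With five categories this yields three columns, Category_0 through Category_2, and the codes 0 to 4 become 000, 001, 010, 011, and 100.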