How to add feature interactions

Posted on 9-September-2020 by admin

Sometimes it is preferable to use simple machine learning algorithms such as logistic regression due to speed and explainability.
However, these simple algorithms usually do not capture interactions between features. Neural networks, by contrast, pick up sums and differences of features automatically, since each neuron sums its incoming connections, and applying a log transform to the inputs lets them represent products and ratios as well.
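
The remark about the log transform can be checked numerically. The following is a minimal sketch using only the standard library and two made-up positive values a and b (the names and numbers are assumptions for illustration): adding log-transformed features corresponds to multiplying the raw features, and subtracting them corresponds to dividing.

```python
import math

# Hypothetical positive feature values, chosen only for illustration.
a, b = 3.0, 4.0

# A model that adds log-transformed inputs effectively sees the product ...
log_sum = math.log(a) + math.log(b)
# ... and subtracting them corresponds to the ratio.
log_diff = math.log(a) - math.log(b)

# Undo the log to recover the product and the ratio (up to rounding).
product = math.exp(log_sum)
ratio = math.exp(log_diff)
print(product, ratio)
```

Note that this only works for strictly positive features, since the logarithm is undefined otherwise.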

Here we present a simple way to add feature interactions to a machine learning pipeline:

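import operator
import numpy as np
import pandas as pd

# Supported pairwise operations, keyed by the name used in the new column labels.
OPS = {'sum': operator.add, 'sub': operator.sub, 'prod': operator.mul}

def addinteract(df, cols=None, inplace=False, ops=None, retfnames=False):
    """Add pairwise interaction columns such as sum(x,y) for every pair in cols."""
    df = df if inplace else df.copy()
    fnames = []
    # Snapshot the column list so that newly added columns are not paired again.
    cols = list(df.columns if cols is None else cols)
    if ops is None:
        ops = ['sum', 'sub', 'prod']
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            for op in ops:
                try:
                    fname = f'{op}({cols[i]},{cols[j]})'
                    df[fname] = OPS[op](df[cols[i]], df[cols[j]])
                    fnames.append(fname)
                except Exception as e:
                    # Skip unknown operations and pairs that cannot be combined.
                    print(e)
    if retfnames:
        return df, fnames
    return df

# Expose the function as a DataFrame method.
pd.DataFrame.addinteract = addinteract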

We can use it in the following way:

# 20 random integers in [0, 10) as a toy feature
xint = np.random.randint(0, 10, 20)
DF = pd.DataFrame
DF({'x': xint, 'y': xint - 1, 'z': xint + 2}).addinteract()

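which returns a new DataFrame holding the original columns x, y and z together with the interaction columns, each named after its operation and column pair, for example sum(x,y) or sub(x,z).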
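
To get a feel for how many columns this adds, the names can be enumerated without pandas. This is only a sketch: it assumes the three operations are labelled sum, sub and prod, and it mirrors the nested loops of addinteract with itertools.combinations rather than calling the function itself.

```python
from itertools import combinations

cols = ['x', 'y', 'z']
ops = ['sum', 'sub', 'prod']

# One name per unordered pair of columns and per operation,
# in the same order as the nested loops of addinteract.
names = [f'{op}({a},{b})' for a, b in combinations(cols, 2) for op in ops]

print(len(names))   # 9
print(names[:3])    # ['sum(x,y)', 'sub(x,y)', 'prod(x,y)']
```

The number of new columns is n(n-1)/2 times the number of operations, so it grows quadratically with the number of input columns: 20 columns and three operations already give 570 interaction features. Passing a short cols list keeps this manageable.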
