Sometimes it is preferable to use simple machine learning algorithms such as logistic regression due to speed and explainability.
However, such simple algorithms usually do not model interactions between features. This is in contrast to, say, neural networks, where sums and differences of features are incorporated automatically, since each neuron sums its incoming connections, and where a log transform can additionally introduce products and ratios of features.
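The remark about the log transform rests on the identities log(a) + log(b) = log(a*b) and log(a) - log(b) = log(a/b): a weighted sum of log-transformed features corresponds to a product or ratio of the original features. A minimal sketch of this, assuming strictly positive feature values (the numbers and variable names below are purely illustrative):

```python
import math

# Two illustrative, strictly positive feature values (log needs a > 0, b > 0).
a, b = 6.0, 1.5

# A "neuron" that sums log-transformed inputs with weights (+1, +1) ...
log_sum = math.log(a) + math.log(b)
# ... or with weights (+1, -1).
log_diff = math.log(a) - math.log(b)

# Undoing the log shows which interaction each sum encodes.
product = math.exp(log_sum)   # equals a * b up to rounding
ratio = math.exp(log_diff)    # equals a / b up to rounding

print(math.isclose(product, a * b), math.isclose(ratio, a / b))
```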
Here we present a simple way to add feature interactions to a machine learning pipeline:
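import pandas as pd

def addinteract(df, cols=None, inplace=False, ops=None, retfnames=False):
    # Work on a copy unless the caller asks for in-place modification.
    df = df if inplace else df.copy()
    fnames = []
    if cols is None:
        cols = list(df.columns)
    if ops is None:
        ops = ['sum', 'sub', 'prod']
    # Supported pairwise operations, looked up by name instead of eval().
    funcs = {
        'sum': lambda a, b: a + b,
        'sub': lambda a, b: a - b,
        'prod': lambda a, b: a * b,
    }
    # Apply every requested operation to every unordered pair of columns.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            for op in ops:
                try:
                    fname = f'{op}({cols[i]},{cols[j]})'
                    df[fname] = funcs[op](df[cols[i]], df[cols[j]])
                    fnames.append(fname)
                except Exception as e:
                    print(e)
    if retfnames:
        return df, fnames
    return df

# Attach the function as a DataFrame method.
pd.DataFrame.addinteract = addinteract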
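Because every operation is applied to every unordered pair of columns, the number of new features grows quadratically with the number of input columns. The sketch below only reproduces the naming and ordering of the generated columns, without touching any data; `interaction_names` is a hypothetical helper introduced for illustration and is not part of the function above.

```python
from itertools import combinations
from math import comb

def interaction_names(cols, ops=("sum", "sub", "prod")):
    """Names of the pairwise features, in the order the nested loops create them."""
    return [f"{op}({a},{b})" for a, b in combinations(cols, 2) for op in ops]

names = interaction_names(["x", "y", "z"])
print(names)

# Number of new columns = C(n, 2) * number of operations.
print(len(names), comb(3, 2) * 3)   # 9 9
print(comb(50, 2) * 3)              # 3675 new columns for 50 inputs
```

With many input columns it may therefore be worth restricting `cols` or `ops` to the pairs and operations that are expected to matter.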
The new method can then be called directly on a DataFrame:
import numpy as np

xint = np.random.randint(0, 10, 20)
DF = pd.DataFrame
DF({'x': xint, 'y': xint - 1, 'z': xint + 2}).addinteract()
which results in: