Find the index labels of the 3 largest values in each column:
df1.apply(lambda s: pd.Series(s.nlargest(3).index))
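
A small sketch of the one-liner above on made-up data. The frame `df1`, its columns `a` and `b`, and the index labels are assumptions for illustration only. Each column of the result lists the index labels of that column's three largest values, largest first.

```python
import pandas as pd

# Hypothetical data: two numeric columns with string index labels.
df1 = pd.DataFrame(
    {"a": [5, 1, 9, 7], "b": [2, 8, 4, 6]},
    index=["w", "x", "y", "z"],
)

# For each column, take the 3 largest values and keep only their index labels.
# Wrapping in pd.Series resets the row labels to 0, 1, 2 (rank positions).
top3 = df1.apply(lambda s: pd.Series(s.nlargest(3).index))

print(top3)
```

Row 0 of `top3` holds the label of the largest value in each column, row 1 the second largest, and so on.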
Map a DataFrame column through a dictionary (keys that are missing become NaN):
df["ItemIdx"] = df["question"].map(lambda x: itemMap.get(x, np.nan))
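
A minimal sketch of the mapping on made-up data. The dictionary contents and the question strings are assumptions. It also shows that passing the dictionary straight to `Series.map` gives the same result, since keys that are not found map to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table and data.
item_map = {"q1": 0, "q2": 1}
df = pd.DataFrame({"question": ["q1", "q3", "q2"]})

# Look up each question; unknown questions ("q3") become NaN.
df["ItemIdx"] = df["question"].map(lambda x: item_map.get(x, np.nan))

# Shorter alternative: Series.map accepts a dict directly.
direct = df["question"].map(item_map)

print(df)
```

Because NaN is a float, the resulting column holds floats rather than integers whenever at least one key is missing.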
Load a dictionary from a saved pkl file:
with open("l.pkl", "rb") as f: itemMap = pickle.load(f)
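
A short round-trip sketch, assuming the file was written earlier with `pickle.dump`. The dictionary contents and the temporary directory are assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

# Hypothetical dictionary to save.
item_map = {"q1": 0, "q2": 1}

# Write to a temporary location so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "l.pkl")
with open(path, "wb") as f:
    pickle.dump(item_map, f)

# Read it back; the file must be opened in binary mode ("rb").
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.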
Find the start offset of each session (after sorting by session):
offset = np.zeros(df["sessinId"].nunique() + 1, dtype=np.int32)
offset[1:] = df.groupby("sessinId").size().cumsum()
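
A sketch of the offset array on made-up data. The session ids and the `question` values are assumptions. It relies on the rows being sorted by session id, so that `groupby` (which sorts its keys by default) visits the sessions in the same order as they appear in the frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three sessions with 3, 1 and 2 rows.
df = pd.DataFrame(
    {"sessinId": [10, 10, 10, 20, 30, 30], "question": list("abcdef")}
)
df = df.sort_values("sessinId", kind="stable").reset_index(drop=True)

# offset[i] is the first row of session i; offset[-1] is the total row count.
offset = np.zeros(df["sessinId"].nunique() + 1, dtype=np.int32)
offset[1:] = df.groupby("sessinId").size().cumsum()

# Rows of the second session are the half-open range offset[1]:offset[2].
second = df.iloc[offset[1]:offset[2]]

print(offset)
print(second)
```

Session `i` then occupies rows `offset[i]` up to, but not including, `offset[i + 1]`, which makes slicing a single session cheap.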
Original article: https://www.cnblogs.com/pocahontas/p/11775756.html
Time: 2024-11-13 10:33:52