Title-case words in a column except for certain words
How can I title-case every word in a column except the words in the list keep?
keep = ['for', 'any', 'a', 'vs']
df.col
0 1. The start for one
1 2. Today's world any
2 3. Today's world vs. yesterday.
Expected output:
number title
0 1 The Start for One
1 2 Today's World any
2 3 Today's World vs. Yesterday.
I tried:
df['col'] = df.col.str.title().mask(~clean['col'].isin(keep))
Answer
Here is one way, using str.replace with a replacement function:
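def replace(match):
    # match.group(1) is one run of word characters
    word = match.group(1)
    if word not in keep:
        return word.title()
    return word

# a callable replacement requires regex=True
df['title'] = df['title'].str.replace(r'(\w+)', replace, regex=True)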
number title
0 1 The Start for One
1 2 Today'S World any
2 3 Today'S World vs. Yesterday.
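In the output above the word pattern stops at the apostrophe, so the lone "s" in "Today's" is title-cased on its own and becomes "Today'S". A variant of the same str.replace idea, assuming that letters and apostrophes together form one word, avoids this. It also upper-cases only the first character instead of calling str.title(), so the rest of each word is left unchanged. cap_word is a hypothetical helper name, and the DataFrame is rebuilt from the question's sample rows with the number already split off.

```python
import pandas as pd

keep = ['for', 'any', 'a', 'vs']

# Sample rows from the question, with number and title already separated
df = pd.DataFrame({
    'number': ['1', '2', '3'],
    'title': ['The start for one', "Today's world any", "Today's world vs. yesterday."],
})

def cap_word(match):
    # Assumption: a word is a run of letters and apostrophes, so "Today's" is one word
    word = match.group(0)
    if word in keep:
        return word
    # Upper-case only the first character and keep the rest as written
    return word[0].upper() + word[1:]

# Punctuation such as "." is not matched, so it passes through untouched
df['title'] = df['title'].str.replace(r"[A-Za-z']+", cap_word, regex=True)
print(df)
```

Here "vs." keeps its period because only "vs" is matched and found in keep.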
Answer
First, we create your number and title columns. Then we use Series.explode to get one word per row. If a word is in keep, we leave it as it is; otherwise we apply Series.str.title:
keep = ['for', 'any', 'a', 'vs']
# create 'number' and 'title' column
df[['number', 'title']] = df['col'].str.split(".", expand=True, n=1)
df = df.drop(columns='col')
# apply str.title if not in keep
words = df['title'].str.split().explode()
# compare without trailing periods (so "vs." matches "vs"), but keep the original text
stripped = words.str.rstrip(".")
words = words.where(stripped.isin(keep), words.str.title())
df['title'] = words.groupby(level=0).agg(" ".join)
Output
number title
0 1 The Start for One
1 2 Today'S World any
2 3 Today'S World vs. Yesterday.
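If you prefer to avoid the explode/groupby round trip, a plain per-row function applied with Series.map is a possible alternative. This is a minimal sketch, not the answer's method: title_except is a hypothetical helper, and it assumes words are separated by whitespace and that trailing punctuation (.,;:!?) should be ignored when comparing against keep. Like the previous sketch, it upper-cases only the first letter, so "Today's" is not turned into "Today'S".

```python
import pandas as pd

keep = {'for', 'any', 'a', 'vs'}

df = pd.DataFrame({'col': ['1. The start for one',
                           "2. Today's world any",
                           "3. Today's world vs. yesterday."]})

def title_except(text, keep=keep):
    """Capitalize each whitespace-separated word unless it is in `keep`,
    ignoring trailing punctuation for the comparison."""
    out = []
    for word in text.split():
        if word.rstrip('.,;:!?') in keep:
            out.append(word)  # keep-list word: leave exactly as written
        else:
            out.append(word[:1].upper() + word[1:])  # capitalize first letter only
    return ' '.join(out)

# Split off the leading number at the first ".", as in the answer above
df[['number', 'title']] = df['col'].str.split('.', n=1, expand=True)
df = df.drop(columns='col')
df['title'] = df['title'].map(title_except)
print(df)
```

Using a set for keep makes each membership check constant time, which matters little here but helps with long keep lists.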