Title-case the words in a column except for certain words

How can I title-case every word except the ones in the list keep?

keep = ['for', 'any', 'a', 'vs']
df.col
0    1. The start for one
1    2. Today's world any
2    3. Today's world vs. yesterday.

Expected output:

     number   title
0     1       The Start for One
1     2       Today's World any
2     3       Today's World vs. Yesterday.


I tried:

df['col'] = df.col.str.title().mask(~clean['col'].isin(keep))

Answer

Here is one way, using str.replace and passing a replacement function:

def replace(match):
    word = match.group(1)
    if word not in keep:
        return word.title()
    return word

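# \w+ matches each run of word characters; regex=True is required for a callable replacement
df['title'] = df['title'].str.replace(r'(\w+)', replace, regex=True)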

   number                         title
0       1             The Start for One
1       2             Today'S World any
2       3  Today'S World vs. Yesterday.
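
Note that the output above has Today'S rather than Today's: \w+ stops at the apostrophe, so the trailing "s" is treated as a separate word and gets capitalized. The following is a minimal sketch of one possible workaround, not part of the answer above. It assumes that a "word" is a run of ASCII letters that may contain internal apostrophes, and it upper-cases only the first letter instead of calling title(). The names WORD and capitalize_unless_kept are made up for this illustration, and the sample frame is rebuilt from the question's data.

```python
import re
import pandas as pd

keep = ['for', 'any', 'a', 'vs']

# Sample data rebuilt from the question, already split into number/title
df = pd.DataFrame({
    'number': [1, 2, 3],
    'title': ['The start for one',
              "Today's world any",
              "Today's world vs. yesterday."],
})

# Assumption: a word is a run of letters, optionally with internal apostrophes
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def capitalize_unless_kept(match):
    word = match.group(0)
    if word in keep:
        return word
    # Upper-case only the first letter and leave the rest untouched
    return word[0].upper() + word[1:]

df['title'] = df['title'].str.replace(WORD, capitalize_unless_kept, regex=True)
print(df['title'].tolist())
```

Because the period is not part of the pattern, "vs." is still matched as "vs" and left alone, and the punctuation stays in place.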


Answer

First we create your number and title columns. Then we use Series.explode to get one word per row. If a word is in keep we leave it alone; otherwise we apply Series.str.title.

keep = ['for', 'any', 'a', 'vs']

# create 'number' and 'title' column
df[['number', 'title']] = df['col'].str.split(".", expand=True, n=1)
df = df.drop(columns='col')

# apply str.title if not in keep (periods are ignored only for the comparison)
words = df['title'].str.split().explode()
stripped = words.str.replace(".", "", regex=False)
words = words.mask(stripped.isin(keep)).str.title().fillna(words)
df['title'] = words.groupby(level=0).agg(" ".join)

Output

  number                         title
0      1             The Start for One
1      2             Today'S World any
2      3  Today'S World vs. Yesterday.
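
To make the explode and groupby steps easier to follow, here is a small standalone sketch on two of the question's titles. It only illustrates the mechanism: explode repeats the original row label for every word, and grouping on that label (level=0) puts each title back together. The variable names titles and rebuilt are chosen just for this example.

```python
import pandas as pd

titles = pd.Series(["The start for one", "Today's world any"])

# One word per row; each word keeps the label of the row it came from
words = titles.str.split().explode()
print(words.index.tolist())   # [0, 0, 0, 0, 1, 1, 1]

# Grouping on the repeated label reassembles one string per original row
rebuilt = words.groupby(level=0).agg(" ".join)
print(rebuilt.tolist())
```

Any per-word transformation, such as the mask/str.title step above, can be applied between these two steps without losing track of which row a word belongs to.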

