How can I replace emojis with text and treat them as single words?

I need to do topic modeling in R on pieces of text that contain emojis. Using the replace_emoji() and replace_emoticon() functions lets me analyze them, but there is a problem with the results.

A red heart emoji is translated as "red heart ufef". These words are then treated as separate terms during the analysis, which distorts the results.

Terms like "heart" can have very different meanings, as can be seen with "red heart ufef" and "broken heart".
The function replace_emoji_identifier() doesn't help either, because the identifiers it produces make the analysis hard.

Dummy data set, reproducible with dput() (including the step that forces everything to lowercase):

Emoji_struct <- c(
      list(content = "🔥🔥 wow", "😮 look at that", "😤this makes me angry😤", "😍❤ufe0f, i love it!"),
      list(content = "😍😍", "😊 thanks for helping", "😢 oh no, why? 😢", "careful, challenging ❌❌❌")
)

Current code (data_orig is a list of several files):

library(textclean)
library(tm) # provides stripWhitespace()
#Everything else used below is base R

#pre-processing:
data <- gsub("'", "", data) 
data <- replace_contraction(data)
data <- replace_emoji(data) # replace emoji with words
data <- replace_emoticon(data) # replace emoticon with words
data <- replace_hash(data, replacement = "")
data <- replace_word_elongation(data)
data <- gsub("[[:punct:]]", " ", data)  #replace punctuation with space
data <- gsub("[[:cntrl:]]", " ", data) 
data <- gsub("[[:digit:]]", "", data)  #remove digits
data <- gsub("^[[:space:]]+", "", data) #remove whitespace at beginning of documents
data <- gsub("[[:space:]]+$", "", data) #remove whitespace at end of documents
data <- stripWhitespace(data)

Desired output:

[1] list(content = c("fire fire wow", 
                     "facewithopenmouth look at that", 
                     "facewithsteamfromnose this makes me angry facewithsteamfromnose", 
                     "smilingfacewithhearteyes redheart ufe0f, i love it!"), 
         content = c("smilingfacewithhearteyes smilingfacewithhearteyes", 
                     "smilingfacewithsmilingeyes thanks for helping", 
                     "cryingface oh no, why? cryingface", 
                     "careful, challenging crossmark crossmark crossmark"))

Any ideas? Lowercase would work, too.
Best regards. Stay safe. Stay healthy.
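
One way to get output shaped like the desired output above is to collapse each multi-word emoji description into a single token before substituting it into the text. The following is only a minimal base R sketch under stated assumptions: the lookup table `emoji_lookup` and the helpers `collapse_name()` and `replace_emoji_single_token()` are hypothetical names made up for illustration, and the table covers only the emojis from the dummy data, not a full lexicon.

```r
# Hypothetical lookup table (an assumption, not a real lexicon):
# each emoji paired with a multi-word description.
emoji_lookup <- data.frame(
  emoji = c("\U0001F525", "\U0001F62E", "\U0001F624", "\U0001F60D",
            "\u2764", "\U0001F60A", "\U0001F622", "\u274C"),
  label = c("fire", "face with open mouth", "face with steam from nose",
            "smiling face with heart-eyes", "red heart",
            "smiling face with smiling eyes", "crying face", "cross mark"),
  stringsAsFactors = FALSE
)

# Collapse a description into one token: lowercase it and drop everything
# that is not a letter or digit, so "red heart" becomes "redheart".
collapse_name <- function(name) {
  gsub("[^a-z0-9]", "", tolower(name))
}

# Replace every emoji in `text` with its single-token name.
replace_emoji_single_token <- function(text, emoji, label) {
  tokens <- collapse_name(label)
  for (i in seq_along(emoji)) {
    # Pad with spaces so adjacent emojis and words stay separate tokens.
    text <- gsub(emoji[i], paste0(" ", tokens[i], " "), text, fixed = TRUE)
  }
  # Squeeze the padding back down to single spaces and trim the ends.
  trimws(gsub("\\s+", " ", text))
}

docs <- c("\U0001F525\U0001F525 wow",
          "\U0001F624this makes me angry\U0001F624",
          "careful, challenging \u274C\u274C\u274C")

print(replace_emoji_single_token(docs, emoji_lookup$emoji, emoji_lookup$label))
# [1] "fire fire wow"
# [2] "facewithsteamfromnose this makes me angry facewithsteamfromnose"
# [3] "careful, challenging crossmark crossmark crossmark"
```

The tokens are joined without a separator rather than with underscores or hyphens, because the later gsub("[[:punct:]]", " ", data) step would split such tokens apart again. textclean's replace_emoji() also takes an emoji_dt argument (defaulting to lexicon::hash_emojis), so the same collapsing might be applied to a copy of that table's description column and passed in; that variant is untested here.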
