Author: luenchang (luen)
Board: R_Language
Title: [Question] Extracting 2- to 3-digit numbers with regex
Time: Mon Dec 21 12:17:10 2020
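Hi all,
I want to extract the numeric part from a set of strings. The strings follow a regular
structure: they start with a 2- to 3-digit number, then a space, then a variable number
of letters, or letters and digits. What I want to extract is the leading number. I tried
two methods: Method 1 returns only the last digit of the number, while Method 2 returns
the complete number. I can't figure out what is wrong with the regex in Method 1.
Here are my strings and code: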
# Strings to extract
strings <- c("130 UDINE", "162 BF02", "163 AS04", "164 AL08", "165 BR12",
             "166 SA13", "167 MA14", "167 MA14", "168 OC15", "85 BERGAMO")
# Method 1 to extract the beginning part of the strings (not working)
gsub(pattern = "^(\\d){2,3}(\\s).*", replacement = "\\1", x=strings)
# [1] "0" "2" "3" "4" "5" "6" "7" "7" "8" "5"
# Method 2 to extract the beginning part of the strings (working)
gsub(pattern = "^(\\d+)(\\s).*", replacement = "\\1", x=strings)
# [1] "130" "162" "163" "164" "165" "166" "167" "167" "168" "85"
Thanks
--
※ Origin: PTT BBS (ptt.cc), From: 110.174.219.126 (Australia)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1608524232.A.14D.html
1F:push celestialgod: library(stringr); str_extract_all(strings, "\\d 12/21 13:11
2F:→ celestialgod: {2,3}") 12/21 13:11
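One thing that may be worth checking with an unanchored pattern like `\\d{2,3}`: it can also pick up digit runs later in the string (for example the "02" in "BF02"). The sketch below shows this in base R with `gregexpr`/`regmatches`, so no package is needed; `x`, `all_runs`, and `leading` are illustrative names, and `x` is a shortened sample of the strings in the post. Anchoring with `^` restricts the match to the start of the string.

```r
# Shortened sample of the strings from the post (assumed for illustration)
x <- c("162 BF02", "85 BERGAMO")

# Unanchored: every run of 2 to 3 digits is returned, one list element per string
all_runs <- regmatches(x, gregexpr("\\d{2,3}", x))

# Anchored with ^: only the leading number is matched
leading <- regmatches(x, regexpr("^\\d{2,3}", x))

print(all_runs[[1]])  # "162" "02"
print(leading)        # "162" "85"
```

Note that `regmatches` with `regexpr` silently drops strings that do not match, so the result can be shorter than the input if some strings lack a leading number.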
3F:→ andrew43: tstrsplit is quite intuitive for this 12/21 13:23
4F:→ andrew43: data.table::tstrsplit(strings, " ", keep = 1)[[1]] 12/21 13:24
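For anyone without data.table installed, a rough base-R version of the same split-on-space idea might look like the sketch below. The names `x`, `parts`, and `first_field` are illustrative, and it assumes the number is always followed by a single space, as in the post.

```r
# Shortened sample of the strings from the post (assumed for illustration)
x <- c("130 UDINE", "162 BF02", "85 BERGAMO")

# Split each string on a literal space, then keep the first piece of each
parts <- strsplit(x, " ", fixed = TRUE)
first_field <- vapply(parts, function(p) p[1], character(1))

print(first_field)  # "130" "162" "85"
```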
5F:→ resentis: The quantifier probably needs to come right after \\d 12/21 20:30
6F:→ resentis: "^(\\d{2,3})(\\s).*" 12/21 20:30
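To illustrate the point about quantifier placement, here is a minimal base-R sketch (`x`, `last_digit`, and `full_number` are illustrative names; `x` is a shortened sample of the strings in the post). With `(\\d){2,3}` the quantifier repeats the group itself, so the backreference `\\1` holds only the last repetition, which is a single digit. With `(\\d{2,3})` the quantifier is inside the group, so `\\1` holds the whole run.

```r
# Shortened sample of the strings from the post (assumed for illustration)
x <- c("130 UDINE", "162 BF02", "85 BERGAMO")

# Quantifier outside the group: \\1 keeps only the last repetition (one digit)
last_digit <- sub("^(\\d){2,3}(\\s).*", "\\1", x)

# Quantifier inside the group: \\1 keeps the whole 2- to 3-digit run
full_number <- sub("^(\\d{2,3})(\\s).*", "\\1", x)

print(last_digit)   # "0" "2" "5"
print(full_number)  # "130" "162" "85"
```

`sub` is used here instead of `gsub` because the pattern is anchored with `^` and can match at most once per string; the results are the same either way.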
7F:push JuanMaestrow: str_extract(string, regex("^\\d+")) will do it 12/21 21:45