Retrieving protein information from the UniProtKB database: the queryup package

The queryup package retrieves protein information from the 'UniProtKB' REST API.

URL https://github.com/VoisinneG/queryup

BugReports https://github.com/VoisinneG/queryup/issues

Installation

install.packages("queryup")
# or install the development version from GitHub
devtools::install_github("VoisinneG/queryup")

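Functions and examples

Available query fields

query_fields lists the fields that can be used to build a query with queryup, together with an example and a description for each field.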

library(queryup)
query_fields

# Field names
query_fields$field
# Example values
query_fields$example
# Short descriptions
query_fields$description

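Return fields

return_fields lists the fields that can be retrieved with queryup, together with the labels under which they appear in the returned data.frame.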

return_fields
# Field names
return_fields$field
# Labels used in the returned data.frame
return_fields$label
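
Before sending a query, it can help to check that the columns you plan to request are listed in return_fields. The snippet below is a minimal sketch using only base R; the vector name `wanted` is a placeholder introduced here, and the snippet assumes queryup is installed.

```r
library(queryup)

# Column names we plan to pass to the `columns` argument (placeholder selection)
wanted <- c("accession", "id", "gene_names", "keywords")

# Names that do not appear in return_fields$field and so may not be retrievable
unknown <- setdiff(wanted, return_fields$field)
unknown
```

An empty result means every requested name appears in return_fields$field.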

Entry names and other attributes of 1000 UniProt entries from Mus musculus

uniprot_entries

Queries

Both get_uniprot_data and query_uniprot retrieve data from UniProt. query_uniprot additionally parses error messages and removes invalid entries from the query.

# Retrieve data from UniProt using UniProt's REST API. To avoid non-responsive
# queries, they are split into smaller queries with at most max_keys items per
# query field.
query_uniprot(
  query = NULL,
  base_url = "https://rest.uniprot.org/uniprotkb/",
  columns = c("accession", "id", "gene_names", "organism_id", "reviewed"),
  max_keys = 200,
  updateProgress = NULL,
  show_progress = TRUE
)

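The first three arguments are shared with get_uniprot_data.

query: a list of keys for UniProt query fields, for example list("gene_exact" = "Akt1"); see query_fields for the available fields.
base_url: base URL of the UniProt REST API.
columns: names of the UniProt data columns to retrieve, for example "accession", "id", "gene_names", "keyword", "sequence"; see return_fields for the full list.
max_keys: maximum number of items per query field in a single request; longer queries are split into smaller ones.
updateProgress: used to report progress when the function runs inside a Shiny app.
show_progress: whether to display a progress bar.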
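
To make the role of max_keys concrete, here is a minimal base R sketch of splitting a long vector of keys into chunks of at most max_keys items. The helper `split_keys` and the placeholder identifiers are hypothetical and introduced only for illustration; this is not queryup's internal code, which may split queries differently.

```r
# Hypothetical helper (not part of queryup): split a vector of keys into
# consecutive chunks holding at most max_keys items each.
split_keys <- function(keys, max_keys = 200) {
  stopifnot(max_keys >= 1)
  if (length(keys) == 0) return(list())
  # Chunk index for every key: 1 for the first max_keys keys, 2 for the next, ...
  chunk_id <- ceiling(seq_along(keys) / max_keys)
  unname(split(keys, chunk_id))
}

# Placeholder identifiers, not real UniProt accessions
ids <- sprintf("ID%03d", 1:450)
chunks <- split_keys(ids, max_keys = 200)
lengths(chunks)  # 200 200 50
```

Each chunk could then be sent as its own request and the results combined, which is the general idea behind capping the number of items per query field.
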
# Retrieve data from UniProt using UniProt's REST API
get_uniprot_data(
  query = NULL,
  base_url = "https://rest.uniprot.org/uniprotkb/",
  columns = c("accession", "id", "gene_names", "organism_id", "reviewed")
)

# Example: retrieve two entries by accession; the data is in the $content element
query <- list("accession_id" = c("P22682", "P47941"))
df <- get_uniprot_data(
  query = query,
  columns = c("accession", "id", "gene_names", "keyword", "sequence")
)$content
df
# Query by exact gene name with the default columns
query <- list("gene_exact" = "Akt1")
df <- query_uniprot(query, show_progress = FALSE)

# Same query, requesting specific columns
df <- query_uniprot(query,
                    columns = c("id", "sequence", "keyword", "gene_primary"),
                    show_progress = FALSE)

# Query a single gene, restricted to reviewed human entries (taxon 9606)
query2 <- list("gene_exact" = "Pik3r1",
               "reviewed" = "true",
               "organism_id" = "9606")
df <- query_uniprot(query2, show_progress = FALSE)
print(df)

# Query several genes in human (9606) and mouse (10090)
query <- list("gene_exact" = c("Itpr", "CaMKI"),
              "reviewed" = "true",
              "organism_id" = c("9606", "10090"))
df <- query_uniprot(query, show_progress = FALSE)
print(df)
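
To see how a query list like the ones above might map onto UniProt's query syntax, the sketch below renders a list as a single query string. It assumes that values within one field are alternatives (OR) and that different fields must all match (AND); that reading is an assumption made here, and the helper `build_query_string` is hypothetical, not a queryup function.

```r
# Hypothetical helper (not part of queryup): render a named query list as a
# UniProt-style query string. Assumes OR within a field and AND between fields.
build_query_string <- function(query) {
  stopifnot(is.list(query), !is.null(names(query)))
  per_field <- vapply(names(query), function(field) {
    # One "field:value" term per value, joined with OR and parenthesized
    terms <- paste0(field, ":", query[[field]])
    paste0("(", paste(terms, collapse = " OR "), ")")
  }, character(1))
  paste(per_field, collapse = " AND ")
}

query <- list("gene_exact" = c("Itpr", "CaMKI"),
              "reviewed" = "true",
              "organism_id" = c("9606", "10090"))
query_string <- build_query_string(query)
query_string
# "(gene_exact:Itpr OR gene_exact:CaMKI) AND (reviewed:true) AND (organism_id:9606 OR organism_id:10090)"
```

Printing the string this way is only a reading aid; pass the list itself to query_uniprot or get_uniprot_data.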
