detect_any identifies SDGs in text using a user-provided query system. It works like detect_sdg_systems, but applies a query system specified by the user instead of one of the existing systems.

detect_any(
  text,
  system,
  queries = lifecycle::deprecated(),
  sdgs = NULL,
  output = c("features", "documents"),
  verbose = TRUE
)

Arguments

text

character vector or object of class tCorpus containing the text in which SDGs are to be detected.

system

a data frame that must contain the following variables: a character vector with the queries, an integer vector specifying which SDG each query maps to (values must be between 1 and 17), and a character vector with a single unique value specifying the name of the query system (can be anything as long as it is unique).
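
The requirements above could be checked before calling detect_any, roughly as in the following sketch. The helper check_query_system is hypothetical and not part of the package, and the column names system, query, and sdg are assumed from the example at the end of this page.

```r
# Hypothetical helper (not part of text2sdg): checks a query system
# against the requirements described for the `system` argument.
check_query_system <- function(system) {
  required <- c("system", "query", "sdg")  # assumed column names
  missing_cols <- setdiff(required, names(system))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!is.character(system$query)) {
    stop("`query` must be a character vector")
  }
  # whole numbers between 1 and 17, no missing values
  sdg_ok <- is.numeric(system$sdg) &&
    all(system$sdg == round(system$sdg)) &&
    all(system$sdg >= 1 & system$sdg <= 17)
  if (!isTRUE(sdg_ok)) {
    stop("`sdg` must contain whole numbers between 1 and 17")
  }
  if (!is.character(system$system) || length(unique(system$system)) != 1) {
    stop("`system` must be a character column with one unique value")
  }
  invisible(TRUE)
}

# a small query system that satisfies all checks
my_queries <- data.frame(
  system = "my_system",
  query = c("theory", "study AND hypothesis"),
  sdg = c(1L, 2L),
  stringsAsFactors = FALSE
)
check_query_system(my_queries)
```

Failing early with a descriptive message is a design choice of this sketch; how detect_any itself reports an invalid query system is not covered here.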

queries

deprecated.

sdgs

numeric vector with integers between 1 and 17 specifying the SDGs to identify in text. If NULL (the default), all SDGs (1:17) are used.

output

character specifying the level of detail in the output. The default "features" returns a tibble with one row per matched query, including a variable containing the features of the query that were matched in the text. By contrast, "documents" returns an aggregated tibble with one row per matched SDG, without information on the features.

verbose

logical specifying whether messages on the function's progress should be printed.

Value

The function returns a tibble containing the SDG hits found in the vector of documents. Depending on the value of output the tibble will contain all or some of the following columns:

document

Index of the element in text where the match was found. Formatted as a factor whose number of levels equals the original number of documents.
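
One consequence of storing document as a factor with one level per input document is that documents without any hit can still be counted. The sketch below illustrates this with a hand-made stand-in for the document column; the values are invented for illustration and are not real detect_any output.

```r
# Stand-in for the `document` column of a result (made-up values).
# Assumption: `text` had 4 elements and only documents 1 and 3 matched.
n_docs <- 4
document <- factor(c(1, 1, 3), levels = seq_len(n_docs))

# table() counts every factor level, so documents without hits get a 0
hits_per_document <- table(document)

# indices of documents in which nothing was detected
no_hits <- as.integer(names(hits_per_document)[hits_per_document == 0])
```

With a plain integer column, documents 2 and 4 would simply be absent from the counts and would have to be recovered from the length of text.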

sdg

Label of the SDG found in document.

systems

The name of the query system that produced the match.

query_id

Index of the query within the query system that produced the match.

features

Concatenated list of words that caused the query to match.

hit

Index of hit for a given system.

Examples

# \donttest{
# load the package (provides detect_any and the projects example texts)
library(text2sdg)

# create data frame with query system
my_queries <- tibble::tibble(
  system = "my_system",
  query = c(
    "theory",
    "analysis OR analyses OR analyzed",
    "study AND hypothesis"
  ),
  sdg = c(1, 2, 2)
)

# run sdg detection with own query system
hits <- detect_any(projects, my_queries)

# run sdg detection for sdg 2 only
hits <- detect_any(projects, my_queries, sdgs = 2)
# }