Regex Help

This database consists of open-access EV publications sourced from Europe PMC.

The tool offers advanced search capabilities using Regular Expressions to extract specific patterns from the text.

It also allows for targeted searches within specific manuscript sections (e.g., Title, Abstract, Introduction, Methods).

After a search, results can be filtered on multiple criteria so that the output matches specific research interests.

The platform also includes integrated data visualization tools, which give immediate insight into publication trends, keyword usage patterns, and other relevant statistics without requiring external software.

The inclusion of word enrichment analysis helps users understand the prominence of specific terms in the searched subset of manuscripts compared to their overall occurrence in the entire database.
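
The exact statistic the tool uses for word enrichment is not stated here. As a rough illustration only, one simple way to express enrichment is the ratio between the share of manuscripts containing a term in the searched subset and the share in the whole database. The function name `enrichment_ratio` and the counts below are hypothetical.

```python
def enrichment_ratio(hits_in_subset, subset_size, hits_in_database, database_size):
    """Ratio of a term's document frequency in a subset to its frequency overall.

    A value above 1 suggests the term is more common in the subset than in
    the database as a whole. This is an illustrative measure, not the tool's
    documented method.
    """
    if subset_size <= 0 or database_size <= 0 or hits_in_database <= 0:
        raise ValueError("sizes and overall hit count must be positive")
    subset_rate = hits_in_subset / subset_size
    overall_rate = hits_in_database / database_size
    return subset_rate / overall_rate


# Hypothetical counts: the term appears in 50 of 100 matched manuscripts,
# and in 1000 of 8000 manuscripts in the whole database.
ratio = enrichment_ratio(50, 100, 1000, 8000)
print(ratio)  # 4.0
```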

There is a dedicated section to assist users in constructing and understanding regular expressions, ensuring that even those unfamiliar with regex can harness its full potential.

Introduction:

Regular expressions, commonly known as "regex", are sequences of characters that define a search pattern. They can be used for tasks ranging from simple searches for specific words or characters to more complex pattern recognition, such as extracting email addresses or dates from large texts. This guide gives an overview of basic regex characters, quantifiers, positional markers, character classes, and special characters, followed by some common patterns and their applications in medical and scientific texts.

Basic Characters:

.: Matches any character except a newline.
\d: Matches any digit (equivalent to [0-9]).
\D: Matches any non-digit.
\w: Matches any word character (equivalent to [a-zA-Z0-9_]).
\W: Matches any non-word character.
\s: Matches any whitespace character (spaces, tabs, line breaks).
\S: Matches any non-whitespace character.
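
As a quick illustration, the basic characters can be tried out with Python's `re` module. This is only a sketch: the sample text and variable names are made up, and the tool's own regex engine may differ in small details. In Python the patterns are written as raw strings (r"...") so each backslash is typed once.

```python
import re

text = "Sample A7: 25 mg\tdose"

digits = re.findall(r"\d", text)     # every single digit
words = re.findall(r"\w+", text)     # runs of word characters
spaces = re.findall(r"\s", text)     # each whitespace character
chunks = re.findall(r"\S+", text)    # runs of non-whitespace characters
a_plus_any = re.findall(r"A.", text) # "A" followed by any one character

print(digits)      # ['7', '2', '5']
print(words)       # ['Sample', 'A7', '25', 'mg', 'dose']
print(spaces)      # [' ', ' ', ' ', '\t']
print(chunks)      # ['Sample', 'A7:', '25', 'mg', 'dose']
print(a_plus_any)  # ['A7']
```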

Quantifiers:

*: Matches 0 or more of the preceding token.
+: Matches 1 or more of the preceding token.
?: Matches 0 or 1 of the preceding token.
{n}: Matches exactly 'n' of the preceding token.
{n,}: Matches 'n' or more of the preceding token.
{n,m}: Matches between 'n' and 'm' of the preceding token.
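
The quantifiers can be compared side by side in a short Python sketch. The sample strings are invented for illustration, and `\b` (word boundary) is used so that each pattern has to match a whole word.

```python
import re

runs = "a aa aaa aaaa"

exactly_two = re.findall(r"\ba{2}\b", runs)     # {n}
two_or_more = re.findall(r"\ba{2,}\b", runs)    # {n,}
two_to_three = re.findall(r"\ba{2,3}\b", runs)  # {n,m}

optional_u = re.findall(r"colou?r", "color colour")  # ? means 0 or 1
zero_or_more = re.findall(r"IL\d*", "IL IL6 IL10")   # * means 0 or more
one_or_more = re.findall(r"IL\d+", "IL IL6 IL10")    # + means 1 or more

print(exactly_two)   # ['aa']
print(two_or_more)   # ['aa', 'aaa', 'aaaa']
print(two_to_three)  # ['aa', 'aaa']
print(optional_u)    # ['color', 'colour']
print(zero_or_more)  # ['IL', 'IL6', 'IL10']
print(one_or_more)   # ['IL6', 'IL10']
```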

Positional:

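^: Matches the beginning of the string (or of each line when multiline matching is enabled).
$: Matches the end of the string (or of each line when multiline matching is enabled).
\b: Matches a word boundary.
\B: Matches a position that is not a word boundary.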
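
The positional markers are easiest to see on a small example. The sketch below uses Python's `re` module with invented sample text. Whether `^` and `$` apply per line depends on the engine's multiline setting; in Python that is the `re.MULTILINE` flag, and the tool's own default is not stated here.

```python
import re

text = "cat concatenate cat"

whole_word = re.findall(r"\bcat\b", text)   # "cat" as a separate word
inside_word = re.findall(r"\Bcat\B", text)  # "cat" buried inside a longer word
at_start = re.findall(r"^cat", text)        # only at the start of the string
at_end = re.findall(r"cat$", text)          # only at the end of the string

lines = "Methods\nResults\nMethods"
single_line = re.findall(r"^Methods$", lines)               # whole string must match
per_line = re.findall(r"^Methods$", lines, re.MULTILINE)    # each line is checked

print(whole_word)   # ['cat', 'cat']
print(inside_word)  # ['cat']
print(at_start)     # ['cat']
print(at_end)       # ['cat']
print(single_line)  # []
print(per_line)     # ['Methods', 'Methods']
```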

Character Classes:

[abc]: Matches any one character from the set {a, b, c}.
[^abc]: Matches any one character not in the set {a, b, c}.
[a-z]: Matches one lowercase letter.
[A-Z]: Matches one uppercase letter.
[0-9]: Matches one digit.
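
A brief Python sketch of the character classes, using made-up sample strings. Adding `+` after a class matches a run of characters from that class rather than a single one.

```python
import re

in_set = re.findall(r"[abc]", "cabbage")       # only a, b, or c
not_in_set = re.findall(r"[^abc]", "cabbage")  # anything except a, b, or c

text = "pH 7.4, Grade B2"
lower_runs = re.findall(r"[a-z]+", text)  # runs of lowercase letters
uppers = re.findall(r"[A-Z]", text)       # single uppercase letters
digit_runs = re.findall(r"[0-9]+", text)  # runs of digits

print(in_set)      # ['c', 'a', 'b', 'b', 'a']
print(not_in_set)  # ['g', 'e']
print(lower_runs)  # ['p', 'rade']
print(uppers)      # ['H', 'G', 'B']
print(digit_runs)  # ['7', '4', '2']
```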

Special Characters:

\: Escapes a character that has special meaning in regex (for example, \. matches a literal period).
|: Acts as a logical OR.
(): Groups elements together.
[]: Creates a character class.
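
The special characters can be sketched in Python as follows. The sample text is invented. One Python-specific detail: when a pattern contains a group, `re.findall` returns the group contents rather than the whole match, so the grouped example uses `re.finditer` and `group(0)` to get the full matched text.

```python
import re

text = "exosomes, microparticles and nanoparticles measured 3.5 or 345 nm"

# | : match either alternative
either = re.findall(r"exosomes|nanoparticles", text)

# () : group alternatives so they share the same suffix
grouped = [m.group(0) for m in re.finditer(r"(micro|nano)particles", text)]

# \ : an escaped period matches only a literal "."
literal_dot = re.findall(r"3\.5", text)
# an unescaped period matches any character, so "345" also matches
any_char = re.findall(r"3.5", text)

# [] : inside a character class, "." and "," are literal
punctuation = re.findall(r"[,.]", text)

print(either)       # ['exosomes', 'nanoparticles']
print(grouped)      # ['microparticles', 'nanoparticles']
print(literal_dot)  # ['3.5']
print(any_char)     # ['3.5', '345']
print(punctuation)  # [',', '.']
```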

Examples:

Year Patterns:
Pattern: \b\d{4}\b
Matches any standalone 4-digit number, often used to capture years.
Example: In a sentence like "The study conducted in 2020 showed that...", it will capture "2020".

Scientific Notation:
Pattern: \d+\.?\d*\s*[x×]\s*10\^\d+
Matches numbers in scientific notation written with "x" or "×" and a caret exponent.
Example: In a sentence like "The sample had a concentration of 2.5 × 10^6 cells/mL.", it will capture "2.5 × 10^6".

Drug Dosages:
Pattern: \d+\.?\d*\s*mg
Matches dosages given in milligrams.
Example: In a statement like "The patient was prescribed 50 mg of DrugX.", it will capture "50 mg".

Gene or Protein Codes:
Pattern: \b[A-Z]{2,}\d+\b
A simple pattern for some gene/protein codes that follow a format of two or more capital letters followed by one or more digits.
Example: For a mention like "The BRCA1 gene is linked to...", it can capture "BRCA1".

ICD Codes:
Pattern: \b[A-Z]\d{2}(\.\d+)?\b
Matches ICD (International Classification of Diseases) codes made of one capital letter, two digits, and an optional decimal part.
Example: In a context like "The patient was diagnosed with ICD code B34.9.", it will capture "B34.9".

Percent Values:
Pattern: \d+\.?\d*\s*%
Matches percentages.
Example: In a sentence like "About 65% of the subjects showed improvement.", it captures "65%".

Citations:
Pattern: \[\d+\]
Matches numeric inline citations in square brackets.
Example: For a statement like "This was previously studied by Smith et al. [23].", it will capture "[23]".

Emails:
Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Matches typical email formats.
Example: In a context like "For inquiries, contact editor@medjournal.org.", it will capture "editor@medjournal.org".
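
The example patterns can be checked against their sample sentences with a short Python script. This is a sketch under a few assumptions: the dictionary and function names are illustrative, and the tool's own regex engine may behave slightly differently from Python's `re` module. Note that the gene pattern here uses \d+ so that single-digit names such as BRCA1 match, and the email pattern uses [A-Za-z]{2,} for the top-level domain so that lowercase domains match without a case-insensitive flag.

```python
import re

# Each entry pairs a pattern with the sample sentence from the guide.
EXAMPLES = {
    "year": (r"\b\d{4}\b",
             "The study conducted in 2020 showed that..."),
    "scientific_notation": (r"\d+\.?\d*\s*[x×]\s*10\^\d+",
             "The sample had a concentration of 2.5 × 10^6 cells/mL."),
    "dosage_mg": (r"\d+\.?\d*\s*mg",
             "The patient was prescribed 50 mg of DrugX."),
    "gene_code": (r"\b[A-Z]{2,}\d+\b",
             "The BRCA1 gene is linked to..."),
    "icd_code": (r"\b[A-Z]\d{2}(\.\d+)?\b",
             "The patient was diagnosed with ICD code B34.9."),
    "percent": (r"\d+\.?\d*\s*%",
             "About 65% of the subjects showed improvement."),
    "citation": (r"\[\d+\]",
             "This was previously studied by Smith et al. [23]."),
    "email": (r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
             "For inquiries, contact editor@medjournal.org."),
}


def first_match(name):
    """Return the full text of the first match for the named example, or None."""
    pattern, sentence = EXAMPLES[name]
    match = re.search(pattern, sentence)
    return match.group(0) if match else None


results = {name: first_match(name) for name in EXAMPLES}
for name, value in results.items():
    print(f"{name}: {value}")
```

Using `re.search` with `group(0)` returns the whole matched text even when the pattern contains a group, as the ICD pattern does.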