Regular Expressions

Text identifiers:
  .              any single character
  [abc]          a, b, or c
  [^abc]         neither a, nor b, nor c
  abc|def        abc or def

Quantifiers:
  ?              0 or 1 occurrences of the preceding text identifier
  *              0 or more occurrences of the preceding text identifier
  +              1 or more occurrences of the preceding text identifier

Grouping:
  (text identifiers)
                 parentheses group several text identifiers into a single atomic unit.

Anchors:
  ^              start of line
  $              end of line

Escape:
  \              escapes the character that follows

Negation:
  !              a pattern can be negated by prefixing it with an exclamation mark.
                 This prefix belongs to the rewrite rule syntax (e.g., Apache mod_rewrite),
                 not to the regular expression itself.

Other:
  The expressions ([a-z]+) and ([0-9]+) denote a variable portion of the URL: any
  sequence of lowercase letters in the first case, any sequence of digits in the
  second. These variables are used to perform the rewrite. The $ symbol followed by
  a number ($1 and $2 in our case), used on the right-hand side of our rule, recalls
  (by position) the variables captured on the left-hand side.

----------------------------------------------

[]               specifies a character class, in which any single character within the brackets will be a match.
                 e.g., [xyz] will match either an x, y, or z.
[]+              character class in which any combination of items within the brackets will be a match.
                 e.g., [xyz]+ will match one or more x’s, y’s, z’s, or any combination of these characters.
[^]              specifies not within a character class.
                 e.g., [^xyz] will match any character that is neither x, y, nor z.
[a-z]            a dash (-) between two characters within a character class ([]) denotes the range of characters between them.
                 e.g., [a-zA-Z] matches all lowercase and uppercase letters from a to z.
a{n}             specifies an exact number, n, of the preceding character.
                 e.g., x{3} matches exactly three x’s.
a{n,}            specifies n or more of the preceding character.
                 e.g., x{3,} matches three or more x’s.
a{n,m}           specifies a range of numbers, between n and m, of the preceding character.
                 e.g., x{3,7} matches three, four, five, six, or seven x’s.
()               used to group characters together, thereby considering them as a single unit.
                 e.g., (perishable)?press will match press, with or without the perishable prefix.
^                denotes the beginning of a regex (regex = regular expression) test string.
                 i.e., begin argument with the following character.
$                denotes the end of a regex test string.
                 i.e., end argument with the previous character.
?                declares as optional the preceding character.
                 e.g., monzas? will match monza or monzas, while mon(za)? will match either mon or monza.
                 i.e., x? matches zero or one of x.
!                declares negation (as a prefix to the whole pattern in a rewrite rule or condition).
                 e.g., “!string” matches everything except “string”.
.                a dot (or period) indicates any single arbitrary character.
-                used in place of the substitution, instructs “not to” rewrite the URL,
                 as in “...domain.com.* - [F]”.
+                matches one or more of the preceding character.
                 e.g., G+ matches one or more G’s, while “.+” will match one or more characters of any kind.
*                matches zero or more of the preceding character.
                 e.g., use “.*” as a wildcard.
|                declares a logical “or” operator.
                 e.g., (x|y) matches x or y.
\                escapes special characters ( ^ $ ! . * | ).
                 e.g., use “\.” to indicate/escape a literal dot.
\.               indicates a literal dot (escaped).
/*               zero or more slashes (a bare / is simply a literal slash).
.*               zero or more arbitrary characters.
^$               defines an empty string.
^.*$             the standard pattern for matching everything.
[^/.]            defines one character that is neither a slash nor a dot.
[^/.]+           defines one or more characters, none of which is a slash or a dot.
http://          this is a literal statement: in this case, the literal character string “http://”.
^domain.*        defines a string that begins with the term “domain”, which may then be followed by any number of any characters.
^domain\.com$    defines the exact string “domain.com”.
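
A quick way to check entries from the list above is to try them in an interpreter. What follows is only a minimal sketch in Python, chosen here for convenience; the notes themselves do not use it. It relies on the standard re module, whose behaviour for these basic constructs should match what a rewrite engine does, although the two regex flavours are not identical. The rule shown in the comment, the names matches() and rewrite(), and the target index.php?section=...&id=... are all hypothetical and exist only for illustration. Note that Python writes positional references as \1 and \2 where rewrite rules use $1 and $2, and that the leading "!" is emulated by hand because it is not part of the regex itself.

```python
import re
from typing import Optional

# Hypothetical rule, for illustration only (not taken from the notes):
#   RewriteRule ^([a-z]+)/([0-9]+)$ index.php?section=$1&id=$2
RULE_PATTERN = re.compile(r"^([a-z]+)/([0-9]+)$")
RULE_TARGET = r"index.php?section=\1&id=\2"  # \1 and \2 play the role of $1 and $2


def matches(pattern: str, text: str) -> bool:
    """Test text against pattern; a leading '!' negates the result,
    emulating the negation prefix used in rewrite rules."""
    negate = pattern.startswith("!")
    if negate:
        pattern = pattern[1:]
    found = re.search(pattern, text) is not None
    return found != negate


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten target, or None when the rule does not match."""
    match = RULE_PATTERN.match(path)
    if match is None:
        return None
    # expand() substitutes the captured groups into the template by position.
    return match.expand(RULE_TARGET)


if __name__ == "__main__":
    for pattern, text in [
        (r"^domain\.com$", "domain.com"),
        (r"^domain\.com$", "domainxcom"),
        (r"^(perishable)?press$", "press"),
        (r"^x{3,7}$", "xx"),
        (r"^[^/.]+$", "index.php"),
        ("!string", "something else"),
    ]:
        print(f"{pattern!r} on {text!r}: {matches(pattern, text)}")
    print(rewrite("news/42"))
```

With the hypothetical rule above, rewrite("news/42") yields index.php?section=news&id=42, while rewrite("News/42") yields None because [a-z] does not cover uppercase letters.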