casacore
Regex.h
//# Regex.h: Regular expression class
//# Copyright (C) 1993,1994,1995,1996,1997,1999,2000,2001,2003
//# Associated Universities, Inc. Washington DC, USA.
//#
//# This library is free software; you can redistribute it and/or modify it
//# under the terms of the GNU Library General Public License as published by
//# the Free Software Foundation; either version 2 of the License, or (at your
//# option) any later version.
//#
//# This library is distributed in the hope that it will be useful, but WITHOUT
//# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
//# FITNESS FOR A PARTICULAR PURPOSE. See the GNU Library General Public
//# License for more details.
//#
//# You should have received a copy of the GNU Library General Public License
//# along with this library; if not, write to the Free Software Foundation,
//# Inc., 675 Massachusetts Ave, Cambridge, MA 02139, USA.
//#
//# Correspondence concerning AIPS++ should be addressed as follows:
//#        Internet email: casa-feedback@nrao.edu.
//#        Postal address: AIPS++ Project Office
//#                        National Radio Astronomy Observatory
//#                        520 Edgemont Road
//#                        Charlottesville, VA 22903-2475 USA

#ifndef CASA_REGEX_H
#define CASA_REGEX_H

//# Includes
#include <casacore/casa/aips.h>
#include <casacore/casa/iosfwd.h>
#include <regex>
#include <casacore/casa/BasicSL/String.h>

namespace casacore { //# NAMESPACE CASACORE - BEGIN

// <summary>
// Regular expression class (based on std::regex)
// </summary>

// <use visibility=export>

// <reviewed reviewer="Friso Olnon" date="1995/03/20" tests="tRegex" demos="">
// </reviewed>

// <synopsis>
// This class provides regular expression functionality, such as
// matching and searching in strings, comparison of expressions, and
// input/output. It is built on the standard C++ regular expression class
// using the ECMAScript syntax. That syntax is almost the same as the regular
// expression syntax used until March 2019, which was based on GNU's cregex.cc.
// ECMAScript offers more functionality (such as non-greedy matching),
// but there is a slight difference in how brackets are used. In the old
// regex they did not need to be escaped, while they have to be for ECMAScript.
// Furthermore, in the old Regex up to 9 backreferences could be given, so
// \15 meant the first backreference followed by a 5. In ECMAScript it means
// the 15th, and parentheses are needed to get the old meaning.
// These differences are resolved in the Regex constructor, which adds escape
// characters as needed. Thus existing code using Regex does not need to be changed.
//
// Apart from proper regular expressions, it also supports glob patterns
// (UNIX file name patterns) by means of a conversion to a proper regex string.
// Also ordinary strings and SQL-style patterns can be converted to a proper
// regex string.
// <p>
// See http://www.cplusplus.com/reference/regex/ECMAScript for the syntax.
// <dl>
// <dt> ^
// <dd> matches the beginning of a line.
// <dt> $
// <dd> matches the end of a line.
// <dt> .
// <dd> matches any character
// <dt> *
// <dd> zero or more times the previous subexpression.
// <dt> +
// <dd> one or more times the previous subexpression.
// <dt> ?
// <dd> zero or one time the previous subexpression.
// <dt> {n,m}
// <dd> interval operator to specify how many times a subexpression
// can match. See man page of egrep or regexp for more detail.
// <dt> []
// <dd> matches any character inside the brackets; e.g. <src>[abc]</src>.
// A hyphen can be used for a character range; e.g. <src>[a-z]</src>.
// <br>
// A ^ right after the opening bracket indicates "not";
// e.g. <src>[^abc]</src> means any character but a, b, and c.
// If ^ is not the first character, it is a literal caret.
// If - is the last character, it is a literal hyphen.
// If ] is the first character, it is a literal closing bracket.
// <br>
// Special character classes are
// [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
// [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
// The brackets are part of the name; e.g.
// <src>[^[:upper:]]</src> is equal to <src>[^A-Z]</src>.
// Note that [:upper:] is more portable, because A-Z fails
// for the EBCDIC character set.
// <dt> ( )
// <dd> grouping to change the normal operator precedence.
// <dt> |
// <dd> or operator. Matches left side or right side.
// <dt> \\1 through \\9
// <dd> Backreference to a subexpression. Matches the part of the string
// equal to the string part that matched the subexpression.
// </dl>
// Special characters have to be escaped with a backslash to use them
// literally. Inside square brackets, however, they should not be escaped.
// See the man page of egrep or regexp for more information about
// regular expressions.
// <p>
// Several global Regex objects are predefined for common functionality.
// <dl>
// <dt> RXwhite
// <dd> one or more whitespace characters
// <dt> RXint
// <dd> integer number (also negative)
// <dt> RXdouble
// <dd> double number (with e or E as exponent)
// <dt> RXalpha
// <dd> one or more alphabetic characters (lowercase and/or uppercase)
// <dt> RXlowercase
// <dd> lowercase alphabetic
// <dt> RXuppercase
// <dd> uppercase alphabetic
// <dt> RXalphanum
// <dd> one or more alphabetic/numeric characters (lowercase and/or uppercase)
// <dt> RXidentifier
// <dd> identifier name (first alphabetic or underscore, then zero or
// more alphanumeric characters and/or underscores)
// </dl>
// The static member function <src>fromPattern</src> converts a shell-like
// pattern to a String which can be used to create a Regex from it.
// A pattern has the following special characters:
// <dl>
// <dt> *
// <dd> Zero or more arbitrary characters.
// <dt> ?
// <dd> One arbitrary character
// <dt> []
// <dd> The same as [] in a regular expression (see above).
// In addition to ^ a ! can be used to indicate "not".
// <dt> {,}
// <dd> A brace expression which is like brace expansion in some shells.
// It is similar to the | construct in a regular expression.
// <br>
// E.g. <src>{abc,defg}</src> means <src>abc</src> or <src>defg</src>.
// Brace expressions can be nested and can contain other
// special characters.
// <br>
// E.g. St{Man*.{h,cc},Col?*.{h,cc,l,y}}
// <br>A literal comma or brace in a brace expression can be given by
// escaping it with a backslash.
// </dl>
// The static member function <src>fromSQLPattern</src> converts an SQL-like
// pattern to a String which can be used to create a Regex from it.
// A pattern has the following special characters:
// <dl>
// <dt> %
// <dd> Zero or more arbitrary characters.
// <dt> _
// <dd> One arbitrary character
// </dl>
// The static member function <src>fromString</src> converts a normal
// string to a regular expression. This function escapes characters in
// the string which are special in a regular expression. In this way a
// normal string can be passed to a function taking a regular expression.
//
// The static member function <src>makeCaseInsensitive</src> returns a
// new regular expression string containing the case-insensitive version of
// the given expression string.
// </synopsis>

// <example>
// <srcblock>
// Regex RXwhite("[ \n\t\r\v\f]+");
// (blank, newline, tab, return, vertical tab, formfeed)
// Regex RXint("[-+]?[0-9]+");
// Regex RXdouble("[-+]?(([0-9]+\\.[0-9]*)|([0-9]+)|(\\.[0-9]+))([eE][+-]?[0-9]+)?");
// Regex RXalpha("[A-Za-z]+");
// Regex RXlowercase("[a-z]+");
// Regex RXuppercase("[A-Z]+");
// Regex RXalphanum("[0-9A-Za-z]+");
// Regex RXidentifier("[A-Za-z_][A-Za-z0-9_]*");
// </srcblock>
// In RXdouble the . is escaped via a backslash to get it literally.
// The second backslash is needed to escape the backslash in C++.
// <srcblock>
// Regex rx1 (Regex::fromPattern ("St*.{h,cc}"));
// results in regexp "St.*\.((h)|(cc))"
// Regex rx2 (Regex::fromString ("tRegex.cc"));
// results in regexp "tRegex\.cc"
// </srcblock>
// </example>

//# <todo asof="2001/07/15">
//# </todo>


class Regex: std::regex
{
public:
    // Default constructor uses a zero-length regular expression.
    Regex();

    // Construct a regular expression from the string.
    // If toECMAScript=True, function toEcma is called to convert the old cregex
    // syntax to the new ECMAScript syntax.
    // If fast=True, matching efficiency is preferred over the efficiency of
    // constructing the regex object.
    explicit Regex(const String& exp, Bool fast=False, Bool toECMAScript=True);

    // Construct a new regex (using the default Regex constructor arguments).
    void operator=(const String& str);

    // Convert the possibly old-style regex to the Ecma regex which means
    // that unescaped [ and ] inside a bracket expression will be escaped and
    // that a numeric character after a backreference is enclosed in brackets
    // (otherwise the backreference uses multiple characters).
    static String toEcma(const String& rx);

    // Convert a shell-like pattern to a regular expression string.
    // This is useful for people who are more familiar with patterns
    // than with regular expressions.
    static String fromPattern(const String& pattern);

    // Convert an SQL-like pattern to a regular expression string.
    // This is useful for TaQL, which mimics SQL.
    static String fromSQLPattern(const String& pattern);

    // Convert a normal string to a regular expression string.
    // This consists of escaping the special characters.
    // This is useful when one wants to provide a normal string
    // (which may contain special characters) to a function working
    // on regular expressions.
    static String fromString(const String& str);

    // Create a case-insensitive regular expression string from the given
    // regular expression string.
    // It does so by inserting the lowercase and uppercase version of
    // characters in the input string into the output string.
    static String makeCaseInsensitive (const String& str);

    // Get the regular expression string.
    const String& regexp() const
      { return itsStr; }

    // Test if the regular expression matches (first part of) string <src>s</src>.
    // The return value gives the length of the matching string part,
    // or String::npos if there is no match or an error.
    // The string has <src>len</src> characters and the test starts at
    // position <src>pos</src>. The string may contain null characters.
    // Negative p is allowed to define the start from the end.
    //
    // <note role=tip>
    // Use the appropriate <linkto class=String>String</linkto> functions
    // to test if a string matches a regular expression.
    // <src>Regex::match</src> is pretty low-level.
    // </note>
    String::size_type match(const Char* s,
                            String::size_type len,
                            String::size_type pos=0) const;

    // Test if the regular expression matches the entire string.
    Bool fullMatch(const Char* s, String::size_type len) const;

    // Test if the regular expression occurs anywhere in string <src>s</src>.
    // The return value gives the position of the first substring
    // matching the regular expression. The length of that substring
    // is returned in <src>matchlen</src>.
    // The string has <src>len</src> characters and the test starts at
    // position <src>pos</src>. The string may contain null characters.
    // If the pos given is negative, the search starts -pos from the end.
    // <note role=tip>
    // Use the appropriate <linkto class=String>String</linkto> functions
    // to test if a regular expression occurs in a string.
    // <src>Regex::search</src> is pretty low-level.
    // </note>
    // <group>
    String::size_type search(const Char* s,
                             String::size_type len,
                             Int& matchlen,
                             Int pos=0) const;
    String::size_type find(const Char* s,
                           String::size_type len,
                           Int& matchlen,
                           String::size_type pos=0) const;
    // </group>

    // Search backwards.
    String::size_type searchBack(const Char* s,
                                 String::size_type len,
                                 Int& matchlen,
                                 uInt pos) const;

    // Write the regex string.
    friend ostream& operator<<(ostream& ios, const Regex& exp);

protected:
    String itsStr;                 // the reg. exp. string
};


// some built in regular expressions

extern const Regex RXwhite;       //# = "[ \n\t\r\v\f]+"
extern const Regex RXint;         //# = "-?[0-9]+"
extern const Regex RXdouble;      //# = "-?(([0-9]+\\.[0-9]*)|
                                  //#    ([0-9]+)|(\\.[0-9]+))
                                  //#    ([eE][+-]?[0-9]+)?"
extern const Regex RXalpha;       //# = "[A-Za-z]+"
extern const Regex RXlowercase;   //# = "[a-z]+"
extern const Regex RXuppercase;   //# = "[A-Z]+"
extern const Regex RXalphanum;    //# = "[0-9A-Za-z]+"
extern const Regex RXidentifier;  //# = "[A-Za-z_][A-Za-z0-9_]*"


} //# NAMESPACE CASACORE - END

#endif
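
The synopsis in the header describes three string conversions (shell-like pattern, SQL-like pattern, and plain string) that each produce an ECMAScript regex string. The sketch below illustrates the idea with plain std::regex and std::string. It is not casacore's implementation: the function names globToRegex, sqlToRegex and escapeLiteral are hypothetical, and the glob version is deliberately simplified (no bracket expressions, no nested braces, no check for unbalanced braces).

```cpp
#include <regex>
#include <string>

// Characters that are special in an ECMAScript regular expression.
static bool isRegexSpecial(char c)
{
    static const std::string special = "^$.*+?()[]{}|\\";
    return special.find(c) != std::string::npos;
}

// Append c to out, escaping it with a backslash if it is special.
static void appendLiteral(std::string& out, char c)
{
    if (isRegexSpecial(c)) out += '\\';
    out += c;
}

// Hypothetical helper: escape a plain string so it matches literally.
std::string escapeLiteral(const std::string& s)
{
    std::string out;
    for (char c : s) appendLiteral(out, c);
    return out;
}

// Hypothetical helper: SQL-like pattern, '%' = any sequence, '_' = one char.
std::string sqlToRegex(const std::string& pattern)
{
    std::string out;
    for (char c : pattern) {
        if (c == '%')      out += ".*";
        else if (c == '_') out += '.';
        else               appendLiteral(out, c);
    }
    return out;
}

// Hypothetical helper: simplified shell-like pattern.
// Handles '*', '?', one level of {a,b} alternatives and backslash escapes.
std::string globToRegex(const std::string& pattern)
{
    std::string out;
    bool inBrace = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            appendLiteral(out, pattern[++i]);   // escaped char is literal
        } else if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else if (c == '{' && !inBrace) {
            inBrace = true;
            out += "((";                        // open first alternative
        } else if (c == ',' && inBrace) {
            out += ")|(";                       // next alternative
        } else if (c == '}' && inBrace) {
            inBrace = false;
            out += "))";                        // close the alternatives
        } else {
            appendLiteral(out, c);
        }
    }
    return out;
}
```

With these helpers, globToRegex("St*.{h,cc}") yields the same string as the header's own fromPattern example, St.*\.((h)|(cc)), and escapeLiteral("tRegex.cc") yields tRegex\.cc. The nested pattern from the synopsis, St{Man*.{h,cc},Col?*.{h,cc,l,y}}, is outside what this sketch handles.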
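
The comments on match and search describe two return conventions: match gives the length of a match anchored at the start position (or npos), while search gives the position of the first matching substring and reports its length through matchlen. The following sketch shows one way those conventions can be expressed with std::regex alone. The free functions matchLength and searchFirst are hypothetical and not part of casacore. They take a std::string instead of a Char pointer plus length, they do not model negative start positions, and resetting matchlen to 0 on failure is an assumption of the sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: length of the match that starts exactly at pos,
// or std::string::npos if the expression does not match there.
std::string::size_type matchLength(const std::regex& rx,
                                   const std::string& s,
                                   std::string::size_type pos = 0)
{
    if (pos > s.size()) return std::string::npos;
    std::smatch m;
    // match_continuous forces the match to begin at the first iterator.
    if (std::regex_search(s.cbegin() + pos, s.cend(), m, rx,
                          std::regex_constants::match_continuous)) {
        return static_cast<std::string::size_type>(m.length(0));
    }
    return std::string::npos;
}

// Hypothetical helper: position of the first matching substring at or
// after pos; the length of that substring is stored in matchlen.
// Returns std::string::npos (and sets matchlen to 0) if nothing matches.
std::string::size_type searchFirst(const std::regex& rx,
                                   const std::string& s,
                                   int& matchlen,
                                   std::string::size_type pos = 0)
{
    matchlen = 0;   // assumption of this sketch
    if (pos > s.size()) return std::string::npos;
    std::smatch m;
    if (!std::regex_search(s.cbegin() + pos, s.cend(), m, rx)) {
        return std::string::npos;
    }
    matchlen = static_cast<int>(m.length(0));
    // position(0) is relative to the start of the searched range.
    return pos + static_cast<std::string::size_type>(m.position(0));
}
```

For the expression [0-9]+ and the input "abc123def", searchFirst reports position 3 with a match length of 3, whereas matchLength at position 0 reports npos because the input does not start with a digit.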