Regexp (zero-width negative look-ahead)

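Just saw this on IRC:
(22:09:12) stormax: There's a regex I don't know how to write. I want to search for <? but exclude <?php
(22:09:21) stormax: Does anyone more experienced know how to write it?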
.
.
.
(22:31:19) mhsin: stormax: <?(!php) # 22:09 < stormax> There's a regex I don't know how to write. I want to search for <? but exclude
(22:31:19) yinjieh: Then I would < ?php
(22:31:33) yinjieh: and then phpphp -> php
(22:31:53) mhsin: stormax: (!something) is called zero-width negative look-ahead
(22:32:01) mhsin: Ah, that's wrong
(22:32:06) mhsin: (?!something) is the correct form
(22:32:24) mhsin: aaa(?!bbb) will match "an aaa that is not followed by bbb"
Then I googled a bit
and found an article,
Perl Regular Expressions Tip Sheet,
which covers
zero-width negative look-ahead.
I also tried it out with grep,
and it really does work:

$ grep -P 'aaa(?!bbb)' ok.php
You really do pick up a lot from watching IRC :-p
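
For the original question (find "<?" but skip "<?php"), the same look-ahead can be tried in Python, whose "re" module also supports "(?!...)". This is only a minimal sketch; the sample lines are made up for illustration and are not from the IRC log.

```python
import re

# "<?" that is NOT immediately followed by "php".
# The "?" must be escaped because it is a quantifier in regex syntax.
SHORT_TAG = re.compile(r"<\?(?!php)")

# Hypothetical sample input, invented for this example.
lines = [
    "<?php echo 1; ?>",
    "<? echo 2; ?>",
    "<?= $value ?>",
]

# Keep only the lines that contain a short open tag.
short_tag_lines = [line for line in lines if SHORT_TAG.search(line)]
print(short_tag_lines)
```

With grep, the equivalent pattern would presumably be '<\?(?!php)' passed to "grep -P".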

UPDATE: 2007/11/08
Added the corresponding usage in Vim (from Vim's help):

							*/zero-width*
	When using "\@=" (or "^", "$", "\<", "\>") no characters are included
	in the match.  These items are only used to check if a match can be
	made.  This can be tricky, because a match with following items will
	be done in the same position.  The last example above will not match
	"foobarfoo", because it tries to match "foo" in the same position where
	"bar" matched.

	Note that using "\&" works the same as using "\@=": "foo\&.." is the
	same as "\(foo\)\@=..".  But using "\&" is easier, you don't need the
	braces.
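
The "zero width" idea is easier to see by running it. The sketch below uses Perl-style "(?=...)" in Python as a stand-in for Vim's look-ahead; I am assuming the "foobarfoo" remark refers to a pattern of the shape "foo, look-ahead for bar, then foo".

```python
import re

text = "foobarfoo"

# The look-ahead checks that "bar" follows, but consumes nothing.
m = re.search(r"foo(?=bar)", text)
matched = m.group()  # only "foo" is part of the match
end = m.end()        # the match ends at index 3, right before "bar"

# After the look-ahead, matching resumes at the same position (index 3),
# where the text is "bar", so a following literal "foo" cannot match.
no_match = re.search(r"foo(?=bar)foo", text)

print(matched, end, no_match)
```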

							*/\@!*
\@!	Matches with zero width if the preceding atom does NOT match at the
	current position. |/zero-width| {not in Vi}
	Like "(?!pattern)" in Perl.
	Example			matches 
	foo\(bar\)\@!		any "foo" not followed by "bar"
	a.\{-}p\@!		"a", "ap", "app", etc. not followed by a "p"
	if \(\(then\)\@!.\)*$	"if " not followed by "then"

	Using "\@!" is tricky, because there are many places where a pattern
	does not match.  "a.*p\@!" will match from an "a" to the end of the
	line, because ".*" can match all characters in the line and the "p"
	doesn't match at the end of the line.  "a.\{-}p\@!" will match any
	"a", "ap", "aap", etc. that isn't followed by a "p", because the "."
	can match a "p" and "p\@!" doesn't match after that.

	You can't use "\@!" to look for a non-match before the matching
	position: "\(foo\)\@!bar" will match "bar" in "foobar", because at the
	position where "bar" matches, "foo" does not match.  To avoid matching
	"foobar" you could use "\(foo\)\@!...bar", but that doesn't match a
	bar at the start of a line.  Use "\(foo\)\@<!bar".
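
The same pitfall exists with Perl-style syntax, so it can be checked quickly in Python. This is just a sketch with "(?!foo)" and "(?<!foo)" standing in for the Vim look-ahead and look-behind forms.

```python
import re

# Look-ahead placed before "bar" tests the text AT the position of "bar",
# not the text before it, so "bar" inside "foobar" still matches.
ahead = re.search(r"(?!foo)bar", "foobar")

# The three-dot workaround rejects "foobar" but also needs three
# characters before "bar", so it misses a "bar" at the start of a line.
dots_foobar = re.search(r"(?!foo)...bar", "foobar")
dots_start = re.search(r"(?!foo)...bar", "bar")

# Negative look-behind handles both cases.
behind_foobar = re.search(r"(?<!foo)bar", "foobar")
behind_start = re.search(r"(?<!foo)bar", "bar")

print(ahead, dots_foobar, dots_start, behind_foobar, behind_start)
```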

							*/\@<=*
\@<=	Matches with zero width if the preceding atom matches just before what
	follows. |/zero-width| {not in Vi}
	Like "(?<=pattern)" in Perl, but Vim allows non-fixed-width patterns.
	Example			matches 
	\(an\_s\+\)\@<=file	"file" after "an" and white space or an
				end-of-line
	For speed it's often much better to avoid this multi.  Try using "\zs"
	instead |/\zs|.  To match the same as the above example:
		an\_s\+\zsfile

	"\@<=" and "\@<!" check for matches just before what follows.
	Theoretically these matches could start anywhere before this position.
	But to limit the time needed, only the line where what follows matches
	is searched, and one line before that (if there is one).  This should
	be sufficient to match most things and not be too slow.
	The part of the pattern after "\@<=" and "\@<!" are checked for a
	match first, thus things like "\1" don't work to reference \(\) inside
	the preceding atom.  It does work the other way around:
	Example			matches 
	\1\@<=,\([a-z]\+\)	",abc" in "abc,abc"
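
The "Vim allows non-fixed-width patterns" remark matters when porting such a pattern elsewhere. As a rough illustration, Python's "re" module rejects variable-width look-behind, and a capture group can play a role similar to Vim's match-start marker. The sample text is made up.

```python
import re

text = "open an   file now"

# Python's re only accepts fixed-width look-behind, so a variable-width
# one (whitespace repeated one or more times) fails to compile.
try:
    re.compile(r"(?<=an\s+)file")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# A fixed-width look-behind is accepted.
fixed = re.search(r"(?<=\s)file", text)

# Alternative: match the prefix normally and read the span of a group,
# which gives the position of "file" without any look-behind.
m = re.search(r"an\s+(file)", text)
file_start = m.start(1)

print(variable_width_ok, fixed.start(), file_start)
```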

							*/\@<!*
\@<!	Matches with zero width if the preceding atom does NOT match just
	before what follows.  Thus this matches if there is no position in the
	current or previous line where the atom matches such that it ends just
	before what follows.  |/zero-width| {not in Vi}
	Like "(?<!pattern)" in Perl, but Vim allows non-fixed-width patterns.
	The match with the preceding atom is made to end just before the match
	with what follows, thus an atom that ends in ".*" will work.
	Warning: This can be slow (because many positions need to be checked
	for a match).
	Example			matches 
	\(foo\)\@<!bar		any "bar" that's not in "foobar"
	\(\/\/.*\)\@<!in	"in" which is not after "//"

							*/\@>*
\@>	Matches the preceding atom like matching a whole pattern. {not in Vi}
	Like "(?>pattern)" in Perl.
	Example		matches 
	\(a*\)\@>a	nothing (the "a*" takes all the "a"'s, there can't be
			another one following)

	This matches the preceding atom as if it was a pattern by itself.  If
	it doesn't match, there is no retry with shorter sub-matches or
	anything.  Observe this difference: "a*b" and "a*ab" both match
	"aaab", but in the second case the "a*" matches only the first two
	"a"s.  "\(a*\)\@>ab" will not match "aaab", because the "a*" matches
	the "aaa" (as many "a"s as possible), thus the "ab" can't match.
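
To see the "no retry with shorter sub-matches" behaviour outside Vim, here is a small Python sketch. Native "(?>...)" is only available in Python 3.11 and later as far as I know, so this uses the common emulation with a look-ahead plus a backreference, "(?=(a*))\1"; that emulation is my substitution, not something from the Vim help.

```python
import re

text = "aaab"

# Ordinary backtracking: "a*" gives one "a" back so that "ab" can match.
backtracking = re.search(r"(a*)ab", text)
given_to_group = backtracking.group(1)  # the "a*" part keeps only "aa"

# Atomic-style: the look-ahead captures as many "a"s as possible and is
# never re-entered, and the backreference then consumes exactly that
# text, so no "a" is left over for the following "ab".
atomic_like = re.search(r"(?=(a*))\1ab", text)

print(backtracking.group(), given_to_group, atomic_like)
```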
