<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>not just random &#187; Z</title>
	<atom:link href="http://www.notjustrandom.com/category/algorithms/z/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.notjustrandom.com</link>
	<description></description>
	<lastBuildDate>Thu, 24 Jun 2010 18:12:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>The Z Algorithm</title>
		<link>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/</link>
		<comments>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/#comments</comments>
		<pubDate>Sat, 01 Nov 2008 22:26:24 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Z]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/?p=128</guid>
		<description><![CDATA[In Algorithms on Strings, Trees and Sequences, Dan Gusfield presents the Z algorithm. A string prefix P is a substring of S that begins at S[0]. The Z algorithm calculates for each position k > 0 in S the maximum length of P that is matched by a substring starting at k. The algorithm performs [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.amazon.com/Algorithms-Strings-Trees-Sequences-Computational/dp/0521585198">Algorithms on Strings, Trees and Sequences</a>, Dan Gusfield presents the Z algorithm. </p>
<p>A prefix P of a string S is a substring of S that begins at S[0]. For each position k > 0 in S, the Z algorithm calculates the length of the longest substring starting at k that matches a prefix of S. The algorithm runs in linear time by keeping track of previously calculated values and recognizing whether the currently examined start position lies within a previously matched substring.</p>
<p><strong>An example</strong></p>
<p>Here is a string S and its Z-values for each index position k:</p>
<p>S = &#8216;aabcaabxaaaz&#8217;</p>
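<table>
<tr>
<th width="35%">k</th>
<th width="35%">s[k]</th>
<th width="30%">Z</th>
</tr>
<tr>
<td>0</td>
<td>a</td>
<td>n/a</td>
</tr>
<tr>
<td>1</td>
<td>a</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>b</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>c</td>
<td>0</td>
</tr>
<tr>
<td>4</td>
<td>a</td>
<td>3</td>
</tr>
<tr>
<td>5</td>
<td>a</td>
<td>1</td>
</tr>
<tr>
<td>6</td>
<td>b</td>
<td>0</td>
</tr>
<tr>
<td>7</td>
<td>x</td>
<td>0</td>
</tr>
<tr>
<td>8</td>
<td>a</td>
<td>2</td>
</tr>
<tr>
<td>9</td>
<td>a</td>
<td>2</td>
</tr>
<tr>
<td>10</td>
<td>a</td>
<td>1</td>
</tr>
<tr>
<td>11</td>
<td>z</td>
<td>0</td>
</tr>
</table>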
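<p>To make the definition concrete, here is a minimal sketch that computes the Z-values directly from it, without any of the bookkeeping that makes the real algorithm linear. The name <code>naive_z</code> and the use of <code>None</code> for the undefined entry at index 0 are my own choices for illustration; this is a quadratic baseline, not the algorithm from the book.</p>

```python
def naive_z(s):
    """Compute Z-values straight from the definition, in O(n^2) time."""
    n = len(s)
    z = [None] * n  # z[0] stays None, mirroring the "n/a" entry in the table
    for k in range(1, n):
        length = 0
        # Extend the match while s[k..] keeps agreeing with the prefix s[0..].
        while k + length < n and s[length] == s[k + length]:
            length += 1
        z[k] = length
    return z


print(naive_z('aabcaabxaaaz'))
# [None, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
```

<p>The output matches the Z column of the table, which makes this version handy as a reference when testing a faster implementation.</p>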
<p><strong>Step by step</strong></p>
<p>Let&#8217;s look at these one at a time to show how easily the Z-values can be computed.</p>
<p><strong>k = 1</strong></p>
<p>Then s[0] = s[1], but s[1] != s[2], so Z[1] = 1. The matched substring has the boundaries l = 1 and r = 1, so s[l..r] = s[1..1] = &#8216;a&#8217;.</p>
<p><strong>k = 2</strong></p>
<p>Then k > r, which means s[2] is outside the previously discovered substring. s[0] != s[2], so<br />
Z[2] = 0.</p>
<p><strong>k = 3</strong></p>
<p>In the same manner, if k = 3, s[0] != s[3], so Z[3] = 0.</p>
<p><strong>k = 4</strong></p>
<p>Then s[0] = s[4], s[1] = s[5], s[2] = s[6], but s[3] != s[7], so Z[4] = 3. The matched substring has the boundaries l = 4 and r = 6, so s[l..r] = s[4..6] = &#8216;aab&#8217;</p>
<p><strong>k = 5</strong></p>
<p>Then k <= r: s[5] is within a previously discovered substring. The corresponding position in the prefix is k - l = 5 - 4 = 1, and the Z-value found there is Z[1] = 1. Since that value is smaller than the length of the remaining substring s[k..r] = s[5..6], which is r - k + 1 = 2, there is no need to perform further character comparisons: Z[5] = Z[1] = 1. l and r remain unchanged.</p>
<p><strong>k = 6</strong></p>
<p>In the same manner, if k = 6, then Z[6] = Z[2] = 0.</p>
<p><strong>k = 7</strong></p>
<p>Then k > r, s[0] != s[7], so Z[7] = 0.</p>
<p><strong>k = 8</strong></p>
<p>Then k > r, s[0] = s[8], s[1] = s[9] and s[2] != s[10], so Z[8] = 2. The matched substring has the boundaries l = 8 and r = 9, so s[l..r] = s[8..9] = &#8216;aa&#8217;.</p>
<p><strong>k = 9</strong></p>
<p>Then k <= r, so s[k] is within a previously discovered substring. The corresponding position in the prefix is k - l = 9 - 8 = 1, and the Z-value found there is Z[1] = 1. That value is not smaller than the length of the remaining substring s[k..r] = s[9..9] = &#8216;a&#8217;, so additional character comparisons need to be performed. Let b = r - k + 1 = 9 - 9 + 1 = 1. Z[9] will be >= b. s[b] = s[1] = s[k + b] = s[10], but s[2] != s[11], so Z[9] = 2. The new substring has the boundaries l = 9 and r = 10.</p>
<p><strong>k = 10</strong></p>
<p>Then k = r, so s[k] is within the previously discovered substring s[l..r] = s[9..10]. The corresponding Z-value is Z[k - l] = Z[10 - 9] = Z[1] = 1. Since that value is equal to the length of the remaining substring s[k..r], additional comparisons are performed, but since s[1] != s[11], Z[10] = 1. The new substring has the boundaries l = r = 10.</p>
<p><strong>k = 11</strong></p>
<p>Then k > r, and since s[0] != s[11], Z[11] = 0.</p>
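<p>The walkthrough above can be replayed in code. The following is a sketch, assuming the same inclusive boundaries l and r used in the steps; the function name <code>z_trace</code> and the tuple layout (k, Z[k], l, r) are hypothetical, chosen only to make each step visible.</p>

```python
def z_trace(s):
    """Return (k, Z[k], l, r) for each k >= 1, using inclusive bounds l..r."""
    n = len(s)
    z = [0] * n  # z[0] is only a placeholder and is never read
    l = r = 0
    steps = []
    for k in range(1, n):
        if k > r:
            # Outside the window: compare against the prefix from scratch.
            length = 0
        else:
            b = r - k + 1  # characters left in the window s[k..r]
            if z[k - l] < b:
                # The earlier value fits strictly inside the window: reuse it.
                z[k] = z[k - l]
                steps.append((k, z[k], l, r))
                continue
            # The match reaches the window's edge: keep comparing from offset b.
            length = b
        while k + length < n and s[length] == s[k + length]:
            length += 1
        z[k] = length
        if length > 0:
            l, r = k, k + length - 1
        steps.append((k, z[k], l, r))
    return steps


for step in z_trace('aabcaabxaaaz'):
    print(step)
```

<p>Each printed tuple corresponds to one of the steps k = 1 to k = 11, so the values of l and r can be checked against the text line by line.</p>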
<p><strong>Implementation</strong></p>
<p>Here is the implementation of the algorithm (as outlined in the book) in Python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getZ<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        result = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
&nbsp;
        l = r = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #66cc66;">&gt;</span> r:
                        zk = <span style="color: #ff4500;">0</span>
                        <span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                                <span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
                                        s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
                                        <span style="color: #ff7700;font-weight:bold;">pass</span>
                                <span style="color: #ff7700;font-weight:bold;">else</span>:
                                        <span style="color: #ff7700;font-weight:bold;">break</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> si <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>:
                                zk = si
                                r = zk + k - <span style="color: #ff4500;">1</span>
                                l = k
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        kOld = k - l
                        zOld = result<span style="color: black;">&#91;</span>kOld<span style="color: black;">&#93;</span>
                        b = r - k + <span style="color: #ff4500;">1</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> zOld <span style="color: #66cc66;">&lt;</span> b:
                                zk = zOld
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                zk = b
&nbsp;
                                <span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>b, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                                        <span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> \
                                                <span style="color: #ff7700;font-weight:bold;">and</span> s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> \
                                                == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
                                                <span style="color: #ff7700;font-weight:bold;">pass</span>
                                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                                <span style="color: #ff7700;font-weight:bold;">break</span>
                                zk = si
                                r = zk + k - <span style="color: #ff4500;">1</span>
                                l = k
                result<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = zk
        <span style="color: #ff7700;font-weight:bold;">return</span> result</pre></div></div>

<p>That code deserves some cleanup, but it does yield the correct result:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">s = <span style="color: #483d8b;">'aabcaabxaaaz'</span>
z = getZ<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>k, z<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: #ff4500;">1</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">2</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">3</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">4</span> <span style="color: #ff4500;">3</span>
<span style="color: #ff4500;">5</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">6</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">7</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">8</span> <span style="color: #ff4500;">2</span>
<span style="color: #ff4500;">9</span> <span style="color: #ff4500;">2</span>
<span style="color: #ff4500;">10</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">11</span> <span style="color: #ff4500;">0</span></pre></div></div>

<p><strong>Next?</strong></p>
<p>The Z algorithm can serve as a preprocessing step for further string analysis. With little modification it can also be turned into a simple, linear-time exact-matching algorithm.</p>
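<p>As a rough illustration of the exact-matching idea, one common approach is to compute the Z-values of the pattern, a separator, and the text joined together: any position whose Z-value equals the pattern length marks an occurrence. The sketch below assumes a separator character (<code>sep</code>) that appears in neither string; the names <code>z_values</code> and <code>find_all</code> are my own, and the Z routine is a compact list-based variant rather than the code shown above.</p>

```python
def z_values(s):
    """Z-values as a list; z[0] is set to 0 as a placeholder."""
    n = len(s)
    z = [0] * n
    l = r = 0  # inclusive bounds of the rightmost match found so far
    for k in range(1, n):
        # Start from what the window already guarantees, if anything.
        length = min(z[k - l], r - k + 1) if k <= r else 0
        while k + length < n and s[length] == s[k + length]:
            length += 1
        z[k] = length
        if length > 0 and k + length - 1 > r:
            l, r = k, k + length - 1
    return z


def find_all(pattern, text, sep='\x00'):
    """Return every index in text where pattern occurs.

    Assumes sep occurs in neither pattern nor text.
    """
    if not pattern:
        return []
    m = len(pattern)
    z = z_values(pattern + sep + text)
    # A Z-value of m at position i means pattern matches text at i - (m + 1).
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]


print(find_all('aab', 'aabcaabxaaaz'))
# [0, 4]
```

<p>Because the separator cannot match anything in the pattern, no Z-value in the text part can exceed the pattern length, which keeps the check a simple equality test.</p>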
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
