<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>not just random &#187; python</title>
	<atom:link href="http://www.notjustrandom.com/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.notjustrandom.com</link>
	<description></description>
	<lastBuildDate>Tue, 08 Nov 2011 00:38:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>(Very) simple Twitter user similarity</title>
		<link>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/</link>
		<comments>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 20:54:28 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1420</guid>
		<description><![CDATA[In this post I am using basic web data extraction combined with ideas and python code from Toby Segaran&#8216;s Programming Collective Intelligence to show a (very) simple Twitter user similarity mechanism. Generating a list of users There are lots of ways of putting together a list of Twitter users. If you&#8217;re on Twitter, you could [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I am using basic web data extraction combined with ideas and Python code from <a href="http://blog.kiwitobes.com/">Toby Segaran</a>&#8217;s <a href="http://en.wikipedia.org/wiki/Programming_Collective_Intelligence">Programming Collective Intelligence</a> to show a (very) simple Twitter user similarity mechanism.</p>
<p><strong>Generating a list of users</strong></p>
<p>There are lots of ways of putting together a list of Twitter users. If you&#8217;re on Twitter, you could use the list of your followers or the list of those you are following. You could also extract user names from a list of <a href="http://search.twitter.com/search?q=technology">search results</a>, the <a href="http://twitter.com/public_timeline">public timeline</a> or a <a href="http://wefollow.com/">twitter directory</a>. The following code uses a regular expression to extract the user names from a wefollow page.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> getWefollowTwitterUsers<span style="color: black;">&#40;</span>category = <span style="color: #483d8b;">&quot;tech&quot;</span><span style="color: black;">&#41;</span>:
        users = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        url = <span style="color: #483d8b;">&quot;http://wefollow.com/twitter/&quot;</span>
        url += category
        html = <span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        users = <span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;&quot;nofollow&quot;&gt;(.*?)&lt;/a&gt;&lt;/strong&gt;&quot;&quot;&quot;</span>, html<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> users
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> getWefollowTwitterUsers<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
Output:
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'kevinrose'</span>, <span style="color: #483d8b;">'google'</span>, <span style="color: #483d8b;">'LeoLaporte'</span>, <span style="color: #483d8b;">'mashable'</span>, <span style="color: #483d8b;">'TechCrunch'</span>, <span style="color: #483d8b;">'Veronica'</span>, 
<span style="color: #483d8b;">'alexalbrecht'</span>, <span style="color: #483d8b;">'ev'</span>, <span style="color: #483d8b;">'patricknorton'</span>, <span style="color: #483d8b;">'Scobleizer'</span>, <span style="color: #483d8b;">'woot'</span>, <span style="color: #483d8b;">'ijustine'</span>, <span style="color: #483d8b;">'timoreilly'</span>, 
<span style="color: #483d8b;">'guykawasaki'</span>, <span style="color: #483d8b;">'engadget'</span>, <span style="color: #483d8b;">'CaliLewis'</span>, <span style="color: #483d8b;">'chrispirillo'</span>, <span style="color: #483d8b;">'wired'</span>, <span style="color: #483d8b;">'ryan'</span>, <span style="color: #483d8b;">'sarahlane'</span>, 
<span style="color: #483d8b;">'ambermac'</span>, <span style="color: #483d8b;">'ginatrapani'</span>, <span style="color: #483d8b;">'tferriss'</span>, <span style="color: #483d8b;">'fforward'</span>, <span style="color: #483d8b;">'mollywood'</span><span style="color: black;">&#93;</span></pre></div></div>

<p><strong>Retrieving a list of messages for each user</strong></p>
<p>Each user&#8217;s messages are available in an RSS feed of the format http://twitter.com/statuses/user_timeline/&lt;user&gt;.rss?count=[1..200]. The count parameter is optional and controls the maximum number of messages contained in the feed. The following code uses <a href="http://feedparser.org/">Universal Feed Parser</a> to extract the entries from the data feed.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> feedparser
&nbsp;
<span style="color: #808080; font-style: italic;"># [...]</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> getUserMessages<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>:
        url = <span style="color: #483d8b;">&quot;http://twitter.com/statuses/user_timeline/&quot;</span> + <span style="color: #dc143c;">user</span> + <span style="color: #483d8b;">&quot;.rss?count=200&quot;</span>
        feed_data = feedparser.<span style="color: black;">parse</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> feed_data.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;entries&quot;</span>, <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>

<p><strong>Generating keyword scores</strong></p>
<p>The following code goes through a user&#8217;s messages, breaks them into fragments and counts the number of instances for each encountered word.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getKeywordScores<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, messages<span style="color: black;">&#41;</span>:
        keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        blacklist = <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;a&quot;</span>, <span style="color: #483d8b;">&quot;an&quot;</span>, <span style="color: #483d8b;">&quot;by&quot;</span>, <span style="color: #483d8b;">&quot;on&quot;</span>, <span style="color: #483d8b;">&quot;that&quot;</span>, <span style="color: #483d8b;">&quot;the&quot;</span>, <span style="color: #483d8b;">&quot;these&quot;</span>, <span style="color: #483d8b;">&quot;this&quot;</span>, <span style="color: #483d8b;">&quot;those&quot;</span>, <span style="color: #483d8b;">&quot;to&quot;</span><span style="color: black;">&#93;</span>
        <span style="color: #808080; font-style: italic;"># and many more words</span>
        blacklist.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> message <span style="color: #ff7700;font-weight:bold;">in</span> messages:
                tweet = message<span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;summary&quot;</span><span style="color: black;">&#93;</span>
                words = <span style="color: #dc143c;">re</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot; &quot;</span>, tweet<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words:
                        word = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;^<span style="color: #000099; font-weight: bold;">\W</span>*&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span>, word<span style="color: black;">&#41;</span>
                        word = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\W</span>*$&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span>, word<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> word.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;http://&quot;</span><span style="color: black;">&#41;</span>:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        word = word.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> word <span style="color: #ff7700;font-weight:bold;">in</span> blacklist:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> word:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        count = keywords.<span style="color: black;">get</span><span style="color: black;">&#40;</span>word, <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
                        keywords<span style="color: black;">&#91;</span>word<span style="color: black;">&#93;</span> = count + <span style="color: #ff4500;">1</span>
        final_keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> keywords:
                <span style="color: #ff7700;font-weight:bold;">if</span> keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
                        final_keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> final_keywords</pre></div></div>
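<p>As a quick illustration of the cleanup steps above, here is a minimal sketch that applies the same stripping, lowercasing and filtering to a single made-up message. The function name tokenize(), the sample text and the shortened stop word set are assumptions for illustration and are not part of the original script.</p>

```python
import re

# Hypothetical, shortened stop word set (the blacklist in the post is longer).
STOP_WORDS = set(["a", "an", "by", "on", "that", "the", "this", "to"])

def tokenize(text, stop_words=STOP_WORDS):
    """Split a message on spaces and return the cleaned, lowercased words."""
    tokens = []
    for word in text.split(" "):
        # Strip leading and trailing non-word characters such as '#' or '!'.
        word = re.sub(r"^\W+|\W+$", "", word)
        if word.startswith("http://"):
            continue  # URLs are skipped, as in getKeywordScores()
        word = word.lower()
        if word and word not in stop_words:
            tokens.append(word)
    return tokens

sample = "Reading the #Python docs on http://example.com/x today!"
print(tokenize(sample))  # ['reading', 'python', 'docs', 'today']
```

<p>Counting the returned tokens per user then gives the keyword scores described above.</p>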

<p><strong>Computing similarities</strong></p>
<p>The code for computing similarity scores, and the ideas behind it, are presented in <a href="http://en.wikipedia.org/wiki/Programming_Collective_Intelligence">Programming Collective Intelligence</a>. The source code for the book <a href="http://blog.kiwitobes.com/?p=44">is available online</a>. The relevant pieces are in chapter2/recommendations.py &#8211; sim_distance() (<a href="http://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a>), sim_pearson() (<a href="http://en.wikipedia.org/wiki/Correlation_and_dependence#Pearson.27s_product-moment_coefficient">Pearson coefficient</a>) and topMatches(). The latter compares one user to all others and returns the list of the <em>n</em> most similar users along with their respective similarity scores.</p>
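<p>For readers without the book at hand, the following is a rough sketch of how a Pearson-based similarity and a top-matches helper could look for dictionaries of keyword counts. It is not the book&#8217;s code: the names pearson() and top_matches(), the handling of edge cases and the sample counts are all assumptions made for illustration.</p>

```python
from math import sqrt

def pearson(prefs, a, b):
    """Pearson correlation of the counts for keywords that a and b share."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    n = len(shared)
    if n == 0:
        return 0.0  # nothing in common, treated here as "no similarity"
    xs = [float(prefs[a][k]) for k in shared]
    ys = [float(prefs[b][k]) for k in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant counts
    return cov / sqrt(var_x * var_y)

def top_matches(prefs, user, n=3, similarity=pearson):
    """Return the n (score, other_user) pairs most similar to user."""
    scores = [(similarity(prefs, user, other), other)
              for other in prefs if other != user]
    scores.sort(reverse=True)
    return scores[:n]

# Made-up keyword counts for three hypothetical users.
counts = {
    "alice": {"python": 4, "twitter": 2, "data": 1},
    "bob": {"python": 8, "twitter": 4, "data": 2, "cats": 5},
    "carol": {"python": 1, "twitter": 2, "data": 4},
}
print(top_matches(counts, "alice", n=2))
```

<p>In this made-up data, bob uses the shared keywords in the same proportions as alice and scores close to 1, while carol uses them in roughly the opposite proportions and gets a negative score.</p>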
<p><strong>Similar users</strong></p>
<p>The following code brings it all together and demonstrates how we can show users that are similar to a specific one, given the computed dictionary of keyword scores.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> recommendations <span style="color: #ff7700;font-weight:bold;">import</span> sim_pearson, sim_distance, topMatches
<span style="color: #808080; font-style: italic;"># [...]</span>
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
        users = getWefollowTwitterUsers<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># add my own</span>
        users.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abendig&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> users
        user_keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #dc143c;">user</span> <span style="color: #ff7700;font-weight:bold;">in</span> users:
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;processing data for:&quot;</span>, <span style="color: #dc143c;">user</span>
                messages = getUserMessages<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>
                user_keywords<span style="color: black;">&#91;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#93;</span> = getKeywordScores<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">user</span>, messages = messages<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># Similarity between the first user and three others</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># My top three matches</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> topMatches<span style="color: black;">&#40;</span>user_keywords, <span style="color: #483d8b;">&quot;abendig&quot;</span>, n = <span style="color: #ff4500;">3</span>, similarity = sim_pearson<span style="color: black;">&#41;</span></pre></div></div>

<p>Here is the output that this produces (at the time of this writing):</p>

<div class="wp_syntax"><div class="code"><pre class="generic" style="font-family:monospace;">['kevinrose', 'google', 'LeoLaporte', 'mashable', 'TechCrunch', 'Veronica', 'alexalbrecht', 
'ev', 'patricknorton', 'Scobleizer', 'woot', 'ijustine', 'timoreilly', 'guykawasaki', 
'engadget', 'CaliLewis', 'chrispirillo', 'sarahlane', 'ryan', 'wired', 'ambermac', 
'ginatrapani', 'tferriss', 'fforward', 'mollywood', 'abendig']
processing data for: kevinrose
processing data for: google
processing data for: LeoLaporte
processing data for: mashable
processing data for: TechCrunch
processing data for: Veronica
processing data for: alexalbrecht
processing data for: ev
processing data for: patricknorton
processing data for: Scobleizer
processing data for: woot
processing data for: ijustine
processing data for: timoreilly
processing data for: guykawasaki
processing data for: engadget
processing data for: CaliLewis
processing data for: chrispirillo
processing data for: sarahlane
processing data for: ryan
processing data for: wired
processing data for: ambermac
processing data for: ginatrapani
processing data for: tferriss
processing data for: fforward
processing data for: mollywood
processing data for: abendig
0.693852667302
0.57137732992
0.350957713398
[(0.85762813072101673, 'ginatrapani'), 
(0.81973579573386002, 'CaliLewis'), 
(0.81455896587667598, 'timoreilly')]</pre></div></div>

<p>The results suggest the users <a href="http://twitter.com/ginatrapani">ginatrapani</a>, <a href="http://twitter.com/CaliLewis">CaliLewis</a> and <a href="http://twitter.com/timoreilly">timoreilly</a> as the closest matches for <a href="http://twitter.com/abendig">abendig</a> based on the available data, and thus perhaps worth following.</p>
<p><strong>Next</strong></p>
<p>This post showed how code and ideas from the book Programming Collective Intelligence can be applied directly to Twitter users and their message streams. The approach is of course heavily simplified, but user similarity is an interesting problem. </p>
<p>There are lots of ways to make this more useful. The realtime nature of the message streams should be taken into account. Users&#8217; posting frequency may matter. Also, people&#8217;s interests certainly change. Overall similarity is useful, but similarity based on time ranges could also be interesting. </p>
<p>URLs that are included in the messages are currently mostly ignored. It would of course make a lot of sense to include them (don&#8217;t forget to deduplicate the various URL shortener versions of the same URL) to be able to take into account that several people may be talking about the same articles. </p>
<p>Simple keyword counts are pretty crude. Semantic analysis of the messages would be useful to get an indicator of whether two people are talking about similar things even though they are using different words, if their opinions are similar, and so forth. </p>
<p>Oh, and scale it up to include millions of users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Finding frequent items in a data stream</title>
		<link>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/</link>
		<comments>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 16:44:59 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1016</guid>
		<description><![CDATA[In Finding the Frequent Items in Streams of Data [PDF], Graham Cormode and Marios Hadjieleftheriou discuss the frequent items problem and some of the algorithms that are used to solve it: The frequent items problem is to process a stream of items and ﬁnd all those which occur more than a given fraction of the [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://portal.acm.org/citation.cfm?id=1562789&#038;dl=GUIDE&#038;coll=GUIDE&#038;CFID=61620557&#038;CFTOKEN=15416114">Finding the Frequent Items in Streams of Data</a> [<a href="http://dimacs.rutgers.edu/~graham/pubs/papers/freqcacm.pdf">PDF</a>], <a href="http://dimacs.rutgers.edu/~graham/">Graham Cormode</a> and <a href="http://www2.research.att.com/~marioh/">Marios Hadjieleftheriou</a> discuss the frequent items problem and some of the algorithms that are used to solve it:</p>
<blockquote><p>
The frequent items problem is to process a stream of items and ﬁnd all those which occur more than a given fraction of the time. It is one of the most heavily studied problems in mining data streams, dating back to the 1980s. Many other applications rely directly or indirectly on ﬁnding the frequent items, and implementations are in use in large scale industrial systems. In this paper, we describe the most important algorithms for this problem in a common framework. We place the different solutions in their historical context, and describe the connections between them, with the aim of clarifying some of the confusion that has surrounded their properties.
</p></blockquote>
<p>Some of the interesting bits here are that the data stream will easily contain millions (or billions) of items and the algorithm will typically only get to take one look at each item as it comes up in the stream.</p>
<p><strong>Space-Saving</strong></p>
<p>In this post I focus on the Space-Saving algorithm and provide an implementation in Python. <span id="more-1016"></span>The algorithm itself is originally described in <strong>Efficient Computation of Frequent and Top-k Elements in Data Streams</strong> [<a href="http://www.cs.ucsb.edu/~dsl/publications/2005/ICDT2005-metwally.pdf">PDF</a>] by <a href="http://www.cs.ucsb.edu/~metwally/">Ahmed Metwally</a>, <a href="http://www.cs.ucsb.edu/~agrawal/">Divyakant Agrawal</a>, and <a href="http://www.cs.ucsb.edu/~amr/">Amr El Abbadi</a>:</p>
<blockquote><p>
We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-<em>k</em> and frequent elements with tight guarantees on errors. For general data distributions, our top-<em>k</em> algorithm can return a set of <em>k&#8217;</em> elements, where <em>k&#8217;</em> &asymp; <em>k</em>, which are guaranteed to be the top-<em>k&#8217;</em> elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-<em>k</em> queries, we ensure that only the top-<em>k</em> elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.
</p></blockquote>
<p>The algorithm basically works like this: The stream is processed one item at a time. A collection of <em>k</em> distinct items and their associated counters is maintained. If a new item is encountered and fewer than <em>k</em> items are in the collection, then the item is added and its counter is set to 1. If the item is already in the collection, its counter is increased by 1. If the item is not in the collection and the collection already has a size of <em>k</em>, then the item with lowest counter is removed and the new item is added, with its counter set to one larger than the previous minimum counter.</p>
<p>Here is some pseudo code to make this clearer:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudo" style="font-family:monospace;">SpaceSaving(k, stream):
    collection = empty collection
    for each element in stream:
        if element in collection:
            collection[element] += 1
        else if length of collection &lt; k:
            add element to collection with collection[element] = 1
        else:
            current_minimum_element = element with lowest count value in collection
            current_minimum = collection[current_minimum_element]
            remove current_minimum_element from collection
            collection[element] = current_minimum + 1</pre></div></div>

<p><strong>The straightforward approach</strong></p>
<p>A first, easy implementation would use a simple hashtable, such as in the following piece of code:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> space_saving_frequent_k1<span style="color: black;">&#40;</span>k, stream, debug=<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> get_smallest_key<span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Given dictionary d, returns the key associated with
                the lowest value in the dictionary.
                &quot;&quot;&quot;</span>
                min_key = <span style="color: #008000;">None</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> key <span style="color: #ff7700;font-weight:bold;">in</span> d:
                        <span style="color: #ff7700;font-weight:bold;">if</span> min_key <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span> <span style="color: #ff7700;font-weight:bold;">or</span> d<span style="color: black;">&#91;</span>key<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span> d<span style="color: black;">&#91;</span>min_key<span style="color: black;">&#93;</span>:
                                min_key = key
                <span style="color: #ff7700;font-weight:bold;">return</span> min_key
&nbsp;
        counters = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> element <span style="color: #ff7700;font-weight:bold;">in</span> stream:
                <span style="color: #ff7700;font-weight:bold;">if</span> element <span style="color: #ff7700;font-weight:bold;">in</span> counters:
                        counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> + <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>counters<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> k:
                        counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        current_minimum_key = get_smallest_key<span style="color: black;">&#40;</span>counters<span style="color: black;">&#41;</span>
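                        <span style="color: #ff7700;font-weight:bold;">if</span> current_minimum_key <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">None</span>: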
                                counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = counters<span style="color: black;">&#91;</span>current_minimum_key<span style="color: black;">&#93;</span> + <span style="color: #ff4500;">1</span>
                                <span style="color: #ff7700;font-weight:bold;">del</span> counters<span style="color: black;">&#91;</span>current_minimum_key<span style="color: black;">&#93;</span>
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> counters</pre></div></div>

<p>This works well for smaller data sets, and particularly when the stream contains no more than <em>k</em> distinct items, so that the smallest element never needs to be found. Otherwise, each eviction scans all <em>k</em> counters, so repeatedly retrieving the element with the minimum count becomes comparatively costly.</p>
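<p>To see the hashtable approach in action on a tiny input, here is a condensed sketch of the same idea together with a made-up stream. The function name space_saving() and the sample data are assumptions for illustration, and the sketch assumes <em>k</em> is at least 1.</p>

```python
def space_saving(k, stream):
    """Keep at most k counters; evict the smallest one when the table is full."""
    counters = {}
    for element in stream:
        if element in counters:
            counters[element] += 1
        elif k > len(counters):
            counters[element] = 1
        else:
            # Linear scan for the key with the lowest count on every eviction.
            victim = min(counters, key=counters.get)
            # The newcomer inherits the evicted count plus one.
            counters[element] = counters.pop(victim) + 1
    return counters

stream = ["a", "a", "a", "b", "b", "c", "d", "a"]
result = space_saving(3, stream)
print(sorted(result.items()))  # [('a', 4), ('b', 2), ('d', 2)]
```

<p>Note that d occurs only once in the stream but is reported with a count of 2, because it took over the counter of the evicted c. Counts for items that entered through an eviction are upper bounds rather than exact values.</p>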
<p><strong>Stream-Summary</strong></p>
<p>When describing the Space-Saving algorithm, the authors also introduced the Stream-Summary data structure (inspired by work in <a href="http://portal.acm.org/citation.cfm?id=740658">Frequency Estimation of Internet Packet Streams with Limited Space</a> [<a href="http://erikdemaine.org/papers/NetworkStats_ESA2002/paper.pdf">PDF</a>]), which groups elements with equal values together (in buckets) and allows quick retrieval of the element with the lowest count.</p>
<p>Here is a diagram of this structure, using three buckets and a total of six elements (E1-E6).</p>
<p><img src="http://www.notjustrandom.com/wp-content/uploads/2009/11/frequent_items.jpg" alt="frequent_items" title="Stream-Summary" style="border: 1px solid black;" width="431" height="171" class="alignnone size-full wp-image-1116" /></p>
<p>Buckets are stored in a list sorted by the buckets&#8217; respective values. Each bucket maintains knowledge of associated elements. Each element in turn maintains a pointer to its bucket. The latter is implemented using a simple hashtable. If an element&#8217;s count needs to be increased, the element is removed from its current bucket and added to the neighboring bucket whose value is one greater. If no such bucket exists, a new one is created and inserted into the bucket list. Empty buckets are removed.</p>
<p>The Python implementation using the Stream-Summary data structure may then look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> Bucket<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, value = <span style="color: #ff4500;">1</span>, elements = <span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = value
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span> = <span style="color: #008000;">list</span><span style="color: black;">&#40;</span>elements<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> elements <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__str__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;%s: %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">value</span>, <span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> append<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> first_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">None</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> has_elements<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> remove<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> StreamSummary<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;
        Maintains a dictionary of elements and a list of buckets. Each element
        points to a (parent) bucket.
        The bucket list is sorted based on the buckets' values. Each bucket also
        maintains a list of elements.
        This has the effect of grouping elements with equal values in buckets.
        &quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span> = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__len__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">keys</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__str__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                result = <span style="color: #483d8b;">&quot;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> b <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>:
                        result += <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>b<span style="color: black;">&#41;</span> + <span style="color: #483d8b;">&quot; &quot;</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> result
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> add_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Adds an element and ensures it's assigned to the correct bucket.
                &quot;&quot;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>:
                        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span> <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span> <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">1</span>:
                                <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">insert</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, Bucket<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                        <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">elements</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> increase_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Increasing an element's value also means assigning it to the
                correct bucket. That can result in creating a new bucket and/or
                removing an empty one.
                &quot;&quot;&quot;</span>
                current_bucket = <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span>
                bucket_index = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">index</span><span style="color: black;">&#40;</span>current_bucket<span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#41;</span> == bucket_index + <span style="color: #ff4500;">1</span>:
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>Bucket<span style="color: black;">&#40;</span>value = current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index + <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span> <span style="color: #66cc66;">&gt;</span> current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span>:
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">insert</span><span style="color: black;">&#40;</span>bucket_index + <span style="color: #ff4500;">1</span>,
                                            Bucket<span style="color: black;">&#40;</span>value = current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                current_bucket.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index + <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> current_bucket.<span style="color: black;">has_elements</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                        <span style="color: #ff7700;font-weight:bold;">del</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index<span style="color: black;">&#93;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> has_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> get_minimum<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">first_element</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">None</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> replace_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, old_element, new_element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Replaces an existing element with an entirely new element in
                the old element's bucket.
                &quot;&quot;&quot;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>old_element<span style="color: black;">&#93;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span>.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>old_element<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>new_element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">del</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>old_element<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> space_saving_frequent_k<span style="color: black;">&#40;</span>k, stream<span style="color: black;">&#41;</span>:
        summary = StreamSummary<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> element <span style="color: #ff7700;font-weight:bold;">in</span> stream:
                <span style="color: #ff7700;font-weight:bold;">if</span> summary.<span style="color: black;">has_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>:
                        summary.<span style="color: black;">increase_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>summary<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> k:
                        summary.<span style="color: black;">add_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        current_minimum_key = summary.<span style="color: black;">get_minimum</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> current_minimum_key:
                                summary.<span style="color: black;">replace_element</span><span style="color: black;">&#40;</span>current_minimum_key, element<span style="color: black;">&#41;</span>
                                summary.<span style="color: black;">increase_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                summary.<span style="color: black;">add_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> summary</pre></div></div>

<p>For larger data sets, where <em>k</em> is noticeably smaller than the number of distinct elements in the set, the Stream-Summary data structure proves advantageous.</p>
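<p>For comparison, here is a minimal sketch of the same Space-Saving update rule using nothing but a plain dictionary. It is written for Python 3, the name <code>space_saving_counts</code> is made up for this illustration, and <em>k</em> is assumed to be a positive integer. The minimum is found by a linear scan over all monitored elements, which is exactly the lookup the Stream-Summary structure is meant to speed up.</p>

```python
def space_saving_counts(k, stream):
    """Approximate counts for at most k monitored elements (k assumed positive)."""
    counts = {}
    for element in stream:
        if element in counts:
            # Already monitored: just bump its counter.
            counts[element] += 1
        elif len(counts) != k:
            # Room left: start monitoring the element.
            counts[element] = 1
        else:
            # Full: evict one minimum-count element (linear scan) and let
            # the newcomer inherit that count plus one.
            victim = min(counts, key=counts.get)
            counts[element] = counts.pop(victim) + 1
    return counts


if __name__ == "__main__":
    print(space_saving_counts(2, ["x"] * 10 + list("abc")))
```

<p>Every stream element increases exactly one counter by one, so the counters always sum to the number of elements seen. Infrequent elements can therefore end up with overestimated counts, while a dominant element keeps its counter.</p>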
<p><strong>Onward</strong></p>
<p>There is a lot of ongoing research in this problem area, and this article offers only a small, simplified glimpse. Explore the research, and find out which real-world applications use some version of these algorithms as part of their problem-solving approach. Applications can be found in web access log processing, search applications, mining of real-time message streams, and so forth.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Matching circular string rotations</title>
		<link>http://www.notjustrandom.com/2008/11/12/matching-circular-string-rotations/</link>
		<comments>http://www.notjustrandom.com/2008/11/12/matching-circular-string-rotations/#comments</comments>
		<pubDate>Thu, 13 Nov 2008 03:11:55 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/?p=139</guid>
		<description><![CDATA[This post examines two questions: Is one string a circular rotation of a second string? Is one string a substring of a (potentially rotated) second string? Previously discussed Z-values are used for a solution. Circular string rotation? Given a prefix p of string t and a suffix s of string t, such that p + [...]]]></description>
			<content:encoded><![CDATA[<p>This post examines two questions:</p>
<ol>
<li>Is one string a circular rotation of a second string?</li>
<li>Is one string a substring of a (potentially rotated) second string?</li>
</ol>
<p><a href="http://www.notjustrandom.com/blog/2008/11/01/the-z-algorithm/">Previously discussed Z-values</a> are used for a solution.</p>
<p><strong>Circular string rotation?</strong></p>
<p>Given a prefix p of string t and a suffix s of string t, such that p + s = t, then a circular string rotation r is a string of the form s + p. Thus, it is a string that consists of the suffix of string t, directly followed by the prefix of string t, such that the resulting string r has the same length as the original string t: |r| = |t|.</p>
<p>Example:</p>
<p>Original:<br />
t = &#8220;abcd&#8221;</p>
<p>Rotation:<br />
r = &#8220;cdab&#8221;, with p = &#8220;ab&#8221; and s = &#8220;cd&#8221;</p>
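<p>The definition can be sketched in a few lines of Python 3. The helper names <code>rotate</code> and <code>all_rotations</code> are made up for this illustration; <code>split</code> is simply |p|, the length of the prefix.</p>

```python
def rotate(t, split):
    """Return the rotation s + p of t, where p = t[:split] and s = t[split:]."""
    p, s = t[:split], t[split:]
    return s + p


def all_rotations(t):
    """List every rotation of t; each one has the same length as t."""
    return [rotate(t, i) for i in range(len(t))]


if __name__ == "__main__":
    print(rotate("abcd", 2))      # cdab
    print(all_rotations("abcd"))  # ['abcd', 'bcda', 'cdab', 'dabc']
```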
<p><strong>How can one string be shown to be a circular rotation of a second string?</strong></p>
<p>If it is, then the lengths of the two strings t and r have to be identical. Also, since t = p + s and the rotation r = s + p, then 2r = r + r = s + p + s + p = s + t + p. Thus, if r is a rotation of t, then t is included completely within 2r. Moreover, the position at which t is found within 2r marks the split point of the rotation, so p and s can be read off directly.</p>
<p>Assuming access to a fast string matching routine, the mechanism to answer the question should be straightforward:</p>
<pre>
If |t| = |r| and t in r+r
Then Circular Rotation
Else No Circular Rotation
</pre>
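<p>As a rough Python 3 sketch of this check, with Python&#8217;s built-in <code>str.find</code> standing in for the fast string matching routine: the function name <code>rotation_split</code> is hypothetical, and it returns the pair (p, s) rather than a plain yes/no so that the split is visible.</p>

```python
def rotation_split(t, r):
    """Return (p, s) with t == p + s and r == s + p, or None if r is no rotation of t."""
    if len(t) != len(r):
        return None
    # r + r == s + t + p, so t (if present) starts at index len(s).
    split = (r + r).find(t)
    if split == -1:
        return None
    s, p = r[:split], r[split:]
    return p, s


if __name__ == "__main__":
    print(rotation_split("abcd", "cdab"))  # ('ab', 'cd')
    print(rotation_split("abcd", "abdc"))  # None
```

<p>Note that this version does build the doubled string r + r, which is the cost the next section tries to avoid.</p>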
<p><strong>Saving space</strong></p>
<p>It may not be desirable to create a new string that contains twice the data of r, particularly for very large instances of strings r and t. Since r = s + p is available, though, 2r can be treated as a virtual string: position i of 2r is simply position i mod |r| of r, so no new string needs to be created.</p>
<p>The <a href="http://www.notjustrandom.com/blog/2008/11/01/the-z-algorithm/">previously mentioned Z algorithm</a> can be heavily modified to </p>
<ol>
<li>operate on two strings, and </li>
<li>allow for Z-values up to a maximum of the string length, regardless of the index position, effectively allowing a substring to begin towards the end of the string and continue at the beginning of the string.</li>
</ol>
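<p>The wrap-around idea on its own can be illustrated without any Z-values. The following Python 3 sketch is a deliberately naive, quadratic-time check (not the modified Z algorithm described here); the name <code>is_rotation_no_copy</code> is made up. It only shows how modulo indexing lets a comparison run past the end of r and continue at its beginning without ever building r + r.</p>

```python
def is_rotation_no_copy(t, r):
    """Naive rotation test that indexes r modulo its length instead of building r + r."""
    n = len(r)
    if len(t) != n:
        return False
    if n == 0:
        return True
    for start in range(n):
        # Compare t against the "virtual" substring of r + r beginning at start.
        if all(t[i] == r[(start + i) % n] for i in range(n)):
            return True
    return False


if __name__ == "__main__":
    print(is_rotation_no_copy("abcd", "cdab"))  # True
    print(is_rotation_no_copy("abcd", "abdc"))  # False
```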
<p><strong>getMaxSuffixZ</strong></p>
<p>Let getMaxSuffixZ be a process that takes two strings, t and r, as input and returns a maximum Z-value &lt;= |t|. This involves calculating Z-values for each position in r and returning the largest one. In this case the Z-values are not calculated based on a prefix of r. Rather, string t is considered the prefix. So, if r is a rotation of t and t = p + s, then r = s + p, and getMaxSuffixZ should return |p|. It therefore only makes sense for getMaxSuffixZ to pick the maximum Z-value that represents a substring stretching to the end of r.</p>
<p>Here is the modified implementation:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getMaxSuffixZ<span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span>:
	result = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
	l = <span style="color: #ff4500;">0</span>
	r =  -<span style="color: #ff4500;">1</span>
	<span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
		<span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #66cc66;">&gt;</span> r:
			zk = <span style="color: #ff4500;">0</span>
			<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
				<span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
					si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
					p<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
					zk += <span style="color: #ff4500;">1</span>
				<span style="color: #ff7700;font-weight:bold;">else</span>:
					<span style="color: #ff7700;font-weight:bold;">break</span>
			<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>:
				r = zk + k - <span style="color: #ff4500;">1</span>
				l = k
		<span style="color: #ff7700;font-weight:bold;">else</span>:
			kOld = k - l - <span style="color: #ff4500;">1</span>
			zOld = result<span style="color: black;">&#91;</span>kOld<span style="color: black;">&#93;</span>
			b = r - k + <span style="color: #ff4500;">1</span>
			<span style="color: #ff7700;font-weight:bold;">if</span> zOld <span style="color: #66cc66;">&lt;</span> b:
				zk = zOld
			<span style="color: #ff7700;font-weight:bold;">else</span>:
				zk = b
				<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>b, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
					<span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> \
						<span style="color: #ff7700;font-weight:bold;">and</span> si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> \
						<span style="color: #ff7700;font-weight:bold;">and</span> p<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> \
						== s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
						<span style="color: #ff7700;font-weight:bold;">pass</span>
					<span style="color: #ff7700;font-weight:bold;">else</span>:
						<span style="color: #ff7700;font-weight:bold;">break</span>
				zk = si
				r = zk + k - <span style="color: #ff4500;">1</span>
				l = k
		result<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = zk
		<span style="color: #ff7700;font-weight:bold;">if</span> zk == <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> - k:
			<span style="color: #ff7700;font-weight:bold;">return</span> zk
	<span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span></pre></div></div>

<p>This returns the expected results at least for this sample set:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">s = <span style="color: #483d8b;">&quot;abcdef&quot;</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;a&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;b&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;defabc&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">3</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;defabcd&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;abcdef&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;fabcde&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">5</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'all tests passed.'</span>
&nbsp;
<span style="color: #008000;">all</span> tests passed.</pre></div></div>

<p>Let z = getMaxSuffixZ(t, r). Then z is the length of the prefix of t that is also a suffix of r: if z &gt; 0, then t[0..z-1] = r[|r|-z..|r|-1]. Similarly, getMaxSuffixZ(r, t) yields the length of the prefix of r that is also a suffix of t.</p>
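<p>To cross-check this relation, a naive Python 3 reference can be handy. The sketch below is not getMaxSuffixZ itself: the name <code>longest_overlap</code> is made up, it runs in quadratic time, and it is only meant to agree with getMaxSuffixZ when both strings have the same length, as in the rotation case.</p>

```python
def longest_overlap(t, r):
    """Length of the longest prefix of t that is also a suffix of r (naive)."""
    for z in range(min(len(t), len(r)), 0, -1):
        if r.endswith(t[:z]):
            return z
    return 0


if __name__ == "__main__":
    t, r = "abcdef", "fabcde"
    z = longest_overlap(t, r)
    print(z)                          # 5
    print(t[:z] == r[len(r) - z:])    # True
    print(longest_overlap(r, t))      # 1
```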

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">s = <span style="color: #483d8b;">&quot;abcdef&quot;</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;a&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">1</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;b&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;defabc&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">3</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;defabcd&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abcdef&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxSuffixZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;fabcde&quot;</span>, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">1</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'all tests passed.'</span>
&nbsp;
<span style="color: #008000;">all</span> tests passed.</pre></div></div>

<p>If r is a rotation of t, then getMaxSuffixZ(t, r) + getMaxSuffixZ(r, t) = |t| = |r|. If t = r, then getMaxSuffixZ(t, r) = getMaxSuffixZ(r, t) = |t| = |r|. This leads to the following flow:</p>
<pre>
If |t| = |r|
Then
    zTr = getMaxSuffixZ(t, r)
    If zTr = |t|:
    Then Circular Rotation (t = r)
    Else
        If zTr + getMaxSuffixZ(r, t) = |t|
        Then Circular Rotation
        Else No Circular Rotation
Else No Circular Rotation
</pre>
<p><strong>getMaxZ</strong></p>
<p>Let getMaxZ have the same attributes as getMaxSuffixZ, with one exception: when a matching substring that begins at position k reaches the end of the string s, the algorithm continues comparing at the beginning of s, up to position k-1 at most. It no longer makes sense to stop at the first substring that reaches the end of the string, since a later substring could reach further into the beginning of the string. It does make sense to return a result as soon as a z-value is found that equals the length of the prefix.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getMaxZ<span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span>:
	result = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
	l = <span style="color: #ff4500;">0</span>
	r =  -<span style="color: #ff4500;">1</span>
	maxZk = <span style="color: #ff4500;">0</span>
	<span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
		<span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #66cc66;">&gt;</span> r:
			zk = <span style="color: #ff4500;">0</span>
			<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
				<span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
					si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
					p<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
					zk += <span style="color: #ff4500;">1</span>
				<span style="color: #ff7700;font-weight:bold;">else</span>:
					<span style="color: #ff7700;font-weight:bold;">break</span>
			<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>:
				r = zk + k - <span style="color: #ff4500;">1</span>
				l = k
				<span style="color: #ff7700;font-weight:bold;">if</span> r == <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> - <span style="color: #ff4500;">1</span>:
					r2 = <span style="color: #ff4500;">0</span>
					<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, k<span style="color: black;">&#41;</span>:
						<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
						p<span style="color: black;">&#91;</span>zk<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span>:
							zk += <span style="color: #ff4500;">1</span>
							r2 += <span style="color: #ff4500;">1</span>
						<span style="color: #ff7700;font-weight:bold;">else</span>:
							<span style="color: #ff7700;font-weight:bold;">break</span>
		<span style="color: #ff7700;font-weight:bold;">else</span>:
			kOld = k - l - <span style="color: #ff4500;">1</span>
			zOld = result<span style="color: black;">&#91;</span>kOld<span style="color: black;">&#93;</span>
			b = r - k + <span style="color: #ff4500;">1</span>
			<span style="color: #ff7700;font-weight:bold;">if</span> zOld <span style="color: #66cc66;">&lt;</span> b:
				zk = zOld
			<span style="color: #ff7700;font-weight:bold;">else</span>:
				zk = b
				<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>b, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
					<span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> \
						<span style="color: #ff7700;font-weight:bold;">and</span> si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> \
						<span style="color: #ff7700;font-weight:bold;">and</span> p<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> \
						== s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
						<span style="color: #ff7700;font-weight:bold;">pass</span>
					<span style="color: #ff7700;font-weight:bold;">else</span>:
						<span style="color: #ff7700;font-weight:bold;">break</span>
				zk = si
				r = zk + k - <span style="color: #ff4500;">1</span>
				l = k
				<span style="color: #ff7700;font-weight:bold;">if</span> r == <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> - <span style="color: #ff4500;">1</span>:
					r2 = <span style="color: #ff4500;">0</span>
					<span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, k<span style="color: black;">&#41;</span>:
						<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
						p<span style="color: black;">&#91;</span>zk<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span>:
							zk += <span style="color: #ff4500;">1</span>
							r2 += <span style="color: #ff4500;">1</span>
						<span style="color: #ff7700;font-weight:bold;">else</span>:
							<span style="color: #ff7700;font-weight:bold;">break</span>
		result<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = zk
		<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&gt;</span>= <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> - k:
			<span style="color: #ff7700;font-weight:bold;">if</span> zk <span style="color: #66cc66;">&gt;</span> maxZk:
				maxZk = zk
		<span style="color: #ff7700;font-weight:bold;">if</span> zk == <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
			<span style="color: #ff7700;font-weight:bold;">return</span> zk
	<span style="color: #ff7700;font-weight:bold;">return</span> maxZk</pre></div></div>

<p>Tests show the correct results:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">s = <span style="color: #483d8b;">&quot;abcdef&quot;</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, s<span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;fabcde&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;fgabcde&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;a&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">1</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;ba&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">2</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;xyz&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;defabcabcdabc&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">assert</span> getMaxZ<span style="color: black;">&#40;</span>s, <span style="color: #483d8b;">&quot;defabc&quot;</span><span style="color: black;">&#41;</span> == <span style="color: #ff4500;">6</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'all tests passed.'</span>
&nbsp;
<span style="color: #008000;">all</span> tests passed.</pre></div></div>

<p>With getMaxZ, the flow reduces to a single check:</p>
<pre>
If |t| = |r| and getMaxZ(t, r) = |t|
Then Circular Rotation
Else No Circular Rotation
</pre>
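<p>For cross-checking the flow above, here is a minimal sketch of a different, well-known technique: every rotation of t occurs as a substring of t + t. This is not getMaxZ, and the name is_rotation is only an illustration, not part of the code in this post.</p>

```python
def is_rotation(t, r):
    """Return True if r is a circular rotation of t (including r == t)."""
    # Every rotation of t appears as a substring of t + t; requiring equal
    # lengths rules out shorter substrings that are not full rotations.
    return len(t) == len(r) and r in t + t
```

<p>It relies on Python's built-in substring search, so it says nothing about the run-time of the Z-based approach; it is only useful as a reference result to compare against.</p>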
<p><strong>Matching substrings in string rotations</strong></p>
<p>Given the above discussion, a mechanism can be shown that allows the matching of substrings in circular string rotations. To illustrate the problem, in a string t = &#8220;abcd&#8221; it would then be possible to find substrings such as &#8220;abc&#8221;, &#8220;cda&#8221;, &#8220;da&#8221;, &#8220;dab&#8221;, etc.</p>
<p>The above algorithm can be applied directly:</p>
<pre>
Let t = text and p = substring to search
If getMaxZ(p, t) = |p|
Then p found in t
Else p not found in t
</pre>
<p>In Python, reusing s = &#8220;abcdef&#8221; from the tests above:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> showFound<span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">if</span> getMaxZ<span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span> == <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>p<span style="color: black;">&#41;</span>:
		<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%s found in %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span>
	<span style="color: #ff7700;font-weight:bold;">else</span>:
		<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%s not found in %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>p, s<span style="color: black;">&#41;</span>
&nbsp;
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;ab&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;fa&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abcdef&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;x&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abc&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span><span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;efabcd&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;efgabc&quot;</span>, s<span style="color: black;">&#41;</span>
showFound<span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abd&quot;</span>, s<span style="color: black;">&#41;</span></pre></div></div>

<p>The results look correct.</p>
<pre>
ab found in abcdef
fa found in abcdef
abcdef found in abcdef
x not found in abcdef
abc not found in
efabcd found in abcdef
efgabc not found in abcdef
abd not found in abcdef
</pre>
<p>It would make sense to add modifications to allow detecting all instances (up to a specifiable number) of substring matches along with their positions.</p>
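<p>A rough sketch of what that could look like, assuming it is acceptable to lean on Python's built-in str.find rather than on getMaxZ. The function name find_circular and the limit parameter are made up for this illustration.</p>

```python
def find_circular(p, s, limit=None):
    """Return the start positions in s where p matches, wrapping past the end of s."""
    if not p or not s or len(p) > len(s):
        return []
    # Appending the first len(p) - 1 characters of s turns every wrapped
    # match into an ordinary match that starts inside the original string.
    extended = s + s[:len(p) - 1]
    positions = []
    start = extended.find(p)
    while start != -1 and (limit is None or len(positions) < limit):
        positions.append(start)
        start = extended.find(p, start + 1)
    return positions
```

<p>For example, find_circular("fa", "abcdef") reports position 5, the index of the "f" where the wrapped match begins.</p>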
<p><strong>Next?</strong></p>
<p>Run-time analysis. And lots of code cleanup!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2008/11/12/matching-circular-string-rotations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Z Algorithm</title>
		<link>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/</link>
		<comments>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/#comments</comments>
		<pubDate>Sat, 01 Nov 2008 22:26:24 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Z]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/?p=128</guid>
		<description><![CDATA[In Algorithms on Strings, Trees and Sequences, Dan Gusfield presents the Z algorithm. A string prefix P is a substring of S that begins at S[0]. The Z algorithm calculates for each position k > 0 in S the maximum length of P that is matched by a substring starting at k. The algorithm performs [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.amazon.com/Algorithms-Strings-Trees-Sequences-Computational/dp/0521585198">Algorithms on Strings, Trees and Sequences</a>, Dan Gusfield presents the Z algorithm. </p>
<p>A string prefix P is a substring of S that begins at S[0]. The Z algorithm calculates for each position k > 0 in S the maximum length of P that is matched by a substring starting at k. The algorithm runs in linear time by keeping track of previously calculated values and recognizing whether the character at the currently examined start position lies within a previously matched substring.</p>
<p><strong>An example</strong></p>
<p>Here is a string S and its Z-values for each index position k:</p>
<p>S = &#8216;aabcaabxaaaz&#8217;</p>
<table>
<tr><th width="35%">k</th><th width="35%">s[k]</th><th width="30%">Z</th></tr>
<tr><td>0</td><td>a</td><td>n/a</td></tr>
<tr><td>1</td><td>a</td><td>1</td></tr>
<tr><td>2</td><td>b</td><td>0</td></tr>
<tr><td>3</td><td>c</td><td>0</td></tr>
<tr><td>4</td><td>a</td><td>3</td></tr>
<tr><td>5</td><td>a</td><td>1</td></tr>
<tr><td>6</td><td>b</td><td>0</td></tr>
<tr><td>7</td><td>x</td><td>0</td></tr>
<tr><td>8</td><td>a</td><td>2</td></tr>
<tr><td>9</td><td>a</td><td>2</td></tr>
<tr><td>10</td><td>a</td><td>1</td></tr>
<tr><td>11</td><td>z</td><td>0</td></tr>
</table>
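<p>The table can be reproduced with a brute-force sketch that applies the definition directly: for each k, count how many characters of the prefix match starting at k. This is quadratic in the worst case and is not the Z algorithm itself; the name naive_z is only an illustration.</p>

```python
def naive_z(s):
    """Return {k: Z-value} for every k > 0, computed straight from the definition."""
    z = {}
    for k in range(1, len(s)):
        length = 0
        # Extend the match while s[k + length] agrees with the prefix s[length].
        while k + length < len(s) and s[length] == s[k + length]:
            length += 1
        z[k] = length
    return z
```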
<p><strong>Step by step</strong></p>
<p>Let&#8217;s look at these one at a time to show how easily the Z-values can be computed.</p>
<p><strong>k = 1</strong></p>
<p>Then s[0] = s[1], but s[1] != s[2], so Z[1] = 1. The matched substring has the boundaries l = 1 and r = 1, so s[l..r] = s[1..1] = &#8216;a&#8217;.</p>
<p><strong>k = 2</strong></p>
<p>Then k > r, which means s[2] is outside the previously discovered substring. s[0] != s[2], so Z[2] = 0.</p>
<p><strong>k = 3</strong></p>
<p>In the same manner, if k = 3, s[0] != s[3], so Z[3] = 0.</p>
<p><strong>k = 4</strong></p>
<p>Then s[0] = s[4], s[1] = s[5], s[2] = s[6], but s[3] != s[7], so Z[4] = 3. The matched substring has the boundaries l = 4 and r = 6, so s[l..r] = s[4..6] = &#8216;aab&#8217;</p>
<p><strong>k = 5</strong></p>
<p>Then k <= r: s[5] is within a previously discovered substring. The corresponding position in the prefix is k - l = 5 - 4 = 1, and the Z-value found at that position is Z[1] = 1. Since that value is smaller than the length of the remaining substring s[k..r], there is no need to perform other character comparisons. Z[5] = Z[1] = 1. l and r remain unchanged.</p>
<p><strong>k = 6</strong></p>
<p>In the same manner, if k = 6, then Z[6] = Z[2] = 0.</p>
<p><strong>k = 7</strong></p>
<p>Then k > r, s[0] != s[7], so Z[7] = 0.</p>
<p><strong>k = 8</strong></p>
<p>Then k > r, s[0] = s[8], s[1] = s[9] and s[2] != s[10], so Z[8] = 2. The matched substring has the boundaries l = 8 and r = 9, so s[l..r] = s[8..9] = &#8216;aa&#8217;.</p>
<p><strong>k = 9</strong></p>
<p>Then k <= r, so s[k] is within a previously discovered substring. The corresponding position in the prefix is k - l = 9 - 8 = 1, and the Z-value found at that position is Z[1] = 1. That value is not smaller than the length of the remaining substring s[k..r] = s[9..9] = 'a', so additional character comparisons need to be performed. Let b = r - k + 1 = 9 - 9 + 1 = 1. Z[9] will be >= b. s[b] = s[1] = s[k + b] = s[10], but s[2] != s[11], so Z[9] = 2. The new substring has the boundaries l = 9 and r = 10.</p>
<p><strong>k = 10</strong></p>
<p>Then k = r, so s[k] is within the previously discovered substring s[l..r]. The Z-value at the corresponding prefix position is Z[k - l] = Z[10 - 9] = Z[1] = 1. Since that value equals the length of the remaining substring s[k..r], an additional comparison is performed, but since s[1] != s[11], Z[10] = Z[1] = 1.</p>
<p><strong>k = 11</strong></p>
<p>Then k > r and since s[0] != s[11], Z[11] = 0.</p>
<p><strong>Implementation</strong></p>
<p>Here is the implementation of the algorithm (as outlined in the book) in Python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getZ<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>:
        result = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
&nbsp;
        l = r = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> k <span style="color: #66cc66;">&gt;</span> r:
                        zk = <span style="color: #ff4500;">0</span>
                        <span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                                <span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> \
                                        s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
                                        <span style="color: #ff7700;font-weight:bold;">pass</span>
                                <span style="color: #ff7700;font-weight:bold;">else</span>:
                                        <span style="color: #ff7700;font-weight:bold;">break</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> si <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>:
                                zk = si
                                r = zk + k - <span style="color: #ff4500;">1</span>
                                l = k
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        kOld = k - l
                        zOld = result<span style="color: black;">&#91;</span>kOld<span style="color: black;">&#93;</span>
                        b = r - k + <span style="color: #ff4500;">1</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> zOld <span style="color: #66cc66;">&lt;</span> b:
                                zk = zOld
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                zk = b
&nbsp;
                                <span style="color: #ff7700;font-weight:bold;">for</span> si <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>b, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
                                        <span style="color: #ff7700;font-weight:bold;">if</span> k + si <span style="color: #66cc66;">&lt;</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span> \
                                                <span style="color: #ff7700;font-weight:bold;">and</span> s<span style="color: black;">&#91;</span>si<span style="color: black;">&#93;</span> \
                                                == s<span style="color: black;">&#91;</span>k + si<span style="color: black;">&#93;</span>:
                                                <span style="color: #ff7700;font-weight:bold;">pass</span>
                                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                                <span style="color: #ff7700;font-weight:bold;">break</span>
                                zk = si
                                r = zk + k - <span style="color: #ff4500;">1</span>
                                l = k
                result<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = zk
        <span style="color: #ff7700;font-weight:bold;">return</span> result</pre></div></div>

<p>That code deserves some cleanup, but it does yield the correct result:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">s = <span style="color: #483d8b;">'aabcaabxaaaz'</span>
z = getZ<span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>s<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
	<span style="color: #ff7700;font-weight:bold;">print</span> k, z<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>
<span style="color: #ff4500;">1</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">2</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">3</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">4</span> <span style="color: #ff4500;">3</span>
<span style="color: #ff4500;">5</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">6</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">7</span> <span style="color: #ff4500;">0</span>
<span style="color: #ff4500;">8</span> <span style="color: #ff4500;">2</span>
<span style="color: #ff4500;">9</span> <span style="color: #ff4500;">2</span>
<span style="color: #ff4500;">10</span> <span style="color: #ff4500;">1</span>
<span style="color: #ff4500;">11</span> <span style="color: #ff4500;">0</span></pre></div></div>
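<p>As one possible direction for the cleanup, here is a more compact sketch of the same idea. It differs from getZ above in a few ways that are choices of this sketch rather than anything from the book: it returns a list instead of a dictionary, stores 0 as a placeholder at index 0, and tracks the match window as the half-open range s[left:right].</p>

```python
def z_values(s):
    """Return a list z where z[k] is the Z-value at k; z[0] is a placeholder 0."""
    n = len(s)
    z = [0] * n
    left = right = 0  # s[left:right] is the rightmost prefix match found so far
    for k in range(1, n):
        if k < right:
            # Reuse the value from the mirrored prefix position, capped at the
            # part of the window that has actually been verified.
            z[k] = min(right - k, z[k - left])
        # Extend the match with explicit comparisons where needed.
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
        if k + z[k] > right:
            left, right = k, k + z[k]
    return z
```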

<p><strong>Next?</strong></p>
<p>The Z algorithm can be used as a precursor for additional string analysis. With little modification it can also be changed to work as a simple, linear exact matching algorithm.</p>
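<p>A hedged sketch of the exact-matching idea: compute Z-values for the pattern, a separator, and the text joined together, then report every position whose Z-value equals the pattern length. The helper names and the separator default are assumptions of this sketch; in particular it assumes the separator character occurs in neither string.</p>

```python
def z_array(s):
    """Compact Z computation; index 0 holds a placeholder 0."""
    n = len(s)
    z = [0] * n
    left = right = 0
    for k in range(1, n):
        if k < right:
            z[k] = min(right - k, z[k - left])
        while k + z[k] < n and s[z[k]] == s[k + z[k]]:
            z[k] += 1
        if k + z[k] > right:
            left, right = k, k + z[k]
    return z


def find_all(pattern, text, sep="\0"):
    """Return every start index in text at which pattern occurs."""
    if not pattern:
        return []
    # The separator stops any match from running past the end of the pattern,
    # so a Z-value equal to len(pattern) marks a full occurrence in text.
    combined = pattern + sep + text
    z = z_array(combined)
    offset = len(pattern) + 1
    return [k - offset for k in range(offset, len(combined))
            if z[k] == len(pattern)]
```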
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2008/11/01/the-z-algorithm/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>overlapping matches</title>
		<link>http://www.notjustrandom.com/2007/07/03/overlapping-matches/</link>
		<comments>http://www.notjustrandom.com/2007/07/03/overlapping-matches/#comments</comments>
		<pubDate>Wed, 04 Jul 2007 03:39:07 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/2007/07/03/overlapping-matches/</guid>
		<description><![CDATA[Let&#8217;s assume the following string is given: s = "this is a test" I want to get a list of all the 2-word pairs from the string. Here is one attempt: import re re.findall&#40;'[a-z]+\s[a-z]+', s&#41; The result is not satisfactory: ['this is', 'a test'] findall() returns non-overlapping matches. In this context this means that the [...]]]></description>
			<content:encoded><![CDATA[<p>Let&#8217;s assume the following string is given:</p>
<pre lang ="python">
s = "this is a test"
</pre>
<p>I want to get a list of all the 2-word pairs from the string. </p>
<p>Here is one attempt:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
<span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'[a-z]+<span style="color: #000099; font-weight: bold;">\s</span>[a-z]+'</span>, s<span style="color: black;">&#41;</span></pre></div></div>

<p>The result is not satisfactory:</p>
<pre>
['this is', 'a test']
</pre>
<p>findall() returns non-overlapping matches. In this context this means that the pair &#8220;is a&#8221; will not be returned, since &#8220;is&#8221; was already matched in the &#8220;this is&#8221; string.</p>
<p>This returns a complete list of pairings:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">words = <span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;[a-z]+&quot;</span>, s<span style="color: black;">&#41;</span>
maxPos = <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>words<span style="color: black;">&#41;</span> - <span style="color: #ff4500;">1</span>
currentPos = <span style="color: #ff4500;">0</span>
<span style="color: #ff7700;font-weight:bold;">while</span> currentPos <span style="color: #66cc66;">&lt;</span> maxPos:
    <span style="color: #ff7700;font-weight:bold;">print</span> words<span style="color: black;">&#91;</span>currentPos: currentPos + <span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>
    currentPos += <span style="color: #ff4500;">1</span></pre></div></div>

<p>Here is the result:</p>
<pre>
['this', 'is']
['is', 'a']
['a', 'test']
</pre>
<p>Generalizing this to make it usable for n-grams of sizes other than two:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getNGrams<span style="color: black;">&#40;</span>words, n<span style="color: black;">&#41;</span>:
    maxPos = <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>words<span style="color: black;">&#41;</span> - n + <span style="color: #ff4500;">1</span>
    currentPos = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> currentPos <span style="color: #66cc66;">&lt;</span> maxPos:
        <span style="color: #ff7700;font-weight:bold;">print</span> words<span style="color: black;">&#91;</span>currentPos: currentPos + n<span style="color: black;">&#93;</span>
        currentPos += <span style="color: #ff4500;">1</span></pre></div></div>

<p>Here is how that works for n = 3:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #66cc66;">&gt;&gt;&gt;</span> getNGrams<span style="color: black;">&#40;</span>words, <span style="color: #ff4500;">3</span><span style="color: black;">&#41;</span>
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'this'</span>, <span style="color: #483d8b;">'is'</span>, <span style="color: #483d8b;">'a'</span><span style="color: black;">&#93;</span>
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'is'</span>, <span style="color: #483d8b;">'a'</span>, <span style="color: #483d8b;">'test'</span><span style="color: black;">&#93;</span></pre></div></div>

<p>It is probably more interesting to have that function return a list of n-grams along with frequency information.</p>
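<p>One way that might look, sketched with collections.Counter from the standard library. The name ngram_counts is invented here, and n-grams are stored as tuples so they can serve as dictionary keys.</p>

```python
import re
from collections import Counter


def ngram_counts(words, n):
    """Return a Counter that maps each n-gram (as a tuple of words) to its frequency."""
    # range() is empty when n is larger than the number of words.
    grams = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


words = re.findall("[a-z]+", "this is a test this is")
counts = ngram_counts(words, 2)
```

<p>Here counts[('this', 'is')] is 2, and counts.most_common() lists the n-grams by descending frequency.</p>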
<p>Also, this seems reasonably quick on larger strings (> 200,000 words). Still, I wonder whether this can be accomplished using only regular expressions.</p>
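<p>On that question, one possibility for the two-word case is a lookahead assertion. A lookahead consumes no characters, so findall() can start the next match at the following word. This is only a sketch for lowercase words separated by single whitespace characters, and the function name is made up.</p>

```python
import re


def overlapping_pairs(s):
    """Return all overlapping two-word pairs in s using a single regular expression."""
    # (?=...) matches without consuming input, and the capture group inside it
    # is what findall() returns. \b keeps a match from starting mid-word.
    return re.findall(r'(?=\b([a-z]+\s[a-z]+))', s)


pairs = overlapping_pairs("this is a test")
```

<p>With the example string, pairs holds 'this is', 'is a' and 'a test'.</p>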
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2007/07/03/overlapping-matches/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Turning a pickle string into a pickle list</title>
		<link>http://www.notjustrandom.com/2007/07/02/turning-a-pickles-string-into-a-pickles-list/</link>
		<comments>http://www.notjustrandom.com/2007/07/02/turning-a-pickles-string-into-a-pickles-list/#comments</comments>
		<pubDate>Tue, 03 Jul 2007 02:03:35 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/2007/07/02/turning-a-pickles-string-into-a-pickles-list/</guid>
		<description><![CDATA[Suppose we have a few pickled items, maybe even using different protocols: import pickle allPickles = '' allPickles += pickle.dumps&#40;'test0', protocol = 0&#41; allPickles += pickle.dumps&#40;'test1', protocol = 1&#41; allPickles += pickle.dumps&#40;'test2', protocol = 2&#41; This would result in the following data: &#34;S'test0'\np0\n.U\x05test1q\x00.\x80\x02U\x05test2q\x00.&#34; Here is an easy way to break the string apart into individual [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose we have a few pickled items, maybe even using different protocols:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">pickle</span>
allPickles = <span style="color: #483d8b;">''</span>
allPickles += <span style="color: #dc143c;">pickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'test0'</span>, protocol = <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
allPickles += <span style="color: #dc143c;">pickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'test1'</span>, protocol = <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>
allPickles += <span style="color: #dc143c;">pickle</span>.<span style="color: black;">dumps</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'test2'</span>, protocol = <span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This would result in the following data:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #483d8b;">&quot;S'test0'<span style="color: #000099; font-weight: bold;">\n</span>p0<span style="color: #000099; font-weight: bold;">\n</span>.U<span style="color: #000099; font-weight: bold;">\x</span>05test1q<span style="color: #000099; font-weight: bold;">\x</span>00.<span style="color: #000099; font-weight: bold;">\x</span>80<span style="color: #000099; font-weight: bold;">\x</span>02U<span style="color: #000099; font-weight: bold;">\x</span>05test2q<span style="color: #000099; font-weight: bold;">\x</span>00.&quot;</span></pre></div></div>

<p>Here is an easy way to break the string apart into individual pickles:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">pickles = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">StringIO</span>
sio = <span style="color: #dc143c;">StringIO</span>.<span style="color: #dc143c;">StringIO</span><span style="color: black;">&#40;</span>allPickles<span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">while</span> sio.<span style="color: #008000;">len</span> <span style="color: #66cc66;">&gt;</span> sio.<span style="color: black;">pos</span>:
    currentPos = sio.<span style="color: black;">pos</span>
    <span style="color: #dc143c;">pickle</span>.<span style="color: black;">load</span><span style="color: black;">&#40;</span>sio<span style="color: black;">&#41;</span>
    pickles.<span style="color: black;">append</span><span style="color: black;">&#40;</span>sio.<span style="color: black;">buf</span><span style="color: black;">&#91;</span>currentPos:sio.<span style="color: black;">pos</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Here is the resulting list:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;S'test0'<span style="color: #000099; font-weight: bold;">\n</span>p0<span style="color: #000099; font-weight: bold;">\n</span>.&quot;</span>, 
<span style="color: #483d8b;">'U<span style="color: #000099; font-weight: bold;">\x</span>05test1q<span style="color: #000099; font-weight: bold;">\x</span>00.'</span>, 
<span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\x</span>80<span style="color: #000099; font-weight: bold;">\x</span>02U<span style="color: #000099; font-weight: bold;">\x</span>05test2q<span style="color: #000099; font-weight: bold;">\x</span>00.'</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Just to verify:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> pickles:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #dc143c;">pickle</span>.<span style="color: black;">loads</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span></pre></div></div>

<p>The result is correct:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">test0
test1
test2</pre></div></div>
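<p>The code above relies on the Python 2 StringIO module and its <em>len</em>, <em>pos</em> and <em>buf</em> attributes. A rough Python 3 equivalent might look like the sketch below, which assumes the pickles are held as bytes and uses io.BytesIO with tell() to find where each pickle ends. The names <em>split_pickles</em>, <em>all_pickles</em> and <em>chunks</em> are made up for illustration.</p>

```python
import io
import pickle

def split_pickles(data):
    """Split concatenated pickle bytes into a list of individual pickles."""
    stream = io.BytesIO(data)
    chunks = []
    while stream.tell() < len(data):
        start = stream.tell()
        # load() reads exactly one pickle and leaves the stream just after it.
        pickle.load(stream)
        chunks.append(data[start:stream.tell()])
    return chunks

# Three pickles written with three different protocols, back to back.
all_pickles = b"".join(
    pickle.dumps("test%d" % protocol, protocol=protocol) for protocol in (0, 1, 2)
)
chunks = split_pickles(all_pickles)
print([pickle.loads(chunk) for chunk in chunks])
# prints ['test0', 'test1', 'test2']
```

<p>As with any use of pickle, this should only be run on data from a trusted source, since unpickling can execute arbitrary code.</p>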

<p>If this displays with correct syntax highlighting on the blog, then this was also a successful test of the <a href="http://wordpress.org/extend/plugins/wp-syntax/">wp-syntax plugin</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2007/07/02/turning-a-pickles-string-into-a-pickles-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computing the day of week</title>
		<link>http://www.notjustrandom.com/2006/06/27/computing-the-day-of-week/</link>
		<comments>http://www.notjustrandom.com/2006/06/27/computing-the-day-of-week/#comments</comments>
		<pubDate>Wed, 28 Jun 2006 07:50:38 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/blog/2006/06/27/computing-the-day-of-week/</guid>
		<description><![CDATA[Here is a nice gem of an algorithm by Kim S. Larsen to compute the day of the week, where m, d and y represent month, day and year, respectively. def dow(m, d, y): if ((m == 1) or (m == 2)): m += 12 y -= 1 return (d + 2 * m + [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ddj.com/184409541">Here</a> is a nice gem of an algorithm by Kim S. Larsen to compute the day of the week, where <em>m</em>, <em>d</em> and <em>y</em> represent month, day and year, respectively.</p>
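<pre>def dow(m, d, y):
    # Treat January and February as months 13 and 14 of the previous year.
    if m in (1, 2):
        m += 12
        y -= 1
    # Floor division (//) keeps the arithmetic integral on Python 2 and 3.
    return (d + 2 * m + 3 * (m + 1) // 5 +
            y + y // 4 - y // 100 + y // 400) % 7
</pre>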
<p>The returned integer is a value between 0 (Monday) and 6 (Sunday).</p>
<p>Combine it with a tuple of day names:</p>
<pre>days = ('Monday', 'Tuesday', 'Wednesday',
'Thursday', 'Friday', 'Saturday', 'Sunday')
</pre>
<p>Then look up the name of the day for a given date, for example:</p>
<pre>days[dow(6, 28, 2006)]
</pre>
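<p>As a quick sanity check (not part of the original article), the sketch below restates the formula for Python 3 and compares it with the standard library's datetime.date.weekday(), which uses the same numbering of 0 for Monday through 6 for Sunday. The sample dates and the upper-case name <em>DAYS</em> are arbitrary choices for illustration.</p>

```python
import datetime

DAYS = ('Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday', 'Saturday', 'Sunday')

def dow(m, d, y):
    """Day of the week for month m, day d, year y: 0 is Monday, 6 is Sunday."""
    # January and February count as months 13 and 14 of the previous year.
    if m in (1, 2):
        m += 12
        y -= 1
    return (d + 2 * m + 3 * (m + 1) // 5 +
            y + y // 4 - y // 100 + y // 400) % 7

# Compare against the standard library for a few sample dates.
for m, d, y in [(6, 28, 2006), (1, 1, 2000), (2, 29, 2024)]:
    assert dow(m, d, y) == datetime.date(y, m, d).weekday()

print(DAYS[dow(6, 28, 2006)])
# prints Wednesday
```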
<p>I found the algorithm in Dr. Dobb's Journal, issue 229 (April 1995); the article includes a very nice description of how the formula was derived. The original C implementation of the algorithm is about as readable as the code above.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2006/06/27/computing-the-day-of-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

