<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>not just random</title>
	<atom:link href="http://www.notjustrandom.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.notjustrandom.com</link>
	<description></description>
	<lastBuildDate>Wed, 03 Mar 2010 20:23:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Pivot from MS Live Labs</title>
		<link>http://www.notjustrandom.com/2010/03/03/pivot-from-ms-live-labs/</link>
		<comments>http://www.notjustrandom.com/2010/03/03/pivot-from-ms-live-labs/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 20:23:55 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1472</guid>
		<description><![CDATA[Often connections and patterns only become clear if the larger set is observed and not just individual pieces of data. Microsoft Live Labs Pivot shows an interesting approach to visualizing online data. Gary Flake presents the project in the following video (from TED).

The software is available for download and experimentation. I would love to try [...]]]></description>
			<content:encoded><![CDATA[<p>Often connections and patterns only become clear if the larger set is observed and not just individual pieces of data. <a href="http://livelabs.com/">Microsoft Live Labs</a> <a href="http://www.getpivot.com">Pivot</a> shows an interesting approach to visualizing online data. <a href="http://flakenstein.net/">Gary Flake</a> presents the project in the following video (from <a href="http://www.ted.com/talks/lang/eng/gary_flake_is_pivot_a_turning_point_for_web_exploration.html">TED</a>).</p>
<p><!--copy and paste--><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/GaryFlake_2010-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/GaryFlake-2010.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=783&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=gary_flake_is_pivot_a_turning_point_for_web_exploration;year=2010;theme=what_s_next_in_tech;theme=a_taste_of_ted2010;theme=new_on_ted_com;event=TED2010;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" flashvars="vu=http://video.ted.com/talks/dynamic/GaryFlake_2010-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/GaryFlake-2010.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=783&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=gary_flake_is_pivot_a_turning_point_for_web_exploration;year=2010;theme=what_s_next_in_tech;theme=a_taste_of_ted2010;theme=new_on_ted_com;event=TED2010;"></embed></object></p>
<p>The software is available for <a href="http://www.getpivot.com">download and experimentation</a>. I would love to try it out but the <a href="http://www.getpivot.com/download/">requirements</a> make it sound like that will have to wait until I can come up with a Windows machine.</p>
<p>This looks very interesting though.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/03/03/pivot-from-ms-live-labs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>(Very) simple Twitter user similarity</title>
		<link>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/</link>
		<comments>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 20:54:28 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Twitter]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1420</guid>
		<description><![CDATA[In this post I am using basic web data extraction combined with ideas and python code from Toby Segaran&#8217;s Programming Collective Intelligence to show a (very) simple Twitter user similarity mechanism.
Generating a list of users
There are lots of ways of putting together a list of Twitter users. If you&#8217;re on Twitter, you could use the [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I am using basic web data extraction combined with ideas and python code from <a href="http://blog.kiwitobes.com/">Toby Segaran</a>&#8217;s <a href="http://en.wikipedia.org/wiki/Programming_Collective_Intelligence">Programming Collective Intelligence</a> to show a (very) simple Twitter user similarity mechanism.</p>
<p><strong>Generating a list of users</strong></p>
<p>There are lots of ways of putting together a list of Twitter users. If you&#8217;re on Twitter, you could use the list of your followers or the list of those you are following. You could extract user names from a list of <a href="http://search.twitter.com/search?q=technology">search results</a>, the <a href="http://twitter.com/public_timeline">public timeline</a> or a <a href="http://wefollow.com/">twitter directory</a>. There are lots of options. The following code uses a regular expression to extract the user names from a wefollow page.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> getWefollowTwitterUsers<span style="color: black;">&#40;</span>category = <span style="color: #483d8b;">&quot;tech&quot;</span><span style="color: black;">&#41;</span>:
        users = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        url = <span style="color: #483d8b;">&quot;http://wefollow.com/twitter/&quot;</span>
        url += category
        html = <span style="color: #dc143c;">urllib</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        users = <span style="color: #dc143c;">re</span>.<span style="color: black;">findall</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;&quot;&quot;nofollow&quot;&gt;(.*?)&lt;/a&gt;&lt;/strong&gt;&quot;&quot;&quot;</span>, html<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> users
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> getWefollowTwitterUsers<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
Output:
<span style="color: black;">&#91;</span><span style="color: #483d8b;">'kevinrose'</span>, <span style="color: #483d8b;">'google'</span>, <span style="color: #483d8b;">'LeoLaporte'</span>, <span style="color: #483d8b;">'mashable'</span>, <span style="color: #483d8b;">'TechCrunch'</span>, <span style="color: #483d8b;">'Veronica'</span>, 
<span style="color: #483d8b;">'alexalbrecht'</span>, <span style="color: #483d8b;">'ev'</span>, <span style="color: #483d8b;">'patricknorton'</span>, <span style="color: #483d8b;">'Scobleizer'</span>, <span style="color: #483d8b;">'woot'</span>, <span style="color: #483d8b;">'ijustine'</span>, <span style="color: #483d8b;">'timoreilly'</span>, 
<span style="color: #483d8b;">'guykawasaki'</span>, <span style="color: #483d8b;">'engadget'</span>, <span style="color: #483d8b;">'CaliLewis'</span>, <span style="color: #483d8b;">'chrispirillo'</span>, <span style="color: #483d8b;">'wired'</span>, <span style="color: #483d8b;">'ryan'</span>, <span style="color: #483d8b;">'sarahlane'</span>, 
<span style="color: #483d8b;">'ambermac'</span>, <span style="color: #483d8b;">'ginatrapani'</span>, <span style="color: #483d8b;">'tferriss'</span>, <span style="color: #483d8b;">'fforward'</span>, <span style="color: #483d8b;">'mollywood'</span><span style="color: black;">&#93;</span></pre></div></div>

<p><strong>Retrieving a list of messages for each user</strong></p>
<p>Each user&#8217;s messages are available in an RSS feed of the format http://twitter.com/statuses/user_timeline/&lt;user&gt;.rss?count=[1..200]. The count parameter is optional and controls the maximum number of messages contained in the feed. The following code uses <a href="http://feedparser.org/">Universal Feed Parser</a> to extract the entries from the data feed.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> feedparser
&nbsp;
<span style="color: #808080; font-style: italic;"># [...]</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> getUserMessages<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>:
        url = <span style="color: #483d8b;">&quot;http://twitter.com/statuses/user_timeline/&quot;</span> + <span style="color: #dc143c;">user</span> + <span style="color: #483d8b;">&quot;.rss?count=200&quot;</span>
        feed_data = feedparser.<span style="color: black;">parse</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> feed_data.<span style="color: black;">get</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;entries&quot;</span>, <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span></pre></div></div>
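<p>As a small aside, the feed URL format described above can be built and checked in one place. This is only a sketch in Python 3: the helper name <code>timeline_feed_url</code> is made up for illustration, and the 1 to 200 limit is taken from the format shown in this post, not from any official documentation.</p>

```python
from urllib.parse import quote, urlencode


def timeline_feed_url(user, count=None):
    """Build the user timeline feed URL; count is optional (1 to 200)."""
    base = "http://twitter.com/statuses/user_timeline/" + quote(user, safe="") + ".rss"
    if count is None:
        return base
    # Reject values outside the range given in the post.
    if count not in range(1, 201):
        raise ValueError("count must be between 1 and 200")
    return base + "?" + urlencode({"count": count})


print(timeline_feed_url("kevinrose", 200))
```

<p>Leaving out <code>count</code> returns the plain feed URL without a query string.</p>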

<p><strong>Generating keyword scores</strong></p>
<p>The following code goes through a user&#8217;s messages, breaks them into fragments and counts the number of instances for each encountered word.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> getKeywordScores<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>, messages<span style="color: black;">&#41;</span>:
        keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        blacklist = <span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;a&quot;</span>, <span style="color: #483d8b;">&quot;an&quot;</span>, <span style="color: #483d8b;">&quot;by&quot;</span>, <span style="color: #483d8b;">&quot;on&quot;</span>, <span style="color: #483d8b;">&quot;that&quot;</span>, <span style="color: #483d8b;">&quot;the&quot;</span>, <span style="color: #483d8b;">&quot;these&quot;</span>, <span style="color: #483d8b;">&quot;this&quot;</span>, <span style="color: #483d8b;">&quot;those&quot;</span>, <span style="color: #483d8b;">&quot;to&quot;</span><span style="color: black;">&#93;</span>
        <span style="color: #808080; font-style: italic;"># and many more words</span>
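        blacklist.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span>.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>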
        <span style="color: #ff7700;font-weight:bold;">for</span> message <span style="color: #ff7700;font-weight:bold;">in</span> messages:
                tweet = message<span style="color: black;">&#91;</span><span style="color: #483d8b;">&quot;summary&quot;</span><span style="color: black;">&#93;</span>
                words = <span style="color: #dc143c;">re</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot; &quot;</span>, tweet<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> word <span style="color: #ff7700;font-weight:bold;">in</span> words:
                        word = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;^<span style="color: #000099; font-weight: bold;">\W</span>*&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span>, word<span style="color: black;">&#41;</span>
                        word = <span style="color: #dc143c;">re</span>.<span style="color: black;">sub</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\W</span>*$&quot;</span>, <span style="color: #483d8b;">&quot;&quot;</span>, word<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> word.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;http://&quot;</span><span style="color: black;">&#41;</span>:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        word = word.<span style="color: black;">lower</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> word <span style="color: #ff7700;font-weight:bold;">in</span> blacklist:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> word:
                                <span style="color: #ff7700;font-weight:bold;">continue</span>
                        count = keywords.<span style="color: black;">get</span><span style="color: black;">&#40;</span>word, <span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>
                        keywords<span style="color: black;">&#91;</span>word<span style="color: black;">&#93;</span> = count + <span style="color: #ff4500;">1</span>
        final_keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> k <span style="color: #ff7700;font-weight:bold;">in</span> keywords:
                <span style="color: #ff7700;font-weight:bold;">if</span> keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
                        final_keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span> = keywords<span style="color: black;">&#91;</span>k<span style="color: black;">&#93;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> final_keywords</pre></div></div>
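<p>For comparison, here is roughly the same counting step as a compact Python 3 sketch using <code>collections.Counter</code>. It takes plain tweet strings instead of feed entries, and the names <code>keyword_scores</code>, <code>STOP_WORDS</code> and <code>min_count</code> are assumptions for this example, not part of the code above.</p>

```python
import re
from collections import Counter

# Hypothetical stop-word list; the post elides most of its own blacklist.
STOP_WORDS = {"a", "an", "by", "on", "that", "the", "these", "this", "those", "to"}


def keyword_scores(user, tweets, min_count=2):
    """Count words across tweets, keeping those seen at least min_count times."""
    ignored = STOP_WORDS | {user.lower()}
    counts = Counter()
    for tweet in tweets:
        for token in tweet.split():
            # Skip links before stripping punctuation.
            if token.lower().startswith(("http://", "https://")):
                continue
            # Trim leading and trailing non-word characters, then lowercase.
            word = re.sub(r"^\W+|\W+$", "", token).lower()
            if word and word not in ignored:
                counts[word] += 1
    return {word: n for word, n in counts.items() if n >= min_count}


sample = [
    "Python is great: http://example.com/post",
    "Learning Python, the fun way!",
    "@alice loves data",
]
print(keyword_scores("alice", sample))  # prints {'python': 2}
```

<p>Words that occur only once are dropped, which mirrors the final filtering loop in the original function.</p>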

<p><strong>Computing similarities</strong></p>
<p>The code to compute similarity scores and the ideas behind it are presented in <a href="http://en.wikipedia.org/wiki/Programming_Collective_Intelligence">Programming Collective Intelligence</a>. The source code for the book <a href="http://blog.kiwitobes.com/?p=44">is available online</a>. The relevant pieces are in chapter2/recommendations.py &#8211; sim_distance() (<a href="http://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a>), sim_pearson() (<a href="http://en.wikipedia.org/wiki/Correlation_and_dependence#Pearson.27s_product-moment_coefficient">Pearson coefficient</a>) and topMatches(). The latter compares one user to all others and returns the list of the <em>n</em> most similar users along with their respective similarity scores.</p>
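<p>To give a feel for what these functions do without opening the book, here is a minimal Python 3 sketch of the same ideas. It is not the book&#8217;s code: the function names, the choice to return 0.0 when two users share no keywords, and the sample data are all assumptions made for this example.</p>

```python
from math import sqrt


def sim_euclidean(prefs, a, b):
    """Map Euclidean distance over shared keywords into the range (0, 1]."""
    shared = set(prefs[a]).intersection(prefs[b])
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[a][k] - prefs[b][k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)


def sim_pearson_sketch(prefs, a, b):
    """Pearson correlation of keyword counts over the keywords both users share."""
    shared = sorted(set(prefs[a]).intersection(prefs[b]))
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [prefs[a][k] for k in shared]
    ys = [prefs[b][k] for k in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = sqrt(var_x * var_y)
    # Undefined when either side has no variance; report "no correlation".
    return cov / denominator if denominator else 0.0


def top_matches(prefs, person, n=3, similarity=sim_pearson_sketch):
    """Return the n users most similar to person as (score, user) pairs."""
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[:n]


# Invented keyword counts for three imaginary users.
prefs = {
    "ann": {"python": 4, "data": 2, "web": 1},
    "bob": {"python": 8, "data": 4, "web": 2},
    "cat": {"python": 1, "data": 2, "web": 4},
}
print(top_matches(prefs, "ann", n=2))
```

<p>In this toy data, bob uses the same words as ann in the same proportions, so he ranks first by Pearson correlation even though his raw counts are twice as high. That insensitivity to scale is the main reason to prefer Pearson over plain distance when users post at different volumes.</p>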
<p><strong>Similar users</strong></p>
<p>The following code brings it all together and demonstrates how we can show users that are similar to a specific one, given the computed dictionary of keyword scores.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> recommendations <span style="color: #ff7700;font-weight:bold;">import</span> sim_pearson, sim_distance, topMatches
<span style="color: #808080; font-style: italic;"># [...]</span>
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">&quot;__main__&quot;</span>:
        users = getWefollowTwitterUsers<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># add my own</span>
        users.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;abendig&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> users
        user_keywords = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #dc143c;">user</span> <span style="color: #ff7700;font-weight:bold;">in</span> users:
                <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;processing data for:&quot;</span>, <span style="color: #dc143c;">user</span>
                messages = getUserMessages<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">user</span><span style="color: black;">&#41;</span>
                user_keywords<span style="color: black;">&#91;</span><span style="color: #dc143c;">user</span><span style="color: black;">&#93;</span> = getKeywordScores<span style="color: black;">&#40;</span><span style="color: #dc143c;">user</span> = <span style="color: #dc143c;">user</span>, messages = messages<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># Similarity between the first user and three others</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> sim_pearson<span style="color: black;">&#40;</span>user_keywords, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, users<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># My top three matches</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> topMatches<span style="color: black;">&#40;</span>user_keywords, <span style="color: #483d8b;">&quot;abendig&quot;</span>, n = <span style="color: #ff4500;">3</span>, similarity = sim_pearson<span style="color: black;">&#41;</span></pre></div></div>

<p>Here is the output that this produces (at the time of this writing):</p>

<div class="wp_syntax"><div class="code"><pre class="generic" style="font-family:monospace;">['kevinrose', 'google', 'LeoLaporte', 'mashable', 'TechCrunch', 'Veronica', 'alexalbrecht', 
'ev', 'patricknorton', 'Scobleizer', 'woot', 'ijustine', 'timoreilly', 'guykawasaki', 
'engadget', 'CaliLewis', 'chrispirillo', 'sarahlane', 'ryan', 'wired', 'ambermac', 
'ginatrapani', 'tferriss', 'fforward', 'mollywood', 'abendig']
processing data for: kevinrose
processing data for: google
processing data for: LeoLaporte
processing data for: mashable
processing data for: TechCrunch
processing data for: Veronica
processing data for: alexalbrecht
processing data for: ev
processing data for: patricknorton
processing data for: Scobleizer
processing data for: woot
processing data for: ijustine
processing data for: timoreilly
processing data for: guykawasaki
processing data for: engadget
processing data for: CaliLewis
processing data for: chrispirillo
processing data for: sarahlane
processing data for: ryan
processing data for: wired
processing data for: ambermac
processing data for: ginatrapani
processing data for: tferriss
processing data for: fforward
processing data for: mollywood
processing data for: abendig
0.693852667302
0.57137732992
0.350957713398
[(0.85762813072101673, 'ginatrapani'), 
(0.81973579573386002, 'CaliLewis'), 
(0.81455896587667598, 'timoreilly')]</pre></div></div>

<p>The results suggest the users <a href="http://twitter.com/ginatrapani">ginatrapani</a>, <a href="http://twitter.com/CaliLewis">CaliLewis</a> and <a href="http://twitter.com/timoreilly">timoreilly</a> as related to <a href="http://twitter.com/abendig">abendig</a> based on the available data and thus maybe worth following.</p>
<p><strong>Next</strong></p>
<p>This showed an example of directly applying code and ideas from the book Programming Collective Intelligence to Twitter users and their message streams. This is of course also pretty simplified. User similarity is an interesting problem though. </p>
<p>There are lots of ways to make this more useful. The realtime nature of the message streams should be taken into account. Users&#8217; posting frequency may matter. Also, people&#8217;s interests certainly change. Overall similarity is useful, but similarity based on time ranges could also be interesting. </p>
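<p>One possible starting point for the time range idea is to bucket each user&#8217;s words by month and compare users bucket by bucket. The sketch below (Python 3) assumes messages arrive as <code>(published, text)</code> pairs with a <code>datetime</code> timestamp; that input format and the function name are assumptions for this example.</p>

```python
from collections import Counter, defaultdict
from datetime import datetime


def keyword_counts_by_month(messages):
    """Group word counts by the calendar month in which each message was posted."""
    buckets = defaultdict(Counter)
    for published, text in messages:
        buckets[published.strftime("%Y-%m")].update(text.lower().split())
    return dict(buckets)


# Invented sample messages.
messages = [
    (datetime(2010, 1, 12), "python regex tricks"),
    (datetime(2010, 1, 30), "python feeds"),
    (datetime(2010, 2, 3), "travel photos"),
]
by_month = keyword_counts_by_month(messages)
print(sorted(by_month))               # prints ['2010-01', '2010-02']
print(by_month["2010-01"]["python"])  # prints 2
```

<p>Each monthly counter can then be fed into a similarity function on its own, so two users can be similar in January and unrelated in February.</p>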
<p>URLs that are included in the messages are currently mostly ignored. It would of course make a lot of sense to include them (don&#8217;t forget to deduplicate the various URL shortener versions of the same URL) to be able to take into account that several people may be talking about the same articles. </p>
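<p>A rough sketch of that deduplication step in Python 3 follows. Resolving a shortened link for real means following HTTP redirects; here a hypothetical lookup table stands in for that step, and every URL in it is invented for illustration.</p>

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical lookup table standing in for a real redirect resolver;
# the short links and targets below are invented for illustration.
RESOLVED = {
    "http://short.example/a1": "http://www.example.com/article/",
    "http://tiny.example/zz9": "http://WWW.example.com/article",
}


def canonical_url(url, resolved=RESOLVED):
    """Expand a known short link, then normalize host case and trailing slash."""
    target = resolved.get(url, url)
    parts = urlsplit(target)
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string because it can change the page.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def shared_links(urls_a, urls_b):
    """Return the canonical URLs that appear in both users' messages."""
    return {canonical_url(u) for u in urls_a}.intersection(canonical_url(u) for u in urls_b)


print(shared_links(["http://short.example/a1"], ["http://tiny.example/zz9"]))
```

<p>Both short links end up as the same canonical address, so the two users are recognized as talking about the same article.</p>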
<p>Simple keyword counts are pretty crude. Semantic analysis of the messages would be useful to get an indicator of whether two people are talking about similar things even though they are using different words, if their opinions are similar, and so forth. </p>
<p>Oh, and scale it up to include millions of users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/02/24/very-simple-twitter-user-similarity/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reaching the right people</title>
		<link>http://www.notjustrandom.com/2010/02/17/reaching-the-right-people/</link>
		<comments>http://www.notjustrandom.com/2010/02/17/reaching-the-right-people/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 15:07:20 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[email]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1345</guid>
		<description><![CDATA[Imagine this situation: A company has hundreds (maybe thousands) of employees. All of them have their own skills and areas of expertise. There is probably lots of overlap, but no one person will know everyone in the larger group who has a particular skill set. If someone is working on a project and needs assistance [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine this situation: A company has hundreds (maybe thousands) of employees. All of them have their own skills and areas of expertise. There is probably lots of overlap, but no one person will know everyone in the larger group who has a particular skill set. If someone is working on a project and needs assistance to overcome some technical hurdle, it could be very helpful if they could communicate with the people who also have experience in that area. Those people might be located in entirely different parts of the company.</p>
<p><a href="http://www.computer.org/portal/web/computingnow/0209/whatsnew/internetcomputing">Semantic email addressing</a> [<a href="http://www.computer.org/portal/c/document_library/get_file?uuid=9cfee474-eb79-4147-93ec-fbb54deec9e5&#038;groupId=53319">PDF</a>] aims to solve this problem:</p>
<blockquote><p>
Email addresses are a means to an end. The goal is usually not to send an email to a particular address, but to a particular person. You want to say hello to your friend Steve or send a message to the VP of marketing at Microsoft or to the head caterer for your wedding. Ideally, you could send a message to a person just by entering his or her name, position, or some other descriptive attribute. If a person’s email address changes, the email system should send to the new address automatically. If the person matching a description differs over time, the email system should send to the person currently matching that description.
</p></blockquote>
<p>In the given example, the user would be able to get answers to his or her questions by reaching out to the people with the fitting skill sets without having previously known those people: The email system can decide who the most appropriate recipients of the messages are.</p>
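<p>A toy version of that lookup might look like the following Python 3 sketch. It is not taken from the paper: the directory records, field names and the <code>resolve_recipients</code> function are all hypothetical and only illustrate addressing by description instead of by address.</p>

```python
# Hypothetical directory records; names, roles and addresses are invented.
DIRECTORY = [
    {"name": "Steve", "role": "engineer", "skills": {"python", "search"}, "email": "steve@example.com"},
    {"name": "Maria", "role": "vp marketing", "skills": {"branding"}, "email": "maria@example.com"},
    {"name": "Chen", "role": "engineer", "skills": {"search", "email"}, "email": "chen@example.com"},
]


def resolve_recipients(directory, role=None, skill=None):
    """Return current addresses of everyone matching the given description."""
    matches = []
    for person in directory:
        if role is not None and person["role"] != role:
            continue
        if skill is not None and skill not in person["skills"]:
            continue
        matches.append(person["email"])
    return matches


print(resolve_recipients(DIRECTORY, skill="search"))
print(resolve_recipients(DIRECTORY, role="vp marketing"))
```

<p>Because the lookup happens at send time, a changed address or a new person in a role is picked up automatically, which is the property the quoted passage asks for.</p>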
<p>I cannot help thinking that <a href="http://vark.com/">Aardvark</a> was at least a little inspired by the ideas behind semantic email addressing. Their process is simple: Users send in questions (using email, Twitter, IM, etc.), Aardvark routes each question to another user who is (hopefully) qualified to answer it, and the asker eventually receives a response, often just a few minutes later. In this <a href="http://en.wikipedia.org/wiki/Social_search">social search</a> approach, Aardvark accomplishes the job of finding information by <a href="http://blog.vark.com/?p=352">finding the right people</a> who can provide it. The service has received very good press and was recently <a href="http://blog.vark.com/?p=361">acquired by Google</a>.</p>
<p>Twitter seems like it might be a good platform for this problem area. If someone has a public Twitter feed, they are essentially broadcasting their updates to the open stream and anyone can see them. It is probably safe to assume that they are at least open to the idea of talking to strangers and responding to messages from people they do not already know.</p>
<p>How could one go about finding the best people to message though? One method is certainly to <a href="http://search.twitter.com">search the message stream</a> for specific keywords and basically manually look for people who might be active in areas of interest. You can also search in and add yourself to <a href="http://wefollow.com/">one</a> <a href="http://justtweetit.com/">of</a> <a href="http://www.twellow.com">the</a> <a href="http://twitr.org">many</a> <a href="http://www.tweetfind.com">directories</a> that are being developed.</p>
<p>But, if I simply need to talk to someone and ask them &#8220;May I ask you a question about XYZ?&#8221; then clearly, a) broadcasting my question hoping that someone will answer could be very inefficient and b) first researching who the best person might be for my question(s) puts all the burden on me. </p>
<p>What if the user could simply send out the question and the system would ensure that the most appropriate people see it?</p>
<p><img style="border-left:5px solid #9999FF;" src="http://www.notjustrandom.com/wp-content/uploads/2010/02/twitter.jpg" alt="" title="twitter" width="577" height="248" class="alignnone size-full wp-image-1404" /></p>
<p>The basic idea here is this: The user submits the question (along with a set of keywords) to his or her software. The software has analyzed other users&#8217; message streams, extracted keywords, etc. and generated a knowledge base. If the query can be confidently matched to another user, a message is generated and sent to that user. The message will be visible to that user as a regular name mention and they can choose whether to engage in that conversation.</p>
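<p>The matching step could start out as simply as the Python 3 sketch below. It assumes the knowledge base is a dictionary mapping each user to their keyword counts, and it treats a match as confident when at least half of the question&#8217;s keywords appear in a user&#8217;s profile. The function name, the threshold and the sample data are all assumptions for this example.</p>

```python
def best_recipient(question_keywords, knowledge_base, min_score=0.5):
    """Pick the user whose keywords best cover the question, or None."""
    wanted = {k.lower() for k in question_keywords}
    if not wanted:
        return None
    best_user, best_score = None, 0.0
    for user, keywords in knowledge_base.items():
        # Score = fraction of the question's keywords found in the user's profile.
        score = len(wanted.intersection(keywords)) / len(wanted)
        if score > best_score:
            best_user, best_score = user, score
    # Only route the question when the match is confident enough.
    return best_user if best_score >= min_score else None


# Invented knowledge base: user name mapped to keyword counts.
knowledge_base = {
    "dana": {"python": 12, "regex": 5, "feeds": 3},
    "eli": {"travel": 9, "photos": 4},
}
print(best_recipient(["Python", "regex"], knowledge_base))            # prints dana
print(best_recipient(["python", "cooking", "wine"], knowledge_base))  # prints None
```

<p>Returning <code>None</code> for weak matches matters here: sending nothing is better than mentioning a stranger who is unlikely to be able to help.</p>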
<p>Some of the obvious challenges:</p>
<ul>
<li>Generating meaningful keywords/subject areas based on a person&#8217;s message stream.</li>
<li>Successful matching of queries with users.</li>
<li>Establishing an effective communication protocol that does not easily lend itself to abuse, i.e. spam.</li>
</ul>
<p>A lot of web-based social networks are great at helping you connect with people you already know. Twitter makes it easy to connect with new people. The outlined approach (or a variation thereof) might be a good way of further supporting creation of those new connections, based on areas of interest.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/02/17/reaching-the-right-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Letting them understand us better</title>
		<link>http://www.notjustrandom.com/2010/02/10/letting-them-understand-us-better/</link>
		<comments>http://www.notjustrandom.com/2010/02/10/letting-them-understand-us-better/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 04:26:48 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Affective Computing]]></category>
		<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Human Computer Interaction]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1313</guid>
		<description><![CDATA[As you first start reading Can your computer make you happy?, scenes from 2001: A Space Odyssey or the more recent (and well done) Moon may readily come to mind. The author appears to foresee that reaction.

In sci-fi films, when anyone gives a computer emotions, it all goes horribly wrong. The computer becomes vain, doubtful and [...]]]></description>
			<content:encoded><![CDATA[<p>As you first start reading <a href="http://www.independent.co.uk/life-style/gadgets-and-tech/features/can-your-computer-make-you-happy-1894258.html">Can your computer make you happy?</a>, scenes from <a href="http://en.wikipedia.org/wiki/2001_%28film%29">2001: A Space Odyssey</a> or the more recent (and well done) <a href="http://en.wikipedia.org/wiki/Moon_%28film%29">Moon</a> may readily come to mind. The author appears to foresee that reaction.</p>
<blockquote><p>
In sci-fi films, when anyone gives a computer emotions, it all goes horribly wrong. The computer becomes vain, doubtful and irrational and Armageddon by wayward technology is only narrowly avoided.
</p></blockquote>
<p>This is not surprising &#8211; science fiction has been informing us and becoming part of our culture for a while. It is increasingly all around us: <a href="http://interactions.acm.org/content/?p=1291">We Are Living in a Sci-Fi World</a>.</p>
<p>Affective computing is <a href="http://en.wikipedia.org/wiki/Affective_computing">an intriguing concept</a> though: </p>
<blockquote><p>
<b>Affective computing</b> is a branch of the study and development of <a href="http://en.wikipedia.org/wiki/Artificial_intelligence" title="Artificial intelligence">artificial intelligence</a> that deals with the design of systems and devices that can recognize, interpret, and process human <a href="http://en.wikipedia.org/wiki/Emotions" title="Emotions" class="mw-redirect">emotions</a>. It is an interdisciplinary field spanning <a href="http://en.wikipedia.org/wiki/Computer_sciences" title="Computer sciences" class="mw-redirect">computer sciences</a>, <a href="http://en.wikipedia.org/wiki/Psychology" title="Psychology">psychology</a>, and <a href="http://en.wikipedia.org/wiki/Cognitive_science" title="Cognitive science">cognitive science</a>.</p></blockquote>
<p>Imagine educational software that modifies its teaching style depending on the user&#8217;s mood. Cars could alert other drivers if their own driver is angry, intoxicated or talking on the phone. Music players could adjust their playlist based on the listener frowning, smiling or similarly expressing themselves. Email clients could disable the send button if the user is clearly upset and about to send out an email he or she may regret later.</p>
<p>A lot of different uses are conceivable here and this could contribute to much more personalized computing experiences.</p>
<p>Modern laptops and desktop computers are typically already equipped with microphones and cameras. Future operating systems may well feature a mood evaluation component and search engines may take information from that component as part of the search query. Similar scenarios are conceivable for other types of web-enabled applications.</p>
<p>Imagine logging in to Facebook some evening and finding a notification &#8220;John has been having a bad day. Check in with him to make sure he&#8217;s okay.&#8221; Intriguing. </p>
<p>And at least a little bit eerie.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/02/10/letting-them-understand-us-better/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>talking, questions and learning</title>
		<link>http://www.notjustrandom.com/2010/02/03/talking-questions-and-learning/</link>
		<comments>http://www.notjustrandom.com/2010/02/03/talking-questions-and-learning/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 19:59:31 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1275</guid>
		<description><![CDATA[In How Pair Programming Really Works [PDF], Stuart Wray discusses four mechanisms that contribute to successful pair programming practice. The author uses findings from cognitive psychology and neuroscience to provide evidence for his conclusions. There are some followup discussions at computingnow, reddit and hacker news.
I found particularly interesting the discussion around talking to develop understanding:

Around [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.computer.org/portal/web/computingnow/0110/whatsnew/software">How Pair Programming Really Works</a> [<a href="http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2010/0110/W_SW_PairProgramming.pdf">PDF</a>], <a href="http://www.stuartwray.net/">Stuart Wray</a> discusses four mechanisms that contribute to successful <a href="http://en.wikipedia.org/wiki/Pair_programming">pair programming</a> practice. The author uses findings from cognitive psychology and neuroscience to provide evidence for his conclusions. There are some followup discussions at <a href="http://www.computer.org/portal/web/computingnow/0110/whatsnew/software">computingnow</a>, <a href="http://www.reddit.com/r/programming/comments/aq7oo/how_pair_programming_really_works/">reddit</a> and <a href="http://news.ycombinator.com/item?id=1056174">hacker news</a>.</p>
<p>I found the discussion around talking to develop understanding particularly interesting:</p>
<blockquote><p>
Around 1980, as computer science undergraduate students at the University of Cambridge, my friends and I noticed a strange phenomenon that we called expert programmer theory. When one of us had trouble getting our programs to work, we’d describe the nonfunctioning state of our code to each other over coffee. Quite often, we’d realize in a flash what was wrong and how to solve it. These epiphanies were quite independent of the other person having any real understanding of our problems—the listener often seemed little wiser about the subject.
</p></blockquote>
<p>I have experienced similar scenarios and this can be both relieving (finally solved the problem!) and frustrating (why didn&#8217;t I think of this a few minutes ago?).</p>
<p>Explaining something to another person or even an <a href="http://www.cb1.com/~john/computing/rubber-plant-effect.html">object</a> can help the explainer&#8217;s own understanding. Wray points out that it is helpful if we can talk to an expert, even if that expertise is largely based on perception. The main reason seems to be that such a person would be more likely to ask us deep questions that we can ponder or that may influence our thinking.</p>
<p>The ability to ask questions that are most appropriate for the given situation seems most valuable: Questions that don&#8217;t require too large a leap, but rather motivate the person to advance just a little further &#8211; questions that stimulate thinking.</p>
<p>What if software that we use daily asked us questions?</p>
<p>Lots of scenarios are conceivable, but here is one example. Imagine a news website that attaches to each article a module that contains at least one interesting question, such as &#8220;Do you think this policy change will effectively solve problem XYZ?&#8221;, &#8220;What do you think of senator X&#8217;s position on Y?&#8221;, &#8220;What if the economic situation in Y changed in Z way?&#8221; and so forth. These would be meaningful questions, based on the content of the article and meant to stimulate intelligent discourse (readers could leave responses and discuss amongst themselves). Ideally, these questions would also be generated automatically.</p>
<p>If we can accept that good questions at the right time can help our understanding and that deeper understanding is generally a good thing, then I think we will benefit from giving software more of an ability to ask questions &#8211; for our own benefit.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2010/02/03/talking-questions-and-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>From offline shopping to wanting an Internet sense</title>
		<link>http://www.notjustrandom.com/2009/12/16/from-offline-shopping-to-wanting-an-internet-sense/</link>
		<comments>http://www.notjustrandom.com/2009/12/16/from-offline-shopping-to-wanting-an-internet-sense/#comments</comments>
		<pubDate>Wed, 16 Dec 2009 18:06:55 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1244</guid>
		<description><![CDATA[Shopping online has fundamentally changed my expectations and comfort level, when I buy things in general. I noticed this clearly, when I recently ventured to a local shopping mall to attempt some not-yet-too-late holiday shopping &#8211; offline.
I do a significant portion of my shopping on the Internet and I have come to appreciate customer reviews, [...]]]></description>
			<content:encoded><![CDATA[<p>Shopping online has fundamentally changed my expectations and comfort level when I buy things in general. I noticed this clearly when I recently ventured to a local shopping mall to attempt some not-yet-too-late holiday shopping &#8211; offline.</p>
<p>I do a significant portion of my shopping on the Internet and I have come to appreciate customer reviews, <a href="http://en.wikipedia.org/wiki/Recommender_system">recommender systems</a> and many other features that have become common at a lot of online stores. I often take information provided by those systems into account when making buying decisions. I have gotten used to those features and &#8211; as I realized on that day at the mall &#8211; I miss them in their absence.</p>
<p>Usually, I am content to <a href="http://en.wikipedia.org/wiki/Satisficing">satisfice</a>, but when I am in the store without access to those familiar features, I feel a bit deprived, as if one of my senses were shut off. I like that imagery, too: The idea of an additional sense, based on Internet data is an intriguing one.</p>
<p>The following video shows how MIT&#8217;s <a href="http://www.pranavmistry.com/projects/sixthsense/">Sixth Sense</a> may have the potential to act as an additional sense to equip you with the features that you may have gotten used to on the Internet.</p>
<p><!--copy and paste--><object width="446" height="326"><param name="movie" value="http://video.ted.com/assets/player/swf/EmbedPlayer.swf"></param><param name="allowFullScreen" value="true" /><param name="wmode" value="transparent"></param><param name="bgColor" value="#ffffff"></param><param name="flashvars" value="vu=http://video.ted.com/talks/dynamic/PattieMaes_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/PattieMaes-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=481&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=pattie_maes_demos_the_sixth_sense;year=2009;theme=what_s_next_in_tech;event=TED2009;&#038;preAdTag=tconf.ted/embed;tile=1;sz=512x288;" /><embed src="http://video.ted.com/assets/player/swf/EmbedPlayer.swf" pluginspace="http://www.macromedia.com/go/getflashplayer" type="application/x-shockwave-flash" wmode="transparent" bgColor="#ffffff" width="446" height="326" allowFullScreen="true" flashvars="vu=http://video.ted.com/talks/dynamic/PattieMaes_2009-medium.flv&#038;su=http://images.ted.com/images/ted/tedindex/embed-posters/PattieMaes-2009.embed_thumbnail.jpg&#038;vw=432&#038;vh=240&#038;ap=0&#038;ti=481&#038;introDuration=16500&#038;adDuration=4000&#038;postAdDuration=2000&#038;adKeys=talk=pattie_maes_demos_the_sixth_sense;year=2009;theme=what_s_next_in_tech;event=TED2009;"></embed></object></p>
<p>I wonder when we will routinely wear devices that integrate cameras, microphones, displays/projectors, etc. and that continuously scan our surroundings and feed data about them back to us in real time. It could be a version of Sixth Sense using discreet packaging. </p>
<p>Quick access to product reviews as we look at a book or CD in a store sounds like a useful feature. Maybe sunglasses (and their integrated display) could provide directions as we are walking. They could also display quick stats regarding our surroundings, including a warning of nearby danger. The potential for applications seems endless.</p>
<p>In the meantime, I am of course still stuck with unfinished shopping.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/12/16/from-offline-shopping-to-wanting-an-internet-sense/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Peter Norvig on Innovation in Search and Artificial Intelligence</title>
		<link>http://www.notjustrandom.com/2009/12/09/peter-norvig-on-innovation-in-search-and-artificial-intelligence/</link>
		<comments>http://www.notjustrandom.com/2009/12/09/peter-norvig-on-innovation-in-search-and-artificial-intelligence/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 23:09:07 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1194</guid>
		<description><![CDATA[Peter Norvig gave this presentation at Citris on September 2. He emphasizes (with several recent examples), how the usage and availability of large data models and increased computing power improves problem solving approaches.

A lot of interesting subjects are covered in the presentation. Here are references to projects or papers that are mentioned:

Seam Carving for Content-Aware [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.norvig.com">Peter Norvig</a> gave this presentation at <a href="http://www.citris-uc.org/">Citris</a> on <a href="http://www.citris-uc.org/events/RE-Sept02">September 2</a>. He emphasizes, with several recent examples, how the use and availability of large data models and increased computing power improve problem-solving approaches.</p>
<p><object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/HT540VrCDwg&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/HT540VrCDwg&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<p>A lot of interesting subjects are covered in the presentation. Here are references to projects or papers that are mentioned:</p>
<ul>
<li><a href="http://www.seamcarving.com/">Seam Carving for Content-Aware Image Resizing</a> [<a href="http://www.shaiavidan.org/papers/imretFinal.pdf">PDF</a>] by <a href="http://www.shaiavidan.org/">Shai Avidan</a> and <a href="http://www.faculty.idc.ac.il/arik/site/index.asp">Ariel Shamir</a> presents a smarter method of image resizing. Speed of processing of modern computers greatly helped with the development of this algorithm.
</li>
<li><a href="http://graphics.cs.cmu.edu/projects/scene-completion/">Scene Completion Using Millions of Photographs</a> by <a href="http://www-2.cs.cmu.edu/~jhhays/">James Hays</a> and <a href="http://www.cs.cmu.edu/~efros/">Alexei Efros</a> is only possible and successful because of its large data sets. </li>
<li>The More Data vs Better Algorithms slide is from <a href="http://www.cs.washington.edu/homes/banko/">Michele Banko</a>&#8217;s and <a href="http://en.wikipedia.org/wiki/Eric_Brill">Eric Brill</a>&#8217;s 2001 paper <a href="http://portal.acm.org/citation.cfm?id=1072204">Mitigating the paucity-of-data problem: exploring the effect of training corpus size on classifier performance for natural language processing</a>.</li>
<li><a href="http://portal.acm.org/citation.cfm?id=1282324">Canonical image selection from the web</a> by <a href="http://www.esprockets.com/academic/">Shumeet Baluja</a>, <a href="http://www.cc.gatech.edu/~yjing/">Yushi Jing</a> and <a href="http://research.google.com/pubs/author37.html">Henry Rowley</a> compares low level features of image result matches for given queries to rank the images.</li>
<li><a href="http://portal.acm.org/citation.cfm?id=1290121">Learning people annotation from the web via consistency learning</a> by <a href="http://research.google.com/pubs/author36197.html">Jay Yagnik</a> and Atiq Islam uses <a href="http://en.wikipedia.org/wiki/Eigenface">Eigenface representations</a> and large collections of images to annotate them.</li>
<li><a href="http://portal.acm.org/citation.cfm?id=1584236">Audiovisual Celebrity Recognition in Unconstrained Web Videos</a> [<a href="http://www.ece.ucsb.edu/~msargin/papers/icassp09.pdf">PDF</a>] by Mehmet Sargin, Hrishikesh Aradhye, <a href="http://sites.google.com/site/pmoreno/">Pedro Moreno</a> and <a href="http://research.google.com/pubs/author1502.html">Ming Zhao</a> uses both face and speech recognition to detect celebrities in videos.</li>
<li><a href="http://norvig.com/spell-correct.html">How to Write a Spelling Corrector</a> by <a href="http://www.norvig.com">Peter Norvig</a> provides spell checking in just over 20 lines of Python.</li>
<li>Google made an n-gram corpus <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13">publicly available</a> at <a href="http://www.ldc.upenn.edu/">Linguistic Data Consortium</a>.</li>
<li>Google&#8217;s <a href="http://www.google.org/flutrends/">Flu Trends</a> is based on search data.</li>
<li><a href="http://labs.google.com/sets">Google Sets</a> generates sets of expressions similar to an initial set of expressions.</li>
</ul>
<p>Also discussed: <a href="http://en.wikipedia.org/wiki/Text_segmentation">Text segmentation</a>, <a href="http://en.wikipedia.org/wiki/Statistical_machine_translation">statistical machine translation</a>, <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>, Web bias and more.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/12/09/peter-norvig-on-innovation-in-search-and-artificial-intelligence/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Nature and nurture in software development</title>
		<link>http://www.notjustrandom.com/2009/12/02/nature-and-nurture-in-software-development/</link>
		<comments>http://www.notjustrandom.com/2009/12/02/nature-and-nurture-in-software-development/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 05:19:54 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1159</guid>
		<description><![CDATA[Over on the Seattle 2.0 blog, Anthony Stevens&#8216; Are Great Programmers Born, or Made? posed an interesting question that also generated insightful thoughts in the comments. I am very intrigued by this topic and the direction of some of the research in this area. So, here is my take on it.
Intuitively, I think, we tend [...]]]></description>
			<content:encoded><![CDATA[<p>Over on the <a href="http://www.seattle20.com/">Seattle 2.0 blog</a>, <a href="http://thepursuitofalife.com/">Anthony Stevens</a>&#8217; <a href="http://www.seattle20.com/blog/Are-Great-Programmers-Born-or-Made.aspx">Are Great Programmers Born, or Made?</a> posed an interesting question that also generated insightful thoughts in the comments. I am very intrigued by this topic and the direction of some of the research in this area. So, here is my take on it.</p>
<p>Intuitively, I think, we tend to read that question as <em>Are great programmers born xor made?</em> &#8211; understanding it such that it is either one or the other. I believe that is false: It is not one or the other; it is both, at least to some degree. However, the ratio is important.</p>
<p>Innate ability, such as a baseline degree of brain capacity, is absolutely required, maybe measured as at least average <a href="http://en.wikipedia.org/wiki/Intelligence_quotient">IQ</a>. That baseline or innate ability is the smaller part of the whole.</p>
<p>I would argue that it helps to be strong at abstract and critical thinking, logic, mathematics, pattern matching/prediction, memory and recall, and so forth. However, those are largely skills. They serve as very useful prerequisites or corequisites, but they are learnable. The same is true for other skills in software development, such as deep understanding of programming language usage, the ability to follow code style guidelines, writing good unit tests, coming up with &#8220;clean&#8221; designs, etc.</p>
<p>From what I understand, deliberate practice is key to acquiring expertise in an area. I provided some notes on that in <a href="http://www.notjustrandom.com/2009/04/05/accelerated-learning-with-ai-systems/">Accelerated Learning with AI systems?</a>, though with a slightly different angle. <a href="http://www.poppendieck.com/">Mary Poppendieck</a> very much relates this principle to software development in her presentation <a href="http://www.infoq.com/presentations/poppendieck-deliberate-practice-in-software-development">Deliberate Practice in Software Development</a> that she gave at <a href="http://agile2009.agilealliance.org/">Agile 2009</a>.</p>
<p>When I first taught myself programming (<a href="http://en.wikipedia.org/wiki/Turbo_Pascal">Turbo Pascal</a>, if you are curious), it felt like it came easily to me. It was also great fun, which served (at least partly) as motivation for me to learn and experiment more, and eventually to turn it into a profession.</p>
<p>If you either &#8220;have it&#8221; or &#8220;don&#8217;t have it,&#8221; then there does not really seem to be a chance for greatness for the person who is missing that innate ability. On the other hand, if training/deliberate practice can play such a significant role, then there are options: The opportunity of a new challenge. I think this should be very encouraging.</p>
<p>Practice.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/12/02/nature-and-nurture-in-software-development/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Passion and Leidenschaft</title>
		<link>http://www.notjustrandom.com/2009/11/25/passion-and-leidenschaft/</link>
		<comments>http://www.notjustrandom.com/2009/11/25/passion-and-leidenschaft/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 22:09:58 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1142</guid>
		<description><![CDATA[In No Pain, No Gain: Pleasure and Suffering in Technologies of Leidenschaft [PDF], Bernd Ploderer, Peter Wright, Steve Howard and Peter Thomas discuss how technology can support people&#8217;s passions.
Except instead of passion, the authors deliberately use the German word leidenschaft. Leidenschaft combines the words leiden (to suffer, experience pain) and schaffen (to make, create, achieve).
People [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://interactions.acm.org/content/?p=1277">No Pain, No Gain: Pleasure and Suffering in Technologies of Leidenschaft</a> [<a href="http://portal.acm.org/ft_gateway.cfm?id=1572628&#038;type=pdf&#038;coll=&#038;dl=GUIDE&#038;CFID=15151515&#038;CFTOKEN=6184618">PDF</a>], <a href="http://disweb.dis.unimelb.edu.au/student/rhd/berndp/">Bernd Ploderer</a>, <a href="http://www3.shu.ac.uk/c3ri/Details.cfm?Action=DetailsOfStaff&#038;StaffID=975">Peter Wright</a>, <a href="http://disweb.dis.unimelb.edu.au/staff/showard/">Steve Howard</a> and <a href="http://www.findanexpert.unimelb.edu.au/researcher/person98498.html">Peter Thomas</a> discuss how technology can support people&#8217;s passions.</p>
<p>Except instead of <em>passion</em>, the authors deliberately use the German word <em>leidenschaft</em>. Leidenschaft combines the words <em>leiden</em> (to suffer, experience pain) and <em>schaffen</em> (to make, create, achieve).</p>
<p>People passionate about their pursuit are willing to suffer in the process.</p>
<p>I believe that the connotations of the word <em>leidenschaft</em> have changed a bit and modern day usage is much closer to the positive aspects of a passion: To pursue an activity or subject with great interest, dedication or enthusiasm. Still, I think it is instructive to keep that implied duality in mind, not just as an introspective exercise to more fully understand oneself, but also to find concrete opportunities for growth.</p>
<p>It is worth examining software/technology and evaluating how different products strengthen the positive aspects of a passion and how they help deal with potential negatives. Then, do something about it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/11/25/passion-and-leidenschaft/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding frequent items in a data stream</title>
		<link>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/</link>
		<comments>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 16:44:59 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.notjustrandom.com/?p=1016</guid>
		<description><![CDATA[In Finding the Frequent Items in Streams of Data [PDF], Graham Cormode and Marios Hadjieleftheriou discuss the frequent items problem and some of the algorithms that are used to solve it:

The frequent items problem is to process a stream of items and find all those which occur more than a given fraction of the time. [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://portal.acm.org/citation.cfm?id=1562789&#038;dl=GUIDE&#038;coll=GUIDE&#038;CFID=61620557&#038;CFTOKEN=15416114">Finding the Frequent Items in Streams of Data</a> [<a href="http://dimacs.rutgers.edu/~graham/pubs/papers/freqcacm.pdf">PDF</a>], <a href="http://dimacs.rutgers.edu/~graham/">Graham Cormode</a> and <a href="http://www2.research.att.com/~marioh/">Marios Hadjieleftheriou</a> discuss the frequent items problem and some of the algorithms that are used to solve it:</p>
<blockquote><p>
The frequent items problem is to process a stream of items and find all those which occur more than a given fraction of the time. It is one of the most heavily studied problems in mining data streams, dating back to the 1980s. Many other applications rely directly or indirectly on finding the frequent items, and implementations are in use in large scale industrial systems. In this paper, we describe the most important algorithms for this problem in a common framework. We place the different solutions in their historical context, and describe the connections between them, with the aim of clarifying some of the confusion that has surrounded their properties.
</p></blockquote>
<p>Some of the interesting bits here are that the data stream will easily contain millions (or billions) of items and the algorithm will typically only get to take one look at each item as it comes up in the stream.</p>
<p><strong>Space-Saving</strong></p>
<p>In this post I focus on the Space-Saving algorithm and provide an implementation in Python. <span id="more-1016"></span>The algorithm itself is originally described in <strong>Efficient Computation of Frequent and Top-k Elements in Data Streams</strong> [<a href="http://www.cs.ucsb.edu/~dsl/publications/2005/ICDT2005-metwally.pdf">PDF</a>] by <a href="http://www.cs.ucsb.edu/~metwally/">Ahmed Metwally</a>, <a href="http://www.cs.ucsb.edu/~agrawal/">Divyakant Agrawal</a>, and <a href="http://www.cs.ucsb.edu/~amr/">Amr El Abbadi</a>:</p>
<blockquote><p>
We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-<em>k</em> and frequent elements with tight guarantees on errors. For general data distributions, our top-<em>k</em> algorithm can return a set of <em>k&#8217;</em> elements, where <em>k&#8217;</em> &asymp; <em>k</em>, which are guaranteed to be the top-<em>k&#8217;</em> elements; and we use minimal space for calculating frequent elements. For realistic Zipfian data, our space requirement for the frequent elements problem decreases dramatically with the parameter of the distribution; and for top-<em>k</em> queries, we ensure that only the top-<em>k</em> elements, in the correct order, are reported. Our experiments show significant space reductions with no loss in accuracy.
</p></blockquote>
<p>The algorithm basically works like this: The stream is processed one item at a time. A collection of <em>k</em> distinct items and their associated counters is maintained. If a new item is encountered and fewer than <em>k</em> items are in the collection, then the item is added and its counter is set to 1. If the item is already in the collection, its counter is increased by 1. If the item is not in the collection and the collection already has a size of <em>k</em>, then the item with lowest counter is removed and the new item is added, with its counter set to one larger than the previous minimum counter.</p>
<p>Here is some pseudo code to make this clearer:</p>

<div class="wp_syntax"><div class="code"><pre class="pseudo" style="font-family:monospace;">SpaceSaving(k, stream):
    collection = empty collection
    for each element in stream:
        if element in collection:
            collection[element] += 1
        else if length of collection &lt; k:
            add element to collection, collection[element] = 1
        else:
            current_minimum_element = element with lowest count value in collection
            current_minimum = collection[current_minimum_element]
            remove current_minimum_element from collection
            collection[element] = current_minimum + 1</pre></div></div>

<p><strong>The straightforward approach</strong></p>
<p>A first, easy implementation would use a simple hashtable, such as in the following piece of code:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> space_saving_frequent_k1<span style="color: black;">&#40;</span>k, stream, debug=<span style="color: #008000;">False</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> get_smallest_key<span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Given dictionary d, returns the key associated with
                the lowest value in the dictionary.
                &quot;&quot;&quot;</span>
                min_key = <span style="color: #008000;">None</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> key <span style="color: #ff7700;font-weight:bold;">in</span> d:
                        <span style="color: #ff7700;font-weight:bold;">if</span> min_key <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span> <span style="color: #ff7700;font-weight:bold;">or</span> d<span style="color: black;">&#91;</span>key<span style="color: black;">&#93;</span> <span style="color: #66cc66;">&lt;</span> d<span style="color: black;">&#91;</span>min_key<span style="color: black;">&#93;</span>:
                                min_key = key
                <span style="color: #ff7700;font-weight:bold;">return</span> min_key
&nbsp;
        counters = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> element <span style="color: #ff7700;font-weight:bold;">in</span> stream:
                <span style="color: #ff7700;font-weight:bold;">if</span> element <span style="color: #ff7700;font-weight:bold;">in</span> counters:
                        counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> + <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>counters<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> k:
                        counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        current_minimum_key = get_smallest_key<span style="color: black;">&#40;</span>counters<span style="color: black;">&#41;</span>
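                        <span style="color: #ff7700;font-weight:bold;">if</span> current_minimum_key <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">None</span>: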
                                counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = counters<span style="color: black;">&#91;</span>current_minimum_key<span style="color: black;">&#93;</span> + <span style="color: #ff4500;">1</span>
                                <span style="color: #ff7700;font-weight:bold;">del</span> counters<span style="color: black;">&#91;</span>current_minimum_key<span style="color: black;">&#93;</span>
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                counters<span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #ff4500;">1</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> counters</pre></div></div>

<p>This works for smaller data sets, and particularly when the smallest element never needs to be found, that is, when the stream contains no more than <em>k</em> distinct elements. Otherwise, repeatedly retrieving the element with the minimum count is comparatively costly, because each lookup scans all <em>k</em> counters.</p>
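<p>To make the cost of that lookup concrete, here is a minimal, self-contained sketch of the dictionary-based variant that finds the eviction victim with the built-in <code>min</code>. The function name <code>space_saving_counts</code> and the sample stream are made up for illustration, and the sketch assumes <em>k</em> is at least 1.</p>

```python
def space_saving_counts(k, stream):
    """Space-Saving with a plain dict (assumes k >= 1).

    The eviction victim is found by a linear scan over all counters.
    """
    counters = {}
    for element in stream:
        if element in counters:
            counters[element] += 1
        elif len(counters) < k:
            counters[element] = 1
        else:
            # min() walks every counter; this is the costly step.
            victim = min(counters, key=counters.get)
            # The newcomer inherits the victim's count plus one.
            counters[element] = counters.pop(victim) + 1
    return counters


stream = list("abacabda")
result = space_saving_counts(3, stream)
print(result)
```

<p>For this stream the result holds <code>a</code> with a count of 4 and <code>b</code> and <code>d</code> with a count of 2 each. The count for <code>d</code> is an overestimate (it appears only once) because it inherited the count of the evicted <code>c</code>. The counters always sum to the length of the stream.</p>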
<p><strong>Stream-Summary</strong></p>
<p>When describing the Space-Saving algorithm, the authors also introduced the Stream-Summary data structure (inspired by work in <a href="http://portal.acm.org/citation.cfm?id=740658">Frequency Estimation of Internet Packet Streams with Limited Space</a> [<a href="http://erikdemaine.org/papers/NetworkStats_ESA2002/paper.pdf">PDF</a>]), which groups elements with equal values together (in buckets) and allows quick retrieval of the element with the lowest count.</p>
<p>Here is a diagram of this structure, using three buckets and a total of six elements (E1-E6).</p>
<p><img src="http://www.notjustrandom.com/wp-content/uploads/2009/11/frequent_items.jpg" alt="frequent_items" title="Stream-Summary" style="border: 1px solid black;" width="431" height="171" class="alignnone size-full wp-image-1116" /></p>
<p>Buckets are stored in a list sorted by the buckets&#8217; respective values. Each bucket keeps track of its associated elements, and each element in turn maintains a pointer to its bucket. The latter is implemented using a simple hashtable. If an element&#8217;s count needs to be increased, the element is removed from its current bucket and added to the neighboring bucket whose value is one greater. If no such bucket exists, a new one is inserted into the bucket list. Empty buckets are removed.</p>
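<p>The bucket move can be traced with a deliberately simplified sketch. It is not the Stream-Summary structure itself: it keeps the buckets in a dictionary keyed by count instead of a sorted list, so finding the smallest bucket would still need a scan. The helper <code>increment</code>, the element names and their counts are all assumptions chosen for illustration.</p>

```python
def increment(buckets, index, element):
    """Move an element from its current bucket to the bucket one higher."""
    count = index[element]
    buckets[count].remove(element)
    if not buckets[count]:
        # Empty buckets are removed.
        del buckets[count]
    # Create the neighboring bucket if it does not exist yet.
    buckets.setdefault(count + 1, set()).add(element)
    index[element] = count + 1


# buckets: count -> elements with that count; index: element -> its count
buckets = {1: {"E1", "E2"}, 3: {"E3"}}
index = {"E1": 1, "E2": 1, "E3": 3}

increment(buckets, index, "E1")  # bucket 2 is created for E1
increment(buckets, index, "E2")  # bucket 1 is now empty and is removed
print(buckets)
```

<p>After both calls, bucket 1 is gone and bucket 2 holds E1 and E2, while bucket 3 is untouched.</p>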
<p>The Python implementation using the Stream-Summary data structure may then look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">class</span> Bucket<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, value = <span style="color: #ff4500;">1</span>, elements = <span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">value</span> = value
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span> = <span style="color: #008000;">list</span><span style="color: black;">&#40;</span>elements<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">if</span> elements <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__str__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #483d8b;">&quot;%s: %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">value</span>, <span style="color: #008000;">str</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> append<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> first_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">None</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> has_elements<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">0</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> remove<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> StreamSummary<span style="color: black;">&#40;</span><span style="color: #008000;">object</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;&quot;
        Maintains a dictionary of elements and a list of buckets. Each element
        points to a (parent) bucket.
        The bucket list is sorted based on the buckets' values. Each bucket also
        maintains a list of elements.
        This has the effect of grouping elements with equal values in buckets.
        &quot;&quot;&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span> = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span> = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__len__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__str__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                result = <span style="color: #483d8b;">&quot;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">for</span> b <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>:
                        result += <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>b<span style="color: black;">&#41;</span> + <span style="color: #483d8b;">&quot; &quot;</span>
                <span style="color: #ff7700;font-weight:bold;">return</span> result
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> add_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Adds an element and ensures it's assigned to the correct bucket.
                &quot;&quot;&quot;</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> element <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>:
                        <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span> <span style="color: #ff7700;font-weight:bold;">or</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span> <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">1</span>:
                                <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">insert</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, Bucket<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                        <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">elements</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> increase_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Increasing an element's value also means assigning it to the
                correct bucket. That can result in creating a new bucket and/or
                removing an empty one.
                &quot;&quot;&quot;</span>
                current_bucket = <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span>
                bucket_index = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">index</span><span style="color: black;">&#40;</span>current_bucket<span style="color: black;">&#41;</span>
&nbsp;
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#41;</span> == bucket_index + <span style="color: #ff4500;">1</span>:
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>Bucket<span style="color: black;">&#40;</span>value = current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index + <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">value</span> <span style="color: #66cc66;">&gt;</span> current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span>:
                        <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>.<span style="color: black;">insert</span><span style="color: black;">&#40;</span>bucket_index + <span style="color: #ff4500;">1</span>,
                                            Bucket<span style="color: black;">&#40;</span>value = current_bucket.<span style="color: black;">value</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
                current_bucket.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index + <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> current_bucket.<span style="color: black;">has_elements</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                        <span style="color: #ff7700;font-weight:bold;">del</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span>bucket_index<span style="color: black;">&#93;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>element<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> has_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, element<span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">return</span> element <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> get_minimum<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">buckets</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">first_element</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">None</span>
&nbsp;
        <span style="color: #ff7700;font-weight:bold;">def</span> replace_element<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, old_element, new_element<span style="color: black;">&#41;</span>:
                <span style="color: #483d8b;">&quot;&quot;&quot;
                Replaces an existing element with an entirely new element in
                the old element's bucket.
                &quot;&quot;&quot;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span> = <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>old_element<span style="color: black;">&#93;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span>.<span style="color: black;">remove</span><span style="color: black;">&#40;</span>old_element<span style="color: black;">&#41;</span>
                <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>new_element<span style="color: black;">&#93;</span>.<span style="color: black;">append</span><span style="color: black;">&#40;</span>new_element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">del</span> <span style="color: #008000;">self</span>.<span style="color: black;">elements</span><span style="color: black;">&#91;</span>old_element<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> space_saving_frequent_k<span style="color: black;">&#40;</span>k, stream<span style="color: black;">&#41;</span>:
        summary = StreamSummary<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> element <span style="color: #ff7700;font-weight:bold;">in</span> stream:
                <span style="color: #ff7700;font-weight:bold;">if</span> summary.<span style="color: black;">has_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>:
                        summary.<span style="color: black;">increase_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">elif</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>summary<span style="color: black;">&#41;</span> <span style="color: #66cc66;">&lt;</span> k:
                        summary.<span style="color: black;">add_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                <span style="color: #ff7700;font-weight:bold;">else</span>:
                        current_minimum_key = summary.<span style="color: black;">get_minimum</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
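                        <span style="color: #ff7700;font-weight:bold;">if</span> current_minimum_key <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">None</span>: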
                                summary.<span style="color: black;">replace_element</span><span style="color: black;">&#40;</span>current_minimum_key, element<span style="color: black;">&#41;</span>
                                summary.<span style="color: black;">increase_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
                        <span style="color: #ff7700;font-weight:bold;">else</span>:
                                summary.<span style="color: black;">add_element</span><span style="color: black;">&#40;</span>element<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> summary</pre></div></div>
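<p>The function returns the summary itself, so the frequent items still have to be read out of it. The following is one possible sketch of that step. It only assumes an object with a <code>buckets</code> list, sorted by ascending value, whose items expose <code>value</code> and <code>elements</code> as in the classes above. The helper <code>top_items</code> and the stand-in types <code>FakeBucket</code> and <code>FakeSummary</code> are hypothetical and exist only to keep the example self-contained.</p>

```python
from collections import namedtuple

# Hypothetical stand-ins that mirror the attributes used by the classes
# above: a bucket has a value and a list of elements, and a summary has a
# list of buckets sorted by ascending value.
FakeBucket = namedtuple("FakeBucket", ["value", "elements"])
FakeSummary = namedtuple("FakeSummary", ["buckets"])


def top_items(summary, n):
    """Return up to n (element, count) pairs, highest count first."""
    pairs = [
        (element, bucket.value)
        # Walk the buckets from the highest value to the lowest.
        for bucket in reversed(summary.buckets)
        for element in bucket.elements
    ]
    return pairs[:n]


example = FakeSummary(buckets=[FakeBucket(2, ["b", "d"]), FakeBucket(4, ["a"])])
print(top_items(example, 2))
```

<p>Because the counts are upper bounds, elements from the lowest buckets should be treated as candidates rather than as confirmed frequent items.</p>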

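<p>For larger data sets, where <em>k</em> is noticeably smaller than the number of distinct elements in the set, the Stream-Summary data structure proves advantageous: evictions are frequent, and the element with the minimum count is read from the first bucket instead of being found by scanning all counters.</p>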
<p><strong>Onward</strong></p>
<p>There is a lot of ongoing research in this problem area, and this article offers only a small, simplified glimpse. Explore the research, and find out which real-world applications use some version of these algorithms as part of their problem-solving approach. Applications can be found in web access log processing, search, mining of real-time message streams, and so forth.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.notjustrandom.com/2009/11/13/finding-frequent-items-in-a-data-stream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
