<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://132.161.132.157/drupal6"  xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>Computer Science - Web crawling</title>
 <link>http://132.161.132.157/drupal6/taxonomy/term/468/0</link>
 <description></description>
 <language>en</language>
<item>
 <title>Thursday Extra: &quot;Computational linguistics: crawling the Web for non-English data&quot;</title>
 <link>http://132.161.132.157/drupal6/node/649</link>
 <description>&lt;p&gt;
On Thursday, September 19, Kim Spasaro 2014 will discuss the construction of a digital collection of written text in a specific language.  She writes:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
This summer I interned with Carnegie Mellon&#039;s Language Technologies Institute.  While there, I was part of a project working to enable machine translation for Bantu languages.  More specifically, I was responsible for building a corpus of Kinyarwanda phrases to be used for machine learning.  At this talk, I will discuss how I used the Apache Nutch web crawler to launch a large-scale web crawl in search of Kinyarwanda data.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Refreshments will be served at 4:15 p.m. in the Computer Science Commons (Noyce 3817).  The talk, &amp;ldquo;Computational linguistics: crawling the Web for non-English data,&amp;rdquo; will follow at 4:30 p.m. in Noyce 3821.  Everyone is welcome to attend!
&lt;/p&gt;
</description>
 <comments>http://132.161.132.157/drupal6/node/649#comments</comments>
 <category domain="http://132.161.132.157/drupal6/taxonomy/term/466">computational linguistics</category>
 <category domain="http://132.161.132.157/drupal6/taxonomy/term/467">corpus linguistics</category>
 <category domain="http://132.161.132.157/drupal6/taxonomy/term/42">Thursday Extras</category>
 <category domain="http://132.161.132.157/drupal6/taxonomy/term/468">Web crawling</category>
 <pubDate>Mon, 16 Sep 2013 17:55:26 +0000</pubDate>
 <dc:creator>stone</dc:creator>
 <guid isPermaLink="false">649 at http://132.161.132.157/drupal6</guid>
</item>
</channel>
</rss>
