Monday 23 November 2009
Twittering OCR
'things that go bump in the tunnel'
As I've been following the LHC restart, I've written a parser for the vistar status feed that sends it to Twitter. The basic method is:
Grab the image from the URL (see image), then do some ImageMagick hackery to cut out the corner. Resize it larger (this helps with the OCR) and save it as a TIFF. Run the image through the OCR software and compare the output to the last run; if it differs, upload it to Twitter.
i.e.
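curl -o "$IMG" "$SRC"
# Crop the status corner, enlarge it (helps the OCR) and threshold to black and white
convert "$IMG" +repage -crop 509x205+1+533 -resize 1000x -threshold 39000 "$IMG"
convert "$IMG" -monochrome "$TIFF"
# Keep the previous OCR result for comparison (there is none on the first run)
[ -f "$OUT.txt" ] && mv "$OUT.txt" "$OUT.old"
tesseract "$TIFF" "$OUT" # tesseract appends .txt to the output name
# Strip out ready for Twitter
DATE=$(date +%d-%m-%Y)
sed -i "s/Comments $DATE /#LHC Status /" "$OUT.txt"
# diff exits with 1 only when both files exist and differ
diff -q "$OUT.txt" "$OUT.old" > /dev/null
if [ $? -eq 1 ] ; then
# Post to Twitter, URL-encoding the message so spaces, & and # survive
curl --basic --user lhcstatus:password --data-urlencode "status=$(cat "$OUT.txt")" http://twitter.com/statuses/update.json
fi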
and lo: http://twitter.com/lhcstatus
1 comment:
Minor update to the above - it now does a wc -m of the new message. If it's over 140 chars it trims some stuff out.
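The comment doesn't say exactly what gets trimmed, so the following is only a rough sketch of how such a length check might look. The function name trim_status, the MAX variable and the trimming rules (collapse whitespace first, then hard-truncate) are all assumptions for illustration, not the rules the real script uses.

```shell
#!/bin/sh
# Hypothetical helper: NOT the actual trimming rules used by the lhcstatus script.
MAX=140   # Twitter's message limit at the time

trim_status() {
    # Collapse runs of whitespace (including newlines in the OCR output)
    # into single spaces, then drop one leading and one trailing space.
    msg=$(printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
    # Count characters with wc -m, as the comment describes;
    # tr removes the padding some wc implementations add.
    len=$(printf '%s' "$msg" | wc -m | tr -d ' ')
    if [ "$len" -gt "$MAX" ]; then
        # Hard-truncate to the limit. GNU cut treats -c as bytes,
        # so this is only exact for plain ASCII text.
        msg=$(printf '%s' "$msg" | cut -c1-"$MAX")
    fi
    printf '%s\n' "$msg"
}
```

In the script above this could be used as STATUS=$(trim_status "$(cat "$OUT.txt")") just before the curl call, posting "$STATUS" instead of the raw file contents.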