Monday, 23 November 2009
Twittering OCR
'things that go bump in the tunnel'
As I've been following the LHC restart, I've written a parser for the Vistar status feed that sends it to Twitter. The basic method is:
Grab the URL (see image), then do some ImageMagick hackery to cut out the corner. Resize it larger (this helps with the OCR) and save it as a TIFF. Run the image through OCR software and compare the output to the last run; if it's different, upload it to Twitter.
i.e.
curl -o "$IMG" "$SRC"
# Crop the corner, enlarge it and threshold to black and white
convert "$IMG" +repage -crop 509x205+1+533 -resize 1000x -threshold 39000 "$IMG"
convert -monochrome "$IMG" "$TIFF"
mv "$OUT.txt" "$OUT.old" # make a backup of old
tesseract "$TIFF" "$OUT" # writes $OUT.txt
# Strip out ready for Twitter
DATE=$(date +%d-%m-%Y)
sed -i "s/Comments $DATE /#LHC Status /" "$OUT.txt"
# diff exits with 1 when the files differ
diff -q "$OUT.txt" "$OUT.old"
if [ $? -eq 1 ] ; then
# Post to Twitter. --data-urlencode keeps characters like & or + intact.
curl --basic --user lhcstatus:password --data-urlencode "status=$(cat "$OUT.txt")" http://twitter.com/statuses/update.json
fi
and lo: http://twitter.com/lhcstatus
1 comment:
Minor update to the above: it now does a wc -m of the new message. If it's over 140 chars, it trims some stuff out.
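For anyone wanting to do something similar, here is a rough sketch of that length check. The update doesn't say exactly what gets trimmed, so the trimming rule here is an assumption: squeeze runs of whitespace first, and only hard-cut at 140 characters if the message is still too long. The function name trim_status and the MAX variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper; the actual trimming rule is not described in the post.
MAX=140

trim_status() {
    # Turn all whitespace (including newlines) into single spaces.
    msg=$(printf '%s' "$1" | tr -s '[:space:]' ' ')
    # Count characters with wc -m; strip the padding some wc versions add.
    len=$(printf '%s' "$msg" | wc -m | tr -d ' ')
    if [ "$len" -gt "$MAX" ]; then
        # Still too long: keep the first $MAX characters
        # (GNU cut counts bytes here, which is the same thing for ASCII text).
        msg=$(printf '%s' "$msg" | cut -c1-"$MAX")
    fi
    printf '%s\n' "$msg"
}
```

Something like STATUS=$(trim_status "$(cat "$OUT.txt")") would then give a string that can be handed to curl as the status value.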