A quick-and-dirty script for converting from NXT/AMI format to SRI's
.ref format (similar to .stm).
This script was originally written in late 2005, when I thought the
AMI meetings were going to be part of the training set. It was abandoned
when I realized the meetings were not yet public. I've cleaned it up
only slightly, so beware of things like hard-coded paths and
SRI/ICSI-specific assumptions. Also, I have not included a dictionary,
since I originally used the SRI dictionary and I'm not sure whether I
can redistribute it. The dictionary is used only to resolve some
hyphenation rules, and it is just a simple list of words, so it should
be easy to generate for your own system.
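Since the dictionary is described only as a plain word list, one way to build one is to collect the words from your own transcripts and write them one per line. The Python sketch below does that and also shows one possible hyphenation rule such a list could support. The tokenization, the helper names build_vocab and resolve_hyphen, and the keep/join/split rule are all hypothetical; check nxt2ref.pl for the rules it really applies.

```python
import re
from collections import Counter


def build_vocab(lines, min_count=1):
    """Return a sorted, lowercased word list from plain-text transcript lines.

    Tokens are runs of letters, apostrophes and hyphens (an assumption; adjust
    to your own tokenization). Words seen fewer than min_count times are dropped.
    """
    counts = Counter()
    for line in lines:
        counts.update(tok.lower() for tok in re.findall(r"[A-Za-z'-]+", line))
    return sorted(w for w, c in counts.items() if c >= min_count)


def resolve_hyphen(token, vocab):
    """Hypothetical hyphenation rule (not necessarily the one in nxt2ref.pl).

    Keep the token if it is in the vocabulary; otherwise join the pieces if the
    joined form is in the vocabulary; otherwise split at the hyphens.
    """
    t = token.lower()
    if "-" not in t or t in vocab:
        return [t]
    joined = t.replace("-", "")
    if joined in vocab:
        return [joined]
    return [piece for piece in t.split("-") if piece]


if __name__ == "__main__":
    vocab = set(build_vocab(["email me about the meeting", "Email again"]))
    # Writing one word per line gives a simple dictionary file.
    print("\n".join(sorted(vocab)))
    print(resolve_hyphen("e-mail", vocab))  # joined form is listed
    print(resolve_hyphen("re-use", vocab))  # neither form listed, so split
```

With this vocabulary, "e-mail" becomes ["email"] and "re-use" becomes ["re", "use"]. A real dictionary would come from a much larger word list, such as your recognizer's own vocabulary.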
nxt2ref.pl
Perl script to do the conversion. See the file itself for comments and
usage. It requires XML::Parser, available from CPAN. Sample usage:
nxt2ref.pl -m ami.map -d meetings-2005.vocab ES2007b.C > ES2007b.C.ref
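For readers unfamiliar with the input, here is a rough Python sketch of the kind of processing involved: read the timed <w> elements from an NXT words file, drop punctuation, merge words into segments at pauses, and print STM-like lines. It is not a port of nxt2ref.pl. The sample XML is made up, and the punc attribute handling, the 0.5 s gap threshold, the helper names, and the output field layout are all assumptions; SRI's actual .ref format may differ.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up example in the style of an AMI NXT words file.
SAMPLE = """<nite:root xmlns:nite="http://nite.sourceforge.net/">
  <w nite:id="ES2007b.C.words0" starttime="0.10" endtime="0.40">Okay</w>
  <w nite:id="ES2007b.C.words1" starttime="0.40" endtime="0.40" punc="true">.</w>
  <w nite:id="ES2007b.C.words2" starttime="2.00" endtime="2.30">so</w>
  <w nite:id="ES2007b.C.words3" starttime="2.35" endtime="2.80">um</w>
</nite:root>"""


def read_nxt_words(xml_text):
    """Return (start, end, word) triples for timed <w> elements, skipping punctuation."""
    root = ET.fromstring(xml_text)
    words = []
    for w in root.iter("w"):
        start, end = w.get("starttime"), w.get("endtime")
        # Assumed convention: punctuation tokens carry punc="true".
        if w.get("punc") == "true" or start is None or end is None:
            continue
        text = (w.text or "").strip()
        if text:
            words.append((float(start), float(end), text))
    return sorted(words)


def segment(words, max_gap=0.5):
    """Merge consecutive words into segments, splitting at pauses longer than max_gap seconds."""
    segs = []
    for start, end, text in words:
        if segs and start - segs[-1][1] <= max_gap:
            segs[-1][1] = max(segs[-1][1], end)
            segs[-1][2].append(text)
        else:
            segs.append([start, end, [text]])
    return segs


def to_stm_like(meeting, speaker, segs):
    """Format segments as 'file channel speaker start end words' (layout is an assumption)."""
    return [f"{meeting} {speaker} {meeting}_{speaker} {s:.2f} {e:.2f} {' '.join(ws)}"
            for s, e, ws in segs]


if __name__ == "__main__":
    for line in to_stm_like("ES2007b", "C", segment(read_nxt_words(SAMPLE))):
        print(line)
```

Run on the embedded sample, it prints two segments, one for "Okay" and one for "so um", because the 1.6 s pause exceeds the gap threshold. Python's standard xml.etree is used only to keep the sketch self-contained; the Perl script itself uses XML::Parser.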
ami.map
A simple word-mapping file, consisting mostly of British-to-American
English conversions plus some SRI/ICSI-specific conventions.
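The mapping file's exact syntax is not described here, so the following Python sketch assumes a simple layout: one entry per line, the source word followed by its replacement word(s), with lines starting with '#' treated as comments. load_mapping and apply_mapping are hypothetical helpers, not part of nxt2ref.pl.

```python
def load_mapping(lines):
    """Parse an assumed mapping layout: 'source replacement...' per line.

    Lines starting with '#' are comments. An entry with no replacement
    deletes the word.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, *replacement = line.split()
        mapping[source] = replacement
    return mapping


def apply_mapping(words, mapping):
    """Replace each word by its mapped form; unmapped words pass through unchanged."""
    out = []
    for w in words:
        out.extend(mapping.get(w, [w]))
    return out


if __name__ == "__main__":
    example = ["# British -> American", "colour color", "organise organize", ""]
    m = load_mapping(example)
    print(apply_mapping("we organise the colour scheme".split(), m))
```

With the two example entries, "we organise the colour scheme" becomes "we organize the color scheme". Allowing a multi-word or empty replacement keeps the same lookup usable for splitting or dropping tokens as well as respelling them.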