Doing this project was an exercise in discovery. I’ve long said that ‘if you can imagine it, it’s already on the web.’ And indeed it is. While checking various stock photo agencies I thought there must be a website with pics licensed under Creative Commons, and indeed there was. Pixabay was by far the best. Though sometimes I had to search a bit, I was able to find 99% of the images I needed there. Only two, the body outline at the bottom of the stairs and the earring shot, needed further Photoshopping. I also tried a couple of other photo repositories, Pexels and Unsplash, but often came up short.
The outline of a body in masking tape is added to a royalty-free shot of a staircase in Hearst Castle.
One of the hardest things to come up with when writing is names for people and places and companies and streets and on and on… You can get stuck for hours, days even, choosing just the right name for a character. Not any more. I used the Name Generator to generate names for Pets, People and Places as needed.
When faking evidence like an autopsy report or police report, a plain vanilla Google Images search is fine for coughing up models to follow, but getting your lawyers, forensic scientists and coroners to sign off requires a Fake Signature Generator, which I found at Fontmeme.
Making fake newspapers was also possible, but online generators and MS Word templates tended to be too limited or clunky, so I ended up doing it myself in LibreOffice Writer, part of an excellent Open Source office suite.
However, for generating pics of fake newspapers I paid a visit to Photofunia, where I could fake all kinds of evidence. In fact, the kinds of output available there dictated the story line of Whodunnit to some degree. The evidence pic of Valentine’s Day chocolate was enough to suggest a context and the romantic interest part of the story.
My go-to audio editor has long been the Open Source Audacity. Beyond fine-tuning some of the machine-generated utterances, Audacity is also a great tool for inserting Sound FX like doors creaking open or footsteps receding. Those you can find at Freesound.
By far the biggest challenge with implementing Whodunnit was finding machine-generated voices that didn’t sound [too much] like machine-generated voices. I spent hundreds of hours searching, testing, and trying to figure out some of the more advanced technologies in a field that is very much still emerging. A lot of the biggest companies like Amazon [Polly or Alexa] or IBM [Watson] or Google or Microsoft [SAPI5] have highly arcane implementations directed at software engineers. Often they lack the capability of switching voices on the fly, even though the <voice> tag is at the core of the Speech Synthesis Markup Language [SSML].
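To show what switching voices on the fly looks like in SSML, here is a minimal sketch in Python that wraps each line of a dialogue in its own <voice> element. The function name build_dialogue and the voice names "Detective" and "Butler" are made up for illustration; real engines each have their own voice names, and some also expect an xmlns attribute on <speak> or support only part of SSML.

```python
import xml.etree.ElementTree as ET


def build_dialogue(lines):
    """Wrap (voice_name, text) pairs in SSML <voice> elements.

    voice_name values are hypothetical placeholders; substitute the
    names your own speech engine actually exposes.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})
    for voice_name, text in lines:
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        voice.text = text  # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    script = [
        ("Detective", "Where were you at midnight?"),
        ("Butler", "In the pantry, sir. Alone."),
    ]
    print(build_dialogue(script))
```

Because the script is just a list of pairs, swapping one character's voice everywhere is a one-line change rather than a trip through a succession of drop-down boxes.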
The best Text-To-Speech editor I could find was a downloadable freeware program called Balabolka. Unfortunately, accessing Microsoft’s SAPI5 voices is a nightmare. The voices are installed by default, but arcane settings on Windows 10 make accessing all of them virtually impossible. The voices are pretty good but not great, so I finally gave up on that line of inquiry.
I eventually found a solution in the Mac world: a pretty good editor called Ghostreader. It doesn’t support the entire suite of SSML tags, but changing voices on the fly couldn’t have been easier. Its native voices sucked, but I found out that I could use third party voices from Proloquo4Text, a maker of assistive technologies. The demo voices were plentiful and accurate. Going this route would have required a significant outlay of cash for the editor and 7 or 8 additional voices, and it would have limited me to working in the macOS environment, something I’m not overly fond of anyway. So, to make a long odyssey shorter, I eventually went back to the Windows options and chose NaturalReader Commercial. Their editor is decent for small jobs, though what I’d really appreciate is the ability to Find & Replace voice tags rather than clicking an endless succession of drop-down boxes, which seems to be the industry norm. Their voices are quite good out of the box and utterances are editable right down to the phonemic level. The results, not too shabby…
So, Text2Speech may finally be ready for the big-time. Everyone in the biz is going for the customer service market or the book reader market [a complete waste of time IMHO] or learning deficit support, all small markets that are way oversupplied. What nobody seems to have figured out is that there’s a massive market in educational applications, and I don’t just mean apps. There is significant potential for classroom use that no one seems to be aware of yet. It’s a field that’s wide open for any company that is willing to take the blinkers off and restructure their tools.
For example, a teacher of literature in high school could activate student interest in Shakespeare by tasking the class to create an audio play of a key scene from Hamlet or some such work. In order to convey the meaning and nuance, students will need to understand it first and foremost. SSML provides the tools to tweak individual utterances with emphasis or length or volume, exactly what’s needed for a dramatic performance. Chances are, many students who would normally be snoozing at the back of the class would suddenly be motivated to work on Hamlet after class or at home, coming to appreciate the meaning as a by-product. It doesn’t matter how we get them there, it just matters that we do.
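As a rough idea of what tweaking an utterance involves, the sketch below builds a single line of SSML using the standard <prosody> [rate and volume], <emphasis> and <break> elements. The function name perform and its parameters are my own invention for illustration, and not every speech engine honours every tag.

```python
import xml.etree.ElementTree as ET


def perform(text, emphasized=None, rate="medium", volume="medium", pause_ms=0):
    """Return SSML for one line, with optional emphasis and a trailing pause.

    perform() and its parameters are hypothetical; the element and
    attribute names are standard SSML.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-GB"})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "volume": volume})
    if emphasized and emphasized in text:
        # Split around the first occurrence of the stressed word or phrase.
        before, after = text.split(emphasized, 1)
        prosody.text = before
        emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
        emphasis.text = emphasized
        emphasis.tail = after
    else:
        prosody.text = text
    if pause_ms > 0:
        # A dramatic pause after the line, in milliseconds.
        ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    line = "To be, or not to be, that is the question."
    print(perform(line, emphasized="not", rate="slow", volume="soft", pause_ms=700))
```

Students could argue over which word deserves the stress, change one argument, and hear the difference straight away.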
There is perhaps no drier subject than linguistics. Text2Speech could be used by students to discover – not read about – the significance of things like pitch, intonation, linking, pauses, etc. by using SSML to finely sculpt utterances.
ESL students could likewise be tasked to explore and discover the features that make up accurate, native-like speech. They could compare their own utterances with a natural, native model and then engage with T2S bots, tweaking the features of the robotic output until it approaches the model. In the process they would learn which features they personally need to tweak in order to achieve more native-like speech.