
Tuesday, September 19, 2017

About Clever CATs and TeMpTing Free MT Offers

This is a guest post by Christine Bruckner that looks into the data and information security issues of free online MT services from a translator's perspective. She was kind enough to do a summary translation of a longer article she recently wrote in German, referenced below. This subject is closely related to my previous post and shows that this issue is gaining visibility and prominence.

Jost Zetzsche has also written about this MT security and data privacy issue, and some of you may have noticed our Twitter banter on the Google Translate security policy. Jost feels that a passage in an FAQ suggests that use of the Translator API overrides the "Your Content in our Services" policy, even though the legal language in the Terms of Service very clearly states the following: "When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones."

I guess it is a matter of interpretation, and perhaps of my lack of trust, given how Google sometimes presents facts. Knowing what I do about how an API works, I see that the GT GUI (which is just a front-end software interface like any other) connects via an API to the Translate Service; thus, I maintain that the aforementioned TOS is very much in effect whenever your data touches the Translate Service. Good luck to anyone who wants Google to confirm or deny this, because they tend to NOT RESPOND. Don DePalma's quote from his 2014 article, shown below, also seems to support the view that the TOS is in effect. It is up to you to decide for yourself; my sense is that they tell you very clearly what they have the right to do, so be wary if you don't like this policy.

It is good that this issue is getting attention, so that professional MT users can be better informed about data security. I thank Christine for also clarifying the MyMemory privacy policy below and for alerting us to the more stringent security requirements in the EU. We should not fault these MT service providers for using your data if you use their services for free, as such data can be useful for improving their MT capabilities. "There is no such thing as a free lunch", as they say in America. Given the sheer volume of MT use on any given day, I doubt it is possible to analyze the translated text in any way other than through some machine learning process. The risks are always higher when you have a case of incompetence like translate.com, or when the MT provider is under-resourced and unable to implement proper safeguards. Some providers do give you the option to buy privacy, and that too is a fair and reasonable policy, as the on-premise and private cloud options also come at a cost.


------------------------


Most of the CAT (computer-aided translation) tools used in the professional translator’s workplace offer integrations with online machine translation (MT) solutions. Better MT quality and the self-learning capabilities of neural and adaptive MT technologies make the combination of classic TM (translation memory) and innovative MT (also called “augmented translation”) more attractive for professional translators, too.

Much has been written and said about MT post-editing, quality, pricing, process impacts, etc. – but so far, little attention seems to have been paid to the potential and actual information security leaks that occur when professional translators plug free online MT into their translation environments.

For an article written in August 2017 for the 04/17 edition of MDÜ (the journal of Germany’s Association of Interpreters and Translators, BDÜ), I have taken a closer look at the most popular MT plugins available for common CAT tools and at the information security aspects of such MT offers. A free reading sample of my German article is available at http://www.bdue-fachverlag.de/download/mdue/1870.


Recently, Slator called MDÜ’s spotlight on information security “timely”, as one day after publication of this MDÜ edition, news spread about the massive data privacy breach in Norway caused by the use of free online MT (see https://slator.com/technology/translate-com-exposes-highly-sensitive-information-massive-privacy-breach). In my opinion and experience, such problems also existed in the past but went largely unnoticed by a wider audience. But now, with several MT solution providers heavily advertising their secure MT solutions, the marketing departments of MT solution providers and the media seem to be much more attentive to such issues. (The heightened concern for data security may also have been amplified by incidents like the Equifax breach and the Russian hacker stories that fill our news sources today.)

MT Integration within CAT tools

In my MDÜ article, I have focused on the four CAT tools which, according to Slator research from April 2017, are the most popular ones among German technical translators: SDL Trados Studio, Across, memoQ and STAR Transit.

The plug-ins available for these CAT tools offer access to MT technology solutions in two modes:
  • batch mode / via pre-translation
  • interactive lookup and use during translation


SDL Trados Studio 2017
  • Plugins for free online MT: SDL Language Cloud, Google Cloud Translation, MyMemory, Microsoft Translator (via Enhanced MT Plugin), iTranslate4.eu; additional free MT plugins available via SDL AppStore
  • Other MT plugins for paid MT solutions: Systran, Omniscien, KantanMT, CrossLang Gateway MT, Promt, LucyLT (now OctaveMT) and other providers (available via SDL AppStore or directly from the MT vendor)

Across 6.3
  • Plugins for free online MT: Google MT
  • Other MT plugins for paid MT solutions: Moses, Omniscien, Reverso, SmartMATE, LucyLT (now OctaveMT), Systran; additional connectors can be ordered from Across

memoQ 8.1.5
  • Plugins for free online MT: MyMemory, Google MT, Bing/Microsoft Translator, iTranslate4.eu
  • Other MT plugins for paid MT solutions: KantanMT, Omniscien, CrossLang Gateway MT, Iconic IPTranslator MT, Let’sMT!, PangeaMT, Systran, Slate Desktop, tauyou, Tilde MT and other providers

Transit NXT SP 9
  • Plugins for free online MT: Google MT, Microsoft Translator, MyMemory, iTranslate4.eu (all of them only in interactive mode)
  • Other MT plugins for paid MT solutions: Systran, SmartMATE, Omniscien, STAR-MT (also for pre-translation; only available in Transit Freelance Pro and Professional versions; needs to be activated via license number)

These modes of integrating MT with TM are by no means recent advances; such combinations were already available in the 1990s, for example in the Trados Translator's Workbench (see the article by Matthias Heyn on pp. 111-123 of the MT archive).

The interaction between CAT tool and MT solution takes place on the segment (i.e. usually sentence) level. The MT suggestions are returned to the CAT tool on the segment level and often also on the sub-segment level: individual words or phrases from the MT system are offered to the translator interactively via predictive typing, or are used for MT-enhanced (repaired) fuzzy matches in pre-translation and/or interactive mode.

I have selected the following online MT services for my further investigation:
  • Google Cloud Translation (integrated in all four CAT tools)
  • Microsoft Translator (available in three of the CAT tools)
  • MyMemory (available in three CAT tools)
  • SDL Language Cloud AdaptiveMT (available since SDL Trados Studio version 2017 for some language directions; free access for owners of a Studio Freelance or Professional license)
I have only taken the free (non-paid) MT service offers into consideration. This means that the MT service is generally restricted in translation volume and, in some solutions, also in features (e.g. only SMT and no NMT). Except for the MyMemory MT service, all of these free MT services require user registration.

Figure 1: MT plug-ins and MT pre-translation in SDL Trados Studio 2017 SR-1 via Google Cloud Translation API with NMT option

Information Security Aspects

When talking about sending/uploading data – or in the context of professional translation, often complete text – on the internet, there are two main aspects that need to be considered: data privacy and information security.

In my MDÜ article, I focused on information security aspects (as this was the topic of this MDÜ edition; for data privacy aspects, see Addendum 1 below) and on its three basic components: confidentiality, availability, and integrity.

Availability of free online MT services is usually not crucial for professional translators: they will still be able to work if the online MT system fails. And in the free MT offering range, none of the service providers guarantees availability anyway.

Integrity is also a minor concern for professional translators as they do not expect that the machine-translated data will be "complete" and "unchanged" – neither in terms of content nor structure (formatting, tags, etc. are often lost during MT processing).

The crucial topics are related to the question of data confidentiality.

In an article published in 2014, Don DePalma of Common Sense Advisory mentioned two problem areas associated with using online MT in general:
  • a) The “wrong” people can see information in transit.
  • b) MT sites can use your data in ways you did not intend.

My research regarding aspect a) for the TeMpTing MT plugins showed:
The APIs of some cloud MT providers such as Google or Microsoft offer encrypted data transfer (e.g. via the SSL protocol), but in most CAT tool integrations this option is either not used, not available in the free version, or not recognizable by a normal user. Only memoQ and Across provide an option to configure the user's own computer or server as the referrer in the API key.
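To make the "in transit" point concrete, here is a minimal Python sketch, not taken from any CAT tool, of a guard a plugin could apply before sending a segment: it refuses any MT endpoint that does not use HTTPS and relies on a default TLS context that verifies the server certificate. The endpoint URL, the function name `build_mt_request` and the JSON fields `q` and `target` are hypothetical placeholders, not a real provider's API.

```python
import json
import ssl
import urllib.request
from urllib.parse import urlparse


def build_mt_request(endpoint: str, segment: str, target_lang: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one segment to an MT endpoint.

    Plain-HTTP endpoints are refused, so the segment never travels unencrypted.
    The JSON field names are hypothetical, not a real provider's schema.
    """
    if urlparse(endpoint).scheme.lower() != "https":
        raise ValueError(f"refusing unencrypted MT endpoint: {endpoint}")
    body = json.dumps({"q": segment, "target": target_lang}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The default context verifies the server certificate and host name; a plugin
# would pass it when sending: urllib.request.urlopen(request, context=TLS_CONTEXT)
TLS_CONTEXT = ssl.create_default_context()
```

A plugin built this way fails loudly on a misconfigured `http://` endpoint instead of silently sending text in the clear. It does nothing, of course, about what the provider does with the text once it arrives, which is aspect b).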

And concerning aspect b): With the help of AI methods, Internet data collectors are able to reconstruct the whole text even when the free TeMpTing MT plug-ins are used only in interactive translation mode, in which the individual segments are sent to the online MT provider at irregular intervals.
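Why interactive mode offers little protection: every request still carries the same API key, so whoever stores the requests can regroup them. The minimal Python sketch below illustrates this under a hypothetical log format (`LoggedRequest` with key, timestamp and segment); it shows that even a trivial step, grouping by API key and sorting by timestamp, restores the document order when segments are translated in sequence. It is an illustration of the risk, not a description of any provider's actual processing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedRequest:
    api_key: str      # identifies the translator's plugin (hypothetical log field)
    timestamp: float  # seconds; the intervals between requests may be irregular
    segment: str      # one source segment sent for interactive lookup


def reconstruct_documents(log: list[LoggedRequest]) -> dict[str, str]:
    """Regroup individually sent segments into one running text per API key."""
    by_key: dict[str, list[LoggedRequest]] = defaultdict(list)
    for request in log:
        by_key[request.api_key].append(request)
    return {
        key: " ".join(r.segment for r in sorted(requests, key=lambda r: r.timestamp))
        for key, requests in by_key.items()
    }
```

Requests from different translators can be interleaved arbitrarily in the log; sorting per key is enough to undo that.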

And what Don DePalma found in 2014 still seems to be true: “While content ownership remains with the creator, free MT providers claim usage rights under their terms and conditions. For example, Google notes that it ‘does not claim any ownership in the content that you submit or in the translations of that content returned by the API.’ However, as you follow the policy links, you learn that ‘When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.’”

I started to look for links to the Terms of Service of the free online MT offers in the CAT tool interfaces and documentation, but found only a few “road signs” with warnings and links.


Figure 2: MT connection screen in Across Freelance Edition

When I dug deeper and went into the “jungle” of the Terms of Service on the websites of the MT service providers, I found some quite worrisome facts. In their long and dispersed terms of use and service, providers like Google, Microsoft, and SDL try to assure you that they really care about your data as much as you do and that your data belongs to you. But none of these three providers offers MT as a regional service, which means that the servers are not guaranteed to be located in the EU (even SDL’s servers are located in the US), and the stricter European terms of service of these vendors and/or European data protection regulations might not apply.

For further details on the Terms of Service of Microsoft and Google, see also Kirti Vashee's most recent blog post on Data Security Risks with Generic and Free Machine Translation.
 
Translated srl, the Italian service provider behind MyMemory, even boldly states in its Service Terms and Conditions of Use: “We collect any segment submitted and store it on a long term basis, whether it’s public or private. […] The contributions to the archive, whether they are ‘Public Data’ or ‘Private Data’, are collected, processed and used by Translated to create statistics, set up new services and improve existing ones.”

Clever CAT Recommendations for Translators

So should translators, and companies employing translators, keep their fingers away from all MT plugins in CAT tools? Yes, of course, for confidential or otherwise classified texts, and whenever use of online MT services for processing texts is explicitly forbidden by the client or company.

If the classification status of the texts is not clear, translators are advised to apply common sense and beware of the dog behind the MT-augmented CATs by taking a closer look at the Terms of Use / Service of online MT services.

While companies and (large) LSPs can buy or build their own secure MT solution (on-premises or in a secure private cloud environment), individual translators could – and should – also benefit from and keep up to date with advances in MT technology by:
  • using free (unsecured) online MT for freely available test texts or for translation jobs where they have the consent of the author or client
  • using offline MT solutions, which can be acquired for a few hundred euros/dollars. In addition to being free from information security issues, they also provide more customization options, such as terminology import.
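The recommendations above can be read as a simple decision rule. The following Python sketch encodes them; the `Job` fields, the `Classification` values and the returned labels are hypothetical names chosen for illustration, and a real decision still rests on the client's instructions and on the provider's current Terms of Service.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = "public"              # freely available text, e.g. a test text
    CONFIDENTIAL = "confidential"  # confidential or otherwise classified
    UNCLEAR = "unclear"            # classification status not known


@dataclass
class Job:
    classification: Classification
    client_forbids_online_mt: bool
    client_consents_to_online_mt: bool


def online_mt_decision(job: Job) -> str:
    """Apply the recommendations above to one translation job (illustrative labels)."""
    if job.classification is Classification.CONFIDENTIAL or job.client_forbids_online_mt:
        return "no online MT"
    if job.classification is Classification.PUBLIC or job.client_consents_to_online_mt:
        return "online MT acceptable"
    # Unclear status: apply common sense and read the provider's Terms of Use first.
    return "check Terms of Use first"
```

An explicit prohibition always wins, consent or public availability unlocks free online MT, and everything in between falls back to reading the terms.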

Addendum 1: Data Privacy Aspects when Using MT


The recommendation to use offline or secure cloud-based MT solutions also holds true for the data protection aspect, for online MT use in general. The authors of a recent, very informative article on “Data protection in Machine Translation under the GDPR” (GDPR = General Data Protection Regulation) confirm: “Offline MT […] does not pose any special problems with regards to personal data processing.”[1]

With regard to online MT services and protection of personal data, they recommend: “The user should generally avoid online MT services where he wishes to have information translated that concerns a third party (or is not sure whether it does or not).” And in a footnote, they even conclude: “This means users may be advised to use online MT services only for translating text from their own language into another language, and not vice versa (where they cannot be sure of the content)”[2].

The General Data Protection Regulation will apply from 25 May 2018, and although it is EU legislation, “[…] the GDPR expressly applies to the processing by controllers outside the EU, as long as the controller offers services to EU citizens (art. 3(2) of the GDPR).”[3]

[1] Kamocki, Dr. Tauch (2017): Data protection in Machine Translation under the GDPR, in: Porsiel, Jörg (ed.): Machine Translation - What Language Professionals Need to Know, 2017, p. 71 ff.
[2] ibid. p. 81
[3] ibid. p. 69

 

Addendum 2: DeepL – a TeMpTing New MT Player?

At the end of August 2017, a new NMT player with headquarters in Germany and servers in Iceland (which, by the way, is not an EU member state) entered the scene: DeepL.

Their free MT offer is currently only accessible via their website. However, Jost Zetzsche mentions in his 278th Tool Box Journal, published on Sept 9, 2017, that DeepL will develop an API, and this “is particularly important if DeepL is to be used by professional translators who would want to use it not on a web page but integrated into a translation environment tool”. Jost writes that, according to his conversation with DeepL's CTO, there seems to be some hope that “DeepL would commit itself to not using the data that is being translated for training purposes (which it does right now)” – in exchange for payment for the use of such an API.

Post Script



Regarding EU aspects and Jost’s comments: Google actually has Terms of Service for Germany (and possibly for other EU countries) that do not include the passage “you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works”.

The German TOS are more recent than the international ones; they were probably updated in preparation for the EU General Data Protection Regulation. But given that Google MT is not a regional service, it remains unclear which TOS apply.

I have explained this in more detail in my German article but left it out of my “international” article, as I considered it too Germany/EU-specific.



=================




Christine Bruckner has more than 20 years of professional experience as a freelance translator and in CAT/term/MT administration, support, training & consulting in the German government/military, corporate and LSP area. Since 2014, she has been leading the Technical Services team at a German LSP.

Christine holds university degrees in translation and in computational linguistics; she was one of the early adopters of TM technology in the early 1990s and has introduced and administered several MT solutions and translation memory and management systems at different employers. She is a member of EAMT (www.eamt.org) and BDÜ (www.bdue.de), and enjoys reading MT research and testing TM+MT combinations. She tries to get the best out of the TM+MT coupling in her occasional translations, mostly in the human rights area.

Web site: http://www.cattmatters.de/English/

Wednesday, September 13, 2017

Data Security Risks with Generic and Free Machine Translation

Along with the news of catastrophic hurricane activity in the US that we have been bombarded with recently, we are also seeing stories of serious data security breaches of privileged information, even in the world of business translation. As always, the security and privacy of the data can only be as good as the data security practices and the sophistication of the technological implementations; thus the knee-jerk response of blaming “bad” MT technology per se, or MT use practices, is not quite fair. It is indeed possible to make MT services for public and/or corporate community use safe and secure if you know what you are doing and are careful. It may seem obvious to some, but effective use of any technology requires both competence and skill, and we see too many cases of inept implementations that result in sub-optimal outcomes.
 
Generally, even professional translation work done entirely by humans requires source content to be distributed to translators and editors across the web to enable them to perform their specific tasks. Sensitive data or confidential information requiring translation can leak in two ways. First, information can be stolen "in transit" by transferring or accessing it over unsecured public Wi-Fi hot spots or by storing it on unsecured cloud servers. Such risks have already been widely publicized, and it is clear that weak processes and lax oversight are responsible for most of these data leakage cases.

Less considered, however, is what online machine translation providers do with the data users input. This risk was publicized by Slator last week, when employees of Norwegian state-run oil giant Statoil had “discovered text that had been typed in on [translate.com] could be found by anyone conducting a [Google] search.”

Slator reported that: “Anyone doing the same simple two-step Google search will concur. A few searches by Slator uncovered an astonishing variety of sensitive information that is freely accessible, ranging from a physician’s email exchange with a global pharmaceutical company on tax matters, late payment notices, a staff performance report of a global investment bank, and termination letters. In all instances, full names, emails, phone numbers, and other highly sensitive data were revealed.”

In this case, the injured parties apparently have little or no recourse, as the “Terms of Use” of the MT supplier clearly state that privacy is not guaranteed: “[…] cannot and do not guarantee that any information provided to us by you will not become public under any circumstances. You should appreciate that all information submitted on the website might potentially be publicly accessible.”
Others in the translation industry have pointed out further examples of these risks and have named other risky MT and shared-data players that handle translation data.

Translation technology blogger Joseph Wojowski wrote in some detail about the Google and Microsoft terms of use agreements in a post a few years ago. The information he presents is still quite current. From my vantage point, these two MT services are the most secure and reliable “free” translation services available on the web today and a significant step above offerings like translate.com and many others. However, if you are really concerned about privacy, these too carry some risk, as the following analysis points out.

His opening statement is provocative and true at least to some extent:
“An issue that seems to have been brought up once in the industry and never addressed again are the data collection methods used by Microsoft, Google, Yahoo!, Skype, and Apple as well as the revelations of PRISM data collection from those same companies, thanks to Edward Snowden. More and more, it appears that the [translation] industry is moving closer and closer to full Machine Translation Integration and Usage, and with interesting, if alarming, findings being reported on Machine Translation’s usage when integrated into Translation Environments, the fact remains that Google Translate, Microsoft Bing Translator, and other publicly-available machine translation interfaces and APIs store every single word, phrase, segment, and sentence that is sent to them.”


The Google Terms of Service


Both Google and Microsoft very clearly state that any (or at least some) data sent to their translation servers may be further processed and re-used, generally by machine learning technologies. (I would be surprised if any individual actually sits and watches this MT user data stream, even though it may be technically possible.) Their terms of use are considerably better than those of translate.com, which might as well have reduced them to: “User Beware: Use at your risk and we are not liable for anything that can go wrong in any way whatsoever.” Many people around the world use Google Translate daily, but very few of them are aware of the Google Terms of Service. I include the specific legalese from the Google Terms of Service here because it is good to see the exact wording in order to properly understand the potential risk.
When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones. This license continues even if you stop using our Services. (Google Terms of Service, April 14, 2014, accessed on September 11, 2017.)
Some other highlights from the Google TOS, which IMO basically mean: if something goes wrong, tough shit; and even if you can somehow prove it is our fault, we owe you at most what you paid us, and nothing for losses that were not reasonably foreseeable. The terms are even less favorable if you use the MT service for “Business Use”:
GOOGLE, AND GOOGLE’S SUPPLIERS AND DISTRIBUTORS, WILL NOT BE RESPONSIBLE FOR LOST PROFITS, REVENUES, OR DATA, FINANCIAL LOSSES OR INDIRECT, SPECIAL, CONSEQUENTIAL, EXEMPLARY, OR PUNITIVE DAMAGES.

THE TOTAL LIABILITY OF GOOGLE, AND ITS SUPPLIERS AND DISTRIBUTORS, FOR ANY CLAIMS UNDER THESE TERMS, INCLUDING FOR ANY IMPLIED WARRANTIES, IS LIMITED TO THE AMOUNT YOU PAID US TO USE THE SERVICES (OR, IF WE CHOOSE, TO SUPPLYING YOU THE SERVICES AGAIN).

IN ALL CASES, GOOGLE, AND ITS SUPPLIERS AND DISTRIBUTORS, WILL NOT BE LIABLE FOR ANY LOSS OR DAMAGE THAT IS NOT REASONABLY FORESEEABLE.

The Microsoft Terms of Service


Microsoft is a little better, and they are more forthcoming about their use of your data in general, but you can judge for yourself. Heavy users can even ensure that their data is not used or analyzed at all by taking out a paid volume subscription. Heavy use is defined as 250 million characters per month or more, which by my calculations is anywhere from 30 million to 50 million words per month. Here are some key selections from the Microsoft Translator Terms of Use statement.
"Microsoft Translator does not use the text or speech audio you submit for translation for any purpose other than to provide and improve the quality of Microsoft’s translation and speech recognition services. For instance, we do not use the text or speech audio you submit for translation to identify specific individuals or for advertising. The text we use to improve Translator is limited to a sample of not more than 10% of randomly selected, non-consecutive sentences from the text you submit, and we mask or delete numeric strings of characters and email addresses that may be present in the samples of text. The portions of text that we do not use to improve Translator are deleted within 48 hours after they are no longer required to provide your translation. If Translator is embedded within another service or product, we may group together all text samples that come from that service or product, but we do not store them with any identifiers associated with specific users. We may keep all speech audio indefinitely for product improvement purposes. We do not share the text or speech audio samples with third parties without your consent or as otherwise described, below.

We may share or disclose personal information with other Microsoft controlled subsidiaries and affiliates, and with suppliers or agents working on our behalf to assist with management and improvement to the Translator service.

In addition, we may access, disclose and preserve information when we have a good faith belief that doing so is necessary to:
  1. comply with applicable law or respond to valid legal process from competent authorities, including from law enforcement or other government agencies; [like PRISM for the NSA]
  2. protect our customers, for example to prevent spam or attempts to defraud users of the services, or to help prevent the loss of life or serious injury of anyone;"
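To illustrate what the quoted sampling policy describes (not how Microsoft implements it, which is not public), here is a Python sketch that picks at most 10% of the sentences at random, never two consecutive ones, and masks numeric strings and e-mail addresses. The function names, the regular expressions and the `[NUM]`/`[EMAIL]` placeholders are hypothetical choices for this example.

```python
import random
import re
from typing import Optional

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DIGITS = re.compile(r"\d+")


def mask(sentence: str) -> str:
    """Mask e-mail addresses first (they may contain digits), then numeric strings."""
    return DIGITS.sub("[NUM]", EMAIL.sub("[EMAIL]", sentence))


def sample_for_improvement(sentences: list[str], max_percent: int = 10,
                           rng: Optional[random.Random] = None) -> list[str]:
    """Pick at most max_percent of the sentences at random, never two adjacent ones."""
    if rng is None:
        rng = random.Random()
    k = len(sentences) * max_percent // 100  # integer cap, e.g. 2 of 26 sentences
    order = list(range(len(sentences)))
    rng.shuffle(order)
    chosen: set[int] = set()
    for i in order:
        if len(chosen) == k:
            break
        if i - 1 not in chosen and i + 1 not in chosen:  # keep samples non-consecutive
            chosen.add(i)
    return [mask(sentences[i]) for i in sorted(chosen)]
```

Even under such a policy, a masked sentence can still reveal names or confidential facts; masking only covers the patterns it is written for.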
And for LSPs and enterprises that customize (train) the Microsoft Translator baseline engines with their own TM data, the following terms also apply:
"The Microsoft Translator Hub (the “Hub”) is an optional feature that allows you to create a personalized translation system with your preferred terminology and style by submitting your own documents to train on, or using community translations. The Hub retains and uses submitted documents in full in order to provide your personalized translation system and to improve the Translator service. After you remove a document from your Hub account we may continue to use it for improving the Translator service."
Again, if you are a 50-million-words-per-month kind of user, you can opt out of having your data used for anything else.
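The conversion from the 250-million-character threshold to the 30 to 50 million word range can be reproduced as follows, assuming (this is an assumption, not a Microsoft figure) an average of roughly 5 to 8 characters per word, counting spaces and punctuation:

```latex
\frac{250 \times 10^{6}\ \text{characters}}{8\ \text{characters/word}} \approx 31 \times 10^{6}\ \text{words},
\qquad
\frac{250 \times 10^{6}\ \text{characters}}{5\ \text{characters/word}} = 50 \times 10^{6}\ \text{words}
```

Texts with longer average word lengths land near the lower end of the range.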

After reviewing these agreements, blogger Joseph Wojowski concludes that translators need to be wary, even though he acknowledges that in some cases MT offers translators real and meaningful productivity benefits.

“In the end, I still come to the same conclusion, we need to be more cognizant of what we send through free, public, and semi-public Machine Translation engines and educate ourselves on the risks associated with their use and the safer, more secure solutions available when working with confidential or restricted-access information.”

Invisible Access via Integration


If your source data already exists on the web, some may ask: what is the big deal? Some MT use cases I am aware of, which focus on translating technical support knowledge bases or eCommerce product listings, may not care about these re-use terms. But the larger risk is that once translation infrastructure is connected to MT via an API, users may inadvertently start sending less suitable documents to MT without understanding the risks and potential data exposure. For a random unsophisticated user in a translation management system (TMS), it is quite possible to inadvertently send out and translate an upcoming earnings announcement, internal memos to staff about emerging product designs, or other restricted data on an MT server that is governed by these terms of use. In global enterprises, there is an ongoing need to translate many types of truly confidential information.

Memsource recently presented research on MT usage within its TMS across its whole user base, showing that about 40 million segments per month are translated via the Microsoft Translator and Google APIs. Since this volume is spread across many users, few of whom are likely to reach the opt-out threshold, we have to presume that the data is reused and analyzed. An earlier article by Jost Zetzsche in the ATA Chronicle (December 2014 issue, page 26) showed that almost 14,000 translators were using the same “free” MT services in Memsource. If you add Trados and other TM and TMS systems that have integrated API access to these public MT systems, I am sure the volume of MT use is significant. Thus, if you care about privacy and security, the first thing you might need to do is address these MT API integrations that are cloaked within widely used TM and TMS products. While there are many cases where it might not matter, it would be good for users to understand the risks when it does.
Human error, often inadvertent, is a leading cause of data leakage
Common Sense Advisory's Don DePalma writes that “employees and your suppliers are unconsciously conspiring to broadcast your confidential information, trade secrets, and intellectual property (IP) to the world.” CSA also reports that in a recent survey of enterprise localization managers, 64% said their fellow employees use free MT frequently or very frequently, and 62% told Common Sense Advisory that they are concerned or very concerned about “sensitive content” (e-mails, text messages, project proposals, legal contracts, merger and acquisition documents) being translated. CSA points out two risks:
  1. Information seen by hackers or geeks in transit across non-secure web connections
  2. Usage rights claimed by the MT provider: see the Google TOS section above for what Google can do with your content, even after you stop using its services
The problem is compounded because, while it may be possible to enforce usage policies within the firewall, suppliers and partners may lack the sophistication to do the same. This is especially so in an ever-expanding global market. Many LSPs and their translators are now using MT through the API integration interfaces mentioned above. CSA lists these issues as follows:
  • Service providers may not tell clients that they use MT.
  • Most buyers haven’t yet caught up with the data leakage problem.
  • Subcontractors might not follow the agreed-upon rules.
  • Linguists can and will use MT when it is convenient, regardless of stated policies.
Within localization groups, there are some ways to control this, as CSA again points out:
  • Locking down content workflows (e.g., turning off MT access within TMS systems)
  • Finding MT providers that will comply with your data security provisions
However, the real risk is in the larger enterprise, outside the localization department, where the acronym TMS is unknown. It may be possible, to some extent, to anonymize all translation requests through specialized software, to block all free translation requests, or to force them through special gateways that rinse the data before it goes out beyond the firewall. These anonymization tools might be useful, but they are still primitive. Much of the risk can instead be mitigated by establishing a corporate-controlled MT capability that provides universal access to all employees and remains behind the firewall.
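To make the gateway idea more concrete, here is a minimal sketch of the kind of pseudonymization step such a gateway might apply before a request leaves the firewall. It is purely illustrative and assumes a simple regex-based approach. The pattern set and the `scrub`/`restore` function names are hypothetical and are not taken from any particular anonymization product.

```python
import re

# Hypothetical patterns for a few kinds of sensitive tokens. A real gateway
# would need far broader coverage (names, addresses, deal terms, ...).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}


def scrub(text):
    """Replace sensitive tokens with numbered placeholders.

    Returns the scrubbed text plus a mapping that can restore the originals
    once the translation comes back through the gateway.
    """
    mapping = {}
    for label, pattern in PATTERNS.items():
        def _sub(match, label=label):
            placeholder = f"[{label}_{len(mapping)}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(_sub, text)
    return text, mapping


def restore(text, mapping):
    """Put the original tokens back into the (translated) text."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    source = "Contact jane.doe@example.com or +49 30 1234567 about the merger."
    scrubbed, mapping = scrub(source)
    # Only the scrubbed text would be sent to the external MT service.
    print(scrubbed)
    print(restore(scrubbed, mapping))
```

Even this toy example shows why such tools are still primitive. Regular expressions catch e-mail addresses and phone numbers, but they miss names, deal terms, and most of what actually makes a merger document confidential.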

In addition to the secure corporate MT service described above, I think we will also see much more use of MT in e-discovery, both in litigation-related applications and in broader corporate governance and compliance applications. Here is another opinion on the risks of using generic MT services in the corporate litigation scenario.

Considering Secure MT Deployment Options

Many global organizations are now beginning to realize the information leakage risk presented by unrestricted use of, and access to, free MT. One way to address this risk is to build your own MT systems. However, it has also become clear to many that most DIY (Do It Yourself) systems tend to produce lower-quality output than these free generic systems. When users are aware of the quality advantage of free MT, they will often double-check their translations on these "better" systems, which defeats the purpose of private and controlled access on DIY systems. Controlled, secure MT solutions that are optimized for the corporate subject domain and come from vendors with certified competence in doing this seem to me the most logical and cost-effective way to solve this data leakage problem. “On-premise” systems make sense for organizations that have IT staff available and able to handle ongoing management, and to protect and manage MT servers at both the customer and the vendor end of the equation. Many large enterprises have this kind of internal IT competence, but very few LSPs do.

In my opinion, vendors that can set up both on-premise systems and scalable private clouds are among the best options available in the market today. Some say that a private cloud option provides both professional IT management and verifiable data security, and that it is better for those with less qualified IT staff. Most MT vendors provide cloud-based solutions today, and for adaptive MT this may be the only option. Few MT vendors can do both cloud-based and on-premise deployments, and even fewer can do both competently. MT vendors who provide non-cloud solutions only infrequently are less likely to deliver reliable and stable offerings. Setting up a corporate MT server that may have hundreds or thousands of users is a non-trivial affair. Like most things in life, doing both on-premise and cloud solutions well takes repeated practice and broad experience across many different user scenarios. Thus, one would expect vendors with a large and broad installed base of on-premise and private cloud installations (e.g., more than 10 varying types of customers) to be preferred over those who do it as an exception and have an installed base of fewer than 10 customer sites. There are two companies whose names start with S that I think best meet these requirements in terms of broad experience and widely demonstrated technical competence. As we head into a world where Neural MT is more pervasive, I think private clouds will likely become more important and will be preferred over having your own IT staff manage GPU, TPU, or FPGA arrays and servers on site. However, it is still wise to ask your MT vendor for complete details on the data security provisions in their cloud offering.

What seems increasingly certain is that MT provides great value in keeping global enterprises actively sharing and communicating, and that better, more secure MT solutions have a bright future. What incidents like the latest translate.com fiasco show is that broadly available MT services are valuable enough for any globally focused enterprise to explore seriously and carefully. The alternative is to leave naïve users to find their own way to risky “free” solutions that undermine corporate privacy and expose high-value confidential data to anyone who knows how to use a search engine or has basic hacking skills.