
Tuesday, September 19, 2017

About Clever CATs and TeMpTing Free MT Offers

This is a guest post by Christine Bruckner that looks into the data and information security issue with free online MT services from a translator's perspective. She was kind enough to do a summary translation of a longer article she recently wrote in German, referenced below. This subject is closely related to my previous post and shows that this issue is gaining in visibility and prominence.

Jost Zetzsche has also written about this MT security and data privacy issue, and some of you may have noticed our Twitter banter on the Google Translate security policy. Jost feels that this text from a FAQ suggests that use of the Translator API overrides the "Your Content in our Services" policy, even though the legal language in the Terms of Service very clearly states the following: "When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones." I guess it is a matter of interpretation, and perhaps of my lack of trust, given how Google sometimes presents facts. Knowing what I do about how an API works, I see that the GT GUI (which is just a front-end software interface like any other) connects via an API to the Translate Service; thus, I maintain that the aforementioned TOS is very much in effect whenever your data touches the Translate Service. Good luck to anyone who wants Google to confirm or deny this, because they tend to NOT RESPOND. Don DePalma's quote from his 2014 article, shown below, also seems to support the view that the TOS is in effect. It is up to you to decide for yourself; my sense is that they tell you very clearly what they have the right to do, so be wary if you don't like this policy.

It is good that this issue is getting attention, so that all MT use in professional settings can be better informed on the data security issue. I thank Christine for also clarifying the MyMemory privacy policy below and for alerting us to the more stringent security requirements in the EU. We should not fault these MT service providers for using your data if you use their services for free, as it can sometimes be useful to improve the MT capabilities. "There is no such thing as a free lunch", as they say in America. Given the sheer volume of MT use on any given day, I doubt it is possible to analyze the translated text in any way other than through some machine learning process. The risks are always higher when you have a case of incompetence like translate.com, or when the MT provider is under-resourced and unable to implement proper safeguards. Some providers do give you an option to buy privacy, and that too is a fair and reasonable policy, as the options to go on-premise or private cloud also come at a cost.


------------------------


Most of the CAT (computer-aided translation) tools used in the professional translator’s workplace offer integrations with online machine translation (MT) solutions. Better MT quality and self-learning capabilities thanks to neural and adaptive MT technologies make the classic TM (translation memory) and innovative MT combination (also called “augmented translation”) more attractive for professional translators, too.

Much has been written and said about MT post-editing, quality, pricing, process impacts, etc. – but it seems that so far, little awareness has been raised about potential and actual information security leaks when professional translators plug free online MT into their translation environments.

For an article written in August 2017 for the 04/17 edition of MDÜ (the journal of Germany’s Association of Interpreters and Translators, BDÜ), I took a closer look at the most popular MT plugins that are available for common CAT tools and at the information security aspects of such MT offers. A free reading sample of my German article is available at http://www.bdue-fachverlag.de/download/mdue/1870.


Recently, Slator called MDÜ’s spotlight on information security “timely”, as one day after publication of this MDÜ edition, news spread about the massive data privacy breach in Norway due to the use of free online MT (see https://slator.com/technology/translate-com-exposes-highly-sensitive-information-massive-privacy-breach). In my opinion and experience, such problems have also existed in the past, but have gone largely unnoticed by a wider audience. But now, with several MT solution providers heavily advertising their secure MT solutions, the marketing departments of MT solution providers and the media seem to be much more attentive to such issues. (The elevated concern for data security may also have been amplified by incidents like the Equifax and Russian hacker stories that fill our news sources today.)

MT Integration within CAT tools

In my MDÜ article, I focused on the four CAT tools that, according to Slator research from April 2017, are the most popular among German technical translators: SDL Trados Studio, Across, memoQ and STAR Transit.

The plug-ins available for these CAT tools offer access to MT technology solutions in two modes:
  • batch mode / via pre-translation
  • interactive lookup and use during translation


| CAT Tool | Plugins for Free Online MT | Other MT Plugins for Paid MT Solutions |
| --- | --- | --- |
| SDL Trados Studio 2017 | SDL Language Cloud, Google Cloud Translation, MyMemory, Microsoft Translator (via Enhanced MT Plugin), iTranslate4.eu; additional free MT plugins available via SDL AppStore | Systran, Omniscien, KantanMT, CrossLang Gateway MT, Promt, LucyLT (now OctaveMT) and other providers (available via SDL AppStore or directly from the MT vendor) |
| Across 6.3 | Google MT | Moses, Omniscien, Reverso, SmartMATE, LucyLT (now OctaveMT), Systran; additional connectors can be ordered from Across |
| memoQ 8.1.5 | MyMemory, Google MT, Bing/Microsoft Translator, iTranslate4.eu | KantanMT, Omniscien, CrossLang Gateway MT, Iconic IPTranslator MT, Let’sMT!, PangeaMT, Systran, Slate Desktop, tauyou, Tilde MT and other providers |
| Transit NXT SP 9 | Google MT, Microsoft Translator, MyMemory, iTranslate4.eu (all of them only in interactive mode) | Systran, SmartMATE, Omniscien, STAR-MT (also for pre-translation; only available in Transit Freelance Pro and Professional versions; needs to be activated via license number) |




These modes of integrating MT with TM are by no means recent advances; such combinations were already available in the 1990s, for example in the Trados Translator's Workbench (see the article by Matthias Heyn on pp. 111-123 of the MT archive).

The interaction between CAT tool and MT solution takes place on the segment (i.e. usually sentence) level. The MT suggestions are returned to the CAT tool on the segment level, and often also on the sub-segment level: individual words or phrases from the MT system are presented interactively to the translator via predictive typing, or via MT-enhanced (repaired) fuzzy matches in pre-translation and/or interactive mode.

I have selected the following online MT services for my further investigation:
  • Google Cloud Translation (integrated in all four CAT tools)
  • Microsoft Translator (available in three of the CAT tools)
  • MyMemory (available in three CAT tools)
  • SDL Language Cloud AdaptiveMT (available since SDL Trados Studio version 2017 for some language directions; free access for owners of a Studio Freelance or Professional license)
I have only taken into consideration the respective free (unpaid) MT service offers. This means that the MT service is generally restricted in translation volume and, in some solutions, also in features (e.g. only SMT and no NMT). Except for the MyMemory MT service, all of these free MT services require user registration.

Figure 1: MT plug-ins and MT pre-translation in SDL Trados Studio 2017 SR-1 via Google Cloud Translation API with NMT option

Information Security Aspects

When talking about sending/uploading data – or in the context of professional translation, often complete text – on the internet, there are two main aspects that need to be considered: data privacy and information security.

In my MDÜ article, I focused on information security aspects (as this was the topic of this MDÜ edition; for data privacy aspects, see Addendum 1 below) and on the three basic components of information security: confidentiality, availability, and integrity.

Availability of free MT online services is usually not crucial for professional translators: they will still be able to work if the online MT system fails. And in the free MT offering range, none of the service providers guarantee availability anyway.

Integrity is also a minor concern for professional translators as they do not expect that the machine-translated data will be "complete" and "unchanged" – neither in terms of content nor structure (formatting, tags, etc. are often lost during MT processing).

The crucial topics are related to the question of data confidentiality.

In an article published in 2014, Don DePalma of Common Sense Advisory mentioned two problem areas associated with using online MT in general:
  • a) The “wrong” people can see information in transit.
  • b) MT sites can use your data in ways you did not intend.

My research regarding aspect a) for the TeMpTing MT plugins showed:
The APIs of some cloud MT providers such as Google or Microsoft offer encrypted data transfer options (e.g. via the SSL protocol), but in most CAT tool integrations this is either not used, not available in the free version, or not recognizable by a normal user. Only the CAT tools memoQ and Across provide an option to configure your own computer or server as the referrer in the API key.
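
To illustrate what "recognizable by a normal user" could mean in practice, here is a minimal sketch of a check that refuses to treat an MT endpoint as encrypted unless its URL uses HTTPS. The function name and the example.com endpoints are hypothetical and purely illustrative; this is not how any of the CAT tools or MT plugins mentioned here are actually implemented.

```python
from urllib.parse import urlparse


def uses_encrypted_transport(endpoint_url: str) -> bool:
    """Return True only if the endpoint URL uses the https scheme."""
    return urlparse(endpoint_url).scheme.lower() == "https"


# Hypothetical endpoints, for illustration only.
for url in ("https://mt.example.com/translate", "http://mt.example.com/translate"):
    status = "encrypted" if uses_encrypted_transport(url) else "NOT encrypted"
    print(f"{url}: {status}")
```

Note that such a check only addresses aspect a), the data in transit. It says nothing about aspect b), i.e. what the MT provider does with the segments once they have arrived.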

And concerning aspect b): With the help of AI methods, Internet data collectors are able to re-construct the whole text even when the free TeMpTing MT plug-ins are used only in interactive translation mode, whereby the individual segments are sent to the online MT provider at irregular intervals.

What Don DePalma found in 2014 still seems to be true: "While content ownership remains with the creator, free MT providers claim usage rights under their terms and conditions. For example, Google notes that it “does not claim any ownership in the content that you submit or in the translations of that content returned by the API.” However, as you follow the policy links, you learn that “When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.”"

I started to look for links to the Terms of Service of the free online MT offers in the CAT tool interfaces and documentation, but I have found only a few “road signs” with warnings and links.


Figure 2: MT connection screen in Across Freelance Edition

When I dug deeper and went into the “jungle” of the Terms of Service on the websites of the MT service providers, I found some quite worrisome facts. In their long and dispersed terms of use and service, providers like Google, Microsoft, and SDL try to assure you that they really care about your data as much as you do and that your data belongs to you. But none of these three providers offers MT as a regional service, which means that the servers are not guaranteed to be located in the EU (even SDL’s servers are located in the US) – and the stricter European terms of service of these vendors and/or European data protection regulations might not apply.

SDL explains in the SDL Language Cloud FAQs: "With SDL Language Cloud Machine Translation you can rest assured that your content is safe. SDL guarantees that your data is not saved or used outside of the scope or timeframe that is necessary to provide you with the service". However, when you read the Internet Security section of SDL’s Terms and Conditions for Language Cloud Translation Services, they also warn: "Because the Internet is an inherently open and insecure means of communication, any Data or information a user transmits over the Internet may be susceptible to interception and alteration. SDL makes no guarantee regarding, and assumes no liability for, the security and integrity of any Data or information you transmit over the Internet, including any Data or information transmitted via any server designated as "secure". You should not have an expectation of privacy in any content, including accounts of files transmitted through the internet."

And according to an SDL statement at the European Trados User Group Conference in June 2017, the SDL Servers for Language Cloud Machine Translation are located in the US, and none in Europe.


For further details on the Terms of Service of Microsoft and Google, see also Kirti Vashee's most recent blog on Data Security Risks with Generic and Free Machine Translation.
 
Translated srl, the Italian service provider behind MyMemory, even boldly claims in its Service Terms and Conditions of Use: "We collect any segment submitted and store it on a long term basis, whether it’s public or private. […] The contributions to the archive, whether they are "Public Data" or "Private Data", are collected, processed and used by Translated to create statistics, set up new services and improve existing ones."


Clever CAT Recommendations for Translators

So should translators and companies employing translators keep their hands off any MT plugin in CAT tools? This is, of course, advisable for confidential or otherwise classified texts, and whenever the use of online MT services for processing texts is explicitly forbidden by the client or company.

If the classification status of the texts is not clear, translators are advised to apply common sense and beware of the dog behind the MT-augmented CATs by taking a closer look at the Terms of Use / Service of online MT services.

While companies and (large) LSPs can buy or build their own secure MT solution (on-premises or in a secure private cloud environment), individual translators could – and should – also benefit from and keep up to date with advances in MT technology by:
  • using free (unsecured) online MT for freely available test texts or translation jobs where they have the consent of the author or client
  • using offline MT solutions, which can be acquired for a few hundred euros/dollars. In addition to being free from information security issues, they also provide more customization options such as terminology import.
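
The recommendations in this section can be summarized as a simple decision rule. The following is only a sketch under stated assumptions: the `Job` fields and the `allowed_mt` function are hypothetical names introduced for illustration, and falling back to offline or secure MT when the status of a text is unclear is a cautious simplification of the "apply common sense" advice above, not a rule stated in the article.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of a translation job (illustrative fields)."""
    confidential: bool         # text is confidential or otherwise classified
    online_mt_forbidden: bool  # client or company explicitly forbids online MT
    consent_given: bool        # author or client has consented to online MT use
    freely_available: bool     # e.g. freely available test texts


def allowed_mt(job: Job) -> str:
    """Return 'online' if free online MT may be used, else 'offline_or_secure'."""
    if job.confidential or job.online_mt_forbidden:
        return "offline_or_secure"
    if job.freely_available or job.consent_given:
        return "online"
    # Assumption: when the status is unclear, stay on the safe side.
    return "offline_or_secure"


print(allowed_mt(Job(confidential=False, online_mt_forbidden=False,
                     consent_given=True, freely_available=False)))
```

A freelancer could run such a check once per job before activating an MT plugin, rather than deciding segment by segment.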

Addendum 1: Data Privacy Aspects when Using MT


The recommendation to use offline or secure cloud-based MT solutions also holds true for the data protection aspect – for online MT use in general. The authors of a recent, very informative article on “Data protection in Machine Translation under the GDPR” (GDPR = General Data Protection Regulation) confirm: “Offline MT […] does not pose any special problems with regards to personal data processing.”[1]

With regards to online MT services and protection of personal data, they recommend: “The user should generally avoid online MT services where he wishes to have information translated that concerns a third party (or is not sure whether it does or not).” And in a footnote, they even conclude: “This means users may be advised to use online MT services only for translating text from their own language into another language, and not vice versa (where they cannot be sure of the content)”[2].

The General Data Protection Regulation (GDPR) will apply from 25 May 2018, and although it is EU legislation, “[…] the GDPR expressly applies to the processing by controllers outside the EU, as long as the controller offers services to EU citizens (art. 3(2) of the GDPR).”[3]

[1] Kamocki, Dr. Tauch (2017): Data protection in Machine Translation under the GDPR, in: Porsiel, Jörg (ed.): Machine Translation - What Language Professionals Need to Know, 2017, p. 71 ff.
[2] ibid. p. 81
[3] ibid. p. 69

 

Addendum 2: DeepL – a TeMpTing New MT Player?

At the end of August 2017, a new NMT player with headquarters in Germany and servers in Iceland (by the way, not an EU member state) entered the scene: DeepL.

Their free MT offer is currently only accessible via their website. However, Jost Zetzsche mentions in his 278th Tool Box Journal, published on Sept 9, 2017, that DeepL will develop an API and this “is particularly important if DeepL is to be used by professional translators who would want to use it not on a web page but integrated into a translation environment tool”. Jost writes that according to his conversation with DeepL's CTO there seems to be some hope that “DeepL would commit itself to not using the data that is being translated for training purposes (which it does right now)” – in exchange for payment for the use of such an API.

Post Script



Regarding EU aspects and Jost’s comments: Google actually has Terms of Service for Germany (and possibly for other EU countries) that do not include the passage regarding “you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works”.

The German TOS are more recent than the international ones; they have probably been updated to prepare for the EU General Data Protection Regulation. But given that Google MT is not a regional service, it remains unclear which TOS apply.

I explained this in more detail in my German article, but left it out of my “international” article as I considered it too Germany/EU-specific.



=================




Christine Bruckner has more than 20 years of professional experience as a freelance translator and in CAT/term/MT administration, support, training & consulting in the German government/military, corporate and LSP area. Since 2014, she has led the Technical Services team at a German LSP.

Christine holds university degrees in translation and in computational linguistics; she was one of the early adopters of TM technology in the early 1990s and has introduced and administered several MT solutions and translation memory and management systems at different employers. She is a member of EAMT (www.eamt.org) and BDÜ (www.bdue.de), and enjoys reading MT research and testing TM+MT combinations. She tries to get the best out of the TM+MT coupling in her occasional translations, mostly in the human rights area.

Web site: http://www.cattmatters.de/English/

6 comments:

  1. SDL MT products SDL ETS or SDL BeGlobal do not store or use translated content to optimize the MT engine or for other purposes. For all EU clients of SDL TMS, WS, and TP, these services are hosted in a data center in Frankfurt, Germany by NTT.

  2. I guess Google and the like have taken advantage of the general initial blur in interpreting ToS, with policies overriding other policies, to keep their competitive edge in the field until the masses begin to wake up as they read articles like yours and they finally change their practices.

  3. It seems that SDL's caveat does not refer to its own services but to the risks pertaining to transfer over the Internet. And I believe the statement by Google also does not refer to the Cloud Translation API services but to their use in general of the information they gather about me as a user (e.g. to provide what they believe is advertising material of interest to me -- which certainly may be a bother but hardly affects the confidentiality of my translations). So I think one needs to take some closer looks at these things before drawing conclusions.

    Replies
    1. Yes Mats,

      I think the reservations may be overly conservative given the reality today. SDL has no need to collect data for refining advertising targeting, so when they say they do not use your data I think it is quite safe to assume that your data is discarded as soon as your jobs are done within the reasonable program use requirements.

      There are always risks with Internet data transfer that are global and relevant to any app where data is transported or entered. I think many might be surprised to find out how vulnerable email attachments are, which we can assume is the most common way that translation related content is sent around nowadays. Thus, some of the concern for privacy might be a little bit exaggerated.

  4. About risks: What do you mean by "when the MT provider is under-resourced and unable to implement proper safeguards"? Lacking what kind of resources? What proper safeguards?

    Replies
    1. Some MT vendors are very small companies with very limited financial resources, unable to fund the expensive security and data protection protocols that larger vendors can easily afford. When funds are scarce, there is a risk that low-cost resources (cloud technology with less advanced security) might be used. The most secure technology tends to be expensive, as multiple layers of security tools and processes are implemented. All these security layers may not be necessary for the product to function and are only needed to keep out hackers and close potential holes in access.
