Youdao Translate

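Analyzing the URL: open Youdao Translate and you'll notice that the URL doesn't change when you translate something, which means the request is made asynchronously through AJAX. Press F12 and the request is easy to find under XHR. Click it to inspect the details and you'll see a string of parameters, a few of which look encrypted (salt, sign and so on). Keep that in mind for now.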

The POST request carries these main parameters:

Cookie: YOUDAO_MOBILE_ACCESS_TYPE=1;
[email protected];
JSESSIONID=aaa0_uqYPNWRU73UZz2ux;
___rl__test__cookies=1602935539299;
OUTFOX_SEARCH_USER_ID_NCOO=274122564.1451922

i=hello&
from=AUTO&
to=AUTO&
smartresult=dict&
client=fanyideskweb&
salt=16029355393045&
sign=af7915506d0d292ab8f899c99ce00c81&
lts=1602935539304&
bv=e2a78ed30c66e16a857c5b6486a1d326&
doctype=json&version=2.1&
keyfrom=fanyi.web&
action=FY_BY_REALTlME
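
To make the captured parameters easier to inspect, the form body can be loaded into a dictionary. This is only a small sketch: the string below is the captured body from above joined onto one line, and the variable names are my own.

```python
from urllib.parse import parse_qsl

# Form body as captured in the browser's network panel (values copied from above).
captured_body = (
    "i=hello&from=AUTO&to=AUTO&smartresult=dict&client=fanyideskweb"
    "&salt=16029355393045&sign=af7915506d0d292ab8f899c99ce00c81"
    "&lts=1602935539304&bv=e2a78ed30c66e16a857c5b6486a1d326"
    "&doctype=json&version=2.1&keyfrom=fanyi.web&action=FY_BY_REALTlME"
)

# parse_qsl returns (key, value) pairs; dict() is fine here because no key repeats.
params = dict(parse_qsl(captured_body))

# Print the four fields that change from request to request.
for name in ("salt", "sign", "lts", "bv"):
    print(name, "=", params[name])
```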

The values we need to work out are:

[email protected];
JSESSIONID=aaa0_uqYPNWRU73UZz2ux;
___rl__test__cookies=1602935539299;
OUTFOX_SEARCH_USER_ID_NCOO=274122564.1451922

salt=16029355393045&
sign=af7915506d0d292ab8f899c99ce00c81&
lts=1602935539304&
bv=e2a78ed30c66e16a857c5b6486a1d326&

Setting breakpoints in the page's JavaScript leads to this code:

  t.translate = function (e, t) {
          k = f('#language').val();
          var n = w.val(),
          r = v.generateSaltSign(n),
          i = n.length;
          if (M(), _.text(i), i > 5000) {
            var a = n;
            n = a.substr(0, 5000),
            r = v.generateSaltSign(n);
            var s = a.substr(5000);
            s = (s = s.trim()).substr(0, 3),
            f('#inputTargetError').text('有道翻译字数限制为5000字,“' + s + '”及其后面没有被翻译!').show(),
            _.addClass('fonts__overed')
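          }
          // ... (the rest of t.translate is not shown)

The salt and sign values appear to come from this function:

var r = function (e) {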
          var t = n.md5(navigator.appVersion),
          r = '' + (new Date).getTime(),
          i = r + parseInt(10 * Math.random(), 10);
          return {
            ts: r,
            bv: t,
            salt: i,
            sign: n.md5('fanyideskweb' + e + i + ']BjuETDhU)zqSxf-=B#7m')
          }
        };

From this we can see:

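ts (r): the current timestamp in milliseconds, as a string; it appears in the request as lts
bv (t): the MD5 hash of the browser version string navigator.appVersion, e.g. 5.0 (Windows)
salt (i): the timestamp string with one random digit from 0 to 9 appended to it (string concatenation, not addition)
sign: n.md5('fanyideskweb' + e + i + ']BjuETDhU)zqSxf-=B#7m'), that is, the MD5 hash of the single string "fanyideskweb" + the text to translate + salt + "]BjuETDhU)zqSxf-=B#7m"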
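
The JavaScript above can be mirrored in Python roughly as follows. Treat this as a sketch rather than a tested client: the function and parameter names are my own, the secret string is simply copied from the snippet and may have changed since, and I am assuming the site's md5 helper hashes the UTF-8 bytes of the string.

```python
import hashlib
import random
import time

CLIENT = "fanyideskweb"
SECRET = "]BjuETDhU)zqSxf-=B#7m"  # copied from the JavaScript above; may be outdated


def md5_hex(s):
    """Return the hex MD5 digest of a string (assumes UTF-8 encoding)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def generate_salt_sign(text, app_version="5.0 (Windows)", now_ms=None):
    """Rebuild lts, bv, salt and sign the way the page's script does."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # like (new Date).getTime()
    lts = str(now_ms)
    # The timestamp string with one random digit 0-9 appended
    salt = lts + str(random.randint(0, 9))
    return {
        "lts": lts,
        "bv": md5_hex(app_version),
        "salt": salt,
        "sign": md5_hex(CLIENT + text + salt + SECRET),
    }


# Fixed timestamp taken from the captured request, so lts is reproducible.
fields = generate_salt_sign("hello", now_ms=1602935539304)
print(fields)
```

This matches the captured request in one respect: the captured salt 16029355393045 is the captured lts 1602935539304 followed by the digit 5. Whether the server accepts a signature built this way also depends on the cookies and headers, which this sketch does not cover.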

While digging around, I came across a much simpler piece of code (ridiculously simple, in fact):

import json
import urllib.parse
import urllib.request

def youdao_translate(text):
    # Youdao Translate query endpoint
    url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&sessionFrom=http://fanyi.youdao.com/'
    data = {  # form data
        'i': text,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_CLICKBUTTION',
        'typoResult': 'false'
    }

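    # URL-encode the form data and convert it to bytes for the POST body
    data = urllib.parse.urlencode(data).encode('utf-8')

    # Send the POST request and get the HTTP response
    response = urllib.request.urlopen(url, data)

    # Read the response body and decode it
    html = response.read().decode('utf-8')

    # Parse the JSON
    target = json.loads(html)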

    # Join the translated segments of the first result entry
    translateResult = target['translateResult'][0]
    return ''.join(segment['tgt'] for segment in translateResult)



if __name__ == '__main__':
    text = "B. Git Repository We assume in this section that the reader uses LATEX and BIBTEX. The main idea consists in having a single repository for all the scientific texts: theses, reports, articles, letters, reviews, and miscellaneous documents. Fig. 5 illustrates the recommended basic structure for a repository holding several LATEX files (or projects), along with their associated data and code. Every directory may contain specific subdirectories. For instance, Data may contain CSV, text, and other directories with specific data files. Note that there is a single BIBTEX file (with extension .bib in the Common directory). BIBTEX references can be split into several files, but these files should be common to all projects. This avoids outdated and redundant bibliographic databases."
    ans = youdao_translate(text)
    print(ans)
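
To see what the parsing step does without hitting the network, here is a small offline sketch. The sample response is made up for illustration: only the translateResult and tgt keys come from the code above, and the other fields (errorCode, src) are assumptions about the response.

```python
import json

# Hypothetical response body, shaped the way youdao_translate expects it.
sample = (
    '{"errorCode": 0, "translateResult": '
    '[[{"src": "hello", "tgt": "你好"}, {"src": " world", "tgt": "世界"}]]}'
)


def extract_translation(body):
    """Join the 'tgt' fields of the first entry in translateResult."""
    target = json.loads(body)
    first_entry = target["translateResult"][0]
    return "".join(segment["tgt"] for segment in first_entry)


print(extract_translation(sample))  # prints 你好世界
```

Note that only index 0 of translateResult is read, so if the response contains more than one entry, the later ones are dropped.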