
The Problem of Mixed Multi-Language Display

Source: 图灵教育
Time: 2024-03-07 09:41:17

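I never believed that Java could have a bug that stops it from displaying several languages mixed together, so this weekend I looked into multi-language display in Servlet and JSP, that is, the handling of multiple character sets in a Servlet. My grasp of character-set concepts is not very firm, so what I write here may not be entirely accurate. This is how I understand character sets in Java: at run time, every String object is stored with Unicode as its internal code. (I assume every language has some internal code, since a computer always represents strings internally in one; but in most languages that code is platform-dependent, whereas Java uses platform-independent Unicode.)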

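  When Java reads a string from a byte stream, it converts the platform-dependent bytes into a platform-independent Unicode string. On output, Java converts the Unicode string back into a platform-dependent byte stream. If a Unicode character does not exist in the platform's character set, a '?' is output in its place. For example, on Chinese Windows, Java reads a GB2312-encoded file (or any other stream) into memory to build a String object, converting the GB2312 text into a Unicode string; when that string is output, it is converted back into a GB2312 byte stream or array: "中文测试" -----> "\u4e2d\u6587\u6d4b\u8bd5" -----> "中文测试".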

Consider the following routine:

    // "中文测试" in GBK encoding
    byte[] bytes = new byte[]{(byte)0xd6, (byte)0xd0, (byte)0xce, (byte)0xc4,
                              (byte)0xb2, (byte)0xe2, (byte)0xca, (byte)0xd4};
    java.io.ByteArrayInputStream bin = new java.io.ByteArrayInputStream(bytes);
    // Decode the bytes as GBK while reading
    java.io.BufferedReader reader = new java.io.BufferedReader(
            new java.io.InputStreamReader(bin, "GBK"));
    String msg = reader.readLine();
    System.out.println(msg);

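  This program prints "中文测试" correctly on a system whose character set contains those four characters (a Chinese system, for example). The msg string holds the correct Unicode encoding of "中文测试": "\u4e2d\u6587\u6d4b\u8bd5". When it is printed, it is converted to the operating system's default character set, so whether it displays correctly depends on that character set. Only a system that supports the corresponding character set outputs the message correctly; on any other system we get garbage.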
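The '?' substitution described above can be reproduced without changing the operating system's locale. The following is a minimal sketch, not part of the original article: the class name UnmappableDemo and the helper roundTrip are made up for illustration, and US-ASCII stands in for "a platform character set that lacks the Chinese characters".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {
    // Encode the text to bytes in the given charset, then decode those bytes again.
    // (Hypothetical helper, used only for this illustration.)
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String msg = "\u4e2d\u6587\u6d4b\u8bd5"; // "中文测试"

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(msg, StandardCharsets.UTF_8).equals(msg)); // true

        // US-ASCII cannot represent these characters; String.getBytes
        // replaces each one with the charset's replacement byte, '?'.
        System.out.println(roundTrip(msg, StandardCharsets.US_ASCII)); // ????
    }
}
```

The string in memory is never damaged; the loss happens only at the point where it is encoded into a charset that cannot hold it.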

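  Now let's look at the multi-language problem in Servlet/JSP. Our goal is that a client can send information to the server through a form, the server stores it in a database, and the client sees exactly what it sent when the data is retrieved later. In effect, we must make sure that the SQL statement finally held on the server contains the correct Unicode encoding of the text the client sent, and that the encoding JDBC uses to talk to the database can represent that text. In fact, it is best for JDBC to communicate with the database directly in Unicode/UTF8, which guarantees that no information is lost. The server can likewise use Unicode/UTF8 when it sends information to the client.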

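  If the form's enctype attribute is not specified, the form URL-encodes the input using the character set of the current page, and the server receives the URL-encoded string. The resulting string therefore depends on the page encoding: submitting "中文测试" from a GB2312-encoded page yields "%D6%D0%CE%C4%B2%E2%CA%D4", where each "%" is followed by a two-digit hexadecimal byte value, while a UTF8-encoded page yields "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95", because a Chinese character takes 16 bits in GB2312 but 24 bits in UTF8. The Chinese, Japanese and Korean versions of IE 4 and later all support UTF8, which necessarily covers all three languages, so if our HTML pages use UTF8 we can support at least these three languages.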
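To see how the submitted string depends on the page encoding, the two results quoted above can be regenerated with java.net.URLEncoder. This is only an approximation of what a browser does, and the class name FormEncodingDemo and its helper urlEncode are invented for the example. The GBK line is guarded because that charset is an optional part of the Java runtime.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {
    // URL-encode text roughly as a browser would for a page in the given charset.
    // (Hypothetical helper; the checked exception is wrapped for convenience.)
    public static String urlEncode(String text, String pageCharset) {
        try {
            return URLEncoder.encode(text, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u6d4b\u8bd5"; // "中文测试"

        // Three bytes per character in UTF-8: 12 bytes, 12 "%XX" escapes.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 12
        System.out.println(urlEncode(text, "UTF-8"));
        // %E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95

        // Two bytes per character in GBK, if this runtime ships that charset.
        if (Charset.isSupported("GBK")) {
            System.out.println(urlEncode(text, "GBK"));
        }
    }
}
```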

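  However, if our HTML/JSP pages use UTF8, the application server may not know it. If the information the browser sends carries no charset, the most the server can do is read the Accept-Language request header, and the encoding the browser used cannot be determined from that header alone, so the application server cannot parse the submitted content correctly. Why? All strings in Java are 16-bit Unicode, and the job of HttpServletRequest.getParameter(String) is to turn the URL-encoded information submitted by the client into a Unicode string. Some servers can only assume that the client's encoding is the same as the server platform's and simply decode with URLDecoder.decode(String). If the client's encoding matches the server's, the correct string is obtained; otherwise, if the submitted string contains non-ASCII characters, the result is garbage.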
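The failure described above, and the way out of it, can be shown in a few lines. This sketch is an illustration rather than the article's code: the class WrongCharsetDemo and its two helpers are hypothetical, and ISO-8859-1 plays the part of "the server's assumed encoding". Because ISO-8859-1 maps every byte to exactly one character, a string decoded with it by mistake still contains all the original bytes and can be repaired.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Decode a URL-encoded string, assuming the given charset for the bytes.
    public static String decodeAs(String urlEncoded, String charset) {
        try {
            return URLDecoder.decode(urlEncoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    // Recover UTF-8 text that was wrongly decoded as ISO-8859-1:
    // turn the chars back into the original bytes, then decode them as UTF-8.
    public static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String submitted = "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95";
        String expected = "\u4e2d\u6587\u6d4b\u8bd5"; // "中文测试"

        String wrong = decodeAs(submitted, "ISO-8859-1");
        System.out.println(wrong.equals(expected));          // false
        System.out.println(wrong.length());                  // 12
        System.out.println(repair(wrong).equals(expected));  // true
        System.out.println(decodeAs(submitted, "UTF-8").equals(expected)); // true
    }
}
```

The repair step only works when the mistaken charset is lossless for arbitrary bytes, as ISO-8859-1 is; a wrong decode through a multi-byte charset may already have discarded information.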

  In the solution I propose, the encoding is fixed as UTF8, so we can avoid this problem by writing our own decode method:

    public static String decode(String s, String encoding) throws Exception {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '+':
                    // '+' stands for a space in URL-encoded form data
                    sb.append(' ');
                    break;
                case '%':
                    // "%XX" is one byte in hexadecimal; keep it as a char in the range 0-255
                    try {
                        sb.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    } catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
                    i += 2;
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }
        // Undo conversion to external encoding
        String result = sb.toString();
        byte[] inputBytes = result.getBytes("8859_1");
        return new String(inputBytes, encoding);
    }

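  The encoding can be passed in, so specifying UTF8 meets our needs. For example, decoding "%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95" with it yields the correct Unicode string for "中文测试". The remaining problem is that we must get hold of the URL-encoded string the client submitted. For a form submitted with method GET, it can be read with HttpServletRequest.getQueryString(); for a form submitted with method POST, it can only be read from the ServletInputStream. In fact, the first call to the standard getParameter method consumes the submitted form data, and the ServletInputStream cannot be read a second time. We therefore have to read and parse the form data ourselves before getParameter is used for the first time.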
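To make the decoding step concrete, here is a self-contained variant that can be run on its own. It is a sketch under my own assumptions, not the article's method: instead of building an intermediate ISO-8859-1 string, it collects the raw bytes in a ByteArrayOutputStream and decodes them once at the end. The class name PercentDecoder is hypothetical, and rejecting unescaped non-ASCII characters is a choice made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PercentDecoder {
    // Decode a URL-encoded string whose escaped bytes are in the given charset.
    public static String decode(String s, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                bytes.write(' ');                       // '+' encodes a space
            } else if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("Incomplete escape at index " + i);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape at index " + i);
                }
                bytes.write((hi << 4) | lo);            // one raw byte per "%XX"
                i += 2;
            } else if (c <= 0x7F) {
                bytes.write(c);                         // plain ASCII passes through
            } else {
                throw new IllegalArgumentException("Unescaped non-ASCII character at index " + i);
            }
        }
        return new String(bytes.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String decoded = decode("%E4%B8%AD%E6%96%87%E6%B5%8B%E8%AF%95", StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u4e2d\u6587\u6d4b\u8bd5")); // true
        System.out.println(decode("a+b%21", StandardCharsets.UTF_8));  // a b!
    }
}
```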

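  That is what I did: I created a Servlet base class that overrides the service method and reads and parses the submitted form content before calling the parent class's service method. See the source code below: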

    package com.hto.servlet;

    import javax.servlet.http.HttpServletRequest;
    import java.util.*;

    /**
     * Reads and stores URL-encoded form parameters, decoding them as UTF8
     * (or as an explicitly given encoding).
     * Creation date: (2001-2-4 15:43:46)
     * @author: 钱卫春
     */
    public class UTF8ParameterReader {
        // Maps each parameter name to an ArrayList of its values
        Hashtable pairs = new Hashtable();

        /** Parses the query string and the first line of the body as UTF8. */
        public UTF8ParameterReader(HttpServletRequest request) throws java.io.IOException {
            super();
            parse(request.getQueryString());
            parse(request.getReader().readLine());
        }

        /** Parses the query string and the first line of the body in the given encoding. */
        public UTF8ParameterReader(HttpServletRequest request, String encoding) throws java.io.IOException {
            super();
            parse(request.getQueryString(), encoding);
            parse(request.getReader().readLine(), encoding);
        }

        /** Decodes a URL-encoded string as UTF8. */
        public static String decode(String s) throws Exception {
            return decode(s, "UTF8");
        }

        /** Decodes a URL-encoded string whose escaped bytes are in the given encoding. */
        public static String decode(String s, String encoding) throws Exception {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '+':
                        sb.append(' ');
                        break;
                    case '%':
                        try {
                            sb.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                        } catch (NumberFormatException e) {
                            throw new IllegalArgumentException();
                        }
                        i += 2;
                        break;
                    default:
                        sb.append(c);
                        break;
                }
            }
            // Undo conversion to external encoding
            String result = sb.toString();
            byte[] inputBytes = result.getBytes("8859_1");
            return new String(inputBytes, encoding);
        }

        /** Returns the first value of the named parameter, or null if it is absent. */
        public String getParameter(String name) {
            if (pairs == null || !pairs.containsKey(name)) return null;
            return (String) (((ArrayList) pairs.get(name)).get(0));
        }

        /** Returns the names of all parameters that were parsed. */
        public Enumeration getParameterNames() {
            if (pairs == null) return null;
            return pairs.keys();
        }

        /** Returns every value of the named parameter, or null if it is absent. */
        public String[] getParameterValues(String name) {
            if (pairs == null || !pairs.containsKey(name)) return null;
            ArrayList al = (ArrayList) pairs.get(name);
            String[] values = new String[al.size()];
            for (int i = 0; i < values.length; i++) values[i] = (String) al.get(i);
            return values;
        }

        /** Parses a URL-encoded string of name=value pairs as UTF8. */
        private void parse(String urlenc) throws java.io.IOException {
            parse(urlenc, "UTF8");
        }

        /** Parses a URL-encoded string of name=value pairs in the given encoding. */
        private void parse(String urlenc, String encoding) throws java.io.IOException {
            if (urlenc == null) return;
            StringTokenizer tok = new StringTokenizer(urlenc, "&");
            try {
                while (tok.hasMoreTokens()) {
                    String aPair = tok.nextToken();
                    int pos = aPair.indexOf("=");
                    String name = null;
                    String value = null;
                    if (pos != -1) {
                        name = decode(aPair.substring(0, pos), encoding);
                        value = decode(aPair.substring(pos + 1), encoding);
                    } else {
                        name = aPair;
                        value = "";
                    }
                    if (pairs.containsKey(name)) {
                        ArrayList values = (ArrayList) pairs.get(name);
                        values.add(value);
                    } else {
                        ArrayList values = new ArrayList();
                        values.add(value);
                        pairs.put(name, values);
                    }
                }
            } catch (Exception e) {
                throw new java.io.IOException(e.getMessage());
            }
        }
    }

The job of this class is to read and store the information submitted by the form and to provide the commonly used getParameter methods.

    package com.hto.servlet;

    import java.io.*;
    import javax.servlet.*;
    import javax.servlet.http.*;

    /**
     * Base servlet that parses form parameters as UTF8 before dispatching the request.
     * Creation date: (2001-2-5 8:28:20)
     * @author: 钱卫春
     */
    public class UtfBaseServlet extends HttpServlet {
        public static final String PARAMS_ATTR_NAME = "PARAMS_ATTR_NAME";

        /** Process incoming HTTP GET requests. */
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            performTask(request, response);
        }

        /** Process incoming HTTP POST requests. */
        public void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            performTask(request, response);
        }

        /** Returns the named parameter parsed as a java.sql.Date, falling back to defValue. */
        public static java.sql.Date getDateParameter(HttpServletRequest request, String name,
                boolean required, java.sql.Date defValue) throws ServletException {
            String value = getParameter(request, name, required, String.valueOf(defValue));
            return java.sql.Date.valueOf(value);
        }

        /** Returns the named parameter parsed as a double, falling back to defValue. */
        public static double getDoubleParameter(HttpServletRequest request, String name,
                boolean required, double defValue) throws ServletException {
            String value = getParameter(request, name, required, String.valueOf(defValue));
            return Double.parseDouble(value);
        }

        /** Returns the named parameter parsed as a float, falling back to defValue. */
        public static float getFloatParameter(HttpServletRequest request, String name,
                boolean required, float defValue) throws ServletException {
            String value = getParameter(request, name, required, String.valueOf(defValue));
            return Float.parseFloat(value);
        }

        /** Returns the named parameter parsed as an int, falling back to defValue. */
        public static int getIntParameter(HttpServletRequest request, String name,
                boolean required, int defValue) throws ServletException {
            String value = getParameter(request, name, required, String.valueOf(defValue));
            return Integer.parseInt(value);
        }

        /**
         * Returns the named parameter from the UTF8ParameterReader stored in the request,
         * or from the standard getParameter if no reader is present. Throws if the
         * parameter is required but missing; otherwise returns defValue.
         */
        public static String getParameter(HttpServletRequest request, String name,
                boolean required, String defValue) throws ServletException {
            if (request.getAttribute(UtfBaseServlet.PARAMS_ATTR_NAME) != null) {
                UTF8ParameterReader params =
                        (UTF8ParameterReader) request.getAttribute(UtfBaseServlet.PARAMS_ATTR_NAME);
                if (params.getParameter(name) != null) return params.getParameter(name);
                if (required) throw new ServletException("The Parameter " + name + " Required but not provided!");
                else return defValue;
            } else {
                if (request.getParameter(name) != null) return request.getParameter(name);
                if (required) throw new ServletException("The Parameter " + name + " Required but not provided!");
                else return defValue;
            }
        }

        /** Returns the servlet info string. */
        public String getServletInfo() {
            return super.getServletInfo();
        }

        /** Returns the named parameter parsed as a java.sql.Timestamp, falling back to defValue. */
        public static java.sql.Timestamp getTimestampParameter(HttpServletRequest request, String name,
                boolean required, java.sql.Timestamp defValue) throws ServletException {
            String value = getParameter(request, name, required, String.valueOf(defValue));
            return java.sql.Timestamp.valueOf(value);
        }

        /** Initializes the servlet. */
        public void init() {
            // insert code to initialize the servlet here
        }

        /** Process incoming requests for information. */
        public void performTask(HttpServletRequest request, HttpServletResponse response) {
            try {
                // Insert user code from here.
            } catch (Throwable theException) {
                // uncomment the following line when unexpected exceptions
                // are occurring to aid in debugging the problem.
                // theException.printStackTrace();
            }
        }

        /**
         * Parses URL-encoded form data into a UTF8ParameterReader and stores it as a
         * request attribute before handing the request to the parent class.
         */
        public void service(ServletRequest request, ServletResponse response)
                throws javax.servlet.ServletException, java.io.IOException {
            String content = request.getContentType();
            if (content == null || content != null && content.toLowerCase().startsWith("application/x-www-form-urlencoded"))
                request.setAttribute(PARAMS_ATTR_NAME, new UTF8ParameterReader((HttpServletRequest) request));
            super.service(request, response);
        }
    }

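  This is the Servlet base class; it overrides the parent class's service method. Before calling the parent's service, it creates a UTF8ParameterReader object that holds the information submitted in the form and stores that object in the request as an attribute. It then calls the parent class's service method as usual.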

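  For Servlets that inherit from this class, note that the "standard" getParameter can no longer read POSTed data, because this class has already read it from the ServletInputStream. Use the getParameter methods provided by this class instead.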
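The parsing that the parameter reader performs can be tried outside a servlet container. The sketch below is a condensed stand-in written for this purpose, not the class from the article: FormParser is a hypothetical name, it uses java.net.URLDecoder with an explicit charset instead of the hand-written decode method, and it takes the raw string directly instead of an HttpServletRequest.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormParser {
    // Split "a=1&b=2&b=3" into a map from name to all of its values,
    // decoding names and values with the given charset.
    public static Map<String, List<String>> parse(String urlEncoded, String charset) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (urlEncoded == null || urlEncoded.isEmpty()) {
            return params;
        }
        for (String pair : urlEncoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int pos = pair.indexOf('=');
            String name = pos >= 0 ? pair.substring(0, pos) : pair;
            String value = pos >= 0 ? pair.substring(pos + 1) : "";
            params.computeIfAbsent(decode(name, charset), k -> new ArrayList<>())
                  .add(decode(value, charset));
        }
        return params;
    }

    private static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> p =
                parse("name=%E4%B8%AD%E6%96%87&tag=a&tag=b+c", "UTF-8");
        System.out.println(p.get("name").get(0).equals("\u4e2d\u6587")); // true
        System.out.println(p.get("tag"));                               // [a, b c]
    }
}
```

Keeping a list per name mirrors the way repeated fields such as checkboxes arrive: one name, several values.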

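  What remains is the output problem. We need to convert the output into a UTF8 byte stream. As long as we specify the charset as UTF8 when setting the Content-Type and then write through the PrintWriter, the conversion happens automatically. In a Servlet, set it like this: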

  response.setContentType("text/html;charset=UTF8");

In a JSP, set it like this:

  <%@ page contentType="text/html;charset=UTF8"%>

  This guarantees that the output is a UTF8 stream; whether the client can display it depends on the client.
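What the container does once the charset is set can be imitated with plain java.io classes. This is a rough sketch under stated assumptions: Utf8OutputDemo and render are hypothetical names, and a ByteArrayOutputStream stands in for the response's output stream. The point is that the PrintWriter accepts Unicode strings and the wrapped OutputStreamWriter turns them into UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8OutputDemo {
    // Write text through a PrintWriter that encodes as UTF-8 and
    // return the bytes that would go out on the wire.
    public static byte[] render(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.print(text);
        writer.flush(); // push buffered characters through the encoder
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] wire = render("\u4e2d\u6587\u6d4b\u8bd5"); // "中文测试"
        System.out.println(wire.length);                  // 12
        System.out.printf("%02X %02X %02X%n",
                wire[0] & 0xFF, wire[1] & 0xFF, wire[2] & 0xFF); // E4 B8 AD
    }
}
```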

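  I have also written a class that handles content submitted by multipart/form-data forms. The page charset can be specified in its constructor and defaults to UTF-8; for reasons of length, its source code is not published here. If you are interested, you can mail to: [email protected] to discuss it with me.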