LeetCode 393. UTF-8 Validation

Original problem link: https://leetcode.com/problems/utf-8-validation/

Problem:

A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

  1. For a 1-byte character, the first bit is 0, followed by its Unicode code.
  2. For an n-byte character, the first n bits are all ones, the (n+1)-th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.

This is how the UTF-8 encoding would work:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Given an array of integers representing the data, return whether it is a valid utf-8 encoding.

Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.

Example 1:

data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true.
It is a valid UTF-8 encoding of a 2-byte character followed by a 1-byte character.

Example 2:

data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100.

Return false.
The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
The next byte is a continuation byte which starts with 10, and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

Solution:

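Keep a counter, preCount, of how many continuation bytes are still expected for the current multi-byte character. For each byte, first check whether preCount is greater than 0.

If preCount is 0, the byte starts a new character, and there are 2 cases:

First, the highest bit is 0, so it is a 1-byte character. Skip it.

Second, the byte is the leading byte of a multi-byte character. Count its leading 1 bits. A count of 1 (a continuation byte with no leading byte) or more than 4 is invalid. Otherwise assign count - 1, the number of bytes that must follow, to preCount.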
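Counting the leading 1 bits of a byte can be sketched as a small helper. This is only an illustration of the idea; the class name LeadingOnesSketch and the method name countLeadingOnes are made up here and do not appear in the solution.

```java
class LeadingOnesSketch {
    // Hypothetical helper: counts consecutive 1 bits starting at bit 7
    // of the low byte of num.
    static int countLeadingOnes(int num) {
        int count = 0;
        int mask = 1 << 7;
        while (mask != 0 && (num & mask) != 0) {
            count++;
            mask >>= 1; // move to the next lower bit
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLeadingOnes(197)); // 11000101 -> 2
        System.out.println(countLeadingOnes(235)); // 11101011 -> 3
        System.out.println(countLeadingOnes(1));   // 00000001 -> 0
    }
}
```

A result of 2, 3, or 4 marks a leading byte followed by 1, 2, or 3 continuation bytes. A result of 0 is a 1-byte character.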

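If preCount is greater than 0, the current byte must be a continuation byte, so check that it starts with 10 and then decrement preCount. After the last byte, the data is valid only if preCount is back to 0.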

Note: when checking the leading 1, use (num & (1 << 7)) != 0, not == 1, because the masked value is 10000000 (128), not 1.
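The difference between the two comparisons can be seen in a short sketch. The class and method names below are hypothetical and are used only for this illustration.

```java
class HighBitCheckSketch {
    // Correct test: the masked value is either 0 or 128.
    static boolean highBitSet(int num) {
        return (num & (1 << 7)) != 0;
    }

    // Incorrect test, shown for contrast: num & 128 can never equal 1.
    static boolean highBitSetWrong(int num) {
        return (num & (1 << 7)) == 1;
    }

    public static void main(String[] args) {
        int num = 197; // 11000101
        System.out.println(num & (1 << 7));       // 128
        System.out.println(highBitSet(num));      // true
        System.out.println(highBitSetWrong(num)); // false
    }
}
```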

Time Complexity: O(n). n = data.length.

Space: O(1).

Accepted Java solution:

class Solution {
    public boolean validUtf8(int[] data) {
        if(data == null || data.length == 0){
            return true;
        }

        // Number of continuation bytes still expected for the current character.
        int preCount = 0;
        int mask1 = 1 << 7; // 10000000
        int mask2 = 1 << 6; // 01000000
        for(int num : data){
            if(preCount == 0){
                // 1-byte character: highest bit is 0.
                if((num & mask1) == 0){
                    continue;
                }

                // Count the leading 1 bits of the leading byte.
                int count = 0;
                int mask = 1 << 7;
                while((num & mask) != 0 && count <= 5){
                    count++;
                    mask = mask >> 1;
                }

                // 1 leading one is a stray continuation byte; more than 4 is too long.
                if(count == 1 || count > 4){
                    return false;
                }

                preCount = count - 1;
            }else{
                // A continuation byte must start with 10.
                if(!((num & mask1) != 0 && (num & mask2) == 0)){
                    return false;
                }

                preCount--;
            }
        }

        // Valid only if no continuation bytes are still missing.
        return preCount == 0;
    }
}
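To try the two examples from the problem statement, here is a compact self-contained sketch. It is an alternative formulation that matches each leading byte against fixed hex masks instead of counting bits in a loop. It is not the code above, and the class name Utf8ValidationDemo is made up for this illustration. Like the problem's rules, it checks only byte patterns, not overlong encodings or code point ranges.

```java
class Utf8ValidationDemo {
    static boolean validUtf8(int[] data) {
        int remaining = 0; // continuation bytes still expected
        for (int num : data) {
            int b = num & 0xFF; // only the low 8 bits carry data
            if (remaining == 0) {
                if ((b & 0x80) == 0x00) {
                    continue;          // 0xxxxxxx: 1-byte character
                } else if ((b & 0xE0) == 0xC0) {
                    remaining = 1;     // 110xxxxx
                } else if ((b & 0xF0) == 0xE0) {
                    remaining = 2;     // 1110xxxx
                } else if ((b & 0xF8) == 0xF0) {
                    remaining = 3;     // 11110xxx
                } else {
                    return false;      // 10xxxxxx or 11111xxx as a leading byte
                }
            } else {
                if ((b & 0xC0) != 0x80) {
                    return false;      // continuation byte must be 10xxxxxx
                }
                remaining--;
            }
        }
        return remaining == 0;
    }

    public static void main(String[] args) {
        System.out.println(validUtf8(new int[]{197, 130, 1})); // true
        System.out.println(validUtf8(new int[]{235, 140, 4})); // false
    }
}
```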

Original article: https://www.cnblogs.com/Dylan-Java-NYC/p/12154530.html

Time: 2024-10-12 00:36:55
