Leetcode: UTF-8 Validation

A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For 1-byte character, the first bit is a 0, followed by its unicode code.
For n-bytes character, the first n-bits are all one‘s, the n+1 bit is 0, followed by n-1 bytes with most significant 2 bits being 10.
This is how the UTF-8 encoding would work:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Given an array of integers representing the data, return whether it is a valid utf-8 encoding.

Note:
The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.

Example 1:

data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.
Example 2:

data = [235, 140, 4], which represented the octet sequence: 11101011 10001100 00000100.

Return false.
The first 3 bits are all one‘s and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that‘s correct.
But the second continuation byte does not start with 10, so it is invalid.

这道题题干给出了判断 one single UTF-8 char的方法,然后给一个UTF-8 char sequence,判断是不是正确sequence. (读题读了很久)

这道题关键是要学到用 & 取出一个bit sequence当中几位的方法

二进制数表示法:在前面加 0b, 八进制加0o, 十六进制加0x

 1 public class Solution {
 2     public boolean validUtf8(int[] data) {
 3         if (data==null || data.length==0) return false;
 4         for (int i=0; i<data.length; i++) {
 5             if (data[i] > 255) return false;
 6             int moreChecks = 0; //moreCheck is the number of more bytes that need to check for this char
 7             if ((data[i] & 0b10000000) == 0) moreChecks = 0;
 8             else if ((data[i] & 0b11100000) == 0b11000000) moreChecks = 1;
 9             else if ((data[i] & 0b11110000) == 0b11100000) moreChecks = 2;
10             else if ((data[i] & 0b11111000) == 0b11110000) moreChecks = 3;
11             else return false;
12             for (int j=1; j<=moreChecks; j++) {
13                 if (i+j >= data.length) return false;
14                 if ((data[i+j] & 0b11000000) != 0b10000000) return false;
15             }
16             i = i + moreChecks;
17         }
18         return true;
19     }
20 }
时间: 2024-10-19 15:16:23

Leetcode: UTF-8 Validation的相关文章

LeetCode 393. UTF-8 Validation

原题链接在这里:https://leetcode.com/problems/utf-8-validation/ 题目: A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte character, the first bit is a 0, followed by its unicode code. For n-bytes character, the firs

[LeetCode] UTF-8 Validation 编码验证

A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules: For 1-byte character, the first bit is a 0, followed by its unicode code. For n-bytes character, the first n-bits are all one's, the n+1 bit is 0, followed by n-1 by

LeetCode 66. Plus One(加1)

Given a non-negative integer represented as a non-empty array of digits, plus one to the integer. You may assume the integer do not contain any leading zero, except the number 0 itself. The digits are stored such that the most significant digit is at

LeetCode 53. Maximum Subarray(最大的子数组)

Find the contiguous subarray within an array (containing at least one number) which has the largest sum. For example, given the array [-2,1,-3,4,-1,2,1,-5,4],the contiguous subarray [4,-1,2,1] has the largest sum = 6. click to show more practice. Mor

[LeetCode] 349 Intersection of Two Arrays &amp; 350 Intersection of Two Arrays II

这两道题都是求两个数组之间的重复元素,因此把它们放在一起. 原题地址: 349 Intersection of Two Arrays :https://leetcode.com/problems/intersection-of-two-arrays/description/ 350 Intersection of Two Arrays II:https://leetcode.com/problems/intersection-of-two-arrays-ii/description/ 题目&解法

LeetCode 442. Find All Duplicates in an Array (在数组中找到所有的重复项)

Given an array of integers, 1 ≤ a[i] ≤ n (n = size of array), some elements appear twice and others appear once. Find all the elements that appear twice in this array. Could you do it without extra space and in O(n) runtime? Example: Input: [4,3,2,7,

LeetCode OJ - Sum Root to Leaf Numbers

这道题也很简单,只要把二叉树按照宽度优先的策略遍历一遍,就可以解决问题,采用递归方法越是简单. 下面是AC代码: 1 /** 2 * Sum Root to Leaf Numbers 3 * 采用递归的方法,宽度遍历 4 */ 5 int result=0; 6 public int sumNumbers(TreeNode root){ 7 8 bFSearch(root,0); 9 return result; 10 } 11 private void bFSearch(TreeNode ro

LeetCode OJ - Longest Consecutive Sequence

这道题中要求时间复杂度为O(n),首先我们可以知道的是,如果先对数组排序再计算其最长连续序列的时间复杂度是O(nlogn),所以不能用排序的方法.我一开始想是不是应该用动态规划来解,发现其并不符合动态规划的特征.最后采用类似于LRU_Cache中出现的数据结构(集快速查询和顺序遍历两大优点于一身)来解决问题.具体来说其数据结构是HashMap<Integer,LNode>,key是数组中的元素,所有连续的元素可以通过LNode的next指针相连起来. 总体思路是,顺序遍历输入的数组元素,对每个

LeetCode OJ - Surrounded Regions

我觉得这道题和传统的用动规或者贪心等算法的题目不同.按照题目的意思,就是将被'X'围绕的'O'区域找出来,然后覆盖成'X'. 那问题就变成两个子问题: 1. 找到'O'区域,可能有多个区域,每个区域'O'都是相连的: 2. 判断'O'区域是否是被'X'包围. 我采用树的宽度遍历的方法,找到每一个'O'区域,并为每个区域设置一个value值,为0或者1,1表示是被'X'包围,0则表示不是.是否被'X'包围就是看'O'区域的边界是否是在2D数组的边界上. 下面是具体的AC代码: class Boar