How to code a URL shortener?

I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to “http://www.example.org/abcdef“. Instead of “abcdef” there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.

My approach:

I have a database table with three columns:

  1. id, integer, auto-increment
  2. long, string, the long URL the user entered
  3. short, string, the shortened URL (or just the six characters)

I would then insert the long URL into the table. Then I would select the auto-increment value for “id” and build a hash of it. This hash should then be inserted as “short“. But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don’t use these algorithms, I think. A self-built algorithm will work, too.

My idea:

For “http://www.google.de/” I get the auto-increment id 239472. Then I do the following steps:

short = ‘‘;
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.

That could be repeated until the number isn’t divisible any more. Do you think this is a good approach? Do you have a better idea?

Edit: Due to the ongoing interest in this topic, I’ve uploaded the code that I used to GitHub, with implementations for JavaPHP and JavaScript. Add your solutions if you like :)

I would continue your “convert number to string” approach. However you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.

Theoretical background

You need a Bijective Function f. This is necessary so that you can find a inverse functiong(‘abc’) = 123 for your f(123) = ‘abc’ function. This means:

  • There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
  • and for every y you must be able to find an x so that f(x) = y.

How to convert the ID to a shortened URL

  1. Think of an alphabet we want to use. In your case that’s [a-zA-Z0-9]. It contains 62 letters.
  2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).

    For this example I will use 12510 (125 with a base of 10).
  3. Now you have to convert 12510 to X62 (base 62).

    12510 = 2×621 + 1×620 = [2,1]

    This requires use of integer division and modulo. A pseudo-code example:

    digits = []
    
    while num > 0
      remainder = modulo(num, 62)
      digits.push(remainder)
      num = divide(num, 62)
    
    digits = digits.reverse
    

    Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:

    0  → a
    1  → b
    ...
    25 → z
    ...
    52 → 0
    61 → 9
    

    With 2 → c and 1 → b you will receive cb62 as the shortened URL.

    http://shor.ty/cb
    

How to resolve a shortened URL to the initial ID

The reverse is even easier. You just do a reverse lookup in your alphabet.

  1. e9a62 will be resolved to “4th, 61st, and 0th letter in alphabet”.

    e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810

  2. Now find your database-record with WHERE id = 19158 and do the redirect.

Some implementations (provided by commenters)

时间: 2024-07-31 13:04:42

How to code a URL shortener?的相关文章

SharePoint 2010 Url Shortener --SharePoint 2010 短URL生成器

SharePoint 2010 Url Shortener --SharePoint 2010 短URL生成器 项目描写叙述 本项目加入了这种功能.在SP站点中能够生成短URLs. 这些URLs指向列表或文档. 比如http://smallville-pc/url/nnefhmo. 本项目的目的是同意用户创建短URLs.指向文档或列表/库,这样能够轻松分享. wsp下载地址(免积分) SharePoint 2010 短URL生成器 部署方法 參照部署.收回和删除解决方式----STSADM和Po

In search of the perfect URL validation regex

To clarify, I’m looking for a decent regular expression to validate URLs that were entered as user input with. I have no interest in parsing a list of URLs from a given string of text (even though some of the regexes on this page are capable of doing

PHP实现URL长连接转短连接方法总结

短链接,通俗来说,就是将长的URL 网址,通过程序计算等方式,转换为简短的网址字符串. 这样的话其好处为:1.内容需要:2.用户友好:3.便于管理. 实现短网址(short URL)系统比较流行的算法有两种 自增序列算法. 摘要算法 自增序列算法: 自增序列算法 也叫永不重复算法 设置 id 自增,一个 10进制 id 对应一个 62进制的数值,1对1,也就不会出现重复的情况.这个利用的就是低进制转化为高进制时,字符数会减少的特性. 摘要算法: 1.将长网址 md5 生成 32 位签名串,分为

Design Tiny URL

Part 1: 前言: 最近看了一些关于短址(short URL)方面的一些博客,有些博客说到一些好的东西,但是,也不是很全,所以,这篇博客算是对其它博客的一个总结吧. 介绍: 短址,顾名思义,就是把长的 URL 转成短的 URL, 现在提供这种服务的有很多公司,我们以google家的 URL shortener 服务: http://goo.gl/ 为例. 首先我们到 http://goo.gl/,然后把本文博客的地址http://blog.csdn.net/beiyeqingteng 输入进

去除magento多店铺URL地址中的“___from_store=”

magento 的多店铺功能,大多数情况下是根据语言来进行选择的,当添加了多店铺之后,一般情况下我们会选择开启添加store code到url地址中. Magento 自带的这种功能算是比较不错了,但是 magento的多店铺功能,大多数情况下是根据语言来进行选择的,当添加了多店铺之后,一般情况下我们会选择开启添加store code到url地址中.Magento自带的这种功能算是比较不错了,但是有个问题非常头疼.在切换不同店铺的时候,URL地址中会包含“___from_store=”的字符串.

微信公众号开发之网页授权登录及code been used 解决!

首先微信公众号开发网页授权登录使用环境: 开发工具:eclipse:服务器:tomcat8,开发语言:JAVA. 我写的网页授权登录时用开发者模式自定义view类型按钮点击跳转链接的. 微信网页授权登录首先以官方微信开发文档为准,大体共分为4步: 先说第一步获取code: code说明:code作为换取access_token的票据,每次用户授权带上的code将不一样,code只能使用一次,5扽这未被使用自动过期. 微信公众开发文档给的有获取code的链接,建议直接复制来用,然后替换其中相应的参

JavaScript code modules

https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules Non-standardThis feature is non-standard and is not on a standards track. Do not use it on production sites facing the Web: it will not work for every user. There may also be la

Go语言(golang)开源项目大全

转http://www.open-open.com/lib/view/open1396063913278.html内容目录Astronomy构建工具缓存云计算命令行选项解析器命令行工具压缩配置文件解析器控制台用户界面加密数据处理数据结构数据库和存储开发工具分布式/网格计算文档编辑器Encodings and Character SetsGamesGISGo ImplementationsGraphics and AudioGUIs and Widget ToolkitsHardwareLangu

如何用Google APIs和Google的应用系统进行集成(2)----发现Google APIs的RESTFul服务

上篇文章,我提到了,Google APIs暴露了86种不同种类和版本的API.我们可以通过在浏览器里面输入https://www.googleapis.com/discovery/v1/apis这个URL地址,其将会把所有Google API支持的不同种类和版本的API全部列出来.其具体信息如下: 序号 API 标题 名字 版本 RestFul请求的URL RestFul请求的URL 1 Ad Exchange Buyer API adexchangebuyer v1 https://www.g