INTERVIEW: YARCDATA ON BIG DATA AND SEMANTIC WEB

11 August 2013 by Ian Jacobs | Posted in: Interviews, Semantic Web

I recently spoke with Shoaib Mufti, YarcData Vice President of R&D, about Big Data and Semantic Web technology. YarcData is a Cray subsidiary, accustomed to crunching lots of data. YarcData recently joined W3C.

Ian: Why did YarcData join W3C?

SM: YarcData has products, including the Urika data appliance, for manipulating graphs. Instead of reinventing the wheel, we found benefits in existing standards. We also decided to contribute to the significant semantic effort at W3C.

Ian: How do you see Big Data relating to the Semantic Web?

SM: We think that for Big Data you need standards, and that linked data standards fit the bill. What’s exciting is the opportunity to find in data something of value that was non-obvious. For this you need tools to join terms, to reason and infer, and to query. These are the fundamentals for getting value from big data. There are open standards for these capabilities, so we have moved away from proprietary solutions.

Ian: Let’s start with a story.

SM: One of our customers is in the financial sector. They are very interested in understanding how changes in one part of their portfolio affect the rest of their portfolio. For example, if a company goes bankrupt, what is the downstream effect on other assets in a mutual fund? What happens to the companies that depended on the now bankrupt company? How should the financial institution reorganize its portfolio based on these events?

SM: The first challenge the financial institution faces is data integration. They want to integrate public and private data, in large amounts. Because the market moves quickly and the financial stakes are enormous, they need fast integration. They cannot afford to wait months for the results of an analysis.

SM: The second challenge relates to query. They have multiple questions to ask over 50 billion triples, and they need to reduce the cost of running those queries. This particular company identified some “forbidden queries” that would lead to a huge performance hit on their servers.

Ian: How did Semantic Web technology help?

SM: It makes them less dependent on database optimization. RDF and SPARQL are schema-less, which makes it much faster and easier to ask ad-hoc questions without the performance hit. The flexibility to do ad-hoc queries efficiently has given this company a big competitive advantage.
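To make the idea of schema-less, ad-hoc questions concrete, here is a minimal sketch in plain Python. It is not YarcData's or the customer's implementation, and it uses neither a real triple store nor SPARQL. The company names, fund names, and the `holds` and `dependsOn` predicates are all invented for illustration. It stores facts as (subject, predicate, object) triples and answers the portfolio question from the story: if one company fails, which companies are affected downstream, and which funds hold any of them?

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); every name is invented.
TRIPLES = {
    ("FundA", "holds", "AcmeCorp"),
    ("FundA", "holds", "BoltLtd"),
    ("FundB", "holds", "CogInc"),
    ("FundC", "holds", "DynaCo"),
    ("BoltLtd", "dependsOn", "AcmeCorp"),
    ("CogInc", "dependsOn", "BoltLtd"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def downstream(triples, company):
    """Companies that depend, directly or transitively, on `company`."""
    seen, queue = set(), deque([company])
    while queue:
        current = queue.popleft()
        for dependent, _, _ in match(triples, p="dependsOn", o=current):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def exposed_funds(triples, company):
    """Funds holding the failed company or anything downstream of it."""
    affected = downstream(triples, company) | {company}
    return sorted(
        {fund for fund, _, asset in match(triples, p="holds") if asset in affected}
    )


affected_companies = sorted(downstream(TRIPLES, "AcmeCorp"))
affected_funds = exposed_funds(TRIPLES, "AcmeCorp")
print(affected_companies)
print(affected_funds)
```

No table layout is declared in advance: a new kind of relationship is just another triple, and a new question is just another pattern. A production system would express the same traversal as a SPARQL query over a real store rather than a loop over an in-memory set.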

SM: The story doesn’t end there. Although their initial interest concerned portfolio optimization, the company found another use for the technology. There are legal penalties and public relations nightmares around insider trading. Insider trading is challenging to detect and can happen in many ways (such as someone providing a friend with insider information). This financial institution realized they could use Semantic Web technology to detect insider trading effectively and improve compliance.

Ian: This is the second time in recent months that people have told me about Semantic Web technology and compliance; see my interview with Paul Groth and Luc Moreau.

SM: That example of serendipity is not unique. In 2012 we held a contest — the YarcData Graph Analytics Challenge — for people to solve some Big Data graph problems. The winners, from The Institute for Systems Biology (ISB), studied drug repurposing.

Ian: What is drug repurposing?

SM: I’ll explain with an example: Viagra. Viagra was originally developed for managing heart problems. The trials revealed an interesting side effect. And so the drug was repurposed.

Ian: Yay, science!

SM: Drug companies realize that for a number of “failed” projects, there are great opportunities to repurpose the drugs. Our contest winners studied some data sets and found that a particular HIV drug could be repurposed to treat breast cancer. By querying diverse data sets from research literature and clinical trials, they were able to find a common pathway. The whole project took about six weeks, which is astonishing compared to the usual time it takes to develop a drug. What’s more, FDA approval time for repurposed drugs is much shorter than for new drugs.

Ian: How have these technologies benefited YarcData?

SM: First, in cost savings. We can use software available in the ecosystem instead of writing proprietary utilities. For instance, we used to convert relational data to graphs with a proprietary tool; now we can use something like d2r. There are many such tools, related to inference and other capabilities.

SM: The second benefit is value. RDF is simpler than our former custom format, and this has made data integration both simpler and faster for us. One of our engineers ran a data integration project using other techniques; the integration required several months. With RDF it took a week. And we can more easily reuse existing data sets.
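As a rough illustration of why converting relational data to a graph can simplify integration, the sketch below maps rows to triples and merges two sources with a set union. It does not show how d2r or YarcData's tooling works. The table names, columns, and the `company/<id>` subject naming scheme are assumptions made up for this example.

```python
def rows_to_triples(entity, rows, key):
    """Map each row to (subject, predicate, object) triples.

    The subject is "<entity>/<key value>" and every other non-null column
    becomes a predicate. This naming scheme is an illustrative assumption.
    """
    triples = set()
    for row in rows:
        subject = f"{entity}/{row[key]}"
        for column, value in row.items():
            if column != key and value is not None:
                triples.add((subject, column, value))
    return triples


# Two hypothetical sources with different columns and different key names.
companies = [
    {"id": "acme", "name": "Acme Corp", "sector": "Energy"},
    {"id": "bolt", "name": "Bolt Ltd", "sector": None},
]
ratings = [
    {"company_id": "acme", "rating": "BB"},
]

# Integration is a set union: facts about the same subject simply accumulate.
graph = rows_to_triples("company", companies, "id") | rows_to_triples(
    "company", ratings, "company_id"
)

for triple in sorted(graph):
    print(triple)
```

Because both sources resolve to the same subject identifier, no shared schema has to be designed before the data can be combined. Missing values produce no triple at all instead of a null column.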

SM: I think many organizations face similar data integration challenges. In the enterprise, people use a bunch of heterogeneous systems: email, plain text, unstructured data, and structured data. Managing all of the data is a big challenge for any organization.

Ian: Do any of your customers choose to migrate to RDF after you’ve worked with them?

SM: Absolutely. People use our appliance on premises since much of their data is sensitive. We work with them to convert their data to RDF, which we feed to our appliance. They see firsthand our efficient conversion process and how fast we can do integration. One large clinic with a lot of unstructured data in data warehouses was so impressed they issued a directive that any new data they create will be available in RDF. Once people understand the simplicity, they say “Let’s make this the way we do things going forward.” Only a few of our customers are doing this now, but we see it increasing.

Ian: Which industries do you see adopting RDF?

SM: I think the farthest along is life sciences, then financial, and then US government.

Ian: Shoaib, thank you very much for sharing those stories!

