Full-Stack Blockchain Analytics with BlockStack
Blockchains contain valuable data describing transactions of digital assets. For example, Bitcoin’s raw blockchain data alone is 180 GB as of Jan 2019, and it is growing rapidly. This data holds the key to understanding different aspects of blockchain applications, such as cryptocurrency privacy and market dynamics.
Blockchain analysis systems, such as BlockSci and BitIodine, have enabled blockchain science by addressing three pain points, namely poor performance, limited capabilities, and a cumbersome programming interface. However, such systems remain focused on analyzing core blockchain data, and are not designed to systematically incorporate auxiliary data into their analysis pipelines. This limitation makes it difficult to investigate issues related to privacy and security of the blockchain ecosystem, which depend on linking users and services through blockchain transactions.
Description of Technology
We propose BlockStack, a full-stack search, tagging, and analysis system for blockchains. With BlockStack, analysts can get quick answers to queries such as
“which Twitter user accounts made Bitcoin payments to the Silk Road darknet marketplace?”
BlockStack defines a layered system architecture, where search, tagging, and analysis have separate layers with well-defined and extendable interfaces between them.
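The layering can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Tag` record, the toy data, and the function names are assumptions made for illustration and are not BlockStack's actual interfaces. A search layer filters tags, and an analysis layer joins them against transactions to answer the example query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    address: str  # blockchain address the tag is attached to
    source: str   # origin of the auxiliary data, e.g. "twitter"
    label: str    # e.g. a user handle or a service name


# Toy auxiliary data and transactions (sender, receiver); all values are made up.
TAGS = [
    Tag("addr_a", "twitter", "@alice"),
    Tag("addr_b", "twitter", "@bob"),
    Tag("addr_s", "darknet", "Silk Road"),
]
TRANSACTIONS = [("addr_a", "addr_s"), ("addr_b", "addr_x"), ("addr_c", "addr_s")]


def search_tags(tags, source=None, label=None):
    """Search layer: return tags matching the given source and/or label."""
    return [
        t for t in tags
        if (source is None or t.source == source)
        and (label is None or t.label == label)
    ]


def payers_to(transactions, tags, payer_source, payee_label):
    """Analysis layer: labels of tagged senders that paid a tagged receiver."""
    payee_addresses = {t.address for t in search_tags(tags, label=payee_label)}
    payer_labels = {t.address: t.label for t in search_tags(tags, source=payer_source)}
    return sorted(
        {payer_labels[s] for s, r in transactions
         if r in payee_addresses and s in payer_labels}
    )


result = payers_to(TRANSACTIONS, TAGS, "twitter", "Silk Road")
```

Because the analysis function only talks to the search function, either layer could be replaced without touching the other, which is the point of keeping the interfaces between layers well defined.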
BlockStack enables blockchain analytics and intelligence for many applications. Based on early feedback from trade commission agencies (e.g., U.S. Federal Trade Commission) and financial regulatory authorities (e.g., Qatar Financial Centre Regulatory Authority), BlockStack is expected to be extremely helpful for risk profiling, fraud detection / customer protection, know your customer (KYC) and anti-money laundering (AML) law compliance, and drafting new investor-friendly blockchain regulations for the financial sector.
Dr. Yazan Boshmaf
Dr. Mashael Al Sabah
Dr. Saravanan Thirumuruganathan
Husam Al Jawaheri
Hasan Al Jawaheri
BlockTag: Design and Applications of a Tagging System for Blockchain Analysis
Yazan Boshmaf, Husam Al Jawaheri, and Mashael Al Sabah
Proc. of 34th International Conference on ICT Systems Security and Privacy Protection
IFIP SEC ’19, Lisbon, Portugal, Jun 2019
Deanonymizing Tor Hidden Service Users Through Bitcoin Transactions Analysis
Husam Al Jawaheri, Mashael Al Sabah, Yazan Boshmaf, and Aiman Erbad
Characterizing Bitcoin Donations to Open Source Software on GitHub
Yury Zhauniarovich, Yazan Boshmaf, Husam Al Jawaheri, and Mashael Al Sabah
Fully automatic map generation
Road networks are constantly evolving in two ways:
-Topological/geographical-wise, when new roads are built or upgraded and existing roads are closed or modified.
-Traffic-wise, as roads and their utility continuously depend on the state of the traffic.
Description of Technology
Qarta allows building/updating/running map services independent of commercial maps, which can be utilized for taxi/ride-hailing, logistics/delivery, government, motor insurance and similar applications. This is achieved by using an array of data sources, from large-scale GPS trajectories to satellite imagery, to detect new roads and road closures in close-to-real-time and fuse the detected changes to the existing map or build the map from scratch.
Dr. Sofiane Abbar
Dr. Rade Stanojevic
Dr. Mohamed Mokbel
R. Stanojevic et al. Road Network Fusion for Incremental Map Updates. LBS 2018.
F. Bastani et al. RoadTracer: Automatic Extraction of Road Networks From Aerial Images. CVPR 2018.
R. Stanojevic et al. Robust Road Map Inference through Network Alignment of Trajectories. SDM 2018.
R. Stanojevic, S. Abbar, M. Mokbel. W-edge: weighing the edges of the road network. SIGSPATIAL/GIS 2018.
R. Stanojevic, S. Abbar, M. Mokbel. MapReuse: Recycling routing API queries. MDM 2019.
A revenue management system for cargo that combines machine learning prediction, decision-making and data cleaning
The technology addresses a problem that is unique to the air-cargo business, namely the wide discrepancy between the quantity (weight or volume) that a shipper will book and the actual received amount at departure time by the airline.
Description of technology
AI Cargo Prediction provides a complete pipeline for the air-cargo revenue management problem. Given an incoming booking, it:
1. Identifies whether there might be a substantial difference between the booked volume and the volume that will actually be tendered, using a novel disguised missing value detection method.
2. Predicts the weight and volume that will be tendered using gradient boosting machines trained on historical data.
3. Uses this prediction to make an acceptance/rejection suggestion.
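The three steps can be sketched end to end as follows. This is an illustration under loud assumptions, not the system's implementation: the disguised-missing-value check is replaced by a simple "suspiciously frequent value" heuristic, the gradient boosting model is replaced by a per-agent average of historical tendered/booked ratios, and all function names, thresholds, and numbers are hypothetical.

```python
from collections import Counter
from statistics import mean


def suspicious_values(booked_volumes, min_share=0.3):
    """Step 1 stand-in: values booked so often that they look like defaults."""
    counts = Counter(booked_volumes)
    total = len(booked_volumes)
    return {value for value, count in counts.items() if count / total >= min_share}


def predict_tendered(booked, agent_history):
    """Step 2 stand-in: scale the booking by the agent's mean tendered/booked ratio.

    agent_history is a list of (booked, received) pairs for past shipments.
    The source describes gradient boosting machines here; this is only a placeholder.
    """
    ratios = [received / b for b, received in agent_history if b > 0]
    return booked * mean(ratios) if ratios else booked


def suggest(booked, agent_history, remaining_capacity, defaults):
    """Step 3: accept if the predicted tendered volume fits the remaining capacity."""
    predicted = predict_tendered(booked, agent_history)
    return {
        "flagged": booked in defaults,
        "predicted": predicted,
        "accept": predicted <= remaining_capacity,
    }


# Made-up example: this agent historically tenders half of what is booked.
defaults = suspicious_values([10.0, 10.0, 10.0, 7.5, 12.0])
decision = suggest(10.0, [(10.0, 5.0), (20.0, 10.0)],
                   remaining_capacity=6.0, defaults=defaults)
```

In the example the booking of 10.0 exceeds the remaining capacity of 6.0, yet it is still accepted because the predicted tendered volume is lower. This mirrors the discrepancy between booked and received quantities that motivates the system.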
Results and achievements
• The number of flights with a booking error lower than 10%
• The offloading cost is almost ten times lower, with a much lower standard deviation.
• The implemented dashboard allows users to manage bookings, identify disguised missing values, get reject/accept suggestions, and visualize historical and predicted data at the shipment, flight, and shipping agent levels.
Dr. Stefano Giovanni Rizzo
Ji K. Lucas
Dr. Sanjay Chawla
Dr. Zoi Kaoudi
Dr. Jorge Quiané-Ruiz
Rizzo, S.G., Lucas, J., Kaoudi, Z., Quiané-Ruiz, J., & Chawla, S. (2019). AI-CARGO: A Data-Driven Air-Cargo Revenue Management System. ArXiv, abs/1905.09130.
A general solution for personalized big data analytics
The big data ecosystem is diverse – one system is unlikely to cater to every need. In addition, solving business problems increasingly requires applications to go beyond the limits of a single data processing platform, such as Hadoop or a DBMS. As a result, organizations typically have to juggle their code and data across different platforms. These tasks are not only tedious and costly, but they also require expert developers who know all the intricacies of the underlying platforms. Thus, there is a clear need for automatic cross-platform data processing that enables organizations to efficiently get value from their big data assets.
Rheem has been developed to cover this need for a general-purpose cross-platform data processing system. It decouples applications from the underlying platforms. At a glance, Rheem splits an incoming task into subtasks and assigns each to a specific platform so as to minimize the overall runtime. To offer cross-platform functionality, Rheem provides (i) a robust interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms.
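The idea of cost-based platform assignment can be sketched for the simplest case, a linear pipeline of subtasks. This is not Rheem's optimizer or API: the function name, the cost numbers, and the single flat `switch_cost` for moving data between platforms are assumptions made purely for illustration.

```python
def assign_platforms(costs, switch_cost):
    """Pick one platform per subtask so that the total estimated cost is minimal.

    costs: list with one dict per subtask (in pipeline order), mapping a
           platform name to the estimated cost of running that subtask there.
    switch_cost: assumed flat cost of moving data when consecutive subtasks
                 run on different platforms.
    Returns (total_cost, [platform for each subtask]).
    """
    # best[p] = cheapest (total, plan) whose latest subtask runs on platform p
    best = {p: (c, [p]) for p, c in costs[0].items()}
    for step in costs[1:]:
        new_best = {}
        for p, c in step.items():
            total, plan = min(
                (prev_total + (0 if q == p else switch_cost) + c, prev_plan)
                for q, (prev_total, prev_plan) in best.items()
            )
            new_best[p] = (total, plan + [p])
        best = new_best
    return min(best.values())


# Hypothetical estimates for a three-step task on two platforms.
estimates = [
    {"java": 1, "spark": 5},   # small input: a single-node engine is cheapest
    {"java": 20, "spark": 4},  # heavy step: the distributed engine wins
    {"java": 2, "spark": 3},   # cheap step: staying put avoids a data transfer
]
total, plan = assign_platforms(estimates, switch_cost=2)
```

The last step shows why per-subtask greedy choices are not enough: running it on `java` is cheaper in isolation, but the transfer cost makes staying on `spark` the better overall plan.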
Application & Level of Development
Rheem has been piloted by a leading airline, and a proposal to make Rheem an Apache project is being prepared.
• Cross-Platform Data Analytics Made Easy
Ji Lucas, Yasser Idris, Bertty Contreras-Rojas, Jorge-Arnulfo Quiane-Ruiz, Sanjay Chawla.
ICDE 2018 (Demo Paper)
• Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Panos Kalnis.
Fast and Scalable Inequality Joins.
• A Cost-based Optimizer for Gradient Descent Optimization
Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Saravanan Thirumuruganathan, Sanjay Chawla, Divy Agrawal
SIGMOD 2017, Chicago, USA
• Interoperating a Zoo of Data Processing Platforms Using Rheem
Spark Summit 2017 (Tech talk)
• Rheem: Enabling Multi-Platform Task Execution
Divy Agrawal, Lamine Ba, Laure Berti-Equille, Sanjay Chawla, Ahmed Elmagarmid, Hossam Hammady, Yasser Idris, Zoi Kaoudi, Zuhair Khayyat, Sebastian Kruse, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Mohammed J. Zaki
SIGMOD 2016 (Demo Paper)
• Road to Freedom in Data Analytics
Divy Agrawal, Sanjay Chawla, Ahmed Elmagarmid, Zoi Kaoudi, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Mohammed J. Zaki.
EDBT 2016 (Vision Paper)
• Lightning Fast and Space Efficient Inequality Joins
Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Panos Kalnis.
• BigDansing: A System for Big Data Cleansing
Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang.
Secure and reliable cloud storage
Cloud data services offer compelling benefits but are challenged by the problem of lack of trust. Clients lose control of their valuable data but are still required to trust the cloud service provider. This creates a real hurdle that prevents using the cloud for storing sensitive corporate and government data. With SafeDrive, the client does not need to trust anyone. SafeDrive offers an effective and secure solution that guarantees cloud data confidentiality, reliability, availability and high performance.
Application & Level of Development
The technology is currently at the level of a basic prototype with minimal features. The aim is to release a minimum viable product that will be tested with individuals before moving to the next level of testing with corporate users. Testing and future product releases will not require any expensive infrastructure, as the technology leverages existing cloud storage services.
Description of Technology
SafeDrive is an efficient, reliable, and secure multi-cloud file storage system that keeps your data safe and accessible even if you do not fully trust the service provider, whose systems can be compromised or become unavailable. SafeDrive relies on distributed storage across multiple clouds and employs novel, fast, and cost-effective techniques for tolerating multiple service outages. Data remain seamlessly accessible as long as m out of n clouds are available. With SafeDrive, data and access credentials never exist in unencrypted form outside the client’s machine. A service provider learns nothing about the cloud-shared data, its structure, or its encryption keys. All of these advantages are offered without any noticeable impact on performance. SafeDrive continues to offer the standard cloud benefits such as sharing, scalability, and anywhere accessibility.
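The m-out-of-n availability property can be illustrated with the simplest possible case, 2-out-of-3, using XOR parity. This is a textbook stand-in and not SafeDrive's actual technique, which the description does not specify; the function names are hypothetical. The sketch also omits encryption: in a system like the one described, the data would be encrypted on the client before being split.

```python
def _xor(u: bytes, v: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(u, v))


def split_2_of_3(data: bytes) -> dict:
    """Split data into three shares, one per cloud; any two suffice to recover."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; the caller keeps the true length
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return {0: first, 1: second, 2: _xor(first, second)}  # share 2 is parity


def recover(shares: dict, length: int) -> bytes:
    """Rebuild the original bytes from any two of the three shares."""
    if len(shares) < 2:
        raise ValueError("at least two shares are required")
    first = shares[0] if 0 in shares else _xor(shares[1], shares[2])
    second = shares[1] if 1 in shares else _xor(shares[0], shares[2])
    return (first + second)[:length]


document = b"secret report"
shares = split_2_of_3(document)

# Simulate an outage of each cloud in turn.
recovered = [
    recover({i: s for i, s in shares.items() if i != down}, len(document))
    for down in range(3)
]
```

Each share is half the size of the input, so tolerating one outage costs 1.5x storage instead of the 2x needed for plain replication across two clouds. Note that plain XOR shares are not confidential on their own, which is why client-side encryption would have to come first.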
Domain Maliciousness Assessment via Real-Time Graph Inference
The number of malicious websites that spread malware and other unwanted or harmful software is increasing, while the technologies used to identify them are slow to keep up. QCRI has created a novel technology that provides early assessment of the maliciousness of any domain seen on the Internet. It also periodically publishes a list of malicious domains, called a DNSBL, which is mainly used by email exchange servers to block spam and other undesirable email, or by Network Operations Centers (NOCs) and Security Operations Centers (SOCs) to promptly identify and block access to potentially malicious websites.
Description of the Technology
The technology can detect or predict malicious domains well ahead of similar technologies on the market, thanks to its ability to identify indirect associations among domains based on passive DNS data, which consists of most of the domain-IP resolutions seen across the Internet. For domains that do not have sufficient associations with other domains, the system uses a heuristic-based approach to assess their maliciousness. Together, these techniques make the technology a first line of defense against malicious activities on the Internet, many of which originate from malicious domains.
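A minimal sketch of the association idea follows. It is deliberately simplistic and is not the inference method used by the technology: here two domains are associated if they resolved to the same IP address, and an unknown domain is scored by the fraction of its associated domains that are already known to be malicious. The function names, the scoring rule, and the example data are all assumptions.

```python
from collections import defaultdict


def build_domain_graph(resolutions):
    """Link domains that share at least one IP in passive DNS (domain, ip) pairs."""
    domains_by_ip = defaultdict(set)
    for domain, ip in resolutions:
        domains_by_ip[ip].add(domain)
    neighbours = defaultdict(set)
    for domains in domains_by_ip.values():
        for domain in domains:
            neighbours[domain] |= domains - {domain}
    return neighbours


def maliciousness_score(domain, neighbours, known_malicious):
    """Fraction of associated domains that are known malicious.

    Returns None when the domain has no associations, the case in which the
    text says a separate heuristic-based assessment is used instead.
    """
    associated = neighbours.get(domain, set())
    if not associated:
        return None
    return len(associated & known_malicious) / len(associated)


# Made-up passive DNS observations.
observations = [
    ("bad1.example", "10.0.0.1"), ("new.example", "10.0.0.1"),
    ("bad2.example", "10.0.0.2"), ("new.example", "10.0.0.2"),
    ("good.example", "10.0.0.3"), ("new.example", "10.0.0.3"),
    ("lonely.example", "10.0.0.9"),
]
graph = build_domain_graph(observations)
blocklist = {"bad1.example", "bad2.example"}
score_new = maliciousness_score("new.example", graph, blocklist)
score_lonely = maliciousness_score("lonely.example", graph, blocklist)
```

A domain such as `new.example` gets a high score before it has done anything observable itself, purely because of the company it keeps, which is how association-based approaches can flag domains early.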
Dr. Issa M. Khalil
Dr. Ting Yu
A Domain is only as Good as its Buddies: Detecting Stealthy Malicious Domains via Graph Inference, I Khalil, B Guan, M Nabeel, T Yu. ACM Codaspy 2018
Discovering malicious domains through passive DNS data graph analysis, I Khalil, T Yu, B Guan. ACM AsiaCCS 2016
Low-cost aerogel synthesis and usage
Aerogel is a novel material that holds the potential to improve a range of products and processes, owing to its superior insulation properties and structural integrity. Adoption of aerogel has so far been limited by the inability to produce it cost-efficiently and in custom shapes and forms.
Description of technology
The developed technology can be used in the following applications:
1. Rapid Fabrication of Strong Aerogel
Aerogels are highly porous and are among the lightest materials. These physical properties make them unique materials for diverse applications, ranging from aerospace to items used in daily life. Exploring new ways to design such materials, and to tailor their properties to meet today’s and future needs, is a challenging area of science and technology.
In an effort to develop better materials and processes in this field, a solution has been created for one of the major reasons industry has for decades been reluctant to adopt aerogels as a material of choice: their very slow and time-consuming fabrication process. Optimized synthesis parameters have been identified so that aerogel materials that previously took days to fabricate can now be made in just a few hours, or even seconds. Furthermore, using the developed method, materials of any shape and size can be produced in a few hours in one pot, which was not possible earlier. Another important development is the ability to produce mechanically strong aerogels using a common visible light or laser source. Recent experiments show some success in developing aerogels through photo-polymerization, which can convert the precursor liquid solution into a highly porous and strong solid in just seconds. In view of this unique attribute of the process, such materials are expected to replace the higher-density materials currently used for thermal insulation.
2. High Heat and Anti-Flammable coating
Anti-Flammable surface coating is a mixture of film-forming materials plus pigments, solvents, and other additives, which, when applied to a surface and cured or dried, yields a thin film that is functional and often decorative. Surface coatings include paints, drying oils and varnishes, clear synthetic coatings, and other products whose primary function is to protect the surface of an object from the environment. These products can also enhance the aesthetic appeal of an object by accentuating its surface features or even by concealing them from view.
The technology is based on an aerogel-polymer composite used to develop a fire-retardant coating system. A coating system based on silica aerogel, with epoxy resin as the binder, has been developed. Epoxy resin is a synthetic, thermosetting polymer with an epoxide group. Epoxy resins exhibit good thermal insulation properties and good mechanical strength, and are therefore included as a useful ingredient in the formulation. To further enhance the thermal insulation behavior, varying amounts of silica aerogel were added. Formulating a composite with good film-forming properties and high strength is the most viable route to harvest the benefit of silica aerogel as a thermal insulation material, and such a composite was successfully formulated. The physical and chemical properties of the prepared coating samples were studied using thermogravimetric analysis (TGA), surface area measurements, and Fourier-transform infrared (FTIR) spectroscopy.
3. Fabrication of 3D printed resin with tailored properties
3D printing and additive manufacturing have witnessed rapid growth in recent years. Commercial 3D printing materials (resins), especially for stereolithography (SLA) technology, suffer from many limitations. The most significant are usually low mechanical strength, high density, and massive shrinkage, so these materials have limited applications. Broadening the scope of SLA 3D printing requires the development of new, robust, and lightweight materials with minimal shrinkage, which would allow 3D printed objects to be fabricated for a much wider range of applications.
To address these issues, a new silica aerogel resin has been formulated to produce a new lightweight material that can be printed and used in the medical field, for example in customized prosthetic devices, diabetic foot insoles, and medical implants. The technology is based on formulating a preformed aerogel and polymer composite system to fabricate a resin for commercial stereolithography printers. The resulting material is a lightweight, porous, and flexible silica aerogel composite. It was found to be four times lighter than the materials available on the market today and to have superior mechanical strength.
Patent: Methods of forming aerogels
Fabrication of strong and ultra-lightweight silica-based aerogel materials with tailored properties, KM Saoud, S Saeed, MF Bertino, LS White - Journal of Porous Materials, 2017
Fabrication of native silica, cross-linked, and hybrid aerogel monoliths with customized geometries, KM Saoud, S Saeed, MF Bertino, LS White - Translational Materials Research, Volume 3, IOP Publishing Ltd, 2016.
Rapid fabrication of cross-linked silica aerogel by laser induced gelation, Shaukat Saeed, Rola M Al Soubaihi, Lauren S White, Massimo F Bertino, Khaled M Saoud - Microporous and Mesoporous Materials, 2016.
Laser induced instantaneous gelation: Aerogels for 3D printing, Shaukat Saeed, Rola Al-Soubaihi, Massimo F Bertino, Lauren S White and Khaled M Saoud - J. Mater. Chem. A, 2015.
Influence of silica derivatizer and monomer functionality and concentration on the mechanical properties of rapid synthesis cross-linked aerogels, L.S. White, M.F. Bertino, S. Saeed, K. Saoud - Microporous and Mesoporous Materials, Volume 217, 15 November 2015.
Shortened aerogel fabrication times using an ethanol–water azeotrope as a gelation and drying solvent, L. S. White, M. F. Bertino, G. Kitchen, J. Young, C. Newton, R. Al-Soubaihi, S. Saeed and K. Saoud - Journal of Materials Chemistry A (2015)
Arabic Language Technologies
Qatar Computing Research Institute (QCRI) has developed a suite of language technologies, with a focus on the processing of Arabic.
Farasa – Analysis and processing of Arabic texts
Farasa (which means “insight” in Arabic) is a fast and accurate text processing toolkit for Arabic texts in MSA (Modern Standard Arabic) as well as various dialects. Farasa achieves state-of-the-art accuracy and typically runs much faster than other toolkits. http://qatsdemo.cloudapp.net/farasa/
QATS – QCRI Advanced Transcription System
The QATS system comes in two variants: offline processing and online, real-time processing. It uses state-of-the-art deep learning techniques to automatically transcribe spoken Arabic into text. Transcription of English is available as an additional option. https://qats.qcri.org
Tarjama – Machine Translation
The machine translation system can translate from Arabic into English and English into Arabic. Tarjama is based on high-dimensional embeddings for the vocabulary and on deep learning. https://mt.qcri.org/api
QCRI Live Speech Translation System.
Fahim Dalvi, Yifan Zhang, Sameer Khurana, Nadir Durrani, Hassan Sajjad, Ahmed Abdelali, Hamdy Mubarak, Ahmed Ali, Stephan Vogel.
European Chapter of the Association for Computational Linguistics (EACL), Valencia, 3-7 April 2017.
The demo system is available here: https://st.qcri.org/demos/livetranslation
Stephan Vogel (PI)
Hassan Sajjad (machine translation)
Ahmed Ali (speech recognition)
Fahim Imaduddin (system integration)
Social Media Monitoring
The proliferation of social media poses new problems and opportunities for understanding news and opinions, both in automating processes and in assessing the stance and trustworthiness of various news sources. In response, Qatar Computing Research Institute (QCRI) has created two complementary tools that provide better insights.
Description of Technology
SAQR – Stance Detection
SAQR is a customizable social media monitoring platform that supports continuous social media monitoring and analysis. It automates the process of crawling data and applying in-house state-of-the-art algorithms. It offers a powerful query engine and generates user-friendly reports.
Tanbih – Detecting Fake News
Fake news has become a major problem. QCRI has developed (and continues to improve) the Tanbih system to address this problem. As it is very difficult and time-consuming to verify or refute individual claims, the approach of Tanbih is to characterize news sources in terms of reliability, factuality, bias, and bipartisanship. On the document level, it can distinguish between claims and opinions, and is able to identify propaganda elements in individual stories.
The Tanbih analytics further include detailed information on how the different news outlets report on individual topics such as Brexit, immigration, gun control, or the Qatar blockade. By integrating social media analytics, Tanbih also displays the left/right bias of each outlet's audience.
These analytical tools have been integrated into a news aggregator system, which can be used to consume the daily news while at the same time understanding the characteristics of the different news channels.
Dr. Preslav Nakov
Stance detection publication
Unsupervised User Stance Detection on Twitter
Highly Effective Arabic Diacritization using Sequence to Sequence Modeling
A fast and furious segmenter for Arabic
Dialect detection publication
Exploiting convolutional neural networks for phonotactic based dialect identification