Mohamad Dolatshah

Bio

Experienced Data Scientist with a demonstrated history of working in the higher education industry. Skilled in Statistical Data Analysis, Big Data Programming, Machine Learning, and Apache Spark.
Strong research professional with a Master's degree in Computing Science from Simon Fraser University.

Education

Sep 2016 - Sep 2018

Master's in Computing Science

Simon Fraser University
Department of Computing Science

Sep 2011 - Aug 2016

B.S. in Computer Engineering, Software major

Iran University of Science and Technology, Department of Computer Engineering

2007 - 2011

Diploma in Mathematics

Kamal High School, GPA: 19.3/20 (4/4)

Teaching Experience

Fall 2015

Artificial Intelligence

Instructed by Dr. Behrouz Minaei

Fall 2015

Information Retrieval

Instructed by Dr. Hasan Naderi

Activities

November 2014

Conference Committee, PyCon 2014

PyCon is the largest annual gathering for the community using and developing the open-source Python programming language.

April 2013

Elected as the Chief Director of the Computer Engineering Department Student Association (CESA)

CESA is the student committee responsible for directing the department's extracurricular activities.

April 2013

Elected as a member of the Central Scientific Student Association (CSSA)

CSSA is the scientific student organization of Iran University of Science and Technology, which aims to facilitate students' scientific activities and education.

Research Interests

Data Cleaning, Big Data Analytics
Database and Query Processing
Parallel and Distributed Computing
Machine Learning, Pattern Recognition
Computational Geometry

Research Experience

Research Assistant in the SFU Data Science Laboratory (Sep 2016 - Sep 2018)
• Explored a variation of the typical active learning setting, in which a learning algorithm can interactively query the user to obtain labels for training data.
• Developed an intelligent system that reduces the cost of manually obtaining labels, with estimation error up to 3× smaller than existing methods. This work resulted in the paper Cleaning Crowdsourced Labels Using Oracles For Supervised Learning, published at the VLDB 2019 conference.

Research Assistant in the IUST Data Mining laboratory (2014 - March 2016)
Joint work with Mr. Ali Hadian, under the supervision of Dr. Behrouz Minaei. Started as extensive research on various spatial data structures, with the intention of proposing a taxonomy of existing methods for indexing geometric spaces, along with a survey of the literature.

Work Experience

Data Scientist at Traction on Demand

Mar 2018 - Sep 2018

• Built a data pipeline to extract and map different sources of customer and employee information, resulting in an 800% speedup in data collection.
• Designed machine learning models for customer lifetime value (LTV) analysis, sales lead scoring, customer churn prediction, and project assignment, among others, using Pandas, scikit-learn, TensorFlow, and Keras to increase profitability and throughput.
https://tractionondemand.com/blog/are-you-ready-for-ai/

Database Manager of www.zirend.com

2013 - 2016

• Managed data warehousing for an online outsourcing marketplace that allows employers to post projects for freelancers. Built with Django (a Python MVC web framework), Zirend lets anyone post work to be done by thousands of active, skilled users.

Founder & Software Manager of Elexir

Apr 2015 - Jul 2015

• Used Neo4j for handling user relations and MongoDB for document-based objects in a social network mobile app, cutting access time in half compared to a traditional RDBMS.

Database Manager at Emersun Industries Co

Apr 2014 - Jul 2014

• Refined the pipeline for data transmission between two Microsoft SQL Server databases at one of the biggest manufacturers of home appliances in the Middle East, reducing manual effort.

Publications and Projects

Cleaning Crowdsourced Labels Using Oracles For Supervised Learning

Mohamad Dolatshah, Mathew Teoh, Jiannan Wang

Conference Paper: Proceedings of the 2019 International Conference on Very Large Data Bases (VLDB 2019)

Abstract

Nowadays, crowdsourcing is being widely used to collect training data for solving classification problems. However, crowdsourced labels are often noisy, and there is a performance gap between classification with noisy labels and classification with true labels. In this paper, we consider how to apply oracle-based label cleaning to reduce the gap. We propose TARS, a label-cleaning advisor that can provide two pieces of valuable advice for data scientists when they need to train or test a model using noisy labels. Firstly, in the model testing stage, given a test dataset with noisy labels, and a classification model, TARS can use the test data to estimate how well the model will perform w.r.t. true labels. Secondly, in the model training stage, given a training dataset with noisy labels, and a classification algorithm, TARS can determine which label should be sent to an oracle to clean such that the model can be improved the most. For the first advice, we propose an effective estimation technique, and study how to compute confidence intervals to bound its estimation error. For the second advice, we propose a novel cleaning strategy along with two optimization techniques, and illustrate that it is superior to the existing cleaning strategies. We evaluate TARS on both simulated and real-world datasets. The results show that (1) TARS can use noisy test data to accurately estimate a model’s true performance for various evaluation metrics; and (2) TARS can improve the model accuracy by a larger margin than the existing cleaning strategies, for the same cleaning budget.

Ball*-tree: Efficient spatial indexing for constrained nearest-neighbor search

Mohamad Dolatshah, Ali Hadian, Behrouz Minaei-Bidgoli

Conference Paper: 3rd International Conference on Mathematical Sciences & Computer Engineering (ICMSCE 2016)

Abstract

Emerging location-based systems and data analysis frameworks require efficient management of spatial data for approximate and exact search. Exact similarity search can be done using space partitioning data structures, such as Kd-tree, R*-tree, and Ball-tree. In this paper, we focus on Ball-tree, an efficient search tree that is specific for spatial queries which use Euclidean distance. Each node of a Ball-tree defines a ball, i.e. a hypersphere that contains a subset of the points to be searched. In this paper, we propose Ball*-tree, an improved Ball-tree that is more efficient for spatial queries. Ball*-tree enjoys a modified space partitioning algorithm that considers the distribution of the data points in order to find an efficient splitting hyperplane. Also, we propose a new algorithm for KNN queries with restricted range using Ball*-tree, which performs better than both KNN and range search for such queries. Results show that Ball*-tree performs 39%-57% faster than the original Ball-tree algorithm.

Pattern Recognition in the 2016 Presidential Election

Skills

Programming Languages

C, C++, Python, CUDA, R, SQL, Java

Data Science

Hadoop, Hive, SparkSQL, Cassandra, TensorFlow, Pandas, Amazon Web Services

Tools

Tableau, Trifacta, Keras, Microsoft Azure, IBM Watson, Jupyter, WEKA, MATLAB

Database Technologies

MySQL, Microsoft SQL Server, SQLite, MongoDB, OrientDB, Neo4j

Web Technologies

PHP, Django, HTML, CSS, JavaScript, jQuery, AJAX

Operating Systems

Linux, Ubuntu Server, CentOS, Windows

Typesetting

LaTeX, XeTeX

Hobbies

Sports
Playing soccer, table tennis, and chess.

Camping and mountain climbing.

Contact

Location: Burnaby, Canada
Email #1: mdolatsh@sfu.ca
Email #2: dolatshah.mohammad@gmail.com
Phone: +1-604-710-1261

References

Ali Hadian

Research Assistant

Email: ali.hadian@gmail.com

Lara Gilchrist

Product Director, Traction on Demand

Email: info@tractionondemand.com

Jiannan Wang

Assistant Professor

Email: jnwang@sfu.ca

Behrouz Minaei-Bidgoli

Associate Professor (BSc & MSc Supervisor)

Email: b_minaei@iust.ac.ir

Course	Grade (out of 20)	Grade (out of 4)
Computer Laboratory	19.5	4
Fundamentals of Programming	17.75	4
Advanced Programming	17	4
Discrete Mathematics	19.5	4
Data Structure and Algorithms	17	4
Design of Algorithms	20	4
Database Design	20	4
Information Retrieval	17.94	4
Artificial Intelligence	17.92	4
Software Engineering 1	14.82	3
Software Engineering 2	16.83	4
Operating Systems	19	4
Compilers	17.2	4