
MScAC Data Platform Developer Interns
Who you are:
We are always looking for amazing talent who can contribute to our growth and deliver results! Geotab is seeking 4 Data Platform Developer Interns from the MScAC program at UofT who will immediately contribute to the organization and the Data and Analytics team efforts. If you love technology, are well organized, have great communication skills and are keen to join an industry leader — we would love to hear from you!
What you'll do:
As a Data Platform Developer Intern, your key responsibility will be to develop and maintain new data infrastructure platforms that manage the data ingestion process for Geotab's internal data lake. Your day-to-day may include building ingestion processes; implementing logging, monitoring, and alerting services; working with data scientists to understand data processing needs and developing infrastructure solutions to support those initiatives; creating and maintaining documentation for architecture, requirements, and process flows; and supporting internal Geotab teams with data integration into newly developed big data platforms. You will work closely with members of the Data Platform, Data Science, and MyGeotab teams.
To be successful in this role, you will need to be a motivated individual with strong written and verbal communication skills and the ability to quickly grasp complex technical concepts. In addition, the successful candidate will design robust, scalable solutions while managing multiple projects and priorities to ensure timely results.
The opportunity:
- Please note that this posting is for students from UofT's MScAC program only.
- 8 month work-term beginning May 2026.
- Full-time, paid internship: Monday - Friday, 37.5hrs/week.
- Your first week at Geotab begins with 'GEO Launch', a one-week employee orientation.
- The internship is part of the Geotab Campus Program.
- This posting is for an existing vacancy.
Projects:
Research and Development of an LLM-Based Agent Platform in the Geotab IoT Big Data Environment
- This internship aims to pioneer Geotab's foundational, integrated LLM agent platform. The primary objective is to research the latest advancements and challenges in multi-agent systems and autonomous LLM agent architectures, including complex task planning, long-term memory integration, and dynamic tool use. The intern will experiment with, design, and develop core platform services, ensuring they integrate robustly with Geotab's existing data sources, authentication, and infrastructure. This platform will be the central hub for supporting, deploying, and managing various internal agents. To validate the platform's capabilities and gain a deeper, hands-on understanding of its requirements, the intern will also develop proof-of-concept agents. While the specific applications will be defined during the internship, these agents will target high-impact areas for internal operational improvement. Success will be measured by the initial deployment of a functional, scalable agent platform that demonstrates clear potential to automate processes and enhance operational efficiency at Geotab.
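The plan-act-observe loop at the heart of such an agent platform can be sketched without any model at all. Everything below (the planner stub, the `vehicle_count` tool, the sample counts) is a hypothetical illustration, not a Geotab component; a real platform would replace the stub with an LLM call that chooses the next tool.

```python
# Minimal tool-use agent loop with a stubbed "planner" standing in for
# the LLM. The control flow (plan -> act -> observe -> repeat) is the
# same regardless of which model does the planning.

def lookup_vehicle_count(region: str) -> str:
    # Hypothetical tool; a real agent would query internal data services.
    counts = {"ontario": "1204", "quebec": "877"}
    return counts.get(region.lower(), "unknown")

TOOLS = {"vehicle_count": lookup_vehicle_count}

def stub_planner(goal, observations):
    # Stand-in for an LLM call: returns (tool, argument), or None when done.
    if not observations:
        return ("vehicle_count", "ontario")
    return None  # goal satisfied after one observation

def run_agent(goal: str) -> list:
    observations = []
    while True:
        step = stub_planner(goal, observations)
        if step is None:
            return observations
        tool, arg = step
        observations.append(f"{tool}({arg}) -> {TOOLS[tool](arg)}")

print(run_agent("How many vehicles report from Ontario?"))
# -> ['vehicle_count(ontario) -> 1204']
```

Separating the planner from the tool registry is what lets one platform host many agents: each agent supplies its own tools and goals while the loop stays shared.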
Research and Development of a Scalable AI Infrastructure and Platforms for Diverse ML Workloads at Geotab
- This internship aims to optimize Geotab's machine learning infrastructure for speed, cost-efficiency, and scalability. The primary objective is to research current industry challenges and state-of-the-art solutions for accelerating the training, testing, and inference of diverse ML models. The intern will investigate and benchmark techniques such as model quantization, pruning, distributed training frameworks, and high-performance inference serving (e.g. vLLM, TensorRT) on top of Ray, an open-source distributed computing framework. A key goal is to develop and prototype solutions that efficiently manage and orchestrate GPU and CPU resources, tailored to Geotab's specific model portfolio. The intern will work towards integrating these validated optimizations into Geotab's core machine learning platforms, creating automated, efficient, and scalable integrations that empower data scientists to iterate on and deploy models faster.
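Of the optimization techniques listed above, model quantization is the easiest to illustrate framework-free. The sketch below is a generic symmetric int8 scheme, not Geotab's pipeline; the weights are made-up numbers.

```python
# Symmetric int8 quantization: map float weights into [-127, 127] using a
# single scale factor, then dequantize to measure reconstruction error.
# Storing weights as 8-bit ints instead of 32-bit floats cuts model
# memory and bandwidth roughly 4x, at the cost of small rounding error.

def quantize(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.31, 0.02, -0.07]
q, scale = quantize(w)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(q)  # -> [30, -127, 79, 5, -18]
```

The worst-case error of this scheme is half the scale factor, which is why benchmarking (as the project proposes) matters: whether that error is acceptable depends on the model.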
Scalable Geospatial Data Management for Large Distributed IoT Systems
- The aim of this project is to investigate scalable approaches for managing large geospatial datasets in distributed IoT systems. Specific objectives include:
- Geospatial Data Subdivision: To examine methods for partitioning global geospatial datasets into smaller, region-specific geogrids, enabling independent analysis and processing of spatial regions.
- Spatial Indexing Optimization: To evaluate and compare spatial indexing structures, such as R-Trees, Quadtrees, and Geohashes, with respect to their efficiency in supporting high-volume geospatial queries.
- Distributed Data Sharding: To explore strategies for distributing geospatial data across multiple nodes, with the goal of balancing load, minimizing query latency, and maintaining data consistency.
- Performance Characterization: To systematically measure the effects of geogrid subdivision, indexing structures, and distributed sharding on query performance and system scalability, identifying the trade-offs between computational efficiency and resource utilization.
- Compute Resource Consumption Reduction: To reduce compute-related costs, targeting an approximate 30% decrease.
- Synthesis and Recommendations: To derive generalizable insights and recommendations regarding the design of scalable geospatial data management systems in IoT contexts, with a focus on methodological rigor and potential applicability to large-scale distributed datasets.
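As a concrete reference point for the subdivision and indexing objectives above, the standard geohash encoding (one of the structures named) fits in a few lines of Python. This is the textbook algorithm, not a Geotab component; the coordinates are arbitrary examples.

```python
# Minimal geohash encoder: alternately bisect longitude and latitude,
# packing the resulting bits into base-32 characters. Nearby points share
# a common prefix, which is what makes geohashes useful both as geogrid
# cell identifiers and as prefix-based shard keys.

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int = 7) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True            # geohash interleaving starts with longitude
    bit_count, ch = 0, 0
    code = []
    while len(code) < precision:
        rng = lon_range if use_lon else lat_range
        val = lon if use_lon else lat
        mid = (rng[0] + rng[1]) / 2.0
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid      # keep the upper half
        else:
            rng[1] = mid      # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:    # 5 bits -> one base-32 character
            code.append(BASE32[ch])
            bit_count, ch = 0, 0
    return "".join(code)

print(geohash(43.6532, -79.3832))  # downtown Toronto; prefix 'dpz8'
```

Because two nearby points share a hash prefix, routing records by prefix gives spatial locality for free; the trade-off (cells straddling prefix boundaries) is exactly the kind of effect the performance-characterization objective would measure.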
Design of an LLM-Driven Cross-Language Framework for Real-Time Big Data at Geotab
- The key objective of this internship is to design and implement a Generative AI-powered framework that assists in the lifecycle of real-time data processing pipelines. The intern will leverage Large Language Models (LLMs) to build a system capable of translating high-level specifications (such as SQL queries or Python scripts) into optimized, production-ready Flink/Java code. Specifically, the intern will:
- Develop a Cross-Language Translation Layer: Build an LLM-based agent that understands data transformation logic in various languages (SQL, Python) and generates equivalent, highly performant Java/Scala code for the existing streaming infrastructure.
- Implement Context-Aware Code Generation: Utilize Retrieval-Augmented Generation (RAG) to ground the LLM in Geotab’s specific internal libraries, schema definitions, and best practices, ensuring generated pipelines are compliant and efficient.
- Create a "Copilot" for Pipeline Engineering: Develop a prototype interface or CLI where engineers can interactively iterate on pipeline designs, receiving real-time suggestions for optimization, error handling, and resource scaling.
How you'll make an impact:
- Develop and maintain new data infrastructure platforms managing the data ingestion and stream processing of Geotab’s internal data lake.
- Develop processes to enrich Geotab’s big data with telematics data at scale.
- Develop processes and implement logging, monitoring, and alerting services to ensure the health of Geotab’s big data infrastructure.
- Work with data scientists to understand data processing needs and develop infrastructure solutions to support these initiatives.
- Create and maintain documentation for architecture, requirements, and process flows.
- Support internal Geotab teams to assist with data integration with newly developed big data platforms.
What you'll bring to the role:
- Pursuing a Degree/Diploma in Computer Science, Software or Computer Engineering, or a related field.
- Experience in Data Engineering or a similar role.
- Experience using Python or Java.
- Knowledge of large language models and generative AI frameworks such as LangChain, LlamaIndex, and AutoGen is a big plus.
- Knowledge of machine learning frameworks such as Ray, MLflow, and Airflow is a big plus.
- Knowledge of application containerization tools such as Docker and Kubernetes is a big plus.
- Knowledge of Linux/Unix OS and shell/command language is preferred.
- Experience with API design and implementation is preferred.
- Familiarity with big data environments (e.g. Google Compute Engine, Google BigQuery).
- Knowledge of data management fundamentals and data storage principles is preferred.
- Excellent oral and written communication skills.
- Strong analytical skills with the ability to solve problems and make well-judged decisions.
- Highly organized and able to manage multiple tasks and projects simultaneously.
- Strong team player with the ability to engage with all levels of the organization.
- Entrepreneurial mindset and comfortable in a flat organization.
How we work:
The hiring range below reflects the expected annual base salary for this role and may be subject to change. Geotab offers various perks, benefits, and other compensation components that an individual may be eligible for. The actual base salary for this position depends on a variety of factors, such as skills, qualifications, education, and overall experience, including the location where the applicant lives while performing the job, as well as equity with other team members and alignment with local market data. All offers of employment are contingent upon proof of eligibility to work and the individual's ability to pass a background check.
Hiring Range
$45,000 - $75,000 CAD