Join forces with data scientists, engineers, and analysts. As a precursor to the upcoming Data + AI Summit, we invite you to create unique and novel applications, use cases, and/or techniques that showcase open-source LLMs (e.g., OpenAssistant, MPT, Dolly) and/or Spark Connect. While we can't get enough of Dolly, we also suggest you check out LangChain, PandasAI, and vector databases.

Three finalist teams and five honorable mentions will be selected and announced during the Data + AI Summit 2023 keynote, and you will have the opportunity to win a cash prize to be split among your team.

Get started

We encourage you to create your own novel application or use case using the following resources:

LLM Learning Resources:

• Hugging Face Open Assistant
• MosaicML MPT-7B:
     ▹ mosaicml/mpt-7b · Hugging Face
• Databricks Lab Dolly GitHub repo:
     ▹ Hugging Face > Databricks
     ▹ Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
     ▹ Hello Dolly: Democratizing the magic of ChatGPT with open models

Spark Connect Learning Resources:

• Reference documentation
     ▹ Spark Connect Overview
     ▹ Spark Connect Quick Start
• Blogs
     ▹ Introducing Spark Connect - The Power of Apache Spark, Everywhere
     ▹ Spark Connect Available in Apache Spark 3.4

Potential Project Ideas:

• Parse audio transcriptions produced by speech-to-text models such as OpenAI's Whisper
• Create AI Agents with compute and search capabilities (LangChain is a great place to work on these kinds of tools)
• Build a QA bot with vector databases leveraging similarity searching
• Tune a DLite or Dolly model using Databricks
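
For the QA bot idea above, the retrieval step behind similarity searching can be sketched in a few lines. This is a minimal illustration only, not a recommended design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the names `embed`, `cosine_similarity`, `top_k`, and `DOCUMENTS` are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical mini-corpus; a real bot would index its own documents.
DOCUMENTS = [
    "Spark Connect lets clients talk to a remote Spark cluster",
    "Dolly is an open instruction-tuned language model",
    "Whisper transcribes audio into text",
]

if __name__ == "__main__":
    print(top_k("which model transcribes audio", DOCUMENTS))
```

In a real project, the retrieved passages would be passed to the LLM as context for answering the question, and a vector database would replace the linear scan in `top_k`.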

The judges will assess your projects' performance with a standard battery of diverse prompts while accounting for quantitative metrics like latency.
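
To get a rough feel for latency before submitting, you could time your own application over a handful of prompts. The sketch below is an assumption about how such a check might look, not the judges' actual evaluation harness; `measure_latency` and `echo_model` are hypothetical names, and `echo_model` is a placeholder for your real model call.

```python
import statistics
import time
from typing import Callable


def measure_latency(generate: Callable[[str], str], prompts: list[str]) -> dict[str, float]:
    """Call `generate` once per prompt and summarize wall-clock latency in seconds."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "count": float(len(latencies)),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


def echo_model(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call."""
    return prompt.upper()


if __name__ == "__main__":
    sample_prompts = ["Summarize Spark Connect.", "What is Dolly?"]
    print(measure_latency(echo_model, sample_prompts))
```

Swapping `echo_model` for your application's entry point gives a quick baseline to compare against after each change.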

To get started, you can use the following example notebooks as your guide:

• Build your Chat Bot with Dolly
• AI Functions: query LLM with DBSQL

We encourage you to use open-source models and datasets such as (but not limited to):
• Dolly 15K dataset
• RedPajama dataset
• OpenAssistant Conversations dataset (OASST1)
• LongForm dataset
• Alpaca Libra dataset
• EleutherAI datasets
• Fun beginner-friendly datasets on Kaggle
• Hugging Face instruct_me dataset (highly rated, general purpose open-source, Apache v2)

Additional Guidance

If you are building LLM applications, we recommend tools like LangChain, PandasAI, and vector databases.


Step 1. Build your team. Your team can have up to 4 participants, and each must be registered here on Devpost as participating in the hackathon.

Step 2. Create a new application using an open-source large language model (LLM) such as OpenAssistant, MPT, or Dolly, or create one using Spark Connect. Your application must have been created after May 18th, and all work must be completed during the hackathon timeframe. Create a compelling project that showcases open LLMs in a new and useful way. It is preferred, but not required, to showcase these use cases within a Databricks or Jupyter notebook.

Step 3. Record a video screencast (<= 280 seconds) that demonstrates the application, with commentary that answers the following questions:

  • Why did you choose this topic?
  • Which open-source LLM, and which additional open-source datasets, did you use? Explain why.
  • Or, for a Spark Connect project: what is your application, and which open-source datasets did you use? Explain why.
  • How does your project provide relevant and insightful information to the end user?

Step 4. Complete your submission on Devpost before 5 PM PT on June 16th. This includes a project description with the following:

  • Your hosted video
  • A URL to your application's open-source source code. Your GitHub repository must include an open-source license and must not contain any contributions made before May 18th.
  • Invitations to all of your teammates (make sure each one accepts)

Hackathon Sponsors


$20,000 in prizes

Grand Prize Winning Team

• $10,000 USD
• Project Highlight at Data + AI Summit

2nd Place Winning Team

• $5,000 USD
• Project Highlight at Data + AI Summit

3rd Place Winning Team

• $2,500 USD
• Project Highlight at Data + AI Summit

Honorable Mentions (5)

• $500 USD

Judges

Mike Conover
Staff Software Engineer, Databricks

Stefania Leone
Sr. Manager, Product Management, Databricks

Martin Grund
Senior Staff Software Engineer, Databricks

Conor B. Murphy
Sr. Data Science Manager, Databricks

Benjamin Harvey, Ph.D.
Founder & CEO, AI Squared

Sean Owen
Principal Specialist for Data Science and ML, Databricks

Judging Criteria

  • Creativity
    Is this a new and original idea, or has this been done before?
  • Relevance
    How have you combined relevant and interesting datasets and tools?
  • Thoroughness
    Is your application easy for the end user to understand? Does it provide relevant and insightful information?
  • Quality of submission
    How well-written and organized are your description, video explanation, and any provided visual presentation?

Questions? Email the hackathon manager
