Join forces with data scientists, engineers, and analysts. As a precursor to the upcoming Data + AI Summit, we invite you to create unique and novel applications, use cases, and/or techniques that showcase open-source LLMs (e.g., OpenAssistant, MPT, Dolly) and/or Spark Connect. While we can't get enough of Dolly, we also suggest you check out LangChain, PandasAI, and vector databases.

Three finalist teams and five honorable mentions will be selected and announced during the Data + AI Summit 2023 keynote, and you will have the opportunity to win a cash prize to be split among your team.

Get started

We encourage you to create your own novel application or use case using the following resources:

LLM Learning Resources:

• Hugging Face Open Assistant:
▹ https://open-assistant.io/
▹ https://huggingface.co/OpenAssistant
• MosaicML MPT-7B:
▹ https://www.mosaicml.com/blog/mpt-7b
▹ https://github.com/mosaicml/llm-foundry
▹ mosaicml/mpt-7b · Hugging Face
• Databricks Labs Dolly GitHub repo: https://github.com/databrickslabs/dolly
▹ Hugging Face > Databricks
▹ Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
▹ Hello Dolly: Democratizing the magic of ChatGPT with open models

Spark Connect Learning Resources:

• Reference documentation
▹ Spark Connect Overview
▹ Spark Connect Quick Start
• Blogs
▹ Introducing Spark Connect - The Power of Apache Spark, Everywhere
▹ Spark Connect Available in Apache Spark 3.4

Potential Project Ideas:

• Parse audio transcriptions produced by tools such as OpenAI's Whisper
• Create AI agents with compute and search capabilities (LangChain is a great place to work on these kinds of tools)
• Build a QA bot that uses a vector database and similarity search
• Tune a DLite or Dolly model using Databricks
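
For the QA bot idea, the core retrieval step can be sketched roughly as follows. This is a minimal illustration only: the `embed` function is a toy bag-of-words stand-in (an assumption, not a real embedding model), and the in-memory list takes the place of an actual vector database. All function names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (illustrative assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k (score, document) pairs most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


documents = [
    "Spark Connect lets clients talk to a remote Spark cluster",
    "Dolly is an open instruction-tuned language model",
    "Whisper transcribes audio into text",
]
best = top_k("which model transcribes audio", documents, k=1)
print(best[0][1])
```

In a real project, the retrieved passages would typically be passed to an LLM as context for answering the question, and the toy pieces above would be swapped for an embedding model and a vector store of your choice.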

The judges will assess your project's performance with a standard battery of diverse prompts while accounting for quantitative metrics like latency.
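
If you want a rough sense of your own latency before submitting, a sketch along these lines may help. It is not the judges' evaluation method, which is not described here; `fake_generate` is a hypothetical stand-in for a call into your application, and the prompts are made-up examples.

```python
import statistics
import time


def fake_generate(prompt):
    """Hypothetical stand-in for a call to your LLM application."""
    return prompt.upper()


def measure_latency(generate, prompts):
    """Time each call to generate() and summarize the latencies in seconds."""
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


prompts = ["Summarize Spark Connect.", "What is Dolly?", "Explain vector search."]
report = measure_latency(fake_generate, prompts)
print(report)
```

Replacing `fake_generate` with your own function gives a quick per-prompt timing summary; for more stable numbers you would likely want more prompts and repeated runs.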

To get started, you can use the following example notebooks as your guide:

• Build your Chat Bot with Dolly
• AI Functions: query LLM with DBSQL

We encourage you to use open-source models and datasets such as (but not limited to):
• Dolly 15K dataset
• Red Pajama dataset
• OpenAssistant Conversations dataset (OASST1)
• LongForm dataset
• Alpaca Libra dataset
• EleutherAI datasets
• Fun beginner-friendly datasets on Kaggle
• Hugging Face instruct_me dataset (highly rated, general purpose open-source, Apache v2)
• SlimPajama-627B dataset

Additional Guidance

If you are building LLM applications, we recommend tools like LangChain, PandasAI, and vector databases.

Requirements

Step 1. BUILD YOUR TEAM. Your team can have up to 4 participants and must be registered here on Devpost as participating in the hackathon.

Step 2. Create a new application using an open-source large language model (LLM) like OpenAssistant, MPT, Dolly, or others, or create a new one using Spark Connect. Your application must have been created after May 18th, and all work must be completed during the hackathon timeframe. Create a compelling project that showcases open LLMs in a new and useful way. It is preferred, but not required, to showcase these use cases within a Databricks or Jupyter notebook.

Step 3. Record a video screencast (<= 280 seconds) demonstrating the application, providing commentary answering the following questions:

Why did you choose this topic?
Which open-source LLM and any additional open-source datasets did you use? Explain why.
Or, what is your Spark Connect application, and which open-source datasets did you use? Explain why.
How does your project provide relevant and insightful information to the end user?

Step 4. Complete your submission on Devpost before 11:45 PM PT on June 18th. This includes a project description with the following:

Your hosted video
A URL to your application's open-source source code. Your GitHub repository must not contain any contributions made before May 18th and must include an open-source license.
Invite all of your teammates and make sure they accept the invitation.

Hackathon Sponsors

Prizes

$20,000 in prizes

Grand Prize Winning Team

1 winner

• $10,000 USD
• Project Highlight at Data + AI Summit

2nd Place Winning Team

1 winner

• $5,000 USD
• Project Highlight at Data + AI Summit

3rd Place Winning Team

1 winner

• $2,500 USD
• Project Highlight at Data + AI Summit

Honorable Mentions

5 winners

• $500 USD

Judges

Mike Conover
Staff Software Engineer, Databricks

Stefania Leone
Sr. Manager, Product Management, Databricks

Martin Grund
Senior Staff Software Engineer, Databricks

Conor B. Murphy
Sr. Data Science Manager, Databricks

Benjamin Harvey, Ph.D.
Founder and CEO, AI Squared

Sean Owen
Principal Specialist for Data Science and ML, Databricks

Jan van der Vegt
Staff Product Manager

Judging Criteria

Creativity
Is this a new and original idea, or has this been done before?
Relevance
How have you combined relevant and interesting datasets and tools?
Thoroughness
Is your application easy for the end user to understand? Does it provide relevant and insightful information?
Quality of submission
How well-written and organized are your description, video explanation, and any provided visual presentation?

Online	Public
$20,000 in prizes	766 participants

So you think you can hack

Join us for a hackathon to showcase open-source LLMs (e.g., OpenAssistant, MPT, Dolly) and/or Spark Connect.
