Join forces with data scientists, engineers, and analysts. As a precursor to the upcoming Data + AI Summit, we invite you to create unique and novel applications, use cases, and/or techniques that showcase open-source LLMs (e.g., OpenAssistant, MPT, Dolly, etc.) and/or Spark Connect. While we can't get enough of Dolly, we also suggest you check out LangChain, PandasAI, and vector databases.
Three finalist teams and five honorable mentions will be selected and announced during the Data + AI Summit 2023 keynote, and you will have the opportunity to win a cash prize to be split among your team.
Get started
We encourage you to create your own novel application or use case using the following resources:
LLM Learning Resources:
• Hugging Face Open Assistant:
▹ https://open-assistant.io/
▹ https://huggingface.co/OpenAssistant
• Mosaic ML MPT-7B:
▹ https://www.mosaicml.com/blog/mpt-7b
▹ https://github.com/mosaicml/llm-foundry
▹ mosaicml/mpt-7b · Hugging Face
• Databricks Lab Dolly GitHub repo: https://github.com/databrickslabs/dolly
▹ Hugging Face > Databricks
▹ Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM
▹ Hello Dolly: Democratizing the magic of ChatGPT with open models
Spark Connect Learning Resources:
• Reference documentation
▹ Spark Connect Overview
▹ Spark Connect Quick Start
• Blogs
▹ Introducing Spark Connect - The Power of Apache Spark, Everywhere
▹ Spark Connect Available in Apache Spark 3.4
Potential Project Ideas:
• Parse audio transcriptions, such as those produced by OpenAI's Whisper
• Create AI Agents with compute and search capabilities (LangChain is a great place to work on these kinds of tools)
• Build a QA bot with vector databases leveraging similarity searching
• Tune a DLite or Dolly model using Databricks
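The QA-bot idea above rests on similarity search: embed the documents, embed the question, and return the closest match. The following is a minimal sketch of that step only, not a recommended implementation. The `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the in-memory list stands in for a vector database; both are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Illustrative in-memory "index"; a real project would store vectors in a vector database.
documents = [
    "Spark Connect lets clients talk to a remote Spark cluster",
    "Dolly is an open instruction-tuned language model",
    "Whisper transcribes audio into text",
]
best = top_k("which model transcribes audio", documents, k=1)
print(best)
```

In a real project, the retrieved passage would typically be passed to the LLM as context for answering the question.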
The judges will assess your projects' performance with a standard battery of diverse prompts while accounting for quantitative metrics like latency.
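Since latency is named as a metric, it may help to time your own application against a small set of prompts before submitting. The sketch below is one way to do that under stated assumptions: `fake_model` is a hypothetical placeholder for your inference call, and the prompts are made up. The judges' actual prompt battery and measurement method are not specified here.

```python
import statistics
import time


def fake_model(prompt):
    """Hypothetical stand-in for a real LLM call; replace with your own inference function."""
    return prompt.upper()


def measure_latency(generate, prompts):
    """Run each prompt once and record its wall-clock latency in seconds."""
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        output = generate(prompt)
        elapsed = time.perf_counter() - start
        results.append({"prompt": prompt, "output": output, "latency_s": elapsed})
    return results


# Made-up prompts for illustration only.
prompts = ["Summarize Spark Connect.", "What is Dolly?", "Define latency."]
results = measure_latency(fake_model, prompts)
latencies = [r["latency_s"] for r in results]
summary = {
    "mean_s": statistics.mean(latencies),
    "max_s": max(latencies),
}
print(summary)
```

Reporting both the mean and the worst case gives a rough picture of typical and tail behavior; repeated runs per prompt would make the numbers more stable.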
To get started, you can use the following example notebooks as your guide:
• Build your Chat Bot with Dolly
• AI Functions: query LLM with DBSQL
We encourage you to use open-source models and datasets such as (but not limited to):
• Dolly 15K dataset
• Red Pajama dataset
• OpenAssistant Conversations dataset (OASST1)
• LongForm dataset
• Alpaca Libra dataset
• Eleuther.AI datasets
• Fun beginner-friendly datasets on Kaggle
• Hugging Face instruct_me dataset (highly rated, general purpose open-source, Apache v2)
Additional Guidance
If you are building LLM applications, we recommend tools like LangChain, PandasAI, and vector databases.
Requirements
Step 1. BUILD YOUR TEAM. Your team can have up to 4 participants and must be registered here on Devpost as participating in the hackathon.
Step 2. Create a new application using an open-source large language model (LLM) like OpenAssistant, MPT, Dolly, or others, or create a new one using Spark Connect. Your application must have been created after May 18th, and all work must be completed during the hackathon timeframe. Create a compelling project that showcases open LLMs in a new and useful way. It is preferred, but not required, to showcase these use cases within a Databricks or Jupyter notebook.
Step 3. Record a video screencast (<= 280 seconds) demonstrating the application, providing commentary answering the following questions:
- Why did you choose this topic?
- Which open-source LLM and any additional open-source datasets did you use? Explain why.
- Or, what is your Spark Connect application, and which open-source datasets did you use? Explain why.
- How does your project provide relevant and insightful information to the end user?
Step 4. Complete your submission on Devpost before 5 PM PT on June 16th. This includes a project description with the following:
- Your hosted video
- A URL to your application's open-source code. Your GitHub repository should not have any contributions before May 18th and must include an open-source license.
- Invitations to all of your teammates. Make sure each teammate accepts the invitation.
Prizes
$20,000 in prizes
Grand Prize Winning Team
• $10,000 USD
• Project Highlight at Data + AI Summit
2nd Place Winning Team
• $5,000 USD
• Project Highlight at Data + AI Summit
3rd Place Winning Team
• $2,500 USD
• Project Highlight at Data + AI Summit
Honorable Mentions (5)
• $500 USD
Judges

Mike Conover
Staff Software Engineer, Databricks

Stefania Leone
Sr. Manager, Product Management, Databricks

Martin Grund
Senior Staff Software Engineer, Databricks

Conor B. Murphy
Sr. Data Science Manager, Databricks

Benjamin Harvey, Ph.D.
Founder & CEO, AI Squared

Sean Owen
Principal Specialist for Data Science and ML, Databricks
Judging Criteria
- Creativity: Is this a new and original idea, or has this been done before?
- Relevance: How have you combined relevant and interesting datasets and tools?
- Thoroughness: Is your application easy for the end user to understand? Does it provide relevant and insightful information?
- Quality of submission: How well-written and organized are your description, video explanation, and any provided visual presentation?
Questions? Email the hackathon manager