Job ID: JOB_ID_9537
Job Description:
We are looking for a talented mid-level Software Developer/Engineer to join our team. This is a hybrid role requiring onsite presence 3 days a week in Philadelphia, PA. The primary focus will be implementing on-premises Large Language Models (LLMs) and vector databases.
The ideal candidate will have hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral/Mixtral in on-premise or private environments. Strong proficiency in Python for LLM inference, prompt engineering, and integration is essential. Experience with CPU-based inference, model quantization, and performance tuning is also required.
Candidates should have practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector, and a proven record of implementing Retrieval-Augmented Generation (RAG) pipelines. Experience generating and managing embeddings and applying metadata filtering is expected.
A strong understanding of data privacy, air-gapped deployments, and enterprise security requirements is necessary, along with experience implementing access controls and audit logging.
Consultant Requirements:
- Hands-on experience deploying open-source LLMs (e.g., Meta Llama 3, Mistral/Mixtral) in on-prem or private environments.
- Strong proficiency in Python for LLM inference, prompt engineering, and integration.
- Experience with CPU-based inference, model quantization, and performance tuning.
- Practical experience with open-source vector databases (e.g., Qdrant, Chroma, Milvus, pgvector).
- Proven implementation of Retrieval-Augmented Generation (RAG) pipelines.
- Experience generating and managing embeddings and applying metadata filtering.
- Understanding of data privacy, air-gapped deployments, and enterprise security requirements.
- Experience implementing access controls and audit logging.
Nice to Have:
- Experience with LangChain or LlamaIndex.
- Exposure to Rust, Go, or C++ for high-performance services.
- Familiarity with Docker and Kubernetes for on-prem deployments.
- Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers).
- Prior work in regulated or enterprise environments.
Deliverables:
- Reference architecture and deployment guidance.
- Working prototype (LLM + vector DB + RAG).
- Documentation and knowledge transfer to internal teams.
Note: This is a hybrid role requiring onsite work 3 days a week in Philadelphia, PA. Candidates must be local (within 40-50 miles) and NOT from New Jersey.
Special Requirements:
Interviews are held on site. Candidates must be local (within 40-50 miles of Philadelphia, PA). Candidates from New Jersey will not be considered.
Compensation & Location:
Salary: $110,000 – $150,000 per year
Location: Philadelphia, PA
Recruiter / Company – Contact Information
Email: t.kumar@vyzeinc.com