Posted 4 hours ago

Job ID: JOB_ID_9537

Job Description:

We are looking for a talented mid-level Software Developer/Engineer to join our team. This is a hybrid role requiring onsite presence 3 days a week in Philadelphia, PA. The primary focus will be on implementing on-premises Large Language Models (LLMs) and vector databases.

The ideal candidate will have hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral/Mixtral in on-premises or private environments. Strong proficiency in Python for LLM inference, prompt engineering, and integration is essential. Experience with CPU-based inference, model quantization, and performance tuning is also required.

Candidates should have practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector, and a track record of implementing Retrieval-Augmented Generation (RAG) pipelines. Experience generating and managing embeddings and applying metadata filtering is also expected.

A strong understanding of data privacy, air-gapped deployments, and enterprise security requirements is necessary, along with experience implementing access controls and audit logging.

Consultant Requirements:

  • Hands-on experience deploying open-source LLMs (e.g., Meta Llama 3, Mistral/Mixtral) in on-prem or private environments.
  • Strong proficiency in Python for LLM inference, prompt engineering, and integration.
  • Experience with CPU-based inference, model quantization, and performance tuning.
  • Practical experience with open-source vector databases (e.g., Qdrant, Chroma, Milvus, pgvector).
  • Proven implementation of Retrieval-Augmented Generation (RAG) pipelines.
  • Experience generating and managing embeddings and applying metadata filtering.
  • Understanding of data privacy, air-gapped deployments, and enterprise security requirements.
  • Experience implementing access controls and audit logging.

Nice to Have:

  • Experience with LangChain or LlamaIndex.
  • Exposure to Rust, Go, or C++ for high-performance services.
  • Familiarity with Docker and Kubernetes for on-prem deployments.
  • Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers).
  • Prior work in regulated or enterprise environments.

Deliverables:

  • Reference architecture and deployment guidance.
  • Working prototype (LLM + vector DB + RAG).
  • Documentation and knowledge transfer to internal teams.

Note: This is a hybrid role requiring onsite work 3 days a week in Philadelphia, PA. Candidates must be local (within 40-50 miles) and NOT from New Jersey.


Special Requirements

Interviews are held on site. Candidates must be local (within 40-50 miles). No NJ candidates.


Compensation & Location

Salary: $110,000 – $150,000 per year

Location: Philadelphia, PA


Recruiter / Company – Contact Information

Email: t.kumar@vyzeinc.com

