
Job ID: JOB_ID_5228

Job Summary:

We are looking for a highly specialized FPGA & Trading Stack Specialist with extensive experience in high-performance and trading systems. This onsite role requires deep knowledge of C++17/20, Linux kernel internals, and FPGA development to optimize trading systems for nanosecond-level latency and determinism. You will design, code, and optimize market-data handlers, order routing engines, and pre-trade risk services; build lock-free, wait-free, and cache-aligned software components; develop exchange protocol stacks; and engineer bare-metal, deterministic Linux environments. A significant part of the role involves kernel, driver, and interrupt-path optimization, CPU isolation and NUMA locality strategies, and the architecture of kernel-bypass networking stacks. You will also design and develop FPGA-accelerated feed handlers and order gateways, working closely with traders, quants, and exchange operations teams to meet stringent performance requirements.

Key Responsibilities:

  • Design, code, and optimize high-performance C++17/20 trading systems, including market-data handlers, order routing engines, and pre-trade risk services.
  • Build lock-free, wait-free, cache-aligned software components and custom memory allocators.
  • Develop exchange protocol stacks (NASDAQ, NYSE, CME, ICE, OPRA) and high-throughput feed normalization pipelines.
  • Deliver measurable improvements in tick-to-trade latency, tail latency, and throughput.
  • Engineer bare-metal, deterministic Linux environments optimized for real-time trading workloads.
  • Perform kernel, driver, and interrupt-path optimization, including IRQ routing, RCU tuning, scheduler tuning, and context-switch minimization.
  • Implement CPU isolation, NUMA locality strategies, cache-coherent layouts, and huge-page memory architectures.
  • Produce stable, low-jitter execution profiles across trading systems.
  • Architect and implement kernel-bypass networking stacks using DPDK, Mellanox VMA, or Solarflare OpenOnload.
  • Develop RDMA-enabled and multicast market-data pipelines.
  • Tune NIC firmware, DMA paths, PCIe configurations, and network queues.
  • Build and maintain exchange connectivity platforms and colocation-optimized data paths.
  • Design and develop FPGA-accelerated feed handlers, order gateways, and packet-filtering engines.
  • Implement ultra-low-latency pipelines using Xilinx UltraScale+/Versal or Intel Stratix/Agilex platforms.
  • Collaborate on hardware/software co-design, including PCIe, DMA, HBM, and SmartNIC architectures.
  • Deliver nanosecond-scale latency improvements through hardware offload.
  • Engineer deterministic trading platforms where timing, jitter, and physical constraints are first-class design inputs.
  • Design systems accounting for cache behavior, memory latency, bus contention, and hardware clocks.
  • Apply PTP / IEEE-1588 synchronization, hardware timestamping, and rdtsc-based measurement frameworks.
  • Build and maintain nanosecond-resolution profiling, tracing, and telemetry tooling.
  • Use perf, eBPF, ftrace, flame graphs, and hardware counters to isolate latency.
  • Drive continuous reduction of variance, tail latency, and execution jitter.
  • Work directly with traders, quants, and exchange operations teams to support strategy requirements.
  • Optimize platform behavior for market-data ingestion, order flow, and pre-trade risk controls.
  • Support production environments with rapid latency triage and optimization cycles.

Required Technical Expertise:

  • 15+ years in high-performance or trading systems.
  • Prior experience in HFT, exchanges, or market-data firms.
  • Demonstrated history of nanosecond-level optimization.
  • Deep coding background working close to the hardware.
  • Comfortable debugging production systems under live trading conditions.
  • Modern C++17/20 (lock-free, cache-aligned, zero-copy architectures).
  • Linux kernel internals (scheduler, IRQs, RCU, huge pages).
  • CPU pinning, NUMA engineering, cache topology optimization.
  • rdtsc/TSC synchronization, PTP / IEEE-1588.
  • Kernel bypass: DPDK, Solarflare OpenOnload, Mellanox VMA.
  • RDMA (RoCE, iWARP).
  • Multicast market-data optimization.
  • Custom TCP/UDP stacks.
  • NIC firmware tuning.
  • Exchange connectivity stacks.
  • Xilinx UltraScale+, Alveo, Versal.
  • Intel Stratix, Agilex.
  • Vivado, Quartus, ModelSim, Questa.
  • Verilog / SystemVerilog / VHDL.
  • PCIe, DMA, HBM, on-NIC processing.
  • FPGA feed handlers, order gateways, packet filtering.
  • Real-time Linux.
  • BIOS tuning.
  • PCIe lane configuration.
  • SR-IOV.
  • HugeTLB, transparent huge pages.
  • CPU microarchitecture tuning.
  • perf, ftrace, flame graphs, eBPF.
  • Hardware timestamping.
  • Nanosecond-level profiling.
  • Jitter elimination.
  • Deterministic system design.
  • Exchange protocols: NASDAQ, NYSE, CME, ICE, OPRA.
  • Market-data normalization.
  • Order routing engines.
  • Pre-trade risk systems.
  • Tick-to-trade optimization.
  • Microwave / millimeter-wave trading networks.
  • GPS-disciplined clocks.
  • Custom NIC firmware.
  • Co-location data-center optimization.
  • Bare-metal Kubernetes for HFT.
  • P4 programmable networking.
  • SmartNIC development.
  • ASIC prototyping.



Compensation & Location

Salary: $80 – $110 per hour (Estimated)

Location: Bala Cynwyd, PA


Recruiter / Company – Contact Information

Email: usjobs@nvoids.com


