Job ID: JOB_ID_5228
Job Summary:
We are looking for an FPGA & Trading Stack Specialist with extensive experience in high-performance and trading systems. This onsite role requires a deep understanding of C++17/20, Linux kernel internals, and FPGA development to optimize trading systems for nanosecond-level latency and determinism. You will design, code, and optimize market-data handlers, order routing engines, and pre-trade risk services. The role involves building lock-free, wait-free, and cache-aligned software components, developing exchange protocol stacks, and engineering bare-metal, deterministic Linux environments. A significant part of the job is kernel, driver, and interrupt-path optimization, implementing CPU isolation and NUMA locality strategies, and architecting kernel-bypass networking stacks. You will also design and develop FPGA-accelerated feed handlers and order gateways, collaborating closely with traders, quants, and exchange operations teams to meet stringent performance requirements.
Key Responsibilities:
- Design, code, and optimize high-performance C++17/20 trading systems, including market-data handlers, order routing engines, and pre-trade risk services.
- Build lock-free, wait-free, cache-aligned software components and custom memory allocators.
- Develop exchange protocol stacks (NASDAQ, NYSE, CME, ICE, OPRA) and high-throughput feed normalization pipelines.
- Deliver measurable improvements in tick-to-trade latency, tail latency, and throughput.
- Engineer bare-metal, deterministic Linux environments optimized for real-time trading workloads.
- Perform kernel, driver, and interrupt-path optimization, including IRQ routing, RCU tuning, scheduler tuning, and context-switch minimization.
- Implement CPU isolation, NUMA locality strategies, cache-coherent layouts, and huge-page memory architectures.
- Produce stable, low-jitter execution profiles across trading systems.
- Architect and implement kernel-bypass networking stacks using DPDK, Mellanox VMA, and Solarflare OpenOnload.
- Develop RDMA-enabled and multicast market-data pipelines.
- Tune NIC firmware, DMA paths, PCIe configurations, and network queues.
- Build and maintain exchange connectivity platforms and colocation-optimized data paths.
- Design and develop FPGA-accelerated feed handlers, order gateways, and packet-filtering engines.
- Implement ultra-low-latency pipelines using Xilinx UltraScale+/Versal or Intel Stratix/Agilex platforms.
- Collaborate on hardware/software co-design, including PCIe, DMA, HBM, and SmartNIC architectures.
- Deliver nanosecond-scale latency improvements through hardware offload.
- Engineer deterministic trading platforms where timing, jitter, and physical constraints are first-class design inputs.
- Design systems accounting for cache behavior, memory latency, bus contention, and hardware clocks.
- Apply PTP / IEEE-1588 synchronization, hardware timestamping, and rdtsc-based measurement frameworks.
- Build and maintain nanosecond-resolution profiling, tracing, and telemetry tooling.
- Use perf, eBPF, ftrace, flame graphs, and hardware counters to isolate latency.
- Drive continuous reduction of variance, tail latency, and execution jitter.
- Work directly with traders, quants, and exchange operations teams to support strategy requirements.
- Optimize platform behavior for market-data ingestion, order flow, and pre-trade risk controls.
- Support production environments with rapid latency triage and optimization cycles.
Required Technical Expertise:
- 15+ years in high-performance or trading systems.
- Prior experience in HFT, exchanges, or market-data firms.
- Demonstrated history of nanosecond-level optimization.
- Deep software engineering background with experience working close to hardware.
- Comfortable debugging production systems under live trading conditions.
- Modern C++17/20 (lock-free, cache-aligned, zero-copy architectures).
- Linux kernel internals (scheduler, IRQs, RCU, huge pages).
- CPU pinning, NUMA engineering, cache topology optimization.
- rdtsc/TSC synchronization, PTP / IEEE-1588.
- Kernel bypass: DPDK, Solarflare OpenOnload, Mellanox VMA.
- RDMA (RoCE, iWARP).
- Multicast market-data optimization.
- Custom TCP/UDP stacks.
- NIC firmware tuning.
- Exchange connectivity stacks.
- Xilinx UltraScale+, Alveo, Versal.
- Intel Stratix, Agilex.
- Vivado, Quartus, ModelSim, Questa.
- Verilog / SystemVerilog / VHDL.
- PCIe, DMA, HBM, on-NIC processing.
- FPGA feed handlers, order gateways, packet filtering.
- Real-time Linux.
- BIOS tuning.
- PCIe lane configuration.
- SR-IOV.
- HugeTLB, transparent huge pages.
- CPU microarchitecture tuning.
- perf, ftrace, flame graphs, eBPF.
- Hardware timestamping.
- Nanosecond-level profiling.
- Jitter elimination.
- Deterministic system design.
- Exchange protocols: NASDAQ, NYSE, CME, ICE, OPRA.
- Market-data normalization.
- Order routing engines.
- Pre-trade risk systems.
- Tick-to-trade optimization.
- Microwave / millimeter-wave trading networks.
- GPS-disciplined clocks.
- Custom NIC firmware.
- Co-location data-center optimization.
- Bare-metal Kubernetes for HFT.
- P4 programmable networking.
- SmartNIC development.
- ASIC prototyping.
Compensation & Location
Salary: $80 – $110 per year (Estimated)
Location: Bala Cynwyd, PA
Recruiter / Company – Contact Information
Email: usjobs@nvoids.com