Identifying the ideal customer profile (ICP) for sales and marketing often feels like finding a needle in a haystack. Typical marketing automation tools provide vast datasets, but their metadata isn't always accurate, and the attributes they expose rarely let you describe a unique ICP precisely. This is especially true for small to mid-sized businesses (SMBs) that pursue niche markets with limited resources. Pre-configured filters and static categorizations won't do for these use cases; effective targeting calls for a dynamic, data-driven approach.
The Challenge: Identifying the ICP with Standard Tools
Take a typical scenario for a mid-sized business with a small sales team (3-5 people) targeting niche markets. Standard marketing tools might deliver 100,000 potential leads, but sifting through this firehose of data is both overwhelming and ineffective. Worse yet, the most valuable leads often don’t fit neatly into predefined categories or filters. Without clear markers, sales teams rely on guesswork and costly trial-and-error approaches, such as pursuing cold calls with uncertain outcomes.
On top of that, the data itself can be noisy. For instance, a prospect might look promising based on broad characteristics but prove irrelevant after a lengthy sales conversation. The ICP may also be so specific to the business that it shifts over time, further complicating targeting efforts. Traditional tools lack the agility to adapt in such dynamic environments.
New Approach: Embeddings and Iterative Targeting
Embeddings, a type of representation derived from machine learning, offer a radically different way to approach lead and customer data. Instead of relying on static attributes, embeddings encode nuanced relationships between data points into a high-dimensional space. This enables the system to uncover patterns and clusters invisible to traditional filters or SQL queries.
Here’s how this can work in practice:
- Starting with Small Batches: Users label a handful of examples—positive and negative leads—based on preliminary research. Embeddings are used to model these examples in a semantic space, and the system identifies similar profiles that merit further labeling.
- Iterative Refinement: As more labels are added, the model improves. When labels are scarce, the system prioritizes ambiguous examples—those near the decision boundary—for labeling. This approach, inspired by active learning, minimizes effort while maximizing the model’s ability to discern fit from non-fit.
- Dynamic Learning: When labeling can’t happen upfront—such as when outcomes depend on sales calls—the system applies something similar to reinforcement learning. Over time, as sales teams gather results (e.g., which prospects converted), the system incorporates this feedback to refine its targeting continuously.
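The "label the ambiguous examples first" idea from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-dimensional vectors stand in for real embeddings, the lead names are made up, and a simple nearest-centroid score replaces whatever model a production system would actually use. The function names (`fit_score`, `most_ambiguous`) are hypothetical and not part of any Featrix API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_score(vec, pos_centroid, neg_centroid):
    """Above zero: closer to the good-fit examples. Below zero: closer to the poor-fit ones."""
    return cosine(vec, pos_centroid) - cosine(vec, neg_centroid)

def most_ambiguous(unlabeled, positives, negatives, k=2):
    """Return the k lead names whose score is closest to the decision boundary (score 0)."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    ranked = sorted(unlabeled, key=lambda name: abs(fit_score(unlabeled[name], pos_c, neg_c)))
    return ranked[:k]

# Toy "embeddings" (assumed, for illustration only).
positives = [[1.0, 0.1], [0.9, 0.2]]   # leads labeled as good fit
negatives = [[0.1, 1.0], [0.2, 0.9]]   # leads labeled as poor fit
unlabeled = {
    "lead_a": [1.0, 0.0],
    "lead_b": [0.5, 0.5],
    "lead_c": [0.0, 1.0],
    "lead_d": [0.6, 0.5],
}

picked = most_ambiguous(unlabeled, positives, negatives, k=2)
print(picked)  # the leads worth labeling next
```

Leads that sit clearly on one side (here `lead_a` and `lead_c`) add little information when labeled, so the sketch sends the ones near the boundary to a human first.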
Hands-on Example
To illustrate this process, we used the Customer Personality Analysis dataset from Kaggle, a dataset designed to help businesses understand their ideal customers by analyzing spending behaviors and campaign responses (Link to dataset, about 2,000 samples, courtesy of Dr. Omar Romero-Hernandez).
Step 1: Training an Initial Model
We divided the dataset into two halves. The first half trained an initial predictive model designed to identify customers likely to engage with a marketing campaign. The screenshot below, from the Featrix model UI, shows the performance of the initial model. An accuracy of 85% looks fairly decent, but recall below 39% is a real problem: the model fails to flag most of the good leads.
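High accuracy and low recall can coexist whenever engaged customers are a minority, because correctly rejecting the many non-engaged customers dominates the accuracy figure. The sketch below works through that arithmetic. The confusion-matrix counts are hypothetical, chosen only to land near the reported 85% accuracy and sub-39% recall; they are not the actual counts from this experiment.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical counts for 1,000 customers, 150 of whom engaged (illustrative only).
m = metrics(
    tp=58,    # engaged customers the model flagged
    fp=58,    # flagged customers who did not engage
    fn=92,    # engaged customers the model missed
    tn=792,   # non-engaged customers correctly ignored
)

for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts, the 792 true negatives carry the accuracy, while 92 of the 150 engaged customers go unflagged. For lead scoring, recall is the number that tells you how many good leads never reach the sales team.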
Step 2: Simulating Real-World Feedback
The second half of the dataset was treated as "new data," simulating real-world scenarios where customer engagement outcomes become available over time. When applied to this data, the initial model misclassified 19 of the 169 customers who ultimately engaged with the campaign. Simulating how a lead scoring model could be used in practice, we treated these 19 misclassified samples as feedback from sales interactions.
Step 3: Incorporating Feedback and Retraining
To illustrate what iterative refinement can accomplish, we added these misclassified samples to the training set, with corrected labels, and retrained the model. The results were telling. The model’s recall (its ability to correctly identify engaged customers) jumped from under 39% to over 50%, while precision remained high, which is critical for ensuring that the predicted leads are high quality. This improvement, driven by minimal additional labeling, demonstrates the power of iterative refinement in improving lead scoring accuracy.
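The feedback loop in Steps 2 and 3 can be sketched as follows. This is a toy reconstruction, not the code behind the results above: a 1-nearest-neighbor classifier stands in for the actual model, "retraining" is simply adding examples to its memory, and every data point is invented. Recall is measured on a separate holdout set so the comparison does not reuse the feedback samples themselves.

```python
def nearest_label(train, vec):
    """1-nearest-neighbor prediction: label of the closest training example."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda example: dist2(example[0], vec))[1]

def recall(train, labeled):
    """Share of truly engaged customers (label 1) that the model predicts as engaged."""
    engaged = [vec for vec, label in labeled if label == 1]
    found = sum(1 for vec in engaged if nearest_label(train, vec) == 1)
    return found / len(engaged)

# Invented data: (feature vector, engaged?) pairs.
train = [([0.9, 0.9], 1), ([0.8, 1.0], 1), ([0.1, 0.1], 0), ([0.2, 0.0], 0)]

# "New data" whose outcomes arrive later, e.g. from sales calls.
feedback_batch = [([0.85, 0.95], 1), ([0.9, 0.1], 1), ([0.15, 0.05], 0), ([0.1, 0.2], 0)]

# Separate holdout used only for measuring recall before and after.
holdout = [([0.95, 0.85], 1), ([0.85, 0.15], 1), ([0.95, 0.05], 1),
           ([0.05, 0.15], 0), ([0.2, 0.1], 0)]

recall_before = recall(train, holdout)

# Engaged customers the initial model missed become corrected training examples.
missed = [(vec, label) for vec, label in feedback_batch
          if label == 1 and nearest_label(train, vec) == 0]
retrained = train + missed

recall_after = recall(retrained, holdout)
print(f"missed leads fed back: {len(missed)}")
print(f"recall before: {recall_before:.2f}, after: {recall_after:.2f}")
```

In this toy setup a single corrected example exposes a segment the initial model had never seen, which is the same mechanism, on a much smaller scale, as feeding misclassified samples back into training.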
Why This Approach Outperforms Standard Tools
Traditional tools like ZoomInfo or Klaviyo rely on pre-built filters and attributes, limiting their adaptability. In contrast, embeddings enable discovery of nuanced customer segments that can’t be captured through basic characteristics like location or predicted spend. For example, a startup targeting Shopify merchants recently used this approach in a proof-of-concept. By analyzing their customer data, embeddings revealed meaningful clusters that manual filters could never identify, significantly improving their ability to target high-potential leads (learn more about this case study).
Moreover, the iterative process ensures that the system learns and improves over time, even in noisy or sparse environments. For instance, when applied to outbound campaigns, the system reduced the time needed to secure a meeting from over 20 hours to under 8—delivering clear, business-impacting results.
Conclusion: A New Era of Marketing Automation
Embeddings empower businesses to move beyond rigid categorizations and embrace a dynamic, data-driven way of identifying and refining their ICP. By iteratively learning from sparse, noisy data, this approach offers a powerful alternative to traditional tools, making lead generation more precise and efficient.
Whether you’re sifting through a spreadsheet of 100,000 rows or iterating your sales strategy one call at a time, embeddings bring clarity and focus to the chaos, ensuring your efforts drive real business value.
Where to Go from Here?
- Check out Featrix Smart Start, our new offering that gets you up and running within 30 days
- Learn about Featrix Haystack that lets you build your own targeting model
As always, we appreciate your feedback and questions! You can send them to hello@featrix.ai.