Trust & Safety
1. OUR COMMITMENT
User safety and platform integrity are central to Dyva+. We are building an environment where people can create, use, and share AI companions without exploitation, abuse, or harm.
Our Trust & Safety program proactively identifies risks, responds quickly to reports, and continuously improves as the platform grows. We use a combination of automated detection, human review, and transparent policies.
This page covers our safety practices, reporting process, enforcement, and protections for vulnerable users. For specific content rules, see our Acceptable Use Policy.
2. CONTENT MODERATION
We use a layered approach combining automated systems with human oversight to detect and address policy violations.
- Automated detection systems. Machine learning classifiers and rule-based filters run in real time to catch harmful content. These systems analyze text, images, and character configs against our policies, flagging anything that may violate our Acceptable Use Policy. These systems are continuously updated to address new abuse patterns.
- Human review. Flagged and reported content is escalated to our Trust & Safety team. Trained reviewers evaluate it against our policies, considering context, intent, and potential harm. Human review is the final decision point for all enforcement actions.
- Proactive marketplace monitoring. All Marketplace characters are reviewed before publication. We evaluate configs, descriptions, sample interactions, and behavior for compliance with our guidelines. Characters that do not meet standards are rejected with feedback and can be resubmitted after fixes.
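The layered flow above, where automated classifiers flag content and humans make the final call, can be sketched as a simple triage function. The category names and score thresholds below are hypothetical illustrations, not Dyva+'s actual classifiers or cutoffs.

```python
# Hypothetical thresholds for illustration; real systems use tuned,
# per-category values and many more signals.
URGENT_THRESHOLD = 0.90   # escalate to the front of the human review queue
REVIEW_THRESHOLD = 0.60   # send to the standard human review queue

def triage(scores: dict[str, float]) -> str:
    """Route content by its highest classifier score.

    Automated systems only flag; per the policy, a human reviewer
    remains the final decision point for enforcement.
    """
    top = max(scores.values(), default=0.0)
    if top >= URGENT_THRESHOLD:
        return "urgent_review"
    if top >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

results = [
    triage({"violence": 0.10, "spam": 0.05}),
    triage({"harassment": 0.72}),
    triage({"child_safety": 0.99}),
]
print(results)  # ['allow', 'review', 'urgent_review']
```

Keeping the automated layer as a router rather than a decision-maker matches the policy's guarantee that human review is the final step for all enforcement actions.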
3. AI SAFETY MEASURES
As an AI platform, we implement specific measures to address risks unique to AI-generated content and adversarial attacks.
- Prompt injection prevention. We deploy multiple layers of defense against prompt injection: input sanitization, system prompt isolation, context boundary enforcement, and adversarial testing. These prevent users from manipulating AI characters into violating their boundaries or our policies.
- Output filtering. AI responses pass through filters that detect and block prohibited content -- explicit violence, sexual content involving minors, instructions for illegal activities, and personally identifiable information. This applies to both text and voice outputs.
- Rate limiting. We limit message volume, character creation, and API requests to prevent automated abuse and spam. Limits are adjusted dynamically based on usage patterns and detected abuse.
- Content boundaries. Each character operates within boundaries set by its Creator, further constrained by platform safety rules. Creators can configure topics and interaction styles, but cannot override platform-level prohibitions. Platform safety always takes precedence.
- Adversarial use monitoring. We monitor for adversarial patterns: systematic filter bypass attempts, coordinated exploitation campaigns, and novel attack vectors. Our security team runs regular red-team exercises to find and fix vulnerabilities before they can be exploited.
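The rate limiting described above is commonly implemented as a token bucket: each account holds a bucket of tokens that refills at a steady rate, and each request spends one token. This is a minimal sketch of the general technique, not Dyva+'s actual limiter; the capacity and refill rate are placeholder values.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: holds up to `capacity` tokens,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With rate=0 (no refill), a bucket of 3 allows exactly 3 immediate requests.
bucket = TokenBucket(capacity=3, rate=0.0)
results = [bucket.allow() for _ in range(5)]
print(results)  # [True, True, True, False, False]
```

Adjusting limits "dynamically based on usage patterns," as the policy describes, would correspond to changing `capacity` and `rate` per account in response to detected abuse.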
4. USER REPORTING
We rely on our community to help spot violations. We provide multiple reporting channels and review every report.
- In-app report button. Every character page, listing, profile, and chat interface has a report button. In-app reports go directly to our Trust & Safety queue with full context.
- Email reporting. Send detailed reports to [email protected]. Email is best for complex situations that need additional context, evidence, or descriptions of behavior patterns across multiple interactions.
- What happens when you report. Reports are assigned to a Trust & Safety team member who evaluates the content against our policies, considers context, and decides on the appropriate action. You will be notified of the outcome, subject to privacy and legal constraints.
- Review timelines. Reports are prioritized by severity. Imminent threats, child exploitation, and emergencies: reviewed within 24 hours. All other reports: within 72 hours. High report volume may extend standard timelines, but we aim for consistency.
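The severity-based timelines above amount to a simple mapping from report category to review deadline. This sketch encodes only the two tiers the policy states (24 hours for urgent categories, 72 hours otherwise); the category names are illustrative.

```python
from datetime import timedelta

# Per the policy: imminent threats, child exploitation, and emergencies
# are reviewed within 24 hours; all other reports within 72 hours.
URGENT_CATEGORIES = {"imminent_threat", "child_exploitation", "emergency"}

def review_deadline(category: str) -> timedelta:
    hours = 24 if category in URGENT_CATEGORIES else 72
    return timedelta(hours=hours)

print(review_deadline("child_exploitation"))  # 1 day, 0:00:00
print(review_deadline("spam"))                # 3 days, 0:00:00
```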
5. ENFORCEMENT ACTIONS
When we confirm a violation, we take action proportionate to the severity of the violation, your account history, and the potential for ongoing harm:
- Content removal. The violating content is removed and, where applicable, deleted. You are notified of what was removed and which policy it violated.
- Account warnings. A formal warning documenting the violation and expected corrective behavior. Warnings are permanently recorded and factor into future decisions.
- Temporary suspension. Access suspended for 24 hours to 30 days depending on severity. During suspension, you cannot access your account, interact with characters, or use the API.
- Permanent ban. Account permanently terminated and all content removed. Reserved for the most serious violations: child safety offenses, credible threats of violence, and repeated violations after prior enforcement. Banned users may not create new accounts.
- Law enforcement referral. For illegal content or activity -- CSAM, credible threats of violence, terrorism, or other criminal conduct -- we report to law enforcement and cooperate fully with investigations. We may preserve and produce user data in response to valid legal process.
6. APPEAL PROCESS
Enforcement decisions matter. We provide a fair process for users who believe an action was taken in error.
- How to appeal. Email [email protected] within 30 days of the action. Include your username, the action you are contesting, any reference numbers, and why you believe it was incorrect.
- Review timeline. A senior Trust & Safety team member who was not involved in the original decision reviews your appeal. We acknowledge receipt within 2 business days and issue a decision within 5 business days. Complex cases may take longer -- we will let you know.
- One appeal per enforcement action. Each action may be appealed once. The appeal decision is final. We will send the outcome and reasoning to your account email.
7. PROTECTING MINORS
Protecting minors is a top priority. We take active steps to prevent child exploitation and comply with all applicable laws.
- Age verification. You must be at least 13 (or older if required in your jurisdiction) to create an account. We verify age during registration and may request additional verification at any time. Accounts with misrepresented ages are immediately terminated.
- Parental controls. We are developing parental controls so parents and guardians can manage their child's activity, including restricting access to certain characters and content. These will be announced as they become available.
- COPPA compliance. We comply with COPPA and do not knowingly collect personal information from children under 13 without parental consent. If we discover such data, we immediately delete it and terminate the account. Parents who believe their child provided information without consent should contact [email protected].
- Reporting child safety issues. If you encounter content that exploits, endangers, or harms a minor, report it immediately via the in-app button or email [email protected] with subject line "Child Safety Report." These reports get highest priority and are reviewed within 24 hours. Confirmed violations are reported to NCMEC and law enforcement.
8. TRANSPARENCY
Transparency builds trust. We provide visibility into how we enforce our policies and how our safety systems work.
- Transparency reports. We publish periodic reports with aggregate data on content moderation and enforcement. These are available on our website.
- Types of content actioned. Reports break down enforcement by category -- violence, hate speech, harassment, child safety, spam, IP violations, and AI-specific violations -- so the public can see what challenges we face and how we handle them.
- Volume of reports. Reports include total reports received, reports reviewed, percentage resulting in action, average response times, proactive detections, appeals received, and reversal rates.
9. CONTACT
For safety questions, concerns, or reports, email [email protected].
For general legal inquiries, contact [email protected]. For copyright issues, see our DMCA & Copyright Policy.