# WIA-PET-010: Pet Translator Specification v2.0

**Status:** Current  
**Release Date:** 2025-Q1  
**Previous Version:** v1.2

## Abstract

WIA-PET-010 v2.0 defines a standardized framework for bidirectional pet-human communication using AI/ML technologies. This major revision introduces emotion detection, context-aware translation, expanded multi-species support, and enhanced privacy controls.

## 1. Scope

### 1.1 Supported Species

- **Canines:** Dogs (all recognized breeds)
- **Felines:** Cats (domestic and all recognized breeds)
- **Avians:** Parrots, parakeets, cockatiels, canaries, and other songbirds
- **Lagomorphs:** Rabbits
- **Rodents:** Guinea pigs, hamsters, mice, rats (including ultrasonic vocalizations)

### 1.2 Communication Modalities

- Acoustic vocalizations (20 Hz - 96 kHz, captured at sample rates up to 192 kHz)
- Visual body language and facial expressions
- Physiological signals (heart rate, temperature)
- Spatial behavior and movement patterns
- Temporal patterns and behavioral sequences

## 2. Architecture Overview

```
┌─────────────────┐
│  Sensor Layer   │ (Microphones, Cameras, Wearables)
└────────┬────────┘
         │
┌────────▼────────┐
│ Preprocessing   │ (Noise Reduction, Normalization)
└────────┬────────┘
         │
┌────────▼────────┐
│ Feature Extract │ (Audio: MFCCs, Visual: Pose)
└────────┬────────┘
         │
┌────────▼────────┐
│ AI/ML Inference │ (Classification, Emotion, Intent)
└────────┬────────┘
         │
┌────────▼────────┐
│ Context Fusion  │ (Temporal, Environmental, Historical)
└────────┬────────┘
         │
┌────────▼────────┐
│ NLG Translation │ (Multi-language Output)
└────────┬────────┘
         │
┌────────▼────────┐
│  API Gateway    │ (REST, WebSocket, GraphQL)
└─────────────────┘
```
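Read top to bottom, the diagram is a linear pipeline: each stage consumes the previous stage's output. A minimal sketch of that staging in Python, with placeholder lambdas standing in for the real stages (none of these names are normative):

```python
# Illustrative staging only; the real stages are the subsystems in the
# diagram above, not these placeholder lambdas.
def run_pipeline(raw_event, stages):
    """Pass an event through each stage in order."""
    result = raw_event
    for stage in stages:
        result = stage(result)
    return result

# Hypothetical stage stubs mirroring the diagram order.
stages = [
    lambda e: {**e, "preprocessed": True},     # Preprocessing
    lambda e: {**e, "features": "mfcc+pose"},  # Feature Extract
    lambda e: {**e, "inference": "intent"},    # AI/ML Inference
    lambda e: {**e, "fused_context": True},    # Context Fusion
    lambda e: {**e, "text": "translated"},     # NLG Translation
]

out = run_pipeline({"eventId": "evt_demo"}, stages)
```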

## 3. Enhanced Data Format

### 3.1 Comprehensive Input Event

```json
{
  "eventId": "evt_2025_abc123",
  "timestamp": "2025-01-15T14:30:45.123Z",
  "petProfile": {
    "petId": "pet_golden_max",
    "species": "dog",
    "breed": "Golden Retriever",
    "age": 36,
    "sex": "male",
    "neutered": true,
    "healthConditions": ["hip dysplasia"],
    "personalityTraits": ["friendly", "energetic"]
  },
  "modalities": {
    "audio": {
      "sampleRate": 192000,
      "channels": 2,
      "bitDepth": 24,
      "duration": 2500,
      "format": "wav",
      "encoding": "base64",
      "data": "...",
      "metadata": {
        "deviceId": "mic_001",
        "location": "living_room",
        "backgroundNoise": 45.3
      }
    },
    "video": {
      "width": 1920,
      "height": 1080,
      "fps": 60,
      "duration": 2500,
      "format": "h265",
      "encoding": "base64",
      "frames": "...",
      "metadata": {
        "deviceId": "cam_002",
        "lightingCondition": "natural",
        "cameraAngle": "front"
      }
    },
    "sensors": {
      "heartRate": 95,
      "temperature": 38.5,
      "activityLevel": 0.7,
      "location": {"lat": 37.5665, "lon": 126.9780}
    }
  },
  "context": {
    "temporal": {
      "timeOfDay": "afternoon",
      "dayOfWeek": "monday",
      "isHoliday": false,
      "timeSinceLastMeal": 7200,
      "timeSinceLastWalk": 3600
    },
    "environmental": {
      "location": "home_living_room",
      "temperature": 22.5,
      "humidity": 55,
      "weather": "sunny",
      "noiseLevel": 45,
      "presentIndividuals": ["owner_john", "pet_cat_luna"]
    },
    "historical": {
      "recentActivities": [
        {"activity": "play", "timestamp": "2025-01-15T13:00:00Z"},
        {"activity": "meal", "timestamp": "2025-01-15T12:30:00Z"}
      ],
      "behavioralBaseline": {
        "typicalVocalizationsPerHour": 5.2,
        "averageArousalLevel": 0.6
      }
    }
  }
}
```
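A consumer can sanity-check an incoming event against the required top-level fields before processing. The sketch below is illustrative, with the field set taken from the example above:

```python
# Top-level fields from the Section 3.1 example.
REQUIRED_TOP_LEVEL = {"eventId", "timestamp", "petProfile", "modalities", "context"}

def validate_input_event(event):
    """Return a list of problems; an empty list means the event looks valid."""
    problems = []
    missing = REQUIRED_TOP_LEVEL - event.keys()
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if not event.get("modalities"):
        problems.append("at least one modality is required")
    return problems
```

A valid event yields an empty problem list; an empty dict reports every missing field.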

### 3.2 Rich Translation Output

```json
{
  "translationId": "trans_2025_xyz789",
  "eventId": "evt_2025_abc123",
  "timestamp": "2025-01-15T14:30:45.456Z",
  "processingTime": 247,
  "translations": {
    "en": "I hear something unusual outside. Should we check it out?",
    "ko": "밖에서 이상한 소리가 들려요. 확인하러 가볼까요?",
    "es": "Escucho algo inusual afuera. ¿Deberíamos comprobarlo?",
    "ja": "外で変な音が聞こえます。確認しに行きましょうか？"
  },
  "emotion": {
    "primary": "alert",
    "secondary": ["curious", "slightly_anxious"],
    "confidence": 0.943,
    "dimensional": {
      "valence": 0.2,
      "arousal": 0.75,
      "dominance": 0.6
    },
    "timeline": [
      {"emotion": "calm", "timestamp": -2000},
      {"emotion": "alert", "timestamp": 0}
    ]
  },
  "intent": {
    "category": "alerting",
    "specific": "unusual_sound_detected",
    "confidence": 0.89,
    "urgency": "medium",
    "requiresResponse": true
  },
  "behaviors": {
    "detected": ["ear_forward", "tail_mid_wag", "head_tilt"],
    "posture": "standing_alert",
    "gaze": "toward_window"
  },
  "confidence": {
    "overall": 0.921,
    "modality": {
      "audio": 0.95,
      "video": 0.88,
      "context": 0.93
    }
  },
  "alternatives": [
    {
      "translation": "Someone is at the door!",
      "confidence": 0.15
    }
  ],
  "insights": {
    "comparedToBaseline": "Higher than typical arousal",
    "predictedNextBehavior": "approach_window",
    "recommendedResponse": "Investigate stimulus together"
  }
}
```
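Two common read patterns on this output are language selection with an English fallback and an urgency check. The helper names below are illustrative, not part of the spec:

```python
def get_translation(output, lang):
    """Pick the requested language, falling back to English."""
    translations = output["translations"]
    return translations.get(lang, translations["en"])

def needs_attention(output, urgent_levels=("high", "critical")):
    """Assumed heuristic: surface the event to the owner when the intent
    requires a response or its urgency is elevated."""
    intent = output["intent"]
    return intent["requiresResponse"] or intent["urgency"] in urgent_levels
```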

## 4. Advanced Processing Pipeline

### 4.1 Multi-Modal Feature Extraction

#### Audio Features
- **Spectral:** MFCCs (20 coefficients), mel-spectrograms (128 bins)
- **Prosodic:** F0 contours, intensity, speaking rate
- **Temporal:** Duration, rhythm, pauses
- **Quality:** Jitter, shimmer, harmonic-to-noise ratio

#### Visual Features
- **Pose:** 25 body keypoints, 3D spatial coordinates
- **Facial:** 15 facial landmarks, action units
- **Motion:** Optical flow, trajectory analysis
- **Scene:** Object detection, spatial relationships

#### Sensor Features
- **Physiological:** Heart rate variability, temperature trends
- **Activity:** Acceleration patterns, gait analysis  
- **Location:** Indoor positioning, proximity to objects
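To give a feel for the temporal and intensity features listed above, here is a toy NumPy sketch. Real deployments would use a DSP library for MFCCs and the other spectral features; the pause heuristic here is an assumption for illustration:

```python
import numpy as np

def basic_audio_features(samples, sample_rate):
    """Toy versions of the duration, intensity, and pause features above."""
    if len(samples) == 0:
        return {"duration_s": 0.0, "rms": 0.0, "pause_ratio": 0.0}
    duration_s = len(samples) / sample_rate
    # RMS amplitude as a crude intensity measure.
    rms = float(np.sqrt(np.mean(np.asarray(samples, dtype=float) ** 2)))
    # Crude pause detection: fraction of samples below 10% of peak amplitude.
    threshold = 0.1 * float(np.max(np.abs(samples)))
    pause_ratio = float(np.mean(np.abs(samples) < threshold))
    return {"duration_s": duration_s, "rms": rms, "pause_ratio": pause_ratio}
```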

### 4.2 Deep Learning Models

#### Vocalization Classifier
- **Architecture:** Transformer encoder with convolutional front-end
- **Input:** Log-mel spectrogram (128×variable)
- **Output:** 50 vocalization types per species
- **Accuracy:** 94.7% (v2.0 benchmark)
- **Latency:** <100ms on GPU

#### Emotion Detector
- **Architecture:** Multi-modal fusion network
- **Modalities:** Audio CNN + Video 3D-CNN + Context LSTM
- **Output:** 10 primary emotions + dimensional scores
- **Accuracy:** 91.3% (v2.0 benchmark)
- **Latency:** <150ms on GPU

#### Intent Predictor  
- **Architecture:** BERT-style transformer with cross-attention
- **Input:** Feature embeddings + context encoding
- **Output:** 100 intent categories
- **Accuracy:** 88.5% (v2.0 benchmark)
- **Latency:** <200ms on GPU

### 4.3 Context-Aware Translation

```python
# Pseudo-code for context integration
def translate(event, pet_profile, context):
    # Extract features from all modalities
    audio_features = extract_audio_features(event.audio)
    video_features = extract_video_features(event.video)
    sensor_features = extract_sensor_features(event.sensors)
    
    # Classify base communication signals
    vocalization = classify_vocalization(audio_features)
    behavior = recognize_behavior(video_features)
    
    # Detect emotion with multi-modal fusion
    emotion = detect_emotion(
        audio_features,
        video_features,
        sensor_features
    )
    
    # Integrate contextual information
    context_encoding = encode_context(
        context.temporal,
        context.environmental,
        context.historical,
        pet_profile
    )
    
    # Predict intent with full context
    intent = predict_intent(
        vocalization,
        behavior,
        emotion,
        context_encoding
    )
    
    # Generate natural language translation
    translation = generate_translation(
        intent,
        emotion,
        context,
        target_languages=['en', 'ko', 'es', 'ja']
    )
    
    return translation
```

## 5. API Specification v2

### 5.1 REST API

**Base URL:** `https://api.wia-pet.org/v2`

#### POST /v2/translate

Real-time translation with full context.

**Headers:**
```
Authorization: Bearer {api_key}
Content-Type: application/json
X-Request-ID: {uuid}
```

**Request:** See Section 3.1

**Response:** See Section 3.2

**Rate Limits:**
- Free tier: 1,000 requests/day
- Pro tier: 100,000 requests/day
- Enterprise: Unlimited
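Assembling the required request headers is mechanical; a minimal sketch (the helper name is illustrative, not part of the spec):

```python
import uuid

def build_translate_headers(api_key):
    """Headers required by POST /v2/translate per Section 5.1."""
    return {
        "Authorization": "Bearer " + api_key,
        "Content-Type": "application/json",
        "X-Request-ID": str(uuid.uuid4()),  # unique per request
    }

headers = build_translate_headers("demo_key")
```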

#### GET /v2/pets/{petId}/emotions

Emotion timeline for analysis.

**Response:**
```json
{
  "petId": "pet_golden_max",
  "timeRange": {
    "start": "2025-01-15T00:00:00Z",
    "end": "2025-01-15T23:59:59Z"
  },
  "timeline": [
    {
      "timestamp": "2025-01-15T08:00:00Z",
      "emotion": "content",
      "arousal": 0.3,
      "valence": 0.8
    }
  ],
  "summary": {
    "averageValence": 0.65,
    "stressEpisodes": 2,
    "peakHappiness": "2025-01-15T18:00:00Z"
  }
}
```
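The summary block can be recomputed client-side from the raw timeline. The stress-episode heuristic below (high arousal combined with negative valence) is an assumption for illustration, not a definition from this spec:

```python
def summarize_timeline(timeline, stress_arousal=0.8):
    """Recompute a Section 5.1-style summary from raw timeline entries."""
    if not timeline:
        return {"averageValence": None, "stressEpisodes": 0, "peakHappiness": None}
    avg = sum(e["valence"] for e in timeline) / len(timeline)
    # Assumed heuristic: stress = elevated arousal with negative valence.
    stress = sum(1 for e in timeline
                 if e["arousal"] >= stress_arousal and e["valence"] < 0)
    peak = max(timeline, key=lambda e: e["valence"])["timestamp"]
    return {"averageValence": avg, "stressEpisodes": stress, "peakHappiness": peak}
```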

### 5.2 WebSocket Protocol v2

**URL:** `wss://api.wia-pet.org/v2/stream`

Supports streaming audio/video for continuous translation.

**Connection:**
```json
{
  "type": "connect",
  "apiKey": "...",
  "petId": "pet_golden_max",
  "streamConfig": {
    "audio": true,
    "video": true,
    "sensors": false
  }
}
```

**Streaming Data:**
```json
{
  "type": "stream_chunk",
  "sequenceId": 1234,
  "audio": "base64...",
  "video": "base64..."
}
```

**Translation Events:**
```json
{
  "type": "translation",
  "sequenceId": 1234,
  "translation": {...}
}
```
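Because stream chunks and translation events share a `sequenceId`, a client can pair each translation with the chunk that produced it. A minimal dispatch sketch (the pairing strategy is illustrative, not mandated by the protocol):

```python
import json

def handle_message(raw, pending):
    """Dispatch one WebSocket frame. `pending` maps sequenceId -> chunk
    so a later translation event can be paired with its source chunk."""
    msg = json.loads(raw)
    if msg["type"] == "stream_chunk":
        pending[msg["sequenceId"]] = msg
        return None
    if msg["type"] == "translation":
        chunk = pending.pop(msg["sequenceId"], None)
        return {"translation": msg["translation"], "sourceChunk": chunk}
    return None  # ignore unknown frame types
```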

### 5.3 GraphQL API (New in v2.0)

**Endpoint:** `https://api.wia-pet.org/v2/graphql`

```graphql
query GetPetTranslations($petId: ID!, $startDate: DateTime!) {
  pet(id: $petId) {
    id
    name
    species
    translations(startDate: $startDate) {
      edges {
        node {
          id
          timestamp
          text
          emotion {
            primary
            confidence
          }
          intent
        }
      }
    }
    emotionTimeline(startDate: $startDate) {
      timestamp
      emotion
      valence
      arousal
    }
  }
}
```
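The query above is sent as a standard GraphQL-over-HTTP POST body: the query string plus a `variables` object matching the declared variable names. A sketch, with the query trimmed to a few fields for brevity:

```python
import json

# Trimmed version of the GetPetTranslations query above.
GET_PET_TRANSLATIONS = """
query GetPetTranslations($petId: ID!, $startDate: DateTime!) {
  pet(id: $petId) { id name species }
}
"""

def graphql_body(pet_id, start_date):
    """Build the JSON body for a GraphQL-over-HTTP POST request."""
    return json.dumps({
        "query": GET_PET_TRANSLATIONS,
        "variables": {"petId": pet_id, "startDate": start_date},
    })
```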

## 6. Performance Requirements

| Metric | Target | v2.0 Actual |
|--------|--------|-------------|
| Audio-only latency | <200ms | 156ms |
| Audio+video latency | <500ms | 387ms |
| Full pipeline latency | <1000ms | 742ms |
| Vocalization accuracy | >90% | 94.7% |
| Emotion detection accuracy | >85% | 91.3% |
| Intent prediction accuracy | >80% | 88.5% |
| API uptime | 99.9% | 99.94% |
| Concurrent streams | 10,000 | 15,000 |
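Implementers can gate releases on these targets mechanically. An illustrative check using the three latency rows (the metric keys are labels of my choosing, not spec identifiers):

```python
# Latency targets from the table above, in milliseconds.
TARGETS_MS = {
    "audio_only": 200,
    "audio_video": 500,
    "full_pipeline": 1000,
}

def meets_latency_targets(measured_ms):
    """Map each metric to True/False against its target; a missing
    measurement counts as a failure."""
    return {name: measured_ms.get(name, float("inf")) < limit
            for name, limit in TARGETS_MS.items()}
```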

## 7. Security & Privacy

### 7.1 Encryption
- TLS 1.3 for all network communication
- AES-256-GCM for data at rest
- End-to-end encryption option for sensitive data
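On the client side, the TLS 1.3 floor can be enforced with Python's standard `ssl` module; a minimal sketch:

```python
import ssl

def make_client_context():
    """Client-side TLS context that refuses anything below TLS 1.3,
    matching the transport requirement above."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

ctx = make_client_context()
```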

### 7.2 Authentication
- OAuth 2.0 / OpenID Connect
- API key rotation every 90 days
- Multi-factor authentication for administrative access

### 7.3 Privacy Controls
- **Local Processing Mode:** All processing on-device, no cloud upload
- **Data Retention:** Configurable (1 day to permanent)
- **Right to Deletion:** User can delete all data
- **Anonymization:** PII removed from training datasets
- **Consent Management:** Granular permissions per data type

### 7.4 Compliance
- GDPR compliant
- CCPA compliant
- SOC 2 Type II certified
- ISO 27001 certified

## 8. Supported Languages (v2.0)

**Tier 1 (Full Support):**  
English, Korean, Spanish, French, German, Japanese, Chinese (Simplified)

**Tier 2 (Good Support):**  
Italian, Portuguese, Russian, Arabic, Hindi, Dutch

**Tier 3 (Basic Support):**  
40+ additional languages

## 9. Integration Examples

### 9.1 TypeScript SDK

```typescript
import { PetTranslator, Species } from '@wia/pet-translator';

const translator = new PetTranslator({
  apiKey: process.env.WIA_API_KEY,
  websocketUrl: 'wss://api.wia-pet.org/v2/stream'
});

// Subscribe to pet
await translator.subscribe({
  petId: 'pet_golden_max',
  species: Species.Dog,
  breed: 'Golden Retriever'
});

// Listen for translations
translator.on('translation', (result) => {
  console.log(`Translation: ${result.text}`);
  console.log(`Emotion: ${result.emotion.primary} (${result.emotion.confidence})`);
  console.log(`Intent: ${result.intent}`);
});

// Process audio stream
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
translator.processAudioStream(stream);
```

### 9.2 Python SDK

```python
import os

from wia_pet_translator import PetTranslator, Species

translator = PetTranslator(
    api_key=os.environ['WIA_API_KEY'],
    websocket_url='wss://api.wia-pet.org/v2/stream'
)

# Subscribe
translator.subscribe(
    pet_id='pet_golden_max',
    species=Species.DOG,
    breed='Golden Retriever'
)

# Event handler
@translator.on('translation')
def handle_translation(result):
    print(f"Translation: {result.text}")
    print(f"Emotion: {result.emotion.primary}")

# Process audio file
translator.process_audio_file('recording.wav')
```

## 10. Conformance

Systems claiming conformance to WIA-PET-010 v2.0 MUST:

1. Support all data formats in Section 3
2. Implement at least 2 communication modalities
3. Achieve the minimum accuracy targets in Section 6 on the WIA-PET-BENCH-2.0 benchmark dataset (Appendix A)
4. Provide REST API as specified in Section 5.1
5. Meet latency requirements in Section 6
6. Implement security controls in Section 7

Optional features:
- WebSocket streaming
- GraphQL API
- Local processing mode
- Multi-language output (>3 languages)
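A conformance self-check can be expressed as a small audit function. The feature labels below are illustrative shorthand for the MUST items, not spec identifiers:

```python
# Illustrative labels for the MUST requirements above.
MUST_FEATURES = {"section3_formats", "rest_api", "security_controls"}

def conformance_gaps(features, modality_count, meets_latency, meets_accuracy):
    """Return the list of unmet MUST requirements; empty means conformant."""
    gaps = sorted(MUST_FEATURES - set(features))
    if modality_count < 2:
        gaps.append("fewer than 2 modalities")
    if not meets_latency:
        gaps.append("latency targets not met")
    if not meets_accuracy:
        gaps.append("accuracy targets not met")
    return gaps
```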

## 11. Migration from v1.x

### Breaking Changes
- Event schema restructured (Section 3.1)
- API endpoint paths changed (/v1 → /v2)
- Authentication now requires OAuth 2.0

### Backward Compatibility
- v1 API available at `/v1` endpoints (deprecated, EOL 2026-Q1)
- v1 to v2 conversion library available
- Migration guide: https://docs.wia-pet.org/migration/v1-to-v2
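An illustrative v1-to-v2 event conversion is sketched below. The flat v1 field layout assumed here is hypothetical; the official conversion library covers the real mapping:

```python
def convert_v1_event(v1):
    """Sketch of a v1 -> v2 event conversion. The flat v1 field names
    used here are assumptions for illustration only."""
    return {
        "eventId": v1["eventId"],
        "timestamp": v1["timestamp"],
        "petProfile": {"petId": v1["petId"], "species": v1["species"]},
        "modalities": {"audio": v1.get("audio", {})},
        "context": {},  # v1 had no context block; start empty
    }
```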

## Appendix A: Benchmark Dataset

**Dataset:** WIA-PET-BENCH-2.0  
**URL:** https://data.wia-pet.org/v2.0/benchmark

**Contents:**
- 5M labeled pet communication events
- 50K individual pets
- 15 species
- 100+ breeds
- Ground truth by 500+ expert annotators

**Evaluation Splits:**
- Training: 4M events (80%)
- Validation: 500K events (10%)
- Test: 500K events (10%)

## Appendix B: Model Cards

Detailed model cards available at: https://models.wia-pet.org/v2.0/

- Vocalization Classifier v2.0
- Emotion Detector v2.0
- Intent Predictor v2.0
- Body Language Analyzer v2.0

Each model card includes:
- Architecture details
- Training methodology
- Performance metrics
- Limitations and biases
- Intended use cases

---

© 2025 SmileStory Inc. / WIA  
弘益人間 (홍익인간) · Benefit All Humanity

**Version History:**
- v2.0 (2025-Q1): Major revision with multi-modal support
- v1.2 (2024-Q4): Minor improvements
- v1.1 (2024-Q3): Bug fixes
- v1.0 (2024-Q1): Initial release
